Directory of Open Access Journals (Sweden)
Pablo Fresia
Insect pest phylogeography might be shaped both by biogeographic events and by human influence. Here, we conducted an approximate Bayesian computation (ABC) analysis to investigate the phylogeography of the New World screwworm fly, Cochliomyia hominivorax, with the aim of understanding its population history and its order and time of divergence. Our ABC analysis supports a spread of populations from north to south in the Americas in at least two distinct episodes. The first split occurred between the North/Central American and South American populations at the end of the Last Glacial Maximum (15,300-19,000 YBP). The second split occurred between the North and South Amazonian populations in the transition between the Pleistocene and the Holocene epochs (9,100-11,000 YBP). The species also experienced population expansion. Phylogenetic analysis likewise suggests this north-to-south colonization, and Maxent models suggest an increase in the number of suitable areas in South America from the past to the present. We found that the phylogeographic patterns observed in C. hominivorax cannot be explained by climatic oscillations alone and can be connected to host population histories. Interestingly, these patterns coincide closely with general patterns of ancient human movements in the Americas, suggesting that humans might have played a crucial role in shaping the distribution and population structure of this insect pest. This work presents the first hypothesis test regarding the processes that shaped the current phylogeographic structure of C. hominivorax and represents an alternate perspective on investigating the problem of insect pests.
Shi, Xiao-Jun; Zhang, Ming-Li
2015-03-01
Zygophyllum xanthoxylon, a desert species with a broad, continuous east-west distribution in arid Northwestern China, can be considered a model species for investigating the biogeographical history of this region. We sequenced two chloroplast DNA spacers (psbK-psbI and rpl32-trnL) in 226 individuals from 31 populations to explore the phylogeographical structure. A median-joining network was constructed, and AMOVA, SAMOVA, neutrality tests and mismatch distribution analysis were used to examine genetic structure and potential range expansion. Using species distribution modeling, the geographical distribution of Z. xanthoxylon was modeled for the present and for the Last Glacial Maximum (LGM). Among 26 haplotypes, one was widely distributed, but most were restricted to either the eastern or western region. The populations with the highest levels of haplotype diversity were found in the Tianshan Mountains and their surroundings in the west, and the Helan Mountains and Alxa Plateau in the east. AMOVA and SAMOVA showed that, over all populations, the species lacks phylogeographical structure, which is speculated to be the result of its specific biology. Neutrality tests and mismatch distribution analysis support past range expansions of the species. Compared with its current distribution, Z. xanthoxylon had a shrunken and more fragmented range under the cold and dry conditions of the LGM. Based on evidence from phylogeographical patterns, the distribution of genetic variability, and paleodistribution modeling, Z. xanthoxylon is speculated to have originated in the east and migrated westward via the Hexi Corridor.
Satler, Jordan D; Carstens, Bryan C
2016-05-01
Comparative phylogeographic investigations have identified congruent phylogeographic breaks in co-distributed species in nearly every region of the world. The qualitative assessments of phylogeographic patterns traditionally used to identify such breaks, however, are limited because they rely on identifying monophyletic groups across species and do not account for coalescent stochasticity. Only long-standing phylogeographic breaks are likely to be obvious; many species could have had a concerted response to more recent landscape events, yet possess subtle signs of phylogeographic congruence because ancestral polymorphism has not completely sorted. Here, we introduce Phylogeographic Concordance Factors (PCFs), a novel method for quantifying phylogeographic congruence across species. We apply this method to the Sarracenia alata pitcher plant system, a carnivorous plant with a diverse array of commensal organisms. We explore whether a group of ecologically associated arthropods have co-diversified with the host pitcher plant, and identify if there is a positive correlation between ecological interaction and PCFs. Results demonstrate that multiple arthropods share congruent phylogeographic breaks with S. alata, and provide evidence that the level of ecological association can be used to predict the degree of similarity in the phylogeographic pattern. This study outlines an approach for quantifying phylogeographic congruence, a central concept in biogeographic research. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Deep divergences and extensive phylogeographic structure in a clade of lowland tropical salamanders
Directory of Open Access Journals (Sweden)
Rovito Sean M
2012-12-01
Background: The complex geological history of Mesoamerica provides the opportunity to study the impact of multiple biogeographic barriers on population differentiation. We examine phylogeographic patterns in a clade of lowland salamanders (Bolitoglossa subgenus Nanotriton) using two mitochondrial genes and one nuclear gene. We use several phylogeographic analyses to infer the history of this clade and test hypotheses regarding the geographic origin of species and location of genetic breaks within species. We compare our results to those for other taxa to determine if historical events impacted different species in a similar manner. Results: Deep genetic divergence between species indicates that they are relatively old, and two of the three widespread species show strong phylogeographic structure. Comparison of mtDNA and nuclear gene trees shows no evidence of hybridization or introgression between species. Isolated populations of Bolitoglossa rufescens from the Los Tuxtlas region constitute a separate lineage based on molecular data and morphology, and divergence between Los Tuxtlas and other areas appears to predate the arrival of B. rufescens in other areas west of the Isthmus of Tehuantepec. The Isthmus appears responsible for Pliocene vicariance within B. rufescens, as has been shown for other taxa. The Motagua-Polochic fault system does not appear to have caused population vicariance, unlike in other systems. Conclusions: Species of Nanotriton have responded to some major geological events in the same manner as other taxa, particularly in the case of the Isthmus of Tehuantepec. The deep divergence of the Los Tuxtlas populations of B. rufescens from other populations highlights the contribution of this volcanic system to patterns of regional endemism, and morphological differences observed in the Los Tuxtlas populations suggest that they may represent an undescribed species of Bolitoglossa. The absence of phylogeographic structure in B
Pneumocystis diversity as a phylogeographic tool
Directory of Open Access Journals (Sweden)
S Derouiche
2009-02-01
Parasites are increasingly used to complement the evolutionary and ecological adaptation history of their hosts. Pneumocystis pathogenic fungi, which are transmitted from host-to-host via an airborne route, have been shown to constitute genuine host markers of evolution. These parasites can also provide valuable information about their host ecology. Here, we suggest that parasites can be used as phylogeographic markers to understand the geographical distribution of intra-specific host genetic variants. To test our hypothesis, we characterised Pneumocystis isolates from wild bats living in different areas. Bats comprise a wide variety of species; some of them are able to migrate. Thus, bat chorology and migration behaviour can be approached using Pneumocystis as phylogeographic markers. In the present work, we find that the genetic polymorphisms of bat-derived Pneumocystis are structured by host chorology. Therefore, Pneumocystis intra-specific genetic diversity may constitute a useful and relevant phylogeographic tool.
Directory of Open Access Journals (Sweden)
Cornelya F C Klütsch
Glacial refugia considerably shaped the phylogeographical structure of species and may influence intra-specific morphological, genetic, and adaptive differentiation. However, the impact of the Quaternary ice ages on the phylogeographical structure of North American temperate mammalian species is not well studied. Here, we surveyed ~1600 individuals of the widely distributed woodland caribou (Rangifer tarandus caribou) using mtDNA control region sequences to investigate whether glacial refugia contributed to the phylogeographical structure in this subspecies. Phylogenetic tree reconstruction, a median-joining network, and mismatch distributions supported postglacial expansions of woodland caribou from three glacial refugia dating back 13,544-22,005 years. These three lineages consisted almost exclusively of woodland caribou mtDNA haplotypes, indicating that phylogeographical structure was mainly shaped by postglacial expansions. The putative centres of these lineages are geographically separated, indicating disconnected glacial refugia in the Rocky Mountains, east of the Mississippi, and the Appalachian Mountains. This is congruent with the fossil record, which shows that caribou were distributed in these areas during the Pleistocene. Our results suggest that the last glacial maximum substantially shaped the phylogeographical structure of this large North American mammal, which will be affected by climatic change. Therefore, the presented results will be essential for future conservation planning in woodland caribou.
Novaes, Renan Milagres Lage; Ribeiro, Renata Acácio; Lemos-Filho, José Pires; Lovato, Maria Bernadete
2013-01-01
Few studies have addressed the phylogeography of species of the Cerrado, the largest savanna biome of South America. Here we aimed to investigate the phylogeographical structure of Dalbergia miscolobium, a widespread tree from the Cerrado, and to verify its concordance with plant phylogeographical and biogeographical patterns so far described. A total of 287 individuals from 32 populations were analyzed by sequencing the trnL intron of the chloroplast DNA and the internal transcribed spacer of the nuclear ribosomal DNA. Analysis of population structure and tests of population expansion were performed and the time of divergence of haplotypes was estimated. Twelve and 27 haplotypes were identified in the cpDNA and nrDNA data, respectively. The star-like network configuration and the mismatch distributions indicated a recent spatial and demographic expansion of the species. Consistent with previous phylogeographical studies of Cerrado trees, the cpDNA also suggested a recent expansion towards the southern Cerrado. The diversity of D. miscolobium was widespread, but high levels of genetic diversity were found in the Central Eastern and in the southern portion of Central Western Cerrado. The combined analysis of cpDNA and nrDNA supported phylogeographic structuring into seven groups. The phylogeographical pattern showed many concordances with biogeographical and phylogeographical studies in the Cerrado, mainly with the Cerrado phytogeographic provinces superimposed on our sampling area. The data reinforced the uniqueness of Northeastern and Southeastern Cerrados and the differentiation between Eastern and Western Central Cerrados. The recent diversification of the species (estimated between the Pliocene and the Pleistocene) and the 'genealogical concordances' suggest that a shared and persistent pattern of species diversification might have been present in the Cerrado over time. This is the first time that an extensive 'genealogical concordance' between
Phylogeographic insights into cryptic glacial refugia.
Provan, Jim; Bennett, K D
2008-10-01
The glacial episodes of the Quaternary (2.6 million years ago-present) were a major factor in shaping the present-day distributions of extant flora and fauna, with expansions and contractions of the ice sheets rendering large areas uninhabitable for most species. Fossil records suggest that many species survived glacial maxima by retreating to refugia, usually at lower latitudes. Recently, phylogeographic studies have given support to the existence of previously unknown, or cryptic, refugia. Here we summarise many of these insights into the glacial histories of species in cryptic refugia gained through phylogeographic approaches. Understanding such refugia might be important as the Earth heads into another period of climate change, in terms of predicting the effects on species distribution and survival.
Directory of Open Access Journals (Sweden)
Qingxiang Han
Coastal plants with simple linear distribution ranges along coastlines provide a suitable system for improving our understanding of patterns of intra-specific distributional history and genetic variation. Due to the combination of high seed longevity and high dispersibility of seeds via seawater, we hypothesized that wild radish would show little phylogeographic structure at the local scale. On the other hand, we also hypothesized that wild radish populations might be geographically differentiated, as suggested by their considerable phenotypic variation along the islands of Japan. We conducted nuclear DNA microsatellite loci and chloroplast DNA haplotype analyses for 486 samples and 144 samples, respectively, from 18 populations to investigate the phylogeographic structure of wild radish in Japan. Cluster analysis supported the existence of differential genetic structures between the Ryukyu Islands and mainland Japan populations. A significant strong pattern of isolation by distance and significant evidence of a recent bottleneck were detected. The chloroplast marker analysis yielded eight haplotypes, of which two (A and B) were broadly distributed in most wild radish populations. High levels of variation in microsatellite loci were identified, whereas cpDNA displayed low levels of genetic diversity within populations. Our results indicate that the Kuroshio Current would have contributed to the sculpting of the phylogeographic structure by shaping genetic gaps between isolated populations. In addition, the Tokara Strait would have created a geographic barrier between the Ryukyu Islands and mainland Japan. Finally, extant habitat disturbances (coastal erosion), migration patterns (linear expansion), and geographic characteristics (small islands and sea currents) have influenced the expansion and historical population dynamics of wild radish. Our study is the first to record the robust phylogeographic
Fernández-Mazuecos, Mario; Vargas, Pablo
2013-06-01
· The role of Quaternary climatic shifts in shaping the distribution of Linaria elegans, an Iberian annual plant, was investigated using species distribution modelling and molecular phylogeographical analyses. Three hypotheses are proposed to explain the Quaternary history of its mountain ring range. · The distribution of L. elegans was modelled using the maximum entropy method and projected to the last interglacial and to the last glacial maximum (LGM) using two different paleoclimatic models: the Community Climate System Model (CCSM) and the Model for Interdisciplinary Research on Climate (MIROC). Two nuclear and three plastid DNA regions were sequenced for 24 populations (119 individuals sampled). Bayesian phylogenetic, phylogeographical, dating and coalescent-based population genetic analyses were conducted. · Molecular analyses indicated the existence of northern and southern glacial refugia and supported two routes of post-glacial recolonization. These results were consistent with the LGM distribution as inferred under the CCSM paleoclimatic model (but not under the MIROC model). Isolation between two major refugia was dated back to the Riss or Mindel glaciations, > 100 kyr before present (bp). · The Atlantic distribution of inferred refugia suggests that the oceanic (buffered)-continental (harsh) gradient may have played a key and previously unrecognized role in determining Quaternary distribution shifts of Mediterranean plants. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
Genetic diversity, phylogeographic structure and effect of selection ...
Indian Academy of Sciences (India)
Abdulhakeem B. Ajibike
2017-12-11
Dec 11, 2017 ... RESEARCH ARTICLE. Genetic diversity, phylogeographic ... chickens as genetic resources towards ensuring food security. Keywords. genetic diversity ... PCR product as template DNA, 3.2 pmol of primer and. 8 μL of Big Dye ...
Zhang, Lijuan; Li, Hu; Li, Shujuan; Zhang, Aibing; Kou, Fei; Xun, Huaizhu; Wang, Pei; Wang, Ying; Song, Fan; Cui, Jianxin; Cui, Jinjie; Gouge, Dawn H; Cai, Wanzhi
2015-09-21
Phylogeographic patterns of some extant plant and vertebrate species have been well studied; however, they are poorly understood in the majority of insects. The study documents analysis of mitochondrial (COI, CYTB and ND5) and nuclear (5.8S rDNA, ITS2 and 28S rDNA) data from 419 individuals of Adelphocoris suturalis, one of the main cotton pests, collected at the 31 locations in China and Japan involved in the study. Results show that the species is highly differentiated between populations from central China and peripheral China regions. Analysis of molecular variance showed a high level of geographical differentiation at different hierarchical levels. An isolation-by-distance test showed no significant correlation between genetic distance and geographical distance among A. suturalis populations, which suggested that gene flow is not restricted by distance. In seven peripheral populations, the high levels of genetic differentiation and the small Nem values implied that geographic barriers were more likely to restrict gene flow. Neutrality tests and the Bayesian skyline plot suggested that population expansion likely happened during the cooling transition between the Last Interglacial and the Last Glacial Maximum. All lines of evidence suggest that physical barriers, Pleistocene climatic oscillations and geographical heterogeneity have affected the population structure and distribution of this insect in China.
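The isolation-by-distance test mentioned in this abstract is commonly run as a Mantel permutation test on two distance matrices. The abstract does not name the method or software used, so the following is only a minimal sketch under that assumption; the function names and the default of 999 permutations are illustrative choices, not taken from the study.

```python
import random
from math import sqrt


def _upper(matrix, order):
    """Upper-triangle entries of a square matrix after relabelling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (r, one-sided p) for the correlation between two symmetric distance matrices.

    Population labels of the genetic matrix are shuffled to build the null distribution.
    """
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_vec = _upper(geographic, identity)
    r_obs = _pearson(_upper(genetic, identity), geo_vec)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if _pearson(_upper(genetic, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

A non-significant p-value from such a test corresponds to the "no significant correlation between genetic distance and geographical distance" outcome reported above.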
Gutiérrez-Rodríguez, Jorge; Barbosa, A Márcia; Martínez-Solano, Íñigo
2017-07-01
Inference of population histories from the molecular signatures of past demographic processes is challenging, but recent methodological advances in species distribution models and their integration in time-calibrated phylogeographic studies allow detailed reconstruction of complex biogeographic scenarios. We apply an integrative approach to infer the evolutionary history of the Iberian ribbed newt (Pleurodeles waltl), an Ibero-Maghrebian endemic with populations north and south of the Strait of Gibraltar. We analyzed an extensive multilocus dataset (mitochondrial and nuclear DNA sequences and ten polymorphic microsatellite loci) and found a deep east-west phylogeographic break in Iberian populations dating back to the Plio-Pleistocene. This break is inferred to result from vicariance associated with the formation of the Guadalquivir river basin. In contrast with previous studies, North African populations showed exclusive mtDNA haplotypes, and formed a monophyletic clade within the Eastern Iberian lineage in the mtDNA genealogy. On the other hand, microsatellites failed to recover Moroccan populations as a differentiated genetic cluster. This is interpreted to result from post-divergence gene flow based on the results of IMA2 and Migrate analyses. Thus, Moroccan populations would have originated after overseas dispersal from the Iberian Peninsula in the Pleistocene, with subsequent gene flow in more recent times, implying at least two trans-marine dispersal events. We modeled the distribution of the species and of each lineage, and projected these models back in time to infer climatically favourable areas during the mid-Holocene, the last glacial maximum (LGM) and the last interglacial (LIG), to reconstruct more recent population dynamics. We found minor differences in climatic favourability across lineages, suggesting intraspecific niche conservatism. Genetic diversity was significantly correlated with the intersection of environmental favourability in the LIG and
Sheh, Alexander; Chaturvedi, Rupesh; Merrell, D Scott; Correa, Pelayo; Wilson, Keith T; Fox, James G
2013-07-01
While Helicobacter pylori infects over 50% of the world's population, the mechanisms involved in the development of gastric disease are not fully understood. Bacterial, host, and environmental factors play a role in disease outcome. To investigate the role of bacterial factors in H. pylori pathogenesis, global gene expression of six H. pylori isolates was analyzed during coculture with gastric epithelial cells. Clustering analysis of six Colombian clinical isolates from a region with low gastric cancer risk and a region with high gastric cancer risk segregated strains based on their phylogeographic origin. One hundred forty-six genes had increased expression in European strains, while 350 genes had increased expression in African strains. Differential expression was observed in genes associated with motility, pathogenicity, and other adaptations to the host environment. European strains had greater expression of the virulence factors cagA, vacA, and babB and were associated with increased gastric histologic lesions in patients. In AGS cells, European strains promoted significantly higher interleukin-8 (IL-8) expression than did African strains. African strains significantly induced apoptosis, whereas only one European strain significantly induced apoptosis. Our data suggest that gene expression profiles of clinical isolates can discriminate strains by phylogeographic origin and that these profiles are associated with changes in expression of the proinflammatory and protumorigenic cytokine IL-8 and levels of apoptosis in host epithelial cells. These findings support the hypothesis that bacterial factors determined by the phylogeographic origin of H. pylori strains may promote increased gastric disease.
Markolf, Matthias; Kappeler, Peter M
2013-11-14
Due to its remarkable species diversity and micro-endemism, Madagascar has recently been suggested to serve as a biogeographic model region. However, hypothesis-based tests of various diversification mechanisms that have been proposed for the evolution of the island's micro-endemic lineages are still limited. Here, we test the fit of several diversification hypotheses with new data on the broadly distributed genus Eulemur using coalescent-based phylogeographic analyses. Time-calibrated species tree analyses and population genetic clustering resolved the previously polytomic species relationships among eulemurs. The most recent common ancestor of eulemurs was estimated to have lived about 4.45 million years ago (mya). Divergence date estimates furthermore suggested a very recent diversification among the members of the "brown lemur complex", i.e. former subspecies of E. fulvus, during the Pleistocene (0.33-1.43 mya). Phylogeographic model comparisons of past migration rates showed significant levels of gene flow between lineages of neighboring river catchments as well as between eastern and western populations of the redfronted lemur (E. rufifrons). Together, our results are concordant with the centers of endemism hypothesis (Wilmé et al. 2006, Science 312:1063-1065), highlight the importance of river catchments for the evolution of Madagascar's micro-endemic biota, and they underline the usefulness of testing diversification mechanisms using coalescent-based phylogeographic methods.
A Comparative Analysis of Fuzzy Inference Engines in Context of ...
African Journals Online (AJOL)
Fuzzy inference engine has found successful applications in a wide variety of fields, such as automatic control, data classification, decision analysis, expert engines, time series prediction, robotics, pattern recognition, etc. This paper presents a comparative analysis of three fuzzy inference engines, max-product, max-min ...
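As a rough illustration of how two of the engines named in this abstract differ, the sketch below assumes the usual reading: max-min clips each rule's consequent at the rule's firing strength, max-product scales it, and both aggregate across rules with the maximum. The abstract is truncated and gives no definitions, so the helper names (`triangle`, `infer`) and the rule representation are hypothetical.

```python
def triangle(a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    def mu(y):
        return max(0.0, min((y - a) / (b - a), (c - y) / (c - b)))
    return mu


def max_min(strength, membership):
    """Max-min implication: clip the consequent at the firing strength."""
    return min(strength, membership)


def max_product(strength, membership):
    """Max-product implication: scale the consequent by the firing strength."""
    return strength * membership


def infer(rules, y_grid, implication):
    """Aggregate rule outputs over y_grid; rules is a list of (firing_strength, membership_fn)."""
    return [max(implication(w, mu(y)) for w, mu in rules) for y in y_grid]
```

With a single rule of firing strength 0.5, max-min flattens the top of the output set while max-product keeps its shape at half height, which is the kind of difference such a comparison examines.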
DEFF Research Database (Denmark)
Morin, Phillip A; Archer, Frederick I.; Foote, Andrew David
2010-01-01
Killer whales (Orcinus orca) currently comprise a single, cosmopolitan species with a diverse diet. However, studies over the last 30 yr have revealed populations of sympatric "ecotypes" with discrete prey preferences, morphology, and behaviors. Although these ecotypes avoid social interactions...... and are not known to interbreed, genetic studies to date have found extremely low levels of diversity in the mitochondrial control region, and few clear phylogeographic patterns worldwide. This low level of diversity is likely due to low mitochondrial mutation rates that are common to cetaceans. Using killer whales...... as a case study, we have developed a method to readily sequence, assemble, and analyze complete mitochondrial genomes from large numbers of samples to more accurately assess phylogeography and estimate divergence times. This represents an important tool for wildlife management, not only for killer whales...
Directory of Open Access Journals (Sweden)
R Eduardo Palma
The long-tailed pygmy rice rat Oligoryzomys longicaudatus (Sigmodontinae), the major reservoir of Hantavirus in Chile and Patagonian Argentina, is widely distributed in the Mediterranean, Temperate and Patagonian Forests of Chile, as well as in adjacent areas in southern Argentina. We used molecular data to evaluate the effects of the last glacial event on the phylogeographic structure of this species. We examined whether historical Pleistocene events had affected genetic variation and spatial distribution of this species along its distributional range. We sampled 223 individuals representing 47 localities along the species range, and sequenced the hypervariable domain I of the mtDNA control region. Aligned sequences were analyzed using haplotype network, Bayesian population structure and demographic analyses. Analysis of population structure and the haplotype network inferred three genetic clusters along the distribution of O. longicaudatus that mostly agreed with the three major ecogeographic regions in Chile: Mediterranean, Temperate Forests and Patagonian Forests. Bayesian Skyline Plots showed constant population sizes through time in all three clusters, followed by an increase during and after the Last Glacial Maximum (LGM; between 26,000 and 13,000 years ago). Neutrality tests and the "g" parameter also suggest that populations of O. longicaudatus experienced demographic expansion across the species' entire range. Past climate shifts have influenced population structure and lineage variation of O. longicaudatus. This species remained in refugial areas during Pleistocene times in southern Temperate Forests (and adjacent areas in Patagonia). From these refugia, O. longicaudatus experienced demographic expansions into Patagonian Forests and central Mediterranean Chile following glacial retreats.
Causal inference in survival analysis using pseudo-observations.
Andersen, Per K; Syriopoulou, Elisavet; Parner, Erik T
2017-07-30
Causal inference for non-censored response variables, such as binary or quantitative outcomes, is often based on either (1) direct standardization ('G-formula') or (2) inverse probability of treatment assignment weights ('propensity score'). To do causal inference in survival analysis, one needs to address right-censoring, and often, special techniques are required for that purpose. We will show how censoring can be dealt with 'once and for all' by means of so-called pseudo-observations when doing causal inference in survival analysis. The pseudo-observations can be used as a replacement for the outcomes without censoring when applying 'standard' causal inference methods, such as (1) or (2) above. We study this idea for estimating the average causal effect of a binary treatment on the survival probability, the restricted mean lifetime, and the cumulative incidence in a competing risks situation. The methods will be illustrated in a small simulation study and via a study of patients with acute myeloid leukemia who received either myeloablative or non-myeloablative conditioning before allogeneic hematopoietic cell transplantation. We will estimate the average causal effect of the conditioning regime on outcomes such as the 3-year overall survival probability and the 3-year risk of chronic graft-versus-host disease. Copyright © 2017 John Wiley & Sons, Ltd.
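The abstract does not spell out how pseudo-observations are built. A minimal sketch, assuming the usual jackknife construction for the survival probability at a fixed time t, is pseudo_i = n * S(t) - (n - 1) * S_without_i(t), where S is the Kaplan-Meier estimate. The function names below are illustrative and are not the authors' code.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for an observed event, 0 for censoring."""
    surv = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations for S(t), one value per subject.

    `times` and `events` are lists of equal length.
    """
    n = len(times)
    full = kaplan_meier(times, events, t)
    pseudo = []
    for i in range(n):
        # Leave subject i out and recompute the estimate.
        loo = kaplan_meier(times[:i] + times[i + 1:], events[:i] + events[i + 1:], t)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo
```

Without censoring, each pseudo-observation reduces to the indicator that subject i survived past t, which is why the values can stand in for the uncensored outcome in a standard regression or weighting step.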
Pereira, A M; Robalo, J I; Freyhof, J; Maia, C; Fonseca, J P; Valente, A; Almada, V C
2010-08-01
The populations of brook lamprey Lampetra planeri of Portuguese Rivers were analysed phylogeographically using a fragment of 644 bp of the mitochondrial control region of 158 individuals from six populations. Samples representing L. planeri and migratory lampreys Lampetra fluviatilis of rivers draining to the North Sea and the Baltic Sea were also included to assess the relationships of Portuguese samples. The data support a clear differentiation of all the populations studied. Several populations, which are isolated among themselves and also from the migratory lampreys, proved to be entirely composed of private haplotypes, a finding that supports some time of independent evolutionary history for these populations. This, combined with the geographic confinement to small water bodies, justifies the recognition of at least four conservation units in the Portuguese rivers Sado, São Pedro, Nabão and Inha.
Directory of Open Access Journals (Sweden)
Hung-Du Lin
2012-01-01
Phylogeographical analyses of Squalidus argentatus samples from thirteen localities within mainland China and Taiwan were conducted for biogeographic studies, as their dispersal strictly depends on geological evolution of the landmasses. A total of 95 haplotypes were genotyped for the mtDNA cyt b gene in 160 specimens from nine river systems. Relatively high levels of haplotype diversity (h = 0.984) and low levels of nucleotide diversity (π = 0.020) were detected in S. argentatus. Two major phylogenetic haplotype groups, A and B, were revealed via phylogenetic analysis. The degree of intergroup divergence (3.96%) indicates that these groups diverged about 4.55 million years (myr) ago. Haplotype network and population analyses indicated significant genetic structure (FST = 0.775), largely concordant with the geographical location of the populations. According to SAMOVA analysis, we divided these populations into four units: Yangtze-Pearl, Qiantang-Minjiang, Jiulong-Beijiang and Taiwan groups. Mismatch distribution analysis, neutrality tests and Bayesian skyline plots indicated a significant population expansion for lineages A and B, dated to approximately 0.35 and 0.04 myr ago, respectively. We found strong geographical organization of the haplotype clades across different geographic scales that can be explained by episodes of dispersal and population expansion followed by population fragmentation and restricted gene flow.
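The two summary statistics quoted in this abstract can be sketched as follows, assuming the common definitions: haplotype diversity as h = n/(n-1) * (1 - sum of squared haplotype frequencies), and nucleotide diversity as the mean proportion of differing sites over all pairs of aligned sequences. The abstract does not state which software or estimator was used, so the function names and toy alignment are illustrative only.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """h = n/(n-1) * (1 - sum(p_i^2)), with p_i the frequency of each distinct sequence."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of equal-length aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (hypothetical data, not from the study).
sample = ["AAAA", "AAAT", "AAAT", "TTAA"]
h = haplotype_diversity(sample)
pi = nucleotide_diversity(sample)
```

A pattern of high h with low π, as reported here, means many distinct haplotypes that differ from one another at only a few sites.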
An analysis pipeline for the inference of protein-protein interaction networks
Energy Technology Data Exchange (ETDEWEB)
Taylor, Ronald C.; Singhal, Mudita; Daly, Don S.; Gilmore, Jason M.; Cannon, William R.; Domico, Kelly O.; White, Amanda M.; Auberry, Deanna L.; Auberry, Kenneth J.; Hooker, Brian S.; Hurst, G. B.; McDermott, Jason E.; McDonald, W. H.; Pelletier, Dale A.; Schmoyer, Denise A.; Wiley, H. S.
2009-12-01
An analysis pipeline has been created for deployment of a novel algorithm, the Bayesian Estimator of Protein-Protein Association Probabilities (BEPro), for use in the reconstruction of protein-protein interaction networks. We have combined the Software Environment for BIological Network Inference (SEBINI), an interactive environment for the deployment and testing of network inference algorithms that use high-throughput data, and the Collective Analysis of Biological Interaction Networks (CABIN), software that allows integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources, to allow interactions computed by BEPro to be stored, visualized, and further analyzed. Incorporating BEPro into SEBINI and automatically feeding the resulting inferred network into CABIN, we have created a structured workflow for protein-protein network inference and supplemental analysis from sets of mass spectrometry bait-prey experiment data. SEBINI demo site: https://www.emsl.pnl.gov/SEBINI/ Contact: ronald.taylor@pnl.gov. BEPro is available at http://www.pnl.gov/statistics/BEPro3/index.htm. Contact: ds.daly@pnl.gov. CABIN is available at http://www.sysbio.org/dataresources/cabin.stm. Contact: mudita.singhal@pnl.gov.
da Silva, Marjorie; Noll, Fernando Barbosa; E Castro, Adriana C Morales-Corrêa
2018-01-01
Swarm-founding wasps are endemic and common representatives of neotropical fauna and compose an interesting social tribe of vespids, presenting both complex social characteristics and uncommon traits for a eusocial group, such as the absence of castes with distinct morphology. The paper wasp Protonectarina sylveirae (Saussure) has a broad distribution across Brazil, Argentina and Paraguay, occurring widely in the Atlantic rainforest and arboreal Caatinga and being absent from the Amazon region. Given this peculiar distribution among swarm-founding wasps, an integrative approach to reconstruct the evolutionary history of P. sylveirae in a spatial-temporal framework was performed to investigate: the presence of genetic structure and its relationship with geography, the evolution of distinct morphologic lineages and the possible historical event(s) in the Neotropical region that could explain the observed phylogeographic pattern. Individuals of P. sylveirae were obtained from populations of 16 areas throughout its distribution for DNA extraction and amplification of the mitochondrial genes 12S, 16S and COI. Analysis of genetic diversity, construction of a haplotype network, analysis of population structure and dating analysis of divergence time were performed. A morphometric analysis was also performed using 8 measures of the body of the adult (workers) to test whether there is morphological distinction among populations. Thirty-five haplotypes were identified, most of them exclusive to one group, and a high population structure was found. The possibility of genetic divergence because of isolation by distance was rejected. Morphological analysis pointed to a great uniformity in phenotypes, with only a small degree of differentiation between the southern populations and the rest. Divergence time analysis showed a Middle/Late Miocene origin, a period when an extensive marine ingression occurred in South America. Divergence of haplogroups began from the Plio/Pleistocene boundary
Meaningful mediation analysis : Plausible causal inference and informative communication
Pieters, Rik
2017-01-01
Statistical mediation analysis has become the technique of choice in consumer research to make causal inferences about the influence of a treatment on an outcome via one or more mediators. This tutorial aims to strengthen two weak links that impede statistical mediation analysis from reaching its
Inference algorithms and learning theory for Bayesian sparse factor analysis
International Nuclear Information System (INIS)
Rattray, Magnus; Sharp, Kevin; Stegle, Oliver; Winn, John
2009-01-01
Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.
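To make the "slab and spike" mixture prior concrete, here is a minimal sketch of drawing factor loadings from such a prior. It assumes the common form in which each loading is exactly zero (the spike) with probability 1 - inclusion_prob and otherwise Gaussian (the slab); the function name and the parameter names inclusion_prob and slab_std are illustrative assumptions, and the paper's exact parametrization is not given in the abstract.

```python
import random


def sample_spike_and_slab(n, inclusion_prob, slab_std, rng):
    """Draw n loadings: 0.0 with prob. 1 - inclusion_prob, else N(0, slab_std^2)."""
    if not 0.0 <= inclusion_prob <= 1.0:
        raise ValueError("inclusion_prob must lie in [0, 1]")
    weights = []
    for _ in range(n):
        if rng.random() < inclusion_prob:
            # Slab: a broad Gaussian for loadings that are "switched on".
            weights.append(rng.gauss(0.0, slab_std))
        else:
            # Spike: a point mass at exactly zero, which gives sparsity.
            weights.append(0.0)
    return weights


rng = random.Random(0)
loadings = sample_spike_and_slab(1000, inclusion_prob=0.1, slab_std=1.0, rng=rng)
fraction_nonzero = sum(w != 0.0 for w in loadings) / len(loadings)
```

Inference algorithms such as the MCMC and VB/EP schemes mentioned above then estimate, for each loading, the posterior probability that it belongs to the slab rather than the spike.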
Delahay, Robin M; Croxall, Nicola J; Stephens, Amberley D
2018-01-01
The genome of the gastric pathogen Helicobacter pylori is characterised by considerable variation of both gene sequence and content, much of which is contained within three large genomic islands comprising the cag pathogenicity island (cag PAI) and two mobile integrative and conjugative elements (ICEs) termed tfs3 and tfs4. All three islands are implicated as virulence factors, although whereas the cag PAI is well characterised, understanding of how the tfs elements influence H. pylori interactions with different human hosts is significantly confounded by limited definition of their distribution, diversity and structural representation in the global H. pylori population. To gain a global perspective of tfs ICE population dynamics we established a bioinformatics workflow to extract and precisely define the full tfs pan-gene content contained within a global collection of 221 draft and complete H. pylori genome sequences. Complete (ca. 35-55 kbp) and remnant tfs ICE clusters were reconstructed from a dataset comprising > 12,000 genes, from which orthologous gene complements and distinct alleles descriptive of different tfs ICE types were defined and classified in comparative analyses. The genetic variation within defined ICE modular segments was subsequently used to provide a complete description of tfs ICE diversity and a comprehensive assessment of their phylogeographic context. Our further examination of the apparent ICE modular types identified an ancient and complex history of ICE residence, mobility and interaction within particular H. pylori phylogeographic lineages and further, provided evidence of both contemporary inter-lineage and inter-species ICE transfer and displacement. Our collective results establish a clear view of tfs ICE diversity and phylogeographic representation in the global H. pylori population, and provide a robust contextual framework for elucidating the functional role of the tfs ICEs particularly as it relates to the risk of gastric
Byun, Jinyoung; Han, Younghun; Gorlov, Ivan P; Busam, Jonathan A; Seldin, Michael F; Amos, Christopher I
2017-10-16
Accurate inference of genetic ancestry is of fundamental interest to many biomedical, forensic, and anthropological research areas. Genetic ancestry memberships may relate to genetic disease risks. In a genome association study, failing to account for differences in genetic ancestry between cases and controls may also lead to false-positive results. Although a number of strategies for inferring and taking into account the confounding effects of genetic ancestry are available, applying them to large studies (tens of thousands of samples) is challenging. The goal of this study is to develop an approach for inferring the genetic ancestry of samples with unknown ancestry among closely related populations and to provide accurate estimates of ancestry for application to large-scale studies. In this study we developed a novel distance-based approach, Ancestry Inference using Principal component analysis and Spatial analysis (AIPS), that incorporates an Inverse Distance Weighted (IDW) interpolation method from spatial analysis to assign individuals to population memberships. We demonstrate the benefits of AIPS in analyzing population substructure, specifically relative to the four most commonly used tools, EIGENSTRAT, STRUCTURE, fastSTRUCTURE, and ADMIXTURE, using genotype data from various intra-European panels and European-Americans. While the aforementioned commonly used tools performed poorly in inferring ancestry from a large number of subpopulations, AIPS accurately distinguished variations between and within subpopulations. Our results show that AIPS can be applied to large-scale data sets to discriminate the modest variability among intra-continental populations as well as to characterize inter-continental variation. The method we developed will protect against spurious associations when mapping the genetic basis of a disease. Our approach is a more accurate and computationally efficient method for inferring genetic ancestry in large-scale genetic studies.
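The idea of assigning membership by inverse distance weighting can be sketched as follows. This is a generic IDW illustration under stated assumptions, not the AIPS implementation: the function name idw_membership, the power parameter, and the toy principal-component coordinates are all hypothetical, and the abstract does not specify how AIPS weights or normalizes distances.

```python
import math
from collections import defaultdict


def idw_membership(query, references, power=2.0):
    """Return {population: score}, scores summing to 1, using weights 1 / d**power.

    references is a list of (coordinates, population_label) pairs, where the
    coordinates are positions in principal-component space.
    """
    if not references:
        raise ValueError("need at least one reference individual")
    scores = defaultdict(float)
    for coords, label in references:
        d = math.dist(query, coords)
        if d == 0.0:
            # The query coincides with a reference point: assign it outright.
            return {label: 1.0}
        scores[label] += 1.0 / d ** power
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Hypothetical reference individuals in a 2-D PC space.
refs = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((10.0, 10.0), "B")]
membership = idw_membership((0.0, 0.5), refs)
best = max(membership, key=membership.get)
```

Nearby reference individuals dominate the score, so an individual of unknown ancestry is drawn toward the population whose reference samples surround it in PC space.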
Directory of Open Access Journals (Sweden)
Yann Reynaud
Mycobacterium tuberculosis genetic structure and evolutionary history have been studied for years by several genotyping approaches, but delineation of a few sublineages remains controversial and needs better characterization. This is particularly the case of the T group within lineage 4 (L4), which was first described using spoligotyping to pool together a number of strains with ill-defined signatures. Although T strains were not traditionally considered as a real phylogenetic group, they did contain a few phylogenetically meaningful sublineages as shown using SNPs. We therefore decided to investigate if this observation could be corroborated using other robust genetic markers. We consequently made a first assessment of genetic structure using 24-loci MIRU-VNTRs data extracted from the SITVIT2 database (n = 607 clinical isolates collected in Russia, Albania, Turkey, Iraq, Brazil and China). Combining Minimum Spanning Trees and Bayesian population structure analyses (using STRUCTURE and TESS software), we distinctly identified eight tentative phylogenetic groups (T1-T8) with a remarkable correlation with geographical origin. We further compared the present structure observed with other L4 sublineages (n = 416 clinical isolates belonging to LAM, Haarlem, X, S sublineages), and showed that 5 out of 8 T groups seemed phylogeographically well-defined as opposed to the remaining 3 groups that partially mixed with other L4 isolates. These results provide novel evidence about the phylogeographical specificity of a proportion of the ill-defined T group of M. tuberculosis. The genetic structure observed will now be further validated on an enlarged worldwide dataset using Whole Genome Sequencing (WGS).
A range-wide synthesis and timeline for phylogeographic events in the red fox (Vulpes vulpes).
Kutschera, Verena E; Lecomte, Nicolas; Janke, Axel; Selva, Nuria; Sokolov, Alexander A; Haun, Timm; Steyer, Katharina; Nowak, Carsten; Hailer, Frank
2013-06-05
Many boreo-temperate mammals have a Pleistocene fossil record throughout Eurasia and North America, but only few have a contemporary distribution that spans this large area. Examples of Holarctic-distributed carnivores are the brown bear, grey wolf, and red fox, all three ecological generalists with large dispersal capacity and a high adaptive flexibility. While the two former have been examined extensively across their ranges, no phylogeographic study of the red fox has been conducted across its entire Holarctic range. Moreover, no study included samples from central Asia, leaving a large sampling gap in the middle of the Eurasian landmass. Here we provide the first mitochondrial DNA sequence data of red foxes from central Asia (Siberia), and new sequences from several European populations. In a range-wide synthesis of 729 red fox mitochondrial control region sequences, including 677 previously published and 52 newly obtained sequences, this manuscript describes the pattern and timing of major phylogeographic events in red foxes, using a Bayesian coalescence approach with multiple fossil tip and root calibration points. In a 335 bp alignment we found in total 175 unique haplotypes. All newly sequenced individuals belonged to the previously described Holarctic lineage. Our analyses confirmed the presence of three Nearctic- and two Japan-restricted lineages that were formed since the Mid/Late Pleistocene. The phylogeographic history of red foxes is highly similar to that previously described for grey wolves and brown bears, indicating that climatic fluctuations and habitat changes since the Pleistocene had similar effects on these highly mobile generalist species. All three species originally diversified in Eurasia and later colonized North America and Japan. North American lineages persisted through the last glacial maximum south of the ice sheets, meeting more recent colonizers from Beringia during postglacial expansion into the northern Nearctic. Both brown
Zhang, Li-Juan; Cai, Wan-Zhi; Luo, Jun-Yu; Zhang, Shuai; Wang, Chun-Yi; Lv, Li-Min; Zhu, Xiang-Zhen; Wang, Li; Cui, Jin-Jie
2017-01-01
Lygus pratensis (L.) is an important cotton pest in China, especially in the northwest region. Nymphs and adults cause serious quality and yield losses. However, the genetic structure and geographic distribution of L. pratensis is not well known. We analyzed genetic diversity, geographical structure, gene flow, and population dynamics of L. pratensis in northwest China using mitochondrial and nuclear sequence datasets to study phylogeographical patterns and demographic history. L. pratensis (n = 286) were collected at sites across an area spanning 2,180,000 km2, including the Xinjiang and Gansu-Ningxia regions. Populations in the two regions could be distinguished based on mitochondrial criteria but the overall genetic structure was weak. The nuclear dataset revealed a lack of diagnostic genetic structure across sample areas. Phylogenetic analysis indicated a lack of population level monophyly that may have been caused by incomplete lineage sorting. The Mantel test showed a significant correlation between genetic and geographic distances among the populations based on the mtDNA data. However the nuclear dataset did not show significant correlation. A high level of gene flow among populations was indicated by migration analysis; human activities may have also facilitated insect movement. The availability of irrigation water and ample cotton hosts makes the Xinjiang region well suited for L. pratensis reproduction. Bayesian skyline plot analysis, star-shaped network, and neutrality tests all indicated that L. pratensis has experienced recent population expansion. Climatic changes and extensive areas occupied by host plants have led to population expansion of L. pratensis. In conclusion, the present distribution and phylogeographic pattern of L. pratensis was influenced by climate, human activities, and availability of plant hosts.
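The Mantel test mentioned above compares a genetic distance matrix with a geographic one. The following is a minimal permutation-based sketch assuming the usual formulation (Pearson correlation of the upper triangles, with rows and columns of one matrix permuted together); it is not the software the authors used, and the six collinear "sites" are hypothetical toy data.

```python
import random
from statistics import mean


def _pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def _upper(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    n = len(genetic)
    identity = list(range(n))
    geo = _upper(geographic, identity)
    observed = _pearson(_upper(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if _pearson(_upper(genetic, order), geo) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical toy data: six sites on a line, genetic distance proportional
# to geographic distance (perfect isolation by distance).
sites = [0, 1, 2, 3, 4, 5]
geo_dist = [[abs(a - b) for b in sites] for a in sites]
gen_dist = [[2.0 * abs(a - b) for b in sites] for a in sites]
r, p = mantel(gen_dist, geo_dist)
```

A significant positive r, as reported for the mtDNA data, is the pattern expected under isolation by distance; the nuclear dataset in the study showed no such correlation.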
Albaina, Naiara; Olsen, Jeanine L.; Couceiro, Lucia; Miguel Ruiz, Jose; Barreiro, Rodolfo
Because marine species respond differentially to factors governing survival and gene flow, closely related taxa may display dissimilar phylogeographic histories. New data for the patchily distributed gastropod Nassarius nitidus throughout its Atlantic-Mediterranean range (collected during 2008 and
Szövényi, Péter; Hock, Zsófia; Urmi, Edwin; Schneller, Jakob J
2006-01-01
The chloroplast phylogeography of two peat mosses (Sphagnum fimbriatum and Sphagnum squarrosum) with similar distributions but different life history characteristics was investigated in Europe. Our main aim was to test whether similar distributions reflect similar phylogeographic and phylodemographic processes. Accessions covering the European distributions of the species were collected and approx. 2000 bp of the chloroplast genome of each species was sequenced. Maximum parsimony, statistical parsimony and phylodemographic analyses were used to address the question of whether these species with similar distributions show evidence of similar phylogeographic and phylodemographic processes. The chloroplast haplotypes of the currently spreading species S. fimbriatum showed strong geographic structure, whereas those of S. squarrosum, which has stable historical population sizes, showed only very weak geographic affinity and were widely distributed. We hypothesize that S. fimbriatum survived the last glaciations along the Atlantic coast of Europe, whereas S. squarrosum had numerous, scattered refugia in Europe. The dominance of one haplotype of S. fimbriatum across almost all of Europe suggests rapid colonization after the last glacial maximum. We hypothesize that high colonizing ability is an inherent characteristic of the species and its recent expansion in Europe is a response to climate change.
Directory of Open Access Journals (Sweden)
Madec Luc
2010-01-01
Abstract. Background: Despite its key location between the rest of the continent and Europe, research on the phylogeography of north African species remains very limited compared to European and North American taxa. The Mediterranean land mollusc Cornu aspersum (= Helix aspersa) is one of the few species widely sampled in north Africa for biogeographical analysis. It therefore provides an excellent biological model to understand phylogeographical patterns across the Mediterranean basin, and to evaluate hypotheses of population differentiation. We investigated here the phylogeography of this land snail to reassess the evolutionary scenario we previously considered for explaining its scattered distribution in the western Mediterranean, and to help resolve the question of the direction of its range expansion (from north Africa to Europe or vice versa). By simultaneously analysing individuals from 73 sites sampled in its putative native range, the present work provides the first broad-scale screening of mitochondrial variation (cyt b and 16S rRNA genes) of C. aspersum. Results: Phylogeographical structure mirrored previous patterns inferred from anatomy and nuclear data, since all haplotypes could be ascribed to a B (West) or a C (East) lineage. Alternative migration models tested confirmed that C. aspersum most likely spread from north Africa to Europe. In addition to Kabylia in Algeria, which would have been successively a centre of dispersal and a zone of secondary contacts, we identified an area in Galicia where genetically distinct west and east type populations would have regained contact. Conclusions: Vicariant and dispersal processes are reviewed and discussed in the light of signatures left in the geographical distribution of the genetic variation. In referring to Mediterranean taxa which show similar phylogeographical patterns, we proposed a parsimonious scenario to account for the "east-west" genetic splitting and the northward expansion of the
Inference of Well-Typings for Logic Programs with Application to Termination Analysis
DEFF Research Database (Denmark)
Bruynooghe, M.; Gallagher, John Patrick; Humbeeck, W. Van
2005-01-01
A method is developed to infer a polymorphic well-typing for a logic program. Our motivation is to improve the automation of termination analysis by deriving types from which norms can automatically be constructed. Previous work on type-based termination analysis used either types declared...... by the user, or automatically generated monomorphic types describing the success set of predicates. The latter types are less precise and result in weaker termination conditions than those obtained from declared types. Our type inference procedure involves solving set constraints generated from the program...... and derives a well-typing in contrast to a success-set approximation. Experiments so far show that our automatically inferred well-typings are close to the declared types and result in termination conditions that are as strong as those obtained with declared types. We describe the method, its implementation...
Directory of Open Access Journals (Sweden)
Charlotte De Busschere
2016-02-01
Due to both deliberate and accidental introductions, invasive African Clawed Frog (Xenopus laevis) populations have become established worldwide. In this study, we investigate the geographic origins of invasive X. laevis populations in France and Portugal using the phylogeographic structure of X. laevis in its native South African range. In total, 80 individuals from the whole area known to be invaded in France and Portugal were analysed for two mitochondrial and three nuclear genes, allowing a comparison with 185 specimens from the native range. Our results show that native phylogeographic lineages have contributed differently to invasive European X. laevis populations. In Portugal, genetic and historical data suggest a single colonization event involving a small number of individuals from the south-western Cape region in South Africa. In contrast, French invasive X. laevis encompass two distinct native phylogeographic lineages, i.e., one from the south-western Cape region and one from the northern regions of South Africa. The French X. laevis population is the first example of an X. laevis invasion involving multiple lineages. Moreover, the lack of population structure based on nuclear DNA suggests a potential role for admixture within the invasive French population.
Effects analysis fuzzy inference system in nuclear problems using approximate reasoning
International Nuclear Information System (INIS)
Guimaraes, Antonio C.F.; Franklin Lapa, Celso Marcelo
2004-01-01
In this paper a fuzzy inference system modeling technique applied to failure mode and effects analysis (FMEA) is introduced for nuclear reactor problems. This method uses the concept of a pure fuzzy logic system to treat the traditional FMEA parameters: probabilities of occurrence, severity and detection. The auxiliary feed-water system of a typical two-loop pressurized water reactor (PWR) was used as a practical example in this analysis. The kernel result is the conceptual confrontation between the traditional risk priority number (RPN) and the fuzzy risk priority number (FRPN) obtained from expert opinion. The set of results demonstrated the great potential of the inference system and the advantage of the gray approach in this class of problems.
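For reference, the traditional risk priority number that the fuzzy variant is compared against is conventionally the product of the three FMEA scores. The sketch below assumes the customary 1-10 integer scale for each score, which the abstract does not state; the failure modes and their scores are hypothetical and are not taken from the paper. It does not attempt the fuzzy inference step.

```python
def risk_priority_number(occurrence, severity, detection):
    """Traditional FMEA risk priority number: occurrence * severity * detection."""
    for name, score in (("occurrence", occurrence),
                        ("severity", severity),
                        ("detection", detection)):
        # Assumed convention: each score is rated on a 1-10 scale.
        if not 1 <= score <= 10:
            raise ValueError(f"{name} score must be between 1 and 10")
    return occurrence * severity * detection


# Hypothetical failure modes with (occurrence, severity, detection) scores.
failure_modes = {
    "pump fails to start": (3, 7, 4),
    "valve stuck closed": (2, 9, 6),
    "sensor drift": (5, 3, 2),
}
ranked = sorted(failure_modes,
                key=lambda m: risk_priority_number(*failure_modes[m]),
                reverse=True)
```

Because the product treats the three scores symmetrically, quite different combinations can yield the same RPN, which is one commonly cited motivation for rule-based alternatives such as the fuzzy approach described above.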
Hué, Stéphane; Buckton, Andrew J.; Myers, Richard E.; Duiculescu, Dan; Ene, Luminita; Oprea, Cristiana; Tardei, Gratiela; Rugina, Sorin; Mardarescu, Mariana; Floch, Corinne; Notheis, Gundula; Zöhrer, Bettina; Cane, Patricia A.; Pillay, Deenan
2012-01-01
Abstract In the late 1980s an HIV-1 epidemic emerged in Romania that was dominated by subtype F1. The main route of infection is believed to be parenteral transmission in children. We sequenced partial pol coding regions of 70 subtype F1 samples from children and adolescents from the PENTA-EPPICC network, of which 67 were from Romania. Phylogenetic reconstruction using the sequences and other publicly available global subtype F sequences showed that 79% of Romanian F1 sequences formed a statistically robust monophyletic cluster. The monophyletic cluster was epidemiologically linked to parenteral transmission in children. Coalescent-based analysis dated the origins of the parenteral epidemic to 1983 [1981–1987; 95% HPD]. The analysis also shows that the epidemic's effective population size has remained fairly constant since the early 1990s, suggesting limited onward spread of the virus within the population. Furthermore, phylogeographic analysis suggests that the root location of the parenteral epidemic was Bucharest. PMID:22251065
Østbye, K; Bernatchez, L; Naesje, T F; Himberg, K-J M; Hindar, K
2005-12-01
We compared mitochondrial DNA and gill-raker number variation in populations of the European whitefish Coregonus lavaretus (L.) species complex to illuminate their evolutionary history, and discuss mechanisms behind diversification. Using single-strand conformation polymorphism (SSCP) and sequencing 528 bp of combined parts of the cytochrome oxidase b (cyt b) and NADH dehydrogenase subunit 3 (ND3) mitochondrial DNA (mtDNA) regions, we documented phylogeographic relationships among populations and phylogeny of mtDNA haplotypes. Demographic events behind the geographical distribution of haplotypes were inferred using nested clade analysis (NCA) and mismatch distribution. Concordance between operational taxonomical groups, based on gill-raker numbers, and mtDNA patterns was tested. Three major mtDNA clades were resolved in Europe: a North European clade from northwest Russia to Denmark, a Siberian clade from the Arctic Sea to southwest Norway, and a South European clade from Denmark to the European Alps, reflecting occupation in different glacial refugia. Demographic events inferred from NCA were isolation by distance, range expansion, and fragmentation. Mismatch analysis suggested that clades which colonized Fennoscandia and the Alps expanded in population size 24 500-5800 years before present, with minute female effective population sizes, implying small founder populations during colonization. Gill-raker counts were not commensurate with hierarchical mtDNA clades, and only poorly with haplotypes, suggesting a recent origin of gill-raker variation. Whitefish designations based on gill-raker numbers were not associated with ancient clades. Lack of congruence in morphology and evolutionary lineages implies that the taxonomy of this species complex should be reconsidered.
Walker, Matt J; Stockman, Amy K; Marek, Paul E; Bond, Jason E
2009-01-01
Background Species that are widespread throughout historically glaciated and currently non-glaciated areas provide excellent opportunities to investigate the role of Pleistocene climatic change on the distribution of North American biodiversity. Many studies indicate that northern animal populations exhibit low levels of genetic diversity over geographically widespread areas whereas southern populations exhibit relatively high levels. Recently, paleoclimatic data have been combined with niche-based distribution modeling to locate possible refugia during the Last Glacial Maximum. Using phylogeographic, population, and paleoclimatic data, we show that the distribution and mitochondrial data for the millipede genus Narceus are consistent with classical examples of Pleistocene refugia and subsequent post-glacial population expansion seen in other organismal groups. Results The phylogeographic structure of Narceus reveals a complex evolutionary history with signatures of multiple refugia in southeastern North America followed by two major northern expansions. Evidence for refugial populations were found in the southern Appalachian Mountains and in the coastal plain. The northern expansions appear to have radiated from two separate refugia, one from the Gulf Coastal Plain area and the other from the mid-Atlantic coastal region. Distributional models of Narceus during the Last Glacial Maximum show a dramatic reduction from the current distribution, with suitable ecological zones concentrated along the Gulf and Atlantic coastal plain. We found a strong correlation between these zones of ecological suitability inferred from our paleo-model with levels of genetic diversity derived from phylogenetic and population estimates of genetic structuring. Conclusion The signature of climatic change, during and after the Pleistocene, on the distribution of the millipede genus Narceus is evident in the genetic data presented. Niche-based historical distribution modeling strengthens the
Büsse, Sebastian; von Grumbkow, Philipp; Hummel, Susanne; Shah, Deep Narayan; Tachamo Shah, Ram Devi; Li, Jingke; Zhang, Xueping; Yoshizawa, Kazunori; Wedmann, Sonja; Hörnschemeyer, Thomas
2012-01-01
Unusual biogeographic patterns of closely related groups reflect events in the past, and molecular analyses can help to elucidate these events. While ample research on the origin of disjunct distributions of different organism groups in the Western Palearctic has been conducted, such studies are rare for Eastern Palearctic organisms. In this paper we present a phylogeographic analysis of the disjunct distribution pattern of the extant species of the strongly cool-adapted Epiophlebia dragonflies from Asia. We investigated sequences of the usually more conserved 18S rDNA and 28S rDNA genes and the more variable sequences of ITS1, ITS2 and CO2 of all three currently recognised Epiophlebia species and of a sample of other odonatan species. In all genes investigated the degrees of similarity between species of Epiophlebia are very high and resemble those otherwise found between different populations of the same species in Odonata. This indicates that substantial gene transfer between these populations occurred in the comparatively recent past. Our analyses imply a wide distribution of the ancestor of extant Epiophlebia in Southeast Asia during the last ice age, when suitable habitats were more common. During the following warming phase, its range contracted, resulting in the current disjunct distribution. Given the strong sensitivity of these species to climatic parameters, the current trend of increasing global temperatures will further reduce acceptable habitats and seriously threaten the existence of these last representatives of an ancient group of Odonata. PMID:22666462
Directory of Open Access Journals (Sweden)
Teng-Lang Yu
To comprehend the phylogeographic patterns of genetic variation in anurans on Taiwan Island, this study attempted to examine (1) the existence of various geological barriers (Central Mountain Ranges, CMRs); and (2) the genetic variation of Bufo bankorensis using mtDNA sequences among populations located in different regions of Taiwan, characterized by different climates and existing under extreme conditions, compared with available sequences of the related species B. gargarizans of mainland China. Phylogenetic analyses of the dataset with the mitochondrial DNA (mtDNA) D-loop gene (348 bp) recovered a close relationship between B. bankorensis and B. gargarizans and identified three distinct lineages. Furthermore, the network of the mtDNA D-loop gene (564 bp) amplified (279 individuals, 27 localities) from Taiwan Island indicated three divergent clades within B. bankorensis (Clades W, E and S), corresponding to the geography, thereby verifying the importance of the CMRs and Kaoping River drainage as major biogeographic barriers. Mismatch distribution analysis, neutrality tests and Bayesian skyline plots revealed that a significant population expansion occurred for the total population and Clade W, with horizons dated to approximately 0.08 and 0.07 Mya, respectively. These results suggest that the population expansion of the Taiwan Island species B. bankorensis might have resulted from the release of available habitat in post-glacial periods, with the genetic variation in mtDNA showing habitat selection, subsequent population dispersal, and co-distribution among clades. The multiple origins (different clades) of B. bankorensis mtDNA sequences were first evident in this study. The divergent genetic clades found within B. bankorensis could reflect independent colonization by previously diverged lineages, implying that B. bankorensis originated from B. gargarizans of mainland China, followed by dispersal and isolation within Taiwan Island. Highly divergent clades between W and E of B
Directory of Open Access Journals (Sweden)
Junha Shin
Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways are inherited primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co
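The within-domain idea in the record above can be illustrated with a minimal Python sketch. It assumes binary presence/absence profiles and uses mutual information as the co-inheritance score; the abstract does not state the scoring function, so that choice, the function names, and the toy profiles are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length binary profiles."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in joint.items()
    )


def within_domain_scores(profile_a, profile_b, domains):
    """Score co-inheritance separately over the reference species of each domain."""
    scores = {}
    for domain in sorted(set(domains)):
        idx = [i for i, d in enumerate(domains) if d == domain]
        scores[domain] = mutual_information(
            [profile_a[i] for i in idx], [profile_b[i] for i in idx]
        )
    return scores


# Hypothetical toy data: four bacterial and four eukaryotic reference species.
domains = ["Bacteria"] * 4 + ["Eukaryota"] * 4
gene_x = [1, 1, 0, 0, 1, 0, 1, 0]  # presence/absence of gene X per species
gene_y = [1, 1, 0, 0, 1, 1, 0, 0]  # presence/absence of gene Y per species

scores = within_domain_scores(gene_x, gene_y, domains)
pooled = mutual_information(gene_x, gene_y)
```

In this toy case the two genes are perfectly co-inherited among the bacterial species and independent among the eukaryotic ones, so pooling all species dilutes the bacterial signal, which is the kind of erosion the abstract describes.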
Lerceteau-Köhler, Estelle; Schliewen, Ulrich; Kopun, Theodora; Weiss, Steven
2013-08-26
Brown trout Salmo trutta have been described in terms of five major mtDNA lineages, four of which correspond to major ocean basins, and one, according to some authors, to a distinct taxon, marbled trout Salmo marmoratus. The Atlantic and Danubian lineages of brown trout meet in a poorly documented contact zone in Central Europe. The natural versus human-mediated origin of the Atlantic lineage in the upper Danube is a question of both theoretical and practical importance with respect to conservation management. We provide a comprehensive population genetic analysis of brown trout in the region with the aim of evaluating the geographic distribution and genetic integrity of these two lineages in and around their contact zone. Genetic screening of 114 populations of brown trout across the Danube/Rhine/Elbe catchments revealed a counter-intuitive phylogeographic structure with near fixation of the Atlantic lineage in the sampled portions of the Bavarian Danube. Along the Austrian Danube, phylogeographically informative markers revealed increasing percentages of Danube-specific alleles with downstream distance. Pure Danube lineage populations were restricted to peri-alpine isolates within previously glaciated regions. Both empirical data and simulated hybrid comparisons support that trout in non-glaciated regions north and northeast of the Alps have an admixed origin largely based on natural colonization. In contrast, the presence of Atlantic basin alleles south and southeast of the Alps stems from hatchery introductions and subsequent introgression. Despite extensive stocking of the Atlantic lineage, little evidence of first-generation stocked fish or F1 hybrids was found, implying that admixture has been established over time. A purely phylogeographic paradigm fails to describe the distribution of genetic lineages of Salmo in Central Europe. The distribution pattern of the Atlantic and Danube lineages is extremely difficult to explain without invoking very strong
Bayesian phylogeography finds its roots.
Directory of Open Access Journals (Sweden)
Philippe Lemey
2009-09-01
As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will be an important asset in molecular epidemiology that can be easily generalized to infer biogeography from genetic data for many organisms.
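To make the Markov model in the record above concrete, here is a minimal Python sketch of a discrete-location continuous-time Markov chain in which each pairwise rate is multiplied by a 0/1 indicator, which is the general idea behind variable selection on rates. It only computes transition probabilities for one fixed set of rates and indicators; it does not sample them, and the function names, rates, and three-location example are illustrative assumptions, not the authors' implementation.

```python
from math import exp


def build_rate_matrix(rates, indicators):
    """Off-diagonal Q[i][j] = rate * indicator; each row is made to sum to zero."""
    n = len(rates)
    q = [[rates[i][j] * indicators[i][j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(q, t, terms=30):
    """P(t) = exp(Q t), via a Taylor series with scaling and squaring."""
    n = len(q)
    squarings = 0
    norm = max(sum(abs(x) for x in row) for row in q) * t
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    m = [[x * t / 2 ** squarings for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Three hypothetical locations; only the link between locations 0 and 1 is switched on.
rates = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
indicators = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
p = transition_probabilities(build_rate_matrix(rates, indicators), t=0.5)

# Analytic value for a symmetric two-state chain with rate 1 after time 0.5.
expected = (1.0 - exp(-2.0 * 1.0 * 0.5)) / 2.0
```

With the indicators shown, location 2 is unreachable from the other two, so the corresponding transition probabilities are zero; fewer switched-on indicators correspond to a more parsimonious description of the diffusion process.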
Bryce A. Richardson; Mee-Sook Kim; Ned B. Klopfenstein; Yuko Ota; Kwan Soo Woo; Richard C. Hamelin
2009-01-01
Presently, little is known about the worldwide genetic structure, diversity, or evolutionary relationships of the white pine blister rust fungus, Cronartium ribicola. A collaborative international effort is underway to determine the phylogeographic relationships among Asian, European, and North American sources of C. ribicola and...
Vergara, María; Basto, Mafalda P; Madeira, María José; Gómez-Moliner, Benjamín J; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz
2015-01-01
The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remains essentially unknown. In this study we have combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and it may represent a challenge to the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, and this may explain the observed disagreement between clustering solutions obtained by the different IBC methods. However, there was significant indication for contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the
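The abstract above reports a test for isolation by distance but does not name the procedure. One common choice is a Mantel test, sketched below in Python under that assumption: it correlates pairwise genetic and geographic distances and assesses significance by permuting individuals. The function names and toy distances are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(genetic)
    geo = upper_triangle(geographic)
    r_obs = pearson(upper_triangle(genetic), geo)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel individuals in the genetic matrix only
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical sampling sites along a transect (km) with genetic distance
# proportional to geographic distance, i.e. a pure isolation-by-distance pattern.
sites = [0.0, 1.0, 3.0, 6.0, 10.0]
geographic = [[abs(a - b) for b in sites] for a in sites]
genetic = [[0.01 * d for d in row] for row in geographic]
r_obs, p_value = mantel(genetic, geographic)
```

A strong positive correlation with a small p-value would be read as evidence of isolation by distance, which, as the abstract notes, can mislead clustering methods that assume discrete populations.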
Inferring Group Processes from Computer-Mediated Affective Text Analysis
Energy Technology Data Exchange (ETDEWEB)
Schryver, Jack C [ORNL; Begoli, Edmon [ORNL; Jose, Ajith [Missouri University of Science and Technology; Griffin, Christopher [Pennsylvania State University
2011-02-01
Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Instead of pure sentiment analysis, which is just positive or negative, we explore nuances of affective meaning in 22 affect categories. Our affect propagation algorithm automatically calculates and displays extracted affective relationships among entities in graphical form in our prototype (TEAMSTER), starting with seed lists of affect terms. Several useful metrics are defined to infer underlying group processes by aggregating affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.
Parametric inference for biological sequence analysis.
Pachter, Lior; Sturmfels, Bernd
2004-11-16
One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
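As a rough illustration of the point that one recursion serves many inference problems, the Python sketch below runs the same forward pass over a toy hidden Markov model with two different pairs of operations: ordinary (sum, product) gives the likelihood of the observations, and (max, product) gives the probability of the single best hidden path. Polytope propagation itself, which swaps in polytope operations, is not implemented here. The model parameters and function names are made up for the example.

```python
import operator
from functools import reduce
from itertools import product


def propagate(obs, init, trans, emit, add, mul):
    """Forward recursion over a hidden Markov model in the semiring (add, mul)."""
    states = range(len(init))
    f = [mul(init[s], emit[s][obs[0]]) for s in states]
    for o in obs[1:]:
        f = [mul(reduce(add, (mul(f[r], trans[r][s]) for r in states)), emit[s][o])
             for s in states]
    return reduce(add, f)


def path_probability(path, obs, init, trans, emit):
    """Joint probability of one hidden path and the observations (brute force)."""
    p = init[path[0]] * emit[path[0]][obs[0]]
    for prev, cur, o in zip(path, path[1:], obs[1:]):
        p *= trans[prev][cur] * emit[cur][o]
    return p


# Hypothetical two-state model with two output symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]

likelihood = propagate(obs, init, trans, emit, operator.add, operator.mul)
best_path_prob = propagate(obs, init, trans, emit, max, operator.mul)

# Brute-force enumeration of all hidden paths, used only to check the recursion.
all_paths = [path_probability(p, obs, init, trans, emit)
             for p in product(range(2), repeat=len(obs))]
```

The recursion visits each time step once, whereas the brute-force check enumerates every hidden path, so the two agree in value but differ sharply in cost as the sequence grows.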
Nashrulloh, Maulana Malik; Kurniawan, Nia; Rahardi, Brian
2017-11-01
The increasing availability of genetic sequence data associated with explicit geographic and environmental (including biotic and abiotic components) information offers new opportunities to study the processes that shape biodiversity and its patterns. Developing phylogeography reconstruction, by integrating phylogenetic and biogeographic knowledge, provides richer and deeper visualization and information on diversification events than ever before. Geographical information systems such as QGIS provide an environment for spatial modeling, analysis, and dissemination by which phylogenetic models can be explicitly linked with their associated spatial data and subsequently integrated with other related georeferenced datasets describing the biotic and abiotic environment. We introduce PHYLOGEOrec, a QGIS plugin for building spatial phylogeographic reconstructions from phylogenetic tree and geographical information data based on QGIS2threejs. By using PHYLOGEOrec, researchers can integrate existing phylogeny and geographical information data, resulting in three-dimensional geographic visualizations of phylogenetic trees in the Keyhole Markup Language (KML) format. Such formats can be overlaid on a map using QGIS and finally, spatially viewed in QGIS by means of a QGIS2threejs engine for further analysis. KML can also be viewed in reputable geobrowsers with KML support (e.g., Google Earth).
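For readers unfamiliar with KML, the following Python sketch shows how tree branches could be written as three-dimensional KML line segments using only the standard library. It is not the PHYLOGEOrec implementation: the function name, the node coordinates, and the choice to encode node height as altitude are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def tree_to_kml(nodes, edges):
    """Write one KML Placemark with a LineString per tree branch.

    nodes maps a node name to (longitude, latitude, altitude in metres);
    edges is a list of (parent, child) node-name pairs.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for parent, child in edges:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = f"{parent}-{child}"
        line = ET.SubElement(placemark, f"{{{KML_NS}}}LineString")
        ET.SubElement(line, f"{{{KML_NS}}}altitudeMode").text = "absolute"
        # KML coordinate tuples are ordered longitude,latitude,altitude.
        ET.SubElement(line, f"{{{KML_NS}}}coordinates").text = " ".join(
            f"{lon},{lat},{alt}" for lon, lat, alt in (nodes[parent], nodes[child])
        )
    return ET.tostring(kml, encoding="unicode")


# Hypothetical tree: an internal node raised above two sampled tips.
nodes = {
    "root": (112.6, -7.9, 50000.0),
    "tip_a": (110.4, -7.0, 0.0),
    "tip_b": (114.3, -8.2, 0.0),
}
edges = [("root", "tip_a"), ("root", "tip_b")]
kml_text = tree_to_kml(nodes, edges)
parsed = ET.fromstring(kml_text)
placemarks = parsed.findall(f".//{{{KML_NS}}}Placemark")
```

Saving `kml_text` to a file with a `.kml` extension should give something a KML-aware viewer can open, with internal nodes floating above the sampled localities.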
Hill, Véronique; Zozio, Thierry; Sadikalay, Syndia; Viegas, Sofia; Streit, Elisabeth; Kallenius, Gunilla; Rastogi, Nalin
2012-01-01
Multiple-locus variable-number tandem repeat analysis (MLVA) is useful to establish transmission routes and sources of infections for various microorganisms including Mycobacterium tuberculosis complex (MTC). The recently released SITVITWEB database contains 12-loci Mycobacterial Interspersed Repetitive Units – Variable Number of Tandem DNA Repeats (MIRU-VNTR) profiles and spoligotype patterns for thousands of MTC strains; it uses MIRU International Types (MIT) and Spoligotype International Types (SIT) to designate clustered patterns worldwide. Considering existing doubts on the ability of spoligotyping alone to reveal exact phylogenetic relationships between MTC strains, we developed an MLVA-based classification for MTC genotypic lineages. We studied 6 different subsets of MTC isolates encompassing 7793 strains worldwide. Minimum spanning trees (MST) were constructed to identify major lineages, and the most common representative located as a central node was taken as the prototype defining different phylogenetic groups. A total of 7 major lineages with their respective prototypes were identified: Indo-Oceanic/MIT57, East Asian and African Indian/MIT17, Euro American/MIT116, West African-I/MIT934, West African-II/MIT664, M. bovis/MIT49, M. canettii/MIT60. Further MST subdivision identified an additional 34 sublineage MIT prototypes. The phylogenetic relationships among the 37 newly defined MIRU-VNTR lineages were inferred using a classification algorithm based on a Bayesian approach. This information was used to construct an updated phylogenetic and phylogeographic snapshot of worldwide MTC diversity studied at the regional, sub-regional, and country levels according to the United Nations specifications. We also looked for IS6110 insertional events that are known to modify the results of the spoligotyping in specific circumstances, and showed that a fair portion of convergence leading to the currently observed bias in phylogenetic classification of strains may
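The minimum-spanning-tree step described above can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: it assumes the distance between two profiles is the number of loci at which their repeat counts differ, builds the tree with Prim's algorithm, and takes the best-connected node as a candidate prototype. The profiles are invented and use four loci rather than twelve to keep the example short.

```python
from collections import Counter


def locus_differences(a, b):
    """Number of loci at which two repeat-count profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of (node_u, node_v, distance) edges."""
    names = list(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        distance, u, v = min(
            (locus_differences(profiles[u], profiles[v]), u, v)
            for u in in_tree for v in names if v not in in_tree
        )
        edges.append((u, v, distance))
        in_tree.add(v)
    return edges


def central_node(edges):
    """Node with the most tree neighbours, used here as a stand-in prototype."""
    degree = Counter()
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    return degree.most_common(1)[0][0]


# Hypothetical profiles: repeat counts at four loci per strain type.
profiles = {
    "P0": (2, 2, 3, 1),
    "P1": (2, 2, 3, 2),
    "P2": (2, 3, 3, 1),
    "P3": (1, 2, 3, 1),
    "P4": (1, 2, 3, 3),
}
mst = minimum_spanning_tree(profiles)
prototype = central_node(mst)
```

Here `P0` differs from three other profiles at a single locus each, so it ends up as the hub of the tree. The abstract additionally weighs how common each type is when choosing a prototype, which this sketch leaves out.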
Directory of Open Access Journals (Sweden)
Rafael Splendore de Borba
In this study, phylogenetic and phylogeographic analyses of populations identified as Hypostomus strigaticeps from the upper Paraná River basin were conducted in order to test whether these different populations comprise cryptic species or structured populations and to assess their genetic variability. The sequences of the mitochondrial DNA ATP synthase (subunits 6/8) of 27 specimens from 10 populations (one from Mogi-Guaçu River, five from Paranapanema River, three from Tietê River and one from Peixe River) were analyzed. The phylogeographic analysis showed the existence of eight haplotypes (A-H), and although the ancestral haplotype includes only individuals from the Tietê River basin, the distribution of H. strigaticeps was not restricted to this basin. Haplotypes A, B and F were the most frequent. Haplotypes D, E, F, G, and H were present in the sub-basin of Paranapanema, two (A and B) were present in the sub-basin of the Tietê River, one (C) was exclusively distributed in the sub-basin of the Peixe River, and one (B) was also present in the sub-basin of the Grande River. The phylogenetic analysis showed that the populations of H. strigaticeps indeed form a monophyletic unit comprising two lineages: TG, with representatives from the Tietê, Mogi-Guaçu and Peixe Rivers; and PP, with specimens from the Paranapanema River. The observed degree of genetic divergence within the TG and PP lineages was 0.1% and 0.2%, respectively, whereas the genetic divergence between the two lineages themselves was approximately 1%. The results of the phylogenetic analysis do not support the hypothesis of the existence of cryptic species, and the phylogeographic analysis confirms the presence of H. strigaticeps in other sub-basins of the upper Paraná River: the Grande, Peixe, and Paranapanema sub-basins.
Kitajima, Akira; Nonaka, Keisuke; Yoshioka, Terutaka; Ohta, Satoshi; Goto, Shingo; Toyoda, Atsushi; Fujiyama, Asao; Mochizuki, Takako; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu
2016-01-01
Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers certify their transferability among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana) for which we suggest different origins. Structure analysis with DNA markers that are in Hardy–Weinberg equilibrium deduce three basic taxa coinciding with the current understanding of citrus ancestors. Genotyping analysis of 101 indigenous citrus varieties with 123 selected DNA markers infers the parentages of 22 indigenous citrus varieties including Satsuma, Temple, and iyo, and single parents of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon by allele-sharing and parentage tests. Genotyping analysis of chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces the combination of seed and pollen parents. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 types of varieties consisting of Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo, which have played pivotal roles in the occurrence of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, as found by recent studies. PMID:27902727
Directory of Open Access Journals (Sweden)
Sergei V Drovetski
Phylogeographic studies of Holarctic birds are challenging because they involve vast geographic scale, complex glacial history, extensive phenotypic variation, and heterogeneous taxonomic treatment across countries, all of which require large sample sizes. Knowledge about the quality of phylogeographic information provided by different loci is crucial for study design. We use sequences of one mtDNA gene, one sex-linked intron, and one autosomal intron to elucidate large scale phylogeographic patterns in the Holarctic lark genus Eremophila. The mtDNA ND2 gene identified six geographically, ecologically, and phenotypically concordant clades in the Palearctic that diverged in the Early-Middle Pleistocene and suggested paraphyly of the horned lark (E. alpestris with respect to the Temminck's lark (E. bilopha. In the Nearctic, ND2 identified five subclades which diverged in the Late Pleistocene. They overlapped geographically and were not concordant phenotypically or ecologically. Nuclear alleles provided little information on geographic structuring of genetic variation in horned larks beyond supporting the monophyly of Eremophila and paraphyly of the horned lark. Multilocus species trees based on two nuclear or all three loci provided poor support for haplogroups identified by mtDNA. The node ages calculated using mtDNA were consistent with the available paleontological data, whereas individual nuclear loci and multilocus species trees appeared to underestimate node ages. We argue that mtDNA is capable of discovering independent evolutionary units within avian taxa and can provide a reasonable phylogeographic hypothesis when geographic scale, geologic history, and phenotypic variation in the study system are too complex for proposing reasonable a priori hypotheses required for multilocus methods. Finally, we suggest splitting the currently recognized horned lark into five Palearctic and one Nearctic species.
Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust.
Cun, Yupeng; Yang, Tsun-Po; Achter, Viktor; Lang, Ulrich; Peifer, Martin
2018-06-01
The genomes of cancer cells constantly change during pathogenesis. This evolutionary process can lead to the emergence of drug-resistant mutations in subclonal populations, which can hinder therapeutic intervention in patients. Data derived from massively parallel sequencing can be used to infer these subclonal populations using tumor-specific point mutations. The accurate determination of copy-number changes and tumor impurity is necessary to reliably infer subclonal populations by mutational clustering. This protocol describes how to use Sclust, a copy-number analysis method with a recently developed mutational clustering approach. In a series of simulations and comparisons with alternative methods, we have previously shown that Sclust accurately determines copy-number states and subclonal populations. Performance tests show that the method is computationally efficient, with copy-number analysis and mutational clustering taking Linux/Unix command-line syntax should be able to carry out analyses of subclonal populations.
Structural influence of gene networks on their inference: analysis of C3NET
Directory of Open Access Journals (Sweden)
Emmert-Streib Frank
2011-06-01
Background: The availability of large-scale high-throughput data poses considerable challenges toward their functional analysis. For this reason gene network inference methods gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited. Results: In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET - a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead to a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide a user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository. Conclusions: The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy-to-use software package c3net may contribute to the popularization of such methods. Reviewers: This article was reviewed by Lev Klebanov, Joel Bader and Yuriy Gusev.
Nonparametric Bayesian inference for mean residual life functions in survival analysis.
Poynor, Valerie; Kottas, Athanasios
2018-01-19
Modeling and inference for survival analysis problems typically revolves around different functions related to the survival distribution. Here, we focus on the mean residual life (MRL) function, which provides the expected remaining lifetime given that a subject has survived (i.e. is event-free) up to a particular time. This function is of direct interest in reliability, medical, and actuarial fields. In addition to its practical interpretation, the MRL function characterizes the survival distribution. We develop general Bayesian nonparametric inference for MRL functions built from a Dirichlet process mixture model for the associated survival distribution. The resulting model for the MRL function admits a representation as a mixture of the kernel MRL functions with time-dependent mixture weights. This model structure allows for a wide range of shapes for the MRL function. Particular emphasis is placed on the selection of the mixture kernel, taken to be a gamma distribution, to obtain desirable properties for the MRL function arising from the mixture model. The inference method is illustrated with a data set of two experimental groups and a data set involving right censoring. The supplementary material available at Biostatistics online provides further results on empirical performance of the model, using simulated data examples. © The Author 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
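The mixture representation mentioned in the abstract above can be written out explicitly. The notation below ($S$, $m$, $\pi_l$, $S_l$, $m_l$, $q_l$) is introduced here for illustration and may differ from the paper's; the relations themselves are standard identities for a lifetime $T$ with finite mean, and the last one assumes a continuous survival function.

```latex
% Mean residual life (MRL) of a lifetime T with survival function S(t) = P(T > t):
\[
  m(t) \;=\; \mathbb{E}\,[\,T - t \mid T > t\,] \;=\; \frac{\int_t^{\infty} S(u)\,du}{S(t)},
  \qquad S(t) > 0 .
\]
% If the survival function is a mixture, S(t) = \sum_l \pi_l S_l(t), then
% \int_t^\infty S(u)\,du = \sum_l \pi_l S_l(t)\, m_l(t), which gives
\[
  m(t) \;=\; \sum_{l} q_l(t)\, m_l(t),
  \qquad
  q_l(t) \;=\; \frac{\pi_l\, S_l(t)}{\sum_{j} \pi_j\, S_j(t)} .
\]
% Inversion formula: the MRL function determines the survival function,
\[
  S(t) \;=\; \frac{m(0)}{m(t)}\,
  \exp\!\left( - \int_0^{t} \frac{du}{m(u)} \right).
\]
```

The weights $q_l(t)$ are nonnegative and sum to one at every $t$, and they shift toward components with heavier tails as $t$ grows, which is the sense in which the mixture weights are time-dependent.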
Directory of Open Access Journals (Sweden)
Olivia Charrier
Genetic variation within plant species is determined by a number of factors such as reproductive mode, breeding system, life history traits and climatic events. In alpine regions, plants experience heterogeneous abiotic conditions that influence the population's genetic structure. The aim of this study was to investigate the genetic structure and phylogeographic history of the subalpine shrub Rhododendron ferrugineum across the Pyrenees and the links between the populations in the Pyrenees, the Alps and Jura Mountains. We used 27 microsatellite markers to genotype 645 samples from 29 Pyrenean populations, three from the Alps and one from the Jura Mountains. These data were used to estimate population genetics statistics such as allelic richness, observed heterozygosity, expected heterozygosity, fixation index, inbreeding coefficient and number of migrants. Genetic diversity was found to be higher in the Alps than in the Pyrenees, suggesting colonization waves from the Alps to the Pyrenees. Two separate genetic lineages were found in both the Alps and Pyrenees, with a substructure of five genetic clusters in the Pyrenees where a loss of genetic diversity was noted. The strong differentiation among clusters is maintained by low gene flow across populations. Moreover, some populations showed higher genetic diversity than others and presented rare alleles that may indicate the presence of alpine refugia. Two lineages of R. ferrugineum have colonized the Pyrenees from the Alps. Then, during glaciation events R. ferrugineum survived in the Pyrenees in different refugia such as lowland refugia at the eastern part of the chain and nunataks at high elevations, leading to a clustered genetic pattern.
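Several of the statistics listed above have simple textbook definitions. The Python sketch below computes observed heterozygosity, expected heterozygosity, and the inbreeding coefficient for a single microsatellite locus, assuming diploid genotypes and the uncorrected estimator (no small-sample correction). The function name and genotype data are hypothetical, and this is not necessarily how the authors computed their values.

```python
from collections import Counter


def heterozygosity_stats(genotypes):
    """Return (Ho, He, F) for one locus from a list of diploid genotypes.

    Ho: fraction of individuals carrying two different alleles.
    He: 1 - sum of squared allele frequencies (uncorrected estimator).
    F : 1 - Ho / He, the inbreeding coefficient at this locus.
    """
    n = len(genotypes)
    observed = sum(a != b for a, b in genotypes) / n
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in allele_counts.values())
    return observed, expected, 1.0 - observed / expected


# Hypothetical genotypes: allele sizes in base pairs at one locus.
balanced = [(100, 100), (100, 102), (102, 102), (100, 102)]
deficit = [(100, 100), (100, 100), (102, 102), (100, 102)]

ho_bal, he_bal, f_bal = heterozygosity_stats(balanced)
ho_def, he_def, f_def = heterozygosity_stats(deficit)
```

The first sample matches Hardy-Weinberg proportions, so its coefficient is zero; the second has fewer heterozygotes than expected and therefore a positive coefficient. Multi-locus values are usually averages of such per-locus quantities.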
Directory of Open Access Journals (Sweden)
Michelle T Guzik
Desert mound springs of the Great Artesian Basin in central Australia maintain an endemic fauna that has historically been considered ubiquitous throughout all of the springs. Recent studies, however, have shown that several endemic invertebrate species are genetically highly structured and contain previously unrecognised species, suggesting that individuals may be geographically 'stranded in desert islands'. Here we further tested the generality of this hypothesis by conducting genetic analyses of the obligate aquatic phreatoicid isopod Phreatomerus latipes. Phylogenetic and phylogeographic relationships amongst P. latipes individuals were examined using a multilocus approach comprising allozymes and mtDNA sequence data. From the Lake Eyre region in South Australia we collected data for 476 individuals from 69 springs for the mtDNA gene COI; in addition, allozyme electrophoresis was conducted on 331 individuals from 19 sites for 25 putative loci. Phylogenetic and population genetic analyses showed three major clades in both allozyme and mtDNA data, with a further nine mtDNA sub-clades, largely supported by the allozymes. Generally, each of these sub-clades was concordant with a traditional geographic grouping known as spring complexes. We observed a coalescent time between ∼2-15 million years ago for haplotypes within each of the nine mtDNA sub-clades, whilst an older total time to coalescence (>15 mya) was observed for the three major clades. Overall we observed that multiple layers of phylogeographic history are exemplified by Phreatomerus, suggesting that major climate events and their impact on the landscape have shaped the observed high levels of diversity and endemism. Our results show that this genus reflects a diverse fauna that existed during the early Miocene and appears to have been regionally restricted. Subsequent aridification events have led to substantial contraction of the original habitat, possibly over repeated Pleistocene
Causal inference in econometrics
Kreinovich, Vladik; Sriboonchitta, Songsak
2016-01-01
This book is devoted to the analysis of causal inference which is one of the most difficult tasks in data analysis: when two phenomena are observed to be related, it is often difficult to decide whether one of them causally influences the other one, or whether these two phenomena have a common cause. This analysis is the main focus of this volume. To get a good understanding of the causal inference, it is important to have models of economic phenomena which are as accurate as possible. Because of this need, this volume also contains papers that use non-traditional economic models, such as fuzzy models and models obtained by using neural networks and data mining techniques. It also contains papers that apply different econometric models to analyze real-life economic dependencies.
Franco, Fernando Faria; Jojima, Cecília Leiko; Perez, Manolo Fernandez; Zappi, Daniela Cristina; Taylor, Nigel; Moraes, Evandro Marsola
2017-11-01
In order to investigate biogeographic influences on xeric biota in the Brazilian Atlantic Forest (BAF), a biodiversity hotspot, we used a monophyletic group including three cactus taxa as a model to perform a phylogeographic study: Cereus fernambucensis subsp. fernambucensis, C. fernambucensis subsp. sericifer, and C. insularis. These cacti are allopatric and grow in xeric habitats along BAF, including isolated granite and gneiss rock outcrops (Inselbergs), sand dune vegetation (Restinga forest), and the rocky shore of an oceanic archipelago (islands of Fernando de Noronha). The nucleotide information from nuclear gene phytochrome C and plastid intergenic spacer trnS-trnG was used to perform different approaches and statistical analyses, comprising population structure, demographic changes, phylogenetic relationships, and biogeographic reconstruction in both spatial and temporal scales. We recovered four allopatric population groups with highly supported branches in the phylogenetic tree with divergence initiated in the middle Pleistocene: southern distribution of C. fernambucensis subsp. fernambucensis, northern distribution of C. fernambucensis subsp. fernambucensis together with C. insularis, southern distribution of C. fernambucensis subsp. sericifer, and northern distribution of C. fernambucensis subsp. sericifer. Further, the results suggest that genetic diversity of population groups was strongly shaped by an initial colonization event from south to north followed by fragmentation. The phylogenetic pattern found for C. insularis is plausible with peripatric speciation in the archipelago of Fernando de Noronha. To explain the phylogeographic patterns, the putative effects of both climatic and sea level changes as well as neotectonic activity during the Pleistocene are discussed.
Yu, Dan; Chen, Ming; Tang, Qiongying; Li, Xiaojuan; Liu, Huanzhang
2014-10-25
Rhynchocypris oxycephalus is a cold water fish with a wide geographic distribution including the relatively warm temperate regions of southern China. It also occurs in second- and third-step geomorphic areas in China. Previous studies have postulated that high-altitude populations of R. oxycephalus in southern China are Quaternary glacial relics. In this study, we used the mitochondrial gene Cytb and the nuclear gene RAG2 to investigate the species phylogeographical patterns and to test two biogeographic hypotheses: (1) that divergence between lineages supports the three-step model and (2) climatic fluctuations during the Quaternary resulted in the present distribution in southern China. Phylogenetic analysis detected three major matrilines (A, B, and C); with matrilines B and C being further subdivided into two submatrilines. Based on genetic distances and morphological differences, matriline A potentially represents a cryptic subspecies. The geographic division between matrilines B and C coincided with the division of the second and third geomorphic steps in China, suggesting a historical vicariance event. Pliocene climatic fluctuations might have facilitated the southwards dispersal of R. oxycephalus in matriline C, with the subsequent warming resulting in its split into submatrilines C1 and C2, leaving submatriline C2 as a relic in southern China. Our study demonstrates that geological events (three steps orogenesis) and climate fluctuations during the Pliocene were important factors in shaping phylogeographical patterns in R. oxycephalus. Notably, no genetic diversity was detected in several populations, all of which possessed unique genotypes. This indicates the uniqueness of local populations and calls for a special conservation plan for the whole species at the population level.
A Bayesian Network Schema for Lessening Database Inference
National Research Council Canada - National Science Library
Chang, LiWu; Moskowitz, Ira S
2001-01-01
.... The authors introduce a formal schema for database inference analysis, based upon a Bayesian network structure, which identifies critical parameters involved in the inference problem and represents...
Baur, Brittany; Bozdag, Serdar
2015-04-01
One of the challenging and important computational problems in systems biology is to infer gene regulatory networks (GRNs) of biological systems. Several methods that exploit gene expression data have been developed to tackle this problem. In this study, we propose the use of copy number and DNA methylation data to infer GRNs. We developed an algorithm that scores regulatory interactions between genes based on canonical correlation analysis. In this algorithm, copy number or DNA methylation variables are treated as potential regulator variables, and expression variables are treated as potential target variables. We first validated that the canonical correlation analysis method is able to infer true interactions in high accuracy. We showed that the use of DNA methylation or copy number datasets leads to improved inference over steady-state expression. Our results also showed that epigenetic and structural information could be used to infer directionality of regulatory interactions. Additional improvements in GRN inference can be gleaned from incorporating the result in an informative prior in a dynamic Bayesian algorithm. This is the first study that incorporates copy number and DNA methylation into an informative prior in dynamic Bayesian framework. By closely examining top-scoring interactions with different sources of epigenetic or structural information, we also identified potential novel regulatory interactions.
Stochastic processes inference theory
Rao, Malempati M
2014-01-01
This is the revised and enlarged 2nd edition of the authors’ original text, which was intended to be a modest complement to Grenander's fundamental memoir on stochastic processes and related inference theory. The present volume gives a substantial account of regression analysis, both for stochastic processes and measures, and includes recent material on Ridge regression with some unexpected applications, for example in econometrics. The first three chapters can be used for a quarter or semester graduate course on inference on stochastic processes. The remaining chapters provide more advanced material on stochastic analysis suitable for graduate seminars and discussions, leading to dissertation or research work. In general, the book will be of interest to researchers in probability theory, mathematical statistics and electrical and information theory.
Newton, Richard; Wernisch, Lorenz
2014-01-01
Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247
Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis
Dezfuli, Homayoon; Kelly, Dana; Smith, Curtis; Vedros, Kurt; Galyean, William
2009-01-01
This document, Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis, is intended to provide guidelines for the collection and evaluation of risk and reliability-related data. It is aimed at scientists and engineers familiar with risk and reliability methods and provides a hands-on approach to the investigation and application of a variety of risk and reliability data assessment methods, tools, and techniques. This document provides both: A broad perspective on data analysis collection and evaluation issues. A narrow focus on the methods to implement a comprehensive information repository. The topics addressed herein cover the fundamentals of how data and information are to be used in risk and reliability analysis models and their potential role in decision making. Understanding these topics is essential to attaining a risk informed decision making environment that is being sought by NASA requirements and procedures such as 8000.4 (Agency Risk Management Procedural Requirements), NPR 8705.05 (Probabilistic Risk Assessment Procedures for NASA Programs and Projects), and the System Safety requirements of NPR 8715.3 (NASA General Safety Program Requirements).
cpDNA microsatellite markers for Lemna minor (Araceae): Phylogeographic implications.
Wani, Gowher A; Shah, Manzoor A; Reshi, Zafar A; Atangana, Alain R; Khasa, Damase P
2014-07-01
A lack of genetic markers impedes our understanding of the population biology of Lemna minor. Thus, the development of appropriate genetic markers for L. minor promises to be highly useful for population genetic studies and for addressing other life history questions regarding the species. For the first time, we characterized nine polymorphic and 24 monomorphic chloroplast microsatellite markers in L. minor using DNA samples of 26 individuals sampled from five populations in Kashmir and of 17 individuals from three populations in Quebec. Initially, we designed 33 primer pairs, which were tested on genomic DNA from natural populations. Nine loci provided markers with two alleles. Based on genotyping of the chloroplast DNA fragments from 43 sampled individuals, we identified one haplotype in Quebec and 11 haplotypes in Kashmir, of which one occurs in 56% of the genotypes, one in 8%, and nine in 4%, respectively. There was a maximum of two alleles per locus. These new chloroplast microsatellite markers for L. minor and haplotype distribution patterns indicate a complex phylogeographic history that merits further investigation.
Hofmann, B
2008-06-01
Are there similarities between scientific and moral inference? This is the key question in this article. It takes as its point of departure an instance of one person's story in the media changing both Norwegian public opinion and a brand-new Norwegian law prohibiting the use of saviour siblings. The case appears to falsify existing norms and to establish new ones. The analysis of this case reveals similarities in the modes of inference in science and morals, inasmuch as (a) a single case functions as a counter-example to an existing rule; (b) there is a common presupposition of stability, similarity and order, which makes it possible to reason from a few cases to a general rule; and (c) this makes it possible to hold things together and retain order. In science, these modes of inference are referred to as falsification, induction and consistency. In morals, they have a variety of other names. Hence, even without abandoning the fact-value divide, there appear to be similarities between inference in science and inference in morals, which may encourage communication across the boundaries between "the two cultures" and which are relevant to medical humanities.
A Comparative Analysis of Fuzzy Inference Engines in Context of ...
African Journals Online (AJOL)
PROF. O. E. OSUAGWU
Fuzzy Inference engine is an important part of reasoning systems capable of extracting correct conclusions from ... is known as the inference, or rule definition portion, of fuzzy .... minimal set of decision rules based on input- ... The study uses Mamdani FIS model and. Sugeno FIS ... control of induction motor drive. [18] study.
Directory of Open Access Journals (Sweden)
Steven M Carr
Phylogenomic analysis of highly-resolved intraspecific phylogenies obtained from complete mitochondrial DNA genomes has had great success in clarifying relationships within and among human populations, but has found limited application in other wild species. Analytical challenges include assessment of random versus non-random phylogeographic distributions, and quantification of differences in tree topologies among populations. Harp Seals (Pagophilus groenlandicus Erxleben, 1777) have a biogeographic distribution based on four discrete trans-Atlantic breeding and whelping populations located on "fast ice" attached to land in the White Sea, Greenland Sea, the Labrador ice Front, and Southern Gulf of St Lawrence. This East to West distribution provides a set of a priori phylogeographic hypotheses. Outstanding biogeographic questions include the degree of genetic distinctiveness among these populations, in particular between the Greenland Sea and White Sea grounds. We obtained complete coding-region DNA sequences (15,825 bp) for 53 seals. Each seal has a unique mtDNA genome sequence; these differ by 6 to 107 substitutions. Six major clades / groups are detectable by parsimony, neighbor-joining, and Bayesian methods, all of which are found in breeding populations on either side of the Atlantic. The species coalescent is at 180 KYA; the most recent clade, which accounts for 66% of the diversity, reflects an expansion during the mid-Wisconsinan glaciation 40-60 KYA. FST is significant only between the White Sea and Greenland Sea or Ice Front populations. Hierarchical AMOVA of 2-, 3-, or 4-island models identifies small but significant ΦSC among populations within groups, but not among groups. A novel Monte-Carlo simulation indicates that the observed distribution of individuals within breeding populations over the phylogenetic tree requires significantly fewer dispersal events than random expectation, consistent with island or a priori East to West 2- or 3
Butterfield, John S.; Díaz-Ferguson, Edgardo; Silliman, Brian R.; Saunders, Jonathan W.; Buddo, Dayne; Mignucci-Giannoni, Antonio A.; Searle, Linda; Allen, Aarin Conrad; Hunter, Margaret E.
2015-01-01
The red lionfish (Pterois volitans) is an invasive predatory marine fish that has rapidly expanded its presence in the Western Hemisphere. We collected 214 invasive red lionfish samples from nine countries and territories, including seven unpublished locations. To more comprehensively evaluate connectivity, we compiled our d-loop sequence data with 846 published sequences, resulting in 1,060 samples from 14 locations. We found low nucleotide diversity (π = 0.003) and moderate haplotype diversity (h = 0.59). Using haplotype population pairwise ΦST tests, we analyzed possible phylogeographic breaks that were previously proposed based on other reef organisms. We found support for the Bahamas/Turks/Caicos versus Caribbean break (ΦST = 0.12) but not for the Northwestern Caribbean, Eastern Caribbean, or US East Coast versus Bahamas breaks. The Northern Region had higher variation and more haplotypes, supporting introductions of at least five haplotypes to the region. Our wide-ranging samples showed that a lower-frequency haplotype in the Northern Region dominated the Southern Region and suggested multiple introductions, possibly to the south. We tested multiple scenarios of phylogeographic structure with analyses of molecular variance and found support for a Northern and Southern Region split at the Bahamas/Turks/Caicos versus Caribbean break (percentage of variation among regions = 8.49 %). We found that Puerto Rico clustered with the Southern Region more strongly than with the Northern Region, as opposed to previous reports. We also found the rare haplotype H03 for the first time in the southern Caribbean (Panama), indicating that either secondary releases occurred or that the low-frequency haplotypes have had time to disperse to extreme southern Caribbean locations.
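The haplotype diversity reported in the record above (h = 0.59) is a summary statistic that is easy to reproduce on toy data. The sketch below assumes the commonly used unbiased estimator h = n/(n-1) * (1 - sum of squared haplotype frequencies); the abstract does not say which estimator or software the authors used, and the function name and the sample labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: h = n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list with one haplotype label per sampled individual.
    The choice of estimator is an assumption, not taken from the abstract.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample (hypothetical labels): frequencies 0.6, 0.3 and 0.1.
sample = ["A"] * 6 + ["B"] * 3 + ["C"]
h = haplotype_diversity(sample)
```

A sample in which every individual carries the same haplotype gives h = 0, and h approaches 1 as haplotypes become more numerous and more evenly distributed.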
Dehmer, Matthias; Kurt, Zeyneb; Emmert-Streib, Frank; Them, Christa; Schulc, Eva; Hofer, Sabine
2015-01-01
In this paper, we investigate treatment cycles inferred from diabetes data by means of graph theory. We define the term treatment cycles graph-theoretically and perform a descriptive as well as quantitative analysis thereof. Also, we interpret our findings in terms of nursing and clinical management. PMID:26030296
Hird, Sarah; Kubatko, Laura; Carstens, Bryan
2010-11-01
We describe a method for estimating species trees that relies on replicated subsampling of large data matrices. One application of this method is phylogeographic research, which has long depended on large datasets that sample intensively from the geographic range of the focal species; these datasets allow systematicists to identify cryptic diversity and understand how contemporary and historical landscape forces influence genetic diversity. However, analyzing any large dataset can be computationally difficult, particularly when newly developed methods for species tree estimation are used. Here we explore the use of replicated subsampling, a potential solution to the problem posed by large datasets, with both a simulation study and an empirical analysis. In the simulations, we sample different numbers of alleles and loci, estimate species trees using STEM, and compare the estimated to the actual species tree. Our results indicate that subsampling three alleles per species for eight loci nearly always results in an accurate species tree topology, even in cases where the species tree was characterized by extremely rapid divergence. Even more modest subsampling effort, for example one allele per species and two loci, was more likely than not (>50%) to identify the correct species tree topology, indicating that in nearly all cases, computing the majority-rule consensus tree from replicated subsampling provides a good estimate of topology. These results were supported by estimating the correct species tree topology and reasonable branch lengths for an empirical 10-locus great ape dataset. Copyright © 2010 Elsevier Inc. All rights reserved.
Causal inference in economics and marketing.
Varian, Hal R
2016-07-05
This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual: a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.
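The counterfactual idea in the record above can be illustrated with a very small example: fit a predictive model on pre-treatment observations only, project it into the treatment period, and read the effect as observed minus predicted. The sketch below uses a plain least-squares trend line as a stand-in for whatever predictive model one prefers; the function names and the weekly sales figures are invented for illustration and do not come from the article.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimated_effect(pre_x, pre_y, post_x, post_y):
    """Observed post-treatment outcome minus the predicted counterfactual."""
    # The model only ever sees pre-treatment data.
    slope, intercept = fit_line(pre_x, pre_y)
    counterfactual = [intercept + slope * x for x in post_x]
    return [obs - cf for obs, cf in zip(post_y, counterfactual)]


# Hypothetical data: four untreated weeks, then two weeks under treatment.
pre_weeks, pre_sales = [1, 2, 3, 4], [10.0, 12.0, 14.0, 16.0]
post_weeks, post_sales = [5, 6], [23.0, 25.0]
effects = estimated_effect(pre_weeks, pre_sales, post_weeks, post_sales)
```

The estimate is only as good as the assumption that the pre-treatment relationship would have continued unchanged without the treatment.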
Bayesian Inference Methods for Sparse Channel Estimation
DEFF Research Database (Denmark)
Pedersen, Niels Lovmand
2013-01-01
This thesis deals with sparse Bayesian learning (SBL) with application to radio channel estimation. As opposed to the classical approach for sparse signal representation, we focus on the problem of inferring complex signals. Our investigations within SBL constitute the basis for the development...... of Bayesian inference algorithms for sparse channel estimation. Sparse inference methods aim at finding the sparse representation of a signal given in some overcomplete dictionary of basis vectors. Within this context, one of our main contributions to the field of SBL is a hierarchical representation...... analysis of the complex prior representation, where we show that the ability to induce sparse estimates of a given prior heavily depends on the inference method used and, interestingly, whether real or complex variables are inferred. We also show that the Bayesian estimators derived from the proposed...
Directory of Open Access Journals (Sweden)
Miao An
2015-06-01
Thlaspi arvense is a well-known annual farmland weed with worldwide distribution, which can be found from sea level to above 4000 m high on the Qinghai-Tibetan Plateau (QTP). In this paper, a phylogeographic history of T. arvense including 19 populations from China was inferred by using three chloroplast (cp) DNA segments (trnL-trnF, rpl32-trnL and rps16) and one nuclear (n) DNA segment (Fe-regulated transporter-like protein, ZIP). A total of 11 chloroplast haplotypes and six nuclear alleles were identified, and haplotypes unique to the QTP were recognized (C4, C5, C7 and N4). On the basis of molecular dating, haplotypes C4, C5 and C7 have separated from others around 1.58 Ma for cpDNA, which corresponds to the QTP uplift. In addition, this article suggests that the T. arvense populations in China are a mixture of diverged subpopulations as inferred by hT/vT test (hT ≤ vT, cpDNA) and positive Tajima’s D values (1.87, 0.05 < p < 0.10 for cpDNA and 3.37, p < 0.01 for nDNA). Multimodality mismatch distribution curves and a relatively large shared area of suitable environmental conditions between the Last Glacial Maximum (LGM) and the present time recognized by MaxEnt software reject the sudden expansion population model.
Gottscho, Andrew D
2016-02-01
The purpose of this article is to provide an ultimate tectonic explanation for several well-studied zoogeographic boundaries along the west coast of North America, specifically, along the boundary of the North American and Pacific plates (the San Andreas Fault system). By reviewing 177 references from the plate tectonics and zoogeography literature, I demonstrate that four Great Pacific Fracture Zones (GPFZs) in the Pacific plate correspond with distributional limits and spatially concordant phylogeographic breaks for a wide variety of marine and terrestrial animals, including invertebrates, fish, amphibians, reptiles, birds, and mammals. These boundaries are: (1) Cape Mendocino and the North Coast Divide, (2) Point Conception and the Transverse Ranges, (3) Punta Eugenia and the Vizcaíno Desert, and (4) Cabo Corrientes and the Sierra Transvolcanica. However, discussion of the GPFZs is mostly absent from the zoogeography and phylogeography literature likely due to a disconnect between biologists and geologists. I argue that the four zoogeographic boundaries reviewed here ultimately originated via the same geological process (triple junction evolution). Finally, I suggest how a comparative phylogeographic approach can be used to test the hypothesis presented here. © 2014 Cambridge Philosophical Society.
Mokrousov, Igor; Shitikov, Egor; Skiba, Yuriy; Kolchenko, Sergey; Chernyaeva, Ekaterina; Vyazovaya, Anna
2017-11-01
To date, major attention has justly been given to the global lineages of Mycobacterium tuberculosis. Here, we demonstrate the importance of the minor ones, using the example of the intriguing and underestimated NEW-1 family, which belongs to the Euro-American lineage (lineage 4). Analysis of the global WGS/NGS datasets (5715 strains) identified 2235 strains of Lineage 4 and 66 strains of sublineage L4.5. The latter is marked by the RD122 genomic deletion and includes the NEW-1 family. Phylogenomic analysis confirmed a separate position of the NEW-1 family, which we tentatively designate L4.5.1/Iran. We propose an evolution/migration scenario starting with the origin of L4.5 1000-1300 ya in China, subsequent origin of the pre-NEW-1 intermediate genotype in Tibet, further migration to Xinjiang/Uyghur, and finally to Iran since 800 ya (origin of NEW-1), possibly via expansion of the Mongol Yuan empire. Analysis of longitudinal phylogeographic datasets revealed a sharp increase in the prevalence of NEW-1 strains in Iran and its eastern neighbors in the last 20 years; most alarmingly, it is accompanied by a significant association with multidrug resistance (MDR). Ongoing migration, especially flows of Afghan refugees to developed countries, emphasizes the risk of a wider spread of the epidemic MDR subtype within the NEW-1 family, which we term an emerging resistant clone of M. tuberculosis in West Asia. Copyright © 2017 Elsevier Inc. All rights reserved.
Wu, Jiaqi; Kohno, Naoki; Mano, Shuhei; Fukumoto, Yukio; Tanabe, Hideyuki; Hasegawa, Masami; Yonezawa, Takahiro
2015-01-01
The Asian black bear Ursus thibetanus is widely distributed in Asia and is adapted to broad-leaved deciduous forests, playing an important ecological role in the natural environment. Several subspecies of U. thibetanus have been recognized, one of which, the Japanese black bear, is distributed in the Japanese archipelago. Recent molecular phylogeographic studies clarified that this subspecies is genetically distantly related to continental subspecies, suggesting an earlier origin. However, the evolutionary relationship between the Japanese and continental subspecies remained unclear. To understand the evolution of the Asian black bear in relation to geological events such as climatic and transgression-regression cycles, a reliable time estimation is also essential. To address these issues, we determined and analyzed the mt-genome of the Japanese subspecies. This indicates that the Japanese subspecies initially diverged from other Asian black bears in around 1.46Ma. The Northern continental population (northeast China, Russia, Korean peninsula) subsequently evolved, relatively recently, from the Southern continental population (southern China and Southeast Asia). While the Japanese black bear has an early origin, the tMRCAs and the dynamics of population sizes suggest that it dispersed relatively recently in the main Japanese islands: during the late Middle and Late Pleistocene, probably during or soon after the extinction of the brown bear in Honshu in the same period. Our estimation that the population size of the Japanese subspecies increased rapidly during the Late Pleistocene is the first evidential signal of a niche exchange between brown bears and black bears in the Japanese main islands. This interpretation seems plausible but was not corroborated by paleontological evidence that fossil record of the Japanese subspecies limited after the Late Pleistocene. We also report here a new fossil record of the oldest Japanese black bear from the Middle Pleistocene
Identification and dynamics of a cryptic suture zone in tropical rainforest
Moritz, C.; Hoskin, C.J.; MacKenzie, J.B.; Phillips, B.L.; Tonione, M.; Silva, N.; VanDerWal, J.; Williams, S.E.; Graham, C.H.
2009-01-01
Suture zones, shared regions of secondary contact between long-isolated lineages, are natural laboratories for studying divergence and speciation. For tropical rainforest, the existence of suture zones and their significance for speciation has been controversial. Using comparative phylogeographic evidence, we locate a morphologically cryptic suture zone in the Australian Wet Tropics rainforest. Fourteen out of 18 contacts involve morphologically cryptic phylogeographic lineages, with mtDNA sequence divergences ranging from 2 to 15 per cent. Contact zones are significantly clustered in a suture zone located between two major Quaternary refugia. Within this area, there is a trend for secondary contacts to occur in regions with low environmental suitability relative to both adjacent refugia and, by inference, the parental lineages. The extent and form of reproductive isolation among interacting lineages varies across species, ranging from random admixture to speciation, in one case via reinforcement. Comparative phylogeographic studies, combined with environmental analysis at a fine-scale and across varying climates, can generate new insights into suture zone formation and to diversification processes in species-rich tropical rainforests. As arenas for evolutionary experimentation, suture zones merit special attention for conservation. PMID:19203915
Inference of miRNA targets using evolutionary conservation and pathway analysis
Directory of Open Access Journals (Sweden)
van Nimwegen Erik
2007-03-01
Background: MicroRNAs have emerged as important regulatory genes in a variety of cellular processes and, in recent years, hundreds of such genes have been discovered in animals. In contrast, functional annotations are available only for a very small fraction of these miRNAs, and even in these cases only partially. Results: We developed a general Bayesian method for the inference of miRNA target sites, in which, for each miRNA, we explicitly model the evolution of orthologous target sites in a set of related species. Using this method we predict target sites for all known miRNAs in flies, worms, fish, and mammals. By comparing our predictions in fly with a reference set of experimentally tested miRNA-mRNA interactions we show that our general method performs at least as well as the most accurate methods available to date, including ones specifically tailored for target prediction in fly. An important novel feature of our model is that it explicitly infers the phylogenetic distribution of functional target sites, independently for each miRNA. This allows us to infer species-specific and clade-specific miRNA targeting. We also show that, in long human 3' UTRs, miRNA target sites occur preferentially near the start and near the end of the 3' UTR. To characterize miRNA function beyond the predicted lists of targets we further present a method to infer significant associations between the sets of targets predicted for individual miRNAs and specific biochemical pathways, in particular those of the KEGG pathway database. We show that this approach retrieves several known functional miRNA-mRNA associations, and predicts novel functions for known miRNAs in cell growth and in development. Conclusion: We have presented a Bayesian target prediction algorithm without any tunable parameters, that can be applied to sequences from any clade of species. The algorithm automatically infers the phylogenetic distribution of functional sites for each miRNA, and
Multi-Objective data analysis using Bayesian Inference for MagLIF experiments
Knapp, Patrick; Glinksy, Michael; Evans, Matthew; Gom, Matth; Han, Stephanie; Harding, Eric; Slutz, Steve; Hahn, Kelly; Harvey-Thompson, Adam; Geissel, Matthias; Ampleford, David; Jennings, Christopher; Schmit, Paul; Smith, Ian; Schwarz, Jens; Peterson, Kyle; Jones, Brent; Rochau, Gregory; Sinars, Daniel
2017-10-01
The MagLIF concept has recently demonstrated Gbar pressures and confinement of charged fusion products at stagnation. We present a new analysis methodology that allows for integration of multiple diagnostics including nuclear, x-ray imaging, and x-ray power to determine the temperature, pressure, liner areal density, and mix fraction. A simplified hot-spot model is used with a Bayesian inference network to determine the most probable model parameters that describe the observations while simultaneously revealing the principal uncertainties in the analysis. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.
A Network Inference Workflow Applied to Virulence-Related Processes in Salmonella typhimurium
Energy Technology Data Exchange (ETDEWEB)
Taylor, Ronald C.; Singhal, Mudita; Weller, Jennifer B.; Khoshnevis, Saeed; Shi, Liang; McDermott, Jason E.
2009-04-20
Inference of the structure of mRNA transcriptional regulatory networks, protein regulatory or interaction networks, and protein activation/inactivation-based signal transduction networks are critical tasks in systems biology. In this article we discuss a workflow for the reconstruction of parts of the transcriptional regulatory network of the pathogenic bacterium Salmonella typhimurium based on the information contained in sets of microarray gene expression data now available for that organism, and describe our results obtained by following this workflow. The primary tool is one of the network inference algorithms deployed in the Software Environment for BIological Network Inference (SEBINI). Specifically, we selected the algorithm called Context Likelihood of Relatedness (CLR), which uses the mutual information contained in the gene expression data to infer regulatory connections. The associated analysis pipeline automatically stores the inferred edges from the CLR runs within SEBINI and, upon request, transfers the inferred edges into either Cytoscape or the plug-in Collective Analysis of Biological Interaction Networks (CABIN) tool for further post-analysis of the inferred regulatory edges. The following article presents the outcome of this workflow, as well as the protocols followed for microarray data collection, data cleansing, and network inference. Our analysis revealed several interesting interactions, functional groups, metabolic pathways, and regulons in S. typhimurium.
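The CLR scoring idea named in this abstract can be sketched as follows. This is a minimal illustration and not the SEBINI implementation: it assumes the commonly described form of CLR, in which each mutual-information value is converted to a z-score against the background of each of the two genes involved and the two z-scores are combined as sqrt(z_i^2 + z_j^2). The function name `clr_scores` and the input layout (a symmetric list-of-lists matrix of precomputed mutual information) are hypothetical.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Turn a symmetric mutual-information matrix into CLR-style scores."""
    n = len(mi)

    # Background statistics per gene: its MI values against all other genes.
    stats = []
    for i in range(n):
        background = [mi[i][j] for j in range(n) if j != i]
        stats.append((mean(background), pstdev(background)))

    def z(i, j):
        # z-score of MI(i, j) relative to gene i's background, clipped at 0.
        mu, sigma = stats[i]
        if sigma == 0:
            return 0.0
        return max(0.0, (mi[i][j] - mu) / sigma)

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            combined = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
            scores[i][j] = scores[j][i] = combined
    return scores


# Toy example (made-up numbers): genes 0 and 1 share unusually high MI.
toy_mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
toy_scores = clr_scores(toy_mi)
```

In the toy matrix only the pair (0, 1) stands out from both genes' backgrounds, so it is the only pair with a non-zero score; an edge list would then be obtained by thresholding the scores.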
cpDNA Microsatellite Markers for Lemna minor (Araceae): Phylogeographic Implications
Directory of Open Access Journals (Sweden)
Gowher A. Wani
2014-07-01
Full Text Available Premise of the study: A lack of genetic markers impedes our understanding of the population biology of Lemna minor. Thus, the development of appropriate genetic markers for L. minor promises to be highly useful for population genetic studies and for addressing other life history questions regarding the species. Methods and Results: For the first time, we characterized nine polymorphic and 24 monomorphic chloroplast microsatellite markers in L. minor using DNA samples of 26 individuals sampled from five populations in Kashmir and of 17 individuals from three populations in Quebec. Initially, we designed 33 primer pairs, which were tested on genomic DNA from natural populations. Nine loci provided markers with two alleles. Based on genotyping of the chloroplast DNA fragments from 43 sampled individuals, we identified one haplotype in Quebec and 11 haplotypes in Kashmir, of which one occurs in 56% of the genotypes, one in 8%, and nine in 4%, respectively. There was a maximum of two alleles per locus. Conclusions: These new chloroplast microsatellite markers for L. minor and haplotype distribution patterns indicate a complex phylogeographic history that merits further investigation.
Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset
2017-01-06
inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. Previously, the Journal of Proteomics has published multiple studies on other benchmarks of bioinformatics algorithms (PMID: 26585461; PMID: 22728601) in proteomics, making clear the importance of those studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Six different algorithms - ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA - were evaluated using the highly customizable workflow on four public datasets with varying complexities. Five popular database search engines Mascot, X!Tandem, MS-GF+ and combinations thereof were evaluated for every protein inference tool. In total, >186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) peptides per protein, and 3) the number of uniquely reported proteins per inference method, to address the quality of each inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that 1) using PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical, but also known isoforms of proteins has a small impact on the number of reported proteins. The detection of specific isoforms could
TYPE Ia SUPERNOVA LIGHT-CURVE INFERENCE: HIERARCHICAL BAYESIAN ANALYSIS IN THE NEAR-INFRARED
International Nuclear Information System (INIS)
Mandel, Kaisey S.; Friedman, Andrew S.; Kirshner, Robert P.; Wood-Vasey, W. Michael
2009-01-01
We present a comprehensive statistical analysis of the properties of Type Ia supernova (SN Ia) light curves in the near-infrared using recent data from Peters Automated InfraRed Imaging TELescope and the literature. We construct a hierarchical Bayesian framework, incorporating several uncertainties including photometric error, peculiar velocities, dust extinction, and intrinsic variations, for principled and coherent statistical inference. SN Ia light-curve inferences are drawn from the global posterior probability of parameters describing both individual supernovae and the population conditioned on the entire SN Ia NIR data set. The logical structure of the hierarchical model is represented by a directed acyclic graph. Fully Bayesian analysis of the model and data is enabled by an efficient Markov Chain Monte Carlo algorithm exploiting the conditional probabilistic structure using Gibbs sampling. We apply this framework to the JHK_s SN Ia light-curve data. A new light-curve model captures the observed J-band light-curve shape variations. The marginal intrinsic variances in peak absolute magnitudes are σ(M_J) = 0.17 ± 0.03, σ(M_H) = 0.11 ± 0.03, and σ(M_Ks) = 0.19 ± 0.04. We describe the first quantitative evidence for correlations between the NIR absolute magnitudes and J-band light-curve shapes, and demonstrate their utility for distance estimation. The average residual in the Hubble diagram for the training set SNe at cz > 2000 km s^-1 is 0.10 mag. The new application of bootstrap cross-validation to SN Ia light-curve inference tests the sensitivity of the statistical model fit to the finite sample and estimates the prediction error at 0.15 mag. These results demonstrate that SN Ia NIR light curves are as effective as corrected optical light curves, and, because they are less vulnerable to dust absorption, they have great potential as precise and accurate cosmological distance indicators.
Directory of Open Access Journals (Sweden)
Hero Alfred
2010-11-01
Full Text Available Abstract Background Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence
2010-11-09
Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
Phylogeographic structure and demographic patterns of brown trout in North-West Africa.
Snoj, Aleš; Marić, Saša; Bajec, Simona Sušnik; Berrebi, Patrick; Janjani, Said; Schöffmann, Johannes
2011-10-01
The objectives of the study were to determine the phylogeographic structure of brown trout (Salmo trutta) in Morocco, elucidate their colonization patterns in North-West Africa and identify the mtDNA lineages involved in this process. We also aimed to resolve whether certain brown trout entities are also genetically distinct. Sixty-two brown trout from eleven locations across the Mediterranean and the Atlantic drainages in Morocco were surveyed using sequence analysis of the mtDNA control region and nuclear gene LDH, and by genotyping twelve microsatellite loci. Our study confirms that in Morocco both the Atlantic and Mediterranean basins are populated by Atlantic mtDNA lineage brown trout only, demonstrating that the Atlantic lineage (especially its southern clade) initially invaded not only the western part of the Mediterranean basin in Morocco but also expanded deep into the central area. Atlantic haplotypes identified here sort into three distinct groups, suggesting Morocco was colonized in at least three successive waves (1.2, 0.4 and 0.2-0.1 MY ago). This notion becomes more pronounced with the finding of a distinct haplotype in the Dades river system, whose origin appears to coalesce with the nascent stage of the basal mtDNA evolutionary lineages of brown trout. According to our results, Salmo akairos, Salmo pellegrini and "green trout" from Lake Isli do not exhibit any character states that distinctively separate them from the other brown trout populations studied. Therefore, their status as distinct species was not confirmed. Copyright © 2011 Elsevier Inc. All rights reserved.
Meta-learning framework applied in bioinformatics inference system design.
Arredondo, Tomás; Ormazábal, Wladimir
2015-01-01
This paper describes a meta-learner inference system development framework which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach utilises a workflow where the user provides feedback with final classification decisions which are stored in conjunction with analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several different optimisation methods with various different parameters. The obtained inference systems were also contrasted with other standard classification methods with accurate prediction capabilities observed.
Huelsken, Thomas; Keyse, Jude; Liggins, Libby; Penny, Shane; Treml, Eric A; Riginos, Cynthia
2013-01-01
Giant clams (genus Tridacna) are iconic coral reef animals of the Indian and Pacific Oceans, easily recognizable by their massive shells and vibrantly colored mantle tissue. Most Tridacna species are listed by CITES and the IUCN Redlist, as their populations have been extensively harvested and depleted in many regions. Here, we survey Tridacna crocea and Tridacna maxima from the eastern Indian and western Pacific Oceans for mitochondrial (COI and 16S) and nuclear (ITS) sequence variation and consolidate these data with previous published results using phylogenetic analyses. We find deep intraspecific differentiation within both T. crocea and T. maxima. In T. crocea we describe a previously undocumented phylogeographic division to the east of Cenderawasih Bay (northwest New Guinea), whereas for T. maxima the previously described, distinctive lineage of Cenderawasih Bay can be seen to also typify western Pacific populations. Furthermore, we find an undescribed, monophyletic group that is evolutionarily distinct from named Tridacna species at both mitochondrial and nuclear loci. This cryptic taxon is geographically widespread with a range extent that minimally includes much of the central Indo-Pacific region. Our results reinforce the emerging paradigm that cryptic species are common among marine invertebrates, even for conspicuous and culturally significant taxa. Additionally, our results add to identified locations of genetic differentiation across the central Indo-Pacific and highlight how phylogeographic patterns may differ even between closely related and co-distributed species.
Phylogeography by diffusion on a sphere: whole world phylogeography
Directory of Open Access Journals (Sweden)
Remco Bouckaert
2016-09-01
Full Text Available Background Techniques for reconstructing geographical history along a phylogeny can answer many questions of interest about the geographical origins of species. Bayesian models based on the assumption that taxa move through a diffusion process have found many applications. However, these methods rely on diffusion processes on a plane, and do not take the spherical nature of our planet into account. Performing an analysis that covers the whole world thus does not take into account the distortions caused by projections like the Mercator projection. Results In this paper, we introduce a Bayesian phylogeographical method based on diffusion on a sphere. When the area where taxa are sampled from is small, a sphere can be approximated by a plane and the model results in the same inferences as with models using diffusion on a plane. For taxa sampled from the whole world, we obtain substantial differences. We present an efficient algorithm for performing inference in a Markov Chain Monte Carlo (MCMC) algorithm, and show applications to small and large sample areas. We compare results between planar and spherical diffusion in a simulation study and apply the method by inferring the origin of Hepatitis B based on sequences sampled from Eurasia and Africa. Conclusions We describe a framework for performing phylogeographical inference, which is suitable when the distortion introduced by map projections is large, but works well on a smaller scale as well. The framework allows sampling tips from regions, which is useful when the exact sample location is unknown, and placing prior information on locations of clades in the tree. The method is implemented in the GEO_SPHERE package in BEAST 2, which is open source licensed under LGPL and allows joint tree and geography inference under a wide range of models.
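The distortion this abstract refers to can be illustrated with a small distance comparison. The sketch below is not the GEO_SPHERE diffusion model; it only contrasts the great-circle (haversine) distance with a naive planar distance that treats latitude and longitude as flat x/y coordinates. The function names and the Earth-radius constant are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_km(lat1, lon1, lat2, lon2):
    """Distance if degrees of latitude/longitude are treated as a flat grid."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180
    return km_per_degree * math.hypot(lat2 - lat1, lon2 - lon1)


# One degree of longitude on the equator: both measures agree.
equator_sphere = great_circle_km(0, 0, 0, 1)
equator_plane = planar_km(0, 0, 0, 1)

# Ten degrees of longitude at 60 degrees north: the flat grid
# overstates the distance by roughly a factor of two.
north_sphere = great_circle_km(60, 0, 60, 10)
north_plane = planar_km(60, 0, 60, 10)
```

Near the equator and over short ranges the two measures coincide, which matches the abstract's remark that planar and spherical models agree for small sampling areas; at high latitudes or over large ranges they diverge.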
Thesis Abstract Morphological and phylogeographic analysis of Brazilian tortoises (Testudinidae).
Silva, T L; Venancio, L P R; Bonini-Domingos, C R
2015-12-29
distribution of C. carbonarius can partially be explained by the fact that all the morphotypes are considered as a single taxonomic unit. Behavioral aspects such as intraspecific communication may be as reliable as morphological or molecular data for inferring evolutionary relationships. Analysis of the physical characteristics of vocalization [fundamental frequency (Hz), interval between notes (s), duration of each note (s), and number of notes from each vocalization] between C. carbonarius and morphotype 1 revealed statistically significant differences in the interval between notes (s) (P = 0.0000); duration of each note (s) (P = 0.0000); frequency of notes (Hz) (P = 0.0009); and number of notes (P = 0.0002). The results of preference experiments using sound stimulus were inconclusive with respect to species-specific vocalization preference; only females of C. carbonarius showed intraspecific vocalization preference, indicating possible reproductive isolation mechanisms. To explore the presence of sexual dimorphism and morphological differences between C. denticulatus, C. carbonarius, and morphotype 1, descriptive statistics were used to analyze the data obtained for the investigated measures. Two sets of analysis were conducted - the first for each group, to compare the sexes; and the second for each sex, to compare the groups. To examine the interspecific variation in size and shape, a correlation matrix in principal component analysis was used. Next, I used factor analysis to rank the features showing >0.75 correlation in the differentiation between the sexes. The results were consistent with the hypothesis that morphotype 1 corresponds to a new species, because it differs from the species pattern in terms of morphology, coloring, and sexual dimorphism. The results of classical cytogenetic analysis - to differentiate C. denticulatus, C. carbonarius, and morphotype 1 - revealed no consistent data that would enable its use as a taxonomic parameter.
Conventional Giemsa
Statistical inference on residual life
Jeong, Jong-Hyeon
2014-01-01
This is a monograph on the concept of residual life, which is an alternative summary measure of time-to-event data, or survival data. The mean residual life has been used for many years under the name of life expectancy, so it is a natural concept for summarizing survival or reliability data. It is also more interpretable than the popular hazard function, especially for communications between patients and physicians regarding the efficacy of a new drug in the medical field. This book reviews existing statistical methods to infer the residual life distribution. The review and comparison includes existing inference methods for mean and median, or quantile, residual life analysis through medical data examples. The concept of the residual life is also extended to competing risks analysis. The targeted audience includes biostatisticians, graduate students, and PhD (bio)statisticians. Knowledge in survival analysis at an introductory graduate level is advisable prior to reading this book.
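As a small worked illustration of the quantity this monograph is about: the mean residual life at time t is the expected remaining lifetime given survival beyond t, m(t) = E[T - t | T > t], and the median residual life is the median of T - t among those surviving beyond t. The sketch below computes the empirical versions for fully observed (uncensored) times only; the function names and the toy data are hypothetical, and censored data would need the estimators the book reviews.

```python
from statistics import mean, median


def residual_lives(times, t):
    """Remaining lifetimes T - t for subjects still alive after time t."""
    remaining = [x - t for x in times if x > t]
    if not remaining:
        raise ValueError("no subject survives beyond t")
    return remaining


def mean_residual_life(times, t):
    """Empirical mean residual life at t (uncensored data only)."""
    return mean(residual_lives(times, t))


def median_residual_life(times, t):
    """Empirical median residual life at t (uncensored data only)."""
    return median(residual_lives(times, t))


# Toy survival times in years (made-up numbers).
toy_times = [2, 4, 6, 8]
mrl_at_0 = mean_residual_life(toy_times, 0)   # ordinary life expectancy
mrl_at_3 = mean_residual_life(toy_times, 3)   # among those alive at year 3
```

At t = 0 the mean residual life reduces to the ordinary mean lifetime, which is the sense in which the abstract equates it with life expectancy.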
Bootstrap inference when using multiple imputation.
Schomaker, Michael; Heumann, Christian
2018-04-16
Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. It remains however unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available. Copyright © 2018 John Wiley & Sons, Ltd.
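One way of combining the two techniques can be sketched as follows. The abstract does not spell out its four methods, so this is only an assumed ordering (resample the incomplete data first, then impute each resample several times) with a deliberately crude imputation model (random hot-deck draws from the observed values) and a percentile interval for the mean. All function names, defaults and data are hypothetical.

```python
import random
from statistics import mean


def hot_deck_impute(sample, rng):
    """Replace each missing value (None) by a random observed value."""
    observed = [x for x in sample if x is not None]
    return [x if x is not None else rng.choice(observed) for x in sample]


def boot_mi_interval(data, n_boot=500, n_imp=5, alpha=0.05, seed=0):
    """Percentile interval for the mean: bootstrap first, then impute."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample the incomplete data with replacement.
        resample = [rng.choice(data) for _ in data]
        if all(x is None for x in resample):
            continue  # nothing to impute from; skip this resample
        # Impute the resample n_imp times and average the point estimates.
        imputed_means = [mean(hot_deck_impute(resample, rng))
                         for _ in range(n_imp)]
        estimates.append(mean(imputed_means))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Toy data with two missing values (made-up numbers).
toy_data = [1.0, 2.0, None, 4.0, 5.0, None, 3.0, 2.5]
lower, upper = boot_mi_interval(toy_data)
```

The reverse ordering (impute first, then bootstrap each imputed data set) would swap the two loops; which orderings give valid intervals is the question the paper itself addresses.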
Ecosystem Interactions Underlie the Spread of Avian Influenza A Viruses with Pandemic Potential.
Directory of Open Access Journals (Sweden)
Justin Bahl
2016-05-01
Full Text Available Despite evidence for avian influenza A virus (AIV) transmission between wild and domestic ecosystems, the roles of bird migration and poultry trade in the spread of viruses remain enigmatic. In this study, we integrate ecosystem interactions into a phylogeographic model to assess the contribution of wild and domestic hosts to AIV distribution and persistence. Analysis of globally sampled AIV datasets shows frequent two-way transmission between wild and domestic ecosystems. In general, viral flow from domestic to wild bird populations was restricted to within a geographic region. In contrast, spillover from wild to domestic populations occurred both within and between regions. Wild birds mediated long-distance dispersal at intercontinental scales whereas viral spread among poultry populations was a major driver of regional spread. Viral spread between poultry flocks frequently originated from persistent lineages circulating in regions of intensive poultry production. Our analysis of long-term surveillance data demonstrates that meaningful insights can be inferred from integrating ecosystem into phylogeographic reconstructions that may be consequential for pandemic preparedness and livestock protection.
Ecosystem Interactions Underlie the Spread of Avian Influenza A Viruses with Pandemic Potential
Bahl, Justin; Pham, Truc T.; Hill, Nichola J.; Hussein, Islam T. M.; Ma, Eric J.; Easterday, Bernard C.; Halpin, Rebecca A.; Stockwell, Timothy B.; Wentworth, David E.; Kayali, Ghazi; Krauss, Scott; Schultz-Cherry, Stacey; Webster, Robert G.; Webby, Richard J.; Swartz, Michael D.; Smith, Gavin J. D.; Runstadler, Jonathan A.
2016-01-01
Despite evidence for avian influenza A virus (AIV) transmission between wild and domestic ecosystems, the roles of bird migration and poultry trade in the spread of viruses remain enigmatic. In this study, we integrate ecosystem interactions into a phylogeographic model to assess the contribution of wild and domestic hosts to AIV distribution and persistence. Analysis of globally sampled AIV datasets shows frequent two-way transmission between wild and domestic ecosystems. In general, viral flow from domestic to wild bird populations was restricted to within a geographic region. In contrast, spillover from wild to domestic populations occurred both within and between regions. Wild birds mediated long-distance dispersal at intercontinental scales whereas viral spread among poultry populations was a major driver of regional spread. Viral spread between poultry flocks frequently originated from persistent lineages circulating in regions of intensive poultry production. Our analysis of long-term surveillance data demonstrates that meaningful insights can be inferred from integrating ecosystem into phylogeographic reconstructions that may be consequential for pandemic preparedness and livestock protection. PMID:27166585
ERC analysis: web-based inference of gene function via evolutionary rate covariation.
Wolfe, Nicholas W; Clark, Nathan L
2015-12-01
The recent explosion of comparative genomics data presents an unprecedented opportunity to construct gene networks via the evolutionary rate covariation (ERC) signature. ERC is used to identify genes that experienced similar evolutionary histories, and thereby draws functional associations between them. The ERC Analysis website allows researchers to exploit genome-wide datasets to infer novel genes in any biological function and to explore deep evolutionary connections between distinct pathways and complexes. The website provides five analytical methods, graphical output, statistical support and access to an increasing number of taxonomic groups. Analyses and data are available at http://csb.pitt.edu/erc_analysis/. Contact: nclark@pitt.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Inferring biological tasks using Pareto analysis of high-dimensional data.
Hart, Yuval; Sheftel, Hila; Hausser, Jean; Szekely, Pablo; Ben-Moshe, Noa Bossel; Korem, Yael; Tendler, Avichai; Mayo, Avraham E; Alon, Uri
2015-03-01
We present the Pareto task inference method (ParTI; http://www.weizmann.ac.il/mcb/UriAlon/download/ParTI) for inferring biological tasks from high-dimensional biological data. Data are described as a polytope, and features maximally enriched closest to the vertices (or archetypes) allow identification of the tasks the vertices represent. We demonstrate that human breast tumors and mouse tissues are well described by tetrahedrons in gene expression space, with specific tumor types and biological functions enriched at each of the vertices, suggesting four key tasks.
Directory of Open Access Journals (Sweden)
João Paulo Monteiro
2001-12-01
Full Text Available Russell's The Problems of Philosophy tries to establish a new theory of induction, at the same time that Hume is there accused of an "irrational scepticism about induction". But a careful analysis of the theory of knowledge explicitly acknowledged by Hume reveals that, contrary to the standard interpretation in the XXth century, possibly influenced by Russell, Hume deals exclusively with causal inference (which he never classifies as "causal induction", although now we are entitled to do so), never with inductive inference in general, mainly generalizations about sensible qualities of objects (whether, e.g., "all crows are black" or not is not among Hume's concerns). Russell's theories are thus only false alternatives to Hume's, in (1912) or in his (1948).
cpDNA microsatellite markers for Lemna minor (Araceae): Phylogeographic implications
Wani, Gowher A.; Shah, Manzoor A.; Reshi, Zafar A.; Atangana, Alain R.; Khasa, Damase P.
2014-01-01
• Premise of the study: A lack of genetic markers impedes our understanding of the population biology of Lemna minor. Thus, the development of appropriate genetic markers for L. minor promises to be highly useful for population genetic studies and for addressing other life history questions regarding the species. • Methods and Results: For the first time, we characterized nine polymorphic and 24 monomorphic chloroplast microsatellite markers in L. minor using DNA samples of 26 individuals sampled from five populations in Kashmir and of 17 individuals from three populations in Quebec. Initially, we designed 33 primer pairs, which were tested on genomic DNA from natural populations. Nine loci provided markers with two alleles. Based on genotyping of the chloroplast DNA fragments from 43 sampled individuals, we identified one haplotype in Quebec and 11 haplotypes in Kashmir, of which one occurs in 56% of the genotypes, one in 8%, and nine in 4%, respectively. There was a maximum of two alleles per locus. • Conclusions: These new chloroplast microsatellite markers for L. minor and haplotype distribution patterns indicate a complex phylogeographic history that merits further investigation. PMID:25202636
DEFF Research Database (Denmark)
Besnard, G.; Garcia-Verdugo, C.; Rubio de Casas, R.
2008-01-01
Background: Phylogenetic and phylogeographic investigations have been previously performed to study the evolution of the olive tree complex (Olea europaea). A particularly high genomic diversity has been found in north-west Africa. However, to date no exhaustive study has been addressed to infer...
Mixed normal inference on multicointegration
Boswijk, H.P.
2009-01-01
Asymptotic likelihood analysis of cointegration in I(2) models, see Johansen (1997, 2006), Boswijk (2000) and Paruolo (2000), has shown that inference on most parameters is mixed normal, implying hypothesis test statistics with an asymptotic χ² null distribution. The asymptotic distribution of the
Directory of Open Access Journals (Sweden)
Thomas Huelsken
Full Text Available Giant clams (genus Tridacna) are iconic coral reef animals of the Indian and Pacific Oceans, easily recognizable by their massive shells and vibrantly colored mantle tissue. Most Tridacna species are listed by CITES and the IUCN Redlist, as their populations have been extensively harvested and depleted in many regions. Here, we survey Tridacna crocea and Tridacna maxima from the eastern Indian and western Pacific Oceans for mitochondrial (COI and 16S) and nuclear (ITS) sequence variation and consolidate these data with previous published results using phylogenetic analyses. We find deep intraspecific differentiation within both T. crocea and T. maxima. In T. crocea we describe a previously undocumented phylogeographic division to the east of Cenderawasih Bay (northwest New Guinea), whereas for T. maxima the previously described, distinctive lineage of Cenderawasih Bay can be seen to also typify western Pacific populations. Furthermore, we find an undescribed, monophyletic group that is evolutionarily distinct from named Tridacna species at both mitochondrial and nuclear loci. This cryptic taxon is geographically widespread with a range extent that minimally includes much of the central Indo-Pacific region. Our results reinforce the emerging paradigm that cryptic species are common among marine invertebrates, even for conspicuous and culturally significant taxa. Additionally, our results add to identified locations of genetic differentiation across the central Indo-Pacific and highlight how phylogeographic patterns may differ even between closely related and co-distributed species.
Objective: To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. Methods: The conditional inference tree analysis, a data mining approach, was used to con...
Directory of Open Access Journals (Sweden)
Abdulahi Alfonso-Morales
Infectious bursal disease is a highly contagious and acute viral disease caused by the infectious bursal disease virus (IBDV); it affects all major poultry producing areas of the world. The current study was designed to rigorously measure the global phylogeographic dynamics of IBDV strains to gain insight into viral population expansion as well as the emergence, spread and pattern of the geographical structure of very virulent IBDV (vvIBDV) strains. Sequences of the hyper-variable region of the VP2 (HVR-VP2) gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank database; Cuban sequences were obtained in the current work. All sequences were analysed by Bayesian phylogeographic analysis, implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST), Bayesian Tip-association Significance testing (BaTS) and Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD) software packages. Selection pressure on the HVR-VP2 was also assessed. The phylogeographic association-trait analysis showed that viruses sampled from individual countries tend to cluster together, suggesting a geographic pattern for IBDV strains. Spatial analysis from this study revealed that strains carrying sequences that were linked to increased virulence of IBDV appeared in Iran in 1981 and spread to Western Europe (Belgium) in 1987, Africa (Egypt) around 1990, East Asia (China and Japan) in 1993, the Caribbean Region (Cuba) by 1995 and South America (Brazil) around 2000. Selection pressure analysis showed that several codons in the HVR-VP2 region were under purifying selection. To our knowledge, this work is the first study applying the Bayesian phylogeographic reconstruction approach to analyse the emergence and spread of vvIBDV strains worldwide.
Alfonso-Morales, Abdulahi; Martínez-Pérez, Orlando; Dolz, Roser; Valle, Rosa; Perera, Carmen L; Bertran, Kateri; Frías, Maria T; Majó, Natàlia; Ganges, Llilianne; Pérez, Lester J
2013-01-01
Infectious bursal disease is a highly contagious and acute viral disease caused by the infectious bursal disease virus (IBDV); it affects all major poultry producing areas of the world. The current study was designed to rigorously measure the global phylogeographic dynamics of IBDV strains to gain insight into viral population expansion as well as the emergence, spread and pattern of the geographical structure of very virulent IBDV (vvIBDV) strains. Sequences of the hyper-variable region of the VP2 (HVR-VP2) gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank database; Cuban sequences were obtained in the current work. All sequences were analysed by Bayesian phylogeographic analysis, implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST), Bayesian Tip-association Significance testing (BaTS) and Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD) software packages. Selection pressure on the HVR-VP2 was also assessed. The phylogeographic association-trait analysis showed that viruses sampled from individual countries tend to cluster together, suggesting a geographic pattern for IBDV strains. Spatial analysis from this study revealed that strains carrying sequences that were linked to increased virulence of IBDV appeared in Iran in 1981 and spread to Western Europe (Belgium) in 1987, Africa (Egypt) around 1990, East Asia (China and Japan) in 1993, the Caribbean Region (Cuba) by 1995 and South America (Brazil) around 2000. Selection pressure analysis showed that several codons in the HVR-VP2 region were under purifying selection. To our knowledge, this work is the first study applying the Bayesian phylogeographic reconstruction approach to analyse the emergence and spread of vvIBDV strains worldwide.
International Nuclear Information System (INIS)
Nakayama, Akira; Yoshida, Yuuji; Fukumura, Teruo
1984-01-01
Grammatical inference can be used as a technique for analyzing image structure. Applying it to naturally obtained images raises several problems, because no practical grammatical technique for two-dimensional images has been established. The authors developed a technique that addresses these problems, with the main aim of automating the structure analysis of naturally obtained images. The first half of this paper describes the automatic inference of a line drawing generation grammar and the line drawing analysis based on that inference. The second half reports on the actual analysis. The proposed technique extracts object line drawings from line drawings containing noise. Its effectiveness was evaluated on an example of extracting rib center lines from thin line chest X-ray images of practical scale and complexity. In this example, the total number of characteristic points (ends, branch points and intersections) composing the line drawings in one image was 377, and the total number of line segments composing the line drawings was 566, on average per sheet. The extraction ratio was 86.6 %, which seems reasonable given the complexity of the input line drawings. Further, the result was compared with the rib center lines identified by the automatic screening system AISCR-V3, as a comparison with conventional processing, and it was found satisfactory given the versatility of the method. (Wakatsuki, Y.)
Caticha, Ariel
2011-03-01
In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops, the Maximum Entropy and the Bayesian methods, into a single general inference scheme.
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
Efficient Bayesian inference for ARFIMA processes
Graves, T.; Gramacy, R. B.; Franzke, C. L. E.; Watkins, N. W.
2015-03-01
Many geophysical quantities, like atmospheric temperature, water levels in rivers, and wind speeds, have shown evidence of long-range dependence (LRD). LRD means that these quantities experience non-trivial temporal memory, which potentially enhances their predictability, but also hampers the detection of externally forced trends. Thus, it is important to reliably identify whether or not a system exhibits LRD. In this paper we present a modern and systematic approach to the inference of LRD. Rather than Mandelbrot's fractional Gaussian noise, we use the more flexible Autoregressive Fractional Integrated Moving Average (ARFIMA) model which is widely used in time series analysis, and of increasing interest in climate science. Unlike most previous work on the inference of LRD, which is frequentist in nature, we provide a systematic treatment of Bayesian inference. In particular, we provide a new approximate likelihood for efficient parameter inference, and show how nuisance parameters (e.g. short memory effects) can be integrated over in order to focus on long memory parameters, and hypothesis testing more directly. We illustrate our new methodology on the Nile water level data, with favorable comparison to the standard estimators.
Statistical inference based on divergence measures
Pardo, Leandro
2005-01-01
The idea of using functionals of Information Theory, such as entropies or divergences, in statistical inference is not new. However, in spite of the fact that divergence statistics have become a very good alternative to the classical likelihood ratio test and the Pearson-type statistic in discrete models, many statisticians remain unaware of this powerful approach. Statistical Inference Based on Divergence Measures explores classical problems of statistical inference, such as estimation and hypothesis testing, on the basis of measures of entropy and divergence. The first two chapters form an overview, from a statistical perspective, of the most important measures of entropy and divergence and study their properties. The author then examines the statistical analysis of discrete multivariate data with emphasis on problems in contingency tables and loglinear models using phi-divergence test statistics as well as minimum phi-divergence estimators. The final chapter looks at testing in general populations, prese...
Meyer, Georg F; Spray, Amy; Fairlie, Jo E; Uomini, Natalie T
2014-01-01
Current neuroimaging techniques with high spatial resolution constrain participant motion so that many natural tasks cannot be carried out. The aim of this paper is to show how a time-locked correlation-analysis of cerebral blood flow velocity (CBFV) lateralization data, obtained with functional TransCranial Doppler (fTCD) ultrasound, can be used to infer cerebral activation patterns across tasks. In a first experiment we demonstrate that the proposed analysis method results in data that are comparable with the standard Lateralization Index (LI) for within-task comparisons of CBFV patterns, recorded during cued word generation (CWG) at two difficulty levels. In the main experiment we demonstrate that the proposed analysis method shows correlated blood-flow patterns for two different cognitive tasks that are known to draw on common brain areas, CWG, and Music Synthesis. We show that CBFV patterns for Music and CWG are correlated only for participants with prior musical training. CBFV patterns for tasks that draw on distinct brain areas, the Tower of London and CWG, are not correlated. The proposed methodology extends conventional fTCD analysis by including temporal information in the analysis of cerebral blood-flow patterns to provide a robust, non-invasive method to infer whether common brain areas are used in different cognitive tasks. It complements conventional high resolution imaging techniques.
Sweller, Naomi; Hayes, Brett K
2010-08-01
Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.
Aggelopoulos, Nikolaos C
2015-08-01
Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. Copyright © 2015 Elsevier Ltd. All rights reserved.
Deep Learning for Population Genetic Inference.
Sheehan, Sara; Song, Yun S
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
DEFF Research Database (Denmark)
Andersen, Jesper
2009-01-01
Collateral evolution is the problem of updating several library-using programs in response to API changes in the used library. In this dissertation we address the issue of understanding collateral evolutions by automatically inferring a high-level specification of the changes evident in a given set ...... specifications inferred by spdiff in Linux are shown. We find that the inferred specifications concisely capture the actual collateral evolution performed in the examples....
International Conference on Trends and Perspectives in Linear Statistical Inference
Rosen, Dietrich
2018-01-01
This volume features selected contributions on a variety of topics related to linear statistical inference. The peer-reviewed papers from the International Conference on Trends and Perspectives in Linear Statistical Inference (LinStat 2016) held in Istanbul, Turkey, 22-25 August 2016, cover topics in both theoretical and applied statistics, such as linear models, high-dimensional statistics, computational statistics, the design of experiments, and multivariate analysis. The book is intended for statisticians, Ph.D. students, and professionals who are interested in statistical inference.
Causal inference in nonlinear systems: Granger causality versus time-delayed mutual information
Li, Songting; Xiao, Yanyang; Zhou, Douglas; Cai, David
2018-05-01
The Granger causality (GC) analysis has been extensively applied to infer causal interactions in dynamical systems arising from economy and finance, physics, bioinformatics, neuroscience, social science, and many other fields. In the presence of potential nonlinearity in these systems, the validity of the GC analysis in general is questionable. To illustrate this, here we first construct minimal nonlinear systems and show that the GC analysis fails to infer causal relations in these systems—it gives rise to all types of incorrect causal directions. In contrast, we show that the time-delayed mutual information (TDMI) analysis is able to successfully identify the direction of interactions underlying these nonlinear systems. We then apply both methods to neuroscience data collected from experiments and demonstrate that the TDMI analysis but not the GC analysis can identify the direction of interactions among neuronal signals. Our work exemplifies inference hazards in the GC analysis in nonlinear systems and suggests that the TDMI analysis can be an appropriate tool in such a case.
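The idea of time-delayed mutual information in the abstract above can be sketched in a few lines. This is not the authors' code: the histogram estimator, the bin count, the function names and the toy system (y driven by the square of the previous x) are all assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys, bins=8):
    """Histogram estimate of the mutual information (in nats) of two series."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j))).
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def tdmi(x, y, lag):
    """Mutual information between x(t) and y(t + lag), for lag >= 0."""
    return mutual_information(x[:len(x) - lag], y[lag:])

# Hypothetical toy system: y depends nonlinearly on the previous value of x.
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
y = [0.0] + [x[t - 1] ** 2 + random.gauss(0.0, 0.01) for t in range(1, 5000)]

forward = tdmi(x, y, 1)   # x(t) versus y(t + 1): the driven direction
backward = tdmi(y, x, 1)  # y(t) versus x(t + 1): no dependence by construction
```

In this toy case the forward value should be clearly larger than the backward one, which only reflects the positive bias of the histogram estimator. A quadratic coupling of this kind is also uncorrelated with the driver, which is the sort of dependence a linear method can miss.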
Human Inferences about Sequences: A Minimal Transition Probability Model.
Directory of Open Access Journals (Sweden)
Florent Meyniel
2016-12-01
The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations includes explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge.
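A minimal sketch of how transition probabilities might be tracked with a single free parameter follows. It is an illustration under stated assumptions, not the model from the paper: it assumes binary stimuli coded 0/1, a uniform Beta(1, 1) prior on each transition probability, and exponential forgetting of old counts through a hypothetical `decay` parameter.

```python
import math

def transition_surprise(sequence, decay=0.9):
    """Return the surprise (in bits) of each item after the first.

    Assumes binary stimuli coded 0/1, a uniform Beta(1, 1) prior on each
    transition probability, and exponential forgetting of old counts.
    """
    # counts[prev][nxt]: leaky count of observed transitions prev -> nxt.
    counts = [[0.0, 0.0], [0.0, 0.0]]
    surprises = []
    for prev, nxt in zip(sequence, sequence[1:]):
        row = counts[prev]
        # Posterior mean of P(nxt | prev) under the uniform prior.
        p = (row[nxt] + 1.0) / (row[0] + row[1] + 2.0)
        surprises.append(-math.log2(p))
        # Forget a little of every count, then record the new transition.
        for r in counts:
            r[0] *= decay
            r[1] *= decay
        row[nxt] += 1.0
    return surprises

# Made-up sequence: regular alternation broken by a final repetition.
s = transition_surprise([0, 1, 0, 1, 0, 1, 0, 0])
```

With this sequence the first transition carries exactly 1 bit of surprise (nothing has been learned yet), later alternations carry less, and the final repetition carries more than 1 bit because it violates the learned alternation.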
Inference with constrained hidden Markov models in PRISM
DEFF Research Database (Denmark)
Christiansen, Henning; Have, Christian Theil; Lassen, Ole Torp
2010-01-01
A Hidden Markov Model (HMM) is a common statistical model which is widely used for analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. De......_different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.
Pérez-Collazos, E; Sanchez-Gómez, P; Jiménez, F; Catalán, P
2009-03-01
The geology and climate of the western Mediterranean area were strongly modified during the Late Tertiary and the Quaternary. These geological and climatic events are thought to have induced changes in the population histories of plants in the Iberian Peninsula. However, fine-scale genetic spatial architecture across western Mediterranean steppe plant refugia has rarely been investigated. A population genetic analysis of amplified fragment length polymorphism variation was conducted on present-day, relict populations of Ferula loscosii (Apiaceae). This species exhibits high individual/population numbers in the middle Ebro river valley and, according to the hypothesis of an abundant-centre distribution, these northern populations might represent a long-standing/ancestral distribution centre. However, our results suggest that the decimated southern and central Iberian populations are more variable and structured than the northeastern ones, representing the likely vestiges of an ancestral distribution centre of the species. Phylogeographical analysis suggests that F. loscosii likely originated in southern Spain and then migrated towards the central and northeastern ranges, further supporting a Late Miocene southern-bound Mediterranean migratory way for its oriental steppe ancestors. In addition, different glacial-induced conditions affected the southern and northern steppe Iberian refugia during the Quaternary. The contrasting genetic homogeneity of the Ebro valley range populations compared to the southern Iberian ones possibly reflects more severe bottlenecks and subsequent genetic drift experienced by populations of the northern Iberia refugium during the Pleistocene, followed by successful postglacial expansion from only a few founder plants.
Multimodel inference and adaptive management
Rehme, S.E.; Powell, L.A.; Allen, Craig R.
2011-01-01
Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
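One common way to express the relative support for competing models is through Akaike weights. The abstract above does not say which criterion the reviewed papers used, so the choice of AIC here is an assumption, and the scores in the example are made up.

```python
import math

def akaike_weights(aic_values):
    """Return the Akaike weight of each model given its AIC score."""
    best = min(aic_values)
    # Relative likelihood of each model compared with the best-scoring one.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AIC scores for three competing models (made-up numbers).
weights = akaike_weights([100.0, 101.2, 110.0])
```

When the top two weights are close, as they are for the first two scores here, no single model is clearly supported. That is the kind of weak inference the abstract describes.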
Input data for inferring species distributions in Kyphosidae world-wide
Directory of Open Access Journals (Sweden)
Steen Wilhelm Knudsen
2016-09-01
Input data files for inferring the relationships within the family Kyphosidae, as presented in (Knudsen and Clements, 2016) [1], are provided here together with the resulting topologies, to allow the reader to explore the topologies in detail. The input data files comprise seven nexus-files with sequence alignments of mtDNA and nDNA markers for performing Bayesian analysis. A matrix of recoded character states inferred from the morphology examined in museum specimens representing Dichistiidae, Girellidae, Kyphosidae, Microcanthidae and Scorpididae is also provided, and can be used for performing a parsimony analysis to infer the relationship among these perciform families. The nucleotide input data files comprise both multiple and single representatives of the various species to allow for inference of the relationship among the species in Kyphosidae and between the families closely related to Kyphosidae. The ‘.xml’-files with various constrained relationships among the families potentially closely related to Kyphosidae are also provided to allow the reader to rerun and explore the results from the stepping-stone analysis. The resulting topologies are supplied in newick-file formats together with input data files for Bayesian analysis and the ‘.xml’-files. Re-running the input data files in the appropriate software will enable the reader to examine log-files and tree-files themselves. Keywords: Sea chub, Drummer, Kyphosus, Scorpis, Girella
Rich analysis and rational models: Inferring individual behavior from infant looking data
Piantadosi, Steven T.; Kidd, Celeste; Aslin, Richard
2013-01-01
Studies of infant looking times over the past 50 years have provided profound insights about cognitive development, but their dependent measures and analytic techniques are quite limited. In the context of infants' attention to discrete sequential events, we show how a Bayesian data analysis approach can be combined with a rational cognitive model to create a rich data analysis framework for infant looking times. We formalize (i) a statistical learning model, (ii) a parametric linking between the learning model's beliefs and infants' looking behavior, and (iii) a data analysis model that infers parameters of the cognitive model and linking function for groups and individuals. Using this approach, we show that recent findings from Kidd, Piantadosi, and Aslin (2012) of a U-shaped relationship between look-away probability and stimulus complexity hold even within infants and are not due to averaging subjects with different types of behavior. Our results indicate that individual infants prefer stimuli of intermediate complexity, reserving attention for events that are moderately predictable given their probabilistic expectations about the world. PMID:24750256
Mori, Gustavo M; Zucchi, Maria I; Sampaio, Iracilda; Souza, Anete P
2015-04-10
Mangrove plants grow in the intertidal zone in tropical and subtropical regions worldwide. The global latitudinal distribution of the mangrove is mainly influenced by climatic and oceanographic features. Because of current climate changes, poleward range expansions have been reported for the major biogeographic regions of mangrove forests in the Western and Eastern Hemispheres. There is evidence that mangrove forests also responded similarly after the last glaciation by expanding their ranges. In this context, the use of genetic tools is an informative approach for understanding how historical processes and factors impact the distribution of mangrove species. We investigated the phylogeographic patterns of two Avicennia species, A. germinans and A. schaueriana, from the Western Hemisphere using nuclear and chloroplast DNA markers. Our results indicate that, although Avicennia bicolor, A. germinans and A. schaueriana are independent lineages, hybridization between A. schaueriana and A. germinans is a relevant evolutionary process. Our findings also reinforce the role of long-distance dispersal in widespread mangrove species such as A. germinans, for which we observed signs of transatlantic dispersal, a process that has, most likely, contributed to the breadth of the distribution of A. germinans. However, along the southern coast of South America, A. schaueriana is the only representative of the genus. The distribution patterns of A. germinans and A. schaueriana are explained by their different responses to past climate changes and by the unequal historical effectiveness of relative gene flow by propagules and pollen. We observed that A. bicolor, A. germinans and A. schaueriana are three evolutionary lineages that present historical and ongoing hybridization on the American continent. We also inferred new evidence of transatlantic dispersal for A. germinans, which may have contributed to its widespread distribution. Despite the generally wider distribution of A
Deep Learning for Population Genetic Inference.
Directory of Open Access Journals (Sweden)
Sara Sheehan
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Optimal inference with suboptimal models: Addiction and active Bayesian inference
Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl
2015-01-01
When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
Vezér, Martin A
2016-04-01
To study climate change, scientists employ computer models, which approximate target systems with various levels of skill. Given the imperfection of climate models, how do scientists use simulations to generate knowledge about the causes of observed climate change? Addressing a similar question in the context of biological modelling, Levins (1966) proposed an account grounded in robustness analysis. Recent philosophical discussions dispute the confirmatory power of robustness, raising the question of how the results of computer modelling studies contribute to the body of evidence supporting hypotheses about climate change. Expanding on Staley's (2004) distinction between evidential strength and security, and Lloyd's (2015) argument connecting variety-of-evidence inferences and robustness analysis, I address this question with respect to recent challenges to the epistemology of robustness analysis. Applying this epistemology to case studies of climate change, I argue that, despite imperfections in climate models, and epistemic constraints on variety-of-evidence reasoning and robustness analysis, this framework accounts for the strength and security of evidence supporting climatological inferences, including the finding that global warming is occurring and its primary causes are anthropogenic. Copyright © 2016 Elsevier Ltd. All rights reserved.
Ye, Meirong; Liu, Wei; Xue, Qingyun; Hou, Beiwei; Luo, Jing; Ding, Xiaoyu
2017-11-01
The aim of the current study was to elucidate the phylogeographic history of Dendrobium moniliforme, an endangered orchid species, based on two chloroplast DNA (cpDNA) markers (trnC-petN and trnE-trnT). One hundred and thirty-five samples were collected from 18 natural populations of D. moniliforme covering the entire range of the Sino-Japanese Floristic Region (SJFR) of East Asia. A total of 35 distinct cpDNA haplotypes were identified in these populations, of which 23 haplotypes were each present in only one sample and thus restricted to a single population. The significantly larger N_ST value (0.586) than G_ST (0.328) (p < 0.05) demonstrated the presence of strong phylogeographic structure. Phylogenetic analyses indicated that all haplotypes were clustered into two lineages. The genetic diversity of D. moniliforme was high at the species level, reflected in its haplotype diversity (H_d = 0.8862), nucleotide diversity (π = 0.00361), total genetic diversity (H_T = 0.9011), and significant differentiation (Φ_ST = 0.5482). Based on mismatch distribution analysis and neutrality tests, population expansion was evident in all sampled populations and also in all populations sampled in mainland China. Three refuge areas were identified, one each in southwestern China, central-southeastern China, and the CKJ (Taiwan, Japan and Korea) Islands. The results supported the hypothesis that glacial refugia were maintained on different spatial-temporal scales in the SJFR during the last glacial maximum or earlier cold periods, suggesting that Quaternary refugial isolation promoted allopatric speciation of D. moniliforme in East Asia.
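The haplotype diversity reported in this abstract is a standard summary statistic. As a minimal sketch, assuming the usual unbiased estimator attributed to Nei, n/(n-1) * (1 - sum of squared haplotype frequencies), it could be computed as follows. The abstract does not say which estimator or software the authors used, and the function name and the sample labels below are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list with one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (the probability that two
    # draws with replacement carry the same haplotype).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: four individuals, two haplotypes at equal frequency.
sample = ["H1", "H1", "H2", "H2"]
print(round(haplotype_diversity(sample), 4))
```

A value near 1 means almost every sampled individual carries a different haplotype; a value of 0 means all individuals share one haplotype.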
Inference rule and problem solving
Energy Technology Data Exchange (ETDEWEB)
Goto, S
1982-04-01
Intelligent information processing offers the opportunity to have human intellectual activity executed on a computer, with inference, rather than ordinary calculation, used as the basic operational mechanism. Many inference rules are derived from syllogisms in formal logic. The problem of programming this inference function is referred to as problem solving. Although inference and problem solving are logically closely related, the computational ability of current computers is at a low level for inference. To clarify the relation between inference and computers, nonmonotonic logic has been considered. The paper deals with the above topics. 16 references.
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations to the applicability of standard methods of statistical inference, which leads to an under or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
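The point about serial dependency distorting standard errors can be illustrated with a small calculation. The sketch below assumes stationary AR(1) disturbances with lag-one autocorrelation `rho`, which is only one simple form of serial dependency and is not taken from the paper. It computes the ratio between the true variance of a sample mean and the variance that a formula assuming independent observations would report. The function name is hypothetical.

```python
def mean_variance_inflation(rho, n):
    """Var(sample mean under AR(1)) divided by Var(sample mean under independence).

    Uses the exact finite-sample expression
        1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho**k
    for a stationary AR(1) process with lag-one autocorrelation rho.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1.0 + 2.0 * sum((1.0 - k / n) * rho ** k for k in range(1, n))


# Positive autocorrelation: the naive standard error is too small.
print(mean_variance_inflation(0.5, 50))
# Negative autocorrelation: the naive standard error is too large.
print(mean_variance_inflation(-0.5, 50))
```

The square root of this ratio is the factor by which a naive standard error would need to be multiplied. Ratios above 1 correspond to underestimated standard errors, and ratios below 1 to overestimated ones.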
Nagao, Makoto
1990-01-01
Knowledge and Inference discusses an important problem for software systems: How do we treat knowledge and ideas on a computer and how do we use inference to solve problems on a computer? The book talks about the problems of knowledge and inference for the purpose of merging artificial intelligence and library science. The book begins by clarifying the concept of "knowledge" from many points of view, followed by a chapter on the current state of library science and the place of artificial intelligence in library science. Subsequent chapters cover central topics in the artificial intellig
Geometric statistical inference
International Nuclear Information System (INIS)
Periwal, Vipul
1999-01-01
A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined.
Statistical causal inferences and their applications in public health research
Wu, Pan; Chen, Ding-Geng
2016-01-01
This book compiles and presents new developments in statistical causal inference. The accompanying data and computer programs are publicly available so readers may replicate the model development and data analysis presented in each chapter. In this way, methodology is taught so that readers may implement it directly. The book brings together experts engaged in causal inference research to present and discuss recent issues in causal inference methodological development. This is also a timely look at causal inference applied to scenarios that range from clinical trials to mediation and public health research more broadly. In an academic setting, this book will serve as a reference and guide to a course in causal inference at the graduate level (Master's or Doctorate). It is particularly relevant for students pursuing degrees in Statistics, Biostatistics and Computational Biology. Researchers and data analysts in public health and biomedical research will also find this book to be an important reference.
Nolasco-Soto, Janet; González-Astorga, Jorge; Espinosa de Los Monteros, Alejandro; Galante-Patiño, Eduardo; Favila, Mario E
2017-04-01
Canthon cyanellus is a roller dung beetle with a wide distribution range in the tropical forests of the New World. In Mexico, it inhabits the Pacific and the Gulf coasts, the Yucatan Peninsula and the south, mainly in the State of Chiapas. This species shows a wide geographical variation in cuticle color, which has been used as a defining trait for subspecies. In this study we analyzed the phylogeographic and demographic history of the Mexican populations of C. cyanellus using DNA sequences of the nuclear ITS2, and the mitochondrial COI and 16S genes. We found that not all the currently valid subspecies are supported by the molecular analysis. The populations are genetically and geographically structured in five lineages. The diversification events that gave origin to the main lineages within this species complex occurred during the Pleistocene in a time range of 1.63-0.91 Myr. The demographic history of these lineages suggests post-glacial expansions toward the middle and the end of the Pleistocene. The combined data of mitochondrial and nuclear DNA suggest that the phylogeographic structure and demographic history of the C. cyanellus populations are the result of: the geological and volcanic activity that occurred from the end of the Pliocene to the Pleistocene; and the contraction and expansion of tropical forests due to the glacial and inter-glacial cycles during the Pleistocene. Landscape changes derived from historical events have affected the demographic history of the populations of this species. The results presented here point to the need to review the taxonomic status and delimitation of the lineages encompassed in the Canthon cyanellus complex. Copyright © 2017 Elsevier Inc. All rights reserved.
Goal inferences about robot behavior : goal inferences and human response behaviors
Broers, H.A.T.; Ham, J.R.C.; Broeders, R.; De Silva, P.; Okada, M.
2014-01-01
This explorative research focused on the goal inferences human observers draw based on a robot's behavior, and the extent to which those inferences predict people's behavior in response to that robot. Results show that different robot behaviors cause different response behavior from people.
Bayesian Inference for Functional Dynamics Exploring in fMRI Data
Directory of Open Access Journals (Sweden)
Xuan Guo
2016-01-01
This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
Inferring ontology graph structures using OWL reasoning
Rodriguez-Garcia, Miguel Angel
2018-01-05
Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
Inferring ontology graph structures using OWL reasoning.
Rodríguez-García, Miguel Ángel; Hoehndorf, Robert
2018-01-05
Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
Caticha, Ariel
2010-01-01
In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME), includes as special cases both MaxEn...
Behavior Intention Derivation of Android Malware Using Ontology Inference
Directory of Open Access Journals (Sweden)
Jian Jiao
2018-01-01
Previous research on Android malware has mainly focused on malware detection, and the evolution of malware means that detection inevitably lags behind. The information presented by these detection results (malice judgment, family classification, and behavior characterization) is limited for analysts. Therefore, a method is needed to restore the intention of malware, which reflects the relation between multiple behaviors of complex malware and its ultimate purpose. This paper proposes a novel description and derivation model of Android malware intention based on the theory of intention and malware reverse engineering. This approach creates an ontology for malware intention to model the semantic relation between behaviors and their objects, and automates the process of intention derivation by using SWRL rules transformed from the intention model and the Jess inference engine. Experiments on 75 typical samples show that the inference system can perform derivation of malware intention effectively, and 89.3% of the inference results are consistent with manual analysis, which supports the feasibility and effectiveness of our theory and inference system.
Genealogical and evolutionary inference with the human Y chromosome.
Stumpf, M P; Goldstein, D B
2001-03-02
Population genetics has emerged as a powerful tool for unraveling human history. In addition to the study of mitochondrial and autosomal DNA, attention has recently focused on Y-chromosome variation. Ambiguities and inaccuracies in data analysis, however, pose an important obstacle to further development of the field. Here we review the methods available for genealogical inference using Y-chromosome data. Approaches can be divided into those that do and those that do not use an explicit population model in genealogical inference. We describe the strengths and weaknesses of these model-based and model-free approaches, as well as difficulties associated with the mutation process that affect both methods. In the case of genealogical inference using microsatellite loci, we use coalescent simulations to show that relatively simple generalizations of the mutation process can greatly increase the accuracy of genealogical inference. Because model-free and model-based approaches have different biases and limitations, we conclude that there is considerable benefit in the continued use of both types of approaches.
Directory of Open Access Journals (Sweden)
Shawn R Kuchta
Species are a fundamental unit of biodiversity, yet can be challenging to delimit objectively. This is particularly true of species complexes characterized by high levels of population genetic structure, hybridization between genetic groups, isolation by distance, and limited phenotypic variation. Previous work on the Cumberland Plateau Salamander, Plethodon kentucki, suggested that it might constitute a species complex despite occupying a relatively small geographic range. To examine this hypothesis, we sampled 135 individuals from 43 populations, and used four mitochondrial loci and five nuclear loci (5693 base pairs) to quantify phylogeographic structure and probe for cryptic species diversity. Rates of evolution for each locus were inferred using the multidistribute package, and time calibrated gene trees and species trees were inferred using BEAST 2 and *BEAST 2, respectively. Because the parameter space relevant for species delimitation is large and complex, and all methods make simplifying assumptions that may lead them to fail, we conducted an array of analyses. Our assumption was that strongly supported species would be congruent across methods. Putative species were first delimited using a Bayesian implementation of the GMYC model (bGMYC), Geneland, and Brownie. We then validated these species using the genealogical sorting index and BPP. We found substantial phylogeographic diversity using mtDNA, including four divergent clades and an inferred common ancestor at 14.9 myr (95% HPD: 10.8-19.7 myr). By contrast, this diversity was not corroborated by nuclear sequence data, which exhibited low levels of variation and weak phylogeographic structure. Species trees estimated a far younger root than did the mtDNA data, closer to 1.0 myr old. Mutually exclusive putative species were identified by the different approaches. Possible causes of data set discordance, and the problem of species delimitation in complexes with high levels of population
Matte, Eunice M.; Castilho, Camila S.; Miotto, Renata A.; Sana, Denis A.; Johnson, Warren E.; O’Brien, Stephen J.; de Freitas, Thales R. O.; Eizirik, Eduardo
2013-01-01
The puma is an iconic predator that ranges throughout the Americas, occupying diverse habitats. Previous phylogeographic analyses have revealed that it exhibits moderate levels of genetic structure across its range, with few of the classically recognized subspecies being supported as distinct demographic units. Moreover, most of the species’ molecular diversity was found to be in South America. To further investigate the phylogeographic structure and demographic history of pumas we analyzed mtDNA sequences from 186 individuals sampled throughout their range, with emphasis on South America. Our objectives were to refine the phylogeographic assessment within South America and to investigate the demographic history of pumas using a coalescent approach. Our results extend previous phylogeographic findings, reassessing the delimitation of historical population units in South America and demonstrating that this species experienced a considerable demographic expansion in the Holocene, ca. 8,000 years ago. Our analyses indicate that this expansion occurred in South America, prior to the hypothesized re-colonization of North America, which was therefore inferred to be even more recent. The estimated demographic history supports the interpretation that pumas suffered a severe demographic decline in the Late Pleistocene throughout their distribution, followed by population expansion and re-colonization of the range, initiating from South America. PMID:24385863
Molecular evidence for a recent demographic expansion in the puma (Puma concolor) (Mammalia, Felidae)
Directory of Open Access Journals (Sweden)
Eunice M. Matte
2013-01-01
The puma is an iconic predator that ranges throughout the Americas, occupying diverse habitats. Previous phylogeographic analyses have revealed that it exhibits moderate levels of genetic structure across its range, with few of the classically recognized subspecies being supported as distinct demographic units. Moreover, most of the species' molecular diversity was found to be in South America. To further investigate the phylogeographic structure and demographic history of pumas we analyzed mtDNA sequences from 186 individuals sampled throughout their range, with emphasis on South America. Our objectives were to refine the phylogeographic assessment within South America and to investigate the demographic history of pumas using a coalescent approach. Our results extend previous phylogeographic findings, reassessing the delimitation of historical population units in South America and demonstrating that this species experienced a considerable demographic expansion in the Holocene, ca. 8,000 years ago. Our analyses indicate that this expansion occurred in South America, prior to the hypothesized re-colonization of North America, which was therefore inferred to be even more recent. The estimated demographic history supports the interpretation that pumas suffered a severe demographic decline in the Late Pleistocene throughout their distribution, followed by population expansion and re-colonization of the range, initiating from South America.
Learning Convex Inference of Marginals
Domke, Justin
2012-01-01
Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main ...
Working with sample data exploration and inference
Chaffe-Stengel, Priscilla
2014-01-01
Managers and analysts routinely collect and examine key performance measures to better understand their operations and make good decisions. Being able to render the complexity of operations data into a coherent account of significant events requires an understanding of how to work well with raw data and to make appropriate inferences. Although some statistical techniques for analyzing data and making inferences are sophisticated and require specialized expertise, there are methods that are understandable and applicable by anyone with basic algebra skills and the support of a spreadsheet package. By applying these fundamental methods themselves rather than turning over both the data and the responsibility for analysis and interpretation to an expert, managers will develop a richer understanding and potentially gain better control over their environment. This text is intended to describe these fundamental statistical techniques to managers, data analysts, and students. Statistical analysis of sample data is enh...
Tracing the HIV-1 subtype B mobility in Europe: a phylogeographic approach
Energy Technology Data Exchange (ETDEWEB)
Leitner, Thomas [Los Alamos National Laboratory; Paraskevis, D [KATHOLIEKE UNIV; Pybus, O [UNIV OF OXFORD; Magiorkinis, G [KATHOLIEKE UNIV; Hatzakis, A [KATHOLIEKE UNIV
2008-01-01
The prevalence and the origin of HIV-1 subtype B, the most prevalent circulating clade among the long-term residents in Europe, have been studied extensively. However the spatial diffusion of the epidemic from the perspective of the virus has not previously been traced. In the current study we inferred the migration history of HIV-1 subtype B by way of a phylogeography of viral sequences sampled from 16 European countries and Israel. Migration events were inferred from viral phylogenies by character reconstruction using parsimony. With regard to the spatial dispersal of the HIV subtype B sequences across viral phylogenies, in most of the countries in Europe the epidemic was introduced by multiple sources and subsequently spread within local networks. Poland provides an exception where most of the infections were the result of a single point introduction. According to the significant migratory pathways, we show that there are considerable differences across Europe. Specifically, Greece, Portugal, Serbia and Spain, provide sources shedding HIV-1; Austria, Belgium and Luxembourg, on the other hand, are migratory targets, while for Denmark, Germany, Italy, Israel, Norway, the Netherlands, Sweden, Switzerland and the UK we inferred significant bidirectional migration. For Poland no significant migratory pathways were inferred. Subtype B phylogeographies provide a new insight about the geographical distribution of viral lineages, as well as the significant pathways of virus dispersal across Europe, suggesting that intervention strategies should also address tourists, travellers and migrants.
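Character reconstruction by parsimony, as mentioned in this abstract, can be sketched with the bottom-up pass of Fitch's algorithm on a rooted binary tree whose tips are labelled by sampling country. This is an illustrative assumption: the abstract does not name the parsimony variant or software used, and the tree encoding, function name, and country codes below are hypothetical.

```python
def fitch(tree):
    """Bottom-up Fitch pass on a rooted binary tree.

    A leaf is a string (the sampling location of a sequence); an internal
    node is a 2-tuple of subtrees. Returns (candidate_states, n_changes),
    where n_changes is the minimum number of location changes needed to
    explain the tip labels.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one location: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: take the union and count one location change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical viral phylogeny with tips labelled by country code.
tree = ((("GR", "GR"), "AT"), ("PL", "PL"))
states, changes = fitch(tree)
print(sorted(states), changes)
```

The change count is only a minimum number of inferred location changes. Assigning a direction to each change (source versus target country) would require an additional top-down pass that fixes a state at every internal node, which is omitted here.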
Tracing the HIV-1 subtype B mobility in Europe: a phylogeographic approach
Directory of Open Access Journals (Sweden)
Perrin Luc
2009-05-01
Background: The prevalence and the origin of HIV-1 subtype B, the most prevalent circulating clade among the long-term residents in Europe, have been studied extensively. However the spatial diffusion of the epidemic from the perspective of the virus has not previously been traced. Results: In the current study we inferred the migration history of HIV-1 subtype B by way of a phylogeography of viral sequences sampled from 16 European countries and Israel. Migration events were inferred from viral phylogenies by character reconstruction using parsimony. With regard to the spatial dispersal of the HIV subtype B sequences across viral phylogenies, in most of the countries in Europe the epidemic was introduced by multiple sources and subsequently spread within local networks. Poland provides an exception where most of the infections were the result of a single point introduction. According to the significant migratory pathways, we show that there are considerable differences across Europe. Specifically, Greece, Portugal, Serbia and Spain, provide sources shedding HIV-1; Austria, Belgium and Luxembourg, on the other hand, are migratory targets, while for Denmark, Germany, Italy, Israel, Norway, the Netherlands, Sweden, Switzerland and the UK we inferred significant bidirectional migration. For Poland no significant migratory pathways were inferred. Conclusion: Subtype B phylogeographies provide a new insight about the geographical distribution of viral lineages, as well as the significant pathways of virus dispersal across Europe, suggesting that intervention strategies should also address tourists, travellers and migrants.
Computational methods for analysis and inference of kinase/inhibitor relationships
Directory of Open Access Journals (Sweden)
Fabrizio Ferrè
2014-06-01
The central role of kinases in virtually all signal transduction networks is the driving motivation for the development of compounds modulating their activity. ATP-mimetic inhibitors are essential tools for elucidating signaling pathways and are emerging as promising therapeutic agents. However, off-target ligand binding and complex and sometimes unexpected kinase/inhibitor relationships can occur for seemingly unrelated kinases, stressing that computational approaches are needed for learning the interaction determinants and for the inference of the effect of small compounds on a given kinase. Recently published high-throughput profiling studies assessed the effects of thousands of small compound inhibitors, covering a substantial portion of the kinome. This wealth of data paved the way for computational resources and methods that can offer a major contribution in understanding the reasons for the inhibition, helping in the rational design of more specific molecules, in the in silico prediction of inhibition for those neglected kinases for which no systematic analysis has been carried out yet, in the selection of novel inhibitors with desired selectivity, and offering novel avenues of personalized therapies.
Nonparametric Bayesian inference in biostatistics
Müller, Peter
2015-01-01
As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...
Probabilistic inductive inference: a survey
Ambainis, Andris
2001-01-01
Inductive inference is a recursion-theoretic theory of learning, first developed by E. M. Gold (1967). This paper surveys developments in probabilistic inductive inference. We mainly focus on finite inference of recursive functions, since this simple paradigm has produced the most interesting (and most complex) results.
LAIT: a local ancestry inference toolkit.
Hui, Daniel; Fang, Zhou; Lin, Jerome; Duan, Qing; Li, Yun; Hu, Ming; Chen, Wei
2017-09-06
Inferring local ancestry in individuals of mixed ancestry has many applications, most notably in identifying disease-susceptible loci that vary among different ethnic groups. Many software packages are available for inferring local ancestry in admixed individuals. However, most of these existing software packages require specific formatted input files and generate output files in various types, yielding practical inconvenience. We developed a tool set, Local Ancestry Inference Toolkit (LAIT), which can convert standardized files into software-specific input file formats as well as standardize and summarize inference results for four popular local ancestry inference software: HAPMIX, LAMP, LAMP-LD, and ELAI. We tested LAIT using both simulated and real data sets and demonstrated that LAIT provides convenience to run multiple local ancestry inference software. In addition, we evaluated the performance of local ancestry software among different supported software packages, mainly focusing on inference accuracy and computational resources used. We provided a toolkit to facilitate the use of local ancestry inference software, especially for users with limited bioinformatics background.
Bayesian statistical inference
Directory of Open Access Journals (Sweden)
Bruno De Finetti
2017-04-01
Full Text Available This work was translated into English and published in the volume: Bruno De Finetti, Induction and Probability, Biblioteca di Statistica, eds. P. Monari, D. Cocchi, Clueb, Bologna, 1993. Bayesian Statistical Inference is one of the last fundamental philosophical papers in which we can find the essentials of De Finetti's approach to statistical inference.
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-03-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data-space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper we use massive asymptotically-optimal data compression to reduce the dimensionality of the data-space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parameterized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate Density Estimation Likelihood-Free Inference with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological datasets.
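For orientation, the traditional Approximate Bayesian Computation baseline that this abstract contrasts with DELFI can be sketched as a rejection sampler: draw a parameter from the prior, simulate mock data, compress it to a summary statistic, and keep the draw if the summary lands close to the observed one. This is not the DELFI method itself; the Gaussian toy model, the uniform prior, the tolerance, and all function names are assumptions chosen only for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, summary,
                  tolerance, n_draws, seed=0):
    """Rejection ABC with a single scalar summary statistic."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                 # draw from the prior
        s_sim = summary(simulate(theta, rng))     # forward-simulate, compress
        if abs(s_sim - s_obs) <= tolerance:       # compare in summary space
            accepted.append(theta)
    return accepted


# Toy problem (assumed): infer the mean of a unit-variance Gaussian.
_data_rng = random.Random(1)
observed = [_data_rng.gauss(2.0, 1.0) for _ in range(50)]
posterior = abc_rejection(
    observed,
    simulate=lambda mu, rng: [rng.gauss(mu, 1.0) for _ in range(50)],
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0),
    summary=statistics.fmean,   # one summary number for the one parameter
    tolerance=0.1,
    n_draws=5000,
)
```

Most draws are rejected in this scheme, which is the inefficiency that density-estimation approaches aim to reduce.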
Weng, Yi-Ming; Yang, Man-Miao; Yeh, Wen-Bin
2016-04-01
Taiwan, an island with three major mountain ranges, provides an ideal topography to study mountain-island effect on organisms that would be diversified in the isolation areas. Glaciations, however, might drive these organisms to lower elevations, causing gene flow among previously isolated populations. Two hypotheses have been proposed to depict the possible refugia for alpine organisms during glaciations. Nunatak hypothesis suggests that alpine species might have stayed in situ in high mountain areas during glaciations. Massif de refuge, on the other hand, proposes that alpine species might have migrated to lower ice-free areas. By sampling five sympatric carabid species of Nebria and Leistus, and using two mitochondrial genes and two nuclear genes, we evaluated the mountain-island effect on alpine carabids and tested the two proposed hypotheses with comparative phylogeographic method. Results from the phylogenetic relationships, network analysis, lineage calibration, and genetic structure indicate that the deep divergence among populations in all L. smetanai, N. formosana, and N. niitakana was subjected to long-term isolation, a phenomenon in agreement with the nunatak hypothesis. However, genetic admixture among populations of N. uenoiana and some populations of L. nokoensis complex suggests that gene flow occurred during glaciations, as a massif de refuge depicts. The speciation event in N. niitakana is estimated to have occurred before 1.89 million years ago (Mya), while differentiation among isolated populations in N. niitakana, N. formosana, L. smetanai, and L. nokoensis complex might have taken place during 0.65-1.65 Mya. While each of the alpine carabids arriving in Taiwan during different glaciation events acquired its evolutionary history, all of them had confronted the existing mountain ranges.
International Nuclear Information System (INIS)
Na, Man Gyun; Oh, Seungrohk
2002-01-01
A neuro-fuzzy inference system combined with the wavelet denoising, principal component analysis (PCA), and sequential probability ratio test (SPRT) methods has been developed to monitor the relevant sensor using the information of other sensors. The parameters of the neuro-fuzzy inference system that estimates the relevant sensor signal are optimized by a genetic algorithm and a least-squares algorithm. The wavelet denoising technique was applied to remove noise components in input signals into the neuro-fuzzy system. By reducing the dimension of an input space into the neuro-fuzzy system without losing a significant amount of information, the PCA was used to reduce the time necessary to train the neuro-fuzzy system, simplify the structure of the neuro-fuzzy inference system, and also, make easy the selection of the input signals into the neuro-fuzzy system. By using the residual signals between the estimated signals and the measured signals, the SPRT is applied to detect whether the sensors are degraded or not. The proposed sensor-monitoring algorithm was verified through applications to the pressurizer water level, the pressurizer pressure, and the hot-leg temperature sensors in pressurized water reactors
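The final detection step described here can be sketched with Wald's sequential probability ratio test applied to residuals. The sketch below assumes Gaussian residuals with known standard deviation and tests a zero-mean "normal" hypothesis against a fixed positive mean shift; the function name, the shift size, and the error rates are illustrative assumptions, not the settings used in the report.

```python
import math


def sprt_mean_shift(residuals, sigma, shift, alpha=0.01, beta=0.01):
    """Wald SPRT: H0 residual mean = 0 versus H1 residual mean = shift.

    alpha is the false-alarm probability, beta the missed-alarm probability.
    Returns (decision, index of the last sample used).
    """
    upper = math.log((1.0 - beta) / alpha)   # crossing it accepts H1
    lower = math.log(beta / (1.0 - alpha))   # crossing it accepts H0
    llr = 0.0
    for i, r in enumerate(residuals):
        # Log-likelihood ratio increment for N(shift, sigma^2) vs N(0, sigma^2).
        llr += (shift / sigma ** 2) * (r - shift / 2.0)
        if llr >= upper:
            return "degraded", i
        if llr <= lower:
            return "normal", i
    return "undecided", len(residuals) - 1


# Residual = measured signal minus estimated signal (made-up values).
healthy = sprt_mean_shift([0.0] * 20, sigma=1.0, shift=1.0)
drifting = sprt_mean_shift([1.0] * 20, sigma=1.0, shift=1.0)
```

A practical monitor would typically restart the test after each decision and run a second test for negative shifts.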
Sly, Nicholas D; Townsend, Andrea K; Rimmer, Christopher C; Townsend, Jason M; Latta, Steven C; Lovette, Irby J
2011-12-01
With its large size, complex topography and high number of avian endemics, Hispaniola appears to be a likely candidate for the in situ speciation of its avifauna, despite the worldwide rarity of avian speciation within single islands. We used multilocus comparative phylogeography techniques to examine the pattern and history of divergence in 11 endemic birds representing potential within-island speciation events. Haplotype and allele networks from mitochondrial ND2 and nuclear intron loci reveal a consistent pattern: phylogeographic divergence within or between closely related species is correlated with the likely distribution of ancient sea barriers that once divided Hispaniola into several smaller paleo-islands. Coalescent and mitochondrial clock dating of divergences indicate species-specific response to different geological events over the wide span of the island's history. We found no evidence that ecological or topographical complexity generated diversity, either by creating open niches or by restricting long-term gene flow. Thus, no true within-island speciation appears to have occurred among the species sampled on Hispaniola. Divergence events predating the merging of Hispaniola's paleo-island blocks cannot be considered in situ divergence, and postmerging divergence in response to episodic island segmentation by marine flooding probably represents in situ vicariance or interarchipelago speciation by dispersal. Our work highlights the necessity of considering island geologic history while investigating the speciation-area relationship in birds and other taxa. © 2011 Blackwell Publishing Ltd.
Directory of Open Access Journals (Sweden)
Noor Adelyna Mohammed Akib
Full Text Available Phylogeographic patterns and population structure of the pelagic Indian mackerel, Rastrelliger kanagurta, were examined in 23 populations collected from the Indonesian-Malaysian Archipelago (IMA) and the West Indian Ocean (WIO). Despite the vast expanse of the IMA and neighbouring seas, no geographical structure was evident, an indication that R. kanagurta populations across this region are essentially panmictic. This study also revealed that historical isolation was insufficient for R. kanagurta to attain migration-drift equilibrium. Two distinct subpopulations were detected between the WIO and the IMA (and adjacent populations); interpopulation genetic variation was high. A plausible explanation for the genetic differentiation observed between the IMA and WIO regions is historical isolation as a result of fluctuations in sea levels during the late Pleistocene. This isolation resulted in the evolution of a phylogeographic break for this species to the north of the Andaman Sea.
Malle, Bertram F; Holbrook, Jess
2012-04-01
People interpret behavior by making inferences about agents' intentionality, mind, and personality. Past research studied such inferences 1 at a time; in real life, people make these inferences simultaneously. The present studies therefore examined whether 4 major inferences (intentionality, desire, belief, and personality), elicited simultaneously in response to an observed behavior, might be ordered in a hierarchy of likelihood and speed. To achieve generalizability, the studies included a wide range of stimulus behaviors, presented them verbally and as dynamic videos, and assessed inferences both in a retrieval paradigm (measuring the likelihood and speed of accessing inferences immediately after they were made) and in an online processing paradigm (measuring the speed of forming inferences during behavior observation). Five studies provide evidence for a hierarchy of social inferences-from intentionality and desire to belief to personality-that is stable across verbal and visual presentations and that parallels the order found in developmental and primate research. (c) 2012 APA, all rights reserved.
Directory of Open Access Journals (Sweden)
Yuchen Yang
2016-10-01
Full Text Available Glacial vicariance is thought to influence population dynamics and speciation of many marine organisms. Mangroves, a plant group inhabiting intertidal zones, were also profoundly influenced by Pleistocene glaciations. In this study, we investigated phylogeographic patterns of a widespread mangrove species Sonneratia caseolaris and a narrowly distributed, closely related species S. lanceolata to infer their divergence histories and relate them to historical geological events. We sequenced two chloroplast fragments and five nuclear genes for one population of S. lanceolata and 12 populations of S. caseolaris across the Indo-West Pacific (IWP) region to evaluate genetic differentiation and divergence time among them. Phylogenetic analysis based on sequences of nuclear ribosomal internal transcribed spacer (nrITS) and a nuclear gene rpl9 for all Sonneratia species indicates that S. lanceolata individuals are nested within S. caseolaris. We found strong genetic structure among geographic regions (South China Sea, the Indian Ocean and eastern Australia) inhabited by S. caseolaris. We estimated that divergence between the Indo-Malesia and Australasia populations occurred 4.035 million years ago (MYA), prior to the onset of the Pleistocene. BARRIERS analysis suggested that complex geographic features in the IWP region had largely shaped the phylogeographic patterns of S. caseolaris. Furthermore, haplotype analyses provided convincing evidence for secondary contact of the South China Sea (SCS) and the Indian Ocean lineages at the Indo-Pacific boundary. Demographic history inference under the isolation and migration (IM) model detected substantial gene flow from the Sri Lanka populations to the populations on Java Island. Moreover, multi-locus sequence analysis indicated that S. lanceolata was most closely related to the Indian Ocean populations of S. caseolaris and the divergence time between them was 2.057 MYA, coinciding with the onset of the Pleistocene.
2018-02-15
expressed a variety of inference techniques on discrete and continuous distributions: exact inference, importance sampling, Metropolis-Hastings (MH...without redoing any math or rewriting any code. And although our main goal is composable reuse, our performance is also good because we can use...control paths. • The Hakaru language can express mixtures of discrete and continuous distributions, but the current disintegration transformation
Bailer-Jones, Coryn A. L.
2017-04-01
Preface; 1. Probability basics; 2. Estimation and uncertainty; 3. Statistical models and inference; 4. Linear models, least squares, and maximum likelihood; 5. Parameter estimation: single parameter; 6. Parameter estimation: multiple parameters; 7. Approximating distributions; 8. Monte Carlo methods for inference; 9. Parameter estimation: Markov chain Monte Carlo; 10. Frequentist hypothesis testing; 11. Model comparison; 12. Dealing with more complicated problems; References; Index.
Directory of Open Access Journals (Sweden)
Amaël Borzée
2017-11-01
Full Text Available The effects of ice ages on speciation have been well documented for many European and North American taxa. In contrast, very few studies have addressed the consequences of such environmental and topographical changes in North East Asian species. More precisely, the Korean Peninsula offers a unique model to assess patterns and processes of speciation as it hosts the northern- and eastern-most distribution limit of some widespread Asian taxa. Despite this, studies addressing phylogeographic patterns and population genetics in the peninsula and surrounding countries are few and studies for most families are lacking. Here we inferred the phylogenetic relationships of the common toad (Bufo gargarizans) from South Korea and their North East Asian counterpart populations, based on mitochondrial data. Korean B. gargarizans GenBank BLASTs matched few individuals from nearby China, but the presence of a Korean clade suggests isolation on the Korean Peninsula, previous to the last glacial maximum, linked to sea level resurgence. Molecular clock calibrations within this group were used to date the divergence between clades and their relationship to paleo-climatic events in the area. Lack of genetic structure among South Korean populations and strong homogeneity between the Korean and some Chinese localities suggest weak isolation and recent expansion. Geographical projection of continuous coalescent maximum-clade-credibility trees shows an original Chinese expansion towards the Korean Peninsula through the Yellow Sea circa two million years ago with colonisation events dating circa 800 thousand years ago (K. y. a.). Following this colonisation, the data point to outgoing Korean Peninsula dispersal events throughout different periods, towards the North through land, and West through land bridge formations over the Yellow Sea during sea level falls. In accordance, demographic analyses revealed a population expansion in the Korean Peninsula circa 300 K. y. a.
Comparison of Urban Human Movements Inferring from Multi-Source Spatial-Temporal Data
Cao, Rui; Tu, Wei; Cao, Jinzhou; Li, Qingquan
2016-06-01
The quantification of human movements is very hard because of the sparsity of traditional data and the labour-intensive nature of the data collection process. Recently, abundant spatial-temporal data have given us an opportunity to observe human movement. This research investigates the relationship between city-wide human movements inferred from two types of spatial-temporal data at the traffic analysis zone (TAZ) level. The first type of human movement is inferred from long-term smart card transaction data recording boarding actions. The second type of human movement is extracted from citywide time-sequenced mobile phone data with a 30-minute interval. Travel volume, travel distance and travel time are used to measure aggregated human movements in the city. To further examine the relationship between the two types of inferred movements, a linear correlation analysis is conducted on the hourly travel volume. The obtained results show that human movements inferred from smart card data and mobile phone data have a correlation of 0.635. However, there are still some non-negligible differences in some special areas. This research not only reveals the citywide spatial-temporal human dynamics but also benefits the understanding of the reliability of inferring human movements from big spatial-temporal data.
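The linear correlation analysis on hourly travel volume amounts to computing a Pearson coefficient between two equally long series, one per data source. The sketch below shows the calculation; the two short volume lists are made-up placeholders and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hourly travel volumes for one zone (placeholder numbers).
smart_card_volume = [120, 340, 560, 410, 300, 280]
mobile_phone_volume = [150, 390, 600, 480, 310, 330]
r = pearson_r(smart_card_volume, mobile_phone_volume)
```

A coefficient close to 1 would mean the two sources rise and fall together hour by hour, while it says nothing about whether their absolute volumes agree.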
Inference of Transcriptional Network for Pluripotency in Mouse Embryonic Stem Cells
International Nuclear Information System (INIS)
Aburatani, S
2015-01-01
In embryonic stem cells, various transcription factors (TFs) maintain pluripotency. To gain insights into the regulatory system controlling pluripotency, I inferred the regulatory relationships between the TFs expressed in ES cells. In this study, I applied a method based on structural equation modeling (SEM), combined with factor analysis, to 649 expression profiles of 19 TF genes measured in mouse Embryonic Stem Cells (ESCs). The factor analysis identified 19 TF genes that were regulated by several unmeasured factors. Since the known cell reprogramming TF genes (Pou5f1, Sox2 and Nanog) are regulated by different factors, each estimated factor is considered to be an input for signal transduction to control pluripotency in mouse ESCs. In the inferred network model, TF proteins were also arranged as unmeasured factors that control other TFs. The interpretation of the inferred network model revealed the regulatory mechanism for controlling pluripotency in ES cells
International Nuclear Information System (INIS)
George, J.S.; Schmidt, D.M.; Wood, C.C.
1999-01-01
We have developed a Bayesian approach to the analysis of neural electromagnetic (MEG/EEG) data that can incorporate or fuse information from other imaging modalities and addresses the ill-posed inverse problem by sampling the many different solutions which could have produced the given data. From these samples one can draw probabilistic inferences about regions of activation. Our source model assumes a variable number of variable size cortical regions of stimulus-correlated activity. An active region consists of locations on the cortical surface, within a sphere centered on some location in cortex. The number and radii of active regions can vary to defined maximum values. The goal of the analysis is to determine the posterior probability distribution for the set of parameters that govern the number, location, and extent of active regions. Markov Chain Monte Carlo is used to generate a large sample of sets of parameters distributed according to the posterior distribution. This sample is representative of the many different source distributions that could account for given data, and allows identification of probable (i.e. consistent) features across solutions. Examples of the use of this analysis technique with both simulated and empirical MEG data are presented.
Deli, Temim; Kalkan, Evrim; Karhan, Selahattin Ünsal; Uzunova, Sonya; Keikhosravi, Alireza; Bilgin, Raşit; Schubart, Christoph D
2018-04-11
Recently, population genetic studies of Mediterranean marine species highlighted patterns of genetic divergence and phylogeographic breaks, due to the interplay between impacts of Pleistocene climate shifts and contemporary hydrographical barriers. These factors markedly shaped the distribution of marine organisms and their genetic makeup. The present study is part of an ongoing effort to understand the phylogeography and evolutionary history of the highly dispersive Mediterranean green crab, Carcinus aestuarii (Nardo, 1847), across the Mediterranean Sea. Recently, marked divergence between two highly separated haplogroups (genetic types I and II) of C. aestuarii was discerned across the Siculo-Tunisian Strait, suggesting an Early Pleistocene vicariant event. In order to better identify phylogeographic patterns in this species, a total of 263 individuals from 22 Mediterranean locations were analysed by comparing a 587 basepair region of the mitochondrial gene Cox1 (cytochrome oxidase subunit 1). The examined dataset is composed of both newly generated sequences (76) and previously investigated ones (187). Our results unveiled the occurrence of a highly divergent haplogroup (genetic type III) in the most north-eastern part of the Mediterranean Sea. Divergence between the most distinct type III and the common ancestor of both types I and II corresponds to the Early Pleistocene and coincides with the historical episode of separation between types I and II. Our results also revealed strong genetic divergence among adjacent regions (separating the Aegean and Marmara seas from the remaining distribution zone) and confirmed a sharp phylogeographic break across the Eastern Mediterranean. The recorded parapatric genetic divergence, with the potential existence of a contact zone between both groups in the Ionian Sea and notable differences in the demographic history, suggest the likely impact of paleoclimatic events, as well as past and contemporary oceanographic processes
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
Energy Technology Data Exchange (ETDEWEB)
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.
Logical inference and evaluation
International Nuclear Information System (INIS)
Perey, F.G.
1981-01-01
Most methodologies of evaluation currently used are based upon the theory of statistical inference. It is generally perceived that this theory is not capable of dealing satisfactorily with what are called systematic errors. Theories of logical inference should be capable of treating all of the information available, including that not involving frequency data. A theory of logical inference is presented as an extension of deductive logic via the concept of plausibility and the application of group theory. Some conclusions, based upon the application of this theory to evaluation of data, are also given
Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models.
Directory of Open Access Journals (Sweden)
Richard R Stein
2015-07-01
Full Text Available Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.
Models for probability and statistical inference theory and applications
Stapleton, James H
2007-01-01
This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readers. Models for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping. Ideal as a textbook for a two-semester sequence on probability and statistical inference, early chapters provide coverage on probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...
DEFF Research Database (Denmark)
Møller, Jesper
(This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives', edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by Clarendon Press, Oxford, and planned to appear as Section 4.1 with the title ‘Inference'.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...
Guo, Guo-Ye; Chen, Fang; Shi, Xiao-Dong; Tian, Yin-Shuai; Yu, Mao-Qun; Han, Xue-Qin; Yuan, Li-Chun; Zhang, Ying
2016-01-01
Genetic variation and phylogenetic relationships among 102 Jatropha curcas accessions from Asia, Africa, and the Americas were assessed using the internal transcribed spacer region of nuclear ribosomal DNA (nrDNA ITS). The average G+C content (65.04%) was considerably higher than the A+T (34.96%) content. The estimated genetic diversity revealed moderate genetic variation. The pairwise genetic divergences (GD) between haplotypes were evaluated and ranged from 0.000 to 0.017, suggesting a higher level of genetic differentiation in Mexican accessions than those of other regions. Phylogenetic relationships and intraspecific divergence were inferred by Bayesian inference (BI), maximum parsimony (MP), and median joining (MJ) network analysis and were generally resolved. The J. curcas accessions were consistently divided into three lineages, groups A, B, and C, which demonstrated distant geographical isolation and genetic divergence between American accessions and those from other regions. The MJ network analysis confirmed that Central America was the possible center of origin. The putative migration route suggested that J. curcas was distributed from Mexico or Brazil, via Cape Verde and then split into two routes. One route was dispersed to Spain, then migrated to China, eventually spreading to southeastern Asia, while the other route was dispersed to Africa, via Madagascar and migrated to China, later spreading to southeastern Asia. Copyright © 2016 Académie des sciences. Published by Elsevier SAS. All rights reserved.
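The G+C content figure quoted in this abstract is a simple base-composition percentage. A minimal sketch of how such a value can be computed for one sequence follows; the function name and the example strings are hypothetical, and skipping ambiguous bases and alignment gaps is an assumption about how such characters would be handled.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases of a DNA sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]  # skip N, gaps, etc.
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Made-up fragment, not a real ITS sequence.
example_gc = gc_content("GGCCAT")
```

Because only A, C, G and T are counted, the complementary A+T percentage is always 100 minus this value, as in the 65.04% and 34.96% pair reported above.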
Royle, J. Andrew; Dorazio, Robert M.
2008-01-01
A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including * occurrence or occupancy models for estimating species distribution * abundance models based on many sampling protocols, including distance sampling * capture-recapture models with individual effects * spatial capture-recapture models based on camera trapping and related methods * population and metapopulation dynamic models * models of biodiversity, community structure and dynamics.
Directory of Open Access Journals (Sweden)
Takebayashi Naoki
2007-07-01
Background: Although testing for simultaneous divergence (vicariance) across different population-pairs that span the same barrier to gene flow is of central importance to evolutionary biology, researchers often equate the gene tree and population/species tree, thereby ignoring stochastic coalescent variance in their conclusions of temporal incongruence. In contrast to other available phylogeographic software packages, msBayes is the only one that analyses data from multiple species/population pairs under a hierarchical model. Results: msBayes employs approximate Bayesian computation (ABC) under a hierarchical coalescent model to test for simultaneous divergence (TSD) in multiple co-distributed population-pairs. Simultaneous isolation is tested by estimating three hyper-parameters that characterize the degree of variability in divergence times across co-distributed population pairs while allowing for variation in various within population-pair demographic parameters (sub-parameters) that can affect the coalescent. msBayes is a software package consisting of several C and R programs that are run with a Perl "front-end". Conclusion: The method reasonably distinguishes simultaneous isolation from temporal incongruence in the divergence of co-distributed population pairs, even with sparse sampling of individuals. Because the estimate step is decoupled from the simulation step, one can rapidly evaluate different ABC acceptance/rejection conditions and the choice of summary statistics. Given the complex and idiosyncratic nature of testing multi-species biogeographic hypotheses, we envision msBayes as a powerful and flexible tool for tackling a wide array of difficult research questions that use population genetic data from multiple co-distributed species. The msBayes pipeline is available for download at http://msbayes.sourceforge.net/ under an open source license (GNU Public License). The msBayes pipeline is comprised of several C and R programs that
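The decoupling of simulation from estimation mentioned in this abstract can be illustrated with a minimal rejection-ABC sketch. This is not msBayes code: the toy simulator, the uniform prior, the single summary statistic, and every name below (simulate_summary, build_reference_table, accept) are assumptions made only for illustration. The point is that the reference table is simulated once and then reused under different acceptance tolerances.

```python
import math
import random


def simulate_summary(tau, rng, n_sites=200):
    """Toy simulator (an assumption, not a coalescent model): the fraction of
    sites that differ between two populations separated for time tau, where
    each site differs independently with probability 1 - exp(-tau)."""
    p_diff = 1.0 - math.exp(-tau)
    return sum(rng.random() < p_diff for _ in range(n_sites)) / n_sites


def build_reference_table(n_draws, prior_max=2.0, seed=1):
    """Simulation step: draw tau from a uniform prior, simulate a summary
    statistic for each draw, and store the (tau, summary) pairs."""
    rng = random.Random(seed)
    table = []
    for _ in range(n_draws):
        tau = rng.uniform(0.0, prior_max)
        table.append((tau, simulate_summary(tau, rng)))
    return table


def accept(table, observed, tolerance):
    """Estimation step: keep the prior draws whose simulated summary lies
    within `tolerance` of the observed summary."""
    return [tau for tau, summary in table if abs(summary - observed) <= tolerance]


table = build_reference_table(5000)   # simulated once
observed = 0.40                       # hypothetical observed summary statistic
for tolerance in (0.05, 0.01):        # re-estimated cheaply for each tolerance
    posterior = accept(table, observed, tolerance)
    mean_tau = sum(posterior) / len(posterior)
    print(f"tolerance={tolerance}: {len(posterior)} accepted, mean tau={mean_tau:.3f}")
```

Because the table is stored, trying a stricter tolerance or a different summary statistic requires no new simulations, which is the practical benefit the abstract describes.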
Lower complexity bounds for lifted inference
DEFF Research Database (Denmark)
Jaeger, Manfred
2015-01-01
instances of the model. Numerous approaches for such “lifted inference” techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show...... the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established earlier in Jaeger (2000; Jaeger, M. 2000. On the complexity of inference about...... that under the assumption that NETIME≠ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier-, and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference and for knowledge bases not containing...
A Cautionary Analysis of STAPLE Using Direct Inference of Segmentation Truth
DEFF Research Database (Denmark)
Van Leemput, Koen; Sabuncu, Mert R.
2014-01-01
In this paper we analyze the properties of the well-known segmentation fusion algorithm STAPLE, using a novel inference technique that analytically marginalizes out all model parameters. We demonstrate both theoretically and empirically that when the number of raters is large, or when consensus r...
Variations on Bayesian Prediction and Inference
2016-05-09
inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle
R Package multiPIM: A Causal Inference Approach to Variable Importance Analysis
Directory of Open Access Journals (Sweden)
Stephan J Ritter
2014-04-01
We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including super learner, a meta-learner which combines several different algorithms into one. We describe a simulation in which the double robust TMLE is compared to the graphical computation estimator. We also provide example analyses using two data sets which are included with the package.
Cheng, Feon W; Gao, Xiang; Bao, Le; Mitchell, Diane C; Wood, Craig; Sliwinski, Martin J; Smiciklas-Wright, Helen; Still, Christopher D; Rolston, David D K; Jensen, Gordon L
2017-07-01
To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. The conditional inference tree analysis, a data mining approach, was used to construct a risk stratification algorithm for developing functional limitation based on BMI and other potential risk factors for disability in 1,951 older adults without functional limitations at baseline (baseline age 73.1 ± 4.2 y). We also analyzed the data with multivariate stepwise logistic regression and compared the two approaches (e.g., cross-validation). Over a mean of 9.2 ± 1.7 years of follow-up, 221 individuals developed functional limitation. Higher BMI, age, and comorbidity were consistently identified as significant risk factors for functional decline using both methods. Based on these factors, individuals were stratified into four risk groups via the conditional inference tree analysis. Compared to the low-risk group, all other groups had a significantly higher risk of developing functional limitation. The odds ratio comparing two extreme categories was 9.09 (95% confidence interval: 4.68, 17.6). Higher BMI, age, and comorbid disease were consistently identified as significant risk factors for functional decline among older individuals across all approaches and analyses. © 2017 The Obesity Society.
Bayesian inference in probabilistic risk assessment-The current state of the art
International Nuclear Information System (INIS)
Kelly, Dana L.; Smith, Curtis L.
2009-01-01
Markov chain Monte Carlo (MCMC) approaches to sampling directly from the joint posterior distribution of aleatory model parameters have led to tremendous advances in Bayesian inference capability in a wide variety of fields, including probabilistic risk analysis. The advent of freely available software coupled with inexpensive computing power has catalyzed this advance. This paper examines where the risk assessment community is with respect to implementing modern computational-based Bayesian approaches to inference. Through a series of examples in different topical areas, it introduces salient concepts and illustrates the practical application of Bayesian inference via MCMC sampling to a variety of important problems.
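As a rough illustration of the kind of MCMC sampling this abstract refers to, the sketch below runs a random-walk Metropolis sampler for a Poisson event rate with a gamma prior. The data (3 events in 10 units of exposure), the Gamma(1, 1) prior, the step size, and the function names are all hypothetical choices, not taken from the paper. The conjugate posterior, Gamma(4, 11) with mean 4/11, is available in closed form here, so it serves as a check on the sampler.

```python
import math
import random


def log_posterior(rate, events, exposure, shape, prior_rate):
    """Unnormalised log posterior: Poisson likelihood times gamma prior."""
    if rate <= 0.0:
        return float("-inf")
    return (events + shape - 1.0) * math.log(rate) - (exposure + prior_rate) * rate


def metropolis(n_steps, events, exposure, shape, prior_rate, step=0.3, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    rate = 0.5
    current = log_posterior(rate, events, exposure, shape, prior_rate)
    samples = []
    for _ in range(n_steps):
        proposal = rate + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, events, exposure, shape, prior_rate)
        # Accept with probability min(1, posterior ratio); proposals with
        # non-positive rate have log posterior -inf and are never accepted.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        samples.append(rate)
    return samples


# Hypothetical data: 3 events in 10 units of exposure, Gamma(1, 1) prior.
samples = metropolis(20000, events=3, exposure=10.0, shape=1.0, prior_rate=1.0)
kept = samples[2000:]                      # discard burn-in
mcmc_mean = sum(kept) / len(kept)
exact_mean = (3 + 1.0) / (10.0 + 1.0)      # conjugate Gamma(4, 11) posterior mean
print(f"MCMC mean {mcmc_mean:.3f} vs exact {exact_mean:.3f}")
```

In realistic risk models there is usually no closed-form posterior, which is where sampling of this kind becomes necessary.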
Directory of Open Access Journals (Sweden)
Xiaoying Shi
2017-07-01
Information concerning the home and workplace of residents is the basis of analyzing the urban job-housing spatial relationship. Traditional methods conduct time-consuming user surveys to obtain personal job and housing location information. Some new methods define rules to detect personal places based on human mobility data. However, because the travel patterns of residents are variable, simple rule-based methods are unable to generalize highly changing and complex travel modes. In this paper, we propose a visual analysis approach to assist the analyzer in inferring personal job and housing locations interactively based on public bicycle data. All users are first clustered to find potential commuting users. Then, several visual views are designed to find the key candidate stations for a specific user, and the visited temporal pattern of stations and the user’s hire behavior are analyzed, which helps with the inference of station semantic meanings. Finally, a number of users’ job and housing locations are detected by the analyzer and visualized. Our approach can manage the complex and diverse cycling habits of users. The effectiveness of the approach is shown through case studies based on a real-world public bicycle dataset.
Inference of neuronal network spike dynamics and topology from calcium imaging data
Directory of Open Access Journals (Sweden)
Henry eLütcke
2013-12-01
Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence ('spike trains') from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties.
Role of Speaker Cues in Attention Inference
Directory of Open Access Journals (Sweden)
Jin Joo Lee
2017-10-01
Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of the sole individual without reference to contextual elements such as the co-presence of the partner. In this paper, we demonstrate that the accurate inference of listeners’ social-emotional state of attention depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in attention inference, we conduct investigations into real-world interactions of children (5–6 years old) storytelling with their peers. Through in-depth analysis of human–human interaction data, we first identify nonverbal speaker cues (i.e., backchannel-inviting cues) and listener responses (i.e., backchannel feedback). We then demonstrate how speaker cues can modify the interpretation of attention-related backchannels as well as serve as a means to regulate the responsiveness of listeners. We discuss the design implications of our findings toward our primary goal of developing attention recognition models for storytelling robots, and we argue that social robots can proactively use speaker cues to form more accurate inferences about the attentive state of their human partners.
Adaptive Inference on General Graphical Models
Acar, Umut A.; Ihler, Alexander T.; Mettu, Ramgopal; Sumer, Ozgur
2012-01-01
Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional ...
Reward Behavior by Male and Female Leaders: A Causal Inference Analysis.
Szilagyi, Andrew D.
1980-01-01
Investigated causal inferences between leader reward behavior and subordinate goal attainment, absenteeism, and work satisfaction. Results revealed that no significant differences were attributed to sex and that the leader reward behavior and subordinate attitudes and behavior were independent of the effects of sex of supervisor or subordinate.…
Estimating mountain basin-mean precipitation from streamflow using Bayesian inference
Henn, Brian; Clark, Martyn P.; Kavetski, Dmitri; Lundquist, Jessica D.
2015-10-01
Estimating basin-mean precipitation in complex terrain is difficult due to uncertainty in the topographical representativeness of precipitation gauges relative to the basin. To address this issue, we use Bayesian methodology coupled with a multimodel framework to infer basin-mean precipitation from streamflow observations, and we apply this approach to snow-dominated basins in the Sierra Nevada of California. Using streamflow observations, forcing data from lower-elevation stations, the Bayesian Total Error Analysis (BATEA) methodology and the Framework for Understanding Structural Errors (FUSE), we infer basin-mean precipitation, and compare it to basin-mean precipitation estimated using topographically informed interpolation from gauges (PRISM, the Parameter-elevation Regression on Independent Slopes Model). The BATEA-inferred spatial patterns of precipitation show agreement with PRISM in terms of the rank of basins from wet to dry but differ in absolute values. In some of the basins, these differences may reflect biases in PRISM, because some implied PRISM runoff ratios may be inconsistent with the regional climate. We also infer annual time series of basin precipitation using a two-step calibration approach. Assessment of the precision and robustness of the BATEA approach suggests that uncertainty in the BATEA-inferred precipitation is primarily related to uncertainties in hydrologic model structure. Despite these limitations, time series of inferred annual precipitation under different model and parameter assumptions are strongly correlated with one another, suggesting that this approach is capable of resolving year-to-year variability in basin-mean precipitation.
sick: The Spectroscopic Inference Crank
Casey, Andrew R.
2016-03-01
There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal
SICK: THE SPECTROSCOPIC INFERENCE CRANK
Energy Technology Data Exchange (ETDEWEB)
Casey, Andrew R., E-mail: arc@ast.cam.ac.uk [Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA (United Kingdom)
2016-03-15
Schaffter, Thomas; Marbach, Daniel; Floreano, Dario
2011-08-15
Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.
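The precision-recall evaluation mentioned above can be sketched in a few lines. This is a generic illustration, not GeneNetWeaver code: the function name, the edge representation as (regulator, target) tuples, and the tiny gold-standard network are assumptions. Predicted edges are taken in order of decreasing confidence, and precision and recall are recorded after each one.

```python
def precision_recall_curve(ranked_edges, true_edges):
    """Return (precision, recall) after each prediction in a ranked edge list.

    ranked_edges: predicted (regulator, target) pairs, most confident first.
    true_edges:   the gold-standard edges of the benchmark network.
    """
    gold = set(true_edges)
    hits = 0
    curve = []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
        curve.append((hits / k, hits / len(gold)))
    return curve


# Hypothetical benchmark network and ranked predictions.
gold_standard = [("A", "B"), ("B", "C"), ("C", "D")]
predictions = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]

curve = precision_recall_curve(predictions, gold_standard)
for k, (precision, recall) in enumerate(curve, start=1):
    print(f"top {k}: precision={precision:.2f} recall={recall:.2f}")
```

With an in silico benchmark the gold standard is known exactly, which is what makes this kind of scoring possible.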
Phylogenetic Inference of HIV Transmission Clusters
Directory of Open Access Journals (Sweden)
Vlad Novitsky
2017-10-01
Better understanding the structure and dynamics of HIV transmission networks is essential for designing the most efficient interventions to prevent new HIV transmissions, and ultimately for gaining control of the HIV epidemic. The inference of phylogenetic relationships and the interpretation of results rely on the definition of the HIV transmission cluster. The definition of the HIV cluster is complex and dependent on multiple factors, including the design of sampling, accuracy of sequencing, precision of sequence alignment, evolutionary models, the phylogenetic method of inference, and specified thresholds for cluster support. While the majority of studies focus on clusters, non-clustered cases could also be highly informative. A new dimension in the analysis of the global and local HIV epidemics is the concept of phylogenetically distinct HIV sub-epidemics. The identification of active HIV sub-epidemics reveals spreading viral lineages and may help in the design of targeted interventions. HIV clustering can also be affected by sampling density. Obtaining a proper sampling density may increase statistical power and reduce sampling bias, so sampling density should be taken into account in study design and in interpretation of phylogenetic results. Finally, recent advances in long-range genotyping may enable more accurate inference of HIV transmission networks. If performed in real time, it could both inform public-health strategies and be clinically relevant (e.g., drug-resistance testing).
Causal inference of asynchronous audiovisual speech
Directory of Open Access Journals (Sweden)
John F Magnotti
2013-11-01
During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
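The causal inference step described in this abstract can be illustrated with a simplified Bayesian sketch. It is not the authors' model: the zero-mean Gaussian distributions, the parameter values, and the names below are assumptions chosen only to show the logic. Asynchronies from a common cause are assumed to be tightly distributed, asynchronies from separate causes broadly distributed, and sensory noise (lower cue reliability) widens both.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def p_common_cause(asynchrony_ms, prior=0.5, sd_common=100.0,
                   sd_separate=300.0, sensory_noise=50.0):
    """Posterior probability that voice and face share a common cause,
    given the measured audiovisual asynchrony (all parameters hypothetical)."""
    # Sensory noise adds variance to the measured asynchrony under both causes.
    sd_c = math.sqrt(sd_common ** 2 + sensory_noise ** 2)
    sd_s = math.sqrt(sd_separate ** 2 + sensory_noise ** 2)
    like_common = normal_pdf(asynchrony_ms, 0.0, sd_c)
    like_separate = normal_pdf(asynchrony_ms, 0.0, sd_s)
    evidence = prior * like_common + (1.0 - prior) * like_separate
    return prior * like_common / evidence


for asynchrony in (0.0, 200.0, 500.0):
    print(f"asynchrony {asynchrony:5.0f} ms -> P(common cause) = "
          f"{p_common_cause(asynchrony):.3f}")
```

Under these assumptions an observer would report "synchronous" when the posterior exceeds some criterion, so each parameter has a direct interpretation in terms of the stimulus or the subject.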
On the Ability To Infer Deficiency in Mathematics From Performance in Physics Using Hierarchies
Riban, David M.
1971-01-01
Presents the procedures, results, and conclusions of a study designed to see if mathematical deficiencies can be inferred from PSSC students' performance by using a hierarchical model of requisite skills. Assuming inferences were possible, remediation was given. No effect due to remediation was observed but analysis indicated incidental learning…
Introductory statistical inference
Mukhopadhyay, Nitis
2014-01-01
This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques. Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist
Kim, Jung-Jae; Rebholz-Schuhmann, Dietrich
2011-10-06
The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks: First, when tested against a corpus with manually annotated events, the inference module of our system contributes 53.2% of correct extractions, but does not cause any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (up to 85.0% precision) and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimum adaptations to the identification of cell activity regulation events, confirming that the inference improves the performance of the system also on this task. Our research shows that the inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential in recognizing the complex concepts of such biomedical ontologies as Gene Ontology in the literature.
Directory of Open Access Journals (Sweden)
Kim Jung-jae
2011-10-01
Background: The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. Results: We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks: First, when tested against a corpus with manually annotated events, the inference module of our system contributes 53.2% of correct extractions, but does not cause any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (up to 85.0% precision) and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimum adaptations to the identification of cell activity regulation events, confirming that the inference improves the performance of the system also on this task. Conclusions: Our research shows that the inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential in recognizing the complex concepts of such biomedical ontologies as Gene Ontology in the literature.
Active inference, communication and hermeneutics.
Friston, Karl J; Frith, Christopher D
2015-07-01
Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others--during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions--both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then--in principle--they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
DEFF Research Database (Denmark)
Jacobsen, B. H.; Hansen, Michael Møller; Loeschcke, V.
2005-01-01
The northern pike Esox lucius L. is a freshwater fish exhibiting pronounced population subdivision and low genetic variability. However, there is limited knowledge on phylogeographical patterns within the species, and it is not known whether the low genetic variability reflects primarily current...... low effective population sizes or historical bottlenecks. We analysed six microsatellite loci in ten populations from Europe and North America. Genetic variation was low, with the average number of alleles within populations ranging from 2.3 to 4.0 per locus. Genetic differentiation among populations...... was high (overall theta(ST) = 0.51; overall rho(ST) = 0.50). Multidimensional scaling analysis of genetic distances between populations and spatial analysis of molecular variance suggested a single phylogeographical race within the sampled populations from northern Europe, whereas North American...
Functional networks inference from rule-based machine learning models.
Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume
2016-01-01
Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables that are often different from, or complementary to, those found by similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes that genes used together within a rule-based machine learning model to classify the samples might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in the context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. The
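The central assumption, that genes appearing together in one classification rule may be functionally related, can be illustrated with a minimal Python sketch. The rule sets, gene names and the `cooccurrence_network` helper are hypothetical and chosen only for illustration; this is not the FuNeL implementation, just a toy way of turning rules into a co-occurrence graph.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(rules, min_support=1):
    """Build an undirected gene network from rule-based models.

    `rules` is an iterable of collections of gene names, one per rule.
    Two genes are linked whenever they appear in the same rule; the edge
    weight counts the rules they share. Edges with a weight below
    `min_support` are dropped.
    """
    edge_counts = Counter()
    for rule in rules:
        genes = sorted(set(rule))  # sorted so each edge has one canonical key
        for a, b in combinations(genes, 2):
            edge_counts[(a, b)] += 1
    return {edge: n for edge, n in edge_counts.items() if n >= min_support}


# Hypothetical rules: each set lists the genes tested by one rule.
example_rules = [
    {"TP53", "BRCA1"},
    {"TP53", "BRCA1", "MYC"},
    {"MYC", "EGFR"},
]
network = cooccurrence_network(example_rules)
```

Raising `min_support` keeps only gene pairs that co-occur in several rules, which is one simple way to filter out incidental pairings.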
Statistical inference and visualization in scale-space for spatially dependent images
Vaughan, Amy
2012-03-01
SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical scale-space visualization tool that allows for statistical inferences. In this paper we develop a spatial SiZer for finding significant features and conducting goodness-of-fit tests for spatially dependent images. The spatial SiZer utilizes a family of kernel estimates of the image and provides not only exploratory data analysis but also statistical inference with spatial correlation taken into account. It is also capable of comparing the observed image with a specific null model being tested by adjusting the statistical inference using an assumed covariance structure. Pixel locations having statistically significant differences between the image and a given null model are highlighted by arrows. The spatial SiZer is compared with the existing independent SiZer via the analysis of simulated data with and without signal on both planar and spherical domains. We apply the spatial SiZer method to the decadal temperature change over some regions of the Earth. © 2011 The Korean Statistical Society.
Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.
2009-01-01
Background: Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori
Inferring species interactions through joint mark–recapture analysis
Yackulic, Charles B.; Korman, Josh; Yard, Michael D.; Dzul, Maria C.
2018-01-01
Introduced species are frequently implicated in declines of native species. In many cases, however, evidence linking introduced species to native declines is weak. Failure to make strong inferences regarding the role of introduced species can hamper attempts to predict population viability and delay effective management responses. For many species, mark–recapture analysis is the more rigorous form of demographic analysis. However, to our knowledge, there are no mark–recapture models that allow for joint modeling of interacting species. Here, we introduce a two‐species mark–recapture population model in which the vital rates (and capture probabilities) of one species are allowed to vary in response to the abundance of the other species. We use a simulation study to explore bias and choose an approach to model selection. We then use the model to investigate species interactions between endangered humpback chub (Gila cypha) and introduced rainbow trout (Oncorhynchus mykiss) in the Colorado River between 2009 and 2016. In particular, we test hypotheses about how two environmental factors (turbidity and temperature), intraspecific density dependence, and rainbow trout abundance are related to survival, growth, and capture of juvenile humpback chub. We also project the long‐term effects of different rainbow trout abundances on adult humpback chub abundances. Our simulation study suggests this approach has minimal bias under potentially challenging circumstances (i.e., low capture probabilities) that characterized our application and that model selection using indicator variables could reliably identify the true generating model even when process error was high. When the model was applied to rainbow trout and humpback chub, we identified negative relationships between rainbow trout abundance and the survival, growth, and capture probability of juvenile humpback chub. Effects of interspecific interactions on survival and capture probability were strongly
Optimization methods for logical inference
Chandru, Vijay
2011-01-01
Merging logic and mathematics in deductive inference: an innovative, cutting-edge approach. Optimization methods for logical inference? Absolutely, say Vijay Chandru and John Hooker, two major contributors to this rapidly expanding field. And even though "solving logical inference problems with optimization methods may seem a bit like eating sauerkraut with chopsticks . . . it is the mathematical structure of a problem that determines whether an optimization model can help solve it, not the context in which the problem occurs." Presenting powerful, proven optimization techniques for logic in
Reliability of dose volume constraint inference from clinical data
Lutz, C. M.; Møller, D. S.; Hoffmann, L.; Knap, M. M.; Alber, M.
2017-04-01
Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an ‘ideal’ cohort was generated where the most predictive model was equal to the postulated model. A bootstrap and a Cohort Replication Monte Carlo (CoRepMC) approach were applied to create 1000 equally sized populations each. The cohorts were then analyzed to establish inference frequency distributions. This was applied to nine scenarios for cohort sizes of 102 (1), 500 (2) to 2000 (3) patients (by sampling with replacement) and three postulated DVHP models. The Bootstrap was repeated for a ‘non-ideal’ cohort, where the most predictive model did not coincide with the postulated model. The Bootstrap produced chaotic results for all models of cohort size 1 for both the ideal and non-ideal cohorts. For cohort size 2 and 3, the distributions for all populations were more concentrated around the postulated DVHP. For the CoRepMC, the inference frequency increased with cohort size and incidence rate. Correct inference rates >85 % were only achieved by cohorts with more than 500 patients. Both Bootstrap and CoRepMC indicate that inference of the correct or approximate DVHP for typical cohort sizes is highly uncertain. CoRepMC results were less spurious than Bootstrap results, demonstrating the large influence that randomness in dose-response has on the statistical analysis.
Energy Technology Data Exchange (ETDEWEB)
Petrov, S.
1996-10-01
Languages with a solvable implication problem but without complete and consistent systems of inference rules ("poor" languages) are considered. The problem of the existence of a finite, complete and consistent inference rule system for a "poor" language is stated independently of the language or rule syntax. Several properties of the problem are proved. An application of the results to the language of join dependencies is given.
EI: A Program for Ecological Inference
Directory of Open Access Journals (Sweden)
Gary King
2004-09-01
The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997). Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available. Ecological inferences are required in political science research when individual-level surveys are unavailable (e.g., local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). They are also required in numerous areas of major significance in public policy (e.g., for applying the Voting Rights Act) and other academic disciplines ranging from epidemiology and marketing to sociology and quantitative history.
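To make the idea of inferring individual-level quantities from aggregate data concrete, here is a minimal Python sketch of the classical deterministic bounds for a single unit. It is not code from the EI program; the function name `bounds_beta_b` and the symbols `x` (share of the unit's population in group b) and `t` (aggregate rate for the whole unit) are assumptions made for illustration. The bounds follow from the accounting identity t = beta_b * x + beta_w * (1 - x) with both group rates restricted to [0, 1].

```python
def bounds_beta_b(x, t):
    """Deterministic bounds on group b's rate from one unit's aggregate data.

    x: fraction of the unit's population belonging to group b (0 < x <= 1)
    t: aggregate rate observed for the whole unit (0 <= t <= 1)
    Derived from t = beta_b * x + beta_w * (1 - x), with 0 <= beta_w <= 1.
    """
    if not 0 < x <= 1 or not 0 <= t <= 1:
        raise ValueError("need 0 < x <= 1 and 0 <= t <= 1")
    lower = max(0.0, (t - (1.0 - x)) / x)  # reached when beta_w = 1
    upper = min(1.0, t / x)                # reached when beta_w = 0
    return lower, upper


# Hypothetical precinct: 80% of residents in group b, 50% overall turnout.
low, high = bounds_beta_b(0.8, 0.5)
```

The interval narrows as `x` approaches 1, which is why units dominated by one group are the most informative about that group's behavior.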
Paffetti, Donatella; Vettori, Cristina; Caramelli, David; Vernesi, Cristiano; Lari, Martina; Paganelli, Arturo; Paule, Ladislav; Giannini, Raffaello
2007-08-16
Phylogeographic analyses of the Western Euroasiatic Fagus taxa (F. orientalis, F. sylvatica, F. taurica and F. moesiaca) are available; however, the subdivision of Fagus spp. is unresolved and there is no consensus on the phylogeny and on the identification (with either morphological or molecular markers) of the Eurasiatic Fagus taxa. For the first time, molecular analyses of ancient pollen, dated at least 45,000 years ago, were used in combination with phylogenetic analysis of current species to identify the Fagus spp. present during the Last Interglacial period in Italy. In this work we aim to test whether the trnL-trnF chloroplast DNA (cpDNA) region, which has previously proved efficient in discriminating different Quercus taxa, can be employed to distinguish the Fagus species and to identify the ancient pollen. 86 populations from 4 Western Euroasiatic taxa were sampled and sequenced for the trnL-trnF region to verify the efficiency of this cpDNA region in identifying the Fagus spp. Furthermore, Fagus crenata (2 populations), Fagus grandifolia (2 populations), Fagus japonica, Fagus hayatae, Quercus species and Castanea species were analysed to better resolve the phylogenetic inference. Our results show that this cpDNA region harbours some informative sites that allow relationships among the species within the Fagaceae family to be inferred. In particular, a few specific and fixed mutations were able to discriminate and identify all the different Fagus species. Considering a short fragment of 176 base pairs within the trnL intron, 2 transversions were found that distinguish the F. orientalis complex taxa (F. orientalis, F. taurica and F. moesiaca) from the remaining Fagus spp. (F. sylvatica, F. japonica, F. hayatae, F. crenata and F. grandifolia). This makes it possible to analyse this fragment in ancient samples as well, where DNA is usually highly degraded. The sequence data indicate that the DNA recovered from ancient pollen belongs to the F. orientalis complex since
Phylogeographic, genomic, and meropenem susceptibility analysis of Burkholderia ubonensis.
Price, Erin P; Sarovich, Derek S; Webb, Jessica R; Hall, Carina M; Jaramillo, Sierra A; Sahl, Jason W; Kaestli, Mirjam; Mayo, Mark; Harrington, Glenda; Baker, Anthony L; Sidak-Loftis, Lindsay C; Settles, Erik W; Lummis, Madeline; Schupp, James M; Gillece, John D; Tuanyok, Apichai; Warner, Jeffrey; Busch, Joseph D; Keim, Paul; Currie, Bart J; Wagner, David M
2017-09-01
The bacterium Burkholderia ubonensis is commonly co-isolated from environmental specimens harbouring the melioidosis pathogen, Burkholderia pseudomallei. B. ubonensis has been reported in northern Australia and Thailand but not North America, suggesting similar geographic distribution to B. pseudomallei. Unlike most other Burkholderia cepacia complex (Bcc) species, B. ubonensis is considered non-pathogenic, although its virulence potential has not been tested. Antibiotic resistance in B. ubonensis, particularly towards drugs used to treat the most severe B. pseudomallei infections, has also been poorly characterised. This study examined the population biology of B. ubonensis, and includes the first reported isolates from the Caribbean. Phylogenomic analysis of 264 B. ubonensis genomes identified distinct clades that corresponded with geographic origin, similar to B. pseudomallei. A small proportion (4%) of strains lacked the 920kb chromosome III replicon, with discordance of presence/absence amongst genetically highly related strains, demonstrating that the third chromosome of B. ubonensis, like other Bcc species, probably encodes for a nonessential pC3 megaplasmid. Multilocus sequence typing using the B. pseudomallei scheme revealed that one-third of strains lack the "housekeeping" narK locus. In comparison, all strains could be genotyped using the Bcc scheme. Several strains possessed high-level meropenem resistance (≥32 μg/mL), a concern due to potential transmission of this phenotype to B. pseudomallei. In silico analysis uncovered a high degree of heterogeneity among the lipopolysaccharide O-antigen cluster loci, with at least 35 different variants identified. Finally, we show that Asian B. ubonensis isolate RF23-BP41 is avirulent in the BALB/c mouse model via a subcutaneous route of infection. Our results provide several new insights into the biology of this understudied species.
Directory of Open Access Journals (Sweden)
Remco Bouckaert
2017-08-01
Background: Indigenous populations of the circumpolar Arctic are considered to be endemically infected (>2% prevalence) with hepatitis B virus (HBV), with subgenotype B5 (formerly B6) unique to these populations. The distinctive properties of HBV/B5, including high nucleotide diversity yet no significant liver disease, suggest virus adaptation through long-term host-pathogen association. Methods: To investigate the origin and evolutionary spread of HBV/B5 into the circumpolar Arctic, fifty-seven partial and full genome sequences from Alaska, Canada and Greenland, having known location and sampling dates spanning 40 years, were phylogeographically investigated by Bayesian analysis (BEAST 2) using a reversible-jump-based substitution model and a clock rate estimated at 4.1 × 10⁻⁵ substitutions/site/year. Results: Following an initial divergence from an Asian viral ancestor approximately 1954 years before present (YBP; 95% highest probability density interval [1188, 2901]), HBV/B5 coalescence occurred almost 1000 years later. Surprisingly, the HBV/B5 ancestor appears to locate first to Greenland in a rapid coastal route progression based on the landscape-aware geographic model, with subsequent B5 evolution and spread westward. Bayesian skyline plot analysis demonstrated an HBV/B5 population expansion occurring approximately 400 YBP, coinciding with the disruption of the Neo-Eskimo Thule culture into more heterogeneous and regionally distinct Inuit populations throughout the North American Arctic. Discussion: HBV/B5 origin and spread appears to occur coincident with the movement of Neo-Eskimo (Inuit) populations within the past 1000 years, further supporting the hypothesis of HBV/host co-expansion, and illustrating the concept of host-pathogen adaptation and balance.
An exceptional case of historical outbreeding in African sable antelope populations
DEFF Research Database (Denmark)
Pitra, C.; Hansen, Anders J.; Lieckfeldt, D.
2002-01-01
) sequences analysed from 95 individuals representing 17 sampling locations scattered through the African miombo (Brachystegia) woodland ecosystem] and phylogeographical statistical procedures (gene genealogy, nested cladistic and admixture proportion analyses), we (i) give a detailed dissection...... of the geographical genetic structure of Hippotragus niger; (ii) infer the processes and events potentially involved in the population history; and (iii) trace extensive introgressive hybridization in the species. The present-day sable antelope population shows a tripartite pattern of genetic subdivision representing...... West Tanzanian, Kenya/East Tanzanian and Southern Africa locations. Nested clade analysis revealed that past allopatric fragmentation, caused probably by habitat discontinuities associated with the East African Rift Valley system, together with intermediary episodic long-distance colonization...
On the criticality of inferred models
Mastromatteo, Iacopo; Marsili, Matteo
2011-10-01
Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.
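The stated link between the Fisher information and the susceptibility can be written out for an exponential-family model. The notation below (parameters θ, observables φ, partition function Z) is assumed here for illustration and is not taken from the abstract.

```latex
p(s \mid \theta) = \frac{1}{Z(\theta)} \exp\Big( \sum_i \theta_i \, \phi_i(s) \Big),
\qquad
\chi_{ij} \equiv \frac{\partial \langle \phi_i \rangle}{\partial \theta_j}
= \frac{\partial^2 \log Z}{\partial \theta_i \, \partial \theta_j}
= \langle \phi_i \phi_j \rangle - \langle \phi_i \rangle \langle \phi_j \rangle
= g_{ij}(\theta)
```

Here g is the Fisher information metric, so a large susceptibility means that small parameter changes produce easily distinguishable distributions, which is the sense in which distinguishable models concentrate near critical points.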
A unified framework for haplotype inference in nuclear families.
Iliadis, Alexandros; Anastassiou, Dimitris; Wang, Xiaodong
2012-07-01
Many large genome-wide association studies include nuclear families with more than one child (trio families), allowing for analysis of differences between siblings (sib pair analysis). Statistical power can be increased when haplotypes are used instead of genotypes. Currently, haplotype inference in families with more than one child can be performed either using the familial information or statistical information derived from the population samples but not both. Building on our recently proposed tree-based deterministic framework (TDS) for trio families, we augment its applicability to general nuclear families. We impose a minimum recombinant approach locally and independently on each multiple children family, while resorting to the population-derived information to solve the remaining ambiguities. Thus our framework incorporates all available information (familial and population) in a given study. We demonstrate that using all the constraints in our approach we can have gains in the accuracy as opposed to breaking the multiple children families to separate trios and resorting to a trio inference algorithm or phasing each family in isolation. We believe that our proposed framework could be the method of choice for haplotype inference in studies that include nuclear families with multiple children. Our software (tds2.0) is downloadable from www.ee.columbia.edu/∼anastas/tds. © 2012 The Authors Annals of Human Genetics © 2012 Blackwell Publishing Ltd/University College London.
Vertically Integrated Seismological Analysis II : Inference
Arora, N. S.; Russell, S.; Sudderth, E.
2009-12-01
Methods for automatically associating detected waveform features with hypothesized seismic events, and localizing those events, are a critical component of efforts to verify the Comprehensive Test Ban Treaty (CTBT). As outlined in our companion abstract, we have developed a hierarchical model which views detection, association, and localization as an integrated probabilistic inference problem. In this abstract, we provide more details on the Markov chain Monte Carlo (MCMC) methods used to solve this inference task. MCMC generates samples from a posterior distribution π(x) over possible worlds x by defining a Markov chain whose states are the worlds x, and whose stationary distribution is π(x). In the Metropolis-Hastings (M-H) method, transitions in the Markov chain are constructed in two steps. First, given the current state x, a candidate next state x′ is generated from a proposal distribution q(x′ | x), which may be (more or less) arbitrary. Second, the transition to x′ is not automatic, but occurs with an acceptance probability α(x′ | x) = min(1, π(x′)q(x | x′) / (π(x)q(x′ | x))). The seismic event model outlined in our companion abstract is quite similar to those used in multitarget tracking, for which MCMC has proved very effective. In this model, each world x is defined by a collection of events, a list of properties characterizing those events (times, locations, magnitudes, and types), and the association of each event to a set of observed detections. The target distribution is π(x) = P(x | y), the posterior distribution over worlds x given the observed waveform data y at all stations. Proposal distributions then implement several types of moves between worlds. For example, birth moves create new events; death moves delete existing events; split moves partition the detections for an event into two new events; merge moves combine event pairs; swap moves modify the properties and associations for pairs of events. Importantly, the rules for
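The two-step Metropolis-Hastings transition described above can be sketched in a few lines of Python. This is a generic illustration on a one-dimensional toy target, not the seismic model; the function names, the standard normal target and the deliberately asymmetric proposal are all assumptions chosen so that the Hastings correction q(x | x′)/q(x′ | x) actually matters.

```python
import math
import random


def metropolis_hastings(log_target, propose, log_q, x0, n_steps, rng):
    """Generic Metropolis-Hastings sampler.

    log_target(x): log of the (unnormalised) target density pi(x)
    propose(x, rng): draws a candidate x_new from q(x_new | x)
    log_q(x_to, x_from): log q(x_to | x_from), up to a constant
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = propose(x, rng)
        # log of pi(x') q(x | x') / (pi(x) q(x' | x))
        log_ratio = (log_target(x_new) + log_q(x, x_new)
                     - log_target(x) - log_q(x_new, x))
        # accept with probability min(1, ratio)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new
        samples.append(x)
    return samples


def log_target(x):
    return -0.5 * x * x  # standard normal, unnormalised


def propose(x, rng):
    return rng.gauss(0.5 * x, 1.0)  # asymmetric: q(x'|x) != q(x|x')


def log_q(x_to, x_from):
    return -0.5 * (x_to - 0.5 * x_from) ** 2


samples = metropolis_hastings(log_target, propose, log_q,
                              0.0, 20000, random.Random(0))
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

Working in log space avoids underflow when densities are very small, which matters far more in high-dimensional problems than in this toy case.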
An Inference Language for Imaging
DEFF Research Database (Denmark)
Pedemonte, Stefano; Catana, Ciprian; Van Leemput, Koen
2014-01-01
We introduce iLang, a language and software framework for probabilistic inference. The iLang framework enables the definition of directed and undirected probabilistic graphical models and the automated synthesis of high performance inference algorithms for imaging applications. The iLang framewor...
Metis: A Pure Metropolis Markov Chain Monte Carlo Bayesian Inference Library
Energy Technology Data Exchange (ETDEWEB)
Bates, Cameron Russell [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Mckigney, Edward Allen [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2018-01-09
The use of Bayesian inference in data analysis has become the standard for large scientific experiments [1, 2]. The Monte Carlo Codes Group (XCP-3) at Los Alamos has developed a simple set of algorithms currently implemented in C++ and Python to easily perform flat-prior Markov Chain Monte Carlo Bayesian inference with pure Metropolis sampling. These implementations are designed to be user friendly and extensible for customization based on specific application requirements. This document describes the algorithmic choices made and presents two use cases.
Colosimo, Giuliano; Knapp, Charles R; Wallace, Lisa E; Welch, Mark E
2014-01-01
Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.
Colosimo, Giuliano; Knapp, Charles R.; Wallace, Lisa E.; Welch, Mark E.
2014-01-01
Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344
DEFF Research Database (Denmark)
Møller, Jesper
2010-01-01
Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation-free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with Markov chain Monte Carlo...... (MCMC) techniques. Due to space limitations the focus is on spatial point processes....
A neuro-fuzzy inference system for sensor monitoring
International Nuclear Information System (INIS)
Na, Man Gyun
2001-01-01
A neuro-fuzzy inference system combined with wavelet denoising, PCA (principal component analysis) and SPRT (sequential probability ratio test) methods has been developed to monitor a given sensor using information from other sensors. The parameters of the neuro-fuzzy inference system, which estimates the relevant sensor signal, are optimized by a genetic algorithm and a least-squares algorithm. The wavelet denoising technique was applied to remove noise components from the input signals to the neuro-fuzzy system. By reducing the dimension of the input space of the neuro-fuzzy system without losing a significant amount of information, PCA was used to reduce the time necessary to train the neuro-fuzzy system, simplify the structure of the neuro-fuzzy inference system and ease the selection of its input signals. Using the residuals between the estimated and measured signals, the SPRT is applied to detect whether the sensors are degraded. The proposed sensor-monitoring algorithm was verified through applications to the pressurizer water level, pressurizer pressure, and hot-leg temperature sensors in pressurized water reactors.
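The final detection step can be illustrated with a minimal Python sketch of Wald's sequential probability ratio test applied to residuals. The abstract does not give the residual model or thresholds, so the Gaussian assumption with known standard deviation, the function name `sprt_mean_shift` and the default error rates are all assumptions made for illustration.

```python
import math


def sprt_mean_shift(residuals, shift, sigma, alpha=0.01, beta=0.01):
    """Wald's SPRT on Gaussian residuals with known standard deviation.

    H0: residual mean is 0 (sensor normal).
    H1: residual mean equals `shift` (sensor degraded).
    alpha and beta are the target false-alarm and missed-alarm probabilities.
    Returns (decision, samples_used), where decision is "degraded",
    "normal", or "undecided" if the data run out first.
    """
    residuals = list(residuals)
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    llr = 0.0
    for n, r in enumerate(residuals, start=1):
        # log-likelihood ratio increment for N(shift, sigma^2) vs N(0, sigma^2)
        llr += (shift / sigma ** 2) * (r - shift / 2.0)
        if llr >= upper:
            return "degraded", n
        if llr <= lower:
            return "normal", n
    return "undecided", len(residuals)


# Hypothetical residual streams: a constant offset, and no offset at all.
drifting = sprt_mean_shift([1.0] * 50, shift=1.0, sigma=1.0)
healthy = sprt_mean_shift([0.0] * 50, shift=1.0, sigma=1.0)
```

A monitoring loop would typically reset the accumulated ratio after each decision and keep testing; that bookkeeping is omitted here.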
Feature Inference Learning and Eyetracking
Rehder, Bob; Colner, Robert M.; Hoffman, Aaron B.
2009-01-01
Besides traditional supervised classification learning, people can learn categories by inferring the missing features of category members. It has been proposed that feature inference learning promotes learning a category's internal structure (e.g., its typical features and interfeature correlations) whereas classification promotes the learning of…
Jeffery, Nicholas W; Elías-Gutiérrez, Manuel; Adamowicz, Sarah J
2011-01-01
The region of Churchill, Manitoba, contains a wide variety of habitats representative of both the boreal forest and arctic tundra and has been used as a model site for biodiversity studies for nearly seven decades within Canada. Much previous work has been done in Churchill to study the Daphnia pulex species complex in particular, but no study has completed a wide-scale survey on the crustacean species that inhabit Churchill's aquatic ecosystems using molecular markers. We have employed DNA barcoding to study the diversity of the Branchiopoda (Crustacea) in a wide variety of freshwater habitats and to determine the likely origins of the Churchill fauna following the last glaciation. The standard animal barcode marker (COI) was sequenced for 327 specimens, and a 3% divergence threshold was used to delineate potential species. We found 42 provisional and valid branchiopod species from this survey alone, including several cryptic lineages, in comparison with the 25 previously recorded from previous ecological works. Using published sequence data, we explored the phylogeographic affinities of Churchill's branchiopods, finding that the Churchill fauna apparently originated from all directions from multiple glacial refugia (including southern, Beringian, and high arctic regions). Overall, these microcrustaceans are very diverse in Churchill and contain multiple species complexes. The present study introduces among the first sequences for some understudied genera, for which further work is required to delineate species boundaries and develop a more complete understanding of branchiopod diversity over a larger spatial scale.
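As a rough illustration of delineating provisional species with a 3% divergence threshold, the sketch below groups aligned sequences whose pairwise distance is at most 0.03. The abstract does not state the distance measure or the clustering rule, so the uncorrected p-distance and the single-linkage grouping used here are assumptions, as are the function names.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def cluster_by_threshold(seqs, threshold=0.03):
    """Single-linkage grouping of {name: sequence} at a distance threshold."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    toy = {"s1": "A" * 100, "s2": "A" * 98 + "CC", "s3": "C" * 100}
    print(cluster_by_threshold(toy))
```

Single linkage can chain divergent sequences through intermediates, so a real analysis may prefer a dedicated barcode-clustering tool.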
Directory of Open Access Journals (Sweden)
Antonia M Florio
Full Text Available The Malagasy giant chameleons (Furcifer oustaleti and Furcifer verrucosus) are sister species that are both broadly distributed in Madagascar, and also endemic to the island. These species are also morphologically similar and, because of this, have been frequently misidentified in the field. Previous studies have suggested that cryptic species are nested within this chameleon group, and two subspecies have been described in F. verrucosus. In this study, we utilized a phylogeographic approach to assess genetic diversification within these chameleons. This was accomplished by (1) identifying clades within each species supported by both mitochondrial and nuclear DNA, (2) assessing divergence times between clades, and (3) testing for niche divergence or conservatism. We found that both F. oustaleti and F. verrucosus could be readily identified based on genetic data, and within each species, there are two well-supported clades. However, divergence times are not contemporary and spatial patterns are not congruent. Diversification within F. verrucosus occurred during the Plio-Pleistocene, and there is evidence for niche divergence between a southwestern and southeastern clade, in a region of Madagascar that shows no obvious landscape barriers to dispersal. Diversification in F. oustaleti occurred earlier in the Pliocene or Miocene, and niche conservatism is supported with two genetically distinct clades separated at the Sofia River in northwestern Madagascar. Divergence within F. verrucosus is most consistent with patterns expected from ecologically mediated speciation, whereas divergence in F. oustaleti most strongly matches the patterns expected from the riverine barrier hypothesis.
Phylogeographic reconstruction of a bacterial species with high levels of lateral gene transfer
Pearson, T.; Giffard, P.; Beckstrom-Sternberg, S.; Auerbach, R.; Hornstra, H.; Tuanyok, A.; Price, E.P.; Glass, M.B.; Leadem, B.; Beckstrom-Sternberg, J. S.; Allan, G.J.; Foster, J.T.; Wagner, D.M.; Okinaka, R.T.; Sim, S.H.; Pearson, O.; Wu, Z.; Chang, J.; Kaul, R.; Hoffmaster, A.R.; Brettin, T.S.; Robison, R.A.; Mayo, M.; Gee, J.E.; Tan, P.; Currie, B.J.; Keim, P.
2009-01-01
Background: Phylogeographic reconstruction of some bacterial populations is hindered by low diversity coupled with high levels of lateral gene transfer. A comparison of recombination levels and diversity at seven housekeeping genes for eleven bacterial species, most of which are commonly cited as having high levels of lateral gene transfer shows that the relative contributions of homologous recombination versus mutation for Burkholderia pseudomallei is over two times higher than for Streptococcus pneumoniae and is thus the highest value yet reported in bacteria. Despite the potential for homologous recombination to increase diversity, B. pseudomallei exhibits a relative lack of diversity at these loci. In these situations, whole genome genotyping of orthologous shared single nucleotide polymorphism loci, discovered using next generation sequencing technologies, can provide very large data sets capable of estimating core phylogenetic relationships. We compared and searched 43 whole genome sequences of B. pseudomallei and its closest relatives for single nucleotide polymorphisms in orthologous shared regions to use in phylogenetic reconstruction. Results: Bayesian phylogenetic analyses of >14,000 single nucleotide polymorphisms yielded completely resolved trees for these 43 strains with high levels of statistical support. These results enable a better understanding of a separate analysis of population differentiation among >1,700 B. pseudomallei isolates as defined by sequence data from seven housekeeping genes. We analyzed this larger data set for population structure and allele sharing that can be attributed to lateral gene transfer. Our results suggest that despite an almost panmictic population, we can detect two distinct populations of B. pseudomallei that conform to biogeographic patterns found in many plant and animal species. That is, separation along Wallace's Line, a biogeographic boundary between Southeast Asia and Australia. Conclusion: We describe an
Phylogeographic reconstruction of a bacterial species with high levels of lateral gene transfer
Directory of Open Access Journals (Sweden)
Kaul Rajinder
2009-11-01
Full Text Available Abstract Background Phylogeographic reconstruction of some bacterial populations is hindered by low diversity coupled with high levels of lateral gene transfer. A comparison of recombination levels and diversity at seven housekeeping genes for eleven bacterial species, most of which are commonly cited as having high levels of lateral gene transfer shows that the relative contributions of homologous recombination versus mutation for Burkholderia pseudomallei is over two times higher than for Streptococcus pneumoniae and is thus the highest value yet reported in bacteria. Despite the potential for homologous recombination to increase diversity, B. pseudomallei exhibits a relative lack of diversity at these loci. In these situations, whole genome genotyping of orthologous shared single nucleotide polymorphism loci, discovered using next generation sequencing technologies, can provide very large data sets capable of estimating core phylogenetic relationships. We compared and searched 43 whole genome sequences of B. pseudomallei and its closest relatives for single nucleotide polymorphisms in orthologous shared regions to use in phylogenetic reconstruction. Results Bayesian phylogenetic analyses of >14,000 single nucleotide polymorphisms yielded completely resolved trees for these 43 strains with high levels of statistical support. These results enable a better understanding of a separate analysis of population differentiation among >1,700 B. pseudomallei isolates as defined by sequence data from seven housekeeping genes. We analyzed this larger data set for population structure and allele sharing that can be attributed to lateral gene transfer. Our results suggest that despite an almost panmictic population, we can detect two distinct populations of B. pseudomallei that conform to biogeographic patterns found in many plant and animal species. That is, separation along Wallace's Line, a biogeographic boundary between Southeast Asia and Australia
Farhadi, Ahmad; Jeffs, Andrew G; Farahmand, Hamid; Rejiniemon, Thankappan Sarasam; Smith, Greg; Lavery, Shane D
2017-08-18
There is increasing recognition of the concordance between marine biogeographic and phylogeographic boundaries. However, it is still unclear how population-level divergence translates into species-level divergence, and what are the principal factors that first initiate that divergence, and then maintain reproductive isolation. This study examines the likely forces driving population and lineage divergences in the broadly-distributed Indo-Pacific spiny lobster Panulirus homarus, which has peripheral divergent lineages in the west and east. The study focuses particularly on the West Indian Ocean, which is emerging as a region of unexpected diversity. Mitochondrial control region (mtCR) and COI sequences as well as genotypes of 9 microsatellite loci were examined in 410 individuals from 17 locations grouped into 7 regions from South Africa in the west, and eastward across to Taiwan and the Marquesas Islands. Phylogenetic and population-level analyses were used to test the significance and timing of divergences and describe the genetic relationships among populations. Analyses of the mtCR revealed high levels of divergence among the seven regions (Ф ST = 0.594, P Indo-Pacific that helps drive some of the regions' recognized biogeographic boundaries.
Directory of Open Access Journals (Sweden)
Hai Li
2014-08-01
Full Text Available The Chinese beard eel (Cirrhimuraena chinensis Kaup) is an intertidal fish and a model organism for the study of impacts caused by topological fluctuations during the Pleistocene and current intricate hydrological conditions on fauna living in the coastal areas of China. In this study, we examined the phylogeographical pattern, population genetic profile and demographical history of C. chinensis using mitochondrial DNA (cytochrome b (cyt b) and control region (CR)) from 266 individuals sampled in seven localities across the coastal area of southeastern China. The combined data indicated high levels of haplotype diversity and low levels of nucleotide diversity. Analyses of molecular variance (AMOVA) and FST statistics suggested the absence of a significant population structure across the Chinese coast. Neutrality tests, mismatch distributions and Bayesian skyline plots uniformly indicated a recent population expansion. The phylogeographic structure of C. chinensis may be attributed to past population expansion and long-distance pelagic larval dispersal facilitated by present-day ocean currents.
Directory of Open Access Journals (Sweden)
Giorgos Minas
2017-07-01
Full Text Available In order to analyse large complex stochastic dynamical models such as those studied in systems biology there is currently a great need for both analytical tools and also algorithms for accurate and fast simulation and estimation. We present a new stochastic approximation of biological oscillators that addresses these needs. Our method, called phase-corrected LNA (pcLNA), overcomes the main limitations of the standard Linear Noise Approximation (LNA) to remain uniformly accurate for long times, while maintaining the speed and analytical tractability of the LNA. As part of this, we develop analytical expressions for key probability distributions and associated quantities, such as the Fisher Information Matrix and Kullback-Leibler divergence, and we introduce a new approach to system-global sensitivity analysis. We also present algorithms for statistical inference and for long-term simulation of oscillating systems that are shown to be as accurate as, but much faster than, leaping algorithms and algorithms for integration of diffusion equations. Stochastic versions of published models of the circadian clock and NF-κB system are used to illustrate our results.
Forward and backward inference in spatial cognition.
Directory of Open Access Journals (Sweden)
Will D Penny
Full Text Available This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
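The forward-inference step for localisation, combining a path-integration prediction with sensory evidence, corresponds to one step of a discrete Bayesian (hidden Markov model) filter. The sketch below is a generic illustration under that assumption; the three-location circular track, the transition matrix and the likelihood values are made up for the example and are not taken from the paper.

```python
def forward_step(belief, transition, likelihood):
    """One forward-inference update over discrete locations.

    belief[i]        : prior probability of being at location i
    transition[i][j] : probability of moving from i to j (path integration)
    likelihood[j]    : probability of the current observation at location j
    """
    n = len(belief)
    # Predict: push the belief through the motion model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: weight by the sensory likelihood and renormalise.
    unnorm = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnorm)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


if __name__ == "__main__":
    # Hypothetical circular track: step right with prob 0.8, stay with 0.2.
    T = [[0.2, 0.8, 0.0],
         [0.0, 0.2, 0.8],
         [0.8, 0.0, 0.2]]
    print(forward_step([1.0, 0.0, 0.0], T, [1.0, 1.0, 1.0]))
```

Repeating the step over a sequence of observations gives the filtered location estimate at each time point.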
Bayesian inference for Markov jump processes with informative observations.
Golightly, Andrew; Wilkinson, Darren J
2015-04-01
In this paper we consider the problem of parameter inference for Markov jump process (MJP) representations of stochastic kinetic models. Since transition probabilities are intractable for most processes of interest yet forward simulation is straightforward, Bayesian inference typically proceeds through computationally intensive methods such as (particle) MCMC. Such methods ostensibly require the ability to simulate trajectories from the conditioned jump process. When observations are highly informative, use of the forward simulator is likely to be inefficient and may even preclude an exact (simulation based) analysis. We therefore propose three methods for improving the efficiency of simulating conditioned jump processes. A conditioned hazard is derived based on an approximation to the jump process, and used to generate end-point conditioned trajectories for use inside an importance sampling algorithm. We also adapt a recently proposed sequential Monte Carlo scheme to our problem. Essentially, trajectories are reweighted at a set of intermediate time points, with more weight assigned to trajectories that are consistent with the next observation. We consider two implementations of this approach, based on two continuous approximations of the MJP. We compare these constructs for a simple tractable jump process before using them to perform inference for a Lotka-Volterra system. The best performing construct is used to infer the parameters governing a simple model of motility regulation in Bacillus subtilis.
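The abstract notes that forward simulation of a Markov jump process is straightforward. A minimal sketch of such a simulator for a Lotka-Volterra system, using Gillespie's direct method, is given below. The three-reaction parameterisation (prey birth, predation, predator death) and the rate constants are assumptions for illustration; this is the unconditioned forward simulator, not the conditioned-hazard constructs proposed in the paper.

```python
import random


def gillespie_lv(x, y, c1, c2, c3, t_end, rng):
    """Simulate prey x and predator y counts up to time t_end.

    Reactions (assumed for this sketch):
      prey birth      X     -> 2X  with hazard c1 * x
      predation       X + Y -> 2Y  with hazard c2 * x * y
      predator death  Y     -> 0   with hazard c3 * y
    Returns a list of (time, x, y) tuples.
    """
    t = 0.0
    path = [(t, x, y)]
    while True:
        hazards = [c1 * x, c2 * x * y, c3 * y]
        total = sum(hazards)
        if total == 0:
            break  # both species extinct, nothing can happen
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_end:
            break
        u = rng.random() * total     # pick a reaction by its hazard
        if u < hazards[0]:
            x += 1
        elif u < hazards[0] + hazards[1]:
            x -= 1
            y += 1
        else:
            y -= 1
        path.append((t, x, y))
    return path


if __name__ == "__main__":
    trajectory = gillespie_lv(50, 100, 1.0, 0.005, 0.6, 1.0, random.Random(1))
    print(len(trajectory), trajectory[-1])
```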
da Silva Santos, Anelisie; Trigo, Tatiane Campos; de Oliveira, Tadeu Gomes; Silveira, Leandro
2018-01-01
The pampas cat is a small felid that occurs in open habitats throughout much of South America. Previous studies have revealed intriguing patterns of morphological differentiation and genetic structure among its populations, as well as molecular evidence for hybridization with the closely related L. tigrinus. Here we report phylogeographic analyses encompassing most of its distribution (focusing particularly on Brazilian specimens, which had been poorly sampled in previous studies), using a novel dataset comprising 2,143 bp of the mitogenome, along with previously reported mtDNA sequences. Our data revealed strong population structure and supported a west-to-east colonization process in this species’ history. We detected two population expansion events, one older (ca. 200 thousand years ago [kya]) in western South America and another more recent (ca. 60-50 kya) in eastern areas, coinciding with the expansion of savanna environments in Brazil. Analyses including L. tigrinus individuals bearing introgressed mtDNA from L. colocola showed a complete lack of shared haplotypes between species, indicating that their hybridization was ancient. Finally, we observed a close relationship between Brazilian/Uruguayan L. colocola haplotypes and those sampled in L. tigrinus, indicating that their hybridization was likely related to the demographic expansion of L. colocola into eastern South America. PMID:29668017
A formal model of interpersonal inference
Directory of Open Access Journals (Sweden)
Michael Moutoussis
2014-03-01
Full Text Available Introduction: We propose that active Bayesian inference – a general framework for decision-making – can equally be applied to interpersonal exchanges. Social cognition, however, entails special challenges. We address these challenges through a novel formulation of a formal model and demonstrate its psychological significance. Method: We review relevant literature, especially with regard to interpersonal representations, formulate a mathematical model and present a simulation study. The model accommodates normative models from utility theory and places them within the broader setting of Bayesian inference. Crucially, we endow people's prior beliefs, into which utilities are absorbed, with preferences of self and others. The simulation illustrates the model's dynamics and furnishes elementary predictions of the theory. Results: 1. Because beliefs about self and others inform both the desirability and plausibility of outcomes, in this framework interpersonal representations become beliefs that have to be actively inferred. This inference, akin to 'mentalising' in the psychological literature, is based upon the outcomes of interpersonal exchanges. 2. We show how some well-known social-psychological phenomena (e.g. self-serving biases) can be explained in terms of active interpersonal inference. 3. Mentalising naturally entails Bayesian updating of how people value social outcomes. Crucially this includes inference about one’s own qualities and preferences. Conclusion: We inaugurate a Bayes optimal framework for modelling intersubject variability in mentalising during interpersonal exchanges. Here, interpersonal representations are endowed with explicit functional and affective properties. We suggest the active inference framework lends itself to the study of psychiatric conditions where mentalising is distorted.
Kroese, A.H.; van der Meulen, E.A.; Poortema, Klaas; Schaafsma, W.
1995-01-01
The making of statistical inferences in distributional form is conceptionally complicated because the epistemic 'probabilities' assigned are mixtures of fact and fiction. In this respect they are essentially different from 'physical' or 'frequency-theoretic' probabilities. The distributional form is
Continuous Integrated Invariant Inference, Phase I
National Aeronautics and Space Administration — The proposed project will develop a new technique for invariant inference and embed this and other current invariant inference and checking techniques in an...
Estimating uncertainty of inference for validation
Energy Technology Data Exchange (ETDEWEB)
Booker, Jane M [Los Alamos National Laboratory; Langenbrunner, James R [Los Alamos National Laboratory; Hemez, Francois M [Los Alamos National Laboratory; Ross, Timothy J [UNM
2010-09-30
We present a validation process based upon the concept that validation is an inference-making activity. This has always been true, but the association has not been as important before as it is now. Previously, theory had been confirmed by more data, and predictions were possible based on data. The process today is to infer from theory to code and from code to prediction, making the role of prediction somewhat automatic, and a machine function. Validation is defined as determining the degree to which a model and code is an accurate representation of experimental test data. Imbedded in validation is the intention to use the computer code to predict. To predict is to accept the conclusion that an observable final state will manifest; therefore, prediction is an inference whose goodness relies on the validity of the code. Quantifying the uncertainty of a prediction amounts to quantifying the uncertainty of validation, and this involves the characterization of uncertainties inherent in theory/models/codes and the corresponding data. An introduction to inference making and its associated uncertainty is provided as a foundation for the validation problem. A mathematical construction for estimating the uncertainty in the validation inference is then presented, including a possibility distribution constructed to represent the inference uncertainty for validation under uncertainty. The estimation of inference uncertainty for validation is illustrated using data and calculations from Inertial Confinement Fusion (ICF). The ICF measurements of neutron yield and ion temperature were obtained for direct-drive inertial fusion capsules at the Omega laser facility. The glass capsules, containing the fusion gas, were systematically selected with the intent of establishing a reproducible baseline of high-yield 10^13-10^14 neutron output. The deuterium-tritium ratio in these experiments was varied to study its influence upon yield. This paper on validation inference is the
On Bayesian Inference under Sampling from Scale Mixtures of Normals
Fernández, C.; Steel, M.F.J.
1996-01-01
This paper considers a Bayesian analysis of the linear regression model under independent sampling from general scale mixtures of Normals. Using a common reference prior, we investigate the validity of Bayesian inference and the existence of posterior moments of the regression and precision
Quantum-Like Representation of Non-Bayesian Inference
Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.
2013-01-01
This research is related to the problem of "irrational decision making or inference" that has been discussed in cognitive psychology. There are some experimental studies whose statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to classical Bayesian inference. For this problem, a number of quantum-like cognitive models of decision making have been proposed. Our previous work represented classical Bayesian inference in a natural way in the framework of quantum mechanics. Using this representation, in this paper we discuss non-Bayesian (irrational) inference that is biased by effects like quantum interference. Further, we describe the "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.
Statistical inference: an integrated Bayesian/likelihood approach
Aitkin, Murray
2010-01-01
Filling a gap in current Bayesian theory, Statistical Inference: An Integrated Bayesian/Likelihood Approach presents a unified Bayesian treatment of parameter inference and model comparisons that can be used with simple diffuse prior specifications. This novel approach provides new solutions to difficult model comparison problems and offers direct Bayesian counterparts of frequentist t-tests and other standard statistical methods for hypothesis testing. After an overview of the competing theories of statistical inference, the book introduces the Bayes/likelihood approach used throughout. It pre
Inference and Analysis of Population Structure Using Genetic Data and Network Theory.
Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli
2016-04-01
Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of
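The modularity of a community partition, which the abstract uses as the quality measure tested by permutation, can be computed directly from a weighted edge list. The sketch below uses the standard weighted (Newman) modularity formula; the input format and the function name are assumptions of this example, and the community-detection and permutation steps themselves are not shown.

```python
def modularity(weights, community):
    """Weighted modularity Q of a partition of an undirected network.

    weights   : {(u, v): w}, each undirected edge listed once, no self-loops
    community : {node: community label} for every node in the network
    """
    two_m = 2.0 * sum(weights.values())
    if two_m == 0:
        raise ValueError("network has no edge weight")
    strength = {n: 0.0 for n in community}
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
    # Observed within-community weight, counting each edge in both directions.
    within = sum(2.0 * w for (u, v), w in weights.items()
                 if community[u] == community[v])
    # Expected within-community weight under random rewiring.
    totals = {}
    for n, c in community.items():
        totals[c] = totals.get(c, 0.0) + strength[n]
    expected = sum(s * s for s in totals.values()) / two_m
    return (within - expected) / two_m


if __name__ == "__main__":
    edges = {("a", "b"): 1.0, ("c", "d"): 1.0}
    print(modularity(edges, {"a": 0, "b": 0, "c": 1, "d": 1}))
```

A permutation test along the lines described would recompute this quantity on networks built from permuted genotype data and compare the observed value with that null distribution.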
Directory of Open Access Journals (Sweden)
Peter Caley
Full Text Available A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes--the last of which occurred in 2006, or hunter-killed foxes--the most recent in 2001. This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that the population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought.
Caley, Peter; Ramsey, David S L; Barry, Simon C
2015-01-01
A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes--the last of which occurred in 2006, or hunter-killed foxes--the most recent in 2001. This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that the population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought.
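The general idea behind Approximate Bayesian Computation can be shown with a toy rejection sampler: draw a parameter from the prior, simulate data, and keep the draw only if the simulation is close to the observation. The example below infers a per-trial detection probability from a count of detections; this binomial toy model, the uniform prior and all names are assumptions for illustration and are not the population model used in the fox study.

```python
import random


def abc_rejection(observed, n_trials, n_draws, tol, rng):
    """Toy ABC rejection sampler for a Bernoulli success probability.

    observed : number of successes seen in n_trials
    tol      : accept a draw if |simulated - observed| <= tol
    Returns the accepted prior draws, an approximate posterior sample.
    """
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # prior draw from Uniform(0, 1)
        simulated = sum(rng.random() < p for _ in range(n_trials))
        if abs(simulated - observed) <= tol:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    sample = abc_rejection(observed=0, n_trials=20, n_draws=5000,
                           tol=0, rng=random.Random(42))
    print(len(sample), sum(sample) / len(sample))
```

With zero detections in 20 trials the accepted values concentrate near zero, which mirrors in miniature how an absence of carcass discoveries shifts belief toward a small or extinct population.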
International Nuclear Information System (INIS)
Chen, H.; Yonezawa, T.
2014-01-01
Lycium ruthenicum (Solanaceae), a spiny shrub mostly distributed in the desert regions of north and northwest China, has been shown to exhibit high tolerance to extreme environments. In this study, the phylogeography and evolutionary history of L. ruthenicum were examined on the basis of 80 individuals from eight populations. Using the sequence variations of two spacer regions of chloroplast DNA (trnH-psbA and rps16-trnK), the absence of a geographic component in the chloroplast DNA genetic structure was identified (GST = 0.351, NST = 0.304, NST < GST), which was consistent with the result of SAMOVA, suggesting weak phylogeographic structure in this species. Phylogenetic and network analyses showed that the 10 haplotypes identified in the present study clustered into two clades, in which clade I harbored the ancestral haplotypes, pointing to two independent glacial refugia in the middle of the Qaidam Basin and western Inner Mongolia. The existence of regional evolutionary differences was supported by GENETREE, which revealed that one of the populations in the Qaidam Basin and the two populations in the Tarim Basin had experienced rapid expansion, and the other populations retained relatively stable population sizes during the Pleistocene. Given the results of long-term gene flow and pairwise differences, strong gene flow was insufficient to reduce the genetic differentiation among populations or within populations, probably due to the genetic composition containing a common haplotype and the high number of private haplotypes fixed for most of the populations. The divergence times of different lineages were consistent with the rapid uplift phases of the Qinghai-Tibetan Plateau and the initiation and expansion of deserts in northern China, suggesting that the origin and evolution of L. ruthenicum were strongly influenced by Quaternary environmental changes. (author)
Kurita, Kazuki; Toda, Mamoru
2017-04-01
We conducted comparative phylogeographic and population genetic analyses of Plestiodon kishinouyei and P. stimpsonii, two sympatric skinks endemic to islands in the southern Ryukyus, to explore different factors that have influenced population structure. Previous phylogenetic studies using partial mitochondrial DNA indicate similar divergence times from their respective closest relatives, suggesting that differences in population structure are driven by intrinsic attributes of either species rather than the common set of extrinsic factors that both presumably have been exposed to throughout their history. In this study, analysis of mtDNA sequences and microsatellite polymorphism demonstrate contrasting patterns of phylogeography and population structure: P. kishinouyei exhibits a lower genetic variability and lower genetic differentiation among islands than P. stimpsonii, consistent with recent population expansion. However, historical demographic analyses indicate that the relatively high genetic uniformity in P. kishinouyei is not attributable to recent expansion. We detected significant isolation-by-distance patterns among P. kishinouyei populations on the land bridge islands, but not among P. stimpsonii populations occurring on those same islands. Our results suggest that P. kishinouyei populations have maintained gene flows across islands until recently, probably via ephemeral Quaternary land bridges. The lower genetic variability in P. kishinouyei may also indicate smaller effective population sizes on average than that of P. stimpsonii. We interpret these differences as a consequence of ecological divergence between the two species, primarily in trophic level and habitat preference.
Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity.
Pecevski, Dejan; Maass, Wolfgang
2016-01-01
Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such a neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent through their firing current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference.
Bidegaray-Batista, Leticia; Macías-Hernández, Nuria; Oromí, Pedro; Arnedo, Miquel A
2007-08-01
The Eastern Canary Islands are the emerged tips of a continuous volcanic ridge running parallel to the northeastern African coast, originated by episodic volcanic eruptions that can be traced back to the Miocene and that, following a major period of quiescence and erosion, continued from the Pliocene to the present day. The islands have been periodically connected by eustatic sea-level changes resulting from Pleistocene glacial cycles. The ground-dwelling spider Dysdera lancerotensis Simon, 1907 occurs along the entire ridge, except on recent barren lavas and sand dunes, and is therefore an ideal model for studying the effect of episodic geological processes on terrestrial organisms. Nested clade and population genetic analyses using 39 haplotypes from 605 base pairs of mitochondrial DNA cytochrome c oxidase I sequence data, along with phylogenetic analyses including two additional mitochondrial genes, uncover complex phylogeographical and demographic patterns. Our results indicate that D. lancerotensis colonized the ridge from north to south, in contrast to what had been expected given the SSW-NNE trend of volcanism and to what had been reported for other terrestrial arthropods. The occurrence of several episodes of extinction, recolonization and expansion are hypothesized for this species, and areas that act as refugia during volcanic cycles are identified. Relaxed molecular clock methods reveal divergence times between main haplotype lineages that suggest an older origin of the northern islets than anticipated based on geological evidence. This study supports the key role of volcanism in shaping the distribution of terrestrial organisms on oceanic islands and generates phylogeographical predictions that warrant further research into other terrestrial endemisms of this fascinating region.
Inference Attacks and Control on Database Structures
Directory of Open Access Journals (Sweden)
Muhamed Turkanovic
2015-02-01
Today’s databases store information with sensitivity levels that range from public to highly sensitive; hence ensuring confidentiality can be highly important, but it also requires costly control. This paper focuses on the inference problem on different database structures. It presents possible threats to privacy in relation to inference, and control methods for mitigating these threats. The paper shows that using only access control, without any inference control, is inadequate, since these models are unable to protect against indirect data access. Furthermore, it covers new inference problems which arise from the dimensions of new technologies like XML, semantics, etc.
CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.
Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming
2014-11-30
Prediction of gene regulatory networks (GRN) from expression data is a challenging task. Many methods have been developed to address this challenge, ranging from supervised to unsupervised methods. The most promising methods are based on the support vector machine (SVM). There is a need for a comprehensive analysis of the prediction accuracy of the supervised SVM method using different kernels under different biological experimental conditions and network sizes. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRN. Using CompareSVM, we investigated and evaluated different SVM kernel methods in detail on simulated microarray datasets of different sizes. The results obtained from CompareSVM showed that the accuracy of an inference method depends upon the nature of the experimental condition and the size of the network. For networks with fewer nodes, the SVM Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends upon the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .
Directory of Open Access Journals (Sweden)
Maritza Barrera
2017-01-01
In 2010, new Chinese strains of porcine epidemic diarrhea virus (PEDV), clinically more severe than the classical strains, emerged. These strains spread to the United States in 2013 through an intercontinental transmission from China, with further spreading across the world, evidencing the emergent nature of these strains. In the present study, an analysis of PEDV field sequences from Ecuador was conducted by comparing all the PEDV S gene sequences available in the GenBank database. Phylogenetic comparisons and Bayesian phylogeographic inference based on complete S gene sequences were also conducted to track the origin and putative route of PEDV. The sequence from the PED outbreak in Ecuador was grouped into clade II of PEDV genogroup 2a together with other sequences of isolates from Mexico, Canada, and the United States. The phylogeographic study revealed the emergence of the Chinese PEDV strains, followed by spread to the US in 2013, from the US to Korea, and later the introduction of PEDV to Canada, Mexico, and Ecuador directly from the US. The sources of imports of live swine into Ecuador in 2014 were mainly Chile and the US. Thus, this movement of pigs is suggested as the main route for introducing PEDV to Ecuador.
DIM SUM: demography and individual migration simulated using a Markov chain.
Brown, Jeremy M; Savidge, Kevin; McTavish, Emily Jane B
2011-03-01
An increasing number of studies seek to infer demographic history, often jointly with genetic relationships. Despite numerous analytical methods for such data, few simulations have investigated the methods' power and robustness, especially when underlying assumptions have been violated. DIM SUM (Demography and Individual Migration Simulated Using a Markov chain) is a stand-alone Java program for the simulation of population demography and individual migration while recording ancestor-descendant relationships. It does not employ coalescent assumptions or discrete population boundaries. It is extremely flexible, allowing the user to specify border positions, reactions of organisms to borders, local and global carrying capacities, individual dispersal kernels, rates of reproduction and strategies for sampling individuals. Spatial variables may be specified using image files (e.g., as exported from gis software) and may vary through time. In combination with software for genetic marker simulation, DIM SUM will be useful for testing phylogeographic (e.g., nested clade phylogeographic analysis, coalescent-based tests and continuous-landscape frameworks) and landscape-genetic methods, specifically regarding violations of coalescent assumptions. It can also be used to explore the qualitative features of proposed demographic scenarios (e.g. regarding biological invasions) and as a pedagogical tool. DIM SUM (with user's manual) can be downloaded from http://code.google.com/p/bio-dimsum. © 2010 Blackwell Publishing Ltd.
Zhao, Yunpeng; Qi, Zhechen; Ma, Weiwei; Dai, Qiongyan; Li, Pan; Cameron, Kenneth M; Lee, Joongku; Xiang, Qiu-Yun Jenny; Fu, Chengxin
2013-08-01
The Smilax hispida group (Smilacaceae) exhibits a discontinuous distribution in eastern Asia, eastern and western United States, and Mexico. A broad scale phylogeographic analysis was conducted for this group to evaluate the hypotheses of accelerated allopatric divergence in eastern Asia and a northern origin of the temperate elements in Mexico. Phylogeny was inferred using seven plastid and nuclear DNA sequences. Species delineation was assessed using genealogical sorting indices (GSI). Lineage divergence time, haplotype diversification rates, and ancestral distributions were estimated using Bayesian methods. Phylogeographic patterns in eastern Asia and North America were compared by analyzing 539 individuals from 64 populations to assess allopatric diversification. Results strongly supported delineation of six allopatric species, the origin of this group from a Mexican ancestor around 11.42mya, and Mexican origins of the temperate species in Mexico. Significant geographic structure of haplotypes was found in eastern Asia, and greater haplotype diversification rate was observed for the North American lineage. Our data support allopatric speciation in eastern Asia but do not find evidence of an elevated diversification rate. Greater species diversity of the study system in eastern Asia may be due to a longer evolutionary history. Our results do not support northern origins of the Mexican temperate species. Copyright © 2013 Elsevier Inc. All rights reserved.
A Bootstrap Approach to Computing Uncertainty in Inferred Oil and Gas Reserve Estimates
International Nuclear Information System (INIS)
Attanasi, Emil D.; Coburn, Timothy C.
2004-01-01
This study develops confidence intervals for estimates of inferred oil and gas reserves based on bootstrap procedures. Inferred reserves are expected additions to proved reserves in previously discovered conventional oil and gas fields. Estimates of inferred reserves accounted for 65% of the total oil and 34% of the total gas assessed in the U.S. Geological Survey's 1995 National Assessment of oil and gas in US onshore and State offshore areas. When the same computational methods used in the 1995 Assessment are applied to more recent data, the 80-year (from 1997 through 2076) inferred reserve estimates for pre-1997 discoveries located in the lower 48 onshore and state offshore areas amounted to a total of 39.7 billion barrels of oil (BBO) and 293 trillion cubic feet (TCF) of gas. The 90% confidence interval about the oil estimate derived from the bootstrap approach is 22.4 BBO to 69.5 BBO. The comparable 90% confidence interval for the inferred gas reserve estimate is 217 TCF to 413 TCF. The 90% confidence interval describes the uncertainty that should be attached to the estimates. It also provides a basis for developing scenarios to explore the implications for energy policy analysis
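As a rough illustration of how a bootstrap confidence interval of this kind can be obtained, the following Python sketch applies a percentile bootstrap to a total. It is a minimal sketch under stated assumptions: the abstract does not say which bootstrap variant the authors used, the function name `percentile_bootstrap_ci` is hypothetical, and the per-field numbers are invented for illustration and are not USGS data.

```python
import random


def percentile_bootstrap_ci(data, statistic, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap confidence interval for statistic(data).

    Assumption: a plain percentile bootstrap; the study may use another variant.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Resample the observations with replacement, same size as the data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_resamples)
    hi_idx = min(n_resamples - 1, int((1.0 - alpha) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical per-field reserve additions (illustrative numbers only).
field_growth = [0.4, 1.2, 0.1, 3.5, 0.8, 2.2, 0.3, 5.1, 0.9, 1.7]
low, high = percentile_bootstrap_ci(field_growth, sum)
print(f"90% interval for the total: {low:.1f} to {high:.1f}")
```

The interval is typically asymmetric about the point estimate when the data are skewed, which matches the shape of the intervals quoted in the abstract (22.4 to 69.5 BBO around 39.7 BBO).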
DeepInfer: open-source deep learning deployment toolkit for image-guided therapy
Mehrtash, Alireza; Pesteie, Mehran; Hetherington, Jorden; Behringer, Peter A.; Kapur, Tina; Wells, William M.; Rohling, Robert; Fedorov, Andriy; Abolmaesumi, Purang
2017-03-01
Deep learning models have outperformed some of the previous state-of-the-art approaches in medical image analysis. Instead of using hand-engineered features, deep models attempt to automatically extract hierarchical representations at multiple levels of abstraction from the data. Therefore, deep models are usually considered to be more flexible and robust solutions for image analysis problems compared to conventional computer vision models. They have demonstrated significant improvements in computer-aided diagnosis and automatic medical image analysis applied to such tasks as image segmentation, classification and registration. However, deploying deep learning models often has a steep learning curve and requires detailed knowledge of various software packages. Thus, many deep models have not been integrated into the clinical research workflows, causing a gap between the state-of-the-art machine learning in medical applications and evaluation in clinical research procedures. In this paper, we propose "DeepInfer" - an open-source toolkit for developing and deploying deep learning models within the 3D Slicer medical image analysis platform. Utilizing a repository of task-specific models, DeepInfer allows clinical researchers and biomedical engineers to deploy a trained model selected from the public registry, and apply it to new data without the need for software development or configuration. As two practical use cases, we demonstrate the application of DeepInfer in prostate segmentation for targeted MRI-guided biopsy and identification of the target plane in 3D ultrasound for spinal injections.
Type Inference with Inequalities
DEFF Research Database (Denmark)
Schwartzbach, Michael Ignatieff
1991-01-01
Type inference can be phrased as constraint-solving over types. We consider an implicitly typed language equipped with recursive types, multiple inheritance, 1st order parametric polymorphism, and assignments. Type correctness is expressed as satisfiability of a possibly infinite collection of (monotonic) inequalities on the types of variables and expressions. A general result about systems of inequalities over semilattices yields a solvable form. We distinguish between deciding typability (the existence of solutions) and type inference (the computation of a minimal solution). In our case, both...
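The idea of computing a minimal solution to monotonic inequalities over a semilattice can be sketched with a simple fixed-point iteration. This is a toy illustration only, not the paper's algorithm: it assumes a finite powerset lattice (types modelled as sets of names, join as set union) rather than the recursive types the paper handles, and the function name `least_solution` and the constraint encoding are hypothetical.

```python
def least_solution(variables, constraints):
    """Least solution of constraints `target >= f(env)` over a powerset lattice.

    Each constraint is a pair (target, f), where f maps the current assignment
    to a frozenset and is assumed to be monotonic. On a finite lattice the
    iteration terminates at the least solution.
    """
    env = {v: frozenset() for v in variables}  # start at the bottom element
    changed = True
    while changed:
        changed = False
        for target, f in constraints:
            joined = env[target] | f(env)  # join = set union
            if joined != env[target]:
                env[target] = joined
                changed = True
    return env


constraints = [
    ("x", lambda env: frozenset({"int"})),   # x >= {int}
    ("y", lambda env: env["x"]),             # y >= x
    ("y", lambda env: frozenset({"bool"})),  # y >= {bool}
    ("z", lambda env: env["y"]),             # z >= y
]
solution = least_solution(["x", "y", "z"], constraints)
```

Here `solution` assigns `{int}` to `x` and `{int, bool}` to both `y` and `z`; any other assignment satisfying the four inequalities contains these sets.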
Assessing children's inference generation: what do tests of reading comprehension measure?
Bowyer-Crane, Claudine; Snowling, Margaret J
2005-06-01
Previous research suggests that children with specific comprehension difficulties have problems with the generation of inferences. This raises important questions as to whether poor comprehenders have poor comprehension skills generally, or whether their problems are confined to specific inference types. The main aims of the study were (a) using two commonly used tests of reading comprehension to classify the questions requiring the generation of inferences, and (b) to investigate the relative performance of skilled and less-skilled comprehenders on questions tapping different inference types. The performance of 10 poor comprehenders (mean age 110.06 months) was compared with the performance of 10 normal readers (mean age 112.78 months) on two tests of reading comprehension. A qualitative analysis of the NARA II (form 1) and the WORD comprehension subtest was carried out. Participants were then administered the NARA II, WORD comprehension subtest and a test of non-word reading. The NARA II was heavily reliant on the generation of knowledge-based inferences, while the WORD comprehension subtest was biased towards the retention of literal information. Children identified by the NARA II as having comprehension difficulties performed in the normal range on the WORD comprehension subtests. Further, children with comprehension difficulties performed poorly on questions requiring the generation of knowledge-based and elaborative inferences. However, they were able to answer questions requiring attention to literal information or use of cohesive devices at a level comparable to normal readers. Different reading tests tap different types of inferencing skills. Less-skilled comprehenders have particular difficulty applying real-world knowledge to a text during reading, and this has implications for the formulation of effective intervention strategies.
Disclosure-Protected Inference with Linked Microdata Using a Remote Analysis Server
Directory of Open Access Journals (Sweden)
Chipperfield James O.
2014-03-01
Large amounts of microdata are collected by data custodians in the form of censuses and administrative records. Often, data custodians will collect different information on the same individual. Many important questions can be answered by linking microdata collected by different data custodians. For this reason, there is very strong demand from analysts, within government, business, and universities, for linked microdata. However, many data custodians are legally obliged to ensure the risk of disclosing information about a person or organisation is acceptably low. Different authors have considered the problem of how to facilitate reliable statistical inference from analysis of linked microdata while ensuring that the risk of disclosure is acceptably low. This article considers the problem from the perspective of an Integrating Authority that, by definition, is trusted to link the microdata and to facilitate analysts’ access to the linked microdata via a remote server, which allows analysts to fit models and view the statistical output without being able to observe the underlying linked microdata. One disclosure risk that must be managed by an Integrating Authority is that one data custodian may use the microdata it supplied to the Integrating Authority and statistical output released from the remote server to disclose information about a person or organisation that was supplied by the other data custodian. This article considers analysis of only binary variables. The utility and disclosure risk of the proposed method are investigated both in a simulation and using a real example. This article shows that some popular protections against disclosure (dropping records, rounding regression coefficients or imposing restrictions on model selection) can be ineffective in the above setting.
Inferring Social Influence of Anti-Tobacco Mass Media Campaign.
Zhan, Qianyi; Zhang, Jiawei; Yu, Philip S; Emery, Sherry; Xie, Junyuan
2017-07-01
Anti-tobacco mass media campaigns are designed to influence tobacco users. It has been shown that campaigns produce changes in users' awareness, knowledge, and attitudes, and also produce meaningful behavior change in the audience. Anti-smoking television advertising is the most important part of the campaign. Meanwhile, successful online social networks are creating a new media environment; however, little is known about the relation between social conversations and anti-tobacco campaigns. This paper aims to infer the social influence of these campaigns, and the problem is formally referred to as the Social Influence inference of anti-Tobacco mass mEdia campaigns (Site) problem. To address the Site problem, a novel influence inference framework, TV advertising social influence estimation (Asie), is proposed based on our analysis of two real anti-tobacco campaigns. Asie divides audience attitudes toward TV ads into three distinct stages: 1) cognitive; 2) affective; and 3) conative. Audience online reactions at each of these three stages are depicted by Asie with specific probabilistic models based on the synergistic influences from both online social friends and offline TV ads. Extensive experiments demonstrate the effectiveness of Asie.
Inference in models with adaptive learning
Chevillon, G.; Massmann, M.; Mavroeidis, S.
2010-01-01
Identification of structural parameters in models with adaptive learning can be weak, causing standard inference procedures to become unreliable. Learning also induces persistent dynamics, and this makes the distribution of estimators and test statistics non-standard. Valid inference can be
Bayesian inference with information content model check for Langevin equations
DEFF Research Database (Denmark)
Krog, Jens F. C.; Lomholt, Michael Andersen
2017-01-01
The Bayesian data analysis framework has been proven to be a systematic and effective method of parameter inference and model selection for stochastic processes. In this work we introduce an information content model check which may serve as a goodness-of-fit, like the chi-square procedure...
International Nuclear Information System (INIS)
Donker, H.C.; Katsnelson, M.I.; De Raedt, H.; Michielsen, K.
2016-01-01
The logical inference approach to quantum theory, proposed earlier by De Raedt et al. (2014), is considered in a relativistic setting. It is shown that the Klein–Gordon equation for a massive, charged, and spinless particle derives from the combination of the requirements that the space–time data collected by probing the particle is obtained from the most robust experiment and that on average, the classical relativistic equation of motion of a particle holds. - Highlights: • Logical inference applied to relativistic, massive, charged, and spinless particle experiments leads to the Klein–Gordon equation. • The relativistic Hamilton–Jacobi is scrutinized by employing a field description for the four-velocity. • Logical inference allows analysis of experiments with uncertainty in detection events and experimental conditions.
Asymptotic inference for waiting times and patiences in queues with abandonment
DEFF Research Database (Denmark)
Gorst-Rasmussen, Anders; Hansen, Martin Bøgsted
2009-01-01
Motivated by applications in call center management, we propose a framework based on empirical process techniques for inference about waiting time and patience distributions in multiserver queues with abandonment. The framework rigorises heuristics based on survival analysis of independent...
Fiducial inference - A Neyman-Pearson interpretation
Salome, D; VonderLinden, W; Dose,; Fischer, R; Preuss, R
1999-01-01
Fisher's fiducial argument is a tool for deriving inferences in the form of a probability distribution on the parameter space, not based on Bayes's Theorem. Lindley established that in exceptional situations fiducial inferences coincide with posterior distributions; in the other situations fiducial
Uncertainty in prediction and in inference
Hilgevoord, J.; Uffink, J.
1991-01-01
The concepts of uncertainty in prediction and inference are introduced and illustrated using the diffraction of light as an example. The close relationship between the concepts of uncertainty in inference and resolving power is noted. A general quantitative measure of uncertainty in
Pettengill, James B; Moeller, David A
2012-09-01
The origins of hybrid zones between parapatric taxa have been of particular interest for understanding the evolution of reproductive isolation and the geographic context of species divergence. One challenge has been to distinguish between allopatric divergence (followed by secondary contact) versus primary intergradation (parapatric speciation) as alternative divergence histories. Here, we use complementary phylogeographic and population genetic analyses to investigate the recent divergence of two subspecies of Clarkia xantiana and the formation of a hybrid zone within the narrow region of sympatry. We tested alternative phylogeographic models of divergence using approximate Bayesian computation (ABC) and found strong support for a secondary contact model and little support for a model allowing for gene flow throughout the divergence process (i.e. primary intergradation). Two independent methods for inferring the ancestral geography of each subspecies, one based on probabilistic character state reconstructions and the other on palaeo-distribution modelling, also support a model of divergence in allopatry and range expansion leading to secondary contact. The membership of individuals to genetic clusters suggests geographic substructure within each taxon where allopatric and sympatric samples are primarily found in separate clusters. We also observed coincidence and concordance of genetic clines across three types of molecular markers, which suggests that there is a strong barrier to gene flow. Taken together, our results provide evidence for allopatric divergence followed by range expansion leading to secondary contact. The location of refugial populations and the directionality of range expansion are consistent with expectations based on climate change since the last glacial maximum. Our approach also illustrates the utility of combining phylogeographic hypothesis testing with species distribution modelling and fine-scale population genetic analyses for inferring
Bertola, L. D.; Jongbloed, H.; van der Gaag, K. J.; de Knijff, P.; Yamaguchi, N.; Hooghiemstra, H.; Bauer, H.; Henschel, P.; White, P. A.; Driscoll, C. A.; Tende, T.; Ottosson, U.; Saidu, Y.; Vrieling, K.; de Iongh, H. H.
2016-08-01
Comparative phylogeography of African savannah mammals shows a congruent pattern in which populations in West/Central Africa are distinct from populations in East/Southern Africa. However, for the lion, all African populations are currently classified as a single subspecies (Panthera leo leo), while the only remaining population in Asia is considered to be distinct (Panthera leo persica). This distinction is disputed both by morphological and genetic data. In this study we introduce the lion as a model for African phylogeography. Analyses of mtDNA sequences reveal six supported clades and a strongly supported ancestral dichotomy with northern populations (West Africa, Central Africa, North Africa/Asia) on one branch, and southern populations (North East Africa, East/Southern Africa and South West Africa) on the other. We review taxonomies and phylogenies of other large savannah mammals, illustrating that similar clades are found in other species. The described phylogeographic pattern is considered in relation to large scale environmental changes in Africa over the past 300,000 years, attributable to climate. Refugial areas, predicted by climate envelope models, further confirm the observed pattern. We support the revision of current lion taxonomy, as recognition of a northern and a southern subspecies is more parsimonious with the evolutionary history of the lion.
Asymptotic inference for waiting times and patiences in queues with abandonment
DEFF Research Database (Denmark)
Gorst-Rasmussen, Anders; Hansen, Martin Bøgsted
Motivated by applications in call center management, we propose a framework based on empirical process techniques for inference about the waiting time and patience distribution in multiserver queues with abandonment. The framework rigorises heuristics based on survival analysis of independent...
Polynomial Chaos Surrogates for Bayesian Inference
Le Maitre, Olivier
2016-01-06
Bayesian inference is a popular probabilistic method to solve inverse problems, such as the identification of a field parameter in a PDE model. The inference relies on Bayes' rule to update the prior density of the sought field from observations and derive its posterior distribution. In most cases the posterior distribution has no explicit form and has to be sampled, for instance using a Markov chain Monte Carlo method. In practice the prior field parameter is decomposed and truncated (e.g. by means of a Karhunen-Loève decomposition) to recast the inference problem into the inference of a finite number of coordinates. Although proved effective in many situations, Bayesian inference as sketched above faces several difficulties requiring improvements. First, sampling the posterior can be an extremely costly task, as it requires multiple resolutions of the PDE model for different values of the field parameter. Second, when the observations are not very informative, the inferred parameter field can depend strongly on its prior, which can be somewhat arbitrary. These issues have motivated the introduction of reduced modeling or surrogates for the (approximate) determination of the parametrized PDE solution, and of hyperparameters in the description of the prior field. Our contribution focuses on recent developments in these two directions: the acceleration of the posterior sampling by means of Polynomial Chaos expansions and the efficient treatment of parametrized covariance functions for the prior field. We also discuss the possibility of making such an approach adaptive to further improve its efficiency.
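The cost issue described above can be seen in a minimal random-walk Metropolis sketch in Python, where every proposed sample requires one evaluation of the forward model. Everything in it is an assumption made for illustration: the scalar parameter, the cheap stand-in `forward_model` (a real application would call a PDE solver or, as the abstract proposes, a Polynomial Chaos surrogate at that point), the Gaussian prior and noise model, and the synthetic observations.

```python
import math
import random


def forward_model(theta):
    # Stand-in for the costly PDE solve; a surrogate would replace this call.
    return theta ** 3 + theta


def log_posterior(theta, observations, noise_std=0.5, prior_std=2.0):
    # Assumed Gaussian prior on theta and Gaussian observation noise.
    log_prior = -0.5 * (theta / prior_std) ** 2
    prediction = forward_model(theta)
    log_like = sum(-0.5 * ((y - prediction) / noise_std) ** 2 for y in observations)
    return log_prior + log_like


def metropolis(observations, n_steps=5000, step=0.3, seed=1):
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, observations)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, observations)  # one model evaluation
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


observations = [2.1, 1.9, 2.0]  # synthetic data, roughly forward_model(1.0)
samples = metropolis(observations)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

With 5000 steps the chain calls `forward_model` 5001 times, which is why replacing the solver by a cheap surrogate can dominate the overall speed-up.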
Interactive Instruction in Bayesian Inference
DEFF Research Database (Denmark)
Khan, Azam; Breslav, Simon; Hornbæk, Kasper
2018-01-01
An instructional approach is presented to improve human performance in solving Bayesian inference problems. Starting from the original text of the classic Mammography Problem, the textual expression is modified and visualizations are added according to Mayer’s principles of instruction. These principles concern coherence, personalization, signaling, segmenting, multimedia, spatial contiguity, and pretraining. Principles of self-explanation and interactivity are also applied. Four experiments on the Mammography Problem showed that these principles help participants answer the questions... that an instructional approach to improving human performance in Bayesian inference is a promising direction.
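For readers unfamiliar with the Mammography Problem, the calculation it asks for is a single application of Bayes' rule, sketched below in Python. The numbers are the values commonly quoted for this textbook problem (1% prevalence, 80% sensitivity, 9.6% false-positive rate) and are an assumption here; the abstract does not state which figures the experiments used. The function name is hypothetical.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' rule."""
    true_pos = prior * sensitivity                  # P(positive and condition)
    false_pos = (1.0 - prior) * false_positive_rate  # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


# Assumed, commonly quoted values for the classic problem statement.
p = posterior_probability(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"P(cancer | positive) = {p:.3f}")  # prints "P(cancer | positive) = 0.078"
```

The answer of roughly 8% is far lower than most untrained participants estimate, which is the performance gap that instructional approaches of this kind aim to close.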
Inferring Phylogenetic Networks Using PhyloNet.
Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay
2018-07-01
PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.
Active inference and learning.
Friston, Karl; FitzGerald, Thomas; Rigoli, Francesco; Schwartenbeck, Philipp; O'Doherty, John; Pezzulo, Giovanni
2016-09-01
This paper offers an active inference account of choice behaviour and learning. It focuses on the distinction between goal-directed and habitual behaviour and how they contextualise each other. We show that habits emerge naturally (and autodidactically) from sequential policy optimisation when agents are equipped with state-action policies. In active inference, behaviour has explorative (epistemic) and exploitative (pragmatic) aspects that are sensitive to ambiguity and risk respectively, where epistemic (ambiguity-resolving) behaviour enables pragmatic (reward-seeking) behaviour and the subsequent emergence of habits. Although goal-directed and habitual policies are usually associated with model-based and model-free schemes, we find the more important distinction is between belief-free and belief-based schemes. The underlying (variational) belief updating provides a comprehensive (if metaphorical) process theory for several phenomena, including the transfer of dopamine responses, reversal learning, habit formation and devaluation. Finally, we show that active inference reduces to a classical (Bellman) scheme, in the absence of ambiguity. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Active Inference, homeostatic regulation and adaptive behavioural control.
Pezzulo, Giovanni; Rigoli, Francesco; Friston, Karl
2015-11-01
We review a theory of homeostatic regulation and adaptive behavioural control within the Active Inference framework. Our aim is to connect two research streams that are usually considered independently; namely, Active Inference and associative learning theories of animal behaviour. The former uses a probabilistic (Bayesian) formulation of perception and action, while the latter calls on multiple (Pavlovian, habitual, goal-directed) processes for homeostatic and behavioural control. We offer a synthesis of these classical processes and cast them as successive hierarchical contextualisations of sensorimotor constructs, using the generative models that underpin Active Inference. This dissolves any apparent mechanistic distinction between the optimization processes that mediate classical control or learning. Furthermore, we generalize the scope of Active Inference by emphasizing interoceptive inference and homeostatic regulation. The ensuing homeostatic (or allostatic) perspective provides an intuitive explanation for how priors act as drives or goals to enslave action, and emphasises the embodied nature of inference. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Human brain lesion-deficit inference remapped.
Mah, Yee-Haur; Husain, Masud; Rees, Geraint; Nachev, Parashkev
2014-09-01
Our knowledge of the anatomical organization of the human brain in health and disease draws heavily on the study of patients with focal brain lesions. Historically the first method of mapping brain function, it is still potentially the most powerful, establishing the necessity of any putative neural substrate for a given function or deficit. Great inferential power, however, carries a crucial vulnerability: without stronger alternatives any consistent error cannot be easily detected. A hitherto unexamined source of such error is the structure of the high-dimensional distribution of patterns of focal damage, especially in ischaemic injury (the commonest aetiology in lesion-deficit studies), where the anatomy is naturally shaped by the architecture of the vascular tree. This distribution is so complex that analysis of lesion data sets of conventional size cannot illuminate its structure, leaving us in the dark about the presence or absence of such error. To examine this crucial question we assembled the largest known set of focal brain lesions (n = 581), derived from unselected patients with acute ischaemic injury (mean age = 62.3 years, standard deviation = 17.8, male:female ratio = 0.547), visualized with diffusion-weighted magnetic resonance imaging, and processed with validated automated lesion segmentation routines. High-dimensional analysis of this data revealed a hidden bias within the multivariate patterns of damage that will consistently distort lesion-deficit maps, displacing inferred critical regions from their true locations, in a manner opaque to replication. Quantifying the size of this mislocalization demonstrates that past lesion-deficit relationships estimated with conventional inferential methodology are likely to be significantly displaced, by a magnitude dependent on the unknown underlying lesion-deficit relationship itself. Past studies therefore cannot be retrospectively corrected, except by new knowledge that would render them redundant
Generative Inferences Based on Learned Relations
Chen, Dawn; Lu, Hongjing; Holyoak, Keith J.
2017-01-01
A key property of relational representations is their "generativity": From partial descriptions of relations between entities, additional inferences can be drawn about other entities. A major theoretical challenge is to demonstrate how the capacity to make generative inferences could arise as a result of learning relations from…
Ekki Syamsulhakim
2008-01-01
This paper aims to explore a fairly recent topic in applied microeconometrics, namely the effect of sampling design on statistical inference, especially in binary outcome models. Much theoretical research in econometrics has shown that it is inappropriate to apply statistical analysis that assumes i.i.d. data to non-i.i.d. data. These studies have provided proofs showing that applying i.i.d.-assumed analysis to non-i.i.d. observations would result in inflated standard errors which could make the esti...
Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.
2009-01-01
Background: Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori assumptions about the interactions, which all simulate the observed patterns. It is important to analyze the properties of the circuits. Findings: We have analyzed the simulated gene expression ...
Sensitivity of Inferred Electron Temperature from X-ray Emission of NIF Cryogenic DT Implosions
Energy Technology Data Exchange (ETDEWEB)
Klem, Michael [Univ. of Dallas, Irving, TX (United States)
2015-05-01
The National Ignition Facility (NIF) at the Lawrence Livermore National Laboratory seeks to achieve thermonuclear ignition through inertial confinement fusion. The accurate assessment of the performance of each implosion experiment is a crucial step. Here we report on work to derive a reliable electron temperature for the cryogenic deuterium-tritium implosions completed on the NIF using the x-ray signal from the Ross filter diagnostic. These x-rays are dominated by bremsstrahlung emission. By fitting the x-ray signal measured through each of the individual Ross filters, the source bremsstrahlung spectrum can be inferred, and an electron temperature of the implosion hot spot inferred. Currently, each filter is weighted equally in this analysis. We present work quantifying the errors with such a technique and the results from investigating the contribution of each filter to the overall accuracy of the temperature inference. Using this research, we also compare the inferred electron temperature against other measured implosion quantities to develop a more complete understanding of the hot-spot physics.
Parametric statistical inference: basic theory and modern approaches
Zacks, Shelemyahu; Tsokos, C P
1981-01-01
Parametric Statistical Inference: Basic Theory and Modern Approaches presents the developments and modern trends in statistical inference to students who do not have advanced mathematical and statistical preparation. The topics discussed in the book are basic and common to many fields of statistical inference and thus serve as a jumping board for in-depth study. The book is organized into eight chapters. Chapter 1 provides an overview of how the theory of statistical inference is presented in subsequent chapters. Chapter 2 briefly discusses statistical distributions and their properties. Chapt
Segmentation, Inference and Classification of Partially Overlapping Nanoparticles
Chiwoo Park,
2013-03-01
This paper presents a method that enables automated morphology analysis of partially overlapping nanoparticles in electron micrographs. In the undertaking of morphology analysis, three tasks appear necessary: separate individual particles from an agglomerate of overlapping nano-objects; infer the particle's missing contours; and ultimately, classify the particles by shape based on their complete contours. Our specific method adopts a two-stage approach: the first stage executes the task of particle separation, and the second stage conducts simultaneously the tasks of contour inference and shape classification. For the first stage, a modified ultimate erosion process is developed for decomposing a mixture of particles into markers, and then, an edge-to-marker association method is proposed to identify the set of evidences that eventually delineate individual objects. We also provided theoretical justification regarding the separation capability of the first stage. In the second stage, the set of evidences become inputs to a Gaussian mixture model on B-splines, the solution of which leads to the joint learning of the missing contour and the particle shape. Using twelve real electron micrographs of overlapping nanoparticles, we compare the proposed method with seven state-of-the-art methods. The results show the superiority of the proposed method in terms of particle recognition rate.
Jordan, Stephen A.; Simon, C.; Foote, D.; Englund, R.A.
2005-01-01
The Pleistocene geological history of the Hawaiian Islands is becoming well understood. Numerous predictions about the influence of this history on the genetic diversity of Hawaiian organisms have been made, including the idea that changing sea levels would lead to the genetic differentiation of populations isolated on individual volcanoes during high sea stands. Here, we analyse DNA sequence data from two closely related, endemic Hawaiian damselfly species in order to test these predictions, and generate novel insights into the effects of Pleistocene glaciation and climate change on island organisms. Megalagrion xanthomelas and Megalagrion pacificum are currently restricted to five islands, including three islands of the Maui Nui super-island complex (Molokai, Lanai, and Maui) that were connected during periods of Pleistocene glaciation, and Hawaii island, which has never been subdivided. Maui Nui and Hawaii are effectively a controlled, natural experiment on the genetic effects of Pleistocene sea level change. We confirm well-defined morphological species boundaries using data from the nuclear EF-1α gene and show that the species are reciprocally monophyletic. We perform phylogeographic analyses of 663 base pairs (bp) of cytochrome oxidase subunit II (COII) gene sequence data from 157 individuals representing 25 populations. Our results point to the importance of Pleistocene land bridges and historical island habitat availability in maintaining inter-island gene flow. We also propose that repeated bottlenecks on Maui Nui caused by sea level change and restricted habitat availability are likely responsible for low genetic diversity there. An island analogue to northern genetic purity and southern diversity is proposed, whereby islands with little suitable habitat exhibit genetic purity while islands with more exhibit genetic diversity. © 2005 Blackwell Publishing Ltd.
Variational inference & deep learning: A new synthesis
Kingma, D.P.
2017-01-01
In this thesis, Variational Inference and Deep Learning: A New Synthesis, we propose novel solutions to the problems of variational (Bayesian) inference, generative modeling, representation learning, semi-supervised learning, and stochastic optimization.
Bottlenecks and Hubs in Inferred Networks Are Important for Virulence in Salmonella typhimurium
Energy Technology Data Exchange (ETDEWEB)
McDermott, Jason E.; Taylor, Ronald C.; Yoon, Hyunjin; Heffron, Fred
2009-02-01
Recent advances in experimental methods have provided sufficient data to consider systems as large networks of interconnected components. High-throughput determination of protein-protein interaction networks has led to the observation that topological bottlenecks, that is proteins defined by high centrality in the network, are enriched in proteins with systems-level phenotypes such as essentiality. Global transcriptional profiling by microarray analysis has been used extensively to characterize systems, for example, cellular response to environmental conditions and genetic mutations. These transcriptomic datasets have been used to infer regulatory and functional relationship networks based on co-regulation. We use the context likelihood of relatedness (CLR) method to infer networks from two datasets gathered from the pathogen Salmonella typhimurium; one under a range of environmental culture conditions and the other from deletions of 15 regulators found to be essential in virulence. Bottleneck nodes were identified from these inferred networks and we show that these nodes are significantly more likely to be essential for virulence than their non-bottleneck counterparts. A network generated using Pearson correlation did not display this behavior. Overall this study demonstrates that topology of networks inferred from global transcriptional profiles provides information about the systems-level roles of bottleneck genes. Analysis of the differences between the two CLR-derived networks suggests that the bottleneck nodes are either mediators of transitions between system states or sentinels that reflect the dynamics of these transitions.
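The scoring step of the context likelihood of relatedness (CLR) method named above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes a symmetric mutual-information matrix has already been computed, it excludes the diagonal when estimating each gene's background (a choice made here for simplicity), and the function name and toy matrix are hypothetical.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Turn a symmetric mutual-information matrix into CLR scores.

    Each MI value is z-scored against the background distribution of MI
    values for both genes involved, negative z-scores are clipped to zero,
    and the two z-scores are combined as sqrt(z_i**2 + z_j**2).
    """
    n = len(mi)
    mu, sigma = [], []
    for i in range(n):
        background = [mi[i][j] for j in range(n) if j != i]
        mu.append(mean(background))
        sigma.append(pstdev(background))

    def z(i, j):
        # A gene with a flat background carries no contrast information.
        if sigma[i] == 0:
            return 0.0
        return max(0.0, (mi[i][j] - mu[i]) / sigma[i])

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
            scores[i][j] = scores[j][i] = s
    return scores


# Hypothetical MI values for four genes; genes 0 and 1 are strongly related.
toy_mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
scores = clr_scores(toy_mi)
for row in scores:
    print([round(v, 2) for v in row])
```

Thresholding the resulting scores gives an inferred network, on which a centrality measure can then be computed to pick out candidate bottleneck nodes; that step is not shown here.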
DEFF Research Database (Denmark)
Sangula, Abraham K.; Belsham, Graham; Muwanika, Vincent B.
2010-01-01
Background: In East Africa, foot-and-mouth disease virus serotype SAT 1 is responsible for occasional severe outbreaks in livestock and is known to be maintained within the buffalo populations. Little is known about the evolutionary forces underlying its epidemiology in the region. To enhance our appreciation of the epidemiological status of serotype SAT 1 virus in the region, we inferred its evolutionary and phylogeographic history by means of genealogy-based coalescent methods using 53 VP1 coding sequences covering a sampling period from 1948-2007. Results: The VP1 coding sequence of 11 serotype SAT 1 FMD viruses from East Africa has been determined and compared with known sequences derived from other SAT 1 viruses from sub-Saharan Africa. Purifying (negative) selection and low substitution rates characterized the SAT 1 virus isolates in East Africa. Two virus groups with probable independent...
Directory of Open Access Journals (Sweden)
Zhou Fan
Full Text Available Geographic distance and geographical barriers likely play a considerable role in structuring genetic variation in species, although some migratory species may have less phylogeographic structure on a smaller spatial scale. Here, genetic diversity and the phylogenetic structure among geographical populations of the yellow-spined bamboo locust, Ceracris kiangsu, were examined with 16S rDNA and amplified fragment length polymorphisms (AFLPs). In this study, no conspicuous phylogeographical structure was discovered from either Maximum parsimony (MP) or Neighbor-joining (NJ) phylogenetic analyses. The effect of geographical isolation was not conspicuous on a large spatial scale. At smaller spatial scales, local diversity of some populations within mountainous areas was detected using Nei's genetic distance and AMOVA. There is a high level of genetic diversity and low genetic differentiation among populations of C. kiangsu in South and Southeast China. Our analyses indicate that C. kiangsu is a monophyletic group. Our results also support the hypothesis that the C. kiangsu population is in a primary differentiation stage. Given the mismatch distribution, it is likely that a population expansion in C. kiangsu occurred about 0.242 Ma during the Quaternary interglaciation. Based on historical reports, we conjecture that human activities had significant impacts on the C. kiangsu gene flow.
Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity
Pecevski, Dejan
2016-01-01
Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such a neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent through their firing current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference. PMID:27419214
Burns, Kevin J; Barhoum, Dino N
2006-01-01
The phylogeography of a variety of species has been studied within the California Floristic Province; however, few studies have examined genetic variation in bird species across the entire region. This study uses mitochondrial DNA data to investigate the phylogeography of the wrentit (Chamaea fasciata), a sedentary bird native to scrub and chaparral habitats of this region. Analysis of molecular variance shows geographic structure, and maximum likelihood, Bayesian, and parsimony analyses consistently identify six main clades that are each restricted geographically. Nested clade phylogeographic analyses infer an overall range expansion for the entire cladogram, and a range expansion is also inferred from the mismatch distribution. Thus, our results suggest that the wrentit was isolated into southern refugia during the Pleistocene and has undergone a recent range expansion. Southern refugia and a range expansion were also identified in a previous study of the California thrasher (Toxostoma redivivum). The wrentit did not show marked divergence between northern and southern California defined by the Transverse Ranges, a pattern seen in a variety of other taxa within this region, including some birds.
Ensemble stacking mitigates biases in inference of synaptic connectivity
Directory of Open Access Journals (Sweden)
Brendan Chambers
2018-03-01
Full Text Available A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches. Mapping the routing of spikes through local circuitry is crucial for understanding neocortical computation. Under appropriate experimental conditions, these maps can be used to infer likely patterns of synaptic recruitment, linking activity to underlying anatomical connections. Such inferences help to reveal the synaptic implementation of population dynamics and computation. We compare a number of standard functional measures to infer underlying connectivity. We find that regularization impacts measures
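The idea of forming ensemble predictions as a linear combination of several inference methods can be illustrated with a short sketch. The abstract does not specify how scores are scaled or how weights are obtained, so the z-score standardization, the method names, the weights, and the scores below are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one method's scores so that methods share a common scale."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def ensemble_scores(method_scores, weights):
    """Weighted linear combination of standardized per-pair scores.

    method_scores maps a method name to a list of scores, one per candidate
    neuron pair; weights maps the same names to their coefficients.
    """
    standardized = {name: zscore(vals) for name, vals in method_scores.items()}
    n_pairs = len(next(iter(standardized.values())))
    return [
        sum(weights[name] * standardized[name][k] for name in standardized)
        for k in range(n_pairs)
    ]


# Hypothetical scores for four candidate connections from two methods.
method_scores = {
    "mutual_information": [0.9, 0.2, 0.1, 0.3],
    "lagged_correlation": [5.0, 1.0, 4.0, 0.5],
}
# Hypothetical weights; in practice these would be fit against ground truth.
weights = {"mutual_information": 0.5, "lagged_correlation": 0.5}

combined = ensemble_scores(method_scores, weights)
best_pair = combined.index(max(combined))
print("ensemble scores:", [round(s, 2) for s in combined])
print("most likely connection: pair", best_pair)
```

Standardizing before combining keeps one method's numeric range from dominating the sum; other scalings would serve the same purpose.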
Constraint Satisfaction Inference : Non-probabilistic Global Inference for Sequence Labelling
Canisius, S.V.M.; van den Bosch, A.; Daelemans, W.; Basili, R.; Moschitti, A.
2006-01-01
We present a new method for performing sequence labelling based on the idea of using a machine-learning classifier to generate several possible output sequences, and then applying an inference procedure to select the best sequence among those. Most sequence labelling methods following a similar
Reasoning about Informal Statistical Inference: One Statistician's View
Rossman, Allan J.
2008-01-01
This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…
PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface.
Uszkoreit, Julian; Maerkens, Alexandra; Perez-Riverol, Yasset; Meyer, Helmut E; Marcus, Katrin; Stephan, Christian; Kohlbacher, Oliver; Eisenacher, Martin
2015-07-02
Protein inference connects the peptide spectrum matches (PSMs) obtained from database search engines back to proteins, which are typically at the heart of most proteomics studies. Different search engines yield different PSMs and thus different protein lists. Analysis of results from one or multiple search engines is often hampered by different data exchange formats and lack of convenient and intuitive user interfaces. We present PIA, a flexible software suite for combining PSMs from different search engine runs and turning these into consistent results. PIA can be integrated into proteomics data analysis workflows in several ways. A user-friendly graphical user interface can be run either locally or (e.g., for larger core facilities) from a central server. For automated data processing, stand-alone tools are available. PIA implements several established protein inference algorithms and can combine results from different search engines seamlessly. On several benchmark data sets, we show that PIA can identify a larger number of proteins at the same protein FDR when compared to that using inference based on a single search engine. PIA supports the majority of established search engines and data in the mzIdentML standard format. It is implemented in Java and freely available at https://github.com/mpc-bioinformatics/pia.
Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew
2007-04-01
One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. What is lacking, however, is a concrete experimental comparison of the hybrid framework with traditional fusion methods to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical comparison of the accuracy and performance of hybrid network theory with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain-specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed to quantify the benefit of hybrid inference over other fusion tools.
Systematic inference of functional phosphorylation events in yeast metabolism.
Chen, Yu; Wang, Yonghong; Nielsen, Jens
2017-07-01
Protein phosphorylation is a post-translational modification that affects proteins by changing their structure and conformation in a rapid and reversible way, and it is an important mechanism for metabolic regulation in cells. Phosphoproteomics enables high-throughput identification of phosphorylation events on metabolic enzymes, but identifying functional phosphorylation events still requires more detailed biochemical characterization. Therefore, development of computational methods for investigating unknown functions of a large number of phosphorylation events identified by phosphoproteomics has received increased attention. We developed a mathematical framework that describes the relationship between phosphorylation level of a metabolic enzyme and the corresponding flux through the enzyme. Using this framework, it is possible to quantitatively estimate contribution of phosphorylation events to flux changes. We showed that phosphorylation regulation analysis, combined with a systematic workflow and correlation analysis, can be used for inference of functional phosphorylation events in steady and dynamic conditions, respectively. Using this analysis, we assigned functionality to phosphorylation events of 17 metabolic enzymes in the yeast Saccharomyces cerevisiae, among which 10 are novel. Phosphorylation regulation analysis can not only be extended to the inference of other functional post-translational modifications but also serve as a promising scaffold for multi-omics data integration in systems biology. Matlab codes for flux balance analysis in this study are available in Supplementary material. Contact: yhwang@ecust.edu.cn or nielsenj@chalmers.se. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
Hadwin, Paul J.; Sipkens, T. A.; Thomson, K. A.; Liu, F.; Daun, K. J.
2016-01-01
Auto-correlated laser-induced incandescence (AC-LII) infers the soot volume fraction (SVF) of soot particles by comparing the spectral incandescence from laser-energized particles to the pyrometrically inferred peak soot temperature. This calculation requires detailed knowledge of model parameters such as the absorption function of soot, which may vary with combustion chemistry, soot age, and the internal structure of the soot. This work presents a Bayesian methodology to quantify such uncertainties. This technique treats the additional "nuisance" model parameters, including the soot absorption function, as stochastic variables and incorporates the current state of knowledge of these parameters into the inference process through maximum entropy priors. While standard AC-LII analysis provides a point estimate of the SVF, Bayesian techniques infer the posterior probability density, which will allow scientists and engineers to better assess the reliability of AC-LII inferred SVFs in the context of environmental regulations and competing diagnostics.
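Treating a nuisance parameter as a stochastic variable and inferring a posterior density instead of a point estimate can be illustrated with a small grid computation. The sketch below is a toy model and not the AC-LII measurement equations: it assumes the observed signal equals the product of the soot volume fraction and the absorption function plus Gaussian noise, uses a Gaussian prior on the absorption function (the maximum-entropy choice when only a mean and variance are specified), puts a flat prior on the volume fraction, and all numerical values and function names are hypothetical.

```python
import math


def grid(lo, hi, n):
    """Return n evenly spaced points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def gaussian(x, mu, sd):
    """Unnormalized Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2)


def svf_posterior(y_obs, noise_sd, svf_grid, e_grid, e_mean, e_sd):
    """Posterior mass over svf_grid with the nuisance parameter summed out."""
    post = []
    for f in svf_grid:
        total = 0.0
        for e in e_grid:
            likelihood = gaussian(y_obs, f * e, noise_sd)  # toy forward model
            prior_e = gaussian(e, e_mean, e_sd)            # nuisance prior
            total += likelihood * prior_e
        post.append(total)
    norm = sum(post)
    return [p / norm for p in post]


# Hypothetical values in arbitrary units.
svf_grid = grid(0.2, 2.0, 181)
e_grid = grid(0.2, 0.5, 61)
post = svf_posterior(y_obs=0.35, noise_sd=0.01,
                     svf_grid=svf_grid, e_grid=e_grid,
                     e_mean=0.35, e_sd=0.05)

mode = svf_grid[max(range(len(post)), key=post.__getitem__)]
print(f"posterior mode of the volume fraction: {mode:.2f}")
```

The width of the resulting posterior reflects uncertainty in the absorption function as well as measurement noise, which a point estimate alone would not convey.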
Sparse linear models: Variational approximate inference and Bayesian experimental design
International Nuclear Information System (INIS)
Seeger, Matthias W
2009-01-01
A wide range of problems such as signal reconstruction, denoising, source separation, feature selection, and graphical model search are addressed today by posterior maximization for linear models with sparsity-favouring prior distributions. The Bayesian posterior contains useful information far beyond its mode, which can be used to drive methods for sampling optimization (active learning), feature relevance ranking, or hyperparameter estimation, if only this representation of uncertainty can be approximated in a tractable manner. In this paper, we review recent results for variational sparse inference, and show that they share underlying computational primitives. We discuss how sampling optimization can be implemented as sequential Bayesian experimental design. While there has been tremendous recent activity to develop sparse estimation, little attention has been given to sparse approximate inference. In this paper, we argue that many problems in practice, such as compressive sensing for real-world image reconstruction, are served much better by proper uncertainty approximations than by ever more aggressive sparse estimation algorithms. Moreover, since some variational inference methods have been given strong convex optimization characterizations recently, theoretical analysis may become possible, promising new insights into nonlinear experimental design.
Sparse linear models: Variational approximate inference and Bayesian experimental design
Energy Technology Data Exchange (ETDEWEB)
Seeger, Matthias W [Saarland University and Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbruecken (Germany)
2009-12-01
A wide range of problems such as signal reconstruction, denoising, source separation, feature selection, and graphical model search are addressed today by posterior maximization for linear models with sparsity-favouring prior distributions. The Bayesian posterior contains useful information far beyond its mode, which can be used to drive methods for sampling optimization (active learning), feature relevance ranking, or hyperparameter estimation, if only this representation of uncertainty can be approximated in a tractable manner. In this paper, we review recent results for variational sparse inference, and show that they share underlying computational primitives. We discuss how sampling optimization can be implemented as sequential Bayesian experimental design. While there has been tremendous recent activity to develop sparse estimation, little attention has been given to sparse approximate inference. In this paper, we argue that many problems in practice, such as compressive sensing for real-world image reconstruction, are served much better by proper uncertainty approximations than by ever more aggressive sparse estimation algorithms. Moreover, since some variational inference methods have been given strong convex optimization characterizations recently, theoretical analysis may become possible, promising new insights into nonlinear experimental design.
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.
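The Fisher-style argument described in this abstract (too many outcomes in one direction for that direction to be plausibly attributed to chance) can be sketched as a one-sided sign test. This is a minimal illustration only; the function name and the example counts are assumptions and do not come from the paper.

```python
from math import comb

def sign_test_p_value(successes: int, trials: int) -> float:
    """Probability of at least `successes` outcomes in one direction
    out of `trials`, if each direction is equally likely (chance = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2 ** trials

# Hypothetical example: 9 of 10 cases go in the same direction.
p_value = sign_test_p_value(9, 10)
print(p_value)
```

A small value is then read as a sign that the chance hypothesis is unlikely, which is the informal step the abstract defends by the standards of everyday reasoning rather than by formal logic.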
Pillow, Bradford H; Pearson, Raeanne M; Hecht, Mary; Bremer, Amanda
2010-01-01
Children and adults rated their own certainty following inductive inferences, deductive inferences, and guesses. Beginning in kindergarten, participants rated deductions as more certain than weak inductions or guesses. Deductions were rated as more certain than strong inductions beginning in Grade 3, and fourth-grade children and adults differentiated strong inductions, weak inductions, and informed guesses from pure guesses. By Grade 3, participants also gave different types of explanations for their deductions and inductions. These results are discussed in relation to children's concepts of cognitive processes, logical reasoning, and epistemological development.
Directory of Open Access Journals (Sweden)
T. DELI
2016-07-01
This study focuses on the population genetic structure of the green crab Carcinus aestuarii along part of the African Mediterranean coast, with the main aim of confirming genetic subdivision across the well-documented genetic boundary of the Siculo-Tunisian Strait. For this purpose, the mitochondrial COI (cytochrome oxidase I) gene and five polymorphic microsatellite loci were analysed in 144 and 120 specimens, respectively. Our results show the existence of two distinct haplogroups, separated by 16 mutational steps, and reveal a non-random distribution of the genetic variation along the African Mediterranean coast. Dating analyses, based on the use of different molecular clock models and rates, placed the divergence between the two haplogroups at 1.91 Myr (95% HPD: 1.11–2.68 Myr) to 0.69 Myr (95% HPD: 0.44–0.98 Myr). This range of divergence time estimation corresponds to the Early Pleistocene. The particular pattern of genetic divergence between Eastern and Western African Mediterranean populations of C. aestuarii, detected by 2-level AMOVA at the mitochondrial level, was consistent with that inferred from microsatellite analysis and suggests a vicariant event in C. aestuarii. Demographic reconstruction, inferred from mismatch distribution and BSP analyses, yielded different patterns of demographic history for the two African Mediterranean groups. The distribution pattern of the two haplogroups across the African Mediterranean coast, along with results of Bayesian analysis of genetic structure revealing an intermediate geographic group between the two divergent groups of the African coast, support the hypothesis of a secondary contact between two historically isolated groups. Although this hypothetical contact zone, thought to be located near the Siculo-Tunisian Strait, still needs to be verified, the asymmetric gene flow from Western to Eastern African Mediterranean, as inferred by the results of a MIGRATE analysis, reinforces the previously
Adaptability and phenotypic stability of common bean genotypes through Bayesian inference.
Corrêa, A M; Teodoro, P E; Gonçalves, M C; Barroso, L M A; Nascimento, M; Santos, A; Torres, F E
2016-04-27
This study used Bayesian inference to investigate the genotype x environment interaction in common bean grown in Mato Grosso do Sul State, and it also evaluated the efficiency of using informative and minimally informative a priori distributions. Six trials were conducted in randomized blocks, and the grain yield of 13 common bean genotypes was assessed. To represent the minimally informative a priori distributions, a probability distribution with high variance was used, and a meta-analysis concept was adopted to represent the informative a priori distributions. Bayes factors were used to conduct comparisons between the a priori distributions. The Bayesian inference was effective for the selection of upright common bean genotypes with high adaptability and phenotypic stability using the Eberhart and Russell method. Bayes factors indicated that the use of informative a priori distributions provided more accurate results than minimally informative a priori distributions. According to Bayesian inference, the EMGOPA-201, BAMBUÍ, CNF 4999, CNF 4129 A 54, and CNFv 8025 genotypes had specific adaptability to favorable environments, while the IAPAR 14 and IAC CARIOCA ETE genotypes had specific adaptability to unfavorable environments.
MODIS and GIMMS Inferred Northern Hemisphere Spring Greenup in Responses to Preseason Climate
Xu, X.; Riley, W. J.; Koven, C.; Jia, G.
2017-12-01
We compare the discrepancies in Normalized Difference Vegetation Index (NDVI) inferred spring greenup (SG) between Terra Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Very High Resolution Radiometer (AVHRR) instruments carried by the Global Inventory Monitoring and Modeling Studies (GIMMS) in the Northern Hemisphere. The interannual variation of SG inferred by MODIS and GIMMS NDVI is well correlated in the mid to high latitudes. However, the presence of NDVI discrepancies leads to discrepancies in SG with remarkable latitudinal characteristics. Compared with GIMMS NDVI inferred SG, MODIS NDVI inferred later SG in the high latitudes but earlier SG in the mid to low latitudes. MODIS NDVI inferred SG is better correlated to preseason climate. Interannual variation of SG is only sensitive to preseason temperature. The GIMMS SG to temperature sensitivity over two periods implied that the inter-biome SG to temperature sensitivity is relatively stable, but SG to temperature sensitivity decreased over time. Over the same period, MODIS SG to temperature sensitivity is much higher than that of GIMMS. This decreased sensitivity corroborates the findings from previous studies with continuous GIMMS NDVI analysis that vegetation growth (indicated by growing season NDVI) to temperature sensitivity is reduced over time and that the SG advance trend ceased after the 2000s. Our results also explain the contradictory findings that SG advance accelerated after the 2000s according to the merged GIMMS and MODIS NDVI time series. Despite the discrepancies found, without ground data support, the quality of NDVI and its inferred SG cannot be effectively evaluated. The discrepancies and uncertainties in different NDVI products and their inferred SG may bias the scientific significance of the climate-vegetation relationship. Different NDVI products, when used together, should first be evaluated and harmonized.
Using Alien Coins to Test Whether Simple Inference Is Bayesian
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
Ramos, Ana Carolina Simões; Lemos-Filho, José Pires; Ribeiro, Renata Acácio; Santos, Fabrício Rodrigues; Lovato, Maria Bernadete
2007-01-01
Background and Aims Hymenaea stigonocarpa (Fabaceae: Caesalpinioideae) is an endemic tree from the Brazilian cerrado (savanna vegetation), a biome classified as a hotspot for conservation priority. This study investigates the phylogeographic structure of H. stigonocarpa, in order to understand the processes that have led to its current spatial genetic pattern. Methods The polymorphism level and spatial distribution of variants of the plastid non-coding region between the genes psbC and trnS were investigated in 175 individuals from 17 populations, covering the greater part of the total distribution of the species. Molecular diversity indices were calculated and intra-specific relationships were inferred by the construction of haplotype networks using the median-joining method. Genetic differentiation among populations and main geographical groups was evaluated using spatial analysis of molecular variance (SAMOVA). Key Results Twenty-three different haplotypes were identified. The level of differentiation among the populations analysed was relatively high (FST = 0·692). Phylogeographic analyses showed a clear association between the haplotype network and geographic distribution of populations, revealing three main geographical groups: western, central and eastern. SAMOVA corroborated this finding, indicating that most of the variation can be attributed to differences among these three groups (58·8 %), with little difference among populations within groups (FSC = 0·252). Conclusions The subdivision of the geographic distribution of H. stigonocarpa populations into three genetically differentiated groups can be associated with Quaternary climatic changes. The data suggest that during glacial times H. stigonocarpa populations became extinct in most parts of the southern present-day cerrado area. Milder climatic conditions in the north and eastern portions of the cerrado resulted in maintenance of populations in these regions. Thus it is inferred that the most
On Maximum Entropy and Inference
Directory of Open Access Journals (Sweden)
Luigi Gresele
2017-11-01
Maximum entropy is a powerful concept that entails a sharp separation between relevant and irrelevant variables. It is typically invoked in inference, once an assumption is made on what the relevant variables are, in order to estimate a model from data, that affords predictions on all other (dependent) variables. Conversely, maximum entropy can be invoked to retrieve the relevant variables (sufficient statistics) directly from the data, once a model is identified by Bayesian model selection. We explore this approach in the case of spin models with interactions of arbitrary order, and we discuss how relevant interactions can be inferred. In this perspective, the dimensionality of the inference problem is not set by the number of parameters in the model, but by the frequency distribution of the data. We illustrate the method showing its ability to recover the correct model in a few prototype cases and discuss its application on a real dataset.
Stone, Graham N; White, Sarah C; Csóka, György; Melika, George; Mutun, Serap; Pénzes, Zsolt; Sadeghi, S Ebrahim; Schönrogge, Karsten; Tavakoli, Majid; Nicholls, James A
2017-12-01
Approximate Bayesian computation (ABC) is a powerful and widely used approach in inference of population history. However, the computational effort required to discriminate among alternative historical scenarios often limits the set that is compared to those considered more likely a priori. While often justifiable, this approach will fail to consider unexpected but well-supported population histories. We used a hierarchical tournament approach, in which subsets of scenarios are compared in a first round of ABC analyses and the winners are compared in a second analysis, to reconstruct the population history of an oak gall wasp, Synergus umbraculus (Hymenoptera, Cynipidae) across the Western Palaearctic. We used 4,233 bp of sequence data across seven loci to explore the relationships between four putative Pleistocene refuge populations in Iberia, Italy, the Balkans and Western Asia. We compared support for 148 alternative scenarios in eight pools, each pool comprising all possible rearrangements of four populations over a given topology of relationships, with or without founding of one population by admixture and with or without an unsampled "ghost" population. We found very little support for the directional "out of the east" scenario previously inferred for other gall wasp community members. Instead, the best-supported models identified Iberia as the first-regional population to diverge from the others in the late Pleistocene, followed by divergence between the Balkans and Western Asia, and founding of the Italian population through late Pleistocene admixture from Iberia and the Balkans. We compare these results with what is known for other members of the oak gall community, and consider the strengths and weaknesses of using a tournament approach to explore phylogeographic model space. © 2017 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Cognitive Inference Device for Activity Supervision in the Elderly
Directory of Open Access Journals (Sweden)
Nilamadhab Mishra
2014-01-01
Human activity, life span, and quality of life are enhanced by innovations in science and technology. Aging individuals need to take advantage of these developments to lead a self-regulated life. However, maintaining a self-regulated life at old age involves a high degree of risk, and the elderly often fail at this goal. Thus, the objective of our study is to investigate the feasibility of implementing a cognitive inference device (CI-device) for effective activity supervision in the elderly. To frame the CI-device, we propose a device design framework along with an inference algorithm and implement the designs through an artificial neural model with different configurations, mapping the CI-device’s functions to minimise the device’s prediction error. An analysis and discussion are then provided to validate the feasibility of CI-device implementation for activity supervision in the elderly.
Compiling Relational Bayesian Networks for Exact Inference
DEFF Research Database (Denmark)
Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan
2004-01-01
We describe a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...
Principles for statistical inference on big spatio-temporal data from climate models
Castruccio, Stefano; Genton, Marc G.
2018-01-01
The vast increase in size of modern spatio-temporal datasets has prompted statisticians working in environmental applications to develop new and efficient methodologies that are still able to achieve inference for nontrivial models within an affordable time. Climate model outputs push the limits of inference for Gaussian processes, as their size can easily be larger than 10 billion data points. Drawing from our experience in a set of previous work, we provide three principles for the statistical analysis of such large datasets that leverage recent methodological and computational advances. These principles emphasize the need of embedding distributed and parallel computing in the inferential process.
Principles for statistical inference on big spatio-temporal data from climate models
Castruccio, Stefano
2018-02-24
The vast increase in size of modern spatio-temporal datasets has prompted statisticians working in environmental applications to develop new and efficient methodologies that are still able to achieve inference for nontrivial models within an affordable time. Climate model outputs push the limits of inference for Gaussian processes, as their size can easily be larger than 10 billion data points. Drawing from our experience in a set of previous work, we provide three principles for the statistical analysis of such large datasets that leverage recent methodological and computational advances. These principles emphasize the need of embedding distributed and parallel computing in the inferential process.
Uncertainty in prediction and in inference
International Nuclear Information System (INIS)
Hilgevoord, J.; Uffink, J.
1991-01-01
The concepts of uncertainty in prediction and inference are introduced and illustrated using the diffraction of light as an example. The close relationship between the concepts of uncertainty in inference and resolving power is noted. A general quantitative measure of uncertainty in inference can be obtained by means of the so-called statistical distance between probability distributions. When applied to quantum mechanics, this distance leads to a measure of the distinguishability of quantum states, which essentially is the absolute value of the matrix element between the states. The importance of this result to the quantum mechanical uncertainty principle is noted. The second part of the paper provides a derivation of the statistical distance on the basis of the so-called method of support
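One standard measure of this kind is the angle between the square-root probability vectors (often called the Bhattacharyya angle, or Wootters' statistical distance). Whether the paper uses exactly this form is an assumption; the sketch below only illustrates how such a distance behaves for discrete distributions, and the function name is hypothetical.

```python
from math import acos, pi, sqrt

def statistical_distance(p, q):
    """Angle between the vectors (sqrt(p_i)) and (sqrt(q_i)).

    Both inputs are assumed to be discrete probability distributions
    over the same outcomes (non-negative, summing to 1).
    """
    overlap = sum(sqrt(a * b) for a, b in zip(p, q))
    # Clamp to 1.0 so rounding error cannot push acos out of its domain.
    return acos(min(1.0, overlap))

# Identical distributions are indistinguishable (distance 0);
# distributions with disjoint support are maximally distinguishable (pi / 2).
same = statistical_distance([0.5, 0.5], [0.5, 0.5])
disjoint = statistical_distance([1.0, 0.0], [0.0, 1.0])
print(same, disjoint, pi / 2)
```

For two pure quantum states measured in a suitable basis, the overlap term plays the role of the absolute value of the matrix element between the states, which is the connection the abstract points to.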
Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S
2010-05-21
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble, and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results that under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
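The link between Gamma-distributed intensities and Beta-distributed fractional intensities rests on a standard fact: if X and Y are independent Gamma variables with a common scale, then X / (X + Y) follows a Beta distribution with the two shape parameters. The simulation below is a minimal sketch of that fact only. Defining the fractional intensity as one channel over the sum of two channels, and the shape values used, are assumptions made for illustration and are not taken from the paper.

```python
import random

def fractional_intensities(shape_a, shape_b, n, seed=0):
    """Simulate n fractional intensities x / (x + y) from two independent
    Gamma-distributed channel intensities with a common scale of 1."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n):
        x = rng.gammavariate(shape_a, 1.0)
        y = rng.gammavariate(shape_b, 1.0)
        fractions.append(x / (x + y))
    return fractions

# Hypothetical shapes; a Beta(2, 3) variable has mean 2 / (2 + 3).
fracs = fractional_intensities(2.0, 3.0, 20000)
sample_mean = sum(fracs) / len(fracs)
print(sample_mean)
```

A mixture of Beta densities then arises when different genes carry different shape parameters, which is one way to read the ensemble view described in the abstract.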
Protein Inference from the Integration of Tandem MS Data and Interactome Networks.
Zhong, Jiancheng; Wang, Jianxing; Ding, Xiaojun; Zhang, Zhen; Li, Min; Wu, Fang-Xiang; Pan, Yi
2017-01-01
Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, some other information is exploited to infer proteins. Different from the methods which first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should have a high probability in the same sample, too. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by incorporating the bipartite graph with interactome network information, where the bipartite graph is built with the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated by using the Bayes' law based on the neighbor protein support scores of each protein which are mapped by the shared peptides. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that AUC scores yielded by TMSIN are 0.742 and 0.874 in yeast dataset and human dataset, respectively, and TMSIN yields the maximum number of true positives when q-value less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.
Directory of Open Access Journals (Sweden)
Brian Wade Jamandre
2014-01-01
The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of the CR sequence varied among M. cephalus populations due to the presence of indels and a variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star-shaped tree inferred from the CR polymorphism points to a rapid worldwide radiation in this species. The CR still appears to be a good marker for phylogeographic investigations, and additional worldwide samples are warranted to further investigate the genetic structure and evolution of M. cephalus.
Bakbergenuly, Ilyas; Morgenthaler, Stephan
2016-01-01
We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group‐level studies or in meta‐analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from standard log‐odds and arcsine transformations of the estimated probability p̂, both for single‐group studies and in combining results from several groups or studies in meta‐analysis. Our simulations confirm that these biases are linear in ρ, for small values of ρ, the intracluster correlation coefficient. These biases do not depend on the sample sizes or the number of studies K in a meta‐analysis and result in abysmal coverage of the combined effect for large K. We also propose bias‐correction for the arcsine transformation. Our simulations demonstrate that this bias‐correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta‐analyses of prevalence. PMID:27192062
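The kind of bias described in this abstract can be reproduced in a small Monte Carlo sketch. The version below assumes a beta-binomial model for the overdispersed counts and adds a continuity correction so the log-odds stay finite; both choices, along with the parameter values and function names, are illustrative assumptions rather than the paper's simulation design.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def log_odds_bias(p, rho, n=50, reps=5000, seed=1):
    """Monte Carlo estimate of E[logit(p_hat)] - logit(p) when counts are
    beta-binomial with mean p and intracluster correlation rho."""
    rng = random.Random(seed)
    # Beta parameters with mean p and rho = 1 / (alpha + beta + 1).
    scale = (1.0 - rho) / rho
    alpha, beta = p * scale, (1.0 - p) * scale
    total = 0.0
    for _ in range(reps):
        q = rng.betavariate(alpha, beta)              # cluster-level probability
        x = sum(rng.random() < q for _ in range(n))   # binomial count given q
        p_hat = (x + 0.5) / (n + 1.0)                 # assumed continuity correction
        total += logit(p_hat)
    return total / reps - logit(p)

bias_low = log_odds_bias(0.2, 0.001)   # almost no overdispersion
bias_high = log_odds_bias(0.2, 0.1)    # noticeable overdispersion
print(bias_low, bias_high)
```

Running the function over a grid of small ρ values is one way to check the roughly linear growth of the bias that the abstract reports.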
A human genome-wide library of local phylogeny predictions for whole-genome inference problems
Directory of Open Access Journals (Sweden)
Schwartz Russell
2008-08-01
Background Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.
Fan, Zhenxin; Liu, Shaoying; Liu, Yang; Zhang, Xiuyue; Yue, Bisong
2011-03-01
Phylogeographical studies that focus on the southeastern margin of the Tibetan Plateau are limited. The complex terrain and unique geological history make it a particularly unusual region of the Tibetan Plateau. We carried out a phylogeographical study of two rodent species Neodon irene and Apodemus latronum using the mitochondrial cytochrome b gene sequences. High genetic diversities and deep phylogenetic splits were detected in both rodents. Some haplotypes from one sampling region fell into different evolutionary clades, but most haplotypes from the same sampling regions were clustered together with each other. The results of isolation by distance analysis further substantiated that their genetic diversities were structured along geography. Thus, there were high levels of geographical structure for both rodents. Demographic analyses implied a relatively constant population size for all samples of N. irene and A. latronum in history. However, clade B of N. irene and clade 3 of A. latronum experienced population expansions at 105-32 and 156-47 Kya, respectively. Through comparison with previous studies, we suggest the high mitochondrial DNA diversities in them are probably not a species-specific feature, but a common pattern for small mammals in this unique area. Details of the historical demography of these rodents revealed in this study could provide new insights into how rodents and possibly other small mammals in this region responded to the geological and climatic events.
International Nuclear Information System (INIS)
Dai, Yu; Lu, Peng
2015-01-01
In evolutionary games, the temptation mechanism reduces the percentage of cooperation while the reputation mechanism promotes it. Inferring reputation theory proposes that an agent imitates the neighbor with the highest reputation with a certain probability. Although reputation promotes cooperation, when and how it enhances cooperation is still an open question. This paper investigates the conditions under which the inferring reputation probability promotes cooperation. To this end, the effects of reputation and temptation on cooperation are explored under the spatial prisoners’ dilemma game, utilizing the methods of simulation and statistical analysis. Results show that temptation reduces cooperation unconditionally while reputation promotes it conditionally, i.e. reputation countervails temptation conditionally. When the inferring reputation probability is less than 0.5, reputation promotes cooperation substantially and thus countervails temptation. However, when the inferring reputation probability is larger than 0.5, its contribution to cooperation is relatively weak and cannot prevent temptation from undermining cooperation. Reputation even decreases cooperation together with temptation when the probability is higher than 0.8. It should be noted that inferring reputation does not always succeed in countervailing temptation and that there is a specific interval in which it promotes cooperation
Tusiime, Felly Mugizi; Gizaw, Abel; Wondimu, Tigist; Masao, Catherine Aloyce; Abdi, Ahmed Abdikadir; Muwanika, Vincent; Trávníček, Pavel; Nemomissa, Sileshi; Popp, Magnus; Eilu, Gerald; Brochmann, Christian; Pimentel, Manuel
2017-07-01
High tropical mountains harbour remarkable and fragmented biodiversity thought to a large degree to have been shaped by multiple dispersals of cold-adapted lineages from remote areas. Few dated phylogenetic/phylogeographic analyses are however available. Here, we address the hypotheses that the sub-Saharan African sweet vernal grasses have a dual colonization history and that lineages of independent origins have established secondary contact. We carried out rangewide sampling across the eastern African high mountains, inferred dated phylogenies from nuclear ribosomal and plastid DNA using Bayesian methods, and performed flow cytometry and AFLP (amplified fragment length polymorphism) analyses. We inferred a single Late Pliocene western Eurasian origin of the eastern African taxa, whose high-ploid populations in one mountain group formed a distinct phylogeographic group and carried plastids that diverged from those of the currently allopatric southern African lineage in the Mid- to Late Pleistocene. We show that Anthoxanthum has an intriguing history in sub-Saharan Africa, including Late Pliocene colonization from southeast and north, followed by secondary contact, hybridization, allopolyploidization and local extinction during one of the last glacial cycles. Our results add to a growing body of evidence showing that isolated tropical high mountain habitats have a dynamic recent history involving niche conservatism and recruitment from remote sources, repeated dispersals, diversification, hybridization and local extinction. © 2017 John Wiley & Sons Ltd.
Nonparametric predictive inference in statistical process control
Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.
2000-01-01
New methods for statistical process control are presented, where the inferences have a nonparametric predictive nature. We consider several problems in process control in terms of uncertainties about future observable random quantities, and we develop inferences for these random quantities based on
Compiling Relational Bayesian Networks for Exact Inference
DEFF Research Database (Denmark)
Jaeger, Manfred; Darwiche, Adnan; Chavira, Mark
2006-01-01
We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference...
Making inference from wildlife collision data: inferring predator absence from prey strikes
Directory of Open Access Journals (Sweden)
Peter Caley
2017-02-01
Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using a biologically realistic numerical response function, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application.
Making inference from wildlife collision data: inferring predator absence from prey strikes.
Caley, Peter; Hosack, Geoffrey R; Barry, Simon C
2017-01-01
Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using biologically realistic numerical response functions, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that, conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application.
Recent Advances in System Reliability Signatures, Multi-state Systems and Statistical Inference
Frenkel, Ilia
2012-01-01
Recent Advances in System Reliability discusses developments in modern reliability theory such as signatures, multi-state systems and statistical inference. It describes the latest achievements in these fields, and covers the application of these achievements to reliability engineering practice. The chapters cover a wide range of new theoretical subjects and have been written by leading experts in reliability theory and its applications. The topics include: concepts and different definitions of signatures (D-spectra), their properties and applications to reliability of coherent systems and network-type structures; Lz-transform of Markov stochastic process and its application to multi-state system reliability analysis; methods for cost-reliability and cost-availability analysis of multi-state systems; optimal replacement and protection strategy; and statistical inference. Recent Advances in System Reliability presents many examples to illustrate the theoretical results. Real world multi-state systems...
Causal inference in biology networks with integrated belief propagation.
Chang, Rui; Karr, Jonathan R; Schadt, Eric E
2015-01-01
Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the possible functional interactions is assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network with a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.
Directory of Open Access Journals (Sweden)
Ji Wei
2010-10-01
Background: Microarray data discretization is a basic preprocessing step for many algorithms of gene regulatory network inference. Some common discretization methods in informatics are used to discretize microarray data. Selection of the discretization method is often arbitrary, and no systematic comparison of different discretization methods has been conducted in the context of gene regulatory network inference from time series gene expression data. Results: In this study, we propose a new discretization method, "bikmeans", and compare its performance with four other widely used discretization methods using different datasets, modeling algorithms and numbers of intervals. Sensitivities, specificities and total accuracies were calculated and statistical analysis was carried out. The bikmeans method always gave high total accuracies. Conclusions: Our results indicate that proper discretization methods can consistently improve gene regulatory network inference independent of network modeling algorithms and datasets. Our new method, bikmeans, resulted in significantly better total accuracies than other methods.
Directory of Open Access Journals (Sweden)
Florian Gehre
Mycobacterium africanum is an important cause of tuberculosis (TB) in West Africa. So far, two lineages called M. africanum West African 1 (MAF1) and M. africanum West African 2 (MAF2) have been defined. Although several molecular studies on MAF2 have been conducted to date, little is known about MAF1. As MAF1 is mainly present in countries around the Gulf of Guinea, we aimed to estimate its prevalence in Cotonou, the biggest city in Benin. Between 2005 and 2006 we collected strains in Cotonou, Benin, and genotyped them using spoligo- and 12-loci-MIRU-VNTR-typing. Analyzing 194 isolates, we found that 31% and 6% were MAF1 and MAF2, respectively. Therefore Benin is one of the countries with the highest prevalence (37%) of M. africanum in general and MAF1 in particular. Moreover, we combined our data from Benin with publicly available genotyping information from Nigeria and Sierra Leone, and determined the phylogeographic population structure and genotypic clustering of MAF1. Within the MAF1 lineage, we identified an unexpectedly great genetic variability with the presence of at least 10 sub-lineages. Interestingly, 8 out of 10 of the discovered sub-lineages not only clustered genetically but also geographically. Besides showing a remarkable local restriction to certain regions in Benin and Nigeria, the sub-lineages differed dramatically in their capacity to transmit within the human host population. While identifying Benin as one of the countries with the highest overall prevalence of M. africanum, this study also contains the first detailed description of the transmission dynamics and phylogenetic composition of the MAF1 lineage.
Inferring Smoking Status from User Generated Content in an Online Cessation Community.
Amato, Michael S; Papandonatos, George D; Cha, Sarah; Wang, Xi; Zhao, Kang; Cohn, Amy M; Pearson, Jennifer L; Graham, Amanda L
2018-01-22
User generated content (UGC) is a valuable but underutilized source of information about individuals who participate in online cessation interventions. This study represents a first effort to passively detect smoking status among members of an online cessation program using UGC. Secondary data analysis was performed on data from 826 participants in a web-based smoking cessation randomized trial that included an online community. Domain experts from the online community reviewed each post and comment written by participants and attempted to infer the author's smoking status at the time it was written. Inferences from UGC were validated by comparison with self-reported 30-day point prevalence abstinence (PPA). Following validation, the impact of this method was evaluated across all individuals and timepoints in the study period. Of the 826 participants in the analytic sample, 719 had written at least one post from which content inference was possible. Among participants for whom unambiguous smoking status was inferred during the 30 days preceding their 3-month follow-up survey, concordance with self-report was almost perfect (kappa = 0.94). Posts indicating abstinence tended to be written shortly after enrollment (median = 14 days). Passive inference of smoking status from UGC in online cessation communities is possible and highly reliable for smokers who actively produce content. These results lay the groundwork for further development of observational research tools and intervention innovations. © The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Type Inference for Session Types in the Pi-Calculus
DEFF Research Database (Denmark)
Graversen, Eva Fajstrup; Harbo, Jacob Buchreitz; Huttel, Hans
2014-01-01
In this paper we present a direct algorithm for session type inference for the π-calculus. Type inference for session types has previously been achieved by either imposing limitations and restriction on the π-calculus, or by reducing the type inference problem to that for linear types. Our approach...
Inferring Pre-shock Acoustic Field From Post-shock Pitot Pressure Measurement
Wang, Jian-Xun; Zhang, Chao; Duan, Lian; Xiao, Heng; Virginia Tech Team; Missouri Univ of Sci & Tech Team
2017-11-01
Linear interaction analysis (LIA) and iterative ensemble Kalman method are used to convert post-shock Pitot pressure fluctuations to static pressure fluctuations in front of the shock. The LIA is used as the forward model for the transfer function associated with a homogeneous field of acoustic waves passing through a nominally normal shock wave. The iterative ensemble Kalman method is then employed to infer the spectrum of upstream acoustic waves based on the post-shock Pitot pressure measured at a single point. Several test cases with synthetic and real measurement data are used to demonstrate the merits of the proposed inference scheme. The study provides the basis for measuring tunnel freestream noise with intrusive probes in noisy supersonic wind tunnels.
Explanatory Preferences Shape Learning and Inference.
Lombrozo, Tania
2016-10-01
Explanations play an important role in learning and inference. People often learn by seeking explanations, and they assess the viability of hypotheses by considering how well they explain the data. An emerging body of work reveals that both children and adults have strong and systematic intuitions about what constitutes a good explanation, and that these explanatory preferences have a systematic impact on explanation-based processes. In particular, people favor explanations that are simple and broad, with the consequence that engaging in explanation can shape learning and inference by leading people to seek patterns and favor hypotheses that support broad and simple explanations. Given the prevalence of explanation in everyday cognition, understanding explanation is therefore crucial to understanding learning and inference. Copyright © 2016 Elsevier Ltd. All rights reserved.
Grammatical inference algorithms, routines and applications
Wieczorek, Wojciech
2017-01-01
This book focuses on grammatical inference, presenting classic and modern methods of grammatical inference from the perspective of practitioners. To do so, it employs the Python programming language to present all of the methods discussed. Grammatical inference is a field that lies at the intersection of multiple disciplines, with contributions from computational linguistics, pattern recognition, machine learning, computational biology, formal learning theory and many others. Though the book is largely practical, it also includes elements of learning theory, combinatorics on words, the theory of automata and formal languages, plus references to real-world problems. The listings presented here can be directly copied and pasted into other programs, thus making the book a valuable source of ready recipes for students, academic researchers, and programmers alike, as well as an inspiration for their further development.
Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures
Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland
1998-01-01
Given the imminent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10^15) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
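The mutual-information search described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' C program: the function names, the three-gene toy network and the stopping rule (accept the smallest input set whose mutual information with the output equals the output's entropy) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(samples):
    """Shannon entropy (in bits) of a list of hashable observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """M(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def infer_inputs(inputs, outputs, target, max_k=3, tol=1e-9):
    """Smallest set of input elements that fully determines `target`.

    `inputs` and `outputs` are aligned lists of state tuples (state at
    time t and at time t+1). Returns a tuple of element indices, or
    None if no set of size <= max_k explains the output.
    """
    n = len(inputs[0])
    y = [row[target] for row in outputs]
    h_y = entropy(y)
    if h_y < tol:  # constant output: no inputs needed
        return ()
    for k in range(1, max_k + 1):
        for subset in combinations(range(n), k):
            x = [tuple(row[i] for i in subset) for row in inputs]
            if abs(mutual_information(x, y) - h_y) < tol:
                return subset
    return None


# Hypothetical 3-gene Boolean network (an assumption for illustration):
#   A' = B,  B' = A or C,  C' = not A
states = list(product([0, 1], repeat=3))
next_states = [(b, int(a or c), 1 - a) for a, b, c in states]

for gene in range(3):
    print(gene, infer_inputs(states, next_states, gene))
# 0 (1,)
# 1 (0, 2)
# 2 (0,)
```

With the complete state transition table the wiring is recovered exactly; with an incomplete table the same test can still succeed but is no longer guaranteed to be unique.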
BagReg: Protein inference through machine learning.
Zhao, Can; Liu, Dao; Teng, Ben; He, Zengyou
2015-08-01
Protein inference from identified peptides is of primary importance in shotgun proteomics. The goal of protein inference is to determine whether each candidate protein is truly present in the sample. To date, many computational methods have been proposed to solve this problem. However, there is still no method that can fully utilize the information hidden in the input data. In this article, we propose a learning-based method named BagReg for protein inference. The method first artificially extracts five features from the input data, and then chooses each feature in turn as the class feature to build separate models that predict the presence probabilities of proteins. Finally, the weak results from the five prediction models are aggregated to obtain the final result. We test our method on six publicly available data sets. The experimental results show that our method is superior to the state-of-the-art protein inference algorithms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Inference on inspiral signals using LISA MLDC data
International Nuclear Information System (INIS)
Roever, Christian; Stroeer, Alexander; Bloomer, Ed; Christensen, Nelson; Clark, James; Hendry, Martin; Messenger, Chris; Meyer, Renate; Pitkin, Matt; Toher, Jennifer; Umstaetter, Richard; Vecchio, Alberto; Veitch, John; Woan, Graham
2007-01-01
In this paper, we describe a Bayesian inference framework for the analysis of data obtained by LISA. We set up a model for binary inspiral signals as defined for the Mock LISA Data Challenge 1.2 (MLDC), and implemented a Markov chain Monte Carlo (MCMC) algorithm to facilitate exploration and integration of the posterior distribution over the nine-dimensional parameter space. Here, we present intermediate results showing how, using this method, information about the nine parameters can be extracted from the data.
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
Efficient algorithms for conditional independence inference
Czech Academy of Sciences Publication Activity Database
Bouckaert, R.; Hemmecke, R.; Lindner, S.; Studený, Milan
2010-01-01
Vol. 11, No. 1 (2010), pp. 3453-3479. ISSN 1532-4435. R&D Projects: GA ČR GA201/08/0539; GA MŠk 1M0572. Institutional research plan: CEZ:AV0Z10750506. Keywords: conditional independence inference; linear programming approach. Subject RIV: BA - General Mathematics. Impact factor: 2.949 (2010). http://library.utia.cas.cz/separaty/2010/MTR/studeny-efficient algorithms for conditional independence inference.pdf
Approximation and inference methods for stochastic biochemical kinetics—a tutorial review
International Nuclear Information System (INIS)
Schnoerr, David; Grima, Ramon; Sanguinetti, Guido
2017-01-01
Stochastic fluctuations of molecule numbers are ubiquitous in biological systems. Important examples include gene expression and enzymatic processes in living cells. Such systems are typically modelled as chemical reaction networks whose dynamics are governed by the chemical master equation. Despite its simple structure, no analytic solutions to the chemical master equation are known for most systems. Moreover, stochastic simulations are computationally expensive, making systematic analysis and statistical inference a challenging task. Consequently, significant effort has been spent in recent decades on the development of efficient approximation and inference methods. This article gives an introduction to basic modelling concepts as well as an overview of state-of-the-art methods. First, we motivate and introduce deterministic and stochastic methods for modelling chemical networks, and give an overview of simulation and exact solution methods. Next, we discuss several approximation methods, including the chemical Langevin equation, the system size expansion, moment closure approximations, time-scale separation approximations and hybrid methods. We discuss their various properties and review recent advances and remaining challenges for these methods. We present a comparison of several of these methods by means of a numerical case study and highlight some of their respective advantages and disadvantages. Finally, we discuss the problem of inference from experimental data in the Bayesian framework and review recent methods developed in the literature. In summary, this review gives a self-contained introduction to modelling, approximations and inference methods for stochastic chemical kinetics. (topical review)
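As a concrete instance of the stochastic simulation methods this review surveys, the sketch below applies Gillespie's direct method to a birth-death process (constant production, first-order degradation). The reaction system, the rate names `k_prod` and `k_deg`, and the function name are assumptions chosen for illustration; they are not taken from the review.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, rng):
    """Simulate  0 -> X  (rate k_prod)  and  X -> 0  (rate k_deg * x).

    Returns the event times and the molecule count after each event,
    starting from count x0 at time 0 and stopping at t_end.
    """
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod        # propensity of the production reaction
        a_deg = k_deg * x      # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:     # nothing can fire any more
            break
        t += rng.expovariate(a_total)  # exponential waiting time
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


rng = random.Random(1)
times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, x0=0,
                                      t_end=50.0, rng=rng)
print(len(times), counts[-1])
```

For this system the stationary distribution is Poisson with mean k_prod / k_deg, which gives a simple check on a long trajectory. Each simulated event costs one random waiting time and one reaction choice, which is why such simulations become expensive when propensities are large.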
Matrix dimensions bias demographic inferences: implications for comparative plant demography.
Salguero-Gómez, Roberto; Plotkin, Joshua B
2010-12-01
While the wealth of projection matrices in plant demography permits comparative studies, variation in matrix dimensions complicates interspecific comparisons. Collapsing matrices to a common dimension may facilitate such comparisons but may also bias the inferred demographic parameters. Here we examine how matrix dimension affects inferred demographic elasticities and how different collapsing criteria perform. We analyzed 13 x 13 matrices representing nine plant species, collapsing these matrices (i) into even 7 x 7, 5 x 5, 4 x 4, and 3 x 3 matrices and (ii) into 5 x 5 matrices using different criteria. Stasis and fecundity elasticities increased when matrix dimension was reduced, whereas those of progression and retrogression decreased. We suggest a collapsing criterion that minimizes dissimilarities between the original- and collapsed-matrix elasticities and apply it to 66 plant species to study how life span and growth form influence the relationship between matrix dimension and elasticities. Our analysis demonstrates that (i) projection matrix dimension has significant effects on inferred demographic parameters, (ii) there are better-performing methods than previously suggested for standardizing matrix dimension, and (iii) herbaceous perennial projection matrices are particularly sensitive to changes in matrix dimensionality. For comparative demographic studies, we recommend normalizing matrices to a common dimension by collapsing higher classes and leaving the first few classes unaltered.
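The elasticities this abstract refers to can be computed from the dominant eigenvalue and the left and right eigenvectors of a projection matrix, using the standard formula e_ij = a_ij v_i w_j / (lambda * sum_k v_k w_k). The sketch below uses plain power iteration; the 3 x 3 example matrix and the grouping of entries into stasis, fecundity, progression and retrogression by position are simplifying assumptions for illustration, not the authors' data or classification.

```python
def dominant_eigen(matrix, iters=2000):
    """Dominant eigenvalue and right eigenvector (summing to 1) by power iteration."""
    n = len(matrix)
    vec = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        lam = sum(nxt)               # vec sums to 1, so this is the growth factor
        vec = [x / lam for x in nxt]
    return lam, vec


def elasticities(matrix):
    """Elasticity of the growth rate to each matrix entry; entries sum to 1."""
    n = len(matrix)
    lam, w = dominant_eigen(matrix)                   # stable stage distribution
    transposed = [list(col) for col in zip(*matrix)]
    _, v = dominant_eigen(transposed)                 # reproductive values
    scale = lam * sum(v[i] * w[i] for i in range(n))
    return [[matrix[i][j] * v[i] * w[j] / scale for j in range(n)]
            for i in range(n)]


def summarize(elas):
    """Group elasticities by position (a simplifying assumption):
    diagonal = stasis, first row = fecundity,
    below diagonal = progression, above diagonal = retrogression."""
    totals = {"stasis": 0.0, "fecundity": 0.0,
              "progression": 0.0, "retrogression": 0.0}
    for i, row in enumerate(elas):
        for j, e in enumerate(row):
            if i == j:
                totals["stasis"] += e
            elif i == 0:
                totals["fecundity"] += e
            elif i > j:
                totals["progression"] += e
            else:
                totals["retrogression"] += e
    return totals


# Hypothetical 3-stage projection matrix (columns = stage at time t).
A = [[0.0, 1.5, 4.0],
     [0.3, 0.4, 0.1],
     [0.0, 0.5, 0.8]]
print(summarize(elasticities(A)))
```

Because the elasticities of any projection matrix sum to one, the four totals can be compared directly between an original matrix and a collapsed version of it.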
Inferring domain-domain interactions from protein-protein interactions with formal concept analysis.
Directory of Open Access Journals (Sweden)
Susan Khor
Full Text Available Identifying reliable domain-domain interactions will increase our ability to predict novel protein-protein interactions, to unravel interactions in protein complexes, and thus gain more information about the function and behavior of genes. One of the challenges of identifying reliable domain-domain interactions is domain promiscuity. Promiscuous domains are domains that can occur in many domain architectures and are therefore found in many proteins. This becomes a problem for a method where the score of a domain-pair is the ratio between observed and expected frequencies because the protein-protein interaction network is sparse. As such, many protein-pairs will be non-interacting and domain-pairs with promiscuous domains will be penalized. This domain promiscuity challenge to the problem of inferring reliable domain-domain interactions from protein-protein interactions has been recognized, and a number of work-arounds have been proposed. This paper reports on an application of Formal Concept Analysis to this problem. It is found that the relationship between formal concepts provides a natural way for rare domains to elevate the rank of promiscuous domain-pairs and enrich highly ranked domain-pairs with reliable domain-domain interactions. This piggybacking of promiscuous domain-pairs onto less promiscuous domain-pairs is possible only with concept lattices whose attribute-labels are not reduced and is enhanced by the presence of proteins that comprise both promiscuous and rare domains.
Inferring Domain-Domain Interactions from Protein-Protein Interactions with Formal Concept Analysis
Khor, Susan
2014-01-01
Identifying reliable domain-domain interactions will increase our ability to predict novel protein-protein interactions, to unravel interactions in protein complexes, and thus gain more information about the function and behavior of genes. One of the challenges of identifying reliable domain-domain interactions is domain promiscuity. Promiscuous domains are domains that can occur in many domain architectures and are therefore found in many proteins. This becomes a problem for a method where the score of a domain-pair is the ratio between observed and expected frequencies because the protein-protein interaction network is sparse. As such, many protein-pairs will be non-interacting and domain-pairs with promiscuous domains will be penalized. This domain promiscuity challenge to the problem of inferring reliable domain-domain interactions from protein-protein interactions has been recognized, and a number of work-arounds have been proposed. This paper reports on an application of Formal Concept Analysis to this problem. It is found that the relationship between formal concepts provides a natural way for rare domains to elevate the rank of promiscuous domain-pairs and enrich highly ranked domain-pairs with reliable domain-domain interactions. This piggybacking of promiscuous domain-pairs onto less promiscuous domain-pairs is possible only with concept lattices whose attribute-labels are not reduced and is enhanced by the presence of proteins that comprise both promiscuous and rare domains. PMID:24586450
State-Space Inference and Learning with Gaussian Processes
Turner, R; Deisenroth, MP; Rasmussen, CE
2010-01-01
State-space inference and learning with Gaussian processes (GPs) is an unsolved problem. We propose a new, general methodology for inference and learning in nonlinear state-space models that are described probabilistically by non-parametric GP models. We apply the expectation maximization algorithm to iterate between inference in the latent state-space and learning the parameters of the underlying GP dynamics model. C...
Enhancing Transparency and Control When Drawing Data-Driven Inferences About Individuals.
Chen, Daizhuo; Fraiberger, Samuel P; Moakler, Robert; Provost, Foster
2017-09-01
Recent studies show the remarkable power of fine-grained information disclosed by users on social network sites to infer users' personal characteristics via predictive modeling. Similar fine-grained data are being used successfully in other commercial applications. In response, attention is turning increasingly to the transparency that organizations provide to users as to what inferences are drawn and why, as well as to what sort of control users can be given over inferences that are drawn about them. In this article, we focus on inferences about personal characteristics based on information disclosed by users' online actions. As a use case, we explore personal inferences that are made possible from "Likes" on Facebook. We first present a means for providing transparency into the information responsible for inferences drawn by data-driven models. We then introduce the "cloaking device", a mechanism for users to inhibit the use of particular pieces of information in inference. Using these analytical tools we ask two main questions: (1) How much information must users cloak to significantly affect inferences about their personal traits? We find that usually users must cloak only a small portion of their actions to inhibit inference. We also find that, encouragingly, false-positive inferences are significantly easier to cloak than true-positive inferences. (2) Can firms change their modeling behavior to make cloaking more difficult? The answer is a definitive yes. We demonstrate a simple modeling change that requires users to cloak substantially more information to affect the inferences drawn. The upshot is that organizations can provide transparency and control even into complicated, predictive model-driven inferences, but they also can make control easier or harder for their users.
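One way to picture the cloaking question "how much must a user hide to change the inference" is a greedy search over a linear scoring model. The sketch below is an assumption-laden illustration, not the authors' procedure: the additive score, the decision threshold, the greedy order (largest positive weight first) and all names and numbers are hypothetical.

```python
def likes_to_cloak(user_likes, weights, bias, threshold):
    """Greedily hide the Likes that contribute most evidence until the
    model score drops below `threshold` (inference no longer drawn).

    Returns the list of cloaked Likes and the final score. If the score
    cannot be pushed below the threshold, every positive-weight Like is
    cloaked and the final score stays at or above the threshold.
    """
    active = set(user_likes)

    def score(likes):
        return bias + sum(weights.get(like, 0.0) for like in likes)

    cloaked = []
    ranked = sorted(active, key=lambda like: weights.get(like, 0.0),
                    reverse=True)
    for like in ranked:
        if score(active) < threshold:
            break                      # inference already inhibited
        if weights.get(like, 0.0) <= 0.0:
            break                      # remaining Likes carry no positive evidence
        active.discard(like)
        cloaked.append(like)
    return cloaked, score(active)


# Hypothetical model weights and user profile.
weights = {"page_a": 2.0, "page_b": 1.0, "page_c": -0.5}
cloaked, final_score = likes_to_cloak(["page_a", "page_b", "page_c"],
                                      weights, bias=0.0, threshold=1.0)
print(cloaked, final_score)   # ['page_a'] 0.5
```

In this toy case hiding a single Like is enough, which mirrors the abstract's finding that a small portion of actions often suffices; a model that spreads evidence across many small weights would require more cloaking.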
Directory of Open Access Journals (Sweden)
Lara D Shepherd
The little spotted kiwi (Apteryx owenii) is a flightless ratite formerly found throughout New Zealand but now greatly reduced in distribution. Previous phylogeographic studies of the related brown kiwi (A. mantelli, A. rowi and A. australis), with which little spotted kiwi was once sympatric, revealed extremely high levels of genetic structuring, with mitochondrial DNA haplotypes often restricted to populations. We surveyed genetic variation throughout the present and pre-human range of little spotted kiwi by obtaining mitochondrial DNA sequences from contemporary and ancient samples. Little spotted kiwi and great spotted kiwi (A. haastii) formed a monophyletic clade sister to brown kiwi. Ancient samples of little spotted kiwi from the northern North Island, where it is now extinct, formed a lineage that was distinct from remaining little spotted kiwi and great spotted kiwi lineages, potentially indicating unrecognized taxonomic diversity. Overall, little spotted kiwi exhibited much lower levels of genetic diversity and structuring than brown kiwi, particularly through the South Island. Our results also indicate that little spotted kiwi (or at least hybrids involving this species) survived on the South Island mainland until more recently than previously thought.
Rakotoarisoa, Jean-Eric; Raheriarisena, Martin; Goodman, Steven M
2013-01-01
We conducted a mitochondrial phylogeographic study of the endemic dry forest rodent Eliurus carletoni (Rodentia: Nesomyinae) in an ecological transition zone of northern Madagascar (Loky-Manambato) and 2 surrounding regions (Ankarana and Analamerana). The main goal was to assess the evolutionary consequences on this taxon of the complex landscape features and Quaternary ecological vicissitudes. Three haplogroups were identified from the 215 specimens obtained from 15 populations. High levels of genetic diversity and significant genetic differentiation among populations were observed. The different geographical subdivisions of the study area by regions, by river catchment zones, and the physical distance between populations are not correlated with genetic patterns. In contrast, population structure is mostly explained by the geographic distribution of the samples among existing forest blocks. E. carletoni experienced a genetic bottleneck between 18 750 and 7500 years BP, which correlates with periods when moister climates existed on the island. Overall, our data suggest that the complex genetic patterns of E. carletoni can be explained by Quaternary climatic vicissitudes that resulted in habitat fluctuations between dry and humid forests, as well as subsequent human-induced fragmentation of forest habitat.
Fused Regression for Multi-source Gene Regulatory Network Inference.
Directory of Open Access Journals (Sweden)
Kari Y Lam
2016-12-01
Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms) and single cell types. We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships, such as orthology, incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross-species network inference. Last, we demonstrate our method's utility in learning from data collected on different experimental platforms.
Use of a Novel Grammatical Inference Approach in Classification of Amyloidogenic Hexapeptides
Directory of Open Access Journals (Sweden)
Wojciech Wieczorek
2016-01-01
The present paper makes a novel contribution to the field of bioinformatics by using grammatical inference in the analysis of data. We developed an algorithm for generating star-free regular expressions which turned out to be good recommendation tools, as they are characterized by a relatively high correlation coefficient between the observed and predicted binary classifications. The experiments have been performed for three datasets of amyloidogenic hexapeptides, and our results are compared with those obtained using the graph approaches, the current state-of-the-art methods in heuristic automata induction, and the support vector machine. The results showed the superior performance of the new grammatical inference algorithm on fixed-length amyloid datasets.
Inference of R0 and Transmission Heterogeneity from the Size Distribution of Stuttering Chains
Blumberg, Seth; Lloyd-Smith, James O.
2013-01-01
For many infectious disease processes such as emerging zoonoses and vaccine-preventable diseases, 0 < R0 < 1 and infections occur as self-limited stuttering transmission chains. A mechanistic understanding of transmission is essential for characterizing the risk of emerging diseases and monitoring spatio-temporal dynamics. Thus methods for inferring R0 and the degree of heterogeneity in transmission from stuttering chain data have important applications in disease surveillance and management. Previous researchers have used chain size distributions to infer R0, but estimation of the degree of individual-level variation in infectiousness (as quantified by the dispersion parameter, k) has typically required contact tracing data. Utilizing branching process theory along with a negative binomial offspring distribution, we demonstrate how maximum likelihood estimation can be applied to chain size data to infer both R0 and the dispersion parameter that characterizes heterogeneity. While the maximum likelihood value for R0 is a simple function of the average chain size, the associated confidence intervals are dependent on the inferred degree of transmission heterogeneity. As demonstrated for monkeypox data from the Democratic Republic of Congo, this impacts when a statistically significant change in R0 is detectable. In addition, by allowing for superspreading events, inference of k shifts the threshold above which a transmission chain should be considered anomalously large for a given value of R0 (thus reducing the probability of false alarms about pathogen adaptation). Our analysis of monkeypox also clarifies the various ways that imperfect observation can impact inference of transmission parameters, and highlights the need to quantitatively evaluate whether observation is likely to significantly bias results. PMID:23658504
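The estimation idea in this abstract can be illustrated with a minimal sketch. It assumes a subcritical branching process with a negative binomial offspring distribution (mean R0, dispersion k), for which the maximum likelihood estimate of R0 is 1 - 1/(mean chain size) and the chain size distribution has a closed form. The function names and the chain sizes below are made up for illustration and are not taken from the paper.

```python
import math

def r0_mle(chain_sizes):
    """MLE of R0 from observed chain sizes: 1 - 1/(mean chain size)."""
    mean_size = sum(chain_sizes) / len(chain_sizes)
    return 1.0 - 1.0 / mean_size

def chain_size_logpmf(j, r0, k):
    """log P(chain size = j) for negative binomial offspring
    with mean r0 (0 < r0 < 1) and dispersion parameter k > 0."""
    return (math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
            + (j - 1) * math.log(r0 / k)
            - (k * j + j - 1) * math.log(1.0 + r0 / k))

def log_likelihood(chain_sizes, r0, k):
    """Log-likelihood of independent chains under the same (r0, k)."""
    return sum(chain_size_logpmf(j, r0, k) for j in chain_sizes)

# Hypothetical chain sizes (not real surveillance data).
sizes = [1, 1, 1, 2, 1, 4, 1, 1, 3, 1]
r0_hat = r0_mle(sizes)

# Crude grid search for the dispersion parameter at the fitted R0.
k_grid = [0.1 * i for i in range(1, 101)]
k_hat = max(k_grid, key=lambda k: log_likelihood(sizes, r0_hat, k))
print(r0_hat, k_hat)
```

The estimate of R0 does not depend on k, which is why the grid search over k can be run after fixing R0. If every observed chain has size 1 the estimate is 0 and the log-likelihood above is undefined, so a real analysis would need to treat that case separately.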
HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR
International Nuclear Information System (INIS)
Schneider, Michael D.; Dawson, William A.; Hogg, David W.; Marshall, Philip J.; Bard, Deborah J.; Meyers, Joshua; Lang, Dustin
2015-01-01
Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probability distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.
Inverse Ising inference with correlated samples
International Nuclear Information System (INIS)
Obermayer, Benedikt; Levine, Erel
2014-01-01
Correlations between two variables of a high-dimensional system can be indicative of an underlying interaction, but can also result from indirect effects. Inverse Ising inference is a method to distinguish one from the other. Essentially, the parameters of the least constrained statistical model are learned from the observed correlations such that direct interactions can be separated from indirect correlations. Among many other applications, this approach has been helpful for protein structure prediction, because residues which interact in the 3D structure often show correlated substitutions in a multiple sequence alignment. In this context, samples used for inference are not independent but share an evolutionary history on a phylogenetic tree. Here, we discuss the effects of correlations between samples on global inference. Such correlations could arise due to phylogeny but also via other slow dynamical processes. We present a simple analytical model to address the resulting inference biases, and develop an exact method accounting for background correlations in alignment data by combining phylogenetic modeling with an adaptive cluster expansion algorithm. We find that popular reweighting schemes are only marginally effective at removing phylogenetic bias, suggest a rescaling strategy that yields better results, and provide evidence that our conclusions carry over to the frequently used mean-field approach to the inverse Ising problem.
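To make the notion of a reweighting scheme concrete, the sketch below implements one commonly used variant: each sequence in an alignment is down-weighted by the number of sequences that are at least a given fraction identical to it. This is an assumption about what such a scheme looks like, not necessarily the exact schemes evaluated by the authors, and the toy alignment, threshold and function names are hypothetical.

```python
def sequence_weights(alignment, threshold=0.8):
    """Weight each sequence by 1 / (number of sequences with identity >= threshold).

    The count includes the sequence itself, so weights lie in (0, 1].
    """
    length = len(alignment[0])
    weights = []
    for a in alignment:
        neighbours = 0
        for b in alignment:
            identity = sum(x == y for x, y in zip(a, b)) / length
            if identity >= threshold:
                neighbours += 1
        weights.append(1.0 / neighbours)
    return weights

def weighted_frequency(alignment, weights, column, symbol):
    """Reweighted frequency of a symbol in one alignment column."""
    total = sum(weights)
    hits = sum(w for seq, w in zip(alignment, weights) if seq[column] == symbol)
    return hits / total

# Toy alignment: three closely related sequences and one distant one.
alignment = ["ACDE", "ACDE", "ACDF", "WYHK"]
weights = sequence_weights(alignment, threshold=0.75)
print(weights, sum(weights))                  # sum acts as an effective sample size
print(weighted_frequency(alignment, weights, 0, "A"))
```

In the toy alignment the three similar sequences share one unit of weight, so the frequency of "A" in the first column drops from 3/4 to 1/2. The abstract reports that schemes of this kind remove phylogenetic bias only marginally.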
Bayesian structural inference for hidden processes
Strelioff, Christopher C.; Crutchfield, James P.
2014-04-01
We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian structural inference (BSI) relies on a set of candidate unifilar hidden Markov model (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological ɛ-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be ɛ-machines, irrespective of estimated transition probabilities. Properties of ɛ-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite process internal structure being only indirectly present in data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well as to an out-of-class, infinite-state hidden process.
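The two quantities named in this abstract, the Shannon entropy rate and the statistical complexity, can be computed directly once a unifilar model and its transition probabilities are fixed. The sketch below does this for a hypothetical two-state machine (the so-called golden mean process, chosen here only for illustration). It does not perform the Bayesian inference over topologies described above.

```python
import math

# Hypothetical unifilar HMM: TRANSITIONS[state][symbol] = (probability, next_state).
# Unifilarity means each (state, symbol) pair leads to exactly one next state.
TRANSITIONS = {
    "A": {"0": (0.5, "B"), "1": (0.5, "A")},
    "B": {"1": (1.0, "A")},
}

def stationary_distribution(transitions, iterations=1000):
    """Stationary state distribution by power iteration."""
    states = list(transitions)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        nxt = {s: 0.0 for s in states}
        for s, edges in transitions.items():
            for prob, target in edges.values():
                nxt[target] += pi[s] * prob
        pi = nxt
    return pi

def entropy_rate(transitions, pi):
    """Entropy rate in bits per symbol: state-averaged emission entropy."""
    return -sum(pi[s] * prob * math.log2(prob)
                for s, edges in transitions.items()
                for prob, _ in edges.values() if prob > 0)

def statistical_complexity(pi):
    """Shannon entropy (bits) of the stationary state distribution."""
    return -sum(p * math.log2(p) for p in pi.values() if p > 0)

pi = stationary_distribution(TRANSITIONS)
print(pi, entropy_rate(TRANSITIONS, pi), statistical_complexity(pi))
```

For this machine the stationary distribution is (2/3, 1/3), giving an entropy rate of 2/3 bit per symbol and a statistical complexity of about 0.918 bits. Reading the state entropy as statistical complexity is only valid when the machine is an ɛ-machine, which is the case for this example.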
Image Analysis of Endoscopic Ultrasonography in Submucosal Tumor Using Fuzzy Inference
Directory of Open Access Journals (Sweden)
Kwang Baek Kim
2013-01-01
Endoscopists usually diagnose submucosal tumors based on a subjective evaluation of general images obtained by endoscopic ultrasonography. In this paper, we propose a method to extract areas of gastrointestinal stromal tumor (GIST) and lipoma automatically from the ultrasonic image to assist those specialists. We also propose an algorithm to differentiate GIST from non-GIST by fuzzy inference from such images after applying ROC curve with mean and standard deviation of brightness information. In experiments using real images that medical specialists use, we verify that our method is sufficiently helpful for such specialists for efficient classification of submucosal tumors.
The Impact of Disablers on Predictive Inference
Cummins, Denise Dellarosa
2014-01-01
People consider alternative causes when deciding whether a cause is responsible for an effect (diagnostic inference) but appear to neglect them when deciding whether an effect will occur (predictive inference). Five experiments were conducted to test a 2-part explanation of this phenomenon: namely, (a) that people interpret standard predictive…
Automatic physical inference with information maximizing neural networks
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
Compressing large data sets to a manageable number of summaries that are informative about the underlying parameters vastly simplifies both frequentist and Bayesian inference. When only simulations are available, these summaries are typically chosen heuristically, so they may inadvertently miss important information. We introduce a simulation-based machine learning technique that trains artificial neural networks to find nonlinear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). In test cases where the posterior can be derived exactly, likelihood-free inference based on automatically derived IMNN summaries produces nearly exact posteriors, showing that these summaries are good approximations to sufficient statistics. In a series of numerical examples of increasing complexity and astrophysical relevance we show that IMNNs are robustly capable of automatically finding optimal, nonlinear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima. We anticipate that the automatic physical inference method described in this paper will be essential to obtain both accurate and precise cosmological parameter estimates from complex and large astronomical data sets, including those from LSST and Euclid.
Comparison of evolutionary algorithms in gene regulatory network model inference.
LENUS (Irish Health Repository)
2010-01-01
ABSTRACT: BACKGROUND: The evolution of high throughput technologies that measure gene expression levels has created a data base for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods of discovering qualitative causal relationships between genes with high accuracy from microarray data exist, but large scale quantitative analysis on real biological datasets cannot be performed, to date, as existing approaches are not suitable for real microarray data which are noisy and insufficient. RESULTS: This paper performs an analysis of several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches, under a common framework. Algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and ability to reproduce biological behaviour, scalability and robustness to noise are assessed and compared. CONCLUSIONS: Presented is a comparison framework for assessment of evolutionary algorithms, used to infer gene regulatory networks. Promising methods are identified and a platform for development of appropriate model formalisms is established.
A Modular Artificial Intelligence Inference Engine System (MAIS) for support of on orbit experiments
Hancock, Thomas M., III
1994-01-01
This paper describes a Modular Artificial Intelligence Inference Engine System (MAIS) support tool that would provide health and status monitoring, cognitive replanning, analysis and support of on-orbit Space Station, Spacelab experiments and systems.
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
Problem solving and inference mechanisms
Energy Technology Data Exchange (ETDEWEB)
Furukawa, K; Nakajima, R; Yonezawa, A; Goto, S; Aoyama, A
1982-01-01
The heart of the fifth generation computer will be powerful mechanisms for problem solving and inference. A deduction-oriented language is to be designed, which will form the core of the whole computing system. The language is based on predicate logic with the extended features of structuring facilities, meta structures and relational data base interfaces. Parallel computation mechanisms and specialized hardware architectures are being investigated to make possible efficient realization of the language features. The project includes research into an intelligent programming system, a knowledge representation language and system, and a meta inference system to be built on the core. 30 references.
Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.
Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui
2017-01-01
The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
Elements of Causal Inference: Foundations and Learning Algorithms
DEFF Research Database (Denmark)
Peters, Jonas Martin; Janzing, Dominik; Schölkopf, Bernhard
A concise and self-contained introduction to causal inference, increasingly important in data science and machine learning.
A novel gene network inference algorithm using predictive minimum description length approach.
Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang
2010-05-28
Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced less false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the
Bayesian methods for hackers: probabilistic programming and Bayesian inference
Davidson-Pilon, Cameron
2016-01-01
Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power. Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention. Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples a...
Directory of Open Access Journals (Sweden)
Hojatollah Daneshmand
2015-01-01
Nowadays, a lot of attention is paid to the application of intelligent systems in predicting natural phenomena. Artificial neural network systems, fuzzy logic, and adaptive neuro-fuzzy inference are used in this field. Daily minimum temperature of the meteorology station of the city of Mashhad, in the northeast of Iran, in a 42-year statistical period, 1966-2008, has been received from the Iranian meteorological organization. Adaptive neuro-fuzzy inference system is used for modeling and forecasting the monthly minimum temperature. To find appropriate inputs, three approaches, i.e. spectral analysis, correlation coefficient, and the knowledge of experts, are used. By applying fast Fourier transform to the parameter of monthly minimum temperature and climate indices, and by using correlation coefficient and the knowledge of experts, 3 indices, Nino 1 + 2, NP, and PNA, are selected as model inputs. A hybrid training algorithm is used to train the system. According to simulation results, a correlation coefficient of 0.987 between the observed values and the predicted values, as well as a mean absolute percentage deviation of 27.6%, indicate an acceptable estimation of the model.
Assessment of network inference methods: how to cope with an underdetermined problem.
Directory of Open Access Journals (Sweden)
Caroline Siegenthaler
The inference of biological networks is an active research area in the field of systems biology. The number of network inference algorithms has grown tremendously in the last decade, underlining the importance of a fair assessment and comparison among these methods. Current assessments of the performance of an inference method typically involve the application of the algorithm to benchmark datasets and the comparison of the network predictions against the gold standard or reference networks. While the network inference problem is often deemed underdetermined, implying that the inference problem does not have a (unique) solution, the consequences of such an attribute have not been rigorously taken into consideration. Here, we propose a new procedure for assessing the performance of gene regulatory network (GRN) inference methods. The procedure takes into account the underdetermined nature of the inference problem, in which gene regulatory interactions that are inferable or non-inferable are determined based on causal inference. The assessment relies on a new definition of the confusion matrix, which excludes errors associated with non-inferable gene regulations. For demonstration purposes, the proposed assessment procedure is applied to the DREAM 4 In Silico Network Challenge. The results show a marked change in the ranking of participating methods when taking network inferability into account.
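The effect of restricting the confusion matrix to inferable interactions can be sketched as follows. The sketch assumes the set of inferable edges is already given. How that set is derived by causal inference is not reproduced here, and the gene names, edge sets and function name are hypothetical.

```python
def confusion_counts(predicted, gold, candidates, inferable=None):
    """Count TP/FP/FN/TN over candidate edges.

    If `inferable` is given, edges outside it are excluded from all counts.
    """
    if inferable is not None:
        candidates = [edge for edge in candidates if edge in inferable]
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for edge in candidates:
        if edge in predicted:
            counts["TP" if edge in gold else "FP"] += 1
        else:
            counts["FN" if edge in gold else "TN"] += 1
    return counts

genes = ["G1", "G2", "G3"]
candidates = [(a, b) for a in genes for b in genes if a != b]
gold = {("G1", "G2"), ("G2", "G3")}           # reference network
predicted = {("G1", "G2"), ("G1", "G3")}      # output of some inference method
# Assumed for illustration: edges between G1 and G3 cannot be resolved from the data.
inferable = set(candidates) - {("G1", "G3"), ("G3", "G1")}

print(confusion_counts(predicted, gold, candidates))
print(confusion_counts(predicted, gold, candidates, inferable))
```

In this toy case the predicted edge G1 to G3 counts as a false positive in the standard matrix but is dropped in the restricted one, so a method's precision, and potentially its rank among competing methods, can change.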
Probability and Statistical Inference
Prosper, Harrison B.
2006-01-01
These lectures introduce key concepts in probability and statistical inference at a level suitable for graduate students in particle physics. Our goal is to paint as vivid a picture as possible of the concepts covered.
Fuzzy logic controller using different inference methods
International Nuclear Information System (INIS)
Liu, Z.; De Keyser, R.
1994-01-01
In this paper the design of fuzzy controllers by using different inference methods is introduced. Configuration of the fuzzy controllers includes a general rule-base which is a collection of fuzzy PI or PD rules, the triangular fuzzy data model and a centre of gravity defuzzification algorithm. The generalized modus ponens (GMP) is used with the minimum operator of the triangular norm. Under the sup-min inference rule, six fuzzy implication operators are employed to calculate the fuzzy look-up tables for each rule base. The performance is tested in simulated systems with MATLAB/SIMULINK. Results show the effects of using the fuzzy controllers with different inference methods when applied to different test processes.
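A minimal sketch of the controller configuration described above is given here in Python rather than MATLAB/SIMULINK. It assumes triangular membership functions, a hypothetical nine-rule PD-like rule base on normalised inputs, the minimum operator for rule firing and implication (which is what sup-min composition reduces to for crisp inputs), and centre of gravity defuzzification on a discretised output range. It covers only one implication operator, not the six compared in the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets (negative, zero, positive) on the range [-1, 1].
SETS = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}

# Hypothetical PD-like rule base: (error set, change-of-error set) -> output set.
RULES = {
    ("N", "N"): "N", ("N", "Z"): "N", ("N", "P"): "Z",
    ("Z", "N"): "N", ("Z", "Z"): "Z", ("Z", "P"): "P",
    ("P", "N"): "Z", ("P", "Z"): "P", ("P", "P"): "P",
}

def infer(error, d_error, steps=201):
    """Crisp control output for crisp inputs in [-1, 1]."""
    numerator = denominator = 0.0
    for i in range(steps):
        y = -1.0 + 2.0 * i / (steps - 1)
        mu = 0.0
        for (e_set, de_set), out_set in RULES.items():
            # Firing strength of the rule: minimum over the two antecedents.
            strength = min(tri(error, *SETS[e_set]), tri(d_error, *SETS[de_set]))
            # Min implication, then max aggregation across rules.
            mu = max(mu, min(strength, tri(y, *SETS[out_set])))
        numerator += y * mu
        denominator += mu
    # Centre of gravity of the aggregated output set.
    return numerator / denominator if denominator else 0.0

print(infer(0.0, 0.0), infer(0.5, 0.5), infer(1.0, 1.0))
```

Tabulating `infer` over a grid of inputs gives the kind of look-up table the abstract mentions. Swapping the inner `min` for another implication operator is the natural place to compare inference methods.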
An algebra-based method for inferring gene regulatory networks.
Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard
2014-03-26
The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the
Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin
2017-11-10
The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels to get access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de) users can easily benefit from this service and get support by experts.
Active inference, sensory attenuation and illusions.
Brown, Harriet; Adams, Rick A; Parees, Isabel; Edwards, Mark; Friston, Karl
2013-11-01
Active inference provides a simple and neurobiologically plausible account of how action and perception are coupled in producing (Bayes) optimal behaviour. This can be seen most easily as minimising prediction error: we can either change our predictions to explain sensory input through perception or actively change sensory input to fulfil our predictions. In active inference, this action is mediated by classical reflex arcs that minimise proprioceptive prediction error created by descending proprioceptive predictions. However, this creates a conflict between action and perception, in that self-generated movements require predictions to override the sensory evidence that one is not actually moving. However, ignoring sensory evidence means that externally generated sensations will not be perceived. Conversely, attending to (proprioceptive and somatosensory) sensations enables the detection of externally generated events but precludes generation of actions. This conflict can be resolved by attenuating the precision of sensory evidence during movement or, equivalently, attending away from the consequences of self-made acts. We propose that this Bayes optimal withdrawal of precise sensory evidence during movement is the cause of psychophysical sensory attenuation. Furthermore, it explains the force-matching illusion and reproduces empirical results almost exactly. Finally, if attenuation is removed, the force-matching illusion disappears and false (delusional) inferences about agency emerge. This is important, given the negative correlation between sensory attenuation and delusional beliefs in normal subjects, and the reduction in the magnitude of the illusion in schizophrenia. Active inference therefore links the neuromodulatory optimisation of precision to sensory attenuation and illusory phenomena during the attribution of agency in normal subjects. It also provides a functional account of deficits in syndromes characterised by false inference
Bayesian Inference and Online Learning in Poisson Neuronal Networks.
Huang, Yanping; Rao, Rajesh P N
2016-08-01
Motivated by the growing evidence for Bayesian computation in the brain, we show how a two-layer recurrent network of Poisson neurons can perform both approximate Bayesian inference and learning for any hidden Markov model. The lower-layer sensory neurons receive noisy measurements of hidden world states. The higher-layer neurons infer a posterior distribution over world states via Bayesian inference from inputs generated by sensory neurons. We demonstrate how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in a higher-layer neuron represents a sample of a particular hidden world state. The spiking activity across the neural population approximates the posterior distribution over hidden states. In this model, variability in spiking is regarded not as a nuisance but as an integral feature that provides the variability necessary for sampling during inference. We demonstrate how the network can learn the likelihood model, as well as the transition probabilities underlying the dynamics, using a Hebbian learning rule. We present results illustrating the ability of the network to perform inference and learning for arbitrary hidden Markov models.
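The sampling view in this abstract, where each spike stands for a sample of a hidden state, is analogous to particle filtering. The sketch below is a conventional bootstrap particle filter for a small hidden Markov model, compared against exact forward filtering. It is not the spiking network from the paper, and the states, probabilities and function names are hypothetical.

```python
import random

STATES = ["rain", "dry"]                       # hypothetical hidden world states
INIT = {"rain": 0.5, "dry": 0.5}               # distribution before the first step
TRANS = {"rain": {"rain": 0.7, "dry": 0.3},
         "dry": {"rain": 0.3, "dry": 0.7}}
EMIT = {"rain": {"umbrella": 0.9, "none": 0.1},
        "dry": {"umbrella": 0.2, "none": 0.8}}

def exact_filter(observations):
    """Exact posterior over the hidden state after each observation."""
    belief = dict(INIT)
    history = []
    for obs in observations:
        predicted = {s: sum(belief[r] * TRANS[r][s] for r in STATES) for s in STATES}
        unnorm = {s: predicted[s] * EMIT[s][obs] for s in STATES}
        total = sum(unnorm.values())
        belief = {s: unnorm[s] / total for s in STATES}
        history.append(belief)
    return history

def particle_filter(observations, n_particles=5000, seed=0):
    """Bootstrap particle filter: propagate, weight by likelihood, resample."""
    rng = random.Random(seed)
    particles = rng.choices(STATES, weights=[INIT[s] for s in STATES], k=n_particles)
    history = []
    for obs in observations:
        # Propagate each particle through the transition model.
        particles = [rng.choices(STATES, weights=[TRANS[p][s] for s in STATES])[0]
                     for p in particles]
        # Weight particles by the likelihood of the observation, then resample.
        weights = [EMIT[p][obs] for p in particles]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # The fraction of particles in each state approximates the posterior.
        history.append({s: particles.count(s) / n_particles for s in STATES})
    return history

observations = ["umbrella", "umbrella", "none"]
for exact, approx in zip(exact_filter(observations), particle_filter(observations)):
    print(exact, approx)
```

The fraction of particles occupying each state plays the role that population spiking activity plays in the abstract: it approximates the posterior, and the sampling variability is what makes the approximation possible.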
Surface radiant flux densities inferred from LAC and GAC AVHRR data
Berger, F.; Klaes, D.
To infer surface radiant flux densities from current (NOAA-AVHRR, ERS-1/2 ATSR) and future meteorological (Envisat AATSR, MSG, METOP) satellite data, the complex, modular analysis scheme SESAT (Strahlungs- und Energieflüsse aus Satellitendaten) was developed (Berger, 2001). This scheme allows the determination of cloud types, optical and microphysical cloud properties as well as surface and TOA radiant flux densities. After testing of SESAT in Central Europe and the Baltic Sea catchment (more than 400 scenes, including a detailed validation with various surface measurements), it was applied to a large number of NOAA-16 AVHRR overpasses covering the globe. For the analysis, two different spatial resolutions, local area coverage (LAC) and global area coverage (GAC), were considered. Therefore, all inferred results, like cloud cover, cloud properties and radiant properties, could be intercompared. Specific emphasis was placed on the surface radiant flux densities (all radiative balance components), where results for different regions, like Southern America, Southern Africa, Northern America, Europe, and Indonesia, will be presented. Applying SESAT, energy flux densities, like latent and sensible heat flux densities, could also be determined. A statistical analysis of all results, including a detailed discussion for the two spatial resolutions, will close this study.
Inference of protein diffusion probed via fluorescence correlation spectroscopy
Tsekouras, Konstantinos
2015-03-01
Fluctuations are an inherent part of single molecule or few particle biophysical data sets. Traditionally, "noise" fluctuations have been viewed as a nuisance, to be eliminated or minimized. Here we look at how statistical inference methods, which take explicit advantage of fluctuations, have allowed us to draw an unexpected picture of single molecule diffusional dynamics. Our focus is on the diffusion of proteins probed using fluorescence correlation spectroscopy (FCS). First, we discuss how, in collaboration with the Bustamante and Marqusee labs at UC Berkeley, we determined using FCS data that individual enzymes are perturbed by self-generated catalytic heat (Riedel et al, Nature, 2014). Using the tools of inference, we found how distributions of enzyme diffusion coefficients shift in the presence of substrate, revealing that enzymes performing highly exothermic reactions dissipate heat by transiently accelerating their center of mass following a catalytic reaction. Next, when molecules diffuse in the cell nucleus they often appear to diffuse anomalously. We analyze FCS data, in collaboration with Rich Day at the IU Med School, to propose a simple model for transcription factor binding-unbinding in the nucleus and to show that it may give rise to apparent anomalous diffusion. Here inference methods extract entire binding affinity distributions for the diffusing transcription factors, allowing us to precisely characterize their interactions with different components of the nuclear environment. From this analysis, we draw key mechanistic insight that goes beyond what is possible by simply fitting data to "anomalous diffusion" models.
Contingency inferences driven by base rates: Valid by sampling
Directory of Open Access Journals (Sweden)
Florian Kutzner
2011-04-01
Fiedler et al. (2009) reviewed evidence for the utilization of a contingency inference strategy termed pseudocontingencies (PCs). In PCs, the more frequent levels (and, by implication, the less frequent levels) are assumed to be associated. PCs have been obtained using a wide range of task settings and dependent measures. Yet, the readiness with which decision makers rely on PCs is poorly understood. A computer simulation explored two potential sources of subjective validity of PCs. First, PCs are shown to perform above chance level when the task is to infer the sign of moderate to strong population contingencies from a sample of observations. Second, contingency inferences based on PCs and inferences based on cell frequencies are shown to partially agree across samples. Intriguingly, this criterion and convergent validity are by-products of random sampling error, highlighting the inductive nature of contingency inferences.
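The first simulation result can be reproduced in spirit with a few lines of code. The following is a hedged sketch, not the authors' simulation: it assumes a 2x2 population with uniform marginals and contingency phi, a sample size of 12 observations, and a PC rule that predicts a positive contingency when both sample base rates are skewed in the same direction. The function name and all parameter values are hypothetical.

```python
import random

def pc_accuracy(phi, n_obs=12, n_samples=20000, seed=1):
    """Share of samples in which the PC rule recovers the sign of phi.

    Population: P(x == y) = (1 + phi) / 2 with uniform marginals, so any
    skew in the sample base rates is produced by sampling error alone.
    """
    rng = random.Random(seed)
    p_same = (1 + phi) / 2
    true_sign = 1 if phi > 0 else -1
    correct = decided = 0
    for _ in range(n_samples):
        x_sum = y_sum = 0
        for _ in range(n_obs):
            x = rng.random() < 0.5
            y = x if rng.random() < p_same else not x
            x_sum += x
            y_sum += y
        skew_x = 2 * x_sum - n_obs   # > 0 when level 1 of X is more frequent
        skew_y = 2 * y_sum - n_obs
        if skew_x == 0 or skew_y == 0:
            continue                 # no skewed base rate: the rule is silent
        decided += 1
        pc_sign = 1 if skew_x * skew_y > 0 else -1
        correct += pc_sign == true_sign
    return correct / decided

if __name__ == "__main__":
    for phi in (0.3, 0.6, -0.6):
        print(phi, round(pc_accuracy(phi), 3))
```

Because the two variables are correlated in the population, their sample base rates tend to be skewed in matching directions, which is why a rule that looks only at base rates can still beat chance.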
Steele, Vaughn R; Bernat, Edward M; van den Broek, Paul; Collins, Paul F; Patrick, Christopher J; Marsolek, Chad J
2013-01-25
Successful comprehension during reading often requires inferring information not explicitly presented. This information is readily accessible when subsequently encountered, and a neural correlate of this is an attenuation of the N400 event-related potential (ERP). We used ERPs and time-frequency (TF) analysis to investigate neural correlates of processing inferred information after a causal coherence inference had been generated during text comprehension. Participants read short texts, some of which promoted inference generation. After each text, they performed lexical decisions to target words that were unrelated or inference-related to the preceding text. Consistent with previous findings, inference-related words elicited an attenuated N400 relative to unrelated words. TF analyses revealed unique contributions to the N400 from activity occurring at 1-6 Hz (theta) and 0-2 Hz (delta), supporting the view that multiple, sequential processes underlie the N400. Copyright © 2012 Elsevier B.V. All rights reserved.
Reinforcement and inference in cross-situational word learning.
Tilles, Paulo F C; Fontanari, José F
2013-01-01
Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, because the correctness of the decision based on this inference is then in doubt. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
Graphical models for inferring single molecule dynamics
Directory of Open Access Journals (Sweden)
Gonzalez Ruben L
2010-10-01
Background: The recent explosion of experimental techniques in single molecule biophysics has generated a variety of novel time series data requiring equally novel computational tools for analysis and inference. This article describes in general terms how graphical modeling may be used to learn from biophysical time series data using the variational Bayesian expectation maximization (VBEM) algorithm. The discussion is illustrated by the example of single-molecule fluorescence resonance energy transfer (smFRET) versus time data, where the smFRET time series is modeled as a hidden Markov model (HMM) with Gaussian observables. A detailed description of smFRET is provided as well. Results: The VBEM algorithm returns the model's evidence and an approximating posterior parameter distribution given the data. The former provides a metric for model selection via maximum evidence (ME), and the latter a description of the model's parameters learned from the data. ME/VBEM provide several advantages over the more commonly used approach of maximum likelihood (ML) optimized by the expectation maximization (EM) algorithm, the most important being a natural form of model selection and a well-posed (non-divergent) optimization problem. Conclusions: The results demonstrate the utility of graphical modeling for inference of dynamic processes in single molecule biophysics.
Bakbergenuly, Ilyas; Kulinskaya, Elena; Morgenthaler, Stephan
2016-07-01
We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group-level studies or in meta-analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from standard log-odds and arcsine transformations of the estimated probability p̂, both for single-group studies and in combining results from several groups or studies in meta-analysis. Our simulations confirm that these biases are linear in ρ, for small values of ρ, the intracluster correlation coefficient. These biases do not depend on the sample sizes or the number of studies K in a meta-analysis and result in abysmal coverage of the combined effect for large K. We also propose bias-correction for the arcsine transformation. Our simulations demonstrate that this bias-correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta-analyses of prevalence. © 2016 The Authors. Biometrical Journal Published by Wiley-VCH Verlag GmbH & Co. KGaA.
A new fast method for inferring multiple consensus trees using k-medoids.
Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir
2018-04-05
Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. The main advantage of our method is that it is much faster than the existing tree clustering approaches, while
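The partitioning step can be sketched with a plain k-medoids routine that works on a precomputed matrix of pairwise tree distances. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses the simple alternating (assign, then re-pick medoids) variant of k-medoids, the distance matrix is assumed to be supplied by the caller (for example, from a tree distance of the caller's choice), and the adapted Silhouette and Caliński-Harabasz indices mentioned in the abstract are not reproduced here.

```python
import random

def k_medoids(dist, k, n_iter=100, seed=0):
    """Cluster items given a symmetric distance matrix `dist`.

    Returns (medoids, labels), where medoids are item indices and
    labels[i] is the cluster index (0..k-1) of item i.
    """
    rng = random.Random(seed)
    n = len(dist)
    medoids = rng.sample(range(n), k)

    def assign(meds):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]])
                for i in range(n)]

    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep medoid of empty cluster
                continue
            # The new medoid minimises total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

if __name__ == "__main__":
    # Hypothetical distances: trees 0-2 are similar, trees 3-5 are similar.
    near, far = 1, 10
    d = [[0 if i == j else (near if (i < 3) == (j < 3) else far)
          for j in range(6)] for i in range(6)]
    print(k_medoids(d, 2))
```

In a tree-clustering setting, each resulting cluster would then be summarised by its own consensus tree, and the number of clusters would be chosen with a cluster validity index.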
Inferring biological functions of guanylyl cyclases with computational methods
Alquraishi, May Majed; Meier, Stuart Kurt
2013-01-01
A number of studies have shown that functionally related genes are often co-expressed and that computational based co-expression analysis can be used to accurately identify functional relationships between genes and by inference, their encoded proteins. Here we describe how a computational based co-expression analysis can be used to link the function of a specific gene of interest to a defined cellular response. Using a worked example we demonstrate how this methodology is used to link the function of the Arabidopsis Wall-Associated Kinase-Like 10 gene, which encodes a functional guanylyl cyclase, to host responses to pathogens. © Springer Science+Business Media New York 2013.
Inferring biological functions of guanylyl cyclases with computational methods
Alquraishi, May Majed
2013-09-03
A number of studies have shown that functionally related genes are often co-expressed and that computational based co-expression analysis can be used to accurately identify functional relationships between genes and by inference, their encoded proteins. Here we describe how a computational based co-expression analysis can be used to link the function of a specific gene of interest to a defined cellular response. Using a worked example we demonstrate how this methodology is used to link the function of the Arabidopsis Wall-Associated Kinase-Like 10 gene, which encodes a functional guanylyl cyclase, to host responses to pathogens. © Springer Science+Business Media New York 2013.
Eight challenges in phylodynamic inference
Directory of Open Access Journals (Sweden)
Simon D.W. Frost
2015-03-01
The field of phylodynamics, which attempts to enhance our understanding of infectious disease dynamics using pathogen phylogenies, has made great strides in the past decade. Basic epidemiological and evolutionary models are now well characterized with inferential frameworks in place. However, significant challenges remain in extending phylodynamic inference to more complex systems. These challenges include accounting for evolutionary complexities such as changing mutation rates, selection, reassortment, and recombination, as well as epidemiological complexities such as stochastic population dynamics, host population structure, and different patterns at the within-host and between-host scales. An additional challenge exists in making efficient inferences from an ever increasing corpus of sequence data.
Inference of R0 and transmission heterogeneity from the size distribution of stuttering chains.
Directory of Open Access Journals (Sweden)
Seth Blumberg
For many infectious disease processes such as emerging zoonoses and vaccine-preventable diseases, 0 < R0 < 1 and infections occur as self-limited stuttering transmission chains. A mechanistic understanding of transmission is essential for characterizing the risk of emerging diseases and monitoring spatio-temporal dynamics. Thus methods for inferring R0 and the degree of heterogeneity in transmission from stuttering chain data have important applications in disease surveillance and management. Previous researchers have used chain size distributions to infer R0, but estimation of the degree of individual-level variation in infectiousness (as quantified by the dispersion parameter, k) has typically required contact tracing data. Utilizing branching process theory along with a negative binomial offspring distribution, we demonstrate how maximum likelihood estimation can be applied to chain size data to infer both R0 and the dispersion parameter that characterizes heterogeneity. While the maximum likelihood value for R0 is a simple function of the average chain size, the associated confidence intervals are dependent on the inferred degree of transmission heterogeneity. As demonstrated for monkeypox data from the Democratic Republic of Congo, this impacts when a statistically significant change in R0 is detectable. In addition, by allowing for superspreading events, inference of k shifts the threshold above which a transmission chain should be considered anomalously large for a given value of R0 (thus reducing the probability of false alarms about pathogen adaptation). Our analysis of monkeypox also clarifies the various ways that imperfect observation can impact inference of transmission parameters, and highlights the need to quantitatively evaluate whether observation is likely to significantly bias results.
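The estimation idea can be sketched as follows, writing R0 for the basic reproduction number and k for the negative binomial dispersion parameter (conventional symbols, used here as an assumption). The sketch relies on the standard branching-process result for the total size of a subcritical chain started by one case, and on the estimator R0 = 1 - 1/(mean chain size). The function names, the grid search over k, and the example data are hypothetical, and perfect observation of every chain is assumed.

```python
import math

def chain_size_logpmf(j, r0, k):
    """Log-probability that a chain started by one case has total size j,
    for negative binomial offspring with mean r0 (0 < r0 < 1) and dispersion k."""
    return (math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
            + (j - 1) * math.log(r0 / k)
            - (k * j + j - 1) * math.log(1 + r0 / k))

def r0_mle(chain_sizes):
    """R0 estimate as a function of the average chain size only.
    Requires at least one chain larger than a single case."""
    mean_size = sum(chain_sizes) / len(chain_sizes)
    return 1 - 1 / mean_size

def k_mle(chain_sizes, k_grid):
    """Pick the dispersion value on a grid that maximises the likelihood,
    holding R0 at its estimate (a simple profile search, for illustration)."""
    r0 = r0_mle(chain_sizes)
    def loglik(k):
        return sum(chain_size_logpmf(j, r0, k) for j in chain_sizes)
    return max(k_grid, key=loglik)

if __name__ == "__main__":
    # Hypothetical chain-size data: mostly single cases, a few longer chains.
    sizes = [1] * 60 + [2] * 20 + [3] * 10 + [5] * 10
    grid = [0.05 * i for i in range(1, 201)]   # candidate k from 0.05 to 10
    print("R0 estimate:", r0_mle(sizes))
    print("k estimate :", k_mle(sizes, grid))
```

Small values of k correspond to strong heterogeneity (occasional superspreading), which makes large chains more plausible for a given R0 and therefore widens the uncertainty around the R0 estimate.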
Cox, L A; Ricci, P F
2005-04-01
Causal inference of exposure-response relations from data is a challenging aspect of risk assessment with important implications for public and private risk management. Such inference, which is fundamentally empirical and based on exposure (or dose)-response models, seldom arises from a single set of data; rather, it requires integrating heterogeneous information from diverse sources and disciplines including epidemiology, toxicology, and cell and molecular biology. Our discussion focuses on three causal aspects: drawing sound inferences about causal relations from one or more observational studies; addressing and resolving biases that can affect a single multivariate empirical exposure-response study; and applying the results from these considerations to the microbiological risk management of human health risks and benefits of a ban on antibiotic use in animals, in the context of banning enrofloxacin or macrolides, antibiotics used against bacterial illnesses in poultry, and the effects of such bans on changing the risk of human food-borne campylobacteriosis infections. The purposes of this paper are to describe novel causal methods for assessing empirical causation and inference; exemplify how to deal with biases that routinely arise in multivariate exposure- or dose-response modeling; and provide a simplified discussion of a case study of causal inference using microbial risk analysis as an example. The case study supports the conclusion that the human health benefits from a ban are unlikely to be greater than the excess human health risks that it could create, even when accounting for uncertainty. We conclude that quantitative causal analysis of risks is preferable to qualitative assessments because it does not involve unjustified loss of information and is sound under the inferential use of risk results by management.
Making Type Inference Practical
DEFF Research Database (Denmark)
Schwartzbach, Michael Ignatieff; Oxhøj, Nicholas; Palsberg, Jens
1992-01-01
We present the implementation of a type inference algorithm for untyped object-oriented programs with inheritance, assignments, and late binding. The algorithm significantly improves our previous one, presented at OOPSLA'91, since it can handle collection classes, such as List, in a useful way. Abo......, the complexity has been dramatically improved, from exponential time to low polynomial time. The implementation uses the techniques of incremental graph construction and constraint template instantiation to avoid representing intermediate results, doing superfluous work, and recomputing type information....... Experiments indicate that the implementation type checks as much as 100 lines per second. This results in a mature product, on which a number of tools can be based, for example a safety tool, an image compression tool, a code optimization tool, and an annotation tool. This may make type inference for object...
Examples in parametric inference with R
Dixit, Ulhas Jayram
2016-01-01
This book discusses examples in parametric inference with R. Combining basic theory with modern approaches, it presents the latest developments and trends in statistical inference for students who do not have an advanced mathematical and statistical background. The topics discussed in the book are fundamental and common to many fields of statistical inference and thus serve as a point of departure for in-depth study. The book is divided into eight chapters: Chapter 1 provides an overview of topics on sufficiency and completeness, while Chapter 2 briefly discusses unbiased estimation. Chapter 3 focuses on the study of moments and maximum likelihood estimators, and Chapter 4 presents bounds for the variance. In Chapter 5, topics on consistent estimators are discussed. Chapter 6 discusses Bayes, while Chapter 7 studies some more powerful tests. Lastly, Chapter 8 examines unbiased and other tests. Senior undergraduate and graduate students in statistics and mathematics, and those who have taken an introductory cou...
Causal Effect Inference with Deep Latent-Variable Models
Louizos, C; Shalit, U.; Mooij, J.; Sontag, D.; Zemel, R.; Welling, M.
2017-01-01
Learning individual-level causal effects from observational data, such as inferring the most effective medication for a specific patient, is a problem of growing importance for policy makers. The most important aspect of inferring causal effects from observational data is the handling of
Statistical Inference at Work: Statistical Process Control as an Example
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
On quantum statistical inference
Barndorff-Nielsen, O.E.; Gill, R.D.; Jupp, P.E.
2003-01-01
Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, developments in the theory of quantum measurements have
Inferring time‐varying recharge from inverse analysis of long‐term water levels
Dickinson, Jesse; Hanson, R.T.; Ferré, T.P.A.; Leake, S.A.
2004-01-01
Water levels in aquifers typically vary in response to time‐varying rates of recharge, suggesting the possibility of inferring time‐varying recharge rates on the basis of long‐term water level records. Presumably, in the southwestern United States (Arizona, Nevada, New Mexico, southern California, and southern Utah), rates of mountain front recharge to alluvial aquifers depend on variations in precipitation rates due to known climate cycles such as the El Niño‐Southern Oscillation index and the Pacific Decadal Oscillation. This investigation examined the inverse application of a one‐dimensional analytical model for periodic flow described by Lloyd R. Townley in 1995 to estimate periodic recharge variations on the basis of variations in long‐term water level records using southwest aquifers as the case study. Time‐varying water level records at various locations along the flow line were obtained by simulation of forward models of synthetic basins with applied sinusoidal recharge of either a single period or composite of multiple periods of length similar to known climate cycles. Periodic water level components, reconstructed using singular spectrum analysis (SSA), were used to calibrate the analytical model to estimate each recharge component. The results demonstrated that periodic recharge estimates were most accurate in basins with nearly uniform transmissivity and the accuracy of the recharge estimates depends on monitoring well location. A case study of the San Pedro Basin, Arizona, is presented as an example of calibrating the analytical model to real data.
Rohatgi, Vijay K
2003-01-01
Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth
The Impact of Contextual Clue Selection on Inference
Directory of Open Access Journals (Sweden)
Leila Barati
2010-05-01
Linguistic information can be conveyed in the form of speech and written text, but it is the content of the message that is ultimately essential for higher-level processes in language comprehension, such as making inferences and associations between text information and knowledge about the world. Linguistically, inference is the shovel that allows receivers to dig meaning out of the text by selecting different embedded contextual clues. Naturally, people with different world experiences infer similar contextual situations differently. Lack of contextual knowledge of the target language can present an obstacle to comprehension (Anderson & Lynch, 2003). This paper investigates how correct contextual clue selection from the text can influence listeners' inference. In the present study, 60 male and female teenagers (13-19) and 60 male and female young adults (20-26) were selected randomly based on the Oxford Placement Test (OPT). During the study, two fiction and two non-fiction passages were read to the participants in the experimental and control groups, respectively, and they were given scores according to Lexile's Score (LS) [1] based on their correct inference and logical thinking ability. In general, the results show that participants' clue selection based on their personal schematic references and background knowledge differs between teenagers and young adults and influences inference and listening comprehension. [1] This is a framework for reading and listening which matches the appropriate score to each text based on the text's degree of difficulty; each text was given a Lexile score from zero to four.
Horne, J. B.
2013-01-11
Phylogeographical studies have shown that some shallow-water marine organisms, such as certain coral reef fishes, lack spatial population structure at oceanic scales, despite vast distances of pelagic habitat between reefs and other dispersal barriers. However, whether these dispersive widespread taxa constitute long-term panmictic populations across their species ranges remains unknown. Conventional phylogeographical inferences frequently fail to distinguish between long-term panmixia and metapopulations connected by gene flow. Moreover, marine organisms have notoriously large effective population sizes that confound population structure detection. Therefore, at what spatial scale marine populations experience independent evolutionary trajectories and ultimately species divergence is still unclear. Here, we present a phylogeographical study of a cosmopolitan Indo-Pacific coral reef fish Naso hexacanthus and its sister species Naso caesius, using two mtDNA and two nDNA markers. The purpose of this study was two-fold: first, to test for broad-scale panmixia in N. hexacanthus by fitting the data to various phylogeographical models within a Bayesian statistical framework, and second, to explore patterns of genetic divergence between the two broadly sympatric species. We report that N. hexacanthus shows little population structure across the Indo-Pacific and a range-wide, long-term panmictic population model best fit the data. Hence, this species presently comprises a single evolutionary unit across much of the tropical Indian and Pacific Oceans. Naso hexacanthus and N. caesius were not reciprocally monophyletic in the mtDNA markers but showed varying degrees of population level divergence in the two nuclear introns. Overall, patterns are consistent with secondary introgression following a period of isolation, which may be attributed to oceanographic conditions of the mid to late Pleistocene, when these two species appear to have diverged. © 2013 The Authors. Journal
Horne, J. B.; van Herwerden, L.
2013-01-01
Phylogeographical studies have shown that some shallow-water marine organisms, such as certain coral reef fishes, lack spatial population structure at oceanic scales, despite vast distances of pelagic habitat between reefs and other dispersal barriers. However, whether these dispersive widespread taxa constitute long-term panmictic populations across their species ranges remains unknown. Conventional phylogeographical inferences frequently fail to distinguish between long-term panmixia and metapopulations connected by gene flow. Moreover, marine organisms have notoriously large effective population sizes that confound population structure detection. Therefore, at what spatial scale marine populations experience independent evolutionary trajectories and ultimately species divergence is still unclear. Here, we present a phylogeographical study of a cosmopolitan Indo-Pacific coral reef fish Naso hexacanthus and its sister species Naso caesius, using two mtDNA and two nDNA markers. The purpose of this study was two-fold: first, to test for broad-scale panmixia in N. hexacanthus by fitting the data to various phylogeographical models within a Bayesian statistical framework, and second, to explore patterns of genetic divergence between the two broadly sympatric species. We report that N. hexacanthus shows little population structure across the Indo-Pacific and a range-wide, long-term panmictic population model best fit the data. Hence, this species presently comprises a single evolutionary unit across much of the tropical Indian and Pacific Oceans. Naso hexacanthus and N. caesius were not reciprocally monophyletic in the mtDNA markers but showed varying degrees of population level divergence in the two nuclear introns. Overall, patterns are consistent with secondary introgression following a period of isolation, which may be attributed to oceanographic conditions of the mid to late Pleistocene, when these two species appear to have diverged. © 2013 The Authors. Journal
Inferring Demographic History Using Two-Locus Statistics.
Ragsdale, Aaron P; Gutenkunst, Ryan N
2017-06-01
Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
Statistical Inference on the Canadian Middle Class
Directory of Open Access Journals (Sweden)
Russell Davidson
2018-03-01
Conventional wisdom says that the middle classes in many developed countries have recently suffered losses, in terms of both the share of the total population belonging to the middle class and their share in total income. Here, distribution-free methods are developed for inference on these shares, by deriving expressions for the asymptotic variances of the sample estimates and for the covariance of the estimates. Asymptotic inference can be undertaken based on asymptotic normality. Bootstrap inference can be expected to be more reliable, and appropriate bootstrap procedures are proposed. As an illustration, samples of individual earnings drawn from Canadian census data are used to test various hypotheses about the middle-class shares, and confidence intervals for them are computed. It is found that, for the earlier censuses, sample sizes are large enough for asymptotic and bootstrap inference to be almost identical, but that, in the twenty-first century, the bootstrap fails on account of a strange phenomenon whereby many presumably different incomes in the data are rounded to one and the same value. Another difference between the centuries is the appearance of heavy right-hand tails in the income distributions of both men and women.
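A percentile bootstrap for the population share of the middle class can be sketched in a few lines. This is only an illustration under assumptions that are not taken from the paper: the middle class is defined here as incomes between 0.5 and 1.5 times the sample median, the interval is a basic percentile interval rather than the procedures the author proposes, and the synthetic lognormal incomes are hypothetical.

```python
import random
import statistics

def middle_class_share(incomes, lo=0.5, hi=1.5):
    """Share of the sample with income in [lo * median, hi * median].
    The 0.5 and 1.5 cut-offs are an assumed definition of the middle class."""
    med = statistics.median(incomes)
    inside = sum(lo * med <= x <= hi * med for x in incomes)
    return inside / len(incomes)

def bootstrap_ci(incomes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the middle-class share."""
    rng = random.Random(seed)
    n = len(incomes)
    # Resample with replacement and recompute the share each time.
    shares = sorted(middle_class_share(rng.choices(incomes, k=n))
                    for _ in range(n_boot))
    lower = shares[int((alpha / 2) * n_boot)]
    upper = shares[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

if __name__ == "__main__":
    rng = random.Random(42)
    incomes = [rng.lognormvariate(10, 0.6) for _ in range(500)]  # synthetic data
    print("share:", middle_class_share(incomes))
    print("95% CI:", bootstrap_ci(incomes))
```

Heavily rounded incomes create many ties at the median and at the cut-offs, which is one way a resampling scheme like this can behave poorly, consistent with the failure reported for the more recent censuses.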
The importance of learning when making inferences
Directory of Open Access Journals (Sweden)
Jorg Rieskamp
2008-03-01
Full Text Available The assumption that people possess a repertoire of strategies to solve the inference problems they face has been made repeatedly. The experimental findings of two previous studies on strategy selection are reexamined from a learning perspective, which argues that people learn to select strategies for making probabilistic inferences. This learning process is modeled with the strategy selection learning (SSL) theory, which assumes that people develop subjective expectancies for the strategies they have. They select strategies proportional to their expectancies, which are updated on the basis of experience. For the study by Newell, Weston, and Shanks (2003), it can be shown that people did not anticipate the success of a strategy from the beginning of the experiment. Instead, the behavior observed at the end of the experiment was the result of a learning process that can be described by the SSL theory. For the second study, by Bröder and Schiffer (2006), the SSL theory is able to provide an explanation for why participants only slowly adapted to new environments in a dynamic inference situation. The reanalysis of the previous studies illustrates the importance of learning for probabilistic inferences.
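The selection rule described in the abstract above (strategies are chosen in proportion to their expectancies, and expectancies are updated from experience) can be sketched as follows. This is a minimal illustration and not the published SSL model: the strategy names, the initial expectancies, the additive update rule, and the payoff function are all assumptions made for the example.

```python
import random


def choose_strategy(expectancies, rng):
    """Pick a strategy with probability proportional to its current expectancy."""
    names = list(expectancies)
    weights = [expectancies[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def update_expectancy(expectancies, strategy, payoff, floor=1e-6):
    """Additive reinforcement (an assumed rule); the floor keeps weights positive."""
    expectancies[strategy] = max(expectancies[strategy] + payoff, floor)


def simulate(payoff_fn, n_trials=500, seed=0):
    """Run repeated choices and return the final expectancies."""
    rng = random.Random(seed)
    # Hypothetical strategies that start with equal expectancies.
    expectancies = {"strategy_a": 10.0, "strategy_b": 10.0}
    for _ in range(n_trials):
        chosen = choose_strategy(expectancies, rng)
        update_expectancy(expectancies, chosen, payoff_fn(chosen, rng))
    return expectancies
```

Under these assumptions, a strategy that pays off more often accumulates a larger expectancy and is therefore selected more often, which gives the gradual adaptation the abstract refers to.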
Statistical inference for financial engineering
Taniguchi, Masanobu; Ogata, Hiroaki; Taniai, Hiroyuki
2014-01-01
This monograph provides the fundamentals of statistical inference for financial engineering and covers some selected methods suitable for analyzing financial time series data. In order to describe the actual financial data, various stochastic processes, e.g. non-Gaussian linear processes, non-linear processes, long-memory processes, locally stationary processes etc. are introduced and their optimal estimation is considered as well. This book also includes several statistical approaches, e.g., discriminant analysis, the empirical likelihood method, control variate method, quantile regression, realized volatility etc., which have been recently developed and are considered to be powerful tools for analyzing the financial data, establishing a new bridge between time series and financial engineering. This book is well suited as a professional reference book on finance, statistics and statistical financial engineering. Readers are expected to have an undergraduate-level knowledge of statistics.
Bayesian inference of substrate properties from film behavior
International Nuclear Information System (INIS)
Aggarwal, R; Demkowicz, M J; Marzouk, Y M
2015-01-01
We demonstrate that by observing the behavior of a film deposited on a substrate, certain features of the substrate may be inferred with quantified uncertainty using Bayesian methods. We carry out this demonstration on an illustrative film/substrate model where the substrate is a Gaussian random field and the film is a two-component mixture that obeys the Cahn–Hilliard equation. We construct a stochastic reduced order model to describe the film/substrate interaction and use it to infer substrate properties from film behavior. This quantitative inference strategy may be adapted to other film/substrate systems. (paper)
Brain Imaging, Forward Inference, and Theories of Reasoning
Heit, Evan
2015-01-01
This review focuses on the issue of how neuroimaging studies address theoretical accounts of reasoning, through the lens of the method of forward inference (Henson, 2005, 2006). After theories of deductive and inductive reasoning are briefly presented, the method of forward inference for distinguishing between psychological theories based on brain imaging evidence is critically reviewed. Brain imaging studies of reasoning, comparing deductive and inductive arguments, comparing meaningful versus non-meaningful material, investigating hemispheric localization, and comparing conditional and relational arguments, are assessed in light of the method of forward inference. Finally, conclusions are drawn with regard to future research opportunities. PMID:25620926
Directory of Open Access Journals (Sweden)
Vanessa Almendro
2014-02-01
Full Text Available Cancer therapy exerts a strong selection pressure that shapes tumor evolution, yet our knowledge of how tumors change during treatment is limited. Here, we report the analysis of cellular heterogeneity for genetic and phenotypic features and their spatial distribution in breast tumors pre- and post-neoadjuvant chemotherapy. We found that intratumor genetic diversity was tumor-subtype specific, and it did not change during treatment in tumors with partial or no response. However, lower pretreatment genetic diversity was significantly associated with pathologic complete response. In contrast, phenotypic diversity was different between pre- and posttreatment samples. We also observed significant changes in the spatial distribution of cells with distinct genetic and phenotypic features. We used these experimental data to develop a stochastic computational model to infer tumor growth patterns and evolutionary dynamics. Our results highlight the importance of integrated analysis of genotypes and phenotypes of single cells in intact tissues to predict tumor evolution.
International Nuclear Information System (INIS)
Almendro, Vanessa; Cheng, Yu-Kang; Randles, Amanda; Itzkovitz, Shalev; Marusyk, Andriy; Ametller, Elisabet; Gonzalez-Farre, Xavier; Muñoz, Montse; Russnes, Hege G.; Helland, Åslaug; Rye, Inga H.; Borresen-Dale, Anne-Lise; Maruyama, Reo; Van Oudenaarden, Alexander; Dowsett, Mitchell; Jones, Robin L.; Reis-Filho, Jorge; Gascon, Pere; Gönen, Mithat; Michor, Franziska; Polyak, Kornelia
2014-01-01
Cancer therapy exerts a strong selection pressure that shapes tumor evolution, yet our knowledge of how tumors change during treatment is limited. Here, we report the analysis of cellular heterogeneity for genetic and phenotypic features and their spatial distribution in breast tumors pre- and post-neoadjuvant chemotherapy. We found that intratumor genetic diversity was tumor-subtype specific, and it did not change during treatment in tumors with partial or no response. However, lower pretreatment genetic diversity was significantly associated with pathologic complete response. In contrast, phenotypic diversity was different between pre- and post-treatment samples. We also observed significant changes in the spatial distribution of cells with distinct genetic and phenotypic features. We used these experimental data to develop a stochastic computational model to infer tumor growth patterns and evolutionary dynamics. Our results highlight the importance of integrated analysis of genotypes and phenotypes of single cells in intact tissues to predict tumor evolution.
Data-driven inference for the spatial scan statistic
Directory of Open Access Journals (Sweden)
Duczmal Luiz H
2011-08-01
Full Text Available Background: Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. Results: A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. Conclusions: A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
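The modified inference question in the abstract above compares the observed statistic only with null replicates whose most likely cluster has the same size k. A minimal sketch of that comparison follows. It assumes the null replicates have already been simulated and summarized as (statistic, cluster size) pairs; the function names and the (r + 1)/(n + 1) Monte Carlo convention are assumptions, not details taken from the paper.

```python
def monte_carlo_p(observed_stat, null_stats):
    """Usual Monte Carlo p-value: rank of the observed statistic among null ones."""
    exceed = sum(1 for stat in null_stats if stat >= observed_stat)
    return (exceed + 1) / (len(null_stats) + 1)


def size_conditioned_p(observed_stat, observed_size, null_runs):
    """p-value restricted to null replicates whose most likely cluster has size k.

    null_runs is a list of (statistic, cluster_size) pairs, one per replicate.
    """
    same_size = [stat for stat, size in null_runs if size == observed_size]
    if not same_size:
        return None  # no comparable replicates; more simulations would be needed
    return monte_carlo_p(observed_stat, same_size)
```

The two p-values can differ noticeably when large and small clusters have different null distributions, which is the situation the abstract describes for p-values near the significance level.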
A method for crack sizing using Bayesian inference arising in eddy current testing
International Nuclear Information System (INIS)
Kojima, Fumio; Kikuchi, Mitsuhiro
2008-01-01
This paper is concerned with a methodology for sizing cracks using Bayesian inference in eddy current testing. Quantitative measurements in nondestructive testing are often subject to uncertainty, and this can yield misleading inferences about crack size in on-site monitoring. In this paper, we propose optimal measurement strategies for eddy current testing using Bayesian prior-to-posterior analysis. First, our likelihood functional is given by a Gaussian distribution with the measurement model based on the hybrid use of finite and boundary element methods. Second, given a priori distributions of crack size, we propose a method for estimating the region of interest for sizing cracks. Finally, an optimal sensing method is demonstrated using our idea. (author)
Pecevski, Dejan; Buesing, Lars; Maass, Wolfgang
2011-01-01
An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows (“explaining away”) and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons. PMID:22219717
Directory of Open Access Journals (Sweden)
Dejan Pecevski
2011-12-01
Full Text Available An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows ("explaining away") and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons.
Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering
Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland
2000-01-01
Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review datamining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.
Inference of segmented color and texture description by tensor voting.
Jia, Jiaya; Tang, Chi-Keung
2004-06-01
A robust synthesis method is proposed to automatically infer missing color and texture information from a damaged 2D image by (N)D tensor voting (N > 3). The same approach is generalized to range and 3D data in the presence of occlusion, missing data and noise. Our method translates texture information into an adaptive (N)D tensor, followed by a voting process that infers noniteratively the optimal color values in the (N)D texture space. A two-step method is proposed. First, we perform segmentation based on insufficient geometry, color, and texture information in the input, and extrapolate partitioning boundaries by either 2D or 3D tensor voting to generate a complete segmentation for the input. Missing colors are synthesized using (N)D tensor voting in each segment. Different feature scales in the input are automatically adapted by our tensor scale analysis. Results on a variety of difficult inputs demonstrate the effectiveness of our tensor voting approach.
Statistical inference: an integrated approach
Migon, Helio S; Louzada, Francisco
2014-01-01
Introduction Information The concept of probability Assessing subjective probabilities An example Linear algebra and probability Notation Outline of the book Elements of Inference Common statistical models Likelihood-based functions Bayes theorem Exchangeability Sufficiency and exponential family Parameter elimination Prior Distribution Entirely subjective specification Specification through functional forms Conjugacy with the exponential family Non-informative priors Hierarchical priors Estimation Introduction to decision theory Bayesian point estimation Classical point estimation Empirical Bayes estimation Comparison of estimators Interval estimation Estimation in the Normal model Approximating Methods The general problem of inference Optimization techniques Asymptotic theory Other analytical approximations Numerical integration methods Simulation methods Hypothesis Testing Introduction Classical hypothesis testing Bayesian hypothesis testing Hypothesis testing and confidence intervals Asymptotic tests Prediction...
Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas
2016-09-19
Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.
Inferring Gene Regulatory Networks Using Conditional Regulation Pattern to Guide Candidate Genes.
Directory of Open Access Journals (Sweden)
Fei Xiao
Full Text Available Combining path consistency (PC) algorithms with conditional mutual information (CMI) is widely used in the reconstruction of gene regulatory networks. CMI has many advantages over the Pearson correlation coefficient in measuring non-linear dependence to infer gene regulatory networks. It can also discriminate direct regulations from indirect ones. However, it is still a challenge to select the conditional genes in an optimal way, which affects the performance and computation complexity of the PC algorithm. In this study, we develop a novel conditional mutual information-based algorithm, namely RPNI (Regulation Pattern based Network Inference), to infer gene regulatory networks. For conditional gene selection, we define the co-regulation pattern, indirect-regulation pattern and mixture-regulation pattern as three candidate patterns to guide the selection of candidate genes. To demonstrate the potential of our algorithm, we apply it to gene expression data from DREAM challenge. Experimental results show that RPNI outperforms existing conditional mutual information-based methods in both accuracy and time complexity for different sizes of gene samples. Furthermore, the robustness of our algorithm is demonstrated by noisy interference analysis using different types of noise.
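To illustrate how conditional mutual information can separate direct from indirect regulation, the sketch below estimates CMI from covariance determinants under a joint Gaussian assumption, using I(X;Y|Z) = 0.5 log(|C(X,Z)| |C(Y,Z)| / (|C(Z)| |C(X,Y,Z)|)). The Gaussian assumption and the function name are choices made for this example. The abstract does not say which estimator RPNI uses, and the sketch does not implement RPNI's regulation-pattern selection of conditional genes.

```python
import numpy as np


def gaussian_cmi(x, y, z=None):
    """Estimate I(X;Y|Z) assuming jointly Gaussian variables.

    x, y: 1-D arrays of expression values (one value per sample).
    z: optional array of conditioning genes, shape (samples, genes).
    With z=None this reduces to the ordinary mutual information I(X;Y).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    def logdet(*columns):
        # Log-determinant of the sample covariance of the stacked columns.
        cov = np.atleast_2d(np.cov(np.hstack(columns), rowvar=False))
        return np.linalg.slogdet(cov)[1]

    if z is None:
        return 0.5 * (logdet(x) + logdet(y) - logdet(x, y))
    z = np.asarray(z, dtype=float).reshape(len(x), -1)
    return 0.5 * (logdet(x, z) + logdet(y, z) - logdet(z) - logdet(x, y, z))
```

If two genes are both driven by a third gene and nothing else links them, their mutual information is clearly positive while their CMI given the third gene is close to zero, so a PC-style algorithm would remove the edge between them.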
Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins
Gaucher, Eric A.; Thomson, J. Michael; Burgan, Michelle F.; Benner, Steven A.
2003-01-01
Features of the physical environment surrounding an ancestral organism can be inferred by reconstructing sequences of ancient proteins made by those organisms, resurrecting these proteins in the laboratory, and measuring their properties. Here, we resurrect candidate sequences for elongation factors of the Tu family (EF-Tu) found at ancient nodes in the bacterial evolutionary tree, and measure their activities as a function of temperature. The ancient EF-Tu proteins have temperature optima of 55-65 degrees C. This value seems to be robust with respect to uncertainties in the ancestral reconstruction. This suggests that the ancient bacteria that hosted these particular genes were thermophiles, and neither hyperthermophiles nor mesophiles. This conclusion can be compared and contrasted with inferences drawn from an analysis of the lengths of branches in trees joining proteins from contemporary bacteria, the distribution of thermophily in derived bacterial lineages, the inferred G + C content of ancient ribosomal RNA, and the geological record combined with assumptions concerning molecular clocks. The study illustrates the use of experimental palaeobiochemistry and assumptions about deep phylogenetic relationships between bacteria to explore the character of ancient life.
Interpretable inference on the mixed effect model with the Box-Cox transformation.
Maruo, K; Yamaguchi, Y; Noma, H; Gosho, M
2017-07-10
We derived results for inference on parameters of the marginal model of the mixed effect model with the Box-Cox transformation based on the asymptotic theory approach. We also provided a robust variance estimator of the maximum likelihood estimator of the parameters of this model in consideration of the model misspecifications. Using these results, we developed an inference procedure for the difference of the model median between treatment groups at the specified occasion in the context of mixed effects models for repeated measures analysis for randomized clinical trials, which provided interpretable estimates of the treatment effect. From simulation studies, it was shown that our proposed method controlled type I error of the statistical test for the model median difference in almost all the situations and had moderate or high performance for power compared with the existing methods. We illustrated our method with cluster of differentiation 4 (CD4) data in an AIDS clinical trial, where the interpretability of the analysis results based on our proposed method is demonstrated. Copyright © 2017 John Wiley & Sons, Ltd.
Fong, Jonathan J; Li, Pi-Peng; Yang, Bao-Tian; Zhou, Zheng-Yan; Leaché, Adam D; Min, Mi-Sook; Waldman, Bruce
2016-04-01
The Oriental fire-bellied toad (Bombina orientalis) is a commonly used study organism, but knowledge of its evolutionary history is incomplete. We analyze sequence data from four genetic markers (mtDNA genes encoding cytochrome c oxidase subunit I, cytochrome b, and 12S-16S rRNA; nuDNA gene encoding recombination activating gene 2) from 188 individuals across its range in Northeast Asia to elucidate phylogeographic patterns and to identify the historic events that shaped its evolutionary history. Although morphologically similar across its range, B. orientalis exhibits phylogeographic structure, which we infer was shaped by geologic, climatic, and anthropogenic events. Phylogenetic and divergence-dating analyses recover four genetically distinct groups of B. orientalis: Lineage 1-Shandong Province and Beijing (China); Lineage 2-Bukhan Mountain (Korea); Lineage 3-Russia, Northeast China, and northern South Korea; and Lineage 4-South Korea. Lineage 2 was previously unknown. Additionally, we discover an area of secondary contact on the Korean Peninsula, and infer a single dispersal event as the origin of the insular Jeju population. Skyline plots estimate different population histories for the four lineages: Lineages 1 and 2 experienced population decreases, Lineage 3 remained stable, while Lineage 4 experienced a sharp increase during the Holocene. The timing of the population expansion of Lineage 4 coincides with the advent of rice cultivation, which may have facilitated the increase in population size by providing additional breeding habitat. Copyright © 2016 Elsevier Inc. All rights reserved.
Directory of Open Access Journals (Sweden)
Xiufang Lin
2016-08-01
Full Text Available Magnetorheological dampers have become prominent semi-active control devices for vibration mitigation of structures which are subjected to severe loads. However, the damping force cannot be controlled directly due to the inherent nonlinear characteristics of the magnetorheological dampers. Therefore, for fully exploiting the capabilities of the magnetorheological dampers, one of the challenging aspects is to develop an accurate inverse model which can appropriately predict the input voltage to control the damping force. In this article, a hybrid modeling strategy combining shuffled frog-leaping algorithm and adaptive-network-based fuzzy inference system is proposed to model the inverse dynamic characteristics of the magnetorheological dampers for improving the modeling accuracy. The shuffled frog-leaping algorithm is employed to optimize the premise parameters of the adaptive-network-based fuzzy inference system while the consequent parameters are tuned by a least square estimation method, here known as shuffled frog-leaping algorithm-based adaptive-network-based fuzzy inference system approach. To evaluate the effectiveness of the proposed approach, the inverse modeling results based on the shuffled frog-leaping algorithm-based adaptive-network-based fuzzy inference system approach are compared with those based on the adaptive-network-based fuzzy inference system and genetic algorithm–based adaptive-network-based fuzzy inference system approaches. Analysis of variance test is carried out to statistically compare the performance of the proposed methods and the results demonstrate that the shuffled frog-leaping algorithm-based adaptive-network-based fuzzy inference system strategy outperforms the other two methods in terms of modeling (training) accuracy and checking accuracy.
Vlaic, Sebastian; Hoffmann, Bianca; Kupfer, Peter; Weber, Michael; Dräger, Andreas
2013-09-01
GRN2SBML automatically encodes gene regulatory networks derived from several inference tools in systems biology markup language. Providing a graphical user interface, the networks can be annotated via the simple object access protocol (SOAP)-based application programming interface of BioMart Central Portal and minimum information required in the annotation of models registry. Additionally, we provide an R-package, which processes the output of supported inference algorithms and automatically passes all required parameters to GRN2SBML. Therefore, GRN2SBML closes a gap in the processing pipeline between the inference of gene regulatory networks and their subsequent analysis, visualization and storage. GRN2SBML is freely available under the GNU Public License version 3 and can be downloaded from http://www.hki-jena.de/index.php/0/2/490. General information on GRN2SBML, examples and tutorials are available at the tool's web page.
Object-Oriented Type Inference
DEFF Research Database (Denmark)
Schwartzbach, Michael Ignatieff; Palsberg, Jens
1991-01-01
We present a new approach to inferring types in untyped object-oriented programs with inheritance, assignments, and late binding. It guarantees that all messages are understood, annotates the program with type information, allows polymorphic methods, and can be used as the basis of an op...
Serang, Oliver
2014-01-01
Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustrative example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log(k)^2) and the space to O(k log(k)), where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions. PMID:24626234
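The core idea of replacing an adder node with a tree of pairwise convolutions can be sketched as below. This is only a rough illustration of the forward (bottom-up) pass for independent non-negative integer variables. It uses direct convolution via `numpy.convolve` rather than an FFT, it omits the backward pass that would yield posteriors for the individual inputs, and the function name is an assumption.

```python
import numpy as np


def sum_distribution(pmfs):
    """Distribution of the sum of independent non-negative integer variables.

    pmfs: list of 1-D arrays where pmfs[i][v] = P(X_i = v).
    Neighbouring nodes are merged pairwise, layer by layer, like a binary tree.
    """
    layer = [np.asarray(p, dtype=float) for p in pmfs]
    if not layer:
        return np.array([1.0])  # an empty sum equals 0 with probability 1
    while len(layer) > 1:
        # Convolving two pmfs gives the pmf of the sum of the two variables.
        merged = [np.convolve(layer[i], layer[i + 1])
                  for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:
            merged.append(layer[-1])  # an unpaired node moves up unchanged
        layer = merged
    return layer[0]
```

For example, passing three fair 0/1 variables returns the probabilities of observing 0, 1, 2, or 3 successes. The tree has a depth that grows logarithmically with the number of inputs, which is where the efficiency of the approach comes from.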
Reward inference by primate prefrontal and striatal neurons.
Pan, Xiaochuan; Fan, Hongwei; Sawa, Kosuke; Tsuda, Ichiro; Tsukada, Minoru; Sakagami, Masamichi
2014-01-22
The brain contains multiple yet distinct systems involved in reward prediction. To understand the nature of these processes, we recorded single-unit activity from the lateral prefrontal cortex (LPFC) and the striatum in monkeys performing a reward inference task using an asymmetric reward schedule. We found that neurons both in the LPFC and in the striatum predicted reward values for stimuli that had been previously well experienced with set reward quantities in the asymmetric reward task. Importantly, these LPFC neurons could predict the reward value of a stimulus using transitive inference even when the monkeys had not yet learned the stimulus-reward association directly; whereas these striatal neurons did not show such an ability. Nevertheless, because there were two set amounts of reward (large and small), the selected striatal neurons were able to exclusively infer the reward value (e.g., large) of one novel stimulus from a pair after directly experiencing the alternative stimulus with the other reward value (e.g., small). Our results suggest that although neurons that predict reward value for old stimuli in the LPFC could also do so for new stimuli via transitive inference, those in the striatum could only predict reward for new stimuli via exclusive inference. Moreover, the striatum showed more complex functions than was surmised previously for model-free learning.
Evolutionary inference via the Poisson Indel Process.
Bouchard-Côté, Alexandre; Jordan, Michael I
2013-01-22
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
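The synthetic-likelihood construction described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the toy normal simulator, and the mean/variance summary statistics are all assumptions chosen to keep the example short (the paper uses phase-insensitive statistics of a dynamic model).

```python
import numpy as np

def synthetic_loglik(theta, observed_stats, simulate, summarize, n_sim=200, rng=None):
    """Gaussian 'synthetic' log-likelihood of the observed summary statistics.

    simulate(theta, rng) returns one simulated data set and summarize(data)
    returns a 1-D array of statistics; both are user-supplied (hypothetical names).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Simulate n_sim replicate data sets and reduce each one to summary statistics.
    stats = np.array([summarize(simulate(theta, rng)) for _ in range(n_sim)])
    mean = stats.mean(axis=0)
    cov = np.atleast_2d(np.cov(stats, rowvar=False))
    resid = np.atleast_1d(observed_stats) - mean
    _, logdet = np.linalg.slogdet(cov)
    # Multivariate normal log-density of the observed statistics.
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (quad + logdet + resid.size * np.log(2 * np.pi))

# Toy stand-ins (assumptions for illustration only).
def simulate_normal(theta, rng):
    return rng.normal(theta, 1.0, size=50)

def summarize(y):
    return np.array([y.mean(), y.var()])
```

The returned value can be passed to any Markov chain Monte Carlo sampler as the log target (plus a log prior), which is the role the abstract assigns to it.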
Higher-level fusion for military operations based on abductive inference: proof of principle
Pantaleev, Aleksandar V.; Josephson, John
2006-04-01
The ability of contemporary military commanders to estimate and understand complicated situations already suffers from information overload, and the situation can only grow worse. We describe a prototype application that uses abductive inferencing to fuse information from multiple sensors to evaluate the evidence for higher-level hypotheses that are close to the levels of abstraction needed for decision making (approximately JDL levels 2 and 3). Abductive inference (abduction, inference to the best explanation) is a pattern of reasoning that occurs naturally in diverse settings such as medical diagnosis, criminal investigations, scientific theory formation, and military intelligence analysis. Because abduction is part of common-sense reasoning, implementations of it can produce reasoning traces that are very human understandable. Automated abductive inferencing can be deployed to augment human reasoning, taking advantage of computation to process large amounts of information, and to bypass limits to human attention and short-term memory. We illustrate the workings of the prototype system by describing an example of its use for small-unit military operations in an urban setting. Knowledge was encoded as it might be captured prior to engagement from a standard military decision making process (MDMP) and analysis of commander's priority intelligence requirements (PIR). The system is able to reasonably estimate the evidence for higher-level hypotheses based on information from multiple sensors. Its inference processes can be examined closely to verify correctness. Decision makers can override conclusions at any level and changes will propagate appropriately.
An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.
Liang, Ying; Liao, Bo; Zhu, Wen
2017-01-01
Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees based on the FISH platform. Topology analysis of the tumor progression trees shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. The classification experiment shows that tree-based features are better than data-based features in distinguishing tumors. The constructed phylogenetic trees characterize the tumor development process well and outperform those of other similar algorithms.
System Support for Forensic Inference
Gehani, Ashish; Kirchner, Florent; Shankar, Natarajan
Digital evidence is playing an increasingly important role in prosecuting crimes. The reasons are manifold: financially lucrative targets are now connected online, systems are so complex that vulnerabilities abound and strong digital identities are being adopted, making audit trails more useful. If the discoveries of forensic analysts are to hold up to scrutiny in court, they must meet the standard for scientific evidence. Software systems are currently developed without consideration of this fact. This paper argues for the development of a formal framework for constructing “digital artifacts” that can serve as proxies for physical evidence; a system so imbued would facilitate sound digital forensic inference. A case study involving a filesystem augmentation that provides transparent support for forensic inference is described.
Cheng, Shanmei; Qiong, La; Lu, Fan; Yonezawa, Takahiro; Yin, Ganqiang; Song, Zhiping; Wang, Yuguo; Yang, Ji; Zhang, Wenju
2017-06-01
Wu hypothesized that the Tibetan flora originated mostly from the paleotropical Tertiary flora in the Hengduan Mountains by adapting to the cold and arid environments associated with the strong uplift of the Qinghai-Tibet Plateau (QTP). Here, we combine the phylogeographic history of Sophora moorcroftiana with that of Sophora davidii to explore the speciation of S. moorcroftiana to test this hypothesis. We collected 151 individuals from 17 populations and sequenced 2 chloroplast fragments and the internal transcribed spacer of rDNA. Five chlorotypes and 9 ribotypes were detected but no significant phylogeographic structure was revealed. The integrated results of phylogeographic studies of these 2 species clearly support the progenitor-derivative relationship between them. We infer that the western peripheral population of S. davidii migrated westwards from the Hengduan Mountains to the middle reaches of the Yarlung Zangbo River and differentiated from its ancestor in the process of adaptation to increasingly cold and arid environments with the uplift of the QTP and finally evolved into S. moorcroftiana during the Late Pliocene. In addition, our findings shed light on the idea that natural selection, as imposed by climate differentiation (especially mean diurnal range and precipitation seasonality), directly drove this peripatric speciation event after geographic isolation. The speciation of S. moorcroftiana is a strong case supporting Wu's hypothesis about the origin of Tibet's flora.
Geostatistical inference using crosshole ground-penetrating radar
DEFF Research Database (Denmark)
Looms, Majken C; Hansen, Thomas Mejer; Cordua, Knud Skou
2010-01-01
of the subsurface are used to evaluate the uncertainty of the inversion estimate. We have explored the full potential of the geostatistical inference method using several synthetic models of varying correlation structures and have tested the influence of different assumptions concerning the choice of covariance...... reflection profile. Furthermore, the inferred values of the subsurface global variance and the mean velocity have been corroborated with moisture content measurements, obtained gravimetrically from samples collected at the field site....
Bernardes, Samuel C; Pepato, Almir R; von Rintelen, Thomas; von Rintelen, Kristina; Page, Timothy J; Freitag, Hendrik; de Bruyn, Mark
2017-08-22
The evolutionary history of the old, diverse freshwater shrimp genus Caridina is still poorly understood, despite its vast distribution - from Africa to Polynesia. Here, we used nuclear and mitochondrial DNA to infer the phylogeographic and evolutionary history of C. typus, which is one of only four species distributed across the entire range of the genus. Despite this species' potential for high levels of gene flow, questions have been raised regarding its phylogeographic structure and taxonomic status. We identified three distinct lineages that likely diverged in the Miocene. Molecular dating and ancestral range reconstructions are congruent with C. typus' early dispersal to Africa, possibly mediated by the Miocene Indian Ocean Equatorial Jet, followed by back dispersal to Australasia after the Jet's closure. Furthermore, several different species delimitation methods indicate each lineage represents a distinct (cryptic) species, contradicting current morphospecies delimitation of a single C. typus taxon. The evolutionary history of C. typus lineages is complex, in which ancient oceanic current systems and (currently unrecognised) speciation events preceded secondary sympatry of these cryptic species.
Analogy in causal inference: rethinking Austin Bradford Hill's neglected consideration.
Weed, Douglas L
2018-05-01
The purpose of this article was to rethink and resurrect Austin Bradford Hill's "criterion" of analogy as an important consideration in causal inference. In epidemiology today, analogy is either completely ignored (e.g., in many textbooks), or equated with biologic plausibility or coherence, or aligned with the scientist's imagination. None of these examples, however, captures Hill's description of analogy. His words suggest that there may be something gained by contrasting two bodies of evidence, one from an established causal relationship, the other not. Coupled with developments in the methods of systematic assessments of evidence (including but not limited to meta-analysis), analogy can be restructured as a key component in causal inference. This new approach will require that a collection, or library, of known cases of causal inference (i.e., bodies of evidence involving established causal relationships) be developed. This library would likely include causal assessments by organizations such as the International Agency for Research on Cancer, the National Toxicology Program, and the United States Environmental Protection Agency. In addition, a process for describing key features of a causal relationship would need to be developed along with what will be considered paradigm cases of causation. Finally, it will be important to develop ways to objectively compare a "new" body of evidence with the relevant paradigm case of causation. Analogy, along with all other existing methods and causal considerations, may improve our ability to identify causal relationships.
STRATEGIES IN SEISMIC INFERENCE OF SUPERGRANULAR FLOWS ON THE SUN
Energy Technology Data Exchange (ETDEWEB)
Bhattacharya, Jishnu; Hanasoge, Shravan M. [Department of Astronomy and Astrophysics, Tata Institute of Fundamental Research, Mumbai-400005 (India)
2016-08-01
Observations of the solar surface reveal the presence of flows with length scales of around 35 Mm, commonly referred to as supergranules. Inferring the subsurface flow profile of supergranules from measurements of the surface and photospheric wavefield is an important challenge faced by helioseismology. Traditionally, the inverse problem has been approached by studying the linear response of seismic waves in a horizontally translationally invariant background to the presence of the supergranule; following an iterative approach that does not depend on horizontal translational invariance might perform better, since the misfit can be analyzed post iterations. In this work, we construct synthetic observations using a reference supergranule and invert for the flow profile using surface measurements of travel times of waves belonging to modal ridges f (surface gravity) and p_1 through p_7 (acoustic). We study the extent to which individual modes and their combinations contribute to infer the flow. We show that this method of nonlinear iterative inversion tends to underestimate the flow velocities, as well as inferring a shallower flow profile, with significant deviations from the reference supergranule near the surface. We carry out a similar analysis for a sound-speed perturbation and find that analogous near-surface deviations persist, although the iterations converge faster and more accurately. We conclude that a better approach to inversion would be to expand the supergranule profile in an appropriate basis, thereby reducing the number of parameters being inverted for and appropriately regularizing them.
Bayesian Inference for Signal-Based Seismic Monitoring
Moore, D.
2015-12-01
Traditional seismic monitoring systems rely on discrete detections produced by station processing software, discarding significant information present in the original recorded signal. SIG-VISA (Signal-based Vertically Integrated Seismic Analysis) is a system for global seismic monitoring through Bayesian inference on seismic signals. By modeling signals directly, our forward model is able to incorporate a rich representation of the physics underlying the signal generation process, including source mechanisms, wave propagation, and station response. This allows inference in the model to recover the qualitative behavior of recent geophysical methods including waveform matching and double-differencing, all as part of a unified Bayesian monitoring system that simultaneously detects and locates events from a global network of stations. We demonstrate recent progress in scaling up SIG-VISA to efficiently process the data stream of global signals recorded by the International Monitoring System (IMS), including comparisons against existing processing methods that show increased sensitivity from our signal-based model and in particular the ability to locate events (including aftershock sequences that can tax analyst processing) precisely from waveform correlation effects. We also provide a Bayesian analysis of an alleged low-magnitude event near the DPRK test site in May 2010 [1] [2], investigating whether such an event could plausibly be detected through automated processing in a signal-based monitoring system. [1] Zhang, Miao and Wen, Lianxing. "Seismological Evidence for a Low-Yield Nuclear Test on 12 May 2010 in North Korea". Seismological Research Letters, January/February 2015. [2] Richards, Paul. "A Seismic Event in North Korea on 12 May 2010". CTBTO SnT 2015 oral presentation, video at https://video-archive.ctbto.org/index.php/kmc/preview/partner_id/103/uiconf_id/4421629/entry_id/0_ymmtpps0/delivery/http
Estimation of tool wear length in finish milling using a fuzzy inference algorithm
Ko, Tae Jo; Cho, Dong Woo
1993-10-01
The geometric accuracy and surface roughness are mainly affected by the flank wear at the minor cutting edge in finish machining. A fuzzy estimator obtained by a fuzzy inference algorithm with a max-min composition rule to evaluate the minor flank wear length in finish milling is introduced. The features sensitive to minor flank wear are extracted from the dispersion analysis of a time series AR model of the feed directional acceleration of the spindle housing. Linguistic rules for fuzzy estimation are constructed using these features, and then fuzzy inferences are carried out with test data sets under various cutting conditions. The proposed system turns out to be effective for estimating minor flank wear length, and its mean error is less than 12%.
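The max-min composition rule named in this abstract can be illustrated with a short sketch. The membership values and the rule relation below are made-up numbers, and the function name is an assumption; the paper's actual features and linguistic rules are not reproduced here.

```python
import numpy as np

def max_min_composition(mu, relation):
    """Fuzzy max-min composition: out[j] = max_i min(mu[i], relation[i, j]).

    mu holds input membership grades; relation[i, j] is the strength of the
    rule linking input fuzzy set i to output fuzzy set j (hypothetical values).
    """
    mu = np.asarray(mu, dtype=float)
    relation = np.asarray(relation, dtype=float)
    # Broadcast mu down the columns, take element-wise minima, then column maxima.
    return np.max(np.minimum(mu[:, None], relation), axis=0)

# Two input sets (e.g. "low" and "high" feature level) and two output sets
# (e.g. "small" and "large" wear); all numbers are illustrative only.
wear_membership = max_min_composition([0.2, 0.8], [[0.9, 0.1], [0.4, 0.7]])
```

A crisp wear estimate would then require a defuzzification step, which this sketch leaves out.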
An Intelligent Inference System for Robot Hand Optimal Grasp Preshaping
Directory of Open Access Journals (Sweden)
Cabbar Veysel Baysal
2010-11-01
This paper presents a novel Intelligent Inference System (IIS) for the determination of an optimum preshape for multifingered robot hand grasping, given an object under a manipulation task. The IIS is formed as a hybrid agent architecture, by the synthesis of object properties, manipulation task characteristics, grasp space partitioning, low-level kinematical analysis, evaluation of contact wrench patterns via fuzzy approximate reasoning and an ANN structure for incremental learning. The IIS is implemented in software with a robot hand simulation.
Zonta, Zivko J; Flotats, Xavier; Magrí, Albert
2014-08-01
The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal value within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches based on the consideration of the model parameters as probability distributions (i.e. Bayesian inference), may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model, which considers intracellular storage and biomass growth, simultaneously. Practical identifiability was addressed exclusively considering respirometric profiles based on the oxygen uptake rate and with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was thus estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to evidence the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference could be reduced to a frequentist approach under particular hypotheses, the former can be considered as a more generalist methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.
Xiao, Yongshuang; Zhang, Yan; Yanagimoto, Takashi; Li, Jun; Xiao, Zhizhong; Gao, Tianxiang; Xu, Shihong; Ma, Daoyuan
2011-02-01
Intraspecific phylogenies can provide useful insights into how populations have been shaped by historical and contemporary processes. To determine the population genetic structure and the demographic and colonization history of Cleisthenes herzensteini in the Northwestern Pacific, one hundred and twenty-one individuals were sampled from six localities along the coastal regions of Japan and the Yellow Sea of China. Mitochondrial DNA variation was analyzed using DNA sequence data from the 5' end of control region. High levels of haplotype diversity (>0.96) were found for all populations, indicating a high level of genetic diversity. No pattern of isolation by distance was detected among the population differentiation throughout the examined range. Analyses of molecular variance (AMOVA) and the conventional population statistic Fst revealed no significant population genetic structure among populations. According to the exact test of differentiation among populations, the null hypothesis that C. herzensteini within the examined range constituted a non-differential mtDNA gene pool was accepted. The demographic history of C. herzensteini was examined using neutrality test and mismatch distribution analyses and results indicated Pleistocene population expansion (about 94-376 kya) in the species, which was consistent with the inference result of nested clade phylogeographical analysis (NCPA) showing contiguous range expansion for C. herzensteini. The lack of phylogeographical structure for the species may reflect a recent range expansion after the glacial maximum and insufficient time to attain migration-drift equilibrium.
Working memory supports inference learning just like classification learning.
Craig, Stewart; Lewandowsky, Stephan
2013-08-01
Recent research has found a positive relationship between people's working memory capacity (WMC) and their speed of category learning. To date, only classification-learning tasks have been considered, in which people learn to assign category labels to objects. It is unknown whether learning to make inferences about category features might also be related to WMC. We report data from a study in which 119 participants undertook classification learning and inference learning, and completed a series of WMC tasks. Working memory capacity was positively related to people's classification and inference learning performance.
Statistical inference for stochastic processes
National Research Council Canada - National Science Library
Basawa, Ishwar V; Prakasa Rao, B. L. S
1980-01-01
The aim of this monograph is to attempt to reduce the gap between theory and applications in the area of stochastic modelling, by directing the interest of future researchers to the inference aspects...
A method of inferring k-infinity from reaction rate measurements in thermal reactor systems
International Nuclear Information System (INIS)
Newmarch, D.A.
1967-05-01
A scheme is described for inferring a value of k-infinity from reaction rate measurements. The method is devised with the METHUSELAH group structure in mind and was developed for the analysis of S.G.H.W. reactor experiments; the underlying principles, however, are general. (author)
Inference of Large Phylogenies Using Neighbour-Joining
DEFF Research Database (Denmark)
Simonsen, Martin; Mailund, Thomas; Pedersen, Christian Nørgaard Storm
2011-01-01
The neighbour-joining method is a widely used method for phylogenetic reconstruction which scales to thousands of taxa. However, advances in sequencing technology have made data sets with more than 10,000 related taxa widely available. Inference of such large phylogenies takes hours or days using...... the Neighbour-Joining method on a normal desktop computer because of the O(n^3) running time. RapidNJ is a search heuristic which reduces the running time of the Neighbour-Joining method significantly but at the cost of an increased memory consumption making inference of large phylogenies infeasible. We present...... two extensions for RapidNJ which reduce the memory requirements and allow phylogenies with more than 50,000 taxa to be inferred efficiently on a desktop computer. Furthermore, an improved version of the search heuristic is presented which reduces the running time of RapidNJ on many data...
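To make the O(n^3) cost concrete, the sketch below shows the pair-selection step of canonical neighbour-joining: each of the roughly n joining iterations scans an n-by-n Q matrix for its minimum. This is the textbook step, not RapidNJ's search heuristic, and the function names and the five-taxon distance matrix are assumptions for illustration.

```python
import numpy as np

def nj_q_matrix(d):
    """Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    row_sums = d.sum(axis=1)
    q = (n - 2) * d - row_sums[:, None] - row_sums[None, :]
    np.fill_diagonal(q, np.inf)  # a taxon is never joined with itself
    return q

def nj_pick_pair(d):
    """Return the indices (i, j) of the pair with the smallest Q value."""
    q = nj_q_matrix(d)
    i, j = np.unravel_index(np.argmin(q), q.shape)
    return int(i), int(j)

# Illustrative symmetric distance matrix for five taxa (made-up values).
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
pair = nj_pick_pair(D)
```

Heuristics of the kind the abstract describes aim to avoid scanning the full matrix in every iteration; the exhaustive scan above is the baseline they improve on.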
The anatomy of choice: active inference and agency
Directory of Open Access Journals (Sweden)
Karl Friston
2013-09-01
This paper considers agency in the setting of embodied or active inference. In brief, we associate a sense of agency with prior beliefs about action and ask what sorts of beliefs underlie optimal behaviour. In particular, we consider prior beliefs that action minimises the Kullback-Leibler divergence between desired states and attainable states in the future. This allows one to formulate bounded rationality as approximate Bayesian inference that optimises a free energy bound on model evidence. We show that constructs like expected utility, exploration bonuses, softmax choice rules and optimism bias emerge as natural consequences of this formulation. Previous accounts of active inference have focused on predictive coding and Bayesian filtering schemes for minimising free energy. Here, we consider variational Bayes as an alternative scheme that provides formal constraints on the computational anatomy of inference and action – constraints that are remarkably consistent with neuroanatomy. Furthermore, this scheme contextualises optimal decision theory and economic (utilitarian) formulations as pure inference problems. For example, expected utility theory emerges as a special case of free energy minimisation, where the sensitivity or inverse temperature (of softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution – that minimises free energy. This sensitivity corresponds to the precision of beliefs about behaviour, such that attainable goals are afforded a higher precision or confidence. In turn, this means that optimal behaviour entails a representation of confidence about outcomes that are under an agent's control.
The anatomy of choice: active inference and agency.
Friston, Karl; Schwartenbeck, Philipp; Fitzgerald, Thomas; Moutoussis, Michael; Behrens, Timothy; Dolan, Raymond J
2013-01-01
This paper considers agency in the setting of embodied or active inference. In brief, we associate a sense of agency with prior beliefs about action and ask what sorts of beliefs underlie optimal behavior. In particular, we consider prior beliefs that action minimizes the Kullback-Leibler (KL) divergence between desired states and attainable states in the future. This allows one to formulate bounded rationality as approximate Bayesian inference that optimizes a free energy bound on model evidence. We show that constructs like expected utility, exploration bonuses, softmax choice rules and optimism bias emerge as natural consequences of this formulation. Previous accounts of active inference have focused on predictive coding and Bayesian filtering schemes for minimizing free energy. Here, we consider variational Bayes as an alternative scheme that provides formal constraints on the computational anatomy of inference and action, constraints that are remarkably consistent with neuroanatomy. Furthermore, this scheme contextualizes optimal decision theory and economic (utilitarian) formulations as pure inference problems. For example, expected utility theory emerges as a special case of free energy minimization, where the sensitivity or inverse temperature (of softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution that minimizes free energy. This sensitivity corresponds to the precision of beliefs about behavior, such that attainable goals are afforded a higher precision or confidence. In turn, this means that optimal behavior entails a representation of confidence about outcomes that are under an agent's control.
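Two of the quantities this abstract relies on, the KL divergence and a softmax choice rule with a precision (inverse temperature) parameter, can be written down in a few lines. This is a generic sketch of the standard definitions with assumed function names; it does not reproduce the paper's variational scheme.

```python
import numpy as np

def softmax(values, precision=1.0):
    """Softmax choice rule; precision plays the role of inverse temperature."""
    z = precision * np.asarray(values, dtype=float)
    z = z - z.max()  # subtract the maximum for numerical stability
    w = np.exp(z)
    return w / w.sum()

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, assuming q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

Raising `precision` concentrates choice probability on the highest-valued option, which corresponds to the abstract's reading of sensitivity as confidence in beliefs about behavior.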
Universal Darwinism As a Process of Bayesian Inference.
Campbell, John O
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
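The equivalence this abstract refers to can be seen in a small sketch: a discrete Bayesian update and one generation of discrete replicator dynamics have the same algebraic form when type frequencies play the role of the prior and relative fitnesses play the role of the likelihood. The function name and the numbers are assumptions for illustration.

```python
def bayes_update(prior, likelihood):
    """Posterior proportional to prior times likelihood, normalized to sum to one.

    Read as selection: prior = type frequencies before selection,
    likelihood = relative fitness of each type, result = frequencies afterwards.
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # mean fitness in the selection reading
    return [u / evidence for u in unnormalized]

# Two equally common types, the first with twice the fitness of the second.
posterior = bayes_update([0.5, 0.5], [2.0, 1.0])
```

Iterating the update generation after generation accumulates evidence in the same way repeated selection shifts frequencies toward fitter types.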
Stein, Richard R; Bucci, Vanni; Toussaint, Nora C; Buffie, Charlie G; Rätsch, Gunnar; Pamer, Eric G; Sander, Chris; Xavier, João B
2013-01-01
The intestinal microbiota is a microbial ecosystem of crucial importance to human health. Understanding how the microbiota confers resistance against enteric pathogens and how antibiotics disrupt that resistance is key to the prevention and cure of intestinal infections. We present a novel method to infer microbial community ecology directly from time-resolved metagenomics. This method extends generalized Lotka-Volterra dynamics to account for external perturbations. Data from recent experiments on antibiotic-mediated Clostridium difficile infection is analyzed to quantify microbial interactions, commensal-pathogen interactions, and the effect of the antibiotic on the community. Stability analysis reveals that the microbiota is intrinsically stable, explaining how antibiotic perturbations and C. difficile inoculation can produce catastrophic shifts that persist even after removal of the perturbations. Importantly, the analysis suggests a subnetwork of bacterial groups implicated in protection against C. difficile. Due to its generality, our method can be applied to any high-resolution ecological time-series data to infer community structure and response to external stimuli.
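A generalized Lotka-Volterra system with an external perturbation term can be sketched as below. The equation form dx_i/dt = x_i (mu_i + sum_j M_ij x_j + eps_i u(t)) is an assumption about how the extension is written, and the forward-Euler integrator, function names, and parameter values are illustrative choices rather than the authors' inference method.

```python
import numpy as np

def glv_rhs(x, mu, M, eps, u):
    """Right-hand side: growth rates mu, interaction matrix M, and
    susceptibilities eps to a scalar perturbation u (e.g. antibiotic on or off)."""
    return x * (mu + M @ x + eps * u)

def simulate_glv(x0, mu, M, eps, u_of_t, dt=0.01, steps=2000):
    """Forward-Euler integration; abundances are clipped at zero."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(steps):
        x = x + dt * glv_rhs(x, mu, M, eps, u_of_t(k * dt))
        x = np.maximum(x, 0.0)
    return x

# Single-species toy case: logistic growth without perturbation,
# and collapse when a constant perturbation outweighs the growth rate.
mu = np.array([1.0])
M = np.array([[-1.0]])
unperturbed = simulate_glv([0.1], mu, M, np.array([0.0]), lambda t: 0.0)
perturbed = simulate_glv([0.1], mu, M, np.array([-2.0]), lambda t: 1.0)
```

Fitting mu, M, and eps to time-series data is the inference problem the abstract addresses; this sketch covers only the forward simulation.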
On principles of inductive inference
Kostecki, Ryszard Paweł
2011-01-01
We propose an intersubjective epistemic approach to foundations of probability theory and statistical inference, based on relative entropy and category theory, and aimed to bypass the mathematical and conceptual problems of existing foundational approaches.
Directory of Open Access Journals (Sweden)
Tania Anaid Gutiérrez-García
2013-12-01
Central America is an ideal region for comparative phylogeographic studies because of its intricate geologic and biogeographic history, diversity of habitats and dynamic climatic and tectonic history. The aim of this work was to assess the phylogeography of two rodents codistributed throughout Central America, in order to identify if they show concordant genetic and phylogeographic patterns. The synopsis includes four parts: (1) an overview of the field of comparative phylogeography; (2) a detailed review that describes how genetic and geologic studies can be combined to elucidate general patterns of the biogeographic and evolutionary history of Central America; and a phylogeographic analysis of two species at both the (3) intraspecific and (4) comparative phylogeographic levels. The last incorporates specific ecological features and evaluates their influence on the species’ genetic patterns. Results showed a concordant genetic structure influenced by geographic distance for both rodents, but dissimilar dispersal patterns due to ecological features and life history.
Evidence cross-validation and Bayesian inference of MAST plasma equilibria
Energy Technology Data Exchange (ETDEWEB)
Nessi, G. T. von; Hole, M. J. [Research School of Physical Sciences and Engineering, Australian National University, Canberra ACT 0200 (Australia); Svensson, J. [Max-Planck-Institut fuer Plasmaphysik, D-17491 Greifswald (Germany); Appel, L. [EURATOM/CCFE Fusion Association, Culham Science Centre, Abingdon, Oxon OX14 3DB (United Kingdom)
2012-01-15
In this paper, current profiles for plasma discharges on the mega-ampere spherical tokamak are directly calculated from pickup coil, flux loop, and motional-Stark effect observations via methods based on the statistical theory of Bayesian analysis. By representing toroidal plasma current as a series of axisymmetric current beams with rectangular cross-section and inferring the current for each one of these beams, flux-surface geometry and q-profiles are subsequently calculated by elementary application of Biot-Savart's law. The use of this plasma model in the context of Bayesian analysis was pioneered by Svensson and Werner on the joint-European tokamak [Svensson and Werner, Plasma Phys. Controlled Fusion 50(8), 085002 (2008)]. In this framework, linear forward models are used to generate diagnostic predictions, and the probability distribution for the currents in the collection of plasma beams is then calculated directly via application of Bayes' formula. In this work, we introduce a new diagnostic technique to identify and remove outlier observations associated with diagnostics falling out of calibration or suffering from an unidentified malfunction. These modifications enable good agreement between Bayesian inference of the last-closed flux-surface and other corroborating data, such as that from force balance considerations using EFIT++ [Appel et al., "A unified approach to equilibrium reconstruction," Proceedings of the 33rd EPS Conference on Plasma Physics (Rome, Italy, 2006)]. In addition, this analysis also yields errors on the plasma current profile and flux-surface geometry as well as directly predicting the Shafranov shift of the plasma core.
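The combination of a linear forward model with Bayes' formula has a closed form when noise and prior are both Gaussian. The following minimal sketch assumes exactly that (independent Gaussian noise, zero-mean isotropic Gaussian prior) and only two current beams. The function name `posterior_two_beams`, the response matrix and all numbers are hypothetical and are not the authors' implementation.

```python
def posterior_two_beams(response, data, sigma, tau):
    """Gaussian posterior over two beam currents for a linear forward model.

    response: list of rows [r_k1, r_k2]; predicted signal k = r_k1*I1 + r_k2*I2
    data:     measured signals, one per row of `response`
    sigma:    standard deviation of independent Gaussian measurement noise (assumed)
    tau:      standard deviation of the zero-mean Gaussian prior on each current (assumed)
    """
    # Posterior precision P = I/tau^2 + R^T R / sigma^2 (a symmetric 2x2 matrix).
    p11 = 1.0 / tau**2 + sum(r[0] * r[0] for r in response) / sigma**2
    p12 = sum(r[0] * r[1] for r in response) / sigma**2
    p22 = 1.0 / tau**2 + sum(r[1] * r[1] for r in response) / sigma**2
    # Right-hand side b = R^T d / sigma^2.
    b1 = sum(r[0] * d for r, d in zip(response, data)) / sigma**2
    b2 = sum(r[1] * d for r, d in zip(response, data)) / sigma**2
    # Invert the 2x2 precision matrix explicitly to get the posterior covariance.
    det = p11 * p22 - p12 * p12
    cov = [[p22 / det, -p12 / det], [-p12 / det, p11 / det]]
    mean = [cov[0][0] * b1 + cov[0][1] * b2, cov[1][0] * b1 + cov[1][1] * b2]
    return mean, cov


if __name__ == "__main__":
    # Three hypothetical sensors: one sees beam 1, one sees beam 2, one sees both.
    toy_response = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    toy_data = [2.0, 3.0, 5.0]
    mean, cov = posterior_two_beams(toy_response, toy_data, sigma=0.1, tau=100.0)
    print("posterior mean:", mean)
    print("posterior covariance:", cov)
```

The diagonal of the covariance is what supplies error bars on the inferred currents. A real reconstruction would involve many beams and a general matrix solver.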
Model averaging, optimal inference and habit formation
Directory of Open Access Journals (Sweden)
Thomas H B FitzGerald
2014-06-01
Postulating that the brain performs approximate Bayesian inference generates principled and empirically testable models of neuronal function – the subject of much current interest in neuroscience and related disciplines. Current formulations address inference and learning under some assumed and particular model. In reality, organisms are often faced with an additional challenge – that of determining which model or models of their environment are the best for guiding behaviour. Bayesian model averaging – which says that an agent should weight the predictions of different models according to their evidence – provides a principled way to solve this problem. Importantly, because model evidence is determined by both the accuracy and complexity of the model, optimal inference requires that these be traded off against one another. This means an agent’s behaviour should show an equivalent balance. We hypothesise that Bayesian model averaging plays an important role in cognition, given that it is both optimal and realisable within a plausible neuronal architecture. We outline model averaging and how it might be implemented, and then explore a number of implications for brain and behaviour. In particular, we propose that model averaging can explain a number of apparently suboptimal phenomena within the framework of approximate (bounded) Bayesian inference, focussing particularly upon the relationship between goal-directed and habitual behaviour.
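The weighting rule described above (predictions weighted by model evidence) can be written in a few lines. This is a generic sketch of Bayesian model averaging, not the neuronal scheme proposed in the paper. The function names and the toy numbers are assumptions made for illustration.

```python
import math


def model_weights(log_evidences, log_priors=None):
    """Posterior model probabilities from log evidences (uniform prior by default)."""
    if log_priors is None:
        log_priors = [0.0] * len(log_evidences)
    scores = [le + lp for le, lp in zip(log_evidences, log_priors)]
    top = max(scores)                       # subtract the maximum for numerical stability
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def averaged_prediction(predictions, log_evidences, log_priors=None):
    """Evidence-weighted average of the per-model predictions."""
    weights = model_weights(log_evidences, log_priors)
    return sum(w * p for w, p in zip(weights, predictions))


if __name__ == "__main__":
    # Two hypothetical models; the second has three times the evidence of the first.
    toy_log_evidences = [0.0, math.log(3.0)]
    print(model_weights(toy_log_evidences))
    print(averaged_prediction([0.0, 1.0], toy_log_evidences))
```

Because log evidence already penalises complexity, a simple model can dominate the average when data are scarce, which is the accuracy-complexity trade-off the abstract refers to.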
Directory of Open Access Journals (Sweden)
Ricardo Augusto Perera
2018-04-01
The use of new paradigms of false belief tasks (FBT) has lowered the age at which children pass the test from 4 years in the standard version to only 15 months, or even a striking 6 months, in the nonverbal modification. These results are often taken as evidence that infants already possess an (at least implicit) theory of mind (ToM). We criticize this inferential leap on the grounds that inferring a ToM from predictive success on a false belief task requires assuming, as a premise, that belief reasoning is a necessary condition for correct action prediction. It is argued that the FBT does not satisfactorily constrain the predictive means, leaving room for the use of belief-independent inferences (which can rely on the attribution of non-representational mental states or on the consideration of behavioral patterns that dispense with any reference to other minds). These heuristics, when applied to the FBT, can achieve the same predictive success as a belief-based inference because information provided by the test stimulus allows the recognition of particular situations that can be subsumed by their ‘laws’. Instead of solving this issue by designing a single experimentum crucis that would render unfeasible the use of non-representational inferences, we suggest the application of a set of tests in which, although individually they can support inferences dissociated from a ToM, only an inference that makes use of false beliefs is able to correctly predict all the outcomes.
Hierarchical mark-recapture models: a framework for inference about demographic processes
Link, W.A.; Barker, R.J.
2004-01-01
The development of sophisticated mark-recapture models over the last four decades has provided fundamental tools for the study of wildlife populations, allowing reliable inference about population sizes and demographic rates based on clearly formulated models for the sampling processes. Mark-recapture models are now routinely described by large numbers of parameters. These large models provide the next challenge to wildlife modelers: the extraction of signal from noise in large collections of parameters. Pattern among parameters can be described by strong, deterministic relations (as in ultrastructural models) but is more flexibly and credibly modeled using weaker, stochastic relations. Trend in survival rates is not likely to be manifest by a sequence of values falling precisely on a given parametric curve; rather, if we could somehow know the true values, we might anticipate a regression relation between parameters and explanatory variables, in which true value equals signal plus noise. Hierarchical models provide a useful framework for inference about collections of related parameters. Instead of regarding parameters as fixed but unknown quantities, we regard them as realizations of stochastic processes governed by hyperparameters. Inference about demographic processes is based on investigation of these hyperparameters. We advocate the Bayesian paradigm as a natural, mathematically and scientifically sound basis for inference about hierarchical models. We describe analysis of capture-recapture data from an open population based on hierarchical extensions of the Cormack-Jolly-Seber model. In addition to recaptures of marked animals, we model first captures of animals and losses on capture, and are thus able to estimate survival probabilities w (i.e., the complement of death or permanent emigration) and per capita growth rates f (i.e., the sum of recruitment and immigration rates). Covariation in these rates, a feature of demographic interest, is explicitly
Statistical Models for Inferring Vegetation Composition from Fossil Pollen
Paciorek, C.; McLachlan, J. S.; Shang, Z.
2011-12-01
Fossil pollen provides information about vegetation composition that can be used to help understand how vegetation has changed in the past. However, these data have not traditionally been analyzed in a way that allows for statistical inference about spatio-temporal patterns and trends. We build a Bayesian hierarchical model called STEPPS (Spatio-Temporal Empirical Prediction from Pollen in Sediments) that predicts forest composition in southern New England, USA, over the last two millennia based on fossil pollen. The critical relationships between abundances of tree taxa in the pollen record and abundances in actual vegetation are estimated using modern (Forest Inventory Analysis) data and (witness tree) data from colonial records. This gives us two time points at which both pollen and direct vegetation data are available. Based on these relationships, and incorporating our uncertainty about them, we predict forest composition using fossil pollen. We estimate the spatial distribution and relative abundances of tree species and draw inference about how these patterns have changed over time. Finally, we describe ongoing work to extend the modeling to the upper Midwest of the U.S., including an approach to infer tree density and thereby estimate the prairie-forest boundary in Minnesota and Wisconsin. This work is part of the PalEON project, which brings together a team of ecosystem modelers, paleoecologists, and statisticians with the goal of reconstructing vegetation responses to climate during the last two millennia in the northeastern and midwestern United States. The estimates from the statistical modeling will be used to assess and calibrate ecosystem models that are used to project ecological changes in response to global change.
Bootstrapping phylogenies inferred from rearrangement data
Directory of Open Access Journals (Sweden)
Lin Yu
2012-08-01
Background: Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. Results: We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Conclusions: Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its
Bootstrapping phylogenies inferred from rearrangement data.
Lin, Yu; Rajan, Vaibhav; Moret, Bernard Me
2012-08-29
Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its support values follow a similar scale and its receiver
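For readers unfamiliar with the classic phylogenetic bootstrap that the abstract uses as its reference point, the sketch below resamples alignment columns with replacement and reports how often a given split is recovered. It illustrates only the sequence-based procedure, not the authors' rearrangement-based method. The tree-inference step is left as a user-supplied callable (`infer_splits`, hypothetical), and the toy alignment is a placeholder.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement; all taxa share the same picks."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {taxon: "".join(seq[c] for c in picks) for taxon, seq in alignment.items()}


def split_support(alignment, infer_splits, split, n_reps=100, seed=0):
    """Fraction of bootstrap replicates whose inferred tree contains `split`.

    infer_splits: callable mapping an alignment to a set of splits (hypothetical;
                  any tree-building method could be plugged in here).
    """
    rng = random.Random(seed)
    hits = sum(split in infer_splits(bootstrap_replicate(alignment, rng))
               for _ in range(n_reps))
    return hits / n_reps


if __name__ == "__main__":
    # Placeholder alignment, for illustration only.
    toy = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "TCGAAC"}
    print(bootstrap_replicate(toy, random.Random(1)))
```

The difficulty the abstract points to is visible here: the procedure needs many columns to resample, whereas a genome described by its gene order offers only one character.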
Classification versus inference learning contrasted with real-world categories.
Jones, Erin L; Ross, Brian H
2011-07-01
Categories are learned and used in a variety of ways, but the research focus has been on classification learning. Recent work contrasting classification with inference learning of categories found important later differences in category performance. However, theoretical accounts differ on whether this is due to an inherent difference between the tasks or to the implementation decisions. The inherent-difference explanation argues that inference learners focus on the internal structure of the categories--what each category is like--while classification learners focus on diagnostic information to predict category membership. In two experiments, using real-world categories and controlling for earlier methodological differences, inference learners learned more about what each category was like than did classification learners, as evidenced by higher performance on a novel classification test. These results suggest that there is an inherent difference between learning new categories by classifying an item versus inferring a feature.
Statistical inference via fiducial methods
Salomé, Diemer
1998-01-01
In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ...
The Heuristic Value of p in Inductive Statistical Inference
Directory of Open Access Journals (Sweden)
Joachim I. Krueger
2017-06-01
Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
The Heuristic Value of p in Inductive Statistical Inference.
Krueger, Joachim I; Heck, Patrick R
2017-01-01
Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
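The distinction between the p-value (probability of data at least this extreme under the hypothesis) and the probability of the hypothesis given the data can be made concrete with a small calculation. The sketch below assumes a standard-normal test statistic and, for the posterior, a single point alternative with mean `delta`. Both are simplifying assumptions chosen for illustration, not the simulation design used in the paper.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def posterior_null(z, prior_null, delta):
    """P(H0 | z) when H0: mean 0 and H1: mean `delta` (point alternative, assumed)."""
    like_null = normal_pdf(z)
    like_alt = normal_pdf(z - delta)
    return prior_null * like_null / (prior_null * like_null + (1.0 - prior_null) * like_alt)


if __name__ == "__main__":
    for z in (0.5, 1.96, 3.0):
        print(z, two_sided_p(z), posterior_null(z, prior_null=0.5, delta=3.0))
```

Running the loop shows the two quantities moving in the same direction without coinciding, which is the sense in which the p-value can act as a heuristic cue while not being the posterior probability itself.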
Information-Theoretic Inference of Large Transcriptional Regulatory Networks
Directory of Open Access Journals (Sweden)
Meyer Patrick
2007-01-01
The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
Information-Theoretic Inference of Large Transcriptional Regulatory Networks
Directory of Open Access Journals (Sweden)
Patrick E. Meyer
2007-06-01
The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
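The MRMR principle can be sketched as a greedy loop over a precomputed mutual-information matrix: at each step, pick the variable whose relevance to the target minus its mean redundancy with the variables already chosen is largest. This is the common difference form of the criterion and is assumed here. The function name `mrmr_select` and the matrix `TOY_MI` are hypothetical, and this is not the MRNET implementation itself.

```python
# Placeholder symmetric mutual-information matrix (illustrative values only).
TOY_MI = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.85, 0.1],
    [0.8, 0.85, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]


def mrmr_select(mi, target, k):
    """Greedy MRMR: return up to k (variable, score) pairs for the given target.

    mi[i][j] is the mutual information between variables i and j.
    score(j) = mi[target][j] - mean(mi[j][s] for already selected s).
    """
    candidates = [j for j in range(len(mi)) if j != target]
    selected = []
    while candidates and len(selected) < k:
        def score(j):
            if not selected:
                return mi[target][j]          # first pick: relevance only
            redundancy = sum(mi[j][s] for s, _ in selected) / len(selected)
            return mi[target][j] - redundancy
        best = max(candidates, key=score)
        selected.append((best, score(best)))
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    print(mrmr_select(TOY_MI, target=0, k=2))
```

In the toy matrix, variable 2 is highly relevant to the target but nearly redundant with variable 1, so the weakly relevant but non-redundant variable 3 is preferred as the second pick. Extending the idea to a network amounts to repeating the selection with each gene in turn as the target.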
Directory of Open Access Journals (Sweden)
Raşit Bilgin
The combination of habitat loss, climate change, direct persecution, introduced species and other components of the global environmental crisis has resulted in a rapid loss of biodiversity, including species, population and genetic diversity. Birds, which inhabit a wide spectrum of different habitat types, are particularly sensitive to and indicative of environmental changes. The Caucasus endemic bird area, part of which covers northeastern Turkey, is one of the world's key regions harboring a unique bird community threatened with habitat loss. More than 75% of all bird species native to Turkey have been recorded in this region, in particular along the Kars-Iğdır migratory corridor, stopover, wintering and breeding sites along the Aras River, whose wetlands harbor at least 264 bird species. In this study, the DNA barcoding technique was used to evaluate the genetic diversity of land bird species of Aras River Bird Paradise at the confluence of the Aras River and Iğdır Plains key biodiversity areas. Seventy-three COI sequences from 33 common species and 26 different genera were newly generated and used along with 301 sequences that were retrieved from the Barcoding of Life Database (BOLD). Using the sequences obtained in this study, we made global phylogeographic comparisons to define four categories of species, based on barcoding suitability, intraspecific divergence and taxonomy. Our findings indicate that the landbird community of northeastern Turkey has a genetic signature mostly typical of northern Palearctic bird communities while harboring some unique variations. The study also provides a good example of how DNA barcoding can build upon its primary mission of species identification and use available data to integrate genetic variation investigated at the local scale into a global framework. However, the rich bird community of the Aras River wetlands is highly threatened by the imminent construction of the Tuzluca Dam by the government.
Human synthetic lethal inference as potential anti-cancer target gene detection
Directory of Open Access Journals (Sweden)
Solé Ricard V
2009-12-01
Background: Two genes are called synthetic lethal (SL) if mutation of either alone is not lethal, but mutation of both leads to death or a significant decrease in the organism's fitness. The detection of SL gene pairs constitutes a promising alternative for anti-cancer therapy. As cancer cells exhibit a large number of mutations, the identification of these mutated genes' SL partners may provide specific anti-cancer drug candidates, with minor perturbations to healthy cells. Since existing SL data are mainly restricted to yeast screenings, the road towards human SL candidates is limited to inference methods. Results: In the present work, we use phylogenetic analysis and database manipulation (BioGRID for interactions, Ensembl and NCBI for homology, Gene Ontology for GO attributes) in order to reconstruct the phylogenetically-inferred SL gene network for human. In addition, available data on cancer mutated genes (COSMIC and Cancer Gene Census databases) as well as on existing approved drugs (DrugBank database) support our selection of cancer-therapy candidates. Conclusions: Our work provides a complementary alternative to the current methods for drug discovery and gene target identification in anti-cancer research. Novel SL screening analysis and the use of highly curated databases would contribute to improving the results of this methodology.
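The inference route described above (yeast SL pairs, mapped to human through homology, then filtered by cancer mutation and drug data) can be sketched with plain dictionaries and sets. All gene names below are placeholders and the functions are hypothetical. Nothing here queries BioGRID, Ensembl, COSMIC or DrugBank, and the real pipeline includes further filtering steps the abstract does not spell out.

```python
def infer_human_sl(yeast_sl_pairs, orthologs):
    """Map each yeast SL pair onto every combination of its human orthologs."""
    inferred = set()
    for gene_a, gene_b in yeast_sl_pairs:
        for human_a in orthologs.get(gene_a, ()):
            for human_b in orthologs.get(gene_b, ()):
                if human_a != human_b:                 # a gene cannot pair with itself
                    inferred.add(frozenset((human_a, human_b)))
    return inferred


def drug_target_candidates(inferred_pairs, cancer_mutated, drug_targets):
    """Keep (mutated gene, SL partner) where the partner has an approved drug."""
    hits = set()
    for pair in inferred_pairs:
        first, second = tuple(pair)
        for mutated, partner in ((first, second), (second, first)):
            if mutated in cancer_mutated and partner in drug_targets:
                hits.add((mutated, partner))
    return hits


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    yeast_pairs = [("yeastA", "yeastB"), ("yeastC", "yeastD")]
    ortholog_map = {"yeastA": ["humanA"], "yeastB": ["humanB1", "humanB2"],
                    "yeastC": ["humanC"]}
    pairs = infer_human_sl(yeast_pairs, ortholog_map)
    print(sorted(sorted(p) for p in pairs))
    print(drug_target_candidates(pairs, {"humanA"}, {"humanB2"}))
```

A yeast pair contributes nothing when either gene lacks a human ortholog, which is one reason the inferred human network is only a partial picture.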
IMAGINE: Interstellar MAGnetic field INference Engine
Steininger, Theo
2018-03-01
IMAGINE (Interstellar MAGnetic field INference Engine) performs inference on generic parametric models of the Galaxy. The modular open source framework uses highly optimized tools and technology such as the MultiNest sampler (ascl:1109.006) and the information field theory framework NIFTy (ascl:1302.013) to create an instance of the Milky Way based on a set of parameters for physical observables, using Bayesian statistics to judge the mismatch between measured data and model prediction. The flexibility of the IMAGINE framework allows for simple refitting for newly available data sets and makes state-of-the-art Bayesian methods easily accessible particularly for random components of the Galactic magnetic field.
PAnalyzer: A software tool for protein inference in shotgun proteomics
Directory of Open Access Journals (Sweden)
Prieto Gorka
2012-11-01
Background: Protein inference from peptide identifications in shotgun proteomics must deal with ambiguities that arise due to the presence of peptides shared between different proteins, which is common in higher eukaryotes. Recently, data independent acquisition (DIA) approaches have emerged as an alternative to the traditional data dependent acquisition (DDA) in shotgun proteomics experiments. MSE is the term used to name one of the DIA approaches used in QTOF instruments. MSE data require specialized software to process acquired spectra and to perform peptide and protein identifications. However, the software available at the moment does not group the identified proteins in a transparent way by taking into account peptide evidence categories. Furthermore, the inspection, comparison and reporting of the obtained results require tedious manual intervention. Here we report a software tool to address these limitations for MSE data. Results: In this paper we present PAnalyzer, a software tool focused on the protein inference process of shotgun proteomics. Our approach considers all the identified proteins and groups them when necessary, indicating their confidence using different evidence categories. PAnalyzer can read protein identification files in the XML output format of the ProteinLynx Global Server (PLGS) software provided by Waters Corporation for their MSE data, and also in the mzIdentML format recently standardized by HUPO-PSI. Multiple files can also be read simultaneously and are considered as technical replicates. Results are saved to CSV, HTML and mzIdentML (in the case of a single mzIdentML input file) files. An MSE analysis of a real sample is presented to compare the results of PAnalyzer and ProteinLynx Global Server. Conclusions: We present a software tool to deal with the ambiguities that arise in the protein inference process. Key contributions are support for MSE data analysis by ProteinLynx Global Server and technical replicates
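The ambiguity caused by shared peptides can be illustrated with a deliberately simplified grouping step: proteins supported by exactly the same peptide set cannot be told apart, and each peptide is either unique to one protein or shared. This is not PAnalyzer's algorithm, whose evidence categories are not detailed in the abstract. The function names and the protein and peptide labels are placeholders.

```python
def group_indistinguishable(protein_peptides):
    """Group proteins whose identified peptide sets are identical."""
    by_evidence = {}
    for protein, peptides in protein_peptides.items():
        by_evidence.setdefault(frozenset(peptides), []).append(protein)
    return sorted(sorted(group) for group in by_evidence.values())


def peptide_status(protein_peptides):
    """Label each peptide 'unique' (one parent protein) or 'shared' (several)."""
    owners = {}
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            owners.setdefault(peptide, set()).add(protein)
    return {pep: ("unique" if len(prots) == 1 else "shared")
            for pep, prots in owners.items()}


if __name__ == "__main__":
    # Placeholder identifications, for illustration only.
    toy = {"protA": ["pep1", "pep2"], "protB": ["pep2", "pep1"], "protC": ["pep2", "pep3"]}
    print(group_indistinguishable(toy))
    print(peptide_status(toy))
```

A protein with at least one unique peptide has direct evidence of its own, while a protein supported only by shared peptides needs a grouping rule of the kind the tool makes explicit.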
Inferring epidemic network topology from surveillance data.
Directory of Open Access Journals (Sweden)
Xiang Wan
The transmission of infectious diseases can be affected by many factors, some of them hidden, making it difficult to accurately predict when and where outbreaks may emerge. One current approach is to develop and deploy surveillance systems in an effort to detect outbreaks as early as possible. This enables policy makers to modify and implement strategies for the control of transmission. The accumulated surveillance data, including temporal, spatial, clinical, and demographic information, can provide valuable information with which to infer the underlying epidemic networks. Such networks can be quite informative and insightful as they characterize how infectious diseases transmit from one location to another. The aim of this work is to develop a computational model that allows inferences to be made regarding epidemic network topology in heterogeneous populations. We apply our model to the surveillance data from the 2009 H1N1 pandemic in Hong Kong. The inferred epidemic network has a significant effect on the propagation of infectious diseases.
A Learning Algorithm for Multimodal Grammar Inference.
D'Ulizia, A; Ferri, F; Grifoni, P
2011-12-01
The high costs of developing and maintaining multimodal grammars for integrating and understanding input in multimodal interfaces have led to the investigation of novel algorithmic solutions for automating grammar generation and updating. Many algorithms for context-free grammar inference have been developed in the natural language processing literature. An extension of these algorithms toward the inference of multimodal grammars is necessary for multimodal input processing. In this paper, we propose a novel grammar inference mechanism that allows us to learn a multimodal grammar from positive samples of multimodal sentences. The algorithm first generates a multimodal grammar that is able to parse the positive samples of sentences and, afterward, makes use of two learning operators and the minimum description length metric to improve the grammar description and avoid the over-generalization problem. The experimental results highlight the acceptable performance of the algorithm proposed in this paper, since it has a very high probability of parsing valid sentences.
First order augmentation to tensor voting for boundary inference and multiscale analysis in 3D.
Tong, Wai-Shun; Tang, Chi-Keung; Mordohai, Philippos; Medioni, Gérard
2004-05-01
Most computer vision applications require the reliable detection of boundaries. In the presence of outliers, missing data, orientation discontinuities, and occlusion, this problem is particularly challenging. We propose to address it by complementing the tensor voting framework, which was limited to second order properties, with first order representation and voting. First order voting fields and a mechanism to vote for 3D surface and volume boundaries and curve endpoints in 3D are defined. Boundary inference is also useful for a second difficult problem in grouping, namely, automatic scale selection. We propose an algorithm that automatically infers the smallest scale that can preserve the finest details. Our algorithm then proceeds with progressively larger scales to ensure continuity where it has not been achieved. Therefore, the proposed approach does not oversmooth features or delay the handling of boundaries and discontinuities until model misfit occurs. The interaction of smooth features, boundaries, and outliers is accommodated by the unified representation, making possible the perceptual organization of data in curves, surfaces, volumes, and their boundaries simultaneously. We present results on a variety of data sets to show the efficacy of the improved formalism.
iQuantitator: A tool for protein expression inference using iTRAQ
Directory of Open Access Journals (Sweden)
Comte-Walters Susana
2009-10-01
Full Text Available Abstract Background Isobaric Tags for Relative and Absolute Quantitation (iTRAQ™) [Applied Biosystems] have seen increased application in differential protein expression analysis. To facilitate the growing need to analyze iTRAQ data, especially for cases involving multiple iTRAQ experiments, we have developed a modeling approach, statistical methods, and tools for estimating the relative changes in protein expression under various treatments and experimental conditions. Results This modeling approach provides a unified analysis of data from multiple iTRAQ experiments and links the observed quantity (reporter ion peak area) to the experiment design and the calculated quantity of interest (treatment-dependent protein and peptide fold change) through an additive model under log transformation. Others have demonstrated, through a case study, this modeling approach and noted the computational challenges of parameter inference in the unbalanced data set typical of multiple iTRAQ experiments. Here we present the development of an inference approach, based on hierarchical regression with batching of regression coefficients and Markov Chain Monte Carlo (MCMC) methods, that overcomes some of these challenges. In addition to our discussion of the underlying method, we also present our implementation of the software, simulation results, experimental results, and sample output from the resulting analysis report. Conclusion iQuantitator's process-based modeling approach overcomes limitations in current methods and allows for application in a variety of experimental designs. Additionally, hypertext-linked documents produced by the tool aid in the interpretation and exploration of results.
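The phrase "additive model under log transformation" can be illustrated with a minimal sketch. It is not iQuantitator's hierarchical MCMC procedure; it only shows why ratios of peak areas become differences after taking logs. The peak areas and the two-channel layout below are hypothetical.

```python
import math

# Hypothetical reporter ion peak areas per spectrum: (control channel, treated channel).
peak_areas = [(100.0, 200.0), (50.0, 100.0), (80.0, 160.0)]

def log2_fold_change(areas):
    """Average the per-spectrum log2 differences; on the log scale the treatment effect is additive."""
    diffs = [math.log2(treated) - math.log2(control) for control, treated in areas]
    return sum(diffs) / len(diffs)

lfc = log2_fold_change(peak_areas)   # treatment effect on the log2 scale
fold_change = 2 ** lfc               # back-transformed to a ratio
```

A full analysis would add terms for peptide, protein, and experiment effects and estimate them jointly, which is where the hierarchical regression described in the abstract comes in.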
Bayesian Inference of High-Dimensional Dynamical Ocean Models
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimension ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Using absolute x-ray spectral measurements to infer stagnation conditions in ICF implosions
Patel, Pravesh; Benedetti, L. R.; Cerjan, C.; Clark, D. S.; Hurricane, O. A.; Izumi, N.; Jarrott, L. C.; Khan, S.; Kritcher, A. L.; Ma, T.; Macphee, A. G.; Landen, O.; Spears, B. K.; Springer, P. T.
2016-10-01
Measurements of the continuum x-ray spectrum emitted from the hot-spot of an ICF implosion can be used to infer a number of thermodynamic properties at stagnation, including temperature, pressure, and hot-spot mix. In deuterium-tritium (DT) layered implosion experiments on the National Ignition Facility (NIF) we field a number of x-ray diagnostics that provide spatially, temporally, and spectrally resolved measurements of the radiated x-ray emission. We report on analysis of these measurements using a 1-D hot-spot model to infer thermodynamic properties at stagnation. We compare these to similar properties that can be derived from DT fusion neutron measurements. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Hybrid Optical Inference Machines
1991-09-27
with labels. Now, events. a set of facts can be generated in the dyadic form "u, R 1,2" Eichmann and Caulfield [19] consider the same type of and can...these encoding schemes. These architectures are based primarily on optical inner [19. G. Eichmann and H. J. Caulfield, "Optical Learning (Inference)]
Models for inference in dynamic metacommunity systems
Dorazio, Robert M.; Kery, Marc; Royle, J. Andrew; Plattner, Matthias
2010-01-01
A variety of processes are thought to be involved in the formation and dynamics of species assemblages. For example, various metacommunity theories are based on differences in the relative contributions of dispersal of species among local communities and interactions of species within local communities. Interestingly, metacommunity theories continue to be advanced without much empirical validation. Part of the problem is that statistical models used to analyze typical survey data either fail to specify ecological processes with sufficient complexity or they fail to account for errors in detection of species during sampling. In this paper, we describe a statistical modeling framework for the analysis of metacommunity dynamics that is based on the idea of adopting a unified approach, multispecies occupancy modeling, for computing inferences about individual species, local communities of species, or the entire metacommunity of species. This approach accounts for errors in detection of species during sampling and also allows different metacommunity paradigms to be specified in terms of species- and location-specific probabilities of occurrence, extinction, and colonization: all of which are estimable. In addition, this approach can be used to address inference problems that arise in conservation ecology, such as predicting temporal and spatial changes in biodiversity for use in making conservation decisions. To illustrate, we estimate changes in species composition associated with the species-specific phenologies of flight patterns of butterflies in Switzerland for the purpose of estimating regional differences in biodiversity.
Training Inference Making Skills Using a Situation Model Approach Improves Reading Comprehension
Directory of Open Access Journals (Sweden)
Lisanne Bos
2016-02-01
Full Text Available This study aimed to enhance third and fourth graders’ text comprehension at the situation model level. Therefore, we tested a reading strategy training developed to target inference making skills, which are widely considered to be pivotal to situation model construction. The training was grounded in contemporary literature on situation model-based inference making and addressed the source (text-based versus knowledge-based), type (necessary versus unnecessary for (re-)establishing coherence), and depth of an inference (making single lexical inferences versus combining multiple lexical inferences), as well as the type of searching strategy (forward versus backward). Results indicated that, compared to a control group (n = 51), children who followed the experimental training (n = 67) improved their inference making skills supportive to situation model construction. Importantly, our training also resulted in increased levels of general reading comprehension and motivation. In sum, this study showed that a ‘level of text representation’-approach can provide a useful framework to teach inference making skills to third and fourth graders.
Robust Demographic Inference from Genomic and SNP Data
Excoffier, Laurent; Dupanloup, Isabelle; Huerta-Sánchez, Emilia; Sousa, Vitor C.; Foll, Matthieu
2013-01-01
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with , the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets. PMID:24204310
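The site frequency spectrum (SFS) on which this framework operates is a simple summary of the data. The sketch below computes an unfolded SFS from a small, hypothetical matrix of haplotypes (rows are sampled chromosomes, columns are sites, 1 marks the derived allele). It illustrates the statistic only, not the composite-likelihood inference described in the abstract.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k counts the sites where k sampled chromosomes carry the derived allele."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    derived_counts = Counter(sum(h[s] for h in haplotypes) for s in range(n_sites))
    return [derived_counts.get(k, 0) for k in range(n + 1)]

# Hypothetical sample: 4 chromosomes, 5 sites.
haplotypes = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
]
sfs = site_frequency_spectrum(haplotypes)
```

Entries 0 and n count monomorphic sites. Demographic events such as expansions or bottlenecks shift the relative weight of the intermediate entries, which is what makes the SFS informative about population history.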
Universal Darwinism as a process of Bayesian inference
Directory of Open Access Journals (Sweden)
John Oberon Campbell
2016-06-01
Full Text Available Many of the mathematical frameworks describing natural selection are equivalent to Bayes’ Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus natural selection serves as a counterexample to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an ‘experiment’ in the external world environment, and the results of that ‘experiment’ or the ‘surprise’ entailed by predicted and actual outcomes of the ‘experiment’. Minimization of free energy implies that the implicit measure of ‘surprise’ experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
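The claimed equivalence can be made concrete with one standard example, the discrete replicator equation. In the sketch below, type frequencies play the role of the prior and fitness values play the role of the likelihood. The function names and numbers are illustrative assumptions, not taken from the paper.

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses: prior times likelihood, renormalised by the evidence."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def replicator_step(frequencies, fitness):
    """Discrete replicator step: each type's share is rescaled by its fitness relative to the mean."""
    mean_fitness = sum(f * w for f, w in zip(frequencies, fitness))
    return [f * w / mean_fitness for f, w in zip(frequencies, fitness)]

frequencies = [0.5, 0.3, 0.2]   # hypothetical type frequencies (the "prior")
fitness = [1.0, 2.0, 4.0]       # hypothetical fitness values (the "likelihood")

posterior = bayes_update(frequencies, fitness)
next_generation = replicator_step(frequencies, fitness)
```

The two functions perform the same arithmetic: mean fitness is the normalising constant that Bayes' theorem calls the evidence, so `posterior` and `next_generation` coincide.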
SDG multiple fault diagnosis by real-time inverse inference
International Nuclear Information System (INIS)
Zhang Zhaoqian; Wu Chongguang; Zhang Beike; Xia Tao; Li Anfeng
2005-01-01
In the past 20 years, the signed directed graph (SDG), one of the qualitative simulation technologies, has been widely applied in the field of chemical fault diagnosis. However, many earlier researchers assumed a single fault origin. This leads to the problem of combinatorial explosion and has limited the application of SDG to real processes. The main reason is that most earlier researchers used the forward inference engine of commercial expert system software to carry out inverse diagnosis inference on the SDG model, which violates the internal principle of the diagnosis mechanism. In this paper, we present a new SDG multiple-fault diagnosis method based on real-time inverse inference. It is a genuine multiple-fault diagnosis method, and its inference engine uses an inverse mechanism. Finally, we give an example of a 65 t/h furnace diagnosis system to demonstrate its applicability and efficiency.
SDG multiple fault diagnosis by real-time inverse inference
Energy Technology Data Exchange (ETDEWEB)
Zhang Zhaoqian; Wu Chongguang; Zhang Beike; Xia Tao; Li Anfeng
2005-02-01
In the past 20 years, the signed directed graph (SDG), one of the qualitative simulation technologies, has been widely applied in the field of chemical fault diagnosis. However, many earlier researchers assumed a single fault origin. This leads to the problem of combinatorial explosion and has limited the application of SDG to real processes. The main reason is that most earlier researchers used the forward inference engine of commercial expert system software to carry out inverse diagnosis inference on the SDG model, which violates the internal principle of the diagnosis mechanism. In this paper, we present a new SDG multiple-fault diagnosis method based on real-time inverse inference. It is a genuine multiple-fault diagnosis method, and its inference engine uses an inverse mechanism. Finally, we give an example of a 65 t/h furnace diagnosis system to demonstrate its applicability and efficiency.
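Inverse inference on an SDG can be sketched as tracing edges backwards from an observed deviation. The example below is a toy illustration under stated assumptions: the graph, node names, and `backward_causes` function are hypothetical and are not the authors' real-time algorithm.

```python
# Hypothetical SDG: edges[(cause, effect)] is +1 if the effect moves with the cause, -1 if against it.
edges = {
    ("fuel_flow", "furnace_temp"): +1,
    ("air_flow", "furnace_temp"): -1,
    ("furnace_temp", "steam_pressure"): +1,
}

def backward_causes(symptom, deviation, edges):
    """Walk edges in reverse from an observed deviation to every (node, sign) pair that could explain it."""
    found = set()
    stack = [(symptom, deviation)]
    while stack:
        node, sign = stack.pop()
        for (cause, effect), edge_sign in edges.items():
            if effect == node:
                candidate = (cause, sign * edge_sign)
                if candidate not in found:
                    found.add(candidate)
                    stack.append(candidate)
    return found

# Observed symptom: steam pressure too high (+1).
causes = backward_causes("steam_pressure", +1, edges)
```

Starting from the symptoms keeps the search confined to the part of the graph that can actually reach them. A multiple-fault diagnosis would then look for sets of candidates that jointly account for all observed deviations.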
Calibrated birth-death phylogenetic time-tree priors for bayesian inference.
Heled, Joseph; Drummond, Alexei J
2015-05-01
Here we introduce a general class of multiple calibration birth-death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth-death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Causal inference based on counterfactuals
Directory of Open Access Journals (Sweden)
Höfler M
2005-09-01
Full Text Available Abstract Background The counterfactual or potential outcome model has become increasingly standard for causal inference in epidemiological and medical studies. Discussion This paper provides an overview of the counterfactual and related approaches. A variety of conceptual as well as practical issues when estimating causal effects are reviewed. These include causal interactions, imperfect experiments, adjustment for confounding, time-varying exposures, competing risks and the probability of causation. It is argued that the counterfactual model of causal effects captures the main aspects of causality in health sciences and relates to many statistical procedures. Summary Counterfactuals are the basis of causal inference in medicine and epidemiology. Nevertheless, the estimation of counterfactual differences poses several difficulties, primarily in observational studies. These problems, however, reflect fundamental barriers only when learning from observations, and this does not invalidate the counterfactual concept.
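A small numerical sketch may help show why counterfactual differences are hard to estimate from observations. The table below is entirely hypothetical: it lists both potential outcomes for each unit, which is never possible in real data, so that the true average causal effect can be compared with the naive exposed-versus-unexposed contrast.

```python
# Each hypothetical unit: (outcome if exposed, outcome if unexposed, actually exposed?)
units = [
    (1, 1, True),
    (1, 0, True),
    (1, 0, False),
    (0, 0, False),
]

def average_causal_effect(units):
    """Mean of the individual counterfactual differences y1 - y0 (needs both potential outcomes)."""
    return sum(y1 - y0 for y1, y0, _ in units) / len(units)

def naive_difference(units):
    """Observed contrast: mean outcome of the exposed minus mean outcome of the unexposed."""
    exposed = [y1 for y1, _, a in units if a]
    unexposed = [y0 for _, y0, a in units if not a]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

ace = average_causal_effect(units)   # 0.5 for this table
naive = naive_difference(units)      # 1.0 for this table
```

The gap between the two numbers arises because, in this table, exposure is more common among units with worse outcomes regardless of exposure. This is the kind of confounding that adjustment methods try to remove.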
Implementing and analyzing the multi-threaded LP-inference
Bolotova, S. Yu; Trofimenko, E. V.; Leschinskaya, M. V.
2018-03-01
The logical production equations provide new possibilities for backward inference optimization in intelligent production-type systems. The strategy of relevant backward inference aims to minimize the number of queries to an external information source (either a database or an interactive user). The idea of the method is based on computing the initial set of preimages and searching for the true preimage. The execution of each stage can be organized independently and in parallel, and the actual work at a given stage can also be distributed between parallel computers. This paper is devoted to parallel algorithms for relevant inference based on an advanced “pipeline” scheme of parallel computations, which makes it possible to increase the degree of parallelism. The authors also provide some details of the LP-structures implementation.
Directory of Open Access Journals (Sweden)
Sun Shi Yan
2016-01-01
Full Text Available The multiple attribute decision making (MADM) method is an important measure for system integration. Robustness analysis of MADM has been a hotspot in recent years, attracting considerable academic attention, and is regarded as an effective way of countering imperfect information. Setting the parameters of ELECTRE-III is a vital and difficult step. In this paper, a method of inferring ELECTRE-III’s parameters with fuzzy information based on robustness analysis is presented. First, ELECTRE-III is transformed into a continuous smooth function of each parameter vector. Then, a robustness analysis structure and a parameter inference algorithm are provided by maximizing the robustness margin based on mathematical programming. Moreover, how to solve the programming problem is also discussed. Finally, an illustrative example of Naval Gun Weapon System Integration is put forward.
Clewing, Catharina; Albrecht, Christian; Wilke, Thomas
2016-01-01
Although only relatively few freshwater invertebrate families are reported from the Tibetan Plateau, the degree of endemism may be high. Many endemic lineages occur within permafrost areas, raising questions about the existence of isolated intra-plateau glacial refugia. Moreover, if such refugia existed, it might be instructive to learn whether they were associated with lakes or with more dynamic ecosystems such as ponds, wetlands, or springs. To study these hypotheses, we used pulmonate snails of the plateau-wide distributed genus Radix as model group and the Lake Donggi Cona drainage system, located in the north-eastern part of the plateau, as model site. First, we performed plateau-wide phylogenetic analyses using mtDNA data to assess the overall relationships of Radix populations inhabiting the Lake Donggi Cona system for revealing refugial lineages. We then conducted regional phylogeographical analyses applying a combination of mtDNA and nuclear AFLP markers to infer the local structure and demographic history of the most abundant endemic Radix clade for identifying location and type of (sub-)refugia within the drainage system. Our phylogenetic analysis showed a high diversity of Radix lineages in the Lake Donggi Cona system. Subsequent phylogeographical analyses of the most abundant endemic clade indicated a habitat-related clustering of genotypes and several Late Pleistocene spatial/demographic expansion events. The most parsimonious explanation for these patterns would be a scenario of an intra-plateau glacial refugium in the Lake Donggi Cona drainage system, which might have consisted of isolated sub-refugia. Though the underlying processes remain unknown, an initial separation of lake and watershed populations could have been triggered by lake-level fluctuations before and during the Last Glacial Maximum. This study inferred the first intra-plateau refugium for freshwater animals on the Tibetan Plateau. It thus sheds new light on the evolutionary history
Evolutionary rates at codon sites may be used to align sequences and infer protein domain function