WorldWideScience

Sample records for genetic analysis infers

  1. Deep Learning for Population Genetic Inference.

    Science.gov (United States)

    Sheehan, Sara; Song, Yun S

    2016-03-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

  2. Deep Learning for Population Genetic Inference.

    Directory of Open Access Journals (Sweden)

    Sara Sheehan

    2016-03-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

  3. Deep Learning for Population Genetic Inference

    Science.gov (United States)

    Sheehan, Sara; Song, Yun S.

    2016-01-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908

  4. Inference and Analysis of Population Structure Using Genetic Data and Network Theory.

    Science.gov (United States)

    Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli

    2016-04-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of
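    The modularity-based significance test described in this record can be sketched in a few lines. The abstract does not say how the permutations are constructed, so the sketch below assumes a simple label-shuffling null; the function names and the toy similarity network are hypothetical, and community detection itself is taken as already done.

```python
import random

def modularity(adj, labels):
    """Newman modularity Q of a partition of a weighted undirected network.

    adj: symmetric matrix (list of lists) of edge weights with a zero diagonal.
    labels: community label of each node, in node order.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # every edge weight is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

def permutation_p_value(adj, labels, n_perm=999, seed=0):
    """Share of label shufflings whose modularity is at least the observed one.

    Assumption: shuffling labels among nodes is used as the null model.
    """
    rng = random.Random(seed)
    observed = modularity(adj, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if modularity(adj, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical network: two triangles of similar individuals joined by one edge.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
adj[4][2] = 0  # keep the only bridge between the triangles at nodes 2 and 3
labels = [0, 0, 0, 1, 1, 1]
q_obs, p_value = permutation_p_value(adj, labels)
```

    A six-node network is far too small for a meaningful p-value; the example only shows how the observed modularity is compared with its permutation distribution.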

  5. Genetic variation and phylogenetic relationship analysis of Jatropha curcas L. inferred from nrDNA ITS sequences.

    Science.gov (United States)

    Guo, Guo-Ye; Chen, Fang; Shi, Xiao-Dong; Tian, Yin-Shuai; Yu, Mao-Qun; Han, Xue-Qin; Yuan, Li-Chun; Zhang, Ying

    2016-01-01

    Genetic variation and phylogenetic relationships among 102 Jatropha curcas accessions from Asia, Africa, and the Americas were assessed using the internal transcribed spacer region of nuclear ribosomal DNA (nrDNA ITS). The average G+C content (65.04%) was considerably higher than the A+T (34.96%) content. The estimated genetic diversity revealed moderate genetic variation. The pairwise genetic divergences (GD) between haplotypes were evaluated and ranged from 0.000 to 0.017, suggesting a higher level of genetic differentiation in Mexican accessions than those of other regions. Phylogenetic relationships and intraspecific divergence were inferred by Bayesian inference (BI), maximum parsimony (MP), and median joining (MJ) network analysis and were generally resolved. The J. curcas accessions were consistently divided into three lineages, groups A, B, and C, which demonstrated distant geographical isolation and genetic divergence between American accessions and those from other regions. The MJ network analysis confirmed that Central America was the possible center of origin. The putative migration route suggested that J. curcas was distributed from Mexico or Brazil, via Cape Verde and then split into two routes. One route was dispersed to Spain, then migrated to China, eventually spreading to southeastern Asia, while the other route was dispersed to Africa, via Madagascar and migrated to China, later spreading to southeastern Asia. Copyright © 2016 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  6. Ancestry inference using principal component analysis and spatial analysis: a distance-based analysis to account for population substructure.

    Science.gov (United States)

    Byun, Jinyoung; Han, Younghun; Gorlov, Ivan P; Busam, Jonathan A; Seldin, Michael F; Amos, Christopher I

    2017-10-16

    Accurate inference of genetic ancestry is of fundamental interest to many biomedical, forensic, and anthropological research areas. Genetic ancestry memberships may relate to genetic disease risks. In a genome association study, failing to account for differences in genetic ancestry between cases and controls may also lead to false-positive results. Although a number of strategies for inferring and taking into account the confounding effects of genetic ancestry are available, applying them to large studies (tens of thousands of samples) is challenging. The goal of this study is to develop an approach for inferring genetic ancestry of samples with unknown ancestry among closely related populations and to provide accurate estimates of ancestry for application to large-scale studies. In this study we developed a novel distance-based approach, Ancestry Inference using Principal component analysis and Spatial analysis (AIPS), that incorporates an Inverse Distance Weighted (IDW) interpolation method from spatial analysis to assign individuals to population memberships. We demonstrate the benefits of AIPS in analyzing population substructure, specifically in comparison with the four most commonly used tools, EIGENSTRAT, STRUCTURE, fastSTRUCTURE, and ADMIXTURE, using genotype data from various intra-European panels and European-Americans. While the aforementioned commonly used tools performed poorly in inferring ancestry from a large number of subpopulations, AIPS accurately distinguished variations between and within subpopulations. Our results show that AIPS can be applied to large-scale data sets to discriminate the modest variability among intra-continental populations as well as for characterizing inter-continental variation. The method we developed will protect against spurious associations when mapping the genetic basis of a disease. Our approach is a more accurate and computationally efficient method for inferring genetic ancestry in large-scale genetic studies.
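    As a rough illustration of inverse-distance-weighted assignment in principal-component space, the following sketch scores an unknown sample against reference samples of known ancestry. It is not the AIPS implementation: the weighting exponent (`power=2.0`), the function name, and the toy coordinates are all assumptions made for the example.

```python
import math

def idw_membership(query, references, power=2.0):
    """Inverse-distance-weighted membership scores for one sample.

    query: tuple of principal-component coordinates of the unknown sample.
    references: list of (coords, population_label) pairs with known ancestry.
    Returns a dict mapping population label -> share of total weight (sums to 1).
    """
    weights = {}
    for coords, label in references:
        d = math.dist(query, coords)
        if d == 0.0:
            # The query coincides with a reference sample: assign it fully.
            return {label: 1.0}
        weights[label] = weights.get(label, 0.0) + 1.0 / d ** power
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical reference panel on the first two principal components.
refs = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((1.0, 1.0), "B")]
scores = idw_membership((0.2, 0.1), refs)
best = max(scores, key=scores.get)
```

    Nearby reference samples dominate the score, which is why such a scheme can separate closely related populations when their clusters are only slightly offset in PC space.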

  7. Bayesian inference on genetic merit under uncertain paternity

    Directory of Open Access Journals (Sweden)

    Tempelman Robert J

    2003-09-01

    A hierarchical animal model was developed for inference on genetic merit of livestock with uncertain paternity. Fully conditional posterior distributions for fixed and genetic effects, variance components, sire assignments and their probabilities are derived to facilitate a Bayesian inference strategy using MCMC methods. We compared this model to a model based on the Henderson average numerator relationship (ANRM) in a simulation study with 10 replicated datasets generated for each of two traits. Trait 1 had a medium heritability (h²) for each of direct and maternal genetic effects whereas Trait 2 had a high h² attributable only to direct effects. The average posterior probabilities inferred on the true sire were between 1 and 10% larger than the corresponding priors (the inverse of the number of candidate sires in a mating pasture) for Trait 1 and between 4 and 13% larger than the corresponding priors for Trait 2. The predicted additive and maternal genetic effects were very similar using both models; however, model choice criteria (Pseudo Bayes Factor and Deviance Information Criterion) decisively favored the proposed hierarchical model over the ANRM model.

  8. Spurious correlations and inference in landscape genetics

    Science.gov (United States)

    Samuel A. Cushman; Erin L. Landguth

    2010-01-01

    Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causal modelling with partial...

  9. Inferring genetic interactions from comparative fitness data.

    Science.gov (United States)

    Crona, Kristina; Gavryushkin, Alex; Greene, Devin; Beerenwinkel, Niko

    2017-12-20

    Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations.
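    The idea that a rank order alone can imply epistasis is easiest to see for two biallelic loci, where epistasis is u = w00 + w11 - w01 - w10. The sketch below is a simplified two-locus illustration, not the paper's general characterization: it assumes a strict ranking and uses a prefix count, so that whenever every negative term (w01, w10) is outranked by its own positive term (w00, w11), u must be positive. The function name is hypothetical.

```python
def implied_epistasis_sign(rank_order):
    """Sign of u = w00 + w11 - w01 - w10 implied by a strict fitness ranking.

    rank_order: the genotypes "00", "01", "10", "11" listed from fittest to
    least fit. Returns +1 or -1 when the ranking forces the sign of u, and 0
    when the ranking alone does not determine it.
    """
    positive_terms = {"00", "11"}
    balance = 0
    lowest = highest = 0
    for genotype in rank_order:
        balance += 1 if genotype in positive_terms else -1
        lowest = min(lowest, balance)
        highest = max(highest, balance)
    if lowest >= 0:
        return 1   # each negative term is paired with a higher-ranked positive term
    if highest <= 0:
        return -1  # each positive term is paired with a higher-ranked negative term
    return 0

sign_a = implied_epistasis_sign(["11", "10", "00", "01"])
sign_b = implied_epistasis_sign(["10", "01", "11", "00"])
sign_c = implied_epistasis_sign(["11", "10", "01", "00"])
```

    In the first ranking, w11 > w10 and w00 > w01, so u = (w11 - w10) + (w00 - w01) is positive whatever the actual fitness values are; in the third, the two differences have opposite signs and the ranking is uninformative.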

  10. Inference of Tumor Evolution during Chemotherapy by Computational Modeling and In Situ Analysis of Genetic and Phenotypic Cellular Diversity

    Directory of Open Access Journals (Sweden)

    Vanessa Almendro

    2014-02-01

    Cancer therapy exerts a strong selection pressure that shapes tumor evolution, yet our knowledge of how tumors change during treatment is limited. Here, we report the analysis of cellular heterogeneity for genetic and phenotypic features and their spatial distribution in breast tumors pre- and post-neoadjuvant chemotherapy. We found that intratumor genetic diversity was tumor-subtype specific, and it did not change during treatment in tumors with partial or no response. However, lower pretreatment genetic diversity was significantly associated with pathologic complete response. In contrast, phenotypic diversity was different between pre- and posttreatment samples. We also observed significant changes in the spatial distribution of cells with distinct genetic and phenotypic features. We used these experimental data to develop a stochastic computational model to infer tumor growth patterns and evolutionary dynamics. Our results highlight the importance of integrated analysis of genotypes and phenotypes of single cells in intact tissues to predict tumor evolution.

  11. Inference of tumor evolution during chemotherapy by computational modeling and in situ analysis of genetic and phenotypic cellular diversity

    International Nuclear Information System (INIS)

    Almendro, Vanessa; Cheng, Yu-Kang; Randles, Amanda; Itzkovitz, Shalev; Marusyk, Andriy; Ametller, Elisabet; Gonzalez-Farre, Xavier; Muñoz, Montse; Russnes, Hege G.; Helland, Åslaug; Rye, Inga H.; Borresen-Dale, Anne-Lise; Maruyama, Reo; Van Oudenaarden, Alexander; Dowsett, Mitchell; Jones, Robin L.; Reis-Filho, Jorge; Gascon, Pere; Gönen, Mithat; Michor, Franziska; Polyak, Kornelia

    2014-01-01

    Cancer therapy exerts a strong selection pressure that shapes tumor evolution, yet our knowledge of how tumors change during treatment is limited. Here, we report the analysis of cellular heterogeneity for genetic and phenotypic features and their spatial distribution in breast tumors pre- and post-neoadjuvant chemotherapy. We found that intratumor genetic diversity was tumor-subtype specific, and it did not change during treatment in tumors with partial or no response. However, lower pretreatment genetic diversity was significantly associated with pathologic complete response. In contrast, phenotypic diversity was different between pre- and post-treatment samples. We also observed significant changes in the spatial distribution of cells with distinct genetic and phenotypic features. We used these experimental data to develop a stochastic computational model to infer tumor growth patterns and evolutionary dynamics. Our results highlight the importance of integrated analysis of genotypes and phenotypes of single cells in intact tissues to predict tumor evolution.

  12. The Analysis of Polyploid Genetic Data.

    Science.gov (United States)

    Meirmans, Patrick G; Liu, Shenglin; van Tienderen, Peter H

    2018-03-16

    Though polyploidy is an important aspect of the evolutionary genetics of both plants and animals, the development of population genetic theory of polyploids has seriously lagged behind that of diploids. This is unfortunate since the analysis of polyploid genetic data (and the interpretation of the results) requires even more scrutiny than with diploid data. This is because of several polyploidy-specific complications in segregation and genotyping such as tetrasomy, double reduction, and missing dosage information. Here, we review the theoretical and statistical aspects of the population genetics of polyploids. We discuss several widely used types of inferences, including genetic diversity, Hardy-Weinberg equilibrium, population differentiation, genetic distance, and detecting population structure. For each, we point out how the statistical approach, expected result, and interpretation differ between different ploidy levels. We also discuss for each type of inference what biases may arise from the polyploid-specific complications and how these biases can be overcome. From our overview, it is clear that the statistical toolbox that is available for the analysis of genetic data is flexible and still expanding. Modern sequencing techniques will soon be able to overcome some of the current limitations to the analysis of polyploid data, though the techniques are lagging behind those available for diploids. Furthermore, the availability of more data may aggravate the biases that can arise, and increase the risk of false inferences. Therefore, simulations such as we used throughout this review are an important tool to verify the results of analyses of polyploid genetic data.

  13. Inferences of Recent and Ancient Human Population History Using Genetic and Non-Genetic Data

    Science.gov (United States)

    Kitchen, Andrew

    2008-01-01

    I have adopted complementary approaches to inferring human demographic history utilizing human and non-human genetic data as well as cultural data. These complementary approaches form an interdisciplinary perspective that allows one to make inferences of human history at varying timescales, from the events that occurred tens of thousands of years…

  14. Inferring Genetic Ancestry: Opportunities, Challenges, and Implications

    OpenAIRE

    Royal, Charmaine D.; Novembre, John; Fullerton, Stephanie M.; Goldstein, David B.; Long, Jeffrey C.; Bamshad, Michael J.; Clark, Andrew G.

    2010-01-01

    Increasing public interest in direct-to-consumer (DTC) genetic ancestry testing has been accompanied by growing concern about issues ranging from the personal and societal implications of the testing to the scientific validity of ancestry inference. The very concept of “ancestry” is subject to misunderstanding in both the general and scientific communities. What do we mean by ancestry? How exactly is ancestry measured? How far back can such ancestry be defined and by which genetic tools? How ...

  15. Use of genetic data to infer population-specific ecological and phenotypic traits from mixed aggregations.

    Directory of Open Access Journals (Sweden)

    Paul Moran

    Many applications in ecological genetics involve sampling individuals from a mixture of multiple biological populations and subsequently associating those individuals with the populations from which they arose. Analytical methods that assign individuals to their putative population of origin have utility in both basic and applied research, providing information about population-specific life history and habitat use, ecotoxins, pathogen and parasite loads, and many other non-genetic ecological, or phenotypic traits. Although the question is initially directed at the origin of individuals, in most cases the ultimate desire is to investigate the distribution of some trait among populations. Current practice is to assign individuals to a population of origin and study properties of the trait among individuals within population strata as if they constituted independent samples. It seemed that approach might bias population-specific trait inference. In this study we made trait inferences directly through modeling, bypassing individual assignment. We extended a Bayesian model for population mixture analysis to incorporate parameters for the phenotypic trait and compared its performance to that of individual assignment with a minimum probability threshold for assignment. The Bayesian mixture model outperformed individual assignment under some trait inference conditions. However, by discarding individuals whose origins are most uncertain, the individual assignment method provided a less complex analytical technique whose performance may be adequate for some common trait inference problems. Our results provide specific guidance for method selection under various genetic relationships among populations with different trait distributions.

  16. Use of genetic data to infer population-specific ecological and phenotypic traits from mixed aggregations

    Science.gov (United States)

    Moran, Paul; Bromaghin, Jeffrey F.; Masuda, Michele

    2014-01-01

    Many applications in ecological genetics involve sampling individuals from a mixture of multiple biological populations and subsequently associating those individuals with the populations from which they arose. Analytical methods that assign individuals to their putative population of origin have utility in both basic and applied research, providing information about population-specific life history and habitat use, ecotoxins, pathogen and parasite loads, and many other non-genetic ecological, or phenotypic traits. Although the question is initially directed at the origin of individuals, in most cases the ultimate desire is to investigate the distribution of some trait among populations. Current practice is to assign individuals to a population of origin and study properties of the trait among individuals within population strata as if they constituted independent samples. It seemed that approach might bias population-specific trait inference. In this study we made trait inferences directly through modeling, bypassing individual assignment. We extended a Bayesian model for population mixture analysis to incorporate parameters for the phenotypic trait and compared its performance to that of individual assignment with a minimum probability threshold for assignment. The Bayesian mixture model outperformed individual assignment under some trait inference conditions. However, by discarding individuals whose origins are most uncertain, the individual assignment method provided a less complex analytical technique whose performance may be adequate for some common trait inference problems. Our results provide specific guidance for method selection under various genetic relationships among populations with different trait distributions.

  17. Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering

    Science.gov (United States)

    D'haeseleer, Patrik; Liang, Shoudan; Somogyi, Roland

    2000-01-01

    Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review data mining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.

  18. The genetic assimilation in language borrowing inferred from Jing People.

    Science.gov (United States)

    Huang, Xiufeng; Zhou, Qinghui; Bin, Xiaoyun; Lai, Shu; Lin, Chaowen; Hu, Rong; Xiao, Jiashun; Luo, Dajun; Li, Yingxiang; Wei, Lan-Hai; Yeh, Hui-Yuan; Chen, Gang; Wang, Chuan-Chao

    2018-02-28

    The Jing people are a recognized ethnic group in Guangxi, southwest China, who immigrated from Vietnam during the 16th century. They speak Vietnamese, with many borrowings from Cantonese, Zhuang, and Mandarin. However, because genetic information on this population is very limited, it is unclear whether large-scale gene flow from surrounding populations into the Jing people accompanied their language change. We collected blood samples from 37 Jing and 3 Han Chinese individuals from Wanwei, Shanxin, and Wutou islands in Guangxi and genotyped about 600,000 genome-wide single nucleotide polymorphisms (SNPs). We used Principal Component Analysis (PCA), ADMIXTURE analysis, f statistics, qpWave and qpAdm to infer the population genetic structure and admixture. Our data revealed that the Jing people are genetically similar to the populations in southwest China and mainland Southeast Asia. Compared with Vietnamese, however, they show significant evidence of gene flow from surrounding East Asians. The admixture proportion is estimated to be around 35-42% in different Jing groups using southern Han Chinese as a proxy. The majority of the paternal lineages of Jing people are most likely from surrounding East Asians. We conclude that the formation and language change of present-day Jing people have involved genetic assimilation of surrounding East Asian populations. The language borrowing, in this case, is not only a cultural phenomenon but has involved demic diffusion. © 2018 Wiley Periodicals, Inc.

  19. The Information Content of Discrete Functions and Their Application in Genetic Data Analysis.

    Science.gov (United States)

    Sakhanenko, Nikita A; Kunert-Graf, James; Galas, David J

    2017-12-01

    The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows separation of the detection problem from the inference of functional form problem. We approach here the third component of inferring functional forms based on information encoded in the functions. We present here a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis: the inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. We illustrate the functional description and the classes of a number of common genetic interaction modes and also show how different modes vary widely in their sensitivity to noise.

  20. Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures

    Science.gov (United States)

    Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland

    1998-01-01

    Given the imminent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10^15) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
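    The mutual-information search described in this record can be illustrated with a small sketch. This is not the authors' C program: it assumes a complete state transition table, searches input sets of increasing size, and accepts the first set whose mutual information with a gene's next state equals the entropy of that next state. The three-gene rules and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def find_inputs(states, next_states, target, max_k=3, tol=1e-9):
    """Smallest input set that fully determines the target gene's next state.

    An input set qualifies when its mutual information with the output
    equals the output entropy. Returns a tuple of gene indices, or None.
    """
    n = len(states[0])
    out = [nxt[target] for nxt in next_states]
    h_out = entropy(out)
    if h_out < tol:
        return ()  # constant gene: no inputs are needed
    for k in range(1, max_k + 1):
        for subset in combinations(range(n), k):
            inp = [tuple(s[i] for i in subset) for s in states]
            joint = list(zip(inp, out))
            mutual_info = entropy(inp) + h_out - entropy(joint)
            if abs(mutual_info - h_out) < tol:
                return subset
    return None

# Hypothetical synchronous Boolean network:
#   gene0' = gene1, gene1' = gene0 AND gene2, gene2' = NOT gene0
def step(s):
    return (s[1], s[0] & s[2], 1 - s[0])

states = list(product((0, 1), repeat=3))  # complete state transition table
next_states = [step(s) for s in states]
wiring = [find_inputs(states, next_states, g) for g in range(3)]
```

    Once the wiring is known, the rule for each gene can be read directly from the rows of the transition table restricted to its inputs.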

  1. Reducing bias in population and landscape genetic inferences: the effects of sampling related individuals and multiple life stages.

    Science.gov (United States)

    Peterman, William; Brocato, Emily R; Semlitsch, Raymond D; Eggert, Lori S

    2016-01-01

    In population or landscape genetics studies, an unbiased sampling scheme is essential for generating accurate results, but logistics may lead to deviations from the sample design. Such deviations may come in the form of sampling multiple life stages. Presently, it is largely unknown what effect sampling different life stages can have on population or landscape genetic inference, or how mixing life stages can affect the parameters being measured. Additionally, the removal of siblings from a data set is considered best-practice, but direct comparisons of inferences made with and without siblings are limited. In this study, we sampled embryos, larvae, and adult Ambystoma maculatum from five ponds in Missouri, and analyzed them at 15 microsatellite loci. We calculated allelic richness, heterozygosity and effective population sizes for each life stage at each pond and tested for genetic differentiation (FST and DC) and isolation-by-distance (IBD) among ponds. We tested for differences in each of these measures between life stages, and in a pooled population of all life stages. All calculations were done with and without sibling pairs to assess the effect of sibling removal. We also assessed the effect of reducing the number of microsatellites used to make inference. No statistically significant differences were found among ponds or life stages for any of the population genetic measures, but patterns of IBD differed among life stages. There was significant IBD when using adult samples, but tests using embryos, larvae, or a combination of the three life stages were not significant. We found that increasing the ratio of larval or embryo samples in the analysis of genetic distance weakened the IBD relationship, and when using DC, the IBD was no longer significant when larvae and embryos exceeded 60% of the population sample. Further, power to detect an IBD relationship was reduced when fewer microsatellites were used in the analysis.

  2. Reducing bias in population and landscape genetic inferences: the effects of sampling related individuals and multiple life stages

    Directory of Open Access Journals (Sweden)

    William Peterman

    2016-03-01

    In population or landscape genetics studies, an unbiased sampling scheme is essential for generating accurate results, but logistics may lead to deviations from the sample design. Such deviations may come in the form of sampling multiple life stages. Presently, it is largely unknown what effect sampling different life stages can have on population or landscape genetic inference, or how mixing life stages can affect the parameters being measured. Additionally, the removal of siblings from a data set is considered best-practice, but direct comparisons of inferences made with and without siblings are limited. In this study, we sampled embryos, larvae, and adult Ambystoma maculatum from five ponds in Missouri, and analyzed them at 15 microsatellite loci. We calculated allelic richness, heterozygosity and effective population sizes for each life stage at each pond and tested for genetic differentiation (FST and DC) and isolation-by-distance (IBD) among ponds. We tested for differences in each of these measures between life stages, and in a pooled population of all life stages. All calculations were done with and without sibling pairs to assess the effect of sibling removal. We also assessed the effect of reducing the number of microsatellites used to make inference. No statistically significant differences were found among ponds or life stages for any of the population genetic measures, but patterns of IBD differed among life stages. There was significant IBD when using adult samples, but tests using embryos, larvae, or a combination of the three life stages were not significant. We found that increasing the ratio of larval or embryo samples in the analysis of genetic distance weakened the IBD relationship, and when using DC, the IBD was no longer significant when larvae and embryos exceeded 60% of the population sample. Further, power to detect an IBD relationship was reduced when fewer microsatellites were used in the analysis.

  3. Genetic analysis on three South Indian sympatric hipposiderid bats (Chiroptera, Hipposideridae)

    Directory of Open Access Journals (Sweden)

    Kanagaraj, C

    2010-12-01

    In mitochondrial DNA, variations in the sequence of the 16S rRNA region were analyzed to infer the genetic relationship and population history of three sympatric hipposiderid bats, Hipposideros speoris, H. fulvus and H. ater. Based on the DNA sequence data, we observed relatively lower haplotype and higher nucleotide diversity in H. speoris than in the other two species. The pairwise comparisons of the genetic divergence inferred a genetic relationship between the three hipposiderid bats. We used haplotype sequences to construct a phylogenetic tree. Maximum parsimony and Bayesian inference analysis generated a tree with similar topology. H. fulvus and H. ater formed one cluster and H. speoris formed another cluster. Analysis of the demographic history of populations using Tajima's D test revealed past changes in populations. Comparison of the observed distribution of pairwise nucleotide differences with that expected under a sudden expansion model is consistent with the model for H. fulvus and H. ater but not for H. speoris populations.

  4. Statistical Methods for Population Genetic Inference Based on Low-Depth Sequencing Data from Modern and Ancient DNA

    DEFF Research Database (Denmark)

    Korneliussen, Thorfinn Sand

    Due to the recent advances in DNA sequencing technology, genomic data are being generated at an unprecedented rate and we are gaining access to entire genomes at population level. The technology does, however, not give direct access to the genetic variation, and the many levels of preprocessing that are required before being able to make inferences from the data introduce multiple levels of uncertainty, especially for low-depth data. Therefore methods that take into account the inherent uncertainty are needed for being able to make robust inferences in the downstream analysis of such data. This poses a problem for a range of key summary statistics within population genetics where existing methods are based on the assumption that the true genotypes are known. Motivated by this I present: 1) a new method for the estimation of relatedness between pairs of individuals, 2) a new method for estimating...

  5. Developments in statistical analysis in quantitative genetics

    DEFF Research Database (Denmark)

    Sorensen, Daniel

    2009-01-01

    A remarkable research impetus has taken place in statistical genetics since the last World Conference. This has been stimulated by breakthroughs in molecular genetics, automated data-recording devices and computer-intensive statistical methods. The latter were revolutionized by the bootstrap and by Markov chain Monte Carlo (McMC). In this overview a number of specific areas are chosen to illustrate the enormous flexibility that McMC has provided for fitting models and exploring features of data that were previously inaccessible. The selected areas are inferences of the trajectories over time of genetic means and variances, models for the analysis of categorical and count data, the statistical genetics of a model postulating that environmental variance is partly under genetic control, and a short discussion of models that incorporate massive genetic marker information. We provide an overview...

  6. Network-assisted crop systems genetics: network inference and integrative analysis.

    Science.gov (United States)

    Lee, Tak; Kim, Hyojin; Lee, Insuk

    2015-04-01

    Although next-generation sequencing (NGS) technology has enabled the decoding of many crop species genomes, most of the underlying genetic components for economically important crop traits remain to be determined. Network approaches have proven useful for the study of the reference plant, Arabidopsis thaliana, and the success of network-based crop genetics will also require the availability of genome-scale functional networks for crop species. In this review, we discuss how to construct functional networks and elucidate the holistic view of a crop system. The crop gene network then can be used for gene prioritization and the analysis of resequencing-based genome-wide association study (GWAS) data, the amount of which will rapidly grow in the field of crop science in the coming years. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Scale dependent inference in landscape genetics

    Science.gov (United States)

    Samuel A. Cushman; Erin L. Landguth

    2010-01-01

    Ecological relationships between patterns and processes are highly scale dependent. This paper reports the first formal exploration of how changing scale of research away from the scale of the processes governing gene flow affects the results of landscape genetic analysis. We used an individual-based, spatially explicit simulation model to generate patterns of genetic...

  8. Population genetic structure of the cotton bollworm Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae) in India as inferred from EPIC-PCR DNA markers.

    Science.gov (United States)

    Behere, Gajanan Tryambak; Tay, Wee Tek; Russell, Derek Alan; Kranthi, Keshav Raj; Batterham, Philip

    2013-01-01

    Helicoverpa armigera is an important pest of cotton and other agricultural crops in the Old World. Its wide host range, high mobility and fecundity, and the ability to adapt and develop resistance against all common groups of insecticides used for its management have exacerbated its pest status. An understanding of the population genetic structure in H. armigera under Indian agricultural conditions will help ascertain gene flow patterns across different agricultural zones. This study inferred the population genetic structure of Indian H. armigera using five Exon-Primed Intron-Crossing (EPIC)-PCR markers. Nested alternative EPIC markers detected moderate null allele frequencies (4.3% to 9.4%) in loci used to infer population genetic structure but the apparently genome-wide heterozygote deficit suggests inbreeding or a Wahlund effect rather than a null allele effect. Population genetic analysis of the 26 populations suggested significant genetic differentiation within India but especially in cotton-feeding populations in the 2006-07 cropping season. In contrast, overall pair-wise F(ST) estimates from populations feeding on food crops indicated no significant population substructure irrespective of cropping seasons. A Bayesian cluster analysis was used to assign the genetic make-up of individuals to likely membership of population clusters. Some evidence was found for four major clusters with individuals in two populations from cotton in one year (from two populations in northern India) showing especially high homogeneity. Taken as a whole, this study found evidence of population substructure at host crop, temporal and spatial levels in Indian H. armigera, without, however, a clear biological rationale for these structures being evident.

  9. General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models.

    Science.gov (United States)

    de Villemereuil, Pierre; Schielzeth, Holger; Nakagawa, Shinichi; Morrissey, Michael

    2016-11-01

    Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioral traits, have inherently nonnormal distributions. The generalized linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for nonnormal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of such quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function, over distributions of latent values. In general cases, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGglmm. We show that known formulas for quantities such as heritability of traits with binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMM can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation and apply our approach to data from a wild pedigreed vertebrate population. Copyright © 2016 de Villemereuil et al.
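
    The integration over latent values mentioned in this abstract can be illustrated with a short numerical sketch. It is written in Python and is not the QGglmm API (QGglmm is an R package); the function name, the parameter values and the choice of Gauss-Hermite quadrature are assumptions made for illustration. It computes the observed-scale mean as the expectation of the inverse link over a normal distribution of latent values, and checks the log-link case against the log-normal mean exp(mu + var/2).

```python
import numpy as np


def observed_mean(inv_link, mu, var, n_nodes=50):
    """E[inv_link(l)] for latent l ~ Normal(mu, var), by Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables l = mu + sqrt(2 * var) * x maps the normal density
    # onto the Gauss-Hermite weight function, up to a factor 1 / sqrt(pi).
    latent = mu + np.sqrt(2.0 * var) * x
    return float(np.sum(w * inv_link(latent)) / np.sqrt(np.pi))


def logistic(l):
    """Inverse logit link, as used for binomial traits."""
    return 1.0 / (1.0 + np.exp(-l))


mu, var = 0.5, 0.3  # hypothetical latent mean and total latent variance

# Log link (e.g. a Poisson trait): a closed form exists for comparison.
numeric = observed_mean(np.exp, mu, var)
closed_form = float(np.exp(mu + var / 2.0))

# Logit link: no simple closed form, so the integral is evaluated numerically.
binomial_mean = observed_mean(logistic, 0.0, 1.0)
```

    The same quadrature could in principle be reused for other observed-scale quantities by swapping in a different integrand, which is why a single numerical routine covers links that lack closed-form expressions.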

  10. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning

    International Nuclear Information System (INIS)

    Takahashi, N.; Takahashi, Y.; Blumberg, B.S.; Putnam, F.W.

    1987-01-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  11. Analysis of a genetically structured variance heterogeneity model using the Box-Cox transformation.

    Science.gov (United States)

    Yang, Ye; Christensen, Ole F; Sorensen, Daniel

    2011-02-01

    Over recent years, statistical support for the presence of genetic factors operating at the level of the environmental variance has come from fitting a genetically structured heterogeneous variance model to field or experimental data in various species. Misleading results may arise due to skewness of the marginal distribution of the data. To investigate how the scale of measurement affects inferences, the genetically structured heterogeneous variance model is extended to accommodate the family of Box-Cox transformations. Litter size data in rabbits and pigs that had previously been analysed in the untransformed scale were reanalysed in a scale equal to the mode of the marginal posterior distribution of the Box-Cox parameter. In the rabbit data, the statistical evidence for a genetic component at the level of the environmental variance is considerably weaker than that resulting from an analysis in the original metric. In the pig data, the statistical evidence is stronger, but the coefficient of correlation between additive genetic effects affecting mean and variance changes sign, compared to the results in the untransformed scale. The study confirms that inferences on variances can be strongly affected by the presence of asymmetry in the distribution of data. We recommend that to avoid one important source of spurious inferences, future work seeking support for a genetic component acting on environmental variation using a parametric approach based on normality assumptions confirms that these are met.
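
    For reference, the Box-Cox family used in this abstract can be written as a small function. This Python sketch shows only the transformation itself, not the genetically structured heterogeneous variance model or its Bayesian fitting; the function name and the litter-size values are hypothetical.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform of strictly positive data y for parameter lam."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return np.log(y)  # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


litter_sizes = np.array([4, 7, 8, 9, 10, 12])  # hypothetical values
untransformed = box_cox(litter_sizes, 1.0)  # lam = 1 only shifts the data by -1
logged = box_cox(litter_sizes, 0.0)  # lam = 0 is the log scale
```

    Setting lam = 1 leaves the shape of the distribution unchanged, so an analysis "in the untransformed scale" corresponds to that member of the family, while other values of lam change the skewness of the data.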

  12. Population genetics inference for longitudinally-sampled mutants under strong selection.

    Science.gov (United States)

    Lacerda, Miguel; Seoighe, Cathal

    2014-11-01

    Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright-Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright-Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright-Fisher model. Copyright © 2014 by the Genetics Society of America.
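
    The discrete Wright-Fisher model that this abstract takes as its reference can be simulated in a few lines. The Python sketch below assumes a haploid population, ignores mutation, and uses hypothetical parameter values; it illustrates the model being approximated, not the authors' approximation or their likelihood method.

```python
import numpy as np


def wright_fisher(n_pop, p0, s, generations, rng):
    """Mutant-frequency trajectory in a haploid Wright-Fisher population with selection."""
    freqs = np.empty(generations + 1)
    freqs[0] = p0
    p = p0
    for t in range(1, generations + 1):
        # Selection: the mutant has relative fitness 1 + s.
        p_sel = p * (1.0 + s) / (1.0 + s * p)
        # Drift: binomial resampling of n_pop individuals.
        p = rng.binomial(n_pop, p_sel) / n_pop
        freqs[t] = p
    return freqs


rng = np.random.default_rng(0)
# Strong selection (s = 0.5) in a population of 10,000, starting from 1% mutants.
traj = wright_fisher(n_pop=10_000, p0=0.01, s=0.5, generations=40, rng=rng)
```

    Sampling such a trajectory at several time points gives longitudinal allele frequency data of the kind described above; with s this large, the weak-selection assumption behind the standard diffusion approximation is clearly violated.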

  13. Analysis of a genetically structured variance heterogeneity model using the Box-Cox transformation

    DEFF Research Database (Denmark)

    Yang, Ye; Christensen, Ole Fredslund; Sorensen, Daniel

    2011-01-01

    of the marginal distribution of the data. To investigate how the scale of measurement affects inferences, the genetically structured heterogeneous variance model is extended to accommodate the family of Box–Cox transformations. Litter size data in rabbits and pigs that had previously been analysed in the untransformed scale were reanalysed in a scale equal to the mode of the marginal posterior distribution of the Box–Cox parameter. In the rabbit data, the statistical evidence for a genetic component at the level of the environmental variance is considerably weaker than that resulting from an analysis in the original metric. In the pig data, the statistical evidence is stronger, but the coefficient of correlation between additive genetic effects affecting mean and variance changes sign, compared to the results in the untransformed scale. The study confirms that inferences on variances can be strongly affected...

  14. Using Fuzzy Gaussian Inference and Genetic Programming to Classify 3D Human Motions

    Science.gov (United States)

    Khoury, Mehdi; Liu, Honghai

    This research introduces and builds on the concept of Fuzzy Gaussian Inference (FGI) (Khoury and Liu in Proceedings of UKCI, 2008 and IEEE Workshop on Robotic Intelligence in Informationally Structured Space (RiiSS 2009), 2009) as a novel way to build Fuzzy Membership Functions that map to hidden Probability Distributions underlying human motions. This method is now combined with a Genetic Programming Fuzzy rule-based system in order to classify boxing moves from natural human Motion Capture data. In this experiment, FGI alone is able to recognise seven different boxing stances simultaneously with an accuracy superior to a GMM-based classifier. Results seem to indicate that adding an evolutionary Fuzzy Inference Engine on top of FGI improves the accuracy of the classifier in a consistent way.

  15. Causal inference in survival analysis using pseudo-observations

    DEFF Research Database (Denmark)

    Andersen, Per K; Syriopoulou, Elisavet; Parner, Erik T

    2017-01-01

    Causal inference for non-censored response variables, such as binary or quantitative outcomes, is often based on either (1) direct standardization ('G-formula') or (2) inverse probability of treatment assignment weights ('propensity score'). To do causal inference in survival analysis, one needs ...

  16. Meta-learning framework applied in bioinformatics inference system design.

    Science.gov (United States)

    Arredondo, Tomás; Ormazábal, Wladimir

    2015-01-01

    This paper describes a meta-learner inference system development framework which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach utilises a workflow where the user provides feedback with final classification decisions which are stored in conjunction with analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several different optimisation methods with various different parameters. The obtained inference systems were also contrasted with other standard classification methods with accurate prediction capabilities observed.

  17. Hybrid Origins of Citrus Varieties Inferred from DNA Marker Analysis of Nuclear and Organelle Genomes

    Science.gov (United States)

    Kitajima, Akira; Nonaka, Keisuke; Yoshioka, Terutaka; Ohta, Satoshi; Goto, Shingo; Toyoda, Atsushi; Fujiyama, Asao; Mochizuki, Takako; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

    2016-01-01

    Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers certify their transferability among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana) for which we suggest different origins. Structure analysis with DNA markers that are in Hardy–Weinberg equilibrium deduces three basic taxa coinciding with the current understanding of citrus ancestors. Genotyping analysis of 101 indigenous citrus varieties with 123 selected DNA markers infers the parentages of 22 indigenous citrus varieties including Satsuma, Temple, and iyo, and single parents of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon by allele-sharing and parentage tests. Genotyping analysis of chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces the combination of seed and pollen parents. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 types of varieties consisting of Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo, which have played pivotal roles in the occurrence of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, as found by recent studies. PMID:27902727

  18. Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

    Science.gov (United States)

    Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

    2012-10-05

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  19. Genetic origin and dispersal of the invasive soybean aphid inferred from population genetic analysis and approximate Bayesian computation.

    Science.gov (United States)

    Fang, Fang; Chen, Jing; Jiang, Li-Yun; Qu, Yan-Hua; Qiao, Ge-Xia

    2018-01-09

    Biological invasion is considered one of the most important global environmental problems. Knowledge of the source and dispersal routes of invasion could facilitate the eradication and control of invasive species. Soybean aphid, Aphis glycines Matsumura, is one of the most destructive soybean pests. For effective management of this pest, we conducted genetic analyses and approximate Bayesian computation (ABC) analysis to determine the origins and dispersal of the aphid species, as well as the source of its invasion in the USA, using eight microsatellite loci and the mitochondrial cytochrome c oxidase subunit I (COI) gene. We were able to identify a significant isolation by distance (IBD) pattern and three genetic lineages in the microsatellite data but not in the mtDNA dataset. The genetic structure showed that the USA population has the closest relationship with those from Korea and Japan, indicating that the two latter populations might be the sources of the invasion to the USA. Both population genetic analyses and ABC showed that the northeastern populations in China were the possible sources of the further spread of A. glycines to Indonesia. The dispersal history of this aphid can provide useful information for pest management strategies and can further help predict areas at risk of invasion. This article is protected by copyright. All rights reserved.

  20. Genetic parameters and genetic and phenotypic trends of performance traits of equines from the Brazilian Army

    OpenAIRE

    Dornelles, Mariana de Almeida; Araújo, Ronyere Olegário de; Everling, Dionéia Magda; Weber, Tomás; Lopes, Jader Silva; Pacheco, Paulo Santana; Breda, Fernanda Cristina; Rorato, Paulo Roberto Nogara

    2012-01-01

    The objective of this research was to compare the magnitude of genetic parameters (coefficients of heritability and genetic correlation) as estimated by the Restricted Maximum Likelihood (REML) method and Bayesian Inference, and to estimate the genetic and phenotypic trends for the traits height at the withers (HW24) and weight at 24 months of age (W24). The average heritability estimated by Bayesian Inference for HW24 was 0.47, and it was lower than that obtained by REML bi-trait analysis (0.5...

  1. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis.

    Science.gov (United States)

    Kumar, Sudhir; Stecher, Glen; Peterson, Daniel; Tamura, Koichiro

    2012-10-15

    There is a growing need in the research community to apply the molecular evolutionary genetics analysis (MEGA) software tool for batch processing a large number of datasets and to integrate it into analysis workflows. Therefore, we now make available the computing core of the MEGA software as a stand-alone executable (MEGA-CC), along with an analysis prototyper (MEGA-Proto). MEGA-CC provides users with access to all the computational analyses available through MEGA's graphical user interface version. This includes methods for multiple sequence alignment, substitution model selection, evolutionary distance estimation, phylogeny inference, substitution rate and pattern estimation, tests of natural selection and ancestral sequence inference. Additionally, we have upgraded the source code for phylogenetic analysis using the maximum likelihood methods for parallel execution on multiple processors and cores. Here, we describe MEGA-CC and outline the steps for using MEGA-CC in tandem with MEGA-Proto for iterative and automated data analysis. http://www.megasoftware.net/.

  2. On the validity of within-nuclear-family genetic association analysis in samples of extended families.

    Science.gov (United States)

    Bureau, Alexandre; Duchesne, Thierry

    2015-12-01

    Splitting extended families into their component nuclear families to apply a genetic association method designed for nuclear families is a widespread practice in familial genetic studies. Dependence among genotypes and phenotypes of nuclear families from the same extended family arises because of genetic linkage of the tested marker with a risk variant or because of familial specificity of genetic effects due to gene-environment interaction. This raises concerns about the validity of inference conducted under the assumption of independence of the nuclear families. We indeed prove theoretically that, in a conditional logistic regression analysis applicable to disease cases and their genotyped parents, the naive model-based estimator of the variance of the coefficient estimates underestimates the true variance. However, simulations with realistic effect sizes of risk variants and variation of this effect from family to family reveal that the underestimation is negligible. The simulations also show the greater efficiency of the model-based variance estimator compared to a robust empirical estimator. Our recommendation is therefore to use the model-based estimator of variance for inference on effects of genetic variants.

  3. A Comparative Analysis of Fuzzy Inference Engines in Context of ...

    African Journals Online (AJOL)

    Fuzzy inference engine has found successful applications in a wide variety of fields, such as automatic control, data classification, decision analysis, expert engines, time series prediction, robotics, pattern recognition, etc. This paper presents a comparative analysis of three fuzzy inference engines, max-product, max-min ...

  4. Inferring genetic connectivity in real populations, exemplified by coastal and oceanic Atlantic cod.

    Science.gov (United States)

    Spies, Ingrid; Hauser, Lorenz; Jorde, Per Erik; Knutsen, Halvor; Punt, André E; Rogers, Lauren A; Stenseth, Nils Chr

    2018-04-19

    Genetic data are commonly used to estimate connectivity between putative populations, but translating them to demographic dispersal rates is complicated. Theoretical equations that infer a migration rate based on the genetic estimator F_ST, such as Wright's equation, F_ST ≈ 1/(4 N_e m + 1), make assumptions that do not apply to most real populations. How complexities inherent to real populations affect migration was exemplified by Atlantic cod in the North Sea and Skagerrak and was examined within an age-structured model that incorporated genetic markers. Migration was determined under various scenarios by varying the number of simulated migrants until the mean simulated level of genetic differentiation matched a fixed level of genetic differentiation equal to empirical estimates. Parameters that decreased the N_e/N_t ratio (where N_e is the effective and N_t is the total population size), such as high fishing mortality and high fishing gear selectivity, increased the number of migrants required to achieve empirical levels of genetic differentiation. Higher maturity-at-age and lower selectivity increased N_e/N_t and decreased migration when genetic differentiation was fixed. Changes in natural mortality, fishing gear selectivity, and maturity-at-age within expected limits had a moderate effect on migration when genetic differentiation was held constant. Changes in population size had the greatest effect on the number of migrants to achieve fixed levels of F_ST, particularly when genetic differentiation was low, F_ST ≈ 10^-3. Highly variable migration patterns, compared with constant migration, resulted in higher variance in genetic differentiation and higher extreme values. Results are compared with and provide insight into the use of theoretical equations to estimate migration among real populations. Copyright © 2018 the Author(s). Published by PNAS.
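
    To make the arithmetic behind Wright's equation concrete, the relation quoted in the abstract above can be inverted to give the number of effective migrants per generation, N_e m = (1/F_ST - 1)/4. The following is only a minimal sketch of that idealized formula; the function names are illustrative and are not taken from the paper, and the age-structured simulation model the authors describe is not reproduced here.

```python
def fst_from_migration(ne: float, m: float) -> float:
    """Wright's island-model approximation: F_ST ~ 1 / (4 * N_e * m + 1)."""
    if ne < 0 or m < 0:
        raise ValueError("ne and m must be non-negative")
    return 1.0 / (4.0 * ne * m + 1.0)


def migrants_from_fst(fst: float) -> float:
    """Invert Wright's approximation to get N_e * m (effective migrants per generation)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("fst must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # Low differentiation of the order mentioned in the abstract (F_ST ~ 10^-3).
    print(migrants_from_fst(1e-3))
```

    Under this idealized relation, small changes in a very low F_ST translate into large changes in the implied number of migrants, which is one reason the abstract cautions against applying such equations directly to real populations.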

  5. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    Science.gov (United States)

    Colosimo, Giuliano; Knapp, Charles R; Wallace, Lisa E; Welch, Mark E

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst =  0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.

  6. Inferred vs Realized Patterns of Gene Flow: An Analysis of Population Structure in the Andros Island Rock Iguana

    Science.gov (United States)

    Colosimo, Giuliano; Knapp, Charles R.; Wallace, Lisa E.; Welch, Mark E.

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344

  7. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    Directory of Open Access Journals (Sweden)

    Giuliano Colosimo

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.

  8. A human genome-wide library of local phylogeny predictions for whole-genome inference problems

    Directory of Open Access Journals (Sweden)

    Schwartz Russell

    2008-08-01

    Background: Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results: In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion: Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.

  9. Extensive dispersal of Roanoke logperch (Percina rex) inferred from genetic marker data

    Science.gov (United States)

    Roberts, James H.; Angermeier, Paul; Hallerman, Eric M.

    2016-01-01

    The dispersal ecology of most stream fishes is poorly characterised, complicating conservation efforts for these species. We used microsatellite DNA marker data to characterise dispersal patterns and effective population size (Ne) for a population of Roanoke logperch Percina rex, an endangered darter (Percidae). Juveniles and candidate parents were sampled for 2 years at sites throughout the Roanoke River watershed. Dispersal was inferred via genetic assignment tests (ATs), pedigree reconstruction (PR) and estimation of lifetime dispersal distance under a genetic isolation-by-distance model. Estimates of Ne varied from 105 to 1218 individuals, depending on the estimation method. Based on PR, polygamy was frequent in parents of both sexes, with individuals spawning with an average of 2.4 mates. The sample contained 61 half-sibling pairs, but only one parent–offspring pair and no full-sib pairs, which limited our ability to discriminate natal dispersal of juveniles from breeding dispersal of their parents between spawning events. Nonetheless, all methods indicated extensive dispersal. The AT indicated unrestricted dispersal among sites ≤15 km apart, while siblings inferred by the PR were captured an average of 14 km and up to 55 km apart. Model-based estimates of median lifetime dispersal distance (6–24 km, depending on assumptions) bracketed AT and PR estimates, indicating that widely dispersed individuals do, on average, contribute to gene flow. Extensive dispersal of P. rex suggests that darters and other small benthic stream fishes may be unexpectedly mobile. Monitoring and management activities for such populations should encompass entire watersheds to fully capture population dynamics.

  10. Molecular phylogeny of Toxoplasmatinae: comparison between inferences based on mitochondrial and apicoplast genetic sequences

    Directory of Open Access Journals (Sweden)

    Michelle Klein Sercundes

    2016-03-01

    Phylogenies within Toxoplasmatinae have been widely investigated with different molecular markers. Here, we studied molecular phylogenies of the Toxoplasmatinae subfamily based on apicoplast and mitochondrial genes. Partial sequences of apicoplast genes coding for caseinolytic protease (clpC) and beta subunit of RNA polymerase (rpoB), and mitochondrial gene coding for cytochrome B (cytB) were analyzed. Laboratory-adapted strains of the closely related parasites Sarcocystis falcatula and Sarcocystis neurona were investigated, along with Neospora caninum, Neospora hughesi, Toxoplasma gondii (strains RH, CTG and PTG), Besnoitia akodoni, Hammondia hammondi and two genetically divergent lineages of Hammondia heydorni. The molecular analysis based on organellar genes did not clearly differentiate between N. caninum and N. hughesi, but the two lineages of H. heydorni were confirmed. Slight differences between the strains of S. falcatula and S. neurona were encountered in all markers. In conclusion, congruent phylogenies were inferred from the three different genes and they might be used for screening undescribed sarcocystid parasites in order to ascertain their phylogenetic relationships with organisms of the family Sarcocystidae. The evolutionary studies based on organellar genes confirm that the genus Hammondia is paraphyletic. The primers used for amplification of clpC and rpoB were able to amplify genetic sequences of organisms of the genus Sarcocystis and organisms of the subfamily Toxoplasmatinae as well.

  11. A landscape genetic analysis of important agricultural pest species in Tunisia: The whitefly Bemisia tabaci.

    Directory of Open Access Journals (Sweden)

    Ahmed Ben Abdelkrim

    Combining landscape ecology and genetics provides an excellent framework to appreciate pest population dynamics and dispersal. The genetic architectures of many species are always shaped by environmental constraints. Because little is known about the ecological and genetic traits of Tunisian whitefly populations, the main objective of this work is to highlight patterns of biodiversity, genetic structure and migration routes of this pest. We used nuclear microsatellite loci to analyze B. tabaci populations collected from various agricultural areas across the country and we determined their biotype status. Molecular data were subsequently interpreted in an ecological context supplied from a species distribution model to infer habitat suitability and thereafter the potential connection paths between sampling localities. An analysis of landscape resistance to B. tabaci genetic flow was thus applied to take into account habitat suitability, genetic relatedness and functional connectivity of habitats within a varied landscape matrix. We shed light on the occurrence of three geographically delineated genetic groups with high levels of genetic differentiation within each of them. Potential migration corridors of this pest were then established, providing significant advances toward the understanding of genetic features and the dynamic dispersal of this pest. This study supports the hypothesis of a long-distance dispersal of B. tabaci followed by infrequent long-term isolations. The inference of population sources and colonization routes is critical for the design and implementation of accurate management strategies against this pest.

  12. A landscape genetic analysis of important agricultural pest species in Tunisia: The whitefly Bemisia tabaci.

    Science.gov (United States)

    Ben Abdelkrim, Ahmed; Hattab, Tarek; Fakhfakh, Hatem; Belkadhi, Mohamed Sadok; Gorsane, Faten

    2017-01-01

    Combining landscape ecology and genetics provides an excellent framework to appreciate pest population dynamics and dispersal. The genetic architectures of many species are always shaped by environmental constraints. Because little is known about the ecological and genetic traits of Tunisian whitefly populations, the main objective of this work is to highlight patterns of biodiversity, genetic structure and migration routes of this pest. We used nuclear microsatellite loci to analyze B. tabaci populations collected from various agricultural areas across the country and we determined their biotype status. Molecular data were subsequently interpreted in an ecological context supplied from a species distribution model to infer habitat suitability and thereafter the potential connection paths between sampling localities. An analysis of landscape resistance to B. tabaci genetic flow was thus applied to take into account habitat suitability, genetic relatedness and functional connectivity of habitats within a varied landscape matrix. We shed light on the occurrence of three geographically delineated genetic groups with high levels of genetic differentiation within each of them. Potential migration corridors of this pest were then established, providing significant advances toward the understanding of genetic features and the dynamic dispersal of this pest. This study supports the hypothesis of a long-distance dispersal of B. tabaci followed by infrequent long-term isolations. The inference of population sources and colonization routes is critical for the design and implementation of accurate management strategies against this pest.

  13. Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations.

    Directory of Open Access Journals (Sweden)

    Xiaodong Cai

    Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, the need for efficient and powerful algorithms remains. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL) based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining may just indicate the emergence of new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.

  14. Genetic parameters and genetic and phenotypic trends of performance traits of equines from the Brazilian Army

    Directory of Open Access Journals (Sweden)

    Mariana de Almeida Dornelles

    2012-06-01

    The objective of this research was to compare the magnitude of genetic parameters (coefficients of heritability and genetic correlation) as estimated by the Restricted Maximum Likelihood (REML) method and Bayesian Inference, and to estimate the genetic and phenotypic trends for the traits height at the withers (HW24) and weight at 24 months of age (W24). The average heritability estimated by Bayesian Inference for HW24 was 0.47, and it was lower than that obtained by REML bi-trait analysis (0.52); however, the value estimated for W24 (0.39) was higher than that obtained by REML bi-trait analysis (0.38). The genetic correlation estimate between W24 and HW24 traits obtained by the REML method (0.66) was lower than that obtained by the Bayesian Inference Method (0.72). From the regression of the average additive genetic merit on the year of birth of the animals, it was found that the averaged genetic values of the animals for HW24 showed a genetic trend near zero (-0.0008 cm/year), and the averaged genetic values for W24 showed a negative trend of -0.38 kg/year. The values of the direct heritability estimated for HW24 and W24 suggest that direct selection for these traits can provide genetic gain in this population. The genetic correlation between the traits, high and positive, suggests that selection for HW24 should promote an increase in W24 at this age. The genetic trends obtained for the traits studied, near zero, indicate that the selection performed produced a slight reduction of the weight of the animals at 24 months of age; however, it did not promote an increase in height at the withers at this same age, in this population.

  15. Journal of Genetics | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    Population genetic diversity of marble goby (Oxyeleotris marmoratus) inferred from mitochondrial DNA and microsatellite analysis. CHENG ZHAO XIAOPING ZHU YICHUN GU QINTAO WANG ZECHENG LI SHAOWU YIN. ONLINE RESOURCES Volume 96 ...

  16. Inference of population history and patterns from molecular data

    DEFF Research Database (Denmark)

    Tataru, Paula

    , the existing mathematical models and computational methods need to be reformulated. I address this from an inference perspective in two areas of bioinformatics. Population genetics studies the influence exerted by various factors on the dynamics of a population's genetic variation. These factors cover...... evolutionary forces, such as mutation and selection, but also changes in population size. The aim in population genetics is to untangle the history of a population from observed genetic variation. This subject is dominated by two dual models, the Wright-Fisher and coalescent. I first introduce a new...... approximation to the Wright-Fisher model, which I show to accurately infer split times between populations. This approximation can potentially be applied for inference of mutation rates and selection coefficients. I then illustrate how the coalescent process is the natural framework for detecting traces...

  17. Causal inference in survival analysis using pseudo-observations.

    Science.gov (United States)

    Andersen, Per K; Syriopoulou, Elisavet; Parner, Erik T

    2017-07-30

    Causal inference for non-censored response variables, such as binary or quantitative outcomes, is often based on either (1) direct standardization ('G-formula') or (2) inverse probability of treatment assignment weights ('propensity score'). To do causal inference in survival analysis, one needs to address right-censoring, and often, special techniques are required for that purpose. We will show how censoring can be dealt with 'once and for all' by means of so-called pseudo-observations when doing causal inference in survival analysis. The pseudo-observations can be used as a replacement of the outcomes without censoring when applying 'standard' causal inference methods, such as (1) or (2) earlier. We study this idea for estimating the average causal effect of a binary treatment on the survival probability, the restricted mean lifetime, and the cumulative incidence in a competing risks situation. The methods will be illustrated in a small simulation study and via a study of patients with acute myeloid leukemia who received either myeloablative or non-myeloablative conditioning before allogeneic hematopoietic cell transplantation. We will estimate the average causal effect of the conditioning regime on outcomes such as the 3-year overall survival probability and the 3-year risk of chronic graft-versus-host disease. Copyright © 2017 John Wiley & Sons, Ltd.
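
    The abstract above does not spell out how pseudo-observations are computed. A common construction, assumed here rather than taken from the paper, is the leave-one-out (jackknife) form n * S(t) - (n - 1) * S_(-i)(t), where S is the Kaplan-Meier estimate of the survival probability at a fixed time t. The sketch below illustrates only that step; the function names are hypothetical, and the subsequent standardization or weighting step is not shown.

```python
import numpy as np


def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events is 1 for an observed event, 0 for censoring."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply conditional survival factors over distinct event times up to t.
    for u in np.unique(times[events & (times <= t)]):
        at_risk = np.sum(times >= u)
        deaths = np.sum((times == u) & events)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations for S(t): n*S(t) - (n-1)*S_(-i)(t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    full = km_survival(times, events, t)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False  # drop subject i
        pseudo[i] = n * full - (n - 1) * km_survival(times[keep], events[keep], t)
        keep[i] = True   # restore subject i
    return pseudo


if __name__ == "__main__":
    # Toy data (made up for illustration): one value per subject, censored or not.
    print(pseudo_observations([1, 2, 3, 4, 5], [1, 0, 1, 1, 0], t=2.5))
```

    With no censoring, each pseudo-observation reduces to the indicator that the subject survived past t, which is why these values can stand in for the uncensored outcome in standard methods.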

  18. The impact of within-herd genetic variation upon inferred transmission trees for foot-and-mouth disease virus.

    Science.gov (United States)

    Valdazo-González, Begoña; Kim, Jan T; Soubeyrand, Samuel; Wadsworth, Jemma; Knowles, Nick J; Haydon, Daniel T; King, Donald P

    2015-06-01

    Full-genome sequences have been used to monitor the fine-scale dynamics of epidemics caused by RNA viruses. However, the ability of this approach to confidently reconstruct transmission trees is limited by the knowledge of the genetic diversity of viruses that exist within different epidemiological units. In order to address this question, this study investigated the variability of 45 foot-and-mouth disease virus (FMDV) genome sequences (from 33 animals) that were collected during 2007 from eight premises (10 different herds) in the United Kingdom. Bayesian and statistical parsimony analysis demonstrated that these sequences exhibited clustering which was consistent with a transmission scenario describing herd-to-herd spread of the virus. As an alternative to analysing all of the available samples in future epidemics, the impact of randomly selecting one sequence from each of these herds was used to assess cost-effective methods that might be used to infer transmission trees during FMD outbreaks. Using these approaches, 85% and 91% of the resulting topologies were either identical or differed by only one edge from a reference tree comprising all of the sequences generated within the outbreak. The sequence distance that accrued during sequential transmission events between epidemiological units was estimated to be 4.6 nucleotides, although the genetic variability between viruses recovered from chronic carrier animals was higher than between viruses from animals with acute-stage infection: an observation which poses challenges for the use of simple approaches to infer transmission trees. This study helps to develop strategies for sampling during FMD outbreaks, and provides data that will guide the development of further models to support control policies in the event of virus incursions into FMD free countries. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  19. Differential network analysis reveals genetic effects on catalepsy modules.

    Directory of Open Access Journals (Sweden)

    Ovidiu D Iancu

    We performed short-term bi-directional selective breeding for haloperidol-induced catalepsy, starting from three mouse populations of increasingly complex genetic structure: an F2 intercross, a heterogeneous stock (HS) formed by crossing four inbred strains (HS4) and a heterogeneous stock (HS-CC) formed from the inbred strain founders of the Collaborative Cross (CC). All three selections were successful, with large differences in haloperidol response emerging within three generations. Using a custom differential network analysis procedure, we found that gene coexpression patterns changed significantly; importantly, a number of these changes were concordant across genetic backgrounds. In contrast, absolute gene-expression changes were modest and not concordant across genetic backgrounds, in spite of the large and similar phenotypic differences. By inferring strain contributions from the parental lines, we are able to identify significant differences in allelic content between the selected lines concurrent with large changes in transcript connectivity. Importantly, this observation implies that genetic polymorphisms can affect transcript and module connectivity without large changes in absolute expression levels. We conclude that, in this case, selective breeding acts at the subnetwork level, with the same modules but not the same transcripts affected across the three selections.

  20. Likelihood-based inference for discretely observed birth-death-shift processes, with applications to evolution of mobile genetic elements.

    Science.gov (United States)

    Xu, Jason; Guttorp, Peter; Kato-Maeda, Midori; Minin, Vladimir N

    2015-12-01

    Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements, which are important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low-dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply broadly to multi-type branching processes whose rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of the IS6110 transposable element, a genetic marker frequently used during estimation of epidemiological clusters of Mycobacterium tuberculosis infections. © 2015, The International Biometric Society.
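
    As an illustration of the kind of process described above, the following sketch simulates a linear birth-death-shift process with a standard Gillespie-style algorithm. It rests on assumptions that are not confirmed by the abstract: each element independently gives birth, dies, or shifts at constant per-element rates, and elements are tracked as two types (those still at their original locations and those at new locations). The function name and parameters are hypothetical, and this is not the authors' inference method, only a forward simulation.

```python
import random


def simulate_bds(n_old, birth, death, shift, t_end, seed=0):
    """Simulate a linear BDS process up to time t_end.

    Returns (old, new): counts of elements at original and at new locations.
    Rates are per element per unit time (an assumption of this sketch).
    """
    rng = random.Random(seed)
    old, new, t = n_old, 0, 0.0
    per_element = birth + death + shift
    while True:
        n = old + new
        total_rate = n * per_element
        if total_rate == 0:
            break  # extinct population or all rates zero
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            break
        u = rng.random() * per_element     # which event type occurs
        hits_old = rng.random() < old / n  # which kind of element it affects
        if u < birth:
            new += 1                       # a copy appears at a new location
        elif u < birth + death:
            if hits_old:
                old -= 1
            else:
                new -= 1
        elif hits_old:
            old -= 1                       # shift: element leaves its original site
            new += 1
        # a shift of an already-moved element leaves both counts unchanged
    return old, new


if __name__ == "__main__":
    print(simulate_bds(n_old=10, birth=0.02, death=0.01, shift=0.03, t_end=20.0, seed=1))
```

    Observing such a trajectory only at a few uneven time points, as in the patient data the abstract mentions, hides the intermediate events, which is the source of the computational difficulty the authors address.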

  1. Adaptive divergence despite strong genetic drift: genomic analysis of the evolutionary mechanisms causing genetic differentiation in the island fox (Urocyon littoralis)

    Science.gov (United States)

    FUNK, W. CHRIS; LOVICH, ROBERT E.; HOHENLOHE, PAUL A.; HOFMAN, COURTNEY A.; MORRISON, SCOTT A.; SILLETT, T. SCOTT; GHALAMBOR, CAMERON K.; MALDONADO, JESUS E.; RICK, TORBEN C.; DAY, MITCH D.; POLATO, NICHOLAS R.; FITZPATRICK, SARAH W.; COONAN, TIMOTHY J.; CROOKS, KEVIN R.; DILLON, ADAM; GARCELON, DAVID K.; KING, JULIE L.; BOSER, CHRISTINA L.; GOULD, NICHOLAS; ANDELT, WILLIAM F.

    2016-01-01

    The evolutionary mechanisms generating the tremendous biodiversity of islands have long fascinated evolutionary biologists. Genetic drift and divergent selection are predicted to be strong on islands and both could drive population divergence and speciation. Alternatively, strong genetic drift may preclude adaptation. We conducted a genomic analysis to test the roles of genetic drift and divergent selection in causing genetic differentiation among populations of the island fox (Urocyon littoralis). This species consists of 6 subspecies, each of which occupies a different California Channel Island. Analysis of 5293 SNP loci generated using Restriction-site Associated DNA (RAD) sequencing found support for genetic drift as the dominant evolutionary mechanism driving population divergence among island fox populations. In particular, populations had exceptionally low genetic variation, small Ne (range = 2.1–89.7; median = 19.4), and significant genetic signatures of bottlenecks. Moreover, islands with the lowest genetic variation (and, by inference, the strongest historical genetic drift) were most genetically differentiated from mainland gray foxes, and vice versa, indicating genetic drift drives genome-wide divergence. Nonetheless, outlier tests identified 3.6–6.6% of loci as high FST outliers, suggesting that despite strong genetic drift, divergent selection contributes to population divergence. Patterns of similarity among populations based on high FST outliers mirrored patterns based on morphology, providing additional evidence that outliers reflect adaptive divergence. Extremely low genetic variation and small Ne in some island fox populations, particularly on San Nicolas Island, suggest that they may be vulnerable to fixation of deleterious alleles, decreased fitness, and reduced adaptive potential. PMID:26992010

  2. Adaptive divergence despite strong genetic drift: genomic analysis of the evolutionary mechanisms causing genetic differentiation in the island fox (Urocyon littoralis).

    Science.gov (United States)

    Funk, W Chris; Lovich, Robert E; Hohenlohe, Paul A; Hofman, Courtney A; Morrison, Scott A; Sillett, T Scott; Ghalambor, Cameron K; Maldonado, Jesus E; Rick, Torben C; Day, Mitch D; Polato, Nicholas R; Fitzpatrick, Sarah W; Coonan, Timothy J; Crooks, Kevin R; Dillon, Adam; Garcelon, David K; King, Julie L; Boser, Christina L; Gould, Nicholas; Andelt, William F

    2016-05-01

    The evolutionary mechanisms generating the tremendous biodiversity of islands have long fascinated evolutionary biologists. Genetic drift and divergent selection are predicted to be strong on islands and both could drive population divergence and speciation. Alternatively, strong genetic drift may preclude adaptation. We conducted a genomic analysis to test the roles of genetic drift and divergent selection in causing genetic differentiation among populations of the island fox (Urocyon littoralis). This species consists of six subspecies, each of which occupies a different California Channel Island. Analysis of 5293 SNP loci generated using Restriction-site Associated DNA (RAD) sequencing found support for genetic drift as the dominant evolutionary mechanism driving population divergence among island fox populations. In particular, populations had exceptionally low genetic variation, small Ne (range = 2.1-89.7; median = 19.4), and significant genetic signatures of bottlenecks. Moreover, islands with the lowest genetic variation (and, by inference, the strongest historical genetic drift) were most genetically differentiated from mainland grey foxes, and vice versa, indicating genetic drift drives genome-wide divergence. Nonetheless, outlier tests identified 3.6-6.6% of loci as high FST outliers, suggesting that despite strong genetic drift, divergent selection contributes to population divergence. Patterns of similarity among populations based on high FST outliers mirrored patterns based on morphology, providing additional evidence that outliers reflect adaptive divergence. Extremely low genetic variation and small Ne in some island fox populations, particularly on San Nicolas Island, suggest that they may be vulnerable to fixation of deleterious alleles, decreased fitness and reduced adaptive potential. © 2016 John Wiley & Sons Ltd.
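
As a rough illustration of the FST statistic behind the outlier tests mentioned in this abstract, the sketch below computes Wright's FST for a single biallelic locus as (HT - HS) / HT. The function name `fst_biallelic`, the equal weighting of populations, and the example frequencies are assumptions made for illustration; the abstract does not say which estimator the authors used.

```python
def fst_biallelic(freqs):
    """Wright's FST for one biallelic locus.

    freqs: frequency of one allele in each population
    (populations are weighted equally, an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within populations.
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    if h_total == 0.0:
        return 0.0  # monomorphic everywhere; FST is undefined, report 0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies on two islands.
print(fst_biallelic([0.2, 0.8]))
```

Loci whose FST sits far above the genome-wide distribution are the kind of "high FST outliers" the abstract refers to; a real outlier test would also need a null model for drift.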

  3. Inferring Demographic History Using Two-Locus Statistics.

    Science.gov (United States)

    Ragsdale, Aaron P; Gutenkunst, Ryan N

    2017-06-01

    Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
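
To make the idea of a two-locus statistic concrete, the sketch below computes two standard summaries of linkage disequilibrium for a pair of biallelic loci: the coefficient D and the squared correlation r². This is only a minimal example of pairwise LD, not the composite-likelihood framework of the paper; the function name and the input frequencies are assumptions.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a, p_b: marginal frequencies of alleles A and B
    """
    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    # r2 rescales D squared by the allele-frequency variances.
    return d, d * d / denom


# Hypothetical frequencies: A and B always occur together.
d, r2 = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
print(d, r2)
```

Single-locus summaries such as the AFS use only `p_a` and `p_b`; the joint frequency `p_ab` is the extra information that pairs of loci contribute.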

  4. An analysis pipeline for the inference of protein-protein interaction networks

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, Ronald C.; Singhal, Mudita; Daly, Don S.; Gilmore, Jason M.; Cannon, William R.; Domico, Kelly O.; White, Amanda M.; Auberry, Deanna L.; Auberry, Kenneth J.; Hooker, Brian S.; Hurst, G. B.; McDermott, Jason E.; McDonald, W. H.; Pelletier, Dale A.; Schmoyer, Denise A.; Wiley, H. S.

    2009-12-01

    An analysis pipeline has been created for deployment of a novel algorithm, the Bayesian Estimator of Protein-Protein Association Probabilities (BEPro), for use in the reconstruction of protein-protein interaction networks. We have combined the Software Environment for BIological Network Inference (SEBINI), an interactive environment for the deployment and testing of network inference algorithms that use high-throughput data, and the Collective Analysis of Biological Interaction Networks (CABIN), software that allows integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources, to allow interactions computed by BEPro to be stored, visualized, and further analyzed. Incorporating BEPro into SEBINI and automatically feeding the resulting inferred network into CABIN, we have created a structured workflow for protein-protein network inference and supplemental analysis from sets of mass spectrometry bait-prey experiment data. SEBINI demo site: https://www.emsl.pnl.gov/SEBINI/ Contact: ronald.taylor@pnl.gov. BEPro is available at http://www.pnl.gov/statistics/BEPro3/index.htm. Contact: ds.daly@pnl.gov. CABIN is available at http://www.sysbio.org/dataresources/cabin.stm. Contact: mudita.singhal@pnl.gov.

  5. [Genetic variation and differentiation in striped field mouse Apodemus agrarius inferred from RAPD-PCR analysis].

    Science.gov (United States)

    Atopkin, D M; Bogdanov, A S; Chelomina, G N

    2007-06-01

    Genetic variation and differentiation of the trans-Palearctic species Apodemus agrarius (striped field mouse), whose range consists of two large isolates, European-Siberian and Far Eastern-Chinese, were examined using RAPD-PCR analysis. Material from both parts of the range was examined (41 individuals of A. agrarius from 18 localities in Russia, Ukraine, Moldova, and Kazakhstan); the Far Eastern part was represented by samples from the Amur region, Khabarovsk krai, and Primorye (Russia). Differences in frequencies of polymorphic RAPD loci were found between the European-Siberian and the Far Eastern population groups of the striped field mouse. No "fixed" differences between them in RAPD spectra were found, and none of the statistical methods used could distinguish animals from the two parts of the range with absolute certainty. Thus, genetic isolation of the European-Siberian and the Far Eastern population groups of A. agrarius is not strict. These results support the hypothesis of a recent dispersal of the striped field mouse from the East to the West Palearctic (during the Holocene climatic optimum, 7000 to 4500 years ago) and subsequent disjunction of the species range (not earlier than 4000-4500 years ago). The Far Eastern population group is more polymorphic than the European-Siberian one, while genetic heterogeneity is more uniformly distributed within it. This is probably explained both by historical events during the species' past dispersal and by different environmental conditions in different parts of its range. The Far Eastern population group inhabits the area close to the distribution center of A. agrarius. It is likely that this group preserved the genetic variation of the formerly integral ancestral form, while some genetic polymorphism could have been lost during the species' colonization of the Siberian and European areas. To date, the settlement density and population number in general are higher than within the European

  6. Inferring genetic parameters of lactation in Tropical Milking Criollo cattle with random regression test-day models.

    Science.gov (United States)

    Santellano-Estrada, E; Becerril-Pérez, C M; de Alba, J; Chang, Y M; Gianola, D; Torres-Hernández, G; Ramírez-Valverde, R

    2008-11-01

    This study inferred genetic and permanent environmental variation of milk yield in Tropical Milking Criollo cattle and compared 5 random regression test-day models using Wilmink's function and Legendre polynomials. Data consisted of 15,377 test-day records from 467 Tropical Milking Criollo cows that calved between 1974 and 2006 in the tropical lowlands of the Gulf Coast of Mexico and in southern Nicaragua. Estimated heritabilities of test-day milk yields ranged from 0.18 to 0.45, and repeatabilities ranged from 0.35 to 0.68 for the period spanning from 6 to 400 d in milk. Genetic correlation between days in milk 10 and 400 was around 0.50 but greater than 0.90 for most pairs of test days. The model that used first-order Legendre polynomials for additive genetic effects and second-order Legendre polynomials for permanent environmental effects gave the smallest residual variance and was also favored by the Akaike information criterion and likelihood ratio tests.
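
Random regression test-day models of this kind typically describe each cow's lactation curve with Legendre polynomial covariates evaluated at standardized days in milk. The sketch below shows one plausible way to build those covariates for the 6 to 400 d range reported in the abstract. The function name is hypothetical, and the polynomials are left unnormalized, which is an assumption; many animal-breeding packages use a normalized form.

```python
def legendre_covariates(dim, order, dim_min=6, dim_max=400):
    """Legendre polynomial values P_0..P_order at a standardized day in milk."""
    if not dim_min <= dim <= dim_max:
        raise ValueError("days in milk outside the modelled range")
    # Map days in milk linearly onto [-1, 1], where the polynomials are orthogonal.
    x = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0
    values = [1.0, x]
    # Bonnet's recursion: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


# Covariates for a second-order fit at mid-lactation (day 203 maps to x = 0).
print(legendre_covariates(203, order=2))
```

In the favored model from the abstract, a first-order set of such covariates would multiply the additive genetic coefficients and a second-order set the permanent environmental coefficients.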

  7. Comparison of Channel Catfish and Blue Catfish Gut Microbiota Assemblages Shows Minimal Effects of Host Genetics on Microbial Structure and Inferred Function

    Directory of Open Access Journals (Sweden)

    Jacob W. Bledsoe

    2018-05-01

    Full Text Available The microbiota of teleost fish has gained a great deal of research attention within the past decade, with experiments suggesting that both host-genetics and environment are strong ecological forces shaping the bacterial assemblages of fish microbiomes. Despite representing great commercial and scientific importance, the catfish within the family Ictaluridae, specifically the blue and channel catfish, have received very little research attention directed toward their gut-associated microbiota using 16S rRNA gene sequencing. Within this study we utilize multiple genetically distinct strains of blue and channel catfish, verified via microsatellite genotyping, to further quantify the role of host-genetics in shaping the bacterial communities in the fish gut, while maintaining environmental and husbandry parameters constant. Comparisons of the gut microbiota among the two catfish species showed no differences in bacterial species richness (observed and Chao1) or overall composition (weighted and unweighted UniFrac), and UniFrac distances showed no correlation with host genetic distances (Rst) according to Mantel tests. The microbiota of environmental samples (diet and water) were found to be significantly more diverse than that of the catfish gut associated samples, suggesting that factors within the host were further regulating the bacterial communities, despite the lack of a clear connection between microbiota composition and host genotype. The catfish gut communities were dominated by the phyla Fusobacteria, Proteobacteria, and Firmicutes; however, differential abundance analysis between the two catfish species using analysis of composition of microbiomes detected two differential genera, Cetobacterium and Clostridium XI. The metagenomic pathway features inferred from our dataset suggest the catfish gut bacterial communities possess pathways beneficial to their host such as those involved in nutrient metabolism and antimicrobial biosynthesis, while

  8. Genealogical and evolutionary inference with the human Y chromosome.

    Science.gov (United States)

    Stumpf, M P; Goldstein, D B

    2001-03-02

    Population genetics has emerged as a powerful tool for unraveling human history. In addition to the study of mitochondrial and autosomal DNA, attention has recently focused on Y-chromosome variation. Ambiguities and inaccuracies in data analysis, however, pose an important obstacle to further development of the field. Here we review the methods available for genealogical inference using Y-chromosome data. Approaches can be divided into those that do and those that do not use an explicit population model in genealogical inference. We describe the strengths and weaknesses of these model-based and model-free approaches, as well as difficulties associated with the mutation process that affect both methods. In the case of genealogical inference using microsatellite loci, we use coalescent simulations to show that relatively simple generalizations of the mutation process can greatly increase the accuracy of genealogical inference. Because model-free and model-based approaches have different biases and limitations, we conclude that there is considerable benefit in the continued use of both types of approaches.

  9. A Statistical Framework for Microbial Source Attribution: Measuring Uncertainty in Host Transmission Events Inferred from Genetic Data (Part 2 of a 2 Part Report)

    Energy Technology Data Exchange (ETDEWEB)

    Allen, J; Velsko, S

    2009-11-16

    This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease requires a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV and HIV were examined to investigate the question of how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; however, too much genetic change poses challenges, which can weaken confidence in inferred links. High rates of infection coupled with relatively weak selective pressure observed in the SARS-CoV and FMDV data lead to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question. When the number of observed nucleotide mutations is limited due to characteristics of the

  10. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study.

    Science.gov (United States)

    Vergara, María; Basto, Mafalda P; Madeira, María José; Gómez-Moliner, Benjamín J; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remains essentially unknown. In this study we have combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and it may represent a challenge to the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, and this may explain the observed disagreement between clustering solutions obtained by the different IBC methods. However, there was significant indication for contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the

  11. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study.

    Directory of Open Access Journals (Sweden)

    María Vergara

    Full Text Available The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remains essentially unknown. In this study we have combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and it may represent a challenge to the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, and this may explain the observed disagreement between clustering solutions obtained by the different IBC methods. However, there was significant indication for contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence

  12. Genetic relatedness of indigenous ethnic groups in northern Borneo to neighboring populations from Southeast Asia, as inferred from genome-wide SNP data.

    Science.gov (United States)

    Yew, Chee Wei; Hoque, Mohd Zahirul; Pugh-Kitingan, Jacqueline; Minsong, Alexander; Voo, Christopher Lok Yung; Ransangan, Julian; Lau, Sophia Tiek Ying; Wang, Xu; Saw, Woei Yuh; Ong, Rick Twee-Hee; Teo, Yik-Ying; Xu, Shuhua; Hoh, Boon-Peng; Phipps, Maude E; Kumar, S Vijay

    2018-07-01

    The region of northern Borneo is home to the current state of Sabah, Malaysia. It is located closest to the southern Philippine islands and may have served as a viaduct for ancient human migration onto or off of Borneo Island. In this study, five indigenous ethnic groups from Sabah were subjected to genome-wide SNP genotyping. These individuals represent the "North Borneo"-speaking group of the great Austronesian family. They have traditionally resided in the inland region of Sabah. The dataset was merged with public datasets, and the genetic relatedness of these groups to neighboring populations from the islands of Southeast Asia, mainland Southeast Asia and southern China was inferred. Genetic structure analysis revealed that these groups formed a genetic cluster that was independent of the clusters of neighboring populations. Additionally, these groups exhibited near-absolute proportions of a genetic component that is also common among Austronesians from Taiwan and the Philippines. They showed no genetic admixture with Austro-Melanesian populations. Furthermore, phylogenetic analysis showed that they are closely related to non-Austro-Melanesian Filipinos as well as to Taiwan natives but are distantly related to populations from mainland Southeast Asia. Relatively lower heterozygosity and higher pairwise genetic differentiation index (FST) values than those of nearby populations indicate that these groups might have experienced genetic drift in the past, resulting in their differentiation from other Austronesians. Subsequent formal testing suggested that these populations have received no gene flow from neighboring populations. Taken together, these results imply that the indigenous ethnic groups of northern Borneo shared a common ancestor with Taiwan natives and non-Austro-Melanesian Filipinos and then isolated themselves on the inland of Sabah. This isolation presumably led to no admixture with other populations, and these individuals therefore underwent

  13. A Unifying Model for the Analysis of Phenotypic, Genetic and Geographic Data

    DEFF Research Database (Denmark)

    Guillot, Gilles; Rena, Sabrina; Ledevin, Ronan

    2012-01-01

    Recognition of evolutionary units (species, populations) requires integrating several kinds of data such as genetic or phenotypic markers or spatial information, in order to get a comprehensive view concerning the differentiation of the units. We propose a statistical model with a double original...... advantage: (i) it incorporates information about the spatial distribution of the samples, with the aim to increase inference power and to relate more explicitly observed patterns to geography; and (ii) it allows one to analyze genetic and phenotypic data within a unified model and inference framework, thus...... an intricate case of inter- and intra-species differentiation based on an original data-set of georeferenced genetic and morphometric markers obtained on Myodes voles from Sweden. A computer program is made available as an extension of the R package Geneland....

  14. Reduced genetic variance among high fitness individuals: inferring stabilizing selection on male sexual displays in Drosophila serrata.

    Science.gov (United States)

    Sztepanacz, Jacqueline L; Rundle, Howard D

    2012-10-01

    Directional selection is prevalent in nature, yet phenotypes tend to remain relatively constant, suggesting a limit to trait evolution. However, the genetic basis of this limit is unresolved. Given widespread pleiotropy, opposing selection on a trait may arise from the effects of the underlying alleles on other traits under selection, generating net stabilizing selection on trait genetic variance. These pleiotropic costs of trait exaggeration may arise through any number of other traits, making them hard to detect in phenotypic analyses. Stabilizing selection can be inferred, however, if genetic variance is greater among low- compared to high-fitness individuals. We extend a recently suggested approach to provide a direct test of a difference in genetic variance for a suite of cuticular hydrocarbons (CHCs) in Drosophila serrata. Despite strong directional sexual selection on these traits, genetic variance differed between high- and low-fitness individuals and was greater among the low-fitness males for seven of eight CHCs, significantly more than expected by chance. Univariate tests of a difference in genetic variance were nonsignificant but likely have low power. Our results suggest that further CHC exaggeration in D. serrata in response to sexual selection is limited by pleiotropic costs mediated through other traits. © 2012 The Author(s). Evolution © 2012 The Society for the Study of Evolution.

  15. A Systematic Bayesian Integration of Epidemiological and Genetic Data

    Science.gov (United States)

    Lau, Max S. Y.; Marion, Glenn; Streftaris, George; Gibson, Gavin

    2015-01-01

    Genetic sequence data on pathogens have great potential to inform inference of their transmission dynamics ultimately leading to better disease control. Where genetic change and disease transmission occur on comparable timescales additional information can be inferred via the joint analysis of such genetic sequence data and epidemiological observations based on clinical symptoms and diagnostic tests. Although recently introduced approaches represent substantial progress, for computational reasons they approximate genuine joint inference of disease dynamics and genetic change in the pathogen population, capturing partially the joint epidemiological-evolutionary dynamics. Improved methods are needed to fully integrate such genetic data with epidemiological observations, for achieving a more robust inference of the transmission tree and other key epidemiological parameters such as latent periods. Here, building on current literature, a novel Bayesian framework is proposed that infers simultaneously and explicitly the transmission tree and unobserved transmitted pathogen sequences. Our framework facilitates the use of realistic likelihood functions and enables systematic and genuine joint inference of the epidemiological-evolutionary process from partially observed outbreaks. Using simulated data it is shown that this approach is able to infer accurately joint epidemiological-evolutionary dynamics, even when pathogen sequences and epidemiological data are incomplete, and when sequences are available for only a fraction of exposures. These results also characterise and quantify the value of incomplete and partial sequence data, which has important implications for sampling design, and demonstrate the abilities of the introduced method to identify multiple clusters within an outbreak. The framework is used to analyse an outbreak of foot-and-mouth disease in the UK, enhancing current understanding of its transmission dynamics and evolutionary process. PMID:26599399

  16. Robust Inference of Population Structure for Ancestry Prediction and Correction of Stratification in the Presence of Relatedness

    Science.gov (United States)

    Conomos, Matthew P.; Miller, Mike; Thornton, Timothy

    2016-01-01

    Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness. PMID:25810074
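
The two-step idea described in this abstract (run PCA on an unrelated, ancestry-representative subset, then place the remaining individuals on those axes) can be sketched as below. This is a simplified stand-in, not the PC-AiR implementation: it assumes the unrelated subset is already known, uses plain mean-centering, and projects relatives directly onto the SNP loadings instead of predicting them from genetic similarities. The function name and toy genotypes are hypothetical.

```python
import numpy as np


def project_onto_reference_pcs(genotypes, unrelated_idx, n_components=2):
    """PCA on an unrelated subset, then projection of every sample.

    genotypes: (n_samples, n_snps) array of allele counts (0, 1, 2)
    unrelated_idx: indices of samples treated as mutually unrelated
    """
    g = np.asarray(genotypes, dtype=float)
    ref = g[unrelated_idx]
    # Center on the reference subset so relatives do not shift the axes.
    mean = ref.mean(axis=0)
    # Right singular vectors of the centered reference matrix are SNP loadings.
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    loadings = vt[:n_components].T
    # Scores for all samples, related or not, on the reference-derived axes.
    return (g - mean) @ loadings


# Toy data: samples 2 and 4 are duplicates (extreme "relatives") of 0 and 1.
toy = np.array([[0, 0, 2, 2], [2, 2, 0, 0], [0, 0, 2, 2], [1, 1, 1, 1], [2, 2, 0, 0]])
scores = project_onto_reference_pcs(toy, unrelated_idx=[0, 1, 3], n_components=1)
print(scores.shape)
```

Fitting the axes on unrelated individuals only is what keeps family clusters from being mistaken for ancestry; the related samples then inherit coordinates without influencing the decomposition.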

  17. Problems in Psychiatric Genetic Research: A Reply to Faraone and Biederman.

    Science.gov (United States)

    Joseph, Jay

    2000-01-01

    Answers the most important criticisms by Faraone and Biederman in their critique of Joseph's analysis of evidence supporting a genetic basis of attention deficit hyperactivity disorder. Argues that possible genetic and environmental influences in ADHD twin studies are confounded, obscuring inferences about genetic factors. (JPB)

  18. Genetic interaction motif finding by expectation maximization – a novel statistical model for inferring gene modules from synthetic lethality

    Directory of Open Access Journals (Sweden)

    Ye Ping

    2005-12-01

    Full Text Available Background: Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example, greater probability of membership in a single protein complex) may be inferred between genes that share synthetic lethal interaction partners than genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets. Results: We have developed Genetic Interaction Motif Finding (GIMF), an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast). Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is efficient in computation, requires no training and automatically down-weights promiscuous genes with high degrees. Conclusion: GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is mostly suitable for building gene modules around seed genes. Optimal choice of one single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes ... The generic probabilistic framework of GIMF may be used to group other types of biological entities, such as proteins, based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic

  19. Meaningful mediation analysis : Plausible causal inference and informative communication

    NARCIS (Netherlands)

    Pieters, Rik

    2017-01-01

    Statistical mediation analysis has become the technique of choice in consumer research to make causal inferences about the influence of a treatment on an outcome via one or more mediators. This tutorial aims to strengthen two weak links that impede statistical mediation analysis from reaching its

  20. Killer Whale Genetic Data - Southern resident killer whale pedigree analysis

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — In this project, we are using genetic variation to infer mating patterns in the southern killer whale community. In Canada, this population was listed as threatened...

  1. Inference algorithms and learning theory for Bayesian sparse factor analysis

    International Nuclear Information System (INIS)

    Rattray, Magnus; Sharp, Kevin; Stegle, Oliver; Winn, John

    2009-01-01

    Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.

  2. Inference algorithms and learning theory for Bayesian sparse factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Rattray, Magnus; Sharp, Kevin [School of Computer Science, University of Manchester, Manchester M13 9PL (United Kingdom); Stegle, Oliver [Max-Planck-Institute for Biological Cybernetics, Tuebingen (Germany); Winn, John, E-mail: magnus.rattray@manchester.ac.u [Microsoft Research Cambridge, Roger Needham Building, Cambridge, CB3 0FB (United Kingdom)

    2009-12-01

    Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.

  3. New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data.

    Science.gov (United States)

    Gogoshin, Grigoriy; Boerwinkle, Eric; Rodin, Andrei S

    2017-04-01

    Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypotheses and testing and validating existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, software scalability and usability on less-than-exotic computer hardware are a priority, as is the applicability of the algorithm and software to heterogeneous datasets containing many data types: single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite

  4. Past and future range shifts and loss of diversity in dwarf willow (Salix herbacea L.) inferred from genetics, fossils and modelling

    DEFF Research Database (Denmark)

    Alsos, Inger Greve; Alm, Torbjørn; Normand, Signe

    2009-01-01

    . Macrofossil records were compiled to infer past distribution, and species distribution models were used to predict the Last Glacial Maximum (LGM) and future distribution of climatically suitable areas. Results  We found strong genetic differentiation between the populations from Europe/East Greenland...... during the last glaciation was inferred based on the fossil records and distribution modelling. A 46-57% reduction in suitable areas was predicted in 2080 compared to present. However, mainly southern alpine populations may go extinct, causing a loss of about 5% of the genetic diversity in the species....... Main conclusions  From a continuous range in Central Europe during the last glaciation, northward colonization probably occurred as a broad front maintaining diversity as the climate warmed. This explains why potential extinction of southern populations by 2080 will cause a comparatively low loss...

  5. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    Science.gov (United States)

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic tree based on FISH platform. The topology analysis of tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. And the classification experiment shows that tree-based features are better than data-based features in distinguishing tumor. The constructed phylogenetic trees have great performance in characterizing tumor development process, which outperforms other similar algorithms.

  6. A Neuro-Fuzzy Inference System Combining Wavelet Denoising, Principal Component Analysis, and Sequential Probability Ratio Test for Sensor Monitoring

    International Nuclear Information System (INIS)

    Na, Man Gyun; Oh, Seungrohk

    2002-01-01

    A neuro-fuzzy inference system combined with the wavelet denoising, principal component analysis (PCA), and sequential probability ratio test (SPRT) methods has been developed to monitor the relevant sensor using the information of other sensors. The parameters of the neuro-fuzzy inference system that estimates the relevant sensor signal are optimized by a genetic algorithm and a least-squares algorithm. The wavelet denoising technique was applied to remove noise components in the input signals to the neuro-fuzzy system. By reducing the dimension of the input space of the neuro-fuzzy system without losing a significant amount of information, the PCA was used to reduce the time necessary to train the neuro-fuzzy system, simplify the structure of the neuro-fuzzy inference system, and ease the selection of the input signals to the neuro-fuzzy system. By using the residual signals between the estimated signals and the measured signals, the SPRT is applied to detect whether the sensors are degraded or not. The proposed sensor-monitoring algorithm was verified through applications to the pressurizer water level, the pressurizer pressure, and the hot-leg temperature sensors in pressurized water reactors.
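
    The residual-based SPRT step in the abstract above can be illustrated with a minimal sketch. It assumes Gaussian residuals with a known standard deviation and tests a zero mean (healthy sensor) against a fixed mean shift (degraded sensor). The function name `sprt_degradation` and the parameters `sigma`, `shift`, `alpha`, and `beta` are illustrative assumptions and are not taken from the paper.

```python
import math


def sprt_degradation(residuals, sigma=1.0, shift=1.0, alpha=0.01, beta=0.01):
    """Wald's SPRT on residuals: H0 mean = 0 versus H1 mean = shift.

    alpha is the tolerated false-alarm probability and beta the tolerated
    missed-alarm probability (both are assumed values for illustration).
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1.0 - beta) / alpha)  # decide "degraded" at or above this
    lower = math.log(beta / (1.0 - alpha))  # decide "normal" at or below this
    llr = 0.0  # cumulative log-likelihood ratio
    for count, r in enumerate(residuals, start=1):
        # Log-likelihood ratio increment for N(shift, sigma^2) vs N(0, sigma^2).
        llr += (shift / sigma ** 2) * (r - shift / 2.0)
        if llr >= upper:
            return "degraded", count
        if llr <= lower:
            return "normal", count
    return "undecided", len(residuals)
```

    With the default settings each residual of exactly zero lowers the ratio by 0.5, so the test needs ten such samples before it declares the sensor normal; the thresholds trade decision speed against the two error probabilities.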

  7. Genetic structure and gene flow among Komodo dragon populations inferred by microsatellite loci analysis.

    Science.gov (United States)

    Ciofi, C; Bruford, M W

    1999-12-01

    A general concern for the conservation of endangered species is the maintenance of genetic variation within populations, particularly when they become isolated and reduced in size. Estimates of gene flow and effective population size are therefore important for any conservation initiative directed to the long-term persistence of a species in its natural habitat. In the present study, 10 microsatellite loci were used to assess the level of genetic variability among populations of the Komodo dragon Varanus komodoensis. Effective population size was calculated and gene flow estimates were compared with palaeogeographic data in order to assess the degree of vulnerability of four island populations. Rinca and Flores, currently separated by an isthmus of about 200 m, retained a high level of genetic diversity and showed a high degree of genetic similarity, with gene flow values close to one migrant per generation. The island of Komodo showed by far the highest levels of genetic divergence, and its allelic distinctiveness was considered of great importance in the maintenance of genetic variability within the species. A lack of distinct alleles and low levels of gene flow and genetic variability were found for the small population of Gili Motang island, which was identified as vulnerable to stochastic threats. Our results are potentially important for both the short- and long-term management of the Komodo dragon, and are critical in view of future re-introduction or augmentation in areas where the species is now extinct or depleted.
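
    The abstract above reports gene flow in migrants per generation but does not state which estimator was used. As a hedged illustration only, the classic island-model approximation F_ST ≈ 1 / (4Nm + 1) links a fixation index to the number of migrants per generation Nm; the function name `migrants_per_generation` is hypothetical, and this is not presented as the authors' method.

```python
def migrants_per_generation(fst):
    """Estimate Nm from F_ST under Wright's island model (an assumed model).

    Rearranges F_ST ~= 1 / (4*Nm + 1) to Nm ~= (1/F_ST - 1) / 4.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("fst must lie in the interval (0, 1]")
    return (1.0 / fst - 1.0) / 4.0
```

    Under this approximation an F_ST of 0.2 corresponds to about one migrant per generation, and larger F_ST values correspond to less gene flow.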

  8. Use of molecular genetics and historical records to reconstruct the ...

    African Journals Online (AJOL)

    Recent advances in molecular genetics made the inference of past demographic events through the analysis of gene pools from modern populations possible. The technology uses genetic markers to provide previously unavailable resolution into questions of human evolution, migration and the historical relationship of ...

  9. MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

    Science.gov (United States)

    Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

    2011-01-01

    Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353

  10. Significant population genetic structure detected in the rock bream Oplegnathus fasciatus (Temminck & Schlegel, 1844) inferred from fluorescent-AFLP analysis

    Science.gov (United States)

    Xiao, Yongshuang; Ma, Daoyuan; Xu, Shihong; Liu, Qinghua; Wang, Yanfeng; Xiao, Zhizhong; Li, Jun

    2016-05-01

    Oplegnathus fasciatus (rock bream) is a commercial rocky reef fish species in East Asia that has been considered for aquaculture. We estimated the population genetic diversity and population structure of the species along the coastal waters of China using fluorescent amplified fragment length polymorphism (AFLP) technology. Using 53 individuals from three populations and four pairs of selective primers, we amplified 1,264 bands, 98.73% of which were polymorphic. The Zhoushan population showed the highest Nei's genetic diversity and Shannon genetic diversity. The results of analysis of molecular variance (AMOVA) showed that 59.55% of genetic variation existed among populations and 40.45% occurred within populations, which indicated that a significant population genetic structure existed in the species. The pairwise fixation index F_ST ranged from 0.20 to 0.63, and all values were significant after sequential Bonferroni correction. The topology of an unweighted pair group method with arithmetic mean tree showed two significant genealogical branches corresponding to the sampling locations of North and South China. The AMOVA and STRUCTURE analyses suggested that the O. fasciatus populations examined should comprise two stocks.

  11. Integrated genetic analysis microsystems

    International Nuclear Information System (INIS)

    Lagally, Eric T; Mathies, Richard A

    2004-01-01

    With the completion of the Human Genome Project and the ongoing DNA sequencing of the genomes of other animals, bacteria, plants and others, a wealth of new information about the genetic composition of organisms has become available. However, as the demand for sequence information grows, so does the workload required both to generate this sequence and to use it for targeted genetic analysis. Microfabricated genetic analysis systems are well poised to assist in the collection and use of these data through increased analysis speed, lower analysis cost and higher parallelism leading to increased assay throughput. In addition, such integrated microsystems may point the way to targeted genetic experiments on single cells and in other areas that are otherwise very difficult. Concomitant with these advantages, such systems, when fully integrated, should be capable of forming portable systems for high-speed in situ analyses, enabling a new standard in disciplines such as clinical chemistry, forensics, biowarfare detection and epidemiology. This review will discuss the various technologies available for genetic analysis on the microscale, and efforts to integrate them to form fully functional robust analysis devices. (topical review)

  12. Intercoalescence time distribution of incomplete gene genealogies in temporally varying populations, and applications in population genetic inference.

    Science.gov (United States)

    Chen, Hua

    2013-03-01

    Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached their common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in an exact form in this paper for populations of deterministically time-varying sizes, specifically, for populations growing exponentially. The derived intercoalescence time distribution can be integrated to the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data, without relying on computationally intensive approaches, such as importance sampling and Markov Chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm validity of the derivation and statistical efficiency of the methods using the derived intercoalescence time distribution. Two examples of real data are given to show the inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations. © 2013 Blackwell Publishing Ltd/University College London.

  13. Genetic diversity and relationship of Indian cattle inferred from microsatellite and mitochondrial DNA markers.

    Science.gov (United States)

    Sharma, Rekha; Kishore, Amit; Mukesh, Manishi; Ahlawat, Sonika; Maitra, Avishek; Pandey, Ashwni Kumar; Tantia, Madhu Sudan

    2015-06-30

    Indian agriculture is an economic symbiosis of crop and livestock production with cattle as the foundation. Sadly, the population of indigenous cattle (Bos indicus) is declining (8.94% in last decade) and needs immediate scientific management. Genetic characterization is the first step in the development of proper management strategies for preserving genetic diversity and preventing undesirable loss of alleles. Thus, in this study we investigated genetic diversity and relationship among eleven Indian cattle breeds using 21 microsatellite markers and mitochondrial D loop sequence. The analysis of autosomal DNA was performed on 508 cattle which exhibited sufficient genetic diversity across all the breeds. Estimates of mean allele number and observed heterozygosity across all loci and population were 8.784 ± 0.25 and 0.653 ± 0.014, respectively. Differences among breeds accounted for 13.3% of total genetic variability. Despite high genetic diversity, significant inbreeding was also observed within eight populations. Genetic distances and cluster analysis showed a close relationship between breeds according to proximity in geographic distribution. The genetic distance, STRUCTURE and Principal Coordinate Analysis concluded that the Southern Indian Ongole cattle are the most distinct among the investigated cattle populations. Sequencing of hypervariable mitochondrial DNA region on a subset of 170 cattle revealed sixty haplotypes with haplotypic diversity of 0.90240, nucleotide diversity of 0.02688 and average number of nucleotide differences as 6.07407. Two major star clusters for haplotypes indicated population expansion for Indian cattle. Nuclear and mitochondrial genomes show a similar pattern of genetic variability and genetic differentiation. Various analyses concluded that the Southern breed 'Ongole' was distinct from breeds of Northern/ Central India. Overall these results provide basic information about genetic diversity and structure of Indian cattle which

  14. Inference of Well-Typings for Logic Programs with Application to Termination Analysis

    DEFF Research Database (Denmark)

    Bruynooghe, M.; Gallagher, John Patrick; Humbeeck, W. Van

    2005-01-01

    A method is developed to infer a polymorphic well-typing for a logic program. Our motivation is to improve the automation of termination analysis by deriving types from which norms can automatically be constructed. Previous work on type-based termination analysis used either types declared...... by the user, or automatically generated monomorphic types describing the success set of predicates. The latter types are less precise and result in weaker termination conditions than those obtained from declared types. Our type inference procedure involves solving set constraints generated from the program...... and derives a well-typing in contrast to a success-set approximation. Experiments so far show that our automatically inferred well-typings are close to the declared types and result in termination conditions that are as strong as those obtained with declared types. We describe the method, its implementation...

  15. Historical explanation of genetic variation in the Mediterranean horseshoe bat Rhinolophus euryale (Chiroptera: Rhinolophidae) inferred from mitochondrial cytochrome-b and D-loop genes in Iran.

    Science.gov (United States)

    Najafi, Nargess; Akmali, Vahid; Sharifi, Mozafar

    2018-04-26

    Molecular phylogeography and species distribution modelling (SDM) suggest that late Quaternary glacial cycles have played a significant role in shaping current population genetic structure and diversity. Based on phylogenetic relationships using Bayesian inference and maximum likelihood of 535 bp mtDNA (D-loop) and 745 bp mtDNA (Cytb) in 62 individuals of the Mediterranean Horseshoe Bat, Rhinolophus euryale, from 13 different localities in Iran, we identified two subspecific populations with differing population genetic structure distributed in the southern Zagros Mts. and northern Elburz Mts. Analysis of molecular variance (AMOVA) obtained from D-loop sequences indicates that 21.18% of sequence variation is distributed among populations and 10.84% within them. Moreover, a degree of genetic subdivision, mainly attributable to the existence of significant variance between the two regions, is shown (θCT = 0.68, p = .005). A positive and significant correlation between geographic and genetic distances (R² = 0.28, r = 0.529, p = .000) is obtained after controlling for environmental distance. The spatial distribution of haplotypes indicates that marginal populations in the southern part of the species' range occupied this area as a glacial refugium. However, this genetic variation, in conjunction with the results of the SDM, shows a massive postglacial range expansion for R. euryale towards higher latitudes in Iran.

  16. Native South American genetic structure and prehistory inferred from hierarchical modeling of mtDNA.

    Science.gov (United States)

    Lewis, Cecil M; Long, Jeffrey C

    2008-03-01

    Genetic diversity in Native South Americans forms a complex pattern at both the continental and local levels. In comparing the West to the East, there is more variation within groups and smaller genetic distances between groups. From this pattern, researchers have proposed that there is more variation in the West and that a larger, more genetically diverse, founding population entered the West than the East. Here, we question this characterization of South American genetic variation and its interpretation. Our concern arises because others have inferred regional variation from the mean variation within local populations without taking into account the variation among local populations within the same region. This failure produces a biased view of the actual variation in the East. In this study, we analyze the mitochondrial DNA sequence between positions 16040 and 16322 of the Cambridge reference sequence. Our sample represents a total of 886 people from 27 indigenous populations from South (22), Central (3), and North America (2). The basic unit of our analyses is nucleotide identity by descent, which is easily modeled and proportional to nucleotide diversity. We use a forward modeling strategy to fit a series of nested models to identity by descent within and between all pairs of local populations. This method provides estimates of identity by descent at different levels of population hierarchy without assuming homogeneity within populations, regions, or continents. Our main discovery is that Eastern South America harbors more genetic variation than has been recognized. We find no evidence that there is increased identity by descent in the East relative to the total for South America. By contrast, we discovered that populations in the Western region, as a group, harbor more identity by descent than has been previously recognized, despite the fact that average identity by descent within groups is lower. 
In this light, there is no need to postulate separate founding

  17. Effects analysis fuzzy inference system in nuclear problems using approximate reasoning

    International Nuclear Information System (INIS)

    Guimaraes, Antonio C.F.; Franklin Lapa, Celso Marcelo

    2004-01-01

    In this paper, a fuzzy inference system modeling technique applied to failure mode and effects analysis (FMEA) is introduced for nuclear reactor problems. This method uses the concept of a pure fuzzy logic system to treat the traditional FMEA parameters: probabilities of occurrence, severity and detection. The auxiliary feed-water system of a typical two-loop pressurized water reactor (PWR) was used as a practical example in this analysis. The central result is the conceptual comparison between the traditional risk priority number (RPN) and the fuzzy risk priority number (FRPN) obtained from expert opinion. The results demonstrate the great potential of the inference system and the advantage of the gray approach in this class of problems.
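
    For reference, the traditional RPN mentioned in the abstract above is conventionally the product of the occurrence, severity and detection ratings. The sketch below shows only that crisp baseline, assuming the common 1 to 10 rating scale; the function names and the failure-mode labels in the usage note are hypothetical, and the fuzzy rule base that yields the FRPN is not reproduced here.

```python
def risk_priority_number(occurrence, severity, detection):
    """Traditional FMEA risk priority number: O x S x D (1-10 scale assumed)."""
    ratings = {
        "occurrence": occurrence,
        "severity": severity,
        "detection": detection,
    }
    for name, score in ratings.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10")
    return occurrence * severity * detection


def rank_failure_modes(modes):
    """Return failure-mode names ordered from highest to lowest RPN.

    modes maps a name to an (occurrence, severity, detection) tuple.
    """
    return sorted(
        modes,
        key=lambda name: risk_priority_number(*modes[name]),
        reverse=True,
    )
```

    For example, `rank_failure_modes({"valve sticks": (2, 5, 3), "pump trips": (4, 8, 2)})` places "pump trips" first, since its product of 64 exceeds 30. A known weakness of the crisp product is that very different rating combinations can share the same RPN, which is one motivation for fuzzy alternatives.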

  18. Forecasting building energy consumption with hybrid genetic algorithm-hierarchical adaptive network-based fuzzy inference system

    Energy Technology Data Exchange (ETDEWEB)

    Li, Kangji [Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027 (China); School of Electricity Information Engineering, Jiangsu University, Zhenjiang 212013 (China); Su, Hongye [Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027 (China)

    2010-11-15

    There are several ways to forecast building energy consumption, varying from simple regression to models based on physical principles. In this paper, a new method, namely, the hybrid genetic algorithm-hierarchical adaptive network-based fuzzy inference system (GA-HANFIS) model is developed. In this model, the hierarchical structure decreases the rule base dimension. Both clustering and rule base parameters are optimized by GAs and neural networks (NNs). The model is applied to predict a hotel's daily air conditioning consumption over a period of more than 3 months. The results obtained by the proposed model are presented and compared with those of a regular NN method, indicating that the GA-HANFIS model outperforms NNs in forecasting accuracy. (author)

  19. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.

    Science.gov (United States)

    Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
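
    One simple way to picture a consensus over many learned networks is to keep only the edges that recur often enough. The sketch below assumes that each network is given as a collection of directed (parent, child) pairs and that a frequency threshold plays a role loosely similar to the "network resolution" mentioned above; that mapping is an assumption made for illustration, and `consensus_edges` is a hypothetical name rather than the authors' implementation.

```python
from collections import Counter


def consensus_edges(networks, threshold=0.5):
    """Keep edges that occur in at least `threshold` of the input networks.

    networks: a list of iterables of (parent, child) tuples.
    threshold: required fraction of networks, between 0 and 1.
    """
    if not networks:
        return set()
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Count each edge at most once per network.
    counts = Counter(edge for net in networks for edge in set(net))
    total = len(networks)
    return {edge for edge, seen in counts.items() if seen / total >= threshold}
```

    Raising the threshold yields a sparser, higher-confidence network, while lowering it admits edges supported by only a few of the individual networks.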

  20. A neuro-fuzzy inference system for sensor monitoring

    International Nuclear Information System (INIS)

    Na, Man Gyun

    2001-01-01

    A neuro-fuzzy inference system combined with the wavelet denoising, PCA (principal component analysis) and SPRT (sequential probability ratio test) methods has been developed to monitor the relevant sensor using the information of other sensors. The parameters of the neuro-fuzzy inference system which estimates the relevant sensor signal are optimized by a genetic algorithm and a least-squares algorithm. The wavelet denoising technique was applied to remove noise components in the input signals to the neuro-fuzzy system. By reducing the dimension of the input space of the neuro-fuzzy system without losing a significant amount of information, the PCA was used to reduce the time necessary to train the neuro-fuzzy system, simplify the structure of the neuro-fuzzy inference system, and ease the selection of the input signals to the neuro-fuzzy system. By using the residual signals between the estimated signals and the measured signals, the SPRT is applied to detect whether the sensors are degraded or not. The proposed sensor-monitoring algorithm was verified through applications to the pressurizer water level, the pressurizer pressure, and the hot-leg temperature sensors in pressurized water reactors.

  1. Attitudes towards genetic testing: analysis of contradictions

    DEFF Research Database (Denmark)

    Jallinoja, P; Hakonen, A; Aro, A R

    1998-01-01

    A survey study was conducted among 1169 people to evaluate attitudes towards genetic testing in Finland. Here we present an analysis of the contradictions detected in people's attitudes towards genetic testing. This analysis focuses on the approval of genetic testing as an individual choice and on the confidence in control of the process of genetic testing and its implications. Our analysis indicated that some of the respondents have contradictory attitudes towards genetic testing. It is proposed that contradictory attitudes towards genetic testing should be given greater significance both in scientific studies on attitudes towards genetic testing as well as in the health care context, e.g. in genetic counselling.

  2. Inference of population splits and mixtures from genome-wide allele frequency data.

    Directory of Open Access Journals (Sweden)

    Joseph K Pickrell

    Full Text Available Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.

  3. A Combined Methodology of Adaptive Neuro-Fuzzy Inference System and Genetic Algorithm for Short-term Energy Forecasting

    Directory of Open Access Journals (Sweden)

    KAMPOUROPOULOS, K.

    2014-02-01

    Full Text Available This document presents an energy forecast methodology using an Adaptive Neuro-Fuzzy Inference System (ANFIS) and Genetic Algorithms (GA). The GA has been used for the selection of the training inputs of the ANFIS in order to minimize the training result error. The presented algorithm has been installed and is operating in an automotive manufacturing plant. It periodically communicates with the plant to obtain new information and update the database in order to improve its training results. Finally, the results of the algorithm are used to provide a short-term load forecast for the different modeled consumption processes.

  4. Assessing population genetic structure via the maximisation of genetic distance

    Directory of Open Access Journals (Sweden)

    Toro Miguel A

    2009-11-01

    Full Text Available Abstract Background The inference of the hidden structure of a population is an essential issue in population genetics. Recently, several methods have been proposed to infer population structure in population genetics. Methods In this study, a new method to infer the number of clusters and to assign individuals to the inferred populations is proposed. This approach does not make any assumption on Hardy-Weinberg and linkage equilibrium. The implemented criterion is the maximisation (via a simulated annealing algorithm) of the averaged genetic distance between a predefined number of clusters. The performance of this method is compared with two Bayesian approaches: STRUCTURE and BAPS, using simulated data and also a real human data set. Results The simulations show that with a reduced number of markers, BAPS overestimates the number of clusters and presents a reduced proportion of correct groupings. The accuracy of the new method is approximately the same as for STRUCTURE. Also, in Hardy-Weinberg and linkage disequilibrium cases, BAPS performs incorrectly. In these situations, STRUCTURE and the new method show an equivalent behaviour with respect to the number of inferred clusters, although the proportion of correct groupings is slightly better with the new method. Re-establishing equilibrium with the randomisation procedures improves the precision of the Bayesian approaches. All methods have a good precision for FST ≥ 0.03, but only STRUCTURE estimates the correct number of clusters for FST as low as 0.01. In situations with a high number of clusters or a more complex population structure, MGD performs better than STRUCTURE and BAPS. The results for a human data set analysed with the new method are congruent with the geographical regions previously found. Conclusion This new method used to infer the hidden structure in a population, based on the maximisation of the genetic distance and not taking into consideration any assumption about Hardy

  5. Geographic population structure analysis of worldwide human populations infers their biogeographical origins

    Science.gov (United States)

    Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri; Piras, Ignazio S.; Maria Calò, Carla; De Montis, Antonella; Atzori, Manuela; Marini, Monica; Tofanelli, Sergio; Francalacci, Paolo; Pagani, Luca; Tyler-Smith, Chris; Xue, Yali; Cucca, Francesco; Schurr, Theodore G.; Gaieski, Jill B.; Melendez, Carlalynne; Vilar, Miguel G.; Owings, Amanda C.; Gómez, Rocío; Fujita, Ricardo; Santos, Fabrício R.; Comas, David; Balanovsky, Oleg; Balanovska, Elena; Zalloua, Pierre; Soodyall, Himla; Pitchappan, Ramasamy; GaneshPrasad, ArunKumar; Hammer, Michael; Matisoo-Smith, Lisa; Wells, R. Spencer; Acosta, Oscar; Adhikarla, Syama; Adler, Christina J.; Bertranpetit, Jaume; Clarke, Andrew C.; Cooper, Alan; Der Sarkissian, Clio S. I.; Haak, Wolfgang; Haber, Marc; Jin, Li; Kaplan, Matthew E.; Li, Hui; Li, Shilin; Martínez-Cruz, Begoña; Merchant, Nirav C.; Mitchell, John R.; Parida, Laxmi; Platt, Daniel E.; Quintana-Murci, Lluis; Renfrew, Colin; Lacerda, Daniela R.; Royyuru, Ajay K.; Sandoval, Jose Raul; Santhakumari, Arun Varatharajan; Soria Hernanz, David F.; Swamikrishnan, Pandikumar; Ziegle, Janet S.

    2014-01-01

    The search for a method that utilizes biological information to predict humans’ place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000–130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS’s accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village, of origin, underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing. PMID:24781250

  6. Technical Note: How to use Winbugs to infer animal models

    DEFF Research Database (Denmark)

    Damgaard, Lars Holm

    2007-01-01

    This paper deals with Bayesian inferences of animal models using Gibbs sampling. First, we suggest a general and efficient method for updating additive genetic effects, in which the computational cost is independent of the pedigree depth and increases linearly only with the size of the pedigree. Second, we show how this approach can be used to draw inferences from a wide range of animal models using the computer package Winbugs. Finally, we illustrate the approach in a simulation study, in which the data are generated and analyzed using Winbugs according to a linear model with i.i.d. errors having Student's t distributions. In conclusion, Winbugs can be used to make inferences in small-sized, quantitative, genetic data sets applying a wide range of animal models that are not yet standard in the animal breeding literature.

  7. A unified framework for haplotype inference in nuclear families.

    Science.gov (United States)

    Iliadis, Alexandros; Anastassiou, Dimitris; Wang, Xiaodong

    2012-07-01

    Many large genome-wide association studies include nuclear families with more than one child (trio families), allowing for analysis of differences between siblings (sib pair analysis). Statistical power can be increased when haplotypes are used instead of genotypes. Currently, haplotype inference in families with more than one child can be performed either using the familial information or statistical information derived from the population samples but not both. Building on our recently proposed tree-based deterministic framework (TDS) for trio families, we augment its applicability to general nuclear families. We impose a minimum recombinant approach locally and independently on each multiple children family, while resorting to the population-derived information to solve the remaining ambiguities. Thus our framework incorporates all available information (familial and population) in a given study. We demonstrate that using all the constraints in our approach we can have gains in the accuracy as opposed to breaking the multiple children families to separate trios and resorting to a trio inference algorithm or phasing each family in isolation. We believe that our proposed framework could be the method of choice for haplotype inference in studies that include nuclear families with multiple children. Our software (tds2.0) is downloadable from www.ee.columbia.edu/~anastas/tds. © 2012 The Authors Annals of Human Genetics © 2012 Blackwell Publishing Ltd/University College London.

  8. Inference of the Genetic Network Regulating Lateral Root Initiation in Arabidopsis thaliana

    KAUST Repository

    Muraro, D.

    2013-01-01

    Regulation of gene expression is crucial for organism growth, and it is one of the challenges in systems biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyze two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, and assess causality of their regulatory interactions by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale free, small world and hierarchical properties and the nodes with a high out-degree may warrant further investigation. © 2004-2012 IEEE.

  9. Cyclic Concatenated Genetic Encoder: A mathematical proposal for biological inferences.

    Science.gov (United States)

    Duarte-González, M E; Echeverri, O Y; Guevara, J M; Palazzo, R

    2018-01-01

    The organization of the genetic information and its ability to be conserved and translated to proteins with low error rates have been the subject of study by scientists from different disciplines. Recently, it has been proposed that living organisms display an intra-cellular transmission system of genetic information, similar to a model of digital communication system, in which there is the ability to detect and correct errors. In this work, the concept of Concatenated Genetic Encoder is introduced and applied to the analysis of protein sequences as a tool for exploring evolutionary relationships. For such purposes Error Correcting Codes (ECCs) are used to represent proteins. A methodology for representing or identifying proteins by use of BCH codes over ℤ₂₀ and F₄ × ℤ₅ is proposed, and cytochrome b6-f complex subunit 6-OS sequences, corresponding to different plant species, are analyzed according to the proposed methodology and results are contrasted to phylogenetic and taxonomic analyses. Through the analyses, it was observed that using BCH codes only some sequences are identified, all of which differ in one amino acid from the original sequence. In addition, mathematical relationships among identified sequences are established by considering minimal polynomials, where such sequences showed a close relationship as revealed in the phylogenetic reconstruction. The results shown here indicate that communication theory may provide biology with interesting and useful tools to identify biological relationships among proteins; however, the proposed methodology needs to be improved and rigorously tested in order to become an applicable tool for biological analysis. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Genetic population structure of the desert shrub species Lycium ruthenicum inferred from chloroplast DNA

    International Nuclear Information System (INIS)

    Chen, H.; Yonezawa, T.

    2014-01-01

    Lycium ruthenicum (Solanaceae), a spiny shrub mostly distributed in the desert regions of north and northwest China, has been shown to exhibit high tolerance to the extreme environment. In this study, the phylogeography and evolutionary history of L. ruthenicum were examined, on the basis of 80 individuals from eight populations. Using the sequence variations of two spacer regions of chloroplast DNA (trnH-psbA and rps16-trnK), the absence of a geographic component in the chloroplast DNA genetic structure was identified (GST = 0.351, NST = 0.304, NST < GST), which was consistent with the result of SAMOVA, suggesting weak phylogeographic structure of this species. Phylogenetic and network analyses showed that a total of 10 haplotypes identified in the present study clustered into two clades, in which clade I harbored the ancestral haplotypes that inferred two independent glacial refugia in the middle of Qaidam Basin and the western Inner Mongolia. The existence of regional evolutionary differences was supported by GENETREE, which revealed that one of the populations in Qaidam Basin and the two populations in Tarim Basin had experienced rapid expansion, and the other populations retained relatively stable population size during the Pleistocene. Given the results of long-term gene flow and pairwise differences, strong gene flow was insufficient to reduce the genetic differentiation among populations or within populations, probably due to the genetic composition containing a common haplotype and the high number of private haplotypes fixed for most of the population. The divergence times of different lineages were consistent with the rapid uplift phases of the Qinghai-Tibetan Plateau and the initiation and expansion of deserts in northern China, suggesting that the origin and evolution of L. ruthenicum were strongly influenced by Quaternary environment changes. (author)

  11. Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives.

    Science.gov (United States)

    Ramstetter, Monica D; Dyer, Thomas D; Lehman, Donna M; Curran, Joanne E; Duggirala, Ravindranath; Blangero, John; Mezey, Jason G; Williams, Amy L

    2017-09-01

    Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relationship degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance. Copyright © 2017 Ramstetter et al.

  12. Population biology of establishment in New Zealand hedgehogs inferred from genetic and historical data: conflict or compromise?

    Science.gov (United States)

    Bolfíková, Barbora; Konečný, Adam; Pfäffle, Miriam; Skuballa, Jasmin; Hulva, Pavel

    2013-07-01

    The crucial steps in biological invasions, related to the shaping of genetic architecture and the current evolution of adaptations to a novel environment, usually occur in small populations during the phases of introduction and establishment. However, these processes are difficult to track in nature due to invasion lag, large geographic and temporal scales compared with human observation capabilities, the frequent depletion of genetic variance, admixture and other phenomena. In this study, we compared genetic and historical evidence related to the invasion of the West European hedgehog to New Zealand to infer details about the introduction and establishment. Historical information indicates that the species was initially established on the South Island. A molecular assay of populations from Great Britain and New Zealand using mitochondrial sequences and nuclear microsatellite loci was performed based on a set of analyses including approximate Bayesian computation, a powerful approach for disentangling complex population demographies. According to these analyses, the population of the North Island was most similar to that of the native area and showed greatest reduction in genetic variation caused by founder demography and/or drift. This evidence indicated the location of the establishment phase. The hypothesis was corroborated by data on climate and urbanization. We discuss the contrasting results obtained by the molecular and historical approaches in the light of their different explanatory power and the possible biases influencing the description of particular aspects of invasions, and we advocate the integration of the two types of approaches in invasion biology. © 2013 John Wiley & Sons Ltd.

  13. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling.

    Directory of Open Access Journals (Sweden)

    Junha Shin

    Full Text Available Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co

  14. Genetic structure and inferences on potential source areas for Bactrocera dorsalis (Hendel) based on mitochondrial and microsatellite markers.

    Directory of Open Access Journals (Sweden)

    Wei Shi

    Full Text Available Bactrocera dorsalis (Diptera: Tephritidae) is mainly distributed in tropical and subtropical Asia and in the Pacific region. Despite its economic importance, very few studies have addressed the question of the wide genetic structure and potential source area of this species. This pilot study attempts to infer the native region of this pest and its colonization pathways in Asia. Combining mitochondrial and microsatellite markers, we evaluated the level of genetic diversity, genetic structure, and the gene flow among fly populations collected across Southeast Asia and China. A complex and significant genetic structure corresponding to the geographic pattern was found with both types of molecular markers. However, the genetic structure found was rather weak in both cases, and no pattern of isolation by distance was identified. Multiple long-distance dispersal events and miscellaneous host selection by this species may explain the results. These complex patterns may have been influenced by human-mediated transportation of the pest from one area to another and the complex topography of the study region. For both mitochondrial and microsatellite data, no signs of bottleneck or founder events could be identified. Nonetheless, maximal genetic diversity was observed in Myanmar, Vietnam and Guangdong (China) and asymmetric migration patterns were found. These results provide indirect evidence that the tropical regions of Southeast Asia and southern coast of China may be considered as the native range of the species and the population expansion is northward. Yunnan (China) is a contact zone that has been colonized from different sources. Regions along the southern coast of Vietnam and China probably served to colonize mainly the southern region of China. Southern coastal regions of China may also have colonized central parts of China and of central Yunnan.

  15. Genetic analysis

    NARCIS (Netherlands)

    Koornneef, M.; Alonso-Blanco, C.; Stam, P.

    2006-01-01

    The Mendelian analysis of genetic variation, available as induced mutants or as natural variation, requires a number of steps that are described in this chapter. These include the determination of the number of genes involved in the observed trait's variation, the determination of dominance

  16. Genetic analysis reveals efficient sexual spore dispersal at a fine spatial scale in Armillaria ostoyae, the causal agent of root-rot disease in conifers.

    Science.gov (United States)

    Dutech, Cyril; Labbé, Frédéric; Capdevielle, Xavier; Lung-Escarmant, Brigitte

    Armillaria ostoyae (sometimes named Armillaria solidipes) is a fungal species causing root diseases in numerous coniferous forests of the northern hemisphere. The importance of sexual spores for the establishment of new disease centres remains unclear, particularly in the large maritime pine plantations of southwestern France. An analysis of the genetic diversity of a local fungal population distributed over 500 ha in this French forest showed genetic recombination between genotypes to be frequent, consistent with regular sexual reproduction within the population. The estimated spatial genetic structure displayed a significant pattern of isolation by distance, consistent with the dispersal of sexual spores mostly at the spatial scale studied. Using these genetic data, we inferred an effective density of reproductive individuals of 0.1-0.3 individuals/ha, and a second moment of parent-progeny dispersal distance of 130-800 m, compatible with the main models of fungal spore dispersal. These results contrast with those obtained for studies of A. ostoyae over larger spatial scales, suggesting that inferences about mean spore dispersal may be best performed at fine spatial scales (i.e. a few kilometres) for most fungal species. Copyright © 2017 British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  17. Integrated analysis of genetic data with R

    Directory of Open Access Journals (Sweden)

    Zhao Jing

    2006-01-01

    Full Text Available Abstract Genetic data are now widely available. There is, however, an apparent lack of concerted effort to produce software systems for statistical analysis of genetic data compared with other fields of statistics. It is often a tremendous task for end-users to tailor them for particular data, especially when genetic data are analysed in conjunction with a large number of covariates. Here, R (http://www.r-project.org), a free, flexible and platform-independent environment for statistical modelling and graphics, is explored as an integrated system for genetic data analysis. An overview of some packages currently available for analysis of genetic data is given. This is followed by examples of package development and practical applications. With clear advantages in data management, graphics, statistical analysis, programming, internet capability and use of available codes, it is a feasible, although challenging, task to develop it into an integrated platform for genetic analysis; this will require the joint efforts of many researchers.

  18. Inferring genetic architecture of complex traits using Bayesian integrative analysis of genome and transcriptome data

    DEFF Research Database (Denmark)

    Ehsani, Alireza; Sørensen, Peter; Pomp, Daniel

    2012-01-01

    Background To understand the genetic architecture of complex traits and bridge the genotype-phenotype gap, it is useful to study intermediate -omics data, e.g. the transcriptome. The present study introduces a method for simultaneous quantification of the contributions from single nucleotide......-modal distribution of genomic values collapses, when gene expressions are added to the model Conclusions With increased availability of various -omics data, integrative approaches are promising tools for understanding the genetic architecture of complex traits. Partitioning of explained variances at the chromosome...

  19. Information-Theoretic Inference of Large Transcriptional Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Meyer Patrick

    2007-01-01

    Full Text Available The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
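
    The MRMR principle summarized in this abstract can be sketched as a greedy ranking. This is an illustrative sketch, not the MRNET software: it assumes a precomputed symmetric mutual information matrix `mi` (a list of lists), and the function name `mrmr_rank` and its parameters are hypothetical.

```python
def mrmr_rank(mi, target, n_select):
    """Greedy maximum relevance/minimum redundancy ranking (illustrative sketch).

    mi       -- symmetric matrix, mi[i][j] = mutual information of variables i and j
    target   -- index of the target variable
    n_select -- maximum number of variables to select
    Returns (selected indices in pick order, score of each variable when picked).
    """
    candidates = [j for j in range(len(mi)) if j != target]
    selected = []
    scores = {}

    def score(j):
        # Relevance to the target minus mean redundancy with variables already picked.
        relevance = mi[j][target]
        if not selected:
            return relevance
        redundancy = sum(mi[j][k] for k in selected) / len(selected)
        return relevance - redundancy

    while candidates and len(selected) < n_select:
        best = max(candidates, key=score)
        scores[best] = score(best)
        selected.append(best)
        candidates.remove(best)
    return selected, scores
```

    For example, with variable 0 as the target, a variable that is highly informative but largely redundant with an earlier pick is ranked after a less informative, non-redundant one. Running such a ranking once per gene, with each gene in turn as the target, would be one way to obtain candidate edge scores for a network.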

  20. Information-Theoretic Inference of Large Transcriptional Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Patrick E. Meyer

    2007-06-01

    Full Text Available The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.

  1. Arthritis Genetics Analysis Aids Drug Discovery

    Science.gov (United States)

    NIH Research Matters, January 13, 2014. An international research team identified 42 new ...

  2. Inference of biogeographical ancestry across central regions of Eurasia.

    Science.gov (United States)

    Bulbul, O; Filoglu, G; Zorlu, T; Altuncul, H; Freire-Aradas, A; Söchtig, J; Ruiz, Y; Klintschar, M; Triki-Fendri, S; Rebai, A; Phillips, C; Lareu, M V; Carracedo, Á; Schneider, P M

    2016-01-01

    The inference of biogeographical ancestry (BGA) can provide useful information for forensic investigators when there are no suspects to be compared with DNA collected at the crime scene or when no DNA database matches exist. Although public databases are increasing in size and population scope, there is a lack of information regarding genetic variation in Eurasian populations, especially in central regions such as the Middle East. Inhabitants of these regions show a high degree of genetic admixture, characterized by an allele frequency cline running from NW Europe to East Asia. Although a proper differentiation has been established between the cline extremes of western Europe and South Asia, populations geographically located in between, i.e., Middle East and Mediterranean populations, require more detailed study in order to characterize their genetic background as well as to further understand their demographic histories. To initiate these studies, three ancestry informative SNP (AI-SNP) multiplex panels (the SNPforID 34-plex, Eurasiaplex and a novel 33-plex assay) were used to describe the ancestry patterns of a total of 24 populations ranging across the longitudinal axis from NW Europe to East Asia. Different ancestry inference approaches, including STRUCTURE, PCA, DAPC and Snipper Bayes analysis, were applied to determine relationships among populations. The structure results show differentiation between continental groups and a NW to SE allele frequency cline running across Eurasian populations. This study adds useful population data that could be used as reference genotypes for future ancestry investigations in forensic cases. The 33-plex assay also includes pigmentation predictive SNPs, but this study primarily focused on Eurasian population differentiation using 33-plex and its combination with the other two AI-SNP sets.

  3. Genetic Structure and Relationship Analysis of an Association Population in Jute (Corchorus spp.) Evaluated by SSR Markers.

    Directory of Open Access Journals (Sweden)

    Liwu Zhang

    Full Text Available Population structure and relationship analysis is of great importance in germplasm utilization and association mapping. Jute, comprised of white jute (C. capsularis L.) and dark jute (C. olitorius L.), is second to cotton in its commercial significance in the world. Here, we assessed the genetic structure and relationship in a panel of 159 jute accessions from 11 countries and regions using 63 SSRs. The structure analysis divided the 159 jute accessions from white and dark jute into Co and Cc groups, and further into Co1, Co2, Cc1 and Cc2 subgroups. Out of the Cc1 subgroup, 81 accessions were from China and the remaining 10 accessions were from India (2), Japan (5), Thailand, Vietnam (2) and Pakistan (1). Out of the Cc2 subgroup, 35 accessions were from China, and the remaining 3 accessions were from India, Pakistan and Thailand respectively. It can be inferred that the genetic background of these jute accessions was not always correlated with their geographical regions. Similar results were found in the Co1 and Co2 subgroups. Analysis of molecular variance revealed 81% molecular variation between groups but it was low (19%) within subgroups, which further confirmed the genetic differentiation between the two groups. The genetic relationship analysis showed that the most diverse genotypes were Maliyeshengchangguo and Changguozhongyueyin in dark jute, and BZ-2-2, Aidianyehuangma, Yangjuchiyuanguo, Zijinhuangma and Jute 179 in white jute, which could be used as the potential parents in breeding programs for jute improvement. These results would be very useful for association studies and breeding in jute.

  4. Horse domestication and conservation genetics of Przewalski's horse inferred from sex chromosomal and autosomal sequences.

    Science.gov (United States)

    Lau, Allison N; Peng, Lei; Goto, Hiroki; Chemnick, Leona; Ryder, Oliver A; Makova, Kateryna D

    2009-01-01

    Despite their ability to interbreed and produce fertile offspring, there is continued disagreement about the genetic relationship of the domestic horse (Equus caballus) to its endangered wild relative, Przewalski's horse (Equus przewalskii). Analyses have differed as to whether or not Przewalski's horse is placed phylogenetically as a separate sister group to domestic horses. Because Przewalski's horse and domestic horse are so closely related, genetic data can also be used to infer domestication-specific differences between the two. To investigate the genetic relationship of Przewalski's horse to the domestic horse and to address whether evolution of the domestic horse is driven by males or females, five homologous introns (a total of approximately 3 kb) were sequenced on the X and Y chromosomes in two Przewalski's horses and three breeds of domestic horses: Arabian horse, Mongolian domestic horse, and Dartmoor pony. Five autosomal introns (a total of approximately 6 kb) were sequenced for these horses as well. The sequences of sex chromosomal and autosomal introns were used to determine nucleotide diversity and the forces driving evolution in these species. As a result, X chromosomal and autosomal data do not place Przewalski's horses in a separate clade within phylogenetic trees for horses, suggesting a close relationship between domestic and Przewalski's horses. It was also found that there was a lack of nucleotide diversity on the Y chromosome and higher nucleotide diversity than expected on the X chromosome in domestic horses as compared with the Y chromosome and autosomes. This supports the hypothesis that very few male horses along with numerous female horses founded the various domestic horse breeds. Patterns of nucleotide diversity among different types of chromosomes were distinct for Przewalski's in contrast to domestic horses, supporting unique evolutionary histories of the two species.
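    The nucleotide diversity comparisons described above rest on a simple statistic: the average number of pairwise differences per site among aligned sequences. The sketch below is a minimal illustration of that statistic only; the function name and the toy alignment are hypothetical and are not taken from the study, and it ignores gaps, missing data and any correction the authors may have applied.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data, for illustration only).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(nucleotide_diversity(toy_alignment))
```

    Comparing this value across X-linked, Y-linked and autosomal alignments is the kind of contrast the abstract describes, although the published analysis would also need to account for the different effective population sizes of each chromosome type.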

  5. Demographic inferences from large-scale NGS data

    DEFF Research Database (Denmark)

    Pedersen, Casper-Emil Tingskov

    .g. human genetics. In this thesis, the three papers presented demonstrate the advantages of NGS data in the framework of population genetics for elucidating demographic inferences, important for understanding conservation efforts, selection and mutational burdens. In the first whole-genome study...... that the demographic history of the Inuit is the most extreme in terms of population size, of any human population. We identify a slight increase in the number of deleterious alleles because of this demographic history and support our results using simulations. We use this to show that the reduction in population size...

  6. Inferring transcriptional compensation interactions in yeast via stepwise structure equation modeling

    Directory of Open Access Journals (Sweden)

    Wang Woei-Fuh

    2008-03-01

    Full Text Available Abstract. Background: With the abundant information produced by microarray technology, various approaches have been proposed to infer transcriptional regulatory networks. However, few approaches have studied subtle and indirect interaction such as genetic compensation, the existence of which is widely recognized although its mechanism has yet to be clarified. Furthermore, when inferring gene networks most models include only observed variables whereas latent factors, such as proteins and mRNA degradation that are not measured by microarrays, do participate in networks in reality. Results: Motivated by inferring transcriptional compensation (TC) interactions in yeast, a stepwise structural equation modeling algorithm (SSEM) is developed. In addition to observed variables, SSEM also incorporates hidden variables to capture interactions (or regulations) from latent factors. Simulated gene networks are used to determine with which of six possible model selection criteria (MSC) SSEM works best. SSEM with Bayesian information criterion (BIC) results in the highest true positive rates, the largest percentage of correctly predicted interactions from all existing interactions, and the highest true negative (non-existing interactions) rates. Next, we apply SSEM using real microarray data to infer TC interactions among (1) small groups of genes that are synthetic sick or lethal (SSL) to SGS1, and (2) a group of SSL pairs of 51 yeast genes involved in DNA synthesis and repair that are of interest. For (1), SSEM with BIC is shown to outperform three Bayesian network algorithms and a multivariate autoregressive model, checked against the results of qRT-PCR experiments. The predictions for (2) are shown to coincide with several known pathways of Sgs1 and its partners that are involved in DNA replication, recombination and repair. In addition, experimentally testable interactions of Rad27 are predicted. Conclusion: SSEM is a useful tool for inferring genetic networks, and the

  7. Inferring Group Processes from Computer-Mediated Affective Text Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Schryver, Jack C [ORNL; Begoli, Edmon [ORNL; Jose, Ajith [Missouri University of Science and Technology; Griffin, Christopher [Pennsylvania State University

    2011-02-01

    Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Instead of pure sentiment analysis, which is just positive or negative, we explore nuances of affective meaning in 22 affect categories. Our affect propagation algorithm automatically calculates and displays extracted affective relationships among entities in graphical form in our prototype (TEAMSTER), starting with seed lists of affect terms. Several useful metrics are defined to infer underlying group processes by aggregating affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.

  8. Significant genetic differentiation between native and introduced silver carp (Hypophthalmichthys molitrix) inferred from mtDNA analysis

    Science.gov (United States)

    Li, S.-F.; Xu, J.-W.; Yang, Q.-L.; Wang, C.H.; Chapman, D.C.; Lu, G.

    2011-01-01

    Silver carp Hypophthalmichthys molitrix (Cyprinidae) is native to China and has been introduced to over 80 countries. The extent of genetic diversity in introduced silver carp and the genetic divergence between introduced and native populations remain largely unknown. In this study, 241 silver carp sampled from three major native rivers and two non-native rivers (Mississippi River and Danube River) were analyzed using nucleotide sequences of mitochondrial COI gene and D-loop region. A total of 73 haplotypes were observed, with no haplotype found common to all the five populations and eight haplotypes shared by two to four populations. As compared with introduced populations, all native populations possess both higher haplotype diversity and higher nucleotide diversity, presumably a result of the founder effect. Significant genetic differentiation was revealed between native and introduced populations as well as among five sampled populations, suggesting strong selection pressures might have occurred in introduced populations. Collectively, this study not only provides baseline information for sustainable use of silver carp in their native country (i.e., China), but also offers first-hand genetic data for the control of silver carp in countries (e.g., the United States) where they are considered invasive.
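    Haplotype diversity, one of the two diversity measures compared above, is commonly computed with Nei's unbiased estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies). The sketch below assumes that estimator; the abstract does not state which formula or software the authors used, and the haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    # Probability that two haplotypes drawn without replacement differ.
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


# Hypothetical sample: four fish carrying three distinct haplotypes.
sample = ["H1", "H1", "H2", "H3"]
print(haplotype_diversity(sample))
```

    A founder effect of the kind the abstract proposes would show up as fewer distinct labels, and therefore a lower value, in the introduced populations.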

  9. Parametric inference for biological sequence analysis.

    Science.gov (United States)

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
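    For a hidden Markov model, the sum-product algorithm mentioned above reduces to the familiar forward recursion. The sketch below shows only that ordinary recursion on a two-state chain; it is not the polytope propagation algorithm the article introduces, and the parameter values are made up for illustration.

```python
def forward_likelihood(obs, init, trans, emit):
    """P(obs) under an HMM via the forward recursion (sum-product on a chain)."""
    n_states = len(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[t][symbol] * sum(alpha[s] * trans[s][t] for s in range(n_states))
            for t in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model over a binary alphabet {0, 1}.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.8, 0.2], [0.3, 0.7]]
print(forward_likelihood([0, 1, 1], INIT, TRANS, EMIT))
```

    Replacing each sum with a max turns the same recursion into a Viterbi-style maximum a posteriori computation, which is the calculation whose dependence on the parameters the article sets out to analyze.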

  10. Bottlenecks and Hubs in Inferred Networks Are Important for Virulence in Salmonella typhimurium

    Energy Technology Data Exchange (ETDEWEB)

    McDermott, Jason E.; Taylor, Ronald C.; Yoon, Hyunjin; Heffron, Fred

    2009-02-01

    Recent advances in experimental methods have provided sufficient data to consider systems as large networks of interconnected components. High-throughput determination of protein-protein interaction networks has led to the observation that topological bottlenecks, that is proteins defined by high centrality in the network, are enriched in proteins with systems-level phenotypes such as essentiality. Global transcriptional profiling by microarray analysis has been used extensively to characterize systems, for example, cellular response to environmental conditions and genetic mutations. These transcriptomic datasets have been used to infer regulatory and functional relationship networks based on co-regulation. We use the context likelihood of relatedness (CLR) method to infer networks from two datasets gathered from the pathogen Salmonella typhimurium; one under a range of environmental culture conditions and the other from deletions of 15 regulators found to be essential in virulence. Bottleneck nodes were identified from these inferred networks and we show that these nodes are significantly more likely to be essential for virulence than their non-bottleneck counterparts. A network generated using Pearson correlation did not display this behavior. Overall this study demonstrates that topology of networks inferred from global transcriptional profiles provides information about the systems-level roles of bottleneck genes. Analysis of the differences between the two CLR-derived networks suggests that the bottleneck nodes are either mediators of transitions between system states or sentinels that reflect the dynamics of these transitions.
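    The context likelihood of relatedness step can be sketched as follows, assuming the commonly described formulation in which each pairwise similarity is converted to a z-score against the background of each gene's other similarities, negative z-scores are clipped to zero, and the two z-scores are combined as a root sum of squares. The similarity matrix below is hypothetical, and the abstract does not give the exact variant or similarity measure used in the study.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_scores(sim):
    """CLR-style scores from a symmetric gene-by-gene similarity matrix."""
    n = len(sim)
    # Background for each gene: its similarities to all other genes.
    background = []
    for i in range(n):
        row = [sim[i][j] for j in range(n) if j != i]
        background.append((mean(row), pstdev(row)))

    def z(i, j):
        mu, sd = background[i]
        return max(0.0, (sim[i][j] - mu) / sd) if sd > 0 else 0.0

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            scores[i][j] = scores[j][i] = sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return scores


# Hypothetical similarities (e.g. mutual information) among four genes.
toy_sim = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.2, 0.8, 0.0],
]
toy_scores = clr_scores(toy_sim)
print(toy_scores[0][1], toy_scores[0][2])
```

    Thresholding these scores gives an inferred network, on which bottleneck nodes would then be identified with a centrality measure such as betweenness; that second step is not shown here.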

  11. Microsatellite data analysis for population genetics

    Science.gov (United States)

    Theories and analytical tools of population genetics have been widely applied for addressing various questions in the fields of ecological genetics, conservation biology, and any context where the role of dispersal or gene flow is important. Underlying much of population genetics is the analysis of ...

  12. The Analysis of Polyploid Genetic Data

    NARCIS (Netherlands)

    Meirmans, P.G.; Liu, S.; van Tienderen, P.H.

    2018-01-01

    Though polyploidy is an important aspect of the evolutionary genetics of both plants and animals, the development of population genetic theory of polyploids has seriously lagged behind that of diploids. This is unfortunate since the analysis of polyploid genetic data—and the interpretation of the

  13. A simple algorithm to estimate genetic variance in an animal threshold model using Bayesian inference. Genetics Selection Evolution 2010, 42:29

    DEFF Research Database (Denmark)

    Ødegård, Jørgen; Meuwissen, Theo HE; Heringstad, Bjørg

    2010-01-01

    Background In the genetic analysis of binary traits with one observation per animal, animal threshold models frequently give biased heritability estimates. In some cases, this problem can be circumvented by fitting sire- or sire-dam models. However, these models are not appropriate in cases where...... records exist for the parents). Furthermore, the new algorithm showed much faster Markov chain mixing properties for genetic parameters (similar to the sire-dam model). Conclusions The new algorithm to estimate genetic parameters via Gibbs sampling solves the bias problems typically occurring in animal...... individual records exist on parents. Therefore, the aim of our study was to develop a new Gibbs sampling algorithm for a proper estimation of genetic (co)variance components within an animal threshold model framework. Methods In the proposed algorithm, individuals are classified as either "informative...

  14. Genetic analysis of post-mating reproductive barriers in hybridizing European Populus species.

    Science.gov (United States)

    Macaya-Sanz, D; Suter, L; Joseph, J; Barbará, T; Alba, N; González-Martínez, S C; Widmer, A; Lexer, C

    2011-10-01

    Molecular genetic analyses of experimental crosses provide important information on the strength and nature of post-mating barriers to gene exchange between divergent populations, which are topics of great interest to evolutionary geneticists and breeders. Although not a trivial task in long-lived organisms such as trees, experimental interspecific recombinants can sometimes be created through controlled crosses involving natural F(1)'s. Here, we used this approach to understand the genetics of post-mating isolation and barriers to introgression in Populus alba and Populus tremula, two ecologically divergent, hybridizing forest trees. We studied 86 interspecific backcross (BC(1)) progeny and >350 individuals from natural populations of these species for up to 98 nuclear genetic markers, including microsatellites, indels and single nucleotide polymorphisms, and inferred the origin of the cytoplasm of the cross with plastid DNA. Genetic analysis of the BC(1) revealed extensive segregation distortions on six chromosomes, and >90% of these (12 out of 13) favored P. tremula donor alleles in the heterospecific genomic background. Since selection was documented during early diploid stages of the progeny, this surprising result was attributed to epistasis, cyto-nuclear coadaptation, heterozygote advantage at nuclear loci experiencing introgression or a combination of these. Our results indicate that gene flow across 'porous' species barriers affects these poplars and aspens beyond neutral, Mendelian expectations and suggests the mechanisms responsible. Contrary to expectations, the Populus sex determination region is not protected from introgression. Understanding the population dynamics of the Populus sex determination region will require tests based on natural interspecific hybrid zones.

  15. Inferring Fitness Effects from Time-Resolved Sequence Data with a Delay-Deterministic Model.

    Science.gov (United States)

    Nené, Nuno R; Dunham, Alistair S; Illingworth, Christopher J R

    2018-05-01

    A common challenge arising from the observation of an evolutionary system over time is to infer the magnitude of selection acting upon a specific genetic variant, or variants, within the population. The inference of selection may be confounded by the effects of genetic drift in a system, leading to the development of inference procedures to account for these effects. However, recent work has suggested that deterministic models of evolution may be effective in capturing the effects of selection even under complex models of demography, suggesting the more general application of deterministic approaches to inference. Responding to this literature, we here note a case in which a deterministic model of evolution may give highly misleading inferences, resulting from the nondeterministic properties of mutation in a finite population. We propose an alternative approach that acts to correct for this error, and which we denote the delay-deterministic model. Applying our model to a simple evolutionary system, we demonstrate its performance in quantifying the extent of selection acting within that system. We further consider the application of our model to sequence data from an evolutionary experiment. We outline scenarios in which our model may produce improved results for the inference of selection, noting that such situations can be easily identified via the use of a regular deterministic model. Copyright © 2018 Nené et al.
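    The "regular deterministic model" referred to above can be illustrated with the standard haploid selection equation, under which the logit of a variant's frequency changes linearly in time with slope s. The sketch below assumes that simple model and noise-free frequencies; it is not the authors' delay-deterministic method, and the function names and parameter values are hypothetical.

```python
from math import exp, log


def logistic_trajectory(x0, s, times):
    """Variant frequency over time under deterministic haploid selection."""
    return [x0 * exp(s * t) / (1.0 - x0 + x0 * exp(s * t)) for t in times]


def estimate_selection(times, freqs):
    """Least-squares slope of logit(frequency) against time."""
    logits = [log(x / (1.0 - x)) for x in freqs]
    t_bar = sum(times) / len(times)
    y_bar = sum(logits) / len(logits)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logits))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


# Simulate a trajectory with s = 0.1, then recover s from it.
sample_times = [0, 10, 20, 30]
sample_freqs = logistic_trajectory(0.01, 0.1, sample_times)
print(estimate_selection(sample_times, sample_freqs))
```

    The abstract's point is that this kind of fit can mislead when the variant arises by mutation in a finite population partway through the observation period, which is the situation the delay-deterministic model is proposed to correct.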

  16. Cumulative t-link threshold models for the genetic analysis of calving ease scores

    Directory of Open Access Journals (Sweden)

    Tempelman Robert J

    2003-09-01

    Full Text Available Abstract In this study, a hierarchical threshold mixed model based on a cumulative t-link specification for the analysis of ordinal data or, more specifically, calving ease scores, was developed. The validation of this model and the Markov chain Monte Carlo (MCMC) algorithm was carried out on simulated data from normally and t4 (i.e. a t-distribution with four degrees of freedom) distributed populations using the deviance information criterion (DIC) and a pseudo Bayes factor (PBF) measure to validate recently proposed model choice criteria. The simulation study indicated that although inference on the degrees of freedom parameter is possible, MCMC mixing was problematic. Nevertheless, the DIC and PBF were validated to be satisfactory measures of model fit to data. A sire and maternal grandsire cumulative t-link model was applied to a calving ease dataset from 8847 Italian Piemontese first parity dams. The cumulative t-link model was shown to lead to posterior means of direct and maternal heritabilities (0.40 ± 0.06, 0.11 ± 0.04) and a direct-maternal genetic correlation (-0.58 ± 0.15) that were not different from the corresponding posterior means of the heritabilities (0.42 ± 0.07, 0.14 ± 0.04) and the genetic correlation (-0.55 ± 0.14) inferred under the conventional cumulative probit link threshold model. Furthermore, the correlation (> 0.99) between posterior means of sire progeny merit from the two models suggested no meaningful rerankings. Nevertheless, the cumulative t-link model was decisively chosen as the better fitting model for this calving ease data using DIC and PBF.
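    In a cumulative link threshold model, the probability of each ordinal score is the difference between the link's distribution function evaluated at adjacent thresholds, shifted by the linear predictor. The sketch below shows this for the conventional probit link only, using the standard normal CDF; a t-link version would substitute a Student-t CDF, which is not implemented here. The thresholds and predictor values are hypothetical.

```python
from statistics import NormalDist


def category_probabilities(eta, thresholds):
    """P(score = k) under a cumulative probit link with ordered thresholds."""
    cdf = NormalDist().cdf
    # Cumulative probabilities P(score <= k); the last category closes at 1.
    cumulative = [cdf(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical thresholds separating four calving ease scores.
example_thresholds = [-0.5, 0.5, 1.5]
print(category_probabilities(0.0, example_thresholds))
print(category_probabilities(1.0, example_thresholds))
```

    Raising the linear predictor moves probability mass toward the higher scores, which is how sire and maternal grandsire effects would act on the observed calving ease categories in a model of this form.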

  17. A non-parametric mixture model for genome-enabled prediction of genetic value for a quantitative trait.

    Science.gov (United States)

    Gianola, Daniel; Wu, Xiao-Lin; Manfredi, Eduardo; Simianer, Henner

    2010-10-01

    A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some "baseline" family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, as well as estimates of parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first one, there are 3 unknown QTL, with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In the two populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population, but it produces an understatement in the second one; here, the true number of clusters is over 900, and the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, it was found that inference about the number of clusters was affected by the prior distribution of M. For a

  18. An analysis of the genetic diversity and genetic structure of ...

    African Journals Online (AJOL)

    Scientific approaches to conservation of threatened species depend on a good understanding of the genetic information of wild and artificial populations. In this paper, the genetic diversity and structure of 10 Eucommia ulmoides populations were analyzed using inter-simple sequence repeat (ISSR) markers.

  19. Simulating pattern-process relationships to validate landscape genetic models

    Science.gov (United States)

    A. J. Shirk; S. A. Cushman; E. L. Landguth

    2012-01-01

    Landscapes may resist gene flow and thereby give rise to a pattern of genetic isolation within a population. The mechanism by which a landscape resists gene flow can be inferred by evaluating the relationship between landscape models and an observed pattern of genetic isolation. This approach risks false inferences because researchers can never feasibly test all...

  20. Genetic approaches in comparative and evolutionary physiology

    Science.gov (United States)

    Bridgham, Jamie T.; Kelly, Scott A.; Garland, Theodore

    2015-01-01

    Whole animal physiological performance is highly polygenic and highly plastic, and the same is generally true for the many subordinate traits that underlie performance capacities. Quantitative genetics, therefore, provides an appropriate framework for the analysis of physiological phenotypes and can be used to infer the microevolutionary processes that have shaped patterns of trait variation within and among species. In cases where specific genes are known to contribute to variation in physiological traits, analyses of intraspecific polymorphism and interspecific divergence can reveal molecular mechanisms of functional evolution and can provide insights into the possible adaptive significance of observed sequence changes. In this review, we explain how the tools and theory of quantitative genetics, population genetics, and molecular evolution can inform our understanding of mechanism and process in physiological evolution. For example, lab-based studies of polygenic inheritance can be integrated with field-based studies of trait variation and survivorship to measure selection in the wild, thereby providing direct insights into the adaptive significance of physiological variation. Analyses of quantitative genetic variation in selection experiments can be used to probe interrelationships among traits and the genetic basis of physiological trade-offs and constraints. We review approaches for characterizing the genetic architecture of physiological traits, including linkage mapping and association mapping, and systems approaches for dissecting intermediary steps in the chain of causation between genotype and phenotype. We also discuss the promise and limitations of population genomic approaches for inferring adaptation at specific loci. We end by highlighting the role of organismal physiology in the functional synthesis of evolutionary biology. PMID:26041111

  1. COMPUTER METHODS OF GENETIC ANALYSIS.

    Directory of Open Access Journals (Sweden)

    A. L. Osipov

    2017-02-01

    Full Text Available The basic statistical methods used in the genetic analysis of human traits are considered: segregation analysis, linkage analysis and allelic association analysis. Software supporting the implementation of these methods was developed.

  2. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data.

    Science.gov (United States)

    Fan, Jean; Lee, Hae-Ock; Lee, Soohyun; Ryu, Da-Eun; Lee, Semin; Xue, Catherine; Kim, Seok Jin; Kim, Kihyun; Barkas, Nikolas; Park, Peter J; Park, Woong-Yang; Kharchenko, Peter V

    2018-06-13

    Characterization of intratumoral heterogeneity is critical to cancer therapy, as presence of phenotypically diverse cell populations commonly fuels relapse and resistance to treatment. Although genetic variation is a well-studied source of intratumoral heterogeneity, the functional impact of most genetic alterations remains unclear. Even less understood is the relative importance of other factors influencing heterogeneity, such as epigenetic state or tumor microenvironment. To investigate the relationship between genetic and transcriptional heterogeneity in a context of cancer progression, we devised a computational approach called HoneyBADGER to identify copy number variation and loss-of-heterozygosity in individual cells from single-cell RNA-sequencing data. By integrating allele and normalized expression information, HoneyBADGER is able to identify and infer the presence of subclone-specific alterations in individual cells and reconstruct underlying subclonal architecture. Examining several tumor types, we show that HoneyBADGER is effective at identifying deletion, amplifications, and copy-neutral loss-of-heterozygosity events, and is capable of robustly identifying subclonal focal alterations as small as 10 megabases. We further apply HoneyBADGER to analyze single cells from a progressive multiple myeloma patient to identify major genetic subclones that exhibit distinct transcriptional signatures relevant to cancer progression. Surprisingly, other prominent transcriptional subpopulations within these tumors did not line up with the genetic subclonal structure, and were likely driven by alternative, non-clonal mechanisms. These results highlight the need for integrative analysis to understand the molecular and phenotypic heterogeneity in cancer. Published by Cold Spring Harbor Laboratory Press.

  3. Assessing Extinction Risk: Integrating Genetic Information

    Directory of Open Access Journals (Sweden)

    Jason Dunham

    1999-06-01

    Full Text Available Risks of population extinction have been estimated using a variety of methods incorporating information from different spatial and temporal scales. We briefly consider how several broad classes of extinction risk assessments, including population viability analysis, incidence functions, and ranking methods, integrate information on different temporal and spatial scales. In many circumstances, data from surveys of neutral genetic variability within and among populations can provide information useful for assessing extinction risk. Patterns of genetic variability resulting from past and present ecological and demographic events can indicate risks of extinction that are otherwise difficult to infer from ecological and demographic analyses alone. We provide examples of how patterns of neutral genetic variability, both within and among populations, can be used to corroborate and complement extinction risk assessments.

  4. Genetic analysis in Bartter syndrome from India.

    Science.gov (United States)

    Sharma, Pradeep Kumar; Saikia, Bhaskar; Sharma, Rachna; Ankur, Kumar; Khilnani, Praveen; Aggarwal, Vinay Kumar; Cheong, Hae

    2014-10-01

    Bartter syndrome is a group of inherited, salt-losing tubulopathies presenting as hypokalemic metabolic alkalosis with normotensive hyperreninemia and hyperaldosteronism. Around 150 cases have been reported in the literature to date. Mutations leading to salt-losing tubulopathies are not routinely tested for in the Indian population. The authors report genetic analysis, for the first time, in two cases of Bartter syndrome from India. The first case was antenatal Bartter syndrome presenting with massive polyuria and hyperkalemia. Mutational analysis revealed compound heterozygous mutations in the KCNJ1 (ROMK) gene [p(Leu220Phe), p(Thr191Pro)]. The second case had a phenotypic presentation of classical Bartter syndrome; however, genetic analysis revealed only a heterozygous novel mutation in the SLC12A gene, p(Ala232Thr). Bartter syndrome is a clinical diagnosis, and genetic analysis is recommended for prognostication and genetic counseling.

  5. Genetic Diversity and Geographic Population Structure of Bovine Neospora caninum Determined by Microsatellite Genotyping Analysis

    Science.gov (United States)

    Regidor-Cerrillo, Javier; Díez-Fuertes, Francisco; García-Culebras, Alicia; Moore, Dadín P.; González-Warleta, Marta; Cuevas, Carmen; Schares, Gereon; Katzer, Frank; Pedraza-Díaz, Susana; Mezo, Mercedes; Ortega-Mora, Luis M.

    2013-01-01

    The cyst-forming protozoan parasite Neospora caninum is one of the main causes of bovine abortion worldwide and is of great economic importance in the cattle industry. Recent studies have revealed extensive genetic variation among N. caninum isolates based on microsatellite sequences (MSs). MSs may be suitable molecular markers for inferring the diversity of parasite populations, molecular epidemiology and the basis for phenotypic variations in N. caninum, which have been poorly defined. In this study, we evaluated nine MS markers using a panel of 11 N. caninum-derived reference isolates from around the world and 96 N. caninum bovine clinical samples and one ovine clinical sample collected from four countries on two continents, including Spain, Argentina, Germany and Scotland, over a 10-year period. These markers were used as molecular tools to investigate the genetic diversity, geographic distribution and population structure of N. caninum. Multilocus microsatellite genotyping based on 7 loci demonstrated high levels of genetic diversity in the samples from all of the different countries, with 96 microsatellite multilocus genotypes (MLGs) identified from 108 N. caninum samples. Geographic sub-structuring was present in the country populations according to pairwise FST. Principal component analysis (PCA) and Neighbor Joining tree topologies also suggested MLG segregation partially associated with geographical origin. An analysis of the MLG relationships, using eBURST, confirmed that the close genetic relationship observed between the Spanish and Argentinean populations may be the result of parasite migration (i.e., the introduction of novel MLGs from Spain to South America) due to cattle movement. The eBURST relationships also revealed genetically different clusters associated with the abortion. The presence of linkage disequilibrium, the co-existence of specific MLGs to individual farms and eBURST MLG relationships suggest a predominant clonal

  6. Genetic diversity among Korean bermudagrass (Cynodon spp.) ecotypes characterized by morphological, cytological and molecular approaches.

    Science.gov (United States)

    Kang, Si-Yong; Lee, Geung-Joo; Lim, Ki Byung; Lee, Hye Jung; Park, In Sook; Chung, Sung Jin; Kim, Jin-Baek; Kim, Dong Sub; Rhee, Hye Kyung

    2008-04-30

    The genus Cynodon comprises ten species. The objective of this study was to evaluate the genetic diversity of Korean bermudagrasses at the morphological, cytological and molecular levels. Morphological parameters, the nuclear DNA content and ploidy levels were observed in 43 bermudagrass ecotypes. AFLP markers were evaluated to define the genetic diversity, and chromosome counts were made to confirm the inferred cytotypes. Nuclear DNA contents were in the ranges 1.42-1.56, 1.94-2.19, 2.54, and 2.77-2.85 pg/2C for the triploid, tetraploid, pentaploid, and hexaploid accessions, respectively. The inferred cytotypes were triploid (2n = 3x = 27), tetraploid (2n = 4x = 36), pentaploid (2n = 5x = 45), and hexaploid (2n = 6x = 54), but the majority of the collections were tetraploid (81%). Mitotic chromosome counts verified the corresponding ploidy levels. The fast growing fine-textured ecotypes had lower ploidy levels, while the pentaploids and hexaploids were coarse types. The genetic similarity ranged from 0.42 to 0.94 with an average of 0.64. UPGMA cluster analysis and principal coordinate analysis separated the ecotypes into 6 distinct groups. The genetic similarity suggests natural hybridization between the different cytotypes, which could be useful resources for future breeding and genetic studies.

  7. Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust.

    Science.gov (United States)

    Cun, Yupeng; Yang, Tsun-Po; Achter, Viktor; Lang, Ulrich; Peifer, Martin

    2018-06-01

    The genomes of cancer cells constantly change during pathogenesis. This evolutionary process can lead to the emergence of drug-resistant mutations in subclonal populations, which can hinder therapeutic intervention in patients. Data derived from massively parallel sequencing can be used to infer these subclonal populations using tumor-specific point mutations. The accurate determination of copy-number changes and tumor impurity is necessary to reliably infer subclonal populations by mutational clustering. This protocol describes how to use Sclust, a copy-number analysis method with a recently developed mutational clustering approach. In a series of simulations and comparisons with alternative methods, we have previously shown that Sclust accurately determines copy-number states and subclonal populations. Performance tests show that the method is computationally efficient, with copy-number analysis and mutational clustering taking Linux/Unix command-line syntax should be able to carry out analyses of subclonal populations.

  8. Study of human genetic diversity : inferences on population origin and history

    OpenAIRE

    Haber, Marc, 1980-

    2013-01-01

    Patterns of human genetic diversity suggest that all modern humans originated from a small population in Africa that expanded rapidly 50,000 years ago to occupy the whole world. While moving into new environments, genetic drift and natural selection affected populations differently, creating genetic structure. By understanding the genetic structure of human populations, we can reconstruct human history and understand the genetic basis of diseases. The work presented here contributes to the on...

  9. Structural influence of gene networks on their inference: analysis of C3NET

    Directory of Open Access Journals (Sweden)

    Emmert-Streib Frank

    2011-06-01

    Full Text Available Abstract Background: The availability of large-scale high-throughput data poses considerable challenges for their functional analysis. For this reason gene network inference methods gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited. Results: In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET, a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead to a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide a user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository. Conclusions: The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy to use software package c3net may contribute to the popularization of such methods. Reviewers: This article was reviewed by Lev Klebanov, Joel Bader and Yuriy Gusev.

  10. Inferring relationships between pairs of individuals from locus heterozygosities

    Directory of Open Access Journals (Sweden)

    Spinetti Isabella

    2002-11-01

    Full Text Available Abstract Background: The traditional exact method for inferring relationships between individuals from genetic data is not easily applicable in all situations that may be encountered in several fields of applied genetics. This study describes an approach that gives affordable results and is easily applicable; it is based on the probabilities that two individuals share 0, 1 or both alleles at a locus identical by state. Results: We show that these probabilities (zi) depend on locus heterozygosity (H), and are scarcely affected by variation of the distribution of allele frequencies. This allows us to obtain empirical curves relating zi's to H for a series of common relationships, so that the likelihood ratio of a pair of relationships between any two individuals, given their genotypes at a locus, is a function of a single parameter, H. Application to large samples of mother-child and full-sib pairs shows that the statistical power of this method to infer the correct relationship is not much lower than the exact method. Analysis of a large database of STR data proves that locus heterozygosity does not vary significantly among Caucasian populations, apart from special cases, so that the likelihood ratio of the more common relationships between pairs of individuals may be obtained by looking at tabulated zi values. Conclusions: A simple method is provided, which may be used by any scientist with the help of a calculator or a spreadsheet to compute the likelihood ratios of common alternative relationships between pairs of individuals.
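
    The quantities named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, and the empirical curves relating zi to H are not given in the abstract, so they are passed in as hypothetical user-supplied callables `z_rel1` and `z_rel2`.

```python
from collections import Counter
from math import prod

def heterozygosity(allele_freqs):
    """Expected heterozygosity of one locus: H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in allele_freqs)

def ibs_shared(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) that two diploid genotypes share identical by state."""
    shared = Counter(genotype_a) & Counter(genotype_b)  # multiset intersection
    return sum(shared.values())

def likelihood_ratio(pairs, het, z_rel1, z_rel2):
    """Multiply per-locus ratios z_rel1 / z_rel2 across loci assumed independent.

    pairs  : list of (genotype_a, genotype_b), one entry per locus
    het    : list of locus heterozygosities H, in the same order
    z_rel1, z_rel2 : hypothetical callables (k, H) -> probability of sharing
                     k alleles IBS under each candidate relationship
    """
    return prod(
        z_rel1(ibs_shared(a, b), h) / z_rel2(ibs_shared(a, b), h)
        for (a, b), h in zip(pairs, het)
    )
```

    In practice the two callables would look up tabulated zi values for the relationships being compared; the sketch does not reproduce those tables.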

  11. A general framework for the evaluation of genetic association studies using multiple marginal models

    DEFF Research Database (Denmark)

    Kitsche, Andreas; Ritz, Christian; Hothorn, Ludwig A.

    2016-01-01

    OBJECTIVE: In this study, we present a simultaneous inference procedure as a unified analysis framework for genetic association studies. METHODS: The method is based on the formulation of multiple marginal models that reflect different modes of inheritance. The basic advantage of this methodology...

  12. Bayesian inference in genetic parameter estimation of visual scores in Nellore beef-cattle

    Science.gov (United States)

    2009-01-01

    The aim of this study was to estimate the components of variance and genetic parameters for the visual scores which constitute the Morphological Evaluation System (MES), such as body structure (S), precocity (P) and musculature (M) in Nellore beef-cattle at the weaning and yearling stages, by using threshold Bayesian models. The information used for this was gleaned from visual scores of 5,407 animals evaluated at the weaning and 2,649 at the yearling stages. The genetic parameters for visual score traits were estimated through two-trait analysis, using the threshold animal model, with Bayesian statistics methodology and MTGSAM (Multiple Trait Gibbs Sampler for Animal Models) threshold software. Heritability estimates for S, P and M were 0.68, 0.65 and 0.62 (at weaning) and 0.44, 0.38 and 0.32 (at the yearling stage), respectively. Heritability estimates for S, P and M were found to be high, and so it is expected that these traits should respond favorably to direct selection. The visual scores evaluated at the weaning and yearling stages might be used in the composition of new selection indexes, as they presented sufficient genetic variability to promote genetic progress in such morphological traits. PMID:21637450

  13. Complex postglacial recolonization inferred from population genetic structure of mottled sculpin Cottus bairdii in tributaries of eastern Lake Michigan, U.S.A.

    Science.gov (United States)

    Homola, J J; Ruetz, C R; Kohler, S L; Thum, R A

    2016-11-01

    This study used analyses of the genetic structure of a non-game fish species, the mottled sculpin Cottus bairdii, to hypothesize probable recolonization routes used by cottids and possibly other Laurentian Great Lakes fishes following glacial recession. Based on samples from 16 small streams in five major Lake Michigan, U.S.A., tributary basins, significant interpopulation differentiation was documented (overall FST = 0·235). Differentiation was complex, however, with unexpectedly high genetic similarity among basins as well as occasionally strong differentiation within basins, despite relatively close geographic proximity of populations. Genetic dissimilarities were identified between eastern and western populations within river basins, with similarities existing between eastern and western populations across basins. Given such patterns, recolonization is hypothesized to have occurred on three occasions from more than one glacial refugium, with a secondary vicariant event resulting from reduction in the water level of ancestral Lake Michigan. By studying the phylogeography of a small, non-game fish species, this study provides insight into recolonization dynamics of the region that could be difficult to infer from game species that are often broadly dispersed by humans. © 2016 The Fisheries Society of the British Isles.

  14. Haplotype inference in general pedigrees with two sites

    Directory of Open Access Journals (Sweden)

    Doan Duong D

    2011-04-01

    Full Text Available Abstract Background: Genetic disease studies investigate relationships between changes in chromosomes and genetic diseases. Single haplotypes provide useful information for these studies but extracting single haplotypes directly by biochemical methods is expensive. A computational method to infer haplotypes from genotype data is therefore important. We investigate the problem of computing the minimum number of recombination events for general pedigrees with two sites for all members. Results: We show that this NP-hard problem can be parametrically reduced to the Bipartization by Edge Removal problem and therefore can be solved by an O(2^k · n^2) exact algorithm, where n is the number of members and k is the number of recombination events. Conclusions: Our work can therefore be useful for genetic disease studies to track down how changes in haplotypes such as recombinations relate to genetic disease.
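
    To make the target problem concrete, the following Python sketch finds a smallest set of edges whose removal leaves a graph bipartite. It is a plain brute-force search written for illustration, not the parameterized exact algorithm the abstract refers to, and it does not show the reduction from pedigrees to graphs; the function names are assumptions.

```python
from collections import deque
from itertools import combinations

def is_bipartite(n, edges):
    """Two-colour a graph on vertices 0..n-1 with BFS; True if no odd cycle exists."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = [None] * n
    for start in range(n):
        if colour[start] is not None:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if colour[v] is None:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False  # edge joins two vertices of the same colour
    return True

def min_edge_bipartization(n, edges):
    """Smallest edge set whose removal leaves a bipartite graph (exponential search)."""
    for k in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if is_bipartite(n, kept):
                return [edges[i] for i in removed]
    return list(edges)  # not reached: an edgeless graph is always bipartite
```

    Trying removal sets in order of increasing size guarantees that the first set found is a minimum one, at the cost of running time that grows exponentially with the number of edges.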

  15. Nonparametric Bayesian inference for mean residual life functions in survival analysis.

    Science.gov (United States)

    Poynor, Valerie; Kottas, Athanasios

    2018-01-19

    Modeling and inference for survival analysis problems typically revolves around different functions related to the survival distribution. Here, we focus on the mean residual life (MRL) function, which provides the expected remaining lifetime given that a subject has survived (i.e. is event-free) up to a particular time. This function is of direct interest in reliability, medical, and actuarial fields. In addition to its practical interpretation, the MRL function characterizes the survival distribution. We develop general Bayesian nonparametric inference for MRL functions built from a Dirichlet process mixture model for the associated survival distribution. The resulting model for the MRL function admits a representation as a mixture of the kernel MRL functions with time-dependent mixture weights. This model structure allows for a wide range of shapes for the MRL function. Particular emphasis is placed on the selection of the mixture kernel, taken to be a gamma distribution, to obtain desirable properties for the MRL function arising from the mixture model. The inference method is illustrated with a data set of two experimental groups and a data set involving right censoring. The supplementary material available at Biostatistics online provides further results on empirical performance of the model, using simulated data examples. © The Author 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
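
    For reference, the standard definition of the MRL function, and the inversion formula behind the statement that it characterizes the survival distribution, can be written as follows. The notation (T for the survival time, S for its survival function, m for the MRL function) is chosen here and is not taken from the abstract; the formulas assume a continuous survival time with finite mean.

```latex
% Mean residual life at time t, defined wherever S(t) > 0
m(t) = \mathbb{E}\left[\, T - t \mid T > t \,\right]
     = \frac{\int_t^{\infty} S(u)\, du}{S(t)}

% Inversion formula: the MRL function determines the survival function
S(t) = \frac{m(0)}{m(t)} \exp\!\left( - \int_0^{t} \frac{du}{m(u)} \right)
```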

  16. Designs and Methods for Association Studies and Population Size Inference in Statistical Genetics

    DEFF Research Database (Denmark)

    Waltoft, Berit Lindum

    method provides a simple goodness of fit test by comparing the observed SFS with the expected SFS under a given model of population size changes. By the use of Monte Carlo estimation the expected time between coalescent events can be estimated and the expected SFS can thereby be evaluated. Using......). The OR is interpreted as the effect of an exposure on the probability of being diseased at the end of follow-up, while the interpretation of the IRR is the effect of an exposure on the probability of becoming diseased. Through a simulation study, the OR from a classical case-control study is shown to be an inconsistent...... the classical chi-square statistics we are able to infer single parameter models. Multiple parameter models, e.g. multiple epochs, are harder to identify. By introducing the inference of population size back in time as an inverse problem, the second procedure applies the theory of smoothing splines to infer...
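
    The comparison of observed and expected site frequency spectra (SFS) can be illustrated with a minimal Python sketch. It makes a simplifying assumption that differs from the excerpt: instead of Monte Carlo estimation under a model of population size changes, it uses the closed-form expectation for a constant population size under the standard neutral coalescent, where the expected number of sites with i derived copies is proportional to 1/i. The function names are invented for this example.

```python
def expected_sfs_constant_size(n, segregating_sites):
    """Expected unfolded SFS for n sampled sequences under constant population size.

    Under the standard neutral coalescent E[xi_i] is proportional to 1/i,
    so the observed number of segregating sites is split in those proportions.
    """
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [segregating_sites * w / total for w in weights]

def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic comparing observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

    A large statistic relative to the chi-square reference distribution would count against the constant-size model; for a model with size changes the expected SFS would have to be obtained some other way, for example by simulation.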

  17. Rapid Genetic Analysis in Congenital Hyperinsulinism

    DEFF Research Database (Denmark)

    Christesen, Henrik Thybo; Brusgaard, Klaus; Alm, Jan

    2007-01-01

    BACKGROUND: In severe, medically unresponsive congenital hyperinsulinism (CHI), the histological differentiation of focal versus diffuse disease is vital, since the surgical management is completely different. Genetic analysis may help in the differential diagnosis, as focal CHI is associated...... with a paternal germline ABCC8 or KCNJ11 mutation and a focal loss of maternal chromosome 11p15, whereas a maternal mutation, or homozygous/compound heterozygous ABCC8 and KCNJ11 mutations predict diffuse-type disease. However, genotyping usually takes too long to be helpful in the absence of a founder mutation....... METHODS: In 4 patients, a rapid genetic analysis of the ABCC8 and KCNJ11 genes was performed within 2 weeks on request prior to the decision of pancreatic surgery. RESULTS: Two patients had no mutations, rendering the genetic analysis non-informative. Peroperative multiple biopsies showed diffuse disease...

  18. Genetic variation of the greenhouse whitefly, Trialeurodes vaporariorum (Hemiptera: Aleyrodidae), among populations from Serbia and neighbouring countries, as inferred from COI sequence variability.

    Science.gov (United States)

    Prijović, M; Skaljac, M; Drobnjaković, T; Zanić, K; Perić, P; Marčić, D; Puizina, J

    2014-06-01

    The greenhouse whitefly Trialeurodes vaporariorum Westwood, 1856 (Hemiptera: Aleyrodidae) is an invasive and highly polyphagous phloem-feeding pest of vegetables and ornamentals. Trialeurodes vaporariorum causes serious damage due to direct feeding and transmits several important plant viruses. Excessive use of insecticides has resulted in significantly reduced levels of susceptibility of various T. vaporariorum populations. To determine the genetic variability within and among populations of T. vaporariorum from Serbia and to explore their genetic relatedness with other T. vaporariorum populations, we analysed the mitochondrial cytochrome c oxidase I (COI) sequences of 16 populations from Serbia and six neighbouring countries: Montenegro (three populations), Macedonia (one population) and Croatia (two populations), for a total of 198 analysed specimens. A low overall level of sequence divergence and only five variable nucleotides and six haplotypes were found. The most frequent haplotype, H1, was identified in all Serbian populations and in all specimens from distant localities in Croatia and Macedonia. The COI sequence data that was retrieved from GenBank and the data from our study indicated that H1 is the most globally widespread T. vaporariorum haplotype. A lack of spatial genetic structure among the studied T. vaporariorum populations, as well as two demographic tests that we performed (Tajima's D value and Fu's Fs statistics), indicate a recent colonisation event and population growth. Phylogenetic analyses of the COI haplotypes in this study and other T. vaporariorum haplotypes that were retrieved from GenBank were performed using Bayesian inference and median-joining (MJ) network analysis. Two major haplogroups with only a single unique nucleotide difference were found: haplogroup 1 (containing the five Serbian haplotypes and those previously identified in India, China, the Netherlands, the United Kingdom, Morocco, Reunion and the USA) and haplogroup 3

  19. Characterization and comparison of EST-SSR and TRAP markers for genetic analysis of the Japanese persimmon Diospyros kaki.

    Science.gov (United States)

    Luo, C; Zhang, F; Zhang, Q L; Guo, D Y; Luo, Z R

    2013-01-09

    We developed and characterized expressed sequence tags (ESTs)-simple sequence repeats (SSRs) and targeted region amplified polymorphism (TRAP) markers to examine genetic relationships in the persimmon genus Diospyros gene pool. In total, we characterized 14 EST-SSR primer pairs and 36 TRAP primer combinations, which were amplified across 20 germplasms of 4 species in the genus Diospyros. We used various genetic parameters, including effective multiplex ratio (EMR), diversity index (DI), and marker index (MI), to test the utility of these markers. TRAP markers gave higher EMR (24.85) but lower DI (0.33), compared to EST-SSRs (EMR = 3.65, DI = 0.34). TRAP gave a very high MI (8.08), which was about 6.5 times the MI of EST-SSR (1.25). These markers were utilized for phylogenetic inference of 20 genotypes of Diospyros kaki Thunb. and allied species, with the result that all kaki genotypes clustered closely and 3 allied species formed an independent group. These markers could be further exploited for large-scale genetic relationship inference.
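
    The relationship between the three reported parameters can be checked with a short calculation. The sketch below assumes the commonly used definition of the marker index as the product of effective multiplex ratio and diversity index; the abstract itself does not state the formula, and the function name is invented here.

```python
def marker_index(emr, di):
    """Marker index, assumed here to be effective multiplex ratio times diversity index."""
    return emr * di

# Products of the rounded EMR and DI values given in the abstract
trap_mi = marker_index(24.85, 0.33)
est_ssr_mi = marker_index(3.65, 0.34)

# Ratio of the two MI values as reported in the abstract
reported_ratio = 8.08 / 1.25
```

    The products of the rounded inputs (about 8.20 and 1.24) come out close to the reported MI values of 8.08 and 1.25, and the ratio of the reported values is roughly 6.5.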

  20. Causal inference in econometrics

    CERN Document Server

    Kreinovich, Vladik; Sriboonchitta, Songsak

    2016-01-01

    This book is devoted to the analysis of causal inference which is one of the most difficult tasks in data analysis: when two phenomena are observed to be related, it is often difficult to decide whether one of them causally influences the other one, or whether these two phenomena have a common cause. This analysis is the main focus of this volume. To get a good understanding of the causal inference, it is important to have models of economic phenomena which are as accurate as possible. Because of this need, this volume also contains papers that use non-traditional economic models, such as fuzzy models and models obtained by using neural networks and data mining techniques. It also contains papers that apply different econometric models to analyze real-life economic dependencies.

  1. Congruence between morphological and molecular markers inferred from the analysis of the intra-morphotype genetic diversity and the spatial structure of Oxalis tuberosa Mol.

    Science.gov (United States)

    Pissard, Audrey; Arbizu, Carlos; Ghislain, Marc; Faux, Anne-Michèle; Paulet, Sébastien; Bertin, Pierre

    2008-01-01

    Oxalis tuberosa is an important crop cultivated in the highest Andean zones. A germplasm collection is maintained ex situ by CIP, which has developed a morphological marker system to classify the accessions into morphotypes, i.e. groups of morphologically identical accessions. However, their genetic uniformity is currently unknown. The ISSR technique was used in two experiments to determine the relationships between the morphological and molecular marker systems. The intra-morphotype genetic diversity, the spatial structures of the diversity and the congruence between the two marker systems were determined. In the first experiment, 44 accessions representing five morphotypes, clearly distinct from each other, were analyzed. At the molecular level, the accessions clustered exactly according to their morphotypes. However, genetic variability was observed within each morphotype. In the second experiment, 34 accessions gradually differing from each other on a morphological basis were analyzed. The morphological clustering showed no geographical structure. In contrast, the molecular analysis showed that the genetic structure was slightly related to the collection site. The correlation between the two marker systems was weak but significant. The lack of perfect congruence between morphological and molecular data suggests that the morphological system may be useful for morphotype management but is not appropriate to study the genetic structure of the oca. The spatial structure of the genetic diversity can be related to the evolution of the species and the discordance between the morphological and molecular structures may result from similar selection pressures at different places leading to similar forms with a different genetic background.

  2. Novel probabilistic models of spatial genetic ancestry with applications to stratification correction in genome-wide association studies.

    Science.gov (United States)

    Bhaskar, Anand; Javanmard, Adel; Courtade, Thomas A; Tse, David

    2017-03-15

    Genetic variation in human populations is influenced by geographic ancestry due to spatial locality in historical mating and migration patterns. Spatial population structure in genetic datasets has been traditionally analyzed using either model-free algorithms, such as principal components analysis (PCA) and multidimensional scaling, or using explicit spatial probabilistic models of allele frequency evolution. We develop a general probabilistic model and an associated inference algorithm that unify the model-based and data-driven approaches to visualizing and inferring population structure. Our spatial inference algorithm can also be effectively applied to the problem of population stratification in genome-wide association studies (GWAS), where hidden population structure can create fictitious associations when population ancestry is correlated with both the genotype and the trait. Our algorithm Geographic Ancestry Positioning (GAP) relates local genetic distances between samples to their spatial distances, and can be used for visually discerning population structure as well as accurately inferring the spatial origin of individuals on a two-dimensional continuum. On both simulated and several real datasets from diverse human populations, GAP exhibits substantially lower error in reconstructing spatial ancestry coordinates compared to PCA. We also develop an association test that uses the ancestry coordinates inferred by GAP to accurately account for ancestry-induced correlations in GWAS. Based on simulations and analysis of a dataset of 10 metabolic traits measured in a Northern Finland cohort, which is known to exhibit significant population structure, we find that our method has superior power to current approaches. Our software is available at https://github.com/anand-bhaskar/gap. Contact: abhaskar@stanford.edu or ajavanma@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  3. Neutral polymorphisms in putative housekeeping genes and tandem repeats unravels the population genetics and evolutionary history of Plasmodium vivax in India.

    Directory of Open Access Journals (Sweden)

    Surendra K Prajapati

    Full Text Available The evolutionary history and age of Plasmodium vivax has been inferred as both recent and ancient by several studies, mainly using mitochondrial genome diversity. Here we address the age of P. vivax on the Indian subcontinent using selectively neutral housekeeping genes and tandem repeat loci. Analysis of ten housekeeping genes revealed a substantial number of SNPs (n = 75) from 100 P. vivax isolates collected from five geographical regions of India. Neutrality tests showed a majority of the housekeeping genes were selectively neutral, confirming the suitability of housekeeping genes for inferring the evolutionary history of P. vivax. In addition, a genetic differentiation test using housekeeping gene polymorphism data showed a lack of geographical structuring between the five regions of India. The coalescence analysis of the time to the most recent common ancestor estimate yielded an ancient TMRCA (232,228 to 303,030 years) and long-term population history (79,235 to 104,008) of extant P. vivax on the Indian subcontinent. Analysis of 18 tandem repeat loci polymorphisms showed substantial allelic diversity and heterozygosity per locus, and analysis of potential bottlenecks revealed the signature of a stable P. vivax population, further corroborating our ancient age estimates. For the first time we report a comparable evolutionary history of P. vivax inferred by nuclear genetic markers (putative housekeeping genes) to that inferred from mitochondrial genome diversity.

  4. Neutral polymorphisms in putative housekeeping genes and tandem repeats unravels the population genetics and evolutionary history of Plasmodium vivax in India.

    Science.gov (United States)

    Prajapati, Surendra K; Joshi, Hema; Carlton, Jane M; Rizvi, M Alam

    2013-01-01

    The evolutionary history and age of Plasmodium vivax has been inferred as both recent and ancient by several studies, mainly using mitochondrial genome diversity. Here we address the age of P. vivax on the Indian subcontinent using selectively neutral housekeeping genes and tandem repeat loci. Analysis of ten housekeeping genes revealed a substantial number of SNPs (n = 75) from 100 P. vivax isolates collected from five geographical regions of India. Neutrality tests showed a majority of the housekeeping genes were selectively neutral, confirming the suitability of housekeeping genes for inferring the evolutionary history of P. vivax. In addition, a genetic differentiation test using housekeeping gene polymorphism data showed a lack of geographical structuring between the five regions of India. The coalescence analysis of the time to the most recent common ancestor estimate yielded an ancient TMRCA (232,228 to 303,030 years) and long-term population history (79,235 to 104,008) of extant P. vivax on the Indian subcontinent. Analysis of 18 tandem repeat loci polymorphisms showed substantial allelic diversity and heterozygosity per locus, and analysis of potential bottlenecks revealed the signature of a stable P. vivax population, further corroborating our ancient age estimates. For the first time we report a comparable evolutionary history of P. vivax inferred by nuclear genetic markers (putative housekeeping genes) to that inferred from mitochondrial genome diversity.

  5. A Bayesian Network Schema for Lessening Database Inference

    National Research Council Canada - National Science Library

    Chang, LiWu; Moskowitz, Ira S

    2001-01-01

    .... The authors introduce a formal schema for database inference analysis, based upon a Bayesian network structure, which identifies critical parameters involved in the inference problem and represents...

  6. Inferring ancient Agave cultivation practices from contemporary genetic patterns.

    Science.gov (United States)

    Parker, Kathleen C; Trapnell, Dorset W; Hamrick, J L; Hodgson, Wendy C; Parker, Albert J

    2010-04-01

    Several Agave species have played an important ethnobotanical role since prehistory in Mesoamerica and semiarid areas to the north, including central Arizona. We examined genetic variation in relict Agave parryi populations northeast of the Mogollon Rim in Arizona, remnants from anthropogenic manipulation over 600 years ago. We used both allozymes and microsatellites to compare genetic variability and structure in anthropogenically manipulated populations with putative wild populations, to assess whether they were actively cultivated or the result of inadvertent manipulation, and to determine probable source locations for anthropogenic populations. Wild populations were more genetically diverse than anthropogenic populations, with greater expected heterozygosity, polymorphic loci, effective number of alleles and allelic richness. Anthropogenic populations exhibited many traits indicative of past active cultivation: fixed heterozygosity for several loci in all populations (nonexistent in wild populations); fewer multilocus genotypes, which differed by fewer alleles; and greater differentiation among populations than was characteristic of wild populations. Furthermore, manipulated populations date from a period when changes in the cultural context may have favoured active cultivation near dwellings. Patterns of genetic similarity among populations suggest a complex anthropogenic history. Anthropogenic populations were not simply derived from the closest wild A. parryi stock; instead they evidently came from more distant, often more diverse, wild populations, perhaps obtained through trade networks in existence at the time of cultivation.

  7. Analysis of genetic diversity in pigeonpea germplasm using ...

    Indian Academy of Sciences (India)

    Navya

    2016-11-25

    Nov 25, 2016 ... accessions from Orissa (105) and AP (15) do not group with any Indian accessions. ... In the present work, comparison between SSAP and REMAP revealed ... (sequence-specific amplified polymorphism) for genetic analysis of sweet potato. ... Sharma, V. and Nandinemi, M.R. 2014 Assessment of genetic ...

  8. Inference of gene-phenotype associations via protein-protein interaction and orthology.

    Directory of Open Access Journals (Sweden)

    Panwen Wang

    Full Text Available One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest a significant performance of our method. By applying our method to two human representative diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.

  9. Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus).

    Directory of Open Access Journals (Sweden)

    Juan Wang

    Full Text Available Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in the northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus.
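
    As a reminder of what an FST value measures, the following Python sketch computes the textbook form FST = (HT - HS) / HT for a single biallelic SNP in two equally weighted populations. This is an assumption made for illustration: the abstract does not say which estimator was used, and published analyses typically apply sample-size corrections and average over many loci.

```python
def fst_two_populations(p1, p2):
    """Textbook F_ST for one biallelic SNP from allele frequencies in two populations.

    Populations are weighted equally and no sample-size correction is applied.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2                              # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                      # total expected heterozygosity
    if h_t == 0:
        return 0.0  # monomorphic site: no variation to partition
    return (h_t - h_s) / h_t
```

    Identical allele frequencies give a value of 0 and fixed differences give 1, which is the scale on which the values reported in the abstract are read.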

  10. A canonical correlation analysis-based dynamic bayesian network prior to infer gene regulatory networks from multiple types of biological data.

    Science.gov (United States)

    Baur, Brittany; Bozdag, Serdar

    2015-04-01

    One of the challenging and important computational problems in systems biology is to infer gene regulatory networks (GRNs) of biological systems. Several methods that exploit gene expression data have been developed to tackle this problem. In this study, we propose the use of copy number and DNA methylation data to infer GRNs. We developed an algorithm that scores regulatory interactions between genes based on canonical correlation analysis. In this algorithm, copy number or DNA methylation variables are treated as potential regulator variables, and expression variables are treated as potential target variables. We first validated that the canonical correlation analysis method is able to infer true interactions with high accuracy. We showed that the use of DNA methylation or copy number datasets leads to improved inference over steady-state expression. Our results also showed that epigenetic and structural information could be used to infer the directionality of regulatory interactions. Additional improvements in GRN inference can be gleaned from incorporating the result in an informative prior in a dynamic Bayesian algorithm. This is the first study that incorporates copy number and DNA methylation into an informative prior in a dynamic Bayesian framework. By closely examining top-scoring interactions with different sources of epigenetic or structural information, we also identified potential novel regulatory interactions.
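
    The abstract does not give the scoring formula, so the following is only a sketch of the standard first canonical correlation between a block of regulator variables and a block of target variables, computed with NumPy via QR decomposition and singular values. The function name, the variable roles and the toy numbers are assumptions for illustration, and the sketch assumes both blocks have full column rank after centering.

```python
import numpy as np


def first_canonical_correlation(regulators, targets):
    """Largest canonical correlation between two sets of variables.

    Rows are samples; columns are variables. A 1-D input is treated as a
    single variable. Both blocks are assumed to have full column rank.
    """
    X = np.asarray(regulators, dtype=float)
    Y = np.asarray(targets, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    # Center each variable so the result reflects correlation, not means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    singular_values = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(singular_values[0], 0.0, 1.0))


if __name__ == "__main__":
    # Hypothetical data: two regulator variables (e.g. copy number and
    # methylation of one gene) and the expression of a candidate target.
    regulators = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 3]]
    expression = [1.1, 1.9, 3.2, 3.8, 6.1]
    print(first_canonical_correlation(regulators, expression))
```

    With a single variable on each side, this score reduces to the absolute Pearson correlation.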

  11. Phylogeography, genetic variability and structure of Acanthamoeba metapopulations in Iran inferred by 18S ribosomal RNA sequences: A systematic review and meta-analysis.

    Science.gov (United States)

    Spotin, Adel; Moslemzadeh, Hamid Reza; Mahami-Oskouei, Mahmoud; Ahmadpour, Ehsan; Niyyati, Maryam; Hejazi, Seyed Hossein; Memari, Fatemeh; Noori, Jafar

    2017-09-01

    This study aimed to verify the phylogeography and genetic structure of Acanthamoeba populations among Iranian clinical isolates and natural/artificial environments distributed across various regions of the country. We searched electronic databases including Medline, PubMed, Science Direct, Scopus and Google Scholar from 2005 to 2016. To explore the genetic variability of Acanthamoeba sp., 205 sequences were retrieved from keratitis patients, immunosuppressed cases and environmental sources from various regions of Iran. T4 genotype was the predominant strain in Iran, and the rare genotypes belonged to T2, T3, T5 (Acanthamoeba lenticulata), T6, T9, T11, T13 and T15 (Acanthamoeba jacobsi). A total of 47 unique haplotypes of T4 were identified. A parsimonious network of the sequence haplotypes demonstrated a star-like feature containing haplogroups IR6 (34.1%) and IR7 (31.2%) as the most common haplotypes. In accordance with the analysis of molecular variance, the high value of haplotype diversity (0.612-0.848) of Acanthamoeba T4 represented genetic variability within populations. Neutrality indices of the 18S ribosomal RNA demonstrated negative values in all populations, which represented a considerable divergence from neutrality. The majority of genetic diversity belonged to the infected contact lens and dust samples in immunodeficiency and ophthalmology wards, which indicated potential routes for exposure to a pathogenic Acanthamoeba sp. in at-risk individuals. Pairwise fixation index (FST) values ranged from low to high (0.02433-0.41892). Statistically, the FST values indicate that T4 is genetically differentiated between the north-west, north-south and central-south metapopulations, but not between the west-central, west-south, central-south, and north-central isolates. The occurrence of IR6 and IR7 suggests that gene flow of Acanthamoeba T4 possibly occurred after a founder effect or bottleneck event driven by ecological changes or host mobility. This is the first

  12. Mathematical inference and control of molecular networks from perturbation experiments

    Science.gov (United States)

    Mohammed-Rasheed, Mohammed

    One of the main challenges facing biologists and mathematicians in the post-genomic era is to understand the behavior of molecular networks and harness this understanding into an educated intervention of the cell. The cell maintains its function via an elaborate network of interconnecting positive and negative feedback loops of genes, RNA and proteins that send different signals to a large number of pathways and molecules. These structures are referred to as genetic regulatory networks (GRNs) or molecular networks. GRNs can be viewed as dynamical systems with inherent properties and mechanisms, such as steady-state equilibria and stability, that determine the behavior of the cell. The biological relevance of these mathematical concepts is important, as they may predict the differentiation of a stem cell, the maintenance of a normal cell, the development of cancer and its aberrant behavior, and the design of drugs and response to therapy. Uncovering the underlying GRN structure from gene/protein expression data, e.g., microarrays or perturbation experiments, is called inference or reverse engineering of the molecular network. Because of the high cost and time-consuming nature of biological experiments, the number of available measurements or experiments is very small compared to the number of molecules (genes, RNA and proteins). In addition, the observations are noisy, where the noise is due to measurement imperfections as well as the inherent stochasticity of genetic expression levels. Intra-cellular activities and extra-cellular environmental attributes are further sources of variability. Thus, the inference of GRNs is, in general, an under-determined problem with a highly noisy set of observations. The ultimate goal of GRN inference and analysis is to be able to intervene within the network, in order to force it away from undesirable cellular states and into desirable ones. However, it remains a major challenge to design optimal intervention strategies

  13. Inferring genetic diversity and differentiation of the endangered Chinese endemic plant Sauvagesia rhodoleuca (Ochnaceae) using microsatellite markers

    International Nuclear Information System (INIS)

    Chen, Z. Y.; Wei, X.; Jiang, Y. S.; Chai, S. F.

    2015-01-01

    Sauvagesia rhodoleuca is one of the most endangered species in China. It has a narrow distribution in the evergreen broadleaved forest of southern China. To date, only six populations remain, in two provinces. In this study, eight microsatellite loci were used to examine genetic diversity in these populations. We found very low levels of genetic diversity within populations of S. rhodoleuca, with average observed and expected heterozygosity (HO and HE) of 0.069 and 0.186, respectively. Estimated inbreeding coefficients (FIS) within populations were high, which suggests probable selfing in the species. The combination of the UPGMA dendrogram and the INSTRUCT analysis shows that the six extant populations can be classified into three distinct genetic groups, and no pattern of isolation by distance was detected among populations. The low genetic variation within populations and high genetic differentiation among populations indicate that management for the conservation of genetic diversity in S. rhodoleuca should aim to preserve every population. (author)
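
    The abstract reports the average heterozygosities but not the FIS values themselves. As a rough illustration only, the standard relation between the inbreeding coefficient and observed and expected heterozygosity, applied to the reported averages, gives a value consistent with the description of FIS as high. This back-of-the-envelope figure is an assumption-laden sketch: per-population estimates computed by the authors will generally differ from a ratio of averages.

```latex
F_{IS} = 1 - \frac{H_O}{H_E} \approx 1 - \frac{0.069}{0.186} \approx 0.63
```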

  14. Stochastic processes inference theory

    CERN Document Server

    Rao, Malempati M

    2014-01-01

    This is the revised and enlarged 2nd edition of the authors’ original text, which was intended to be a modest complement to Grenander's fundamental memoir on stochastic processes and related inference theory. The present volume gives a substantial account of regression analysis, both for stochastic processes and measures, and includes recent material on Ridge regression with some unexpected applications, for example in econometrics. The first three chapters can be used for a quarter or semester graduate course on inference on stochastic processes. The remaining chapters provide more advanced material on stochastic analysis suitable for graduate seminars and discussions, leading to dissertation or research work. In general, the book will be of interest to researchers in probability theory, mathematical statistics and electrical and information theory.

  15. Genetic analysis identifies the region of origin of smuggled peach palm seeds.

    Science.gov (United States)

    Cristo-Araújo, Michelly; Molles, David Bronze; Rodrigues, Doriane Picanço; Clement, Charles R

    2017-04-01

    Seeds of a plant, supposedly a palm tree known popularly as peach palm (Bactris gasipaes), were seized by the Federal Police in the state of Pará, Brazil, without documentation of legal origin to authorize transportation and marketing in Brazil. They were alleged to be from the western part of Amazonas, Brazil, near the frontier with Peru and Colombia, justifying the lack of documentation. The species was confirmed to be peach palm. To determine the likely place of origin, a genetic analysis was performed to determine the relationship between the seized seeds and representative populations of peach palm from all of Amazonia, maintained in the Peach palm Core Collection, at the National Research Institute for Amazonia, using nine microsatellite loci. Reynolds' coancestry analysis showed a strong relationship between the seeds and the Pampa Hermosa landrace, around Yurimaguas, Peru. The Structure program, used to infer the probability of an individual belonging to a given population, showed that most seeds grouped with populations close to Yurimaguas, Peru, corroborating the coancestry analysis. The Pampa Hermosa landrace is the main source of spineless peach palm seeds used in the Brazilian heart-of-palm agribusiness, which motivated the smugglers to attempt this biopiracy. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. BayesTwin: An R Package for Bayesian Inference of Item-Level Twin Data

    Directory of Open Access Journals (Sweden)

    Inga Schwabe

    2017-11-01

    Full Text Available BayesTwin is an open-source R package that serves as a pipeline to the MCMC program JAGS to perform Bayesian inference on genetically-informative hierarchical twin data. Simultaneously with the biometric model, an item response theory (IRT) measurement model is estimated, allowing analysis of the raw phenotypic (item-level) data. The integration of such a measurement model is important since earlier research has shown that an analysis based on an aggregated measure (e.g., a sum-score based analysis) can lead to an underestimation of heritability and the spurious finding of genotype-environment interactions. The package includes all common biometric and IRT models as well as functions that help plot relevant information or determine whether the analysis was performed well. Funding statement: Partly funded by the PROO grant 411-12-623 from the Netherlands Organisation for Scientific Research (NWO).

  17. A Meta-Analysis of Multiple Matched Copy Number and Transcriptomics Data Sets for Inferring Gene Regulatory Relationships

    Science.gov (United States)

    Newton, Richard; Wernisch, Lorenz

    2014-01-01

    Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247

  18. Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis

    Science.gov (United States)

    Dezfuli, Homayoon; Kelly, Dana; Smith, Curtis; Vedros, Kurt; Galyean, William

    2009-01-01

    This document, Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis, is intended to provide guidelines for the collection and evaluation of risk and reliability-related data. It is aimed at scientists and engineers familiar with risk and reliability methods and provides a hands-on approach to the investigation and application of a variety of risk and reliability data assessment methods, tools, and techniques. This document provides both a broad perspective on data analysis collection and evaluation issues and a narrow focus on the methods to implement a comprehensive information repository. The topics addressed herein cover the fundamentals of how data and information are to be used in risk and reliability analysis models and their potential role in decision making. Understanding these topics is essential to attaining the risk-informed decision-making environment sought by NASA requirements and procedures such as 8000.4 (Agency Risk Management Procedural Requirements), NPR 8705.05 (Probabilistic Risk Assessment Procedures for NASA Programs and Projects), and the System Safety requirements of NPR 8715.3 (NASA General Safety Program Requirements).

  19. An integrated system for genetic analysis

    Directory of Open Access Journals (Sweden)

    Duan Xiao

    2006-04-01

    Full Text Available Background: Large-scale genetic mapping projects require data management systems that can handle complex phenotypes and detect and correct high-throughput genotyping errors, yet are easy to use. Description: We have developed an Integrated Genotyping System (IGS) to meet this need. IGS securely stores, edits and analyses genotype and phenotype data. It stores information about DNA samples, plates, primers, markers and genotypes generated by a genotyping laboratory. Data are structured so that statistical genetic analysis of both case-control and pedigree data is straightforward. Conclusion: IGS can model complex phenotypes and contain genotypes from whole genome association studies. The database makes it possible to integrate genetic analysis with data curation. The IGS web site http://bioinformatics.well.ox.ac.uk/project-igs.shtml contains further information.

  20. The inference from a single case: moral versus scientific inferences in implementing new biotechnologies.

    Science.gov (United States)

    Hofmann, B

    2008-06-01

    Are there similarities between scientific and moral inference? This is the key question in this article. It takes as its point of departure an instance of one person's story in the media changing both Norwegian public opinion and a brand-new Norwegian law prohibiting the use of saviour siblings. The case appears to falsify existing norms and to establish new ones. The analysis of this case reveals similarities in the modes of inference in science and morals, inasmuch as (a) a single case functions as a counter-example to an existing rule; (b) there is a common presupposition of stability, similarity and order, which makes it possible to reason from a few cases to a general rule; and (c) this makes it possible to hold things together and retain order. In science, these modes of inference are referred to as falsification, induction and consistency. In morals, they have a variety of other names. Hence, even without abandoning the fact-value divide, there appear to be similarities between inference in science and inference in morals, which may encourage communication across the boundaries between "the two cultures" and which are relevant to medical humanities.

  1. Optimal inverse magnetorheological damper modeling using shuffled frog-leaping algorithm–based adaptive neuro-fuzzy inference system approach

    Directory of Open Access Journals (Sweden)

    Xiufang Lin

    2016-08-01

    Full Text Available Magnetorheological dampers have become prominent semi-active control devices for vibration mitigation of structures which are subjected to severe loads. However, the damping force cannot be controlled directly due to the inherent nonlinear characteristics of the magnetorheological dampers. Therefore, for fully exploiting the capabilities of the magnetorheological dampers, one of the challenging aspects is to develop an accurate inverse model which can appropriately predict the input voltage to control the damping force. In this article, a hybrid modeling strategy combining shuffled frog-leaping algorithm and adaptive-network-based fuzzy inference system is proposed to model the inverse dynamic characteristics of the magnetorheological dampers for improving the modeling accuracy. The shuffled frog-leaping algorithm is employed to optimize the premise parameters of the adaptive-network-based fuzzy inference system while the consequent parameters are tuned by a least square estimation method, here known as shuffled frog-leaping algorithm-based adaptive-network-based fuzzy inference system approach. To evaluate the effectiveness of the proposed approach, the inverse modeling results based on the shuffled frog-leaping algorithm-based adaptive-network-based fuzzy inference system approach are compared with those based on the adaptive-network-based fuzzy inference system and genetic algorithm–based adaptive-network-based fuzzy inference system approaches. An analysis of variance test is carried out to statistically compare the performance of the proposed methods, and the results demonstrate that the shuffled frog-leaping algorithm-based adaptive-network-based fuzzy inference system strategy outperforms the other two methods in terms of modeling (training) accuracy and checking accuracy.

  2. A Comparative Analysis of Fuzzy Inference Engines in Context of ...

    African Journals Online (AJOL)

    PROF. O. E. OSUAGWU

    Fuzzy Inference engine is an important part of reasoning systems capable of extracting correct conclusions from ... is known as the inference, or rule definition portion, of fuzzy .... minimal set of decision rules based on input- ... The study uses Mamdani FIS model and. Sugeno FIS ... control of induction motor drive. [18] study.

  3. Inference of the Genetic Network Regulating Lateral Root Initiation in Arabidopsis thaliana

    KAUST Repository

    Muraro, D.; Voss, U.; Wilson, M.; Bennett, M.; Byrne, H.; De Smet, I.; Hodgman, C.; King, J.

    2013-01-01

    thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based

  4. Inferring recent historic abundance from current genetic diversity

    NARCIS (Netherlands)

    Palsboll, Per J.; Peery, M. Zachariah; Olsen, Morten T.; Beissinger, Steven R.; Berube, Martine

    Recent historic abundance is an elusive parameter of great importance for conserving endangered species and understanding the pre-anthropogenic state of the biosphere. The number of studies that have used population genetic theory to estimate recent historic abundance from contemporary levels of

  5. Genetic variability and discrimination of low doses of Toxocara spp. from public areas soil inferred by loop-mediated isothermal amplification assay as a field-friendly molecular tool

    OpenAIRE

    Ozlati, Maryam; Spotin, Adel; Shahbazi, Abbas; Mahami-Oskouei, Mahmoud; Hazratian, Teimour; Adibpor, Mohammad; Ahmadpour, Ehsan; Dolatkhah, Afsaneh; Khoshakhlagh, Paria

    2016-01-01

    Abstract: Aim: One of the main diagnostic problems of conventional polymerase chain reaction (PCR) is indiscrimination of low parasitic loads in soil samples. The aim of this study is to determine the genetic diversity and identification of Toxocara spp. from public areas soil inferred by loop-mediated isothermal amplification (LAMP) assay. Materials and Methods: A total of 180 soil samples were collected from various streets and public parks of northwest Iran. The DNA of recovered Toxocara e...

  6. Genetic structure of European populations of Salmo salar L (Atlantic salmon) inferred from mitochondrial DNA

    DEFF Research Database (Denmark)

    Eg Nielsen, Einar; Hansen, Michael Møller; Loeschcke, V.

    1996-01-01

    The genetic relationships between the only natural population of Atlantic salmon (Salmo salar L.) in Denmark and seven other European salmon populations were studied using RFLP analysis of PCR amplified mitochondrial DNA segments. Six different haplotypes were detected by restriction enzyme...

  7. Structural Analysis of Treatment Cycles Representing Transitions between Nursing Organizational Units Inferred from Diabetes

    Science.gov (United States)

    Dehmer, Matthias; Kurt, Zeyneb; Emmert-Streib, Frank; Them, Christa; Schulc, Eva; Hofer, Sabine

    2015-01-01

    In this paper, we investigate treatment cycles inferred from diabetes data by means of graph theory. We define the term treatment cycles graph-theoretically and perform a descriptive as well as quantitative analysis thereof. Also, we interpret our findings in terms of nursing and clinical management. PMID:26030296

  8. A Bayesian Framework That Integrates Heterogeneous Data for Inferring Gene Regulatory Networks

    Energy Technology Data Exchange (ETDEWEB)

    Santra, Tapesh, E-mail: tapesh.santra@ucd.ie [Systems Biology Ireland, University College Dublin, Dublin (Ireland)

    2014-05-20

    Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based method in some circumstances.

  9. A Bayesian Framework That Integrates Heterogeneous Data for Inferring Gene Regulatory Networks

    International Nuclear Information System (INIS)

    Santra, Tapesh

    2014-01-01

    Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based method in some circumstances.

  10. Design of UAV robust autopilot based on adaptive neuro-fuzzy inference system

    Directory of Open Access Journals (Sweden)

    Mohand Achour Touat

    2008-04-01

    Full Text Available This paper is devoted to the application of adaptive neuro-fuzzy inference systems to the robust control of UAV longitudinal motion. The adaptive neuro-fuzzy inference system model needs to be trained on input/output data. These data were obtained from the modeling of a "crisp" robust control system. The synthesis of this system is based on the separation theorem, which defines the structure and parameters of the LQG-optimal controller, followed by robust optimization of this controller based on a genetic algorithm. Such a design procedure can define the rule base and the parameters of the fuzzification and defuzzification algorithms of the adaptive neuro-fuzzy inference system controller, which ensure the robust properties of the control system. Simulation of the closed-loop control system of UAV longitudinal motion with the adaptive neuro-fuzzy inference system controller demonstrates the high efficiency of the proposed design procedure.

  11. Gene set analysis for interpreting genetic studies

    DEFF Research Database (Denmark)

    Pers, Tune H

    2016-01-01

    Interpretation of genome-wide association study (GWAS) results is lagging behind the discovery of new genetic associations. Consequently, there is an urgent need for data-driven methods for interpreting genetic association studies. Gene set analysis (GSA) can identify aetiologic pathways...

  12. Causal inference in economics and marketing.

    Science.gov (United States)

    Varian, Hal R

    2016-07-05

    This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual: a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.
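
    To make the idea of a counterfactual prediction concrete, here is a minimal sketch under stated assumptions: a model is fitted on pre-treatment data relating a treated unit's outcome to an untreated control series, then used to predict what the treated unit would have done after treatment. The abstract does not prescribe this or any other estimator; the one-predictor least-squares fit, the function names and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimated_effects(control_pre, treated_pre, control_post, treated_post):
    """Observed post-treatment outcome minus the predicted counterfactual."""
    slope, intercept = fit_line(control_pre, treated_pre)
    # Counterfactual: what the pre-treatment relationship predicts for the
    # treated unit in the post-treatment periods.
    counterfactual = [intercept + slope * c for c in control_post]
    return [obs - cf for obs, cf in zip(treated_post, counterfactual)]


if __name__ == "__main__":
    # Made-up weekly sales for a control region and a treated region.
    control_pre, treated_pre = [1, 2, 3, 4], [3, 5, 7, 9]
    control_post, treated_post = [5, 6], [14, 16]
    print(estimated_effects(control_pre, treated_pre, control_post, treated_post))
```

    The simple linear fit could be swapped for any predictive model; the estimate is only as credible as the assumption that the pre-treatment relationship would have persisted without the treatment.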

  13. Bayesian Inference Methods for Sparse Channel Estimation

    DEFF Research Database (Denmark)

    Pedersen, Niels Lovmand

    2013-01-01

    This thesis deals with sparse Bayesian learning (SBL) with application to radio channel estimation. As opposed to the classical approach for sparse signal representation, we focus on the problem of inferring complex signals. Our investigations within SBL constitute the basis for the development...... of Bayesian inference algorithms for sparse channel estimation. Sparse inference methods aim at finding the sparse representation of a signal given in some overcomplete dictionary of basis vectors. Within this context, one of our main contributions to the field of SBL is a hierarchical representation...... analysis of the complex prior representation, where we show that the ability to induce sparse estimates of a given prior heavily depends on the inference method used and, interestingly, whether real or complex variables are inferred. We also show that the Bayesian estimators derived from the proposed...

  14. Quantitative genetic analysis of total glucosinolate, oil and protein ...

    African Journals Online (AJOL)

    Quantitative genetic analysis of total glucosinolate, oil and protein contents in Ethiopian mustard ( Brassica carinata A. Braun) ... Seeds were analyzed using HPLC (glucosinolates), NMR (oil) and NIRS (protein). Analyses of variance, Hayman's method of diallel analysis and a mixed linear model of genetic analysis were ...

  15. Intelligent Modeling Combining Adaptive Neuro Fuzzy Inference System and Genetic Algorithm for Optimizing Welding Process Parameters

    Science.gov (United States)

    Gowtham, K. N.; Vasudevan, M.; Maduraimuthu, V.; Jayakumar, T.

    2011-04-01

    Modified 9Cr-1Mo ferritic steel is used as a structural material for steam generator components of power plants. Generally, tungsten inert gas (TIG) welding is preferred for welding of these steels, but the depth of penetration achievable during autogenous welding is limited. Therefore, activated flux TIG (A-TIG) welding, a novel welding technique, has been developed in-house to increase the depth of penetration. In modified 9Cr-1Mo steel joints produced by the A-TIG welding process, weld bead width, depth of penetration, and heat-affected zone (HAZ) width play an important role in determining the mechanical properties as well as the performance of the weld joints during service. To obtain the desired weld bead geometry and HAZ width, it becomes important to set the welding process parameters. In this work, an adaptive neuro-fuzzy inference system is used to develop independent models correlating the welding process parameters like current, voltage, and torch speed with weld bead shape parameters like depth of penetration, bead width, and HAZ width. Then a genetic algorithm is employed to determine the optimum A-TIG welding process parameters to obtain the desired weld bead shape parameters and HAZ width.

  16. Cross-platform comparison of microarray data using order restricted inference

    Science.gov (United States)

    Klinglmueller, Florian; Tuechler, Thomas; Posch, Martin

    2013-01-01

    Motivation: Titration experiments measuring the gene expression from two different tissues, along with total RNA mixtures of the pure samples, are frequently used for quality evaluation of microarray technologies. Such a design implies that the true mRNA expression of each gene is either constant or follows a monotonic trend between the mixtures, lending itself to the use of order restricted inference procedures. Exploiting only the postulated monotonicity of titration designs, we propose three statistical analysis methods for the validation of high-throughput genetic data and corresponding preprocessing techniques. Results: Our methods allow for inference of accuracy, repeatability and cross-platform agreement, with minimal required assumptions regarding the underlying data generating process. Therefore, they are readily applicable to all sorts of genetic high-throughput data independent of the degree of preprocessing. An application to the EMERALD dataset was used to demonstrate how our methods provide a rich spectrum of easily interpretable quality metrics and allow the comparison of different microarray technologies and normalization methods. The results are on par with previous work, but provide additional new insights that cast doubt on the utility of popular preprocessing techniques, specifically concerning the EMERALD project's dataset. Availability: All datasets are available on EBI’s ArrayExpress web site (http://www.ebi.ac.uk/microarray-as/ae/) under accession numbers E-TABM-536, E-TABM-554 and E-TABM-555. Source code implemented in C and R is available at: http://statistics.msi.meduniwien.ac.at/float/cross_platform/. Methods for testing and variance decomposition have been made available in the R-package orQA, which can be downloaded and installed from CRAN http://cran.r-project.org. PMID:21317143
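
    The monotonic-trend assumption behind order restricted inference can be illustrated with the classic pool-adjacent-violators algorithm, which computes the least-squares non-decreasing fit to a sequence of means. This is a generic sketch of that building block, not the authors' test procedures (those are in the orQA package named above); the function name and the expression values are hypothetical.

```python
def isotonic_fit(values, weights=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [mean, total_weight, number_of_points]
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean_b, weight_b, count_b = blocks.pop()
            mean_a, weight_a, count_a = blocks.pop()
            total = weight_a + weight_b
            pooled = (mean_a * weight_a + mean_b * weight_b) / total
            blocks.append([pooled, total, count_a + count_b])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


if __name__ == "__main__":
    # Hypothetical mean log-expression of one gene across four titration
    # mixtures ordered by increasing proportion of tissue A.
    print(isotonic_fit([1.0, 3.0, 2.0, 4.0]))  # -> [1.0, 2.5, 2.5, 4.0]
```

    A decreasing trend can be handled by negating the input and the output; comparing the fitted values with the raw means indicates how strongly a gene departs from the postulated ordering.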

  17. Evaluation and genetic analysis of semi-dwarf mutants in rice (Oryza sativa L.)

    International Nuclear Information System (INIS)

    Awan, M.A.; Cheema, A.A.; Tahir, G.R.

    1984-01-01

    Four semi-dwarf mutants namely DM16-5-1, DM16-5-2, DM-2 and DM107-4 were derived from the local tall basmati cultivar. The mode of reduction of internode length was studied in DM107-4. The reduction in culm length was due to a corresponding but disproportionate reduction in all the internodes. It was inferred that reduction in internode length contributes more towards reduction in height as compared to the reduction in the total number of internodes. The effect of semi-dwarfism on some yield components (panicle characters) was studied in two semi-dwarf mutants viz. DM16-5-1 and DM107-4 compared to Basmati 370. A marginal reduction in the panicle axis, primary branches per panicle, secondary branches per primary branch per panicle, spikelets borne on secondary branches and total number of spikelets per panicle was observed in DM16-5-1, whereas, a significant reduction of these characters was observed in DM107-4. Evaluation of the semi-dwarf mutants with respect to grain yield and harvest index showed that all the mutants possess high yield potential with higher harvest index values compared to the parent cultivar. Genetic analysis for plant height in 4x4 diallel involving semi-dwarf mutants revealed that mutant DM107-4 carries mainly recessive alleles while mutant DM16-5-1 showed some dominance effects as assessed through the estimates of genetic components of variation and Vr,Wr graph analysis. The semi-dwarf mutants have good potential for use as parents in cross-breeding programmes. (author)

  18. Accurate Local-Ancestry Inference in Exome-Sequenced Admixed Individuals via Off-Target Sequence Reads

    Science.gov (United States)

    Hu, Youna; Willer, Cristen; Zhan, Xiaowei; Kang, Hyun Min; Abecasis, Gonçalo R.

    2013-01-01

    Estimates of the ancestry of specific chromosomal regions in admixed individuals are useful for studies of human evolutionary history and for genetic association studies. Previously, this ancestry inference relied on high-quality genotypes from genome-wide association study (GWAS) arrays. These high-quality genotypes are not always available when samples are exome sequenced, and exome sequencing is the strategy of choice for many ongoing genetic studies. Here we show that off-target reads generated during exome-sequencing experiments can be combined with on-target reads to accurately estimate the ancestry of each chromosomal segment in an admixed individual. To reconstruct local ancestry, our method SEQMIX models aligned bases directly instead of relying on hard genotype calls. We evaluate the accuracy of our method through simulations and analysis of samples sequenced by the 1000 Genomes Project and the NHLBI Grand Opportunity Exome Sequencing Project. In African Americans, we show that local-ancestry estimates derived by our method are very similar to those derived with Illumina’s Omni 2.5M genotyping array and much improved in relation to estimates that use only exome genotypes and ignore off-target sequencing reads. Software implementing this method, SEQMIX, can be applied to analysis of human population history or used for genetic association studies in admixed individuals. PMID:24210252

  19. Inference of miRNA targets using evolutionary conservation and pathway analysis

    Directory of Open Access Journals (Sweden)

    van Nimwegen Erik

    2007-03-01

    Background: MicroRNAs have emerged as important regulatory genes in a variety of cellular processes and, in recent years, hundreds of such genes have been discovered in animals. In contrast, functional annotations are available only for a very small fraction of these miRNAs, and even in these cases only partially. Results: We developed a general Bayesian method for the inference of miRNA target sites, in which, for each miRNA, we explicitly model the evolution of orthologous target sites in a set of related species. Using this method we predict target sites for all known miRNAs in flies, worms, fish, and mammals. By comparing our predictions in fly with a reference set of experimentally tested miRNA-mRNA interactions we show that our general method performs at least as well as the most accurate methods available to date, including ones specifically tailored for target prediction in fly. An important novel feature of our model is that it explicitly infers the phylogenetic distribution of functional target sites, independently for each miRNA. This allows us to infer species-specific and clade-specific miRNA targeting. We also show that, in long human 3' UTRs, miRNA target sites occur preferentially near the start and near the end of the 3' UTR. To characterize miRNA function beyond the predicted lists of targets we further present a method to infer significant associations between the sets of targets predicted for individual miRNAs and specific biochemical pathways, in particular those of the KEGG pathway database. We show that this approach retrieves several known functional miRNA-mRNA associations, and predicts novel functions for known miRNAs in cell growth and in development. Conclusion: We have presented a Bayesian target prediction algorithm without any tunable parameters, that can be applied to sequences from any clade of species. The algorithm automatically infers the phylogenetic distribution of functional sites for each miRNA, and

  20. Capturing the spectrum of interaction effects in genetic association studies by simulated evaporative cooling network analysis.

    Directory of Open Access Journals (Sweden)

    Brett A McKinney

    2009-03-01

    Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate when applied to common multigenic diseases, because such methods investigate association with the phenotype only one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data due to their ability to account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. Our results suggest an important

  1. Multi-Objective data analysis using Bayesian Inference for MagLIF experiments

    Science.gov (United States)

    Knapp, Patrick; Glinksy, Michael; Evans, Matthew; Gom, Matth; Han, Stephanie; Harding, Eric; Slutz, Steve; Hahn, Kelly; Harvey-Thompson, Adam; Geissel, Matthias; Ampleford, David; Jennings, Christopher; Schmit, Paul; Smith, Ian; Schwarz, Jens; Peterson, Kyle; Jones, Brent; Rochau, Gregory; Sinars, Daniel

    2017-10-01

    The MagLIF concept has recently demonstrated Gbar pressures and confinement of charged fusion products at stagnation. We present a new analysis methodology that allows for integration of multiple diagnostics including nuclear, x-ray imaging, and x-ray power to determine the temperature, pressure, liner areal density, and mix fraction. A simplified hot-spot model is used with a Bayesian inference network to determine the most probable model parameters that describe the observations while simultaneously revealing the principal uncertainties in the analysis. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

  2. A Network Inference Workflow Applied to Virulence-Related Processes in Salmonella typhimurium

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, Ronald C.; Singhal, Mudita; Weller, Jennifer B.; Khoshnevis, Saeed; Shi, Liang; McDermott, Jason E.

    2009-04-20

    Inference of the structure of mRNA transcriptional regulatory networks, protein regulatory or interaction networks, and protein activation/inactivation-based signal transduction networks are critical tasks in systems biology. In this article we discuss a workflow for the reconstruction of parts of the transcriptional regulatory network of the pathogenic bacterium Salmonella typhimurium based on the information contained in sets of microarray gene expression data now available for that organism, and describe our results obtained by following this workflow. The primary tool is one of the network inference algorithms deployed in the Software Environment for BIological Network Inference (SEBINI). Specifically, we selected the algorithm called Context Likelihood of Relatedness (CLR), which uses the mutual information contained in the gene expression data to infer regulatory connections. The associated analysis pipeline automatically stores the inferred edges from the CLR runs within SEBINI and, upon request, transfers the inferred edges into either Cytoscape or the plug-in Collective Analysis of Biological Interaction Networks (CABIN) tool for further post-analysis of the inferred regulatory edges. The following article presents the outcome of this workflow, as well as the protocols followed for microarray data collection, data cleansing, and network inference. Our analysis revealed several interesting interactions, functional groups, metabolic pathways, and regulons in S. typhimurium.
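
    The CLR idea of scoring a gene pair by how far its mutual information stands out from the background of each gene can be sketched as follows. This is a minimal illustration, not the SEBINI implementation: the histogram-based mutual information estimator, the bin count, and the function names are assumptions made for the example.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information (in nats) between two profiles."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def clr_scores(expr, bins=8):
    """expr: genes x samples array. Returns a symmetric genes x genes score matrix."""
    n = expr.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j], bins)
    # Background statistics per gene, excluding the diagonal.
    off_diag = ~np.eye(n, dtype=bool)
    mean = np.array([mi[i, off_diag[i]].mean() for i in range(n)])
    std = np.array([mi[i, off_diag[i]].std() for i in range(n)])
    std[std == 0] = 1.0
    # z-score of each pair against the row gene's background, clipped at zero.
    z = np.maximum(0.0, (mi - mean[:, None]) / std[:, None])
    # Combine the two per-gene z-scores of a pair into one score.
    scores = np.sqrt(z ** 2 + z.T ** 2)
    np.fill_diagonal(scores, 0.0)
    return scores


# Synthetic data: gene 1 is a noisy copy of gene 0, the rest are independent.
rng = np.random.default_rng(1)
expr = rng.normal(size=(8, 200))
expr[1] = expr[0] + 0.1 * rng.normal(size=200)
scores = clr_scores(expr)
```

    Pairs with the highest scores would be the candidate regulatory edges handed on to a viewer such as Cytoscape; any threshold for calling an edge is a further choice not shown here.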

  3. Accuracy of Demographic Inferences from the Site Frequency Spectrum: The Case of the Yoruba Population.

    Science.gov (United States)

    Lapierre, Marguerite; Lambert, Amaury; Achaz, Guillaume

    2017-05-01

    Some methods for demographic inference based on the observed genetic diversity of current populations rely on the use of summary statistics such as the Site Frequency Spectrum (SFS). Demographic models can be either model-constrained with numerous parameters, such as growth rates, timing of demographic events, and migration rates, or model-flexible, with an unbounded collection of piecewise constant sizes. It is still debated whether demographic histories can be accurately inferred based on the SFS. Here, we illustrate this theoretical issue on an example of demographic inference for an African population. The SFS of the Yoruba population (data from the 1000 Genomes Project) is fit to a simple model of population growth described with a single parameter (e.g., founding time). We infer a time to the most recent common ancestor of 1.7 million years (MY) for this population. However, we show that the Yoruba SFS is not informative enough to discriminate between several different models of growth. We also show that for such simple demographies, the fit of one-parameter models outperforms the stairway plot, a recently developed model-flexible method. The use of this method on simulated data suggests that it is biased by the noise intrinsically present in the data. Copyright © 2017 by the Genetics Society of America.
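
    For readers unfamiliar with the summary statistic, the unfolded SFS simply counts how many polymorphic sites carry the derived allele in exactly i of n sampled sequences. The sketch below computes it from a small made-up 0/1 haplotype matrix and gives the standard neutral, constant-size expectation E[xi_i] = theta / i for comparison; the function names and the toy data are hypothetical and unrelated to the Yoruba analysis.

```python
import numpy as np


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from an n x L matrix of 0 (ancestral) / 1 (derived) alleles.

    Returns an array xi of length n - 1 where xi[i - 1] is the number of sites
    at which exactly i of the n sequences carry the derived allele.
    """
    haplotypes = np.asarray(haplotypes)
    n = haplotypes.shape[0]
    derived_counts = haplotypes.sum(axis=0)
    counts = np.bincount(derived_counts, minlength=n + 1)
    return counts[1:n]  # drop monomorphic classes (0 and n copies)


def expected_sfs_constant_size(n, theta):
    """Expected SFS under the standard neutral model with constant population size."""
    return theta / np.arange(1, n)


# Four sequences, four sites (made-up data).
haps = [[1, 0, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0]]
observed = site_frequency_spectrum(haps)
expected = expected_sfs_constant_size(4, theta=6.0)
```

    Growth models shift this expectation toward an excess of rare variants, which is the kind of signal a one-parameter fit to the SFS tries to capture.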

  4. First regional evaluation of nuclear genetic diversity and population structure in northeastern coyotes (Canis latrans)

    Directory of Open Access Journals (Sweden)

    Javier Monzón

    2014-03-01

    Previous genetic studies of eastern coyotes (Canis latrans) are based on one of two strategies: sampling many individuals using one or very few molecular markers, or sampling very few individuals using many genomic markers. Thus, a regional analysis of genetic diversity and population structure in eastern coyotes using many samples and several molecular markers is lacking. I evaluated genetic diversity and population structure in 385 northeastern coyotes using 16 common single nucleotide polymorphisms (SNPs). A region-wide analysis of population structure revealed three primary genetic populations, but these do not correspond to the same three subdivisions inferred in a previous analysis of mitochondrial DNA sequences. More focused geographic analyses of population structure indicated that ample genetic structure occurs in coyotes from an intermediate contact zone where two range expansion fronts meet. These results demonstrate that genotyping several highly heterozygous SNPs in a large, geographically dense sample is an effective way to detect cryptic population genetic structure. The importance of SNPs in studies of population and wildlife genomics is rapidly increasing; this study adds to the growing body of recent literature that demonstrates the utility of SNPs ascertained from a model organism for evolutionary inference in closely related species.

  5. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

    Science.gov (United States)

    Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset

    2017-01-06

    inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. Previously Journal of proteomics has published multiple studies about other benchmarks of bioinformatics algorithms (PMID: 26585461; PMID: 22728601) in proteomics studies, making clear the importance of those studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Six different algorithms - ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA - were evaluated using the highly customizable workflow on four public datasets with varying complexities. Five popular database search engines Mascot, X!Tandem, MS-GF+ and combinations thereof were evaluated for every protein inference tool. In total >186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) peptides per protein, and 3) the number of uniquely reported proteins per inference method, to address the quality of each inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that 1) using PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical, but also known isoforms of proteins has a small impact on the number of reported proteins. The detection of specific isoforms could

  6. Genetic and morphological contrasts between wild and anthropogenic populations of Agave parryi var. huachucensis in south-eastern Arizona.

    Science.gov (United States)

    Parker, Kathleen C; Trapnell, Dorset W; Hamrick, J L; Hodgson, Wendy C

    2014-05-01

    At least seven species of Agave, including A. parryi, were cultivated prehistorically in Arizona, serving as important sources of food and fibre. Many relict populations from ancient cultivation remain in the modern landscape, offering a unique opportunity to study pre-Columbian plant manipulation practices. This study examined genetic and morphological variation in six A. p. var. huachucensis populations of unknown origin to compare them with previous work on A. parryi populations of known origin, to infer their cultivation history and to determine whether artificial selection is evident in populations potentially managed by early agriculturalists. Six A. p. var. huachucensis and 17 A. parryi populations were sampled, and morphometric, allozyme and microsatellite data were used to compare morphology and genetic structure in purportedly anthropogenic and wild populations, as well as in the two taxa. Analysis of molecular variance and Bayesian clustering were performed to partition variation associated with taxonomic identity and hypothesized evolutionary history, to highlight patterns of similarity among populations and to identify potential wild sources for the planting stock. A. p. var. huachucensis and A. parryi populations differed significantly both morphologically and genetically. Like A. parryi, wild A. p. var. huachucensis populations were more genetically diverse than the inferred anthropogenic populations, with greater expected heterozygosity, percentage of polymorphic loci and number of alleles. Inferred anthropogenic populations exhibited many traits indicative of past active cultivation: greater morphological uniformity, fixed heterozygosity for several loci (non-existent in wild populations), fewer multilocus genotypes and strong differentiation among populations. Where archaeological information is lacking, the genetic signature of many Agave populations in Arizona can be used to infer their evolutionary history and to identify potentially fruitful

  7. Intraspecific relationship within the genus Convolvulus L. inferred by rbcL gene using different phylogenetic approaches

    International Nuclear Information System (INIS)

    Kausar, S.; Qamarunnisa, S.

    2016-01-01

    A molecular systematics analysis was conducted using sequence data of the chloroplast rbcL gene for the genus Convolvulus L., by distance and character based phylogenetic methods. Fifteen representative members of the genus Convolvulus L. were included as the ingroup, whereas two members of the sister family Solanaceae were taken as the outgroup to root the tree. Intraspecific relationships within Convolvulus were inferred by distance matrix, maximum parsimony and Bayesian analysis. The transition/transversion ratio was also calculated, and it revealed that in the investigated Convolvulus species, transitional changes were more prevalent in the rbcL gene. The rbcL gene was observed to be conserved in the present study, as it does not show major variations between the examined species. The distance matrix showed minimal genetic variation between some species (C. glomeratus and C. pyrrhotrichus), thus identifying them as close relatives. The parsimony and Bayesian analyses revealed almost similar clades; however, the maximum parsimony based tree was unable to establish relationships between some Convolvulus species. The Bayesian inference method was found to be the method of choice for establishing intraspecific associations between Convolvulus species using rbcL data, as it clearly defined the connections supported by posterior probability values. (author)
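
    The transition/transversion comparison mentioned above can be made concrete with a short sketch that classifies each mismatch between two aligned sequences. Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes and all other substitutions are transversions. The function name and the two short sequences are made up for illustration and are not rbcL data from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_transitions_transversions(seq1, seq2):
    """Count transitions and transversions between two aligned DNA sequences.

    Positions with gaps or ambiguous bases are skipped.
    Returns a tuple (transitions, transversions).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Made-up aligned fragments: two transitions (A/G, G/A) and one transversion (A/T).
ts, tv = count_transitions_transversions("ACGTACGT", "GCGTTCAT")
ratio = ts / tv if tv else float("inf")
```

    A ratio above 1 for such raw counts would be consistent with the reported prevalence of transitional changes, although published estimates are usually model-corrected rather than simple counts.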

  8. Genetic Pattern and Demographic History of Salminus brasiliensis: Population Expansion in the Pantanal Region during the Pleistocene

    Directory of Open Access Journals (Sweden)

    Lívia A. de Carvalho Mondin

    2018-01-01

    Pleistocene climate changes were major historical events that impacted South American biodiversity. Although the effects of such changes are well-documented for several biomes, it is poorly known how these climate shifts affected the biodiversity of the Pantanal floodplain. Fish are one of the most diverse groups in the Pantanal floodplains and can be taken as a suitable biological model for reconstructing paleoenvironmental scenarios. To identify the effects of Pleistocene climate changes on Pantanal’s ichthyofauna, we used genetic data from multiple populations of a top-predator long-distance migratory fish, Salminus brasiliensis. We specifically investigated whether Pleistocene climate changes affected the demography of this species. If this was the case, we expected to find changes in population size over time. Thus, we assessed the genetic diversity of S. brasiliensis to trace the demographic history of nine populations from the Upper Paraguay basin, which includes the Pantanal floodplain, that form a single genetic group, employing approximate Bayesian computation (ABC) to test five scenarios: constant population, old expansion, old decline, old bottleneck followed by recent expansion, and old expansion followed by recent decline. Based on two mitochondrial DNA markers, our inferences from ABC analysis, the results of Bayesian skyline plot, the implications of star-like networks, and the patterns of genetic diversity (high haplotype diversity and low-to-moderate nucleotide diversity) indicated a sudden population expansion. ABC allowed us to make strong quantitative inferences about the demographic history of S. brasiliensis. We estimated a small ancestral population size that underwent a drastic fivefold expansion, probably associated with the colonization of newly formed habitats. The estimated time of this expansion was consistent with a humid and warm phase as inferred by speleothem growth phases and travertine records during

  9. TYPE Ia SUPERNOVA LIGHT-CURVE INFERENCE: HIERARCHICAL BAYESIAN ANALYSIS IN THE NEAR-INFRARED

    International Nuclear Information System (INIS)

    Mandel, Kaisey S.; Friedman, Andrew S.; Kirshner, Robert P.; Wood-Vasey, W. Michael

    2009-01-01

    We present a comprehensive statistical analysis of the properties of Type Ia supernova (SN Ia) light curves in the near-infrared using recent data from Peters Automated InfraRed Imaging TELescope and the literature. We construct a hierarchical Bayesian framework, incorporating several uncertainties including photometric error, peculiar velocities, dust extinction, and intrinsic variations, for principled and coherent statistical inference. SN Ia light-curve inferences are drawn from the global posterior probability of parameters describing both individual supernovae and the population conditioned on the entire SN Ia NIR data set. The logical structure of the hierarchical model is represented by a directed acyclic graph. Fully Bayesian analysis of the model and data is enabled by an efficient Markov Chain Monte Carlo algorithm exploiting the conditional probabilistic structure using Gibbs sampling. We apply this framework to the JHK_s SN Ia light-curve data. A new light-curve model captures the observed J-band light-curve shape variations. The marginal intrinsic variances in peak absolute magnitudes are σ(M_J) = 0.17 ± 0.03, σ(M_H) = 0.11 ± 0.03, and σ(M_Ks) = 0.19 ± 0.04. We describe the first quantitative evidence for correlations between the NIR absolute magnitudes and J-band light-curve shapes, and demonstrate their utility for distance estimation. The average residual in the Hubble diagram for the training set SNe at cz > 2000 km s^-1 is 0.10 mag. The new application of bootstrap cross-validation to SN Ia light-curve inference tests the sensitivity of the statistical model fit to the finite sample and estimates the prediction error at 0.15 mag. These results demonstrate that SN Ia NIR light curves are as effective as corrected optical light curves, and, because they are less vulnerable to dust absorption, they have great potential as precise and accurate cosmological distance indicators.

  10. Generative inference for cultural evolution.

    Science.gov (United States)

    Kandler, Anne; Powell, Adam

    2018-04-05

    One of the major challenges in cultural evolution is to understand why and how various forms of social learning are used in human populations, both now and in the past. To date, much of the theoretical work on social learning has been done in isolation of data, and consequently many insights focus on revealing the learning processes or the distributions of cultural variants that are expected to have evolved in human populations. In population genetics, recent methodological advances have allowed a greater understanding of the explicit demographic and/or selection mechanisms that underlie observed allele frequency distributions across the globe, and their change through time. In particular, generative frameworks, often using coalescent-based simulation coupled with approximate Bayesian computation (ABC), have provided robust inferences on the human past, with no reliance on a priori assumptions of equilibrium. Here, we demonstrate the applicability and utility of generative inference approaches to the field of cultural evolution. The framework advocated here uses observed population-level frequency data directly to establish the likely presence or absence of particular hypothesized learning strategies. In this context, we discuss the problem of equifinality and argue that, in the light of sparse cultural data and the multiplicity of possible social learning processes, the exclusion of those processes inconsistent with the observed data might be the most instructive outcome. Finally, we summarize the findings of generative inference approaches applied to a number of case studies. This article is part of the theme issue 'Bridging cultural gaps: interdisciplinary studies in human cultural evolution'. © 2018 The Author(s).
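
    The generative logic behind ABC can be sketched with the simplest rejection sampler: draw a parameter from the prior, simulate data, and keep the draw only if the simulated data are close to the observation. The toy model below (a binomial count of a variant with a uniform prior on its frequency) is a hypothetical stand-in for the coalescent or social-learning simulators discussed in the abstract; all names and numbers are assumptions chosen for illustration.

```python
import numpy as np


def abc_rejection(observed, simulate, sample_prior, distance, eps, n_draws, rng):
    """Generic ABC rejection sampler returning the accepted parameter draws."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)              # 1. draw a parameter from the prior
        simulated = simulate(theta, rng)       # 2. generate data under that parameter
        if distance(simulated, observed) <= eps:
            accepted.append(theta)             # 3. keep draws that reproduce the data
    return np.array(accepted)


# Toy example: 30 of 100 sampled individuals carry a variant (made-up numbers).
rng = np.random.default_rng(0)
sample_size = 100
observed_count = 30
posterior = abc_rejection(
    observed=observed_count,
    simulate=lambda p, r: r.binomial(sample_size, p),
    sample_prior=lambda r: r.uniform(0.0, 1.0),
    distance=lambda a, b: abs(a - b),
    eps=2,
    n_draws=20000,
    rng=rng,
)
```

    Competing models can be compared in the same spirit by checking which simulator produces accepted draws at all, which mirrors the argument that excluding processes inconsistent with the data may be the most instructive outcome.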

  11. Admixture mapping of end stage kidney disease genetic susceptibility using estimated mutual information ancestry informative markers

    Directory of Open Access Journals (Sweden)

    Geiger Dan

    2010-10-01

    Background: The question of a genetic contribution to the higher prevalence and incidence of end stage kidney disease (ESKD) among African Americans (AA) remained unresolved, until recent findings using admixture mapping pointed to the association of a genomic locus on chromosome 22 with this disease phenotype. In the current study we utilize this example to demonstrate the utility of applying a multi-step admixture mapping approach. Methods: A multi-step case-only admixture mapping study was designed, consisting of the following steps: 1) Assembly of the sample dataset (ESKD AA); 2) Design of the estimated mutual information ancestry informative markers (n = 2016) screening panel; 3) Genotyping the sample set, whose size was determined by a power analysis (n = 576) appropriate for the initial screening panel; 4) Inference of local ancestry for each individual and identification of regions with increased AA ancestry using two different ancestry inference statistical approaches; 5) Enrichment of the initial screening panel; 6) Power analysis of the enriched panel; 7) Genotyping of additional samples; 8) Re-analysis of the genotyping results to identify a genetic risk locus. Results: The initial screening phase yielded a significant peak using the ADMIXMAP ancestry inference program applying case-only statistics. Subgroup analysis of 299 ESKD patients with no history of diabetes yielded peaks using both the ANCESTRYMAP and ADMIXMAP ancestry inference programs. The significant peak was found on chromosome 22. Genotyping of additional ancestry informative markers on chromosome 22 that took into account linkage disequilibrium in the ancestral populations, and the addition of samples, increased the statistical significance of the finding. Conclusions: A multi-step admixture mapping analysis of AA ESKD patients replicated the finding of a candidate risk locus on chromosome 22, contributing to the heightened susceptibility of African Americans to develop non

  12. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies

    Directory of Open Access Journals (Sweden)

    Hero Alfred

    2010-11-01

    Background: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  13. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies.

    Science.gov (United States)

    Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence

    2010-11-09

    Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
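
    How an Indian Buffet Process prior leaves the number of factors open can be seen by sampling from it directly. In the sketch below, each row is an object (for example a gene) and each column a latent factor; the number of columns is itself random and grows with the data. This illustrates only the prior under the usual sequential "customers and dishes" construction, not the authors' Gibbs or variational inference, and the function name and parameter values are assumptions.

```python
import numpy as np


def sample_ibp(n_objects, alpha, rng):
    """Draw a binary objects x features matrix from the Indian Buffet Process."""
    counts = []  # counts[k] = number of earlier objects that use feature k
    rows = []
    for i in range(1, n_objects + 1):
        # Reuse each existing feature with probability counts[k] / i.
        row = [int(rng.random() < m / i) for m in counts]
        for k, used in enumerate(row):
            counts[k] += used
        # Open a Poisson(alpha / i) number of brand-new features.
        n_new = int(rng.poisson(alpha / i))
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    z = np.zeros((n_objects, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        z[i, :len(row)] = row
    return z


rng = np.random.default_rng(0)
z = sample_ibp(n_objects=10, alpha=2.0, rng=rng)

# The expected number of features is alpha * (1 + 1/2 + ... + 1/N).
n_features = [sample_ibp(10, 2.0, rng).shape[1] for _ in range(2000)]
expected_features = 2.0 * sum(1.0 / i for i in range(1, 11))
```

    In a sparse factor model, a matrix of this kind would act as a mask on the loadings, so the posterior over its number of columns plays the role of the inferred number of factors.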

  14. Construction of the first compendium of chemical-genetic profiles in the fission yeast Schizosaccharomyces pombe and comparative compendium approach

    Energy Technology Data Exchange (ETDEWEB)

    Han, Sangjo [Bioinformatics Lab, Healthcare Group, SK Telecom, 9-1, Sunae-dong, Pundang-gu, Sungnam-si, Kyunggi-do 463-784 (Korea, Republic of); Lee, Minho [Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Chang, Hyeshik [Department of Biological Science, Seoul National University, 599 Gwanakro, Gwanak-gu, Seoul 151-747 (Korea, Republic of); Nam, Miyoung [Department of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon, 305-764 (Korea, Republic of); Park, Han-Oh [Bioneer Corp., 8-11 Munpyeongseo-ro, Daedeok-gu, Daejeon 306-220 (Korea, Republic of); Kwak, Youn-Sig [Department of Applied Biology, Gyeongsang National University, 501 Jinju-daero, Jinju, Gyeongnam 660-701 (Korea, Republic of); Ha, Hye-jeong [Aging Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-Gu, Daejeon 305-806 (Korea, Republic of); Kim, Dongsup [Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Hwang, Sung-Ook [Department of Obstetrics and Gynecology, Inha University Hospital, 7-206 Sinheung-dong, Jung-gu, Incheon 400-711 (Korea, Republic of); Hoe, Kwang-Lae [Department of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon, 305-764 (Korea, Republic of); Kim, Dong-Uk [Aging Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-Gu, Daejeon 305-806 (Korea, Republic of)

    2013-07-12

    Highlights: •The first compendium of chemical-genetic profiles from fission yeast was generated. •The first HTS of drug mode-of-action in fission yeast was performed. •The first comparative chemical genetic analysis between two yeasts was conducted. -- Abstract: Since the construction of the budding yeast deletion library, genome-wide chemical genetic profiles in Saccharomyces cerevisiae have been used successfully to reveal unknown modes of action of drugs. Here, we introduce a comparative approach to infer drug target proteins more accurately using two compendiums of chemical-genetic profiles from the budding yeast S. cerevisiae and the fission yeast Schizosaccharomyces pombe. For the first time, we established DNA-chip based growth defect measurement of genome-wide deletion strains of S. pombe, and then applied 47 drugs to the pooled heterozygous deletion strains to generate chemical-genetic profiles in S. pombe. In our approach, putative drug targets were inferred from strains hypersensitive to given drugs by analyzing the S. pombe and S. cerevisiae compendiums. Notably, evidence in the literature shows that the inferred target genes of fungicides and bactericides identified by this comparative approach are in fact the direct targets. Furthermore, by filtering out the genes with no essentiality, the multi-drug sensitivity genes, and the genes with less eukaryotic conservation, we created a set of drug target gene candidates that are expected to be directly affected by a given drug in human cells. Our study demonstrated that it is highly beneficial to construct multiple compendiums of chemical genetic profiles using many different species. The fission yeast chemical-genetic compendium is available at http://pombe.kaist.ac.kr/compendium.

  15. Analysis of genetic structure and relationship among nine ...

    Indian Academy of Sciences (India)

    These results indicated that the clustering analysis using the Structure program might provide an ..... of the current genetic relations among the breeds, and contribute to ... sis of the genetic structure of the Canary goat populations using.

  16. MP-GeneticSynth: inferring biological network regulations from time series.

    Science.gov (United States)

    Castellini, Alberto; Paltrinieri, Daniele; Manca, Vincenzo

    2015-03-01

    MP-GeneticSynth is a Java tool for discovering the logic and regulation mechanisms responsible for observed biological dynamics in terms of finite difference recurrent equations. The software makes use of: (i) metabolic P systems as a modeling framework, (ii) an evolutionary approach to discover flux regulation functions as linear combinations of given primitive functions, (iii) a suitable reformulation of the least squares method to estimate function parameters considering simultaneously all the reactions involved in complex dynamics. The tool is available as a plugin for the virtual laboratory MetaPlab. It has graphical and interactive interfaces for data preparation, a priori knowledge integration, and flux regulator analysis. Availability and implementation: Source code, binaries, documentation (including quick start guide and videos) and case studies are freely available at http://mplab.sci.univr.it/plugins/mpgs/index.html. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Noise genetics: inferring protein function by correlating phenotype with protein levels and localization in individual human cells.

    Directory of Open Access Journals (Sweden)

    Shlomit Farkash-Amar

    2014-03-01

    Full Text Available To understand gene function, genetic analysis uses large perturbations such as gene deletion, knockdown or over-expression. Large perturbations have drawbacks: they move the cell far from its normal working point, and can thus be masked by off-target effects or compensation by other genes. Here, we offer a complementary approach, called noise genetics. We use natural cell-cell variations in protein level and localization, and correlate them to the natural variations of the phenotype of the same cells. Observing these variations is made possible by recent advances in dynamic proteomics that allow measuring proteins over time in individual living cells. Using motility of human cancer cells as a model system, and time-lapse microscopy on 566 fluorescently tagged proteins, we found 74 candidate motility genes whose level or localization strongly correlate with motility in individual cells. We recovered 30 known motility genes, and validated several novel ones by mild knockdown experiments. Noise genetics can complement standard genetics for a variety of phenotypes.

  18. Timing Analysis of Genetic Logic Circuits using D-VASim

    DEFF Research Database (Denmark)

    Baig, Hasan; Madsen, Jan

    and propagation delay analysis of single as well as cascaded genetic logic circuits can be performed. D-VASim allows the user to change the circuit parameters during runtime simulation to observe their effects on the circuit's timing behavior. The results obtained from D-VASim can be used not only to characterize the timing...... delay analysis may play a very significant role in the designing of genetic logic circuits. In this demonstration, we present the capability of D-VASim (Dynamic Virtual Analyzer and Simulator) to perform the timing and propagation delay analysis of genetic logic circuits. Using D-VASim, the timing...... behavior of genetic logic circuits but also to analyze the timing constraints of cascaded genetic logic circuits....

  19. Genetic analysis of Mexican Criollo cattle populations.

    Science.gov (United States)

    Ulloa-Arvizu, R; Gayosso-Vázquez, A; Ramos-Kuri, M; Estrada, F J; Montaño, M; Alonso, R A

    2008-10-01

    The objective of this study was to evaluate the genetic structure of Mexican Criollo cattle populations using microsatellite genetic markers. DNA samples were collected from 168 animals from four Mexican Criollo cattle populations, geographically isolated in remote areas of Sierra Madre Occidental (West Highlands). Samples were also included from two breeds of Iberian origin, the fighting bull (n = 24) and the milking Central American Criollo (n = 24), and from one Asiatic breed, Guzerat (n = 32). Genetic analysis consisted of estimating the genetic diversity in each population from the allele number and the average expected heterozygosity found in nine microsatellite loci. Furthermore, genetic relationships among the populations were defined by their genetic distances. Our data show that Mexican cattle populations have a relatively high level of genetic diversity, based on both the mean number of alleles (10.2-13.6) and the expected heterozygosity (0.71-0.85). The degree of observed homozygosity within the Criollo populations was remarkable and was probably caused by inbreeding (reduced effective population size), possibly due to reproductive structure within populations. Our data show that considerable genetic differentiation has occurred among the Criollo cattle populations in different regions of Mexico.
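
    The expected heterozygosity mentioned above is commonly computed per locus as one minus the sum of squared allele frequencies. The following is a minimal sketch of that textbook formula, not the authors' pipeline: the abstract does not say which estimator was used (for example, whether a small-sample correction was applied), and the allele sizes below are hypothetical.

```python
from collections import Counter

def expected_heterozygosity(alleles):
    """Expected heterozygosity at one locus: H_e = 1 - sum(p_i ** 2).

    `alleles` is a flat list of allele copies sampled at the locus.
    No small-sample correction is applied (an assumption of this sketch).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def mean_expected_heterozygosity(loci):
    """Average H_e over several loci."""
    return sum(expected_heterozygosity(a) for a in loci) / len(loci)

# Hypothetical microsatellite allele sizes (not data from the study).
locus_a = [120, 120, 124, 124, 128, 128, 132, 132]  # four equally frequent alleles
locus_b = [200, 200, 200, 200, 204, 204, 204, 204]  # two equally frequent alleles

print(expected_heterozygosity(locus_a))                   # 0.75
print(expected_heterozygosity(locus_b))                   # 0.5
print(mean_expected_heterozygosity([locus_a, locus_b]))   # 0.625
```

    More alleles at balanced frequencies push the value toward 1, which is why the allele number and the expected heterozygosity are reported together as diversity measures.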

  20. Bayesian network model for identification of pathways by integrating protein interaction with genetic interaction data.

    Science.gov (United States)

    Fu, Changhe; Deng, Su; Jin, Guangxu; Wang, Xinxin; Yu, Zu-Guo

    2017-09-21

    Molecular interaction data at proteomic and genetic levels provide physical and functional insights into a molecular biosystem and are complementary sources for the construction of pathway structures. Despite advances in inferring biological pathways using genetic interaction data, weaknesses remain in existing models, such as activity pathway networks (APN), when integrating the data from proteomic and genetic levels. It is necessary to develop new methods to infer pathway structure from both kinds of interaction data. We utilized a probabilistic graphical model to develop a new method that integrates genetic interaction and protein interaction data and infers exquisitely detailed pathway structure. We modeled the pathway network as a Bayesian network and applied this model to infer pathways for the coherent subsets of the global genetic interaction profiles, and the available data set of endoplasmic reticulum genes. The protein interaction data were derived from the BioGRID database. Our method can accurately reconstruct known cellular pathway structures, including the SWR complex, ER-Associated Degradation (ERAD) pathway, N-Glycan biosynthesis pathway, Elongator complex, Retromer complex, and Urmylation pathway. By comparing the N-Glycan biosynthesis pathway and Urmylation pathway identified by our approach with those from APN, we found that our method is able to overcome the weakness of APN (certain edges are inexplicable). According to the underlying protein interaction network, we defined a simple scoring function that only adopts genetic interaction information to avoid the balance difficulty in the APN. Using the effective stochastic simulation algorithm, the performance of our proposed method is significantly high. We developed a new method based on Bayesian networks to infer detailed pathway structures from interaction data at proteomic and genetic levels. The results indicate that the developed method performs better in predicting signaling pathways than previously

  1. Statistical inference on residual life

    CERN Document Server

    Jeong, Jong-Hyeon

    2014-01-01

    This is a monograph on the concept of residual life, which is an alternative summary measure of time-to-event data, or survival data. The mean residual life has been used for many years under the name of life expectancy, so it is a natural concept for summarizing survival or reliability data. It is also more interpretable than the popular hazard function, especially for communications between patients and physicians regarding the efficacy of a new drug in the medical field. This book reviews existing statistical methods to infer the residual life distribution. The review and comparison includes existing inference methods for mean and median, or quantile, residual life analysis through medical data examples. The concept of the residual life is also extended to competing risks analysis. The targeted audience includes biostatisticians, graduate students, and PhD (bio)statisticians. Knowledge in survival analysis at an introductory graduate level is advisable prior to reading this book.
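
    For a survival time T, the mean residual life at time t is the expected remaining lifetime among those still at risk, E[T - t | T > t]. The sketch below computes the empirical mean and median residual life from fully observed event times. It assumes no censoring, which real survival data rarely satisfy, so it only illustrates the definition and is not one of the inference methods reviewed in the book. The function names and the sample times are hypothetical.

```python
import statistics

def residuals_beyond(times, t):
    """Remaining lifetimes T - t for subjects whose event time exceeds t."""
    residuals = [x - t for x in times if x > t]
    if not residuals:
        raise ValueError("no observations beyond t")
    return residuals

def mean_residual_life(times, t):
    """Empirical E[T - t | T > t], assuming every event time is observed."""
    return statistics.mean(residuals_beyond(times, t))

def median_residual_life(times, t):
    """Empirical median of T - t given T > t, under the same assumption."""
    return statistics.median(residuals_beyond(times, t))

# Hypothetical uncensored event times (for example, years).
times = [2, 4, 6, 8, 10]

print(mean_residual_life(times, 0))    # 6, the ordinary life expectancy
print(mean_residual_life(times, 5))    # 3, mean of the residuals 1, 3, 5
print(median_residual_life(times, 5))  # 3
```

    At t = 0 the mean residual life reduces to the ordinary mean lifetime, which matches the abstract's remark that the concept has long been used under the name of life expectancy.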

  2. Modeling genetic imprinting effects of DNA sequences with multilocus polymorphism data

    Directory of Open Access Journals (Sweden)

    Staud Roland

    2009-08-01

    Full Text Available Single nucleotide polymorphisms (SNPs) represent the most widespread type of DNA sequence variation in the human genome and they have recently emerged as valuable genetic markers for revealing the genetic architecture of complex traits in terms of nucleotide combination and sequence. Here, we extend an algorithmic model for the haplotype analysis of SNPs to estimate the effects of genetic imprinting expressed at the DNA sequence level. The model provides a general procedure for identifying the number and types of optimal DNA sequence variants that are expressed differently due to their parental origin. The model is used to analyze a genetic data set collected from a pain genetics project. We find that DNA haplotype GAC from three SNPs, OPRKG36T (with two alleles G and T), OPRKA843G (with alleles A and G), and OPRKC846T (with alleles C and T), at the kappa-opioid receptor, triggers a significant effect on pain sensitivity, but with expression significantly depending on the parent from which it is inherited (p = 0.008). With a tremendous advance in SNP identification and automated screening, the model founded on haplotype discovery and statistical inference may provide a useful tool for genetic analysis of any quantitative trait with complex inheritance.

  3. Integrative inference of population history in the Ibero-Maghrebian endemic Pleurodeles waltl (Salamandridae).

    Science.gov (United States)

    Gutiérrez-Rodríguez, Jorge; Barbosa, A Márcia; Martínez-Solano, Íñigo

    2017-07-01

    Inference of population histories from the molecular signatures of past demographic processes is challenging, but recent methodological advances in species distribution models and their integration in time-calibrated phylogeographic studies allow detailed reconstruction of complex biogeographic scenarios. We apply an integrative approach to infer the evolutionary history of the Iberian ribbed newt (Pleurodeles waltl), an Ibero-Maghrebian endemic with populations north and south of the Strait of Gibraltar. We analyzed an extensive multilocus dataset (mitochondrial and nuclear DNA sequences and ten polymorphic microsatellite loci) and found a deep east-west phylogeographic break in Iberian populations dating back to the Plio-Pleistocene. This break is inferred to result from vicariance associated with the formation of the Guadalquivir river basin. In contrast with previous studies, North African populations showed exclusive mtDNA haplotypes, and formed a monophyletic clade within the Eastern Iberian lineage in the mtDNA genealogy. On the other hand, microsatellites failed to recover Moroccan populations as a differentiated genetic cluster. This is interpreted to result from post-divergence gene flow based on the results of IMA2 and Migrate analyses. Thus, Moroccan populations would have originated after overseas dispersal from the Iberian Peninsula in the Pleistocene, with subsequent gene flow in more recent times, implying at least two trans-marine dispersal events. We modeled the distribution of the species and of each lineage, and projected these models back in time to infer climatically favourable areas during the mid-Holocene, the last glacial maximum (LGM) and the last interglacial (LIG), to reconstruct more recent population dynamics. We found minor differences in climatic favourability across lineages, suggesting intraspecific niche conservatism. Genetic diversity was significantly correlated with the intersection of environmental favourability in the LIG and

  4. Bootstrap inference when using multiple imputation.

    Science.gov (United States)

    Schomaker, Michael; Heumann, Christian

    2018-04-16

    Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. It remains however unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available. Copyright © 2018 John Wiley & Sons, Ltd.
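
    One way to combine the two techniques is to draw bootstrap samples first and impute within each sample. The sketch below illustrates that ordering with a percentile interval for a mean. It is an assumption-laden illustration, not a reproduction of any of the 4 methods in the paper, which the abstract does not specify: the hot-deck style imputation, the function names and the data are all hypothetical.

```python
import numpy as np

def impute_once(x, rng):
    """Fill NaNs by sampling from the observed values (simple hot-deck draw)."""
    x = x.copy()
    missing = np.isnan(x)
    if missing.any():
        x[missing] = rng.choice(x[~missing], size=int(missing.sum()), replace=True)
    return x

def boot_mi_interval(x, estimator=np.mean, n_boot=500, n_imp=5, alpha=0.05, seed=0):
    """Percentile interval: bootstrap the incomplete data, impute each sample
    n_imp times, and average the estimates within each bootstrap sample."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        sample = x[rng.integers(0, n, size=n)]   # resample rows with replacement
        if np.isnan(sample).all():               # nothing observed to impute from
            continue
        per_imputation = [estimator(impute_once(sample, rng)) for _ in range(n_imp)]
        estimates.append(np.mean(per_imputation))
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Hypothetical measurements with two missing values.
data = np.array([4.1, 5.0, np.nan, 6.2, 5.5, np.nan, 4.8, 5.9, 5.1, 6.0])
lower, upper = boot_mi_interval(data)
print(f"95% percentile interval for the mean: ({lower:.2f}, {upper:.2f})")
```

    The alternative ordering, imputing first and bootstrapping each imputed data set, would swap the two loops. The abstract reports that not every such combination yields valid inference, so the choice should follow the paper rather than this sketch.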

  5. Multivariate Methods for Meta-Analysis of Genetic Association Studies.

    Science.gov (United States)

    Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

    2018-01-01

    Multivariate meta-analysis of genetic association studies and genome-wide association studies has received remarkable attention, as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we briefly present univariate methods for meta-analysis and then scrutinize multivariate methodologies. Multivariate models of meta-analysis for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes, are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications, and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.

  6. Role of Utility and Inference in the Evolution of Functional Information

    Science.gov (United States)

    Sharov, Alexei A.

    2009-01-01

    Functional information means an encoded network of functions in living organisms from molecular signaling pathways to an organism’s behavior. It is represented by two components: code and an interpretation system, which together form a self-sustaining semantic closure. Semantic closure allows some freedom between components because small variations of the code are still interpretable. The interpretation system consists of inference rules that control the correspondence between the code and the function (phenotype) and determines the shape of the fitness landscape. The utility factor operates at multiple time scales: short-term selection drives evolution towards higher survival and reproduction rate within a given fitness landscape, and long-term selection favors those fitness landscapes that support adaptability and lead to evolutionary expansion of certain lineages. Inference rules make short-term selection possible by shaping the fitness landscape and defining possible directions of evolution, but they are under control of the long-term selection of lineages. Communication normally occurs within a set of agents with compatible interpretation systems, which I call communication system. Functional information cannot be directly transferred between communication systems with incompatible inference rules. Each biological species is a genetic communication system that carries unique functional information together with inference rules that determine evolutionary directions and constraints. This view of the relation between utility and inference can resolve the conflict between realism/positivism and pragmatism. Realism overemphasizes the role of inference in evolution of human knowledge because it assumes that logic is embedded in reality. Pragmatism substitutes usefulness for truth and therefore ignores the advantage of inference. The proposed concept of evolutionary pragmatism rejects the idea that logic is embedded in reality; instead, inference rules are

  7. ERC analysis: web-based inference of gene function via evolutionary rate covariation.

    Science.gov (United States)

    Wolfe, Nicholas W; Clark, Nathan L

    2015-12-01

    The recent explosion of comparative genomics data presents an unprecedented opportunity to construct gene networks via the evolutionary rate covariation (ERC) signature. ERC is used to identify genes that experienced similar evolutionary histories, and thereby draws functional associations between them. The ERC Analysis website allows researchers to exploit genome-wide datasets to infer novel genes in any biological function and to explore deep evolutionary connections between distinct pathways and complexes. The website provides five analytical methods, graphical output, statistical support and access to an increasing number of taxonomic groups. Analyses and data are available at http://csb.pitt.edu/erc_analysis/. Contact: nclark@pitt.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
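
    The idea of rate covariation can be illustrated as a correlation between two genes' evolutionary rates measured on the same branches of a species tree. The sketch below uses a plain Pearson correlation, which is an assumption: the abstract does not describe how the website normalizes rates or computes its statistic. The gene names and rate values are hypothetical.

```python
import numpy as np

def rate_covariation(rates_a, rates_b):
    """Pearson correlation between two genes' per-branch relative rates.

    Both inputs must list the rates in the same branch order.
    """
    return float(np.corrcoef(rates_a, rates_b)[0, 1])

# Hypothetical relative rates on five shared branches (not real data).
gene_x = np.array([0.2, 0.5, 0.9, 1.4, 2.0])
gene_y = 2.0 * gene_x + 0.1      # speeds up and slows down together with gene_x
gene_z = gene_x[::-1]            # fast where gene_x is slow

print(rate_covariation(gene_x, gene_y))  # close to 1: candidate functional partners
print(rate_covariation(gene_x, gene_z))  # negative: no shared rate history
```

    A high value for a gene pair is read as a hint of shared selective pressure, which is how the signature is used to propose functional associations.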

  8. Inferring biological tasks using Pareto analysis of high-dimensional data.

    Science.gov (United States)

    Hart, Yuval; Sheftel, Hila; Hausser, Jean; Szekely, Pablo; Ben-Moshe, Noa Bossel; Korem, Yael; Tendler, Avichai; Mayo, Avraham E; Alon, Uri

    2015-03-01

    We present the Pareto task inference method (ParTI; http://www.weizmann.ac.il/mcb/UriAlon/download/ParTI) for inferring biological tasks from high-dimensional biological data. Data are described as a polytope, and features maximally enriched closest to the vertices (or archetypes) allow identification of the tasks the vertices represent. We demonstrate that human breast tumors and mouse tissues are well described by tetrahedrons in gene expression space, with specific tumor types and biological functions enriched at each of the vertices, suggesting four key tasks.

  9. Microsatellite genotyping reveals high genetic diversity but low ...

    African Journals Online (AJOL)

    JMwacharos

    2016-03-16

    Mar 16, 2016 ... diversity and (2) Investigate population structure and extent of admixture .... to estimate and partition genetic variation within and ... K between 1 and 40 and inferred its most optimal value ... populations of 0.84 ± 0.021 with the lowest mean in ..... on population stratification and the distribution of genetic.

  10. Genetic mapping of centromeres in the nine Citrus clementina chromosomes using half-tetrad analysis and recombination patterns in unreduced and haploid gametes.

    Science.gov (United States)

    Aleza, Pablo; Cuenca, José; Hernández, María; Juárez, José; Navarro, Luis; Ollitrault, Patrick

    2015-03-08

    Mapping centromere locations in plant species provides essential information for the analysis of genetic structures and population dynamics. The centromere's position affects the distribution of crossovers along a chromosome and the parental heterozygosity restitution by 2n gametes is a direct function of the genetic distance to the centromere. Sexual polyploidisation is relatively frequent in Citrus species and is widely used to develop new seedless triploid cultivars. The study's objectives were to (i) map the positions of the centromeres of the nine Citrus clementina chromosomes; (ii) analyse the crossover interference in unreduced gametes; and (iii) establish the pattern of genetic recombination in haploid clementine gametes along each chromosome and its relationship with the centromere location and distribution of genic sequences. Triploid progenies were derived from unreduced megagametophytes produced by second-division restitution. Centromere positions were mapped genetically for all linkage groups using half-tetrad analysis. Inference of the physical locations of centromeres revealed one acrocentric, four metacentric and four submetacentric chromosomes. Crossover interference was observed in unreduced gametes, with variation seen between chromosome arms. For haploid gametes, a strong decrease in the recombination rate occurred in centromeric and pericentromeric regions, which contained a low density of genic sequences. In chromosomes VIII and IX, these low recombination rates extended beyond the pericentromeric regions. The genomic region corresponding to a genetic distance recombination pattern along each chromosome. However, regions with low recombination rates extended beyond the pericentromeric regions of some chromosomes into areas richer in genic sequences. The persistence of strong linkage disequilibrium between large numbers of genes promotes the stability of epistatic interactions and multilocus-controlled traits over successive generations but

  11. Using Genetic Distance to Infer the Accuracy of Genomic Prediction.

    Directory of Open Access Journals (Sweden)

    Marco Scutari

    2016-09-01

    Full Text Available The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest; and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.

  12. A Bayesian approach to inferring the genetic population structure of sugarcane accessions from INTA (Argentina)

    Directory of Open Access Journals (Sweden)

    Mariana Inés Pocovi

    2015-06-01

    Full Text Available Understanding the population structure and genetic diversity in sugarcane (Saccharum officinarum L.) accessions from the INTA germplasm bank (Argentina) will be of great importance for germplasm collection and breeding improvement, as it will identify diverse parental combinations to create segregating progenies with maximum genetic variability for further selection. A Bayesian approach, ordination methods (PCoA, Principal Coordinate Analysis) and clustering analysis (UPGMA, Unweighted Pair Group Method with Arithmetic Mean) were applied for this purpose. Sixty-three INTA sugarcane hybrids were genotyped for 107 Simple Sequence Repeat (SSR) and 136 Amplified Fragment Length Polymorphism (AFLP) loci. Given the low probability values found with AFLP for individual assignment (4.7%), microsatellites seemed to perform better (54%) for STRUCTURE analysis, which revealed that the germplasm forms five optimum groups that partly correspond to their origin. Although the clusters showed a high degree of admixture, FST values confirmed the existence of differences among groups. Dissimilarity coefficients ranged from 0.079 to 0.651. PCoA separated sugarcane into groups that did not agree with those identified by STRUCTURE. The clustering including all genotypes likewise showed no resemblance to the populations found by STRUCTURE, but clustering performed considering only individuals displaying a proportional membership > 0.6 in their primary population obtained with STRUCTURE showed close similarities. The Bayesian method indubitably brought more information on cultivar origins than the classical PCoA and hierarchical clustering methods.

  13. Gene expression inference with deep learning.

    Science.gov (United States)

    Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui

    2016-06-15

    Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationships between expressions of genes. We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. D-GEX is available at https://github.com/uci-cbcl/D-GEX. Contact: xhx@ics.uci.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
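
    The comparison above rests on two quantities: the mean absolute error of each method and the relative improvement of one over the other. The sketch below shows how such figures could be computed. It assumes relative improvement means the reduction in error divided by the baseline error, a definition the abstract does not spell out. The helper names and the toy values are hypothetical and unrelated to the D-GEX code base.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and inferred expression."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def relative_improvement(baseline_error, model_error):
    """Fraction by which the model reduces the baseline error (assumed definition)."""
    return (baseline_error - model_error) / baseline_error

# Toy expression values for four target genes (not data from the study).
measured   = [1.0, 2.0, 3.0, 4.0]
baseline   = [1.5, 2.5, 2.5, 4.5]      # stand-in for a linear-regression prediction
candidate  = [1.25, 2.25, 2.75, 4.25]  # stand-in for a nonlinear model's prediction

mae_baseline = mean_absolute_error(measured, baseline)    # 0.5
mae_candidate = mean_absolute_error(measured, candidate)  # 0.25
print(relative_improvement(mae_baseline, mae_candidate))  # 0.5, i.e. 50%
```

    Under this definition, a 15.33% relative improvement would mean the deep learning error is about 0.85 times the linear-regression error.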

  14. Russell and Humean Inferences

    Directory of Open Access Journals (Sweden)

    João Paulo Monteiro

    2001-12-01

    Full Text Available Russell's The Problems of Philosophy tries to establish a new theory of induction, at the same time that Hume is there accused of an irrational "scepticism about induction". But a careful analysis of the theory of knowledge explicitly acknowledged by Hume reveals that, contrary to the standard interpretation in the XXth century, possibly influenced by Russell, Hume deals exclusively with causal inference (which he never classifies as "causal induction", although now we are entitled to do so), never with inductive inference in general, mainly generalizations about sensible qualities of objects (whether, e.g., "all crows are black" or not is not among Hume's concerns). Russell's theories are thus only false alternatives to Hume's, in (1912) or in his (1948).

  15. Analysis of genetic relationships of mulberry (Morus L.) germplasm ...

    African Journals Online (AJOL)

    2009-06-03

    Jun 3, 2009 ... Full Length Research Paper. Analysis of genetic ... Key words: Mulberry, molecular marker, genetic diversity, SRAP. ... Europe, North and South America, and Africa, and it is cultivated ... Xingjiang autonomous region, China.

  16. Genetic diversity in India and the inference of Eurasian population expansion.

    Science.gov (United States)

    Xing, Jinchuan; Watkins, W Scott; Hu, Ya; Huff, Chad D; Sabo, Aniko; Muzny, Donna M; Bamshad, Michael J; Gibbs, Richard A; Jorde, Lynn B; Yu, Fuli

    2010-01-01

    Genetic studies of populations from the Indian subcontinent are of great interest because of India's large population size, complex demographic history, and unique social structure. Despite recent large-scale efforts in discovering human genetic variation, India's vast reservoir of genetic diversity remains largely unexplored. To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100-kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90 to 110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans. Our results show that Indian populations harbor large amounts of genetic variation that have not been surveyed adequately by public SNP discovery efforts. Our data also support a delayed expansion hypothesis in which an ancestral Eurasian founding population remained isolated long after the out-of-Africa diaspora, before expanding throughout Eurasia. © 2010 Xing et al.; licensee BioMed Central Ltd.

  17. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses.

    Science.gov (United States)

    Deng, Yangqing; Pan, Wei

    2017-12-01

    There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the

  18. QuASAR: quantitative allele-specific analysis of reads.

    Science.gov (United States)

    Harvey, Chris T; Moyerbrailean, Gregory A; Davis, Gordon O; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

    2015-04-15

    Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no methods that jointly infer genotypes and conduct ASE inference while considering uncertainty in the genotype calls. We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. Availability: http://github.com/piquelab/QuASAR. Contact: fluca@wayne.edu or rpique@wayne.edu. Supplementary Material is available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
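
    The baseline test mentioned in this abstract (read counts at a heterozygous site against a 1:1 allelic ratio) can be sketched as an exact two-sided binomial test. This is only an illustration of that baseline, not the QuASAR model, which additionally accounts for genotype uncertainty, base-call errors and over-dispersion. The function name and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for H0: P(read carries ref allele) = p_null."""
    n = ref_reads + alt_reads
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum all outcomes that are no more likely than the observed one.
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical counts: 18 reference reads and 2 alternate reads at one site.
    print(binomial_two_sided_p(18, 2))
```

    A small p-value suggests allelic imbalance at that site, provided the site really is heterozygous and reads are independent, which are the assumptions this simple test cannot check.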

  19. Mixed normal inference on multicointegration

    NARCIS (Netherlands)

    Boswijk, H.P.

    2009-01-01

    Asymptotic likelihood analysis of cointegration in I(2) models, see Johansen (1997, 2006), Boswijk (2000) and Paruolo (2000), has shown that inference on most parameters is mixed normal, implying hypothesis test statistics with an asymptotic χ² null distribution. The asymptotic distribution of the

  20. Genetic analysis of repeated, biparental, diploid, hydatidiform moles

    DEFF Research Database (Denmark)

    Sunde, Lone; Vejerslev, Lars O.; Jensen, Mie Poulsen

    1993-01-01

    A woman presented with five consecutive pregnancies displaying molar morphology. In the fifth pregnancy, a non-malformed, liveborn infant was delivered. Genetic analyses (RFLP analysis, cytogenetics, flow cytometry) were performed in pregnancies II-V. It was demonstrated that these pregnancies originated in separate conceptions, all conceptuses were diploid, and all had maternally as well as paternally derived genetic markers. By cytogenetic analysis, aberrant heteromorphisms were noted; no other abnormalities were observed in chromosome structure or in DNA sequence. Many different causes for the abnormal development can be envisaged, environmental as well as genetic. To conform to current ideas of molar pathogenesis, it is suggested that the present conceptuses might have arisen from imbalances in imprinted genomic regions. This could be a consequence of uniparental disomy in critical regions...

  1. Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis

    Science.gov (United States)

    Objective: To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. Methods: The conditional inference tree analysis, a data mining approach, was used to con...

  2. Genetic diversity of popcorn genotypes using molecular analysis.

    Science.gov (United States)

    Resh, F S; Scapim, C A; Mangolin, C A; Machado, M F P S; do Amaral, A T; Ramos, H C C; Vivas, M

    2015-08-19

    In this study, we analyzed dominant molecular markers to estimate the genetic divergence of 26 popcorn genotypes and evaluate whether using various dissimilarity coefficients with these dominant markers influences the results of cluster analysis. Fifteen random amplification of polymorphic DNA primers produced 157 amplified fragments, of which 65 were monomorphic and 92 were polymorphic. To calculate the genetic distances among the 26 genotypes, the complements of the Jaccard, Dice, and Rogers and Tanimoto similarity coefficients were used. A matrix of Dij values (dissimilarity matrix) was constructed, from which the genetic distances among genotypes were represented in a more simplified manner as a dendrogram generated using the unweighted pair-group method with arithmetic average. Clusters determined by molecular analysis generally did not group material from the same parental origin together. The largest genetic distance was between varieties 17 (UNB-2) and 18 (PA-091). In the identification of genotypes with the smallest genetic distance, the 3 coefficients showed no agreement. The 3 dissimilarity coefficients showed no major differences among their grouping patterns because agreement in determining the genotypes with large, medium, and small genetic distances was high. The largest genetic distances were observed for the Rogers and Tanimoto dissimilarity coefficient (0.74), followed by the Jaccard coefficient (0.65) and the Dice coefficient (0.48). The 3 coefficients showed similar estimations for the cophenetic correlation coefficient. Correlations among the matrices generated using the 3 coefficients were positive and had high magnitudes, reflecting strong agreement among the results obtained using the 3 evaluated dissimilarity coefficients.
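
    The three dissimilarity coefficients compared in this abstract can be computed from presence/absence (1/0) band profiles of dominant markers. The sketch below uses the standard textbook definitions of the Jaccard, Dice and Rogers and Tanimoto similarity coefficients and returns their complements; the band profiles and the function name are made up for illustration and are not data from the study.

```python
def dissimilarities(x, y):
    """Return (Jaccard, Dice, Rogers-Tanimoto) dissimilarities for two 0/1 profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of markers")
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # band present in both
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # band absent in both
    m = len(x) - a - d                                     # mismatches (b + c)
    if a + m == 0:
        raise ValueError("no bands present in either profile")
    jaccard = 1 - a / (a + m)
    dice = 1 - 2 * a / (2 * a + m)
    rogers_tanimoto = 1 - (a + d) / (a + d + 2 * m)
    return jaccard, dice, rogers_tanimoto


if __name__ == "__main__":
    # Hypothetical band profiles for two genotypes scored at eight fragments.
    g1 = [1, 1, 0, 0, 1, 0, 1, 0]
    g2 = [1, 0, 0, 1, 1, 0, 0, 0]
    print(dissimilarities(g1, g2))
```

    Jaccard and Dice ignore shared absences, whereas Rogers and Tanimoto counts them and double-weights mismatches, so the three coefficients can differ in magnitude even when they rank pairs of genotypes similarly.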

  3. Determination of genetic relatedness from low-coverage human genome sequences using pedigree simulations.

    Science.gov (United States)

    Martin, Michael D; Jay, Flora; Castellano, Sergi; Slatkin, Montgomery

    2017-08-01

    We develop and evaluate methods for inferring relatedness among individuals from low-coverage DNA sequences of their genomes, with particular emphasis on sequences obtained from fossil remains. We suggest the major factors complicating the determination of relatedness among ancient individuals are sequencing depth, the number of overlapping sites, the sequencing error rate and the presence of contamination from present-day genetic sources. We develop a theoretical model that facilitates the exploration of these factors and their relative effects, via measurement of pairwise genetic distances, without calling genotypes, and determine the power to infer relatedness under various scenarios of varying sequencing depth, present-day contamination and sequencing error. The model is validated by a simulation study as well as the analysis of aligned sequences from present-day human genomes. We then apply the method to the recently published genome sequences of ancient Europeans, developing a statistical treatment to determine confidence in assigned relatedness that is, in some cases, more precise than previously reported. As the majority of ancient specimens are from animals, this method would be applicable to investigate kinship in nonhuman remains. The developed software grups (Genetic Relatedness Using Pedigree Simulations) is implemented in Python and freely available. © 2017 John Wiley & Sons Ltd.

  4. RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry Inference

    Science.gov (United States)

    Maples, Brian K.; Gravel, Simon; Kenny, Eimear E.; Bustamante, Carlos D.

    2013-01-01

    Local-ancestry inference is an important step in the genetic analysis of fully sequenced human genomes. Current methods can only detect continental-level ancestry (i.e., European versus African versus Asian) accurately even when using millions of markers. Here, we present RFMix, a powerful discriminative modeling approach that is faster (∼30×) and more accurate than existing methods. We accomplish this by using a conditional random field parameterized by random forests trained on reference panels. RFMix is capable of learning from the admixed samples themselves to boost performance and autocorrect phasing errors. RFMix shows high sensitivity and specificity in simulated Hispanics/Latinos and African Americans and admixed Europeans, Africans, and Asians. Finally, we demonstrate that African Americans in HapMap contain modest (but nonzero) levels of Native American ancestry (∼0.4%). PMID:23910464

  5. Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction

    Directory of Open Access Journals (Sweden)

    Seong-Gon Kim

    2011-06-01

    Several techniques such as Neural Networks, Genetic Algorithms, Decision Trees and other statistical or heuristic methods have been used in the past to approach the complex non-linear task of predicting the alpha-helices, beta-sheets and turns of a protein's secondary structure. This project introduces a new machine learning method that uses offline-trained Multilayered Perceptrons (MLP) as the likelihood models within a Bayesian Inference framework to predict the secondary structures of proteins. Varying window sizes are used to extract neighboring amino acid information, which is passed back and forth between the Neural Net models and the Bayesian Inference process until the posterior secondary structure probability converges.

  6. RESEARCH NOTE Molecular genetic analysis of consanguineous ...

    Indian Academy of Sciences (India)

    Navya

    Molecular genetic analysis of consanguineous families with primary microcephaly ... Translational Research Institute, Academic Health System, Hamad Medical ..... bridging the gap between homozygosity mapping and deep sequencing.

  7. An analysis of line-drawings based upon automatically inferred grammar and its application to chest x-ray images

    International Nuclear Information System (INIS)

    Nakayama, Akira; Yoshida, Yuuji; Fukumura, Teruo

    1984-01-01

    Grammatical inference can be used as a technique for analyzing image structure. Applying it to naturally obtained images raises several problems, because no practical grammatical technique for two-dimensional images has been established. The authors developed a technique that addresses these problems, aimed mainly at the automated structure analysis of naturally obtained images. The first half of the paper describes the automatic inference of a line-drawing generation grammar and the line-drawing analysis based on that inference. The second half reports on an actual analysis. The proposed technique extracts object line drawings from line drawings that contain noise. Its effectiveness was evaluated by extracting rib center lines from thin-line chest X-ray images of practical scale and complexity. In this example, the total number of characteristic points (ends, branch points and intersections) composing the line drawings was 377 per image, and the total number of line segments was 566 on average per image. The extraction ratio was 86.6%, which seems adequate given the complexity of the input line drawings. The result was also compared with the rib center lines identified by the automatic screening system AISCR-V3, representing the conventional processing technique, and was satisfactory given the versatility of the method. (Wakatsuki, Y.)

  8. A genetic analysis of Trichuris trichiura and Trichuris suis from Ecuador

    DEFF Research Database (Denmark)

    Meekums, Hayley; Hawash, Mohamed B F; Sparks, Alexandra M

    2015-01-01

    BACKGROUND: Since the nematodes Trichuris trichiura and T. suis are morphologically indistinguishable, genetic analysis is required to assess epidemiological cross-over between people and pigs. This study aimed to clarify the transmission biology of trichuriasis in Ecuador. FINDINGS: Adult Trichuris worms were collected during a parasitological survey of 132 people and 46 pigs in Esmeraldas Province, Ecuador. Morphometric analysis of 49 pig worms and 64 human worms revealed significant variation. In discriminant analysis morphometric characteristics correctly classified male worms according ... to genetically analyse Trichuris parasites. Although T. trichiura does not appear to be zoonotic in Ecuador, there is evidence of genetic exchange between T. trichiura and T. suis warranting more detailed genetic sampling.

  9. Inverse Analysis of Pavement Structural Properties Based on Dynamic Finite Element Modeling and Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Xiaochao Tang

    2013-03-01

    With the movement towards the implementation of the mechanistic-empirical pavement design guide (MEPDG), an accurate determination of pavement layer moduli is vital for predicting critical mechanistic pavement responses. A backcalculation procedure is commonly used to estimate the pavement layer moduli based on non-destructive falling weight deflectometer (FWD) tests. Backcalculation of flexible pavement layer properties is an inverse problem with known input and output signals, based upon which unknown parameters of the pavement system are evaluated. In this study, an inverse analysis procedure that combines finite element analysis and a population-based optimization technique, the Genetic Algorithm (GA), has been developed to determine the pavement layer structural properties. A lightweight deflectometer (LWD) was used to infer the moduli of instrumented three-layer scaled flexible pavement models. While the common practice in backcalculating pavement layer properties still assumes a static FWD load and uses only peak values of the load and deflections, dynamic analysis was conducted to simulate the impulse LWD load. The recorded time histories of the LWD load were used as the known inputs into the pavement system, while the time histories of surface central deflections and subgrade deflections measured with linear variable differential transformers (LVDTs) were considered as the outputs. As a result, consistent pavement layer moduli can be obtained through this inverse analysis procedure.

  10. Evaluation of genetic diversity of Portuguese Pinus sylvestris L.

    Indian Academy of Sciences (India)

    Evaluation of genetic diversity of Portuguese Pinus sylvestris L. populations based on molecular data and inferences about the future use of this germplasm. J. Cipriano A. Carvalho C. Fernandes M. J. Gaspar J. Pires J. Bento L. Roxo J. Louzada J. Lima- ...

  11. Network inference via adaptive optimal design

    Directory of Open Access Journals (Sweden)

    Stigter Johannes D

    2012-09-01

    Background: Current research in network reverse engineering for genetic or metabolic networks very often does not include a proper experimental and/or input design. In this paper we address this issue in more detail and suggest a method that includes an iterative design of experiments based on the most recent data that become available. The presented approach allows a reliable reconstruction of the network and addresses an important issue, i.e., the analysis and the propagation of uncertainties as they exist in both the data and in our own knowledge. These two types of uncertainties have immediate ramifications for the uncertainties in the parameter estimates and, hence, are taken into account from the very beginning of our experimental design. Findings: The method is demonstrated for two small networks: a genetic network for mRNA synthesis and degradation, and an oscillatory network describing a molecular network underlying adenosine 3'-5' cyclic monophosphate (cAMP) as observed in populations of Dictyostelium cells. In both cases a substantial reduction in parameter uncertainty was observed. Extension to larger scale networks is possible but needs a more rigorous parameter estimation algorithm that includes sparsity as a constraint in the optimization procedure. Conclusion: We conclude that a careful experiment design very often (but not always) pays off in terms of reliability in the inferred network topology. For large scale networks a better parameter estimation algorithm is required that includes sparsity as an additional constraint. These algorithms are available in the literature and can also be used in an adaptive optimal design setting as demonstrated in this paper.

  12. Simultaneous inference of phenotype-associated genes and relevant tissues from GWAS data via Bayesian integration of multiple tissue-specific gene networks.

    Science.gov (United States)

    Wu, Mengmeng; Lin, Zhixiang; Ma, Shining; Chen, Ting; Jiang, Rui; Wong, Wing Hung

    2017-12-01

    Although genome-wide association studies (GWAS) have successfully identified thousands of genomic loci associated with hundreds of complex traits in the past decade, the debate about problems such as missing heritability and weak interpretability has called for effective computational methods to facilitate the advanced analysis of the vast volume of existing and anticipated genetic data. Towards this goal, gene-level integrative GWAS analysis, which assumes that genes associated with a phenotype tend to be enriched in biological gene sets or gene networks, has recently attracted much attention because of advantages such as straightforward interpretation, a lower multiple-testing burden, and robustness across studies. However, existing methods in this category usually exploit non-tissue-specific gene networks and thus cannot make use of informative tissue-specific characteristics. To overcome this limitation, we proposed a Bayesian approach called SIGNET (Simultaneously Inference of GeNEs and Tissues) to integrate GWAS data and multiple tissue-specific gene networks for the simultaneous inference of phenotype-associated genes and relevant tissues. Through extensive simulation studies, we showed the effectiveness of our method in finding both associated genes and relevant tissues for a phenotype. In applications to real GWAS data of 14 complex phenotypes, we demonstrated the power of our method in both deciphering the genetic basis of a phenotype and discovering biological insights into it. With this understanding, we expect SIGNET to be a valuable tool for integrative GWAS analysis, thereby boosting the prevention, diagnosis, and treatment of human inherited diseases and eventually facilitating precision medicine.

  13. Quantifying population genetic differentiation from next-generation sequencing data

    DEFF Research Database (Denmark)

    Fumagalli, Matteo; Garrett Vieira, Filipe Jorge; Korneliussen, Thorfinn Sand

    2013-01-01

    Over the last few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data... method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy to investigate population structure via Principal Components Analysis. Through extensive simulations, we compare the new method herein proposed to approaches based on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled...

  14. Analysis of genetic structure in Melia volkensii (Gurke.) populations ...

    African Journals Online (AJOL)

    2Farm Forestry Programme, Kenya Forestry Research Institute, P. O. Box 20412, Nairobi, Kenya. Accepted 5 ... were used to estimate genetic distances between populations and for construction of neighbour-joining phenograms. Analysis of Molecular Variance (AMOVA) indicated significant genetic differentiation between ...

  15. Structural model analysis of multiple quantitative traits.

    Directory of Open Access Journals (Sweden)

    Renhua Li

    2006-07-01

    We introduce a method for the analysis of multilocus, multitrait genetic data that provides an intuitive and precise characterization of genetic architecture. We show that it is possible to infer the magnitude and direction of causal relationships among multiple correlated phenotypes and illustrate the technique using body composition and bone density data from mouse intercross populations. Using these techniques we are able to distinguish genetic loci that affect adiposity from those that affect overall body size and thus reveal a shortcoming of standardized measures such as body mass index that are widely used in obesity research. The identification of causal networks sheds light on the nature of genetic heterogeneity and pleiotropy in complex genetic systems.

  16. Entropic Inference

    Science.gov (United States)

    Caticha, Ariel

    2011-03-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops (the Maximum Entropy and the Bayesian methods) into a single general inference scheme.
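
    For orientation, the logarithmic relative entropy referred to in this abstract is usually written as below. The notation is an assumption chosen for illustration (q is the prior, p a candidate posterior, x the variable of interest) and is not quoted from the tutorial itself.

```latex
% Relative entropy of a candidate posterior p with respect to the prior q.
S[p, q] = -\int dx \, p(x) \, \log \frac{p(x)}{q(x)}
% The updated distribution is the p that maximizes S[p, q]
% subject to normalization and to the constraints that encode the new information.
```

    With a uniform prior q and expected-value constraints this reduces to the familiar MaxEnt prescription, which is one of the two special cases the abstract mentions.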

  17. Statistical learning and selective inference.

    Science.gov (United States)

    Taylor, Jonathan; Tibshirani, Robert J

    2015-06-23

    We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.

  18. Efficient Bayesian inference for ARFIMA processes

    Science.gov (United States)

    Graves, T.; Gramacy, R. B.; Franzke, C. L. E.; Watkins, N. W.

    2015-03-01

    Many geophysical quantities, like atmospheric temperature, water levels in rivers, and wind speeds, have shown evidence of long-range dependence (LRD). LRD means that these quantities experience non-trivial temporal memory, which potentially enhances their predictability, but also hampers the detection of externally forced trends. Thus, it is important to reliably identify whether or not a system exhibits LRD. In this paper we present a modern and systematic approach to the inference of LRD. Rather than Mandelbrot's fractional Gaussian noise, we use the more flexible Autoregressive Fractional Integrated Moving Average (ARFIMA) model which is widely used in time series analysis, and of increasing interest in climate science. Unlike most previous work on the inference of LRD, which is frequentist in nature, we provide a systematic treatment of Bayesian inference. In particular, we provide a new approximate likelihood for efficient parameter inference, and show how nuisance parameters (e.g. short memory effects) can be integrated over in order to focus on long memory parameters, and hypothesis testing more directly. We illustrate our new methodology on the Nile water level data, with favorable comparison to the standard estimators.
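
    The long-memory part of an ARFIMA model is the fractional differencing operator (1 - B)^d, where B is the backshift operator and d the memory parameter. The sketch below expands that operator into its filter weights and applies it to a series. It illustrates the standard operator only, not the approximate likelihood or the Bayesian procedure of the paper, and the function names and example values are hypothetical.

```python
def frac_diff_weights(d: float, n_terms: int) -> list:
    """First n_terms coefficients of the expansion of (1 - B)^d in powers of B."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Recursion for the binomial series: w_k = w_{k-1} * (k - 1 - d) / k.
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d: float) -> list:
    """Apply (1 - B)^d to a series, truncating the filter at the start of the data."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


if __name__ == "__main__":
    # For 0 < d < 0.5 the weights decay slowly, which is what produces long memory.
    print(frac_diff_weights(0.4, 5))
    # d = 1 recovers ordinary first differencing (apart from the first value).
    print(frac_diff([1.0, 3.0, 6.0, 10.0], 1.0))
```

    Truncating the filter at the first observation is a simplification; how the pre-sample values are treated is one of the details a full likelihood has to handle.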

  19. Inferring the gene network underlying the branching of tomato inflorescence.

    Directory of Open Access Journals (Sweden)

    Laura Astola

    The architecture of tomato inflorescence strongly affects flower production and subsequent crop yield. To understand the genetic activities involved, insight into the underlying network of genes that initiate and control the sympodial growth in the tomato is essential. In this paper, we show how the structure of this network can be derived from available data of the expressions of the involved genes. Our approach starts from employing biological expert knowledge to select the most probable gene candidates behind branching behavior. To find how these genes interact, we develop a stepwise procedure for computational inference of the network structure. Our data consists of expression levels from primary shoot meristems, measured at different developmental stages on three different genotypes of tomato. With the network inferred by our algorithm, we can explain the dynamics corresponding to all three genotypes simultaneously, despite their apparent dissimilarities. We also correctly predict the chronological order of expression peaks for the main hubs in the network. Based on the inferred network, using optimal experimental design criteria, we are able to suggest an informative set of experiments for further investigation of the mechanisms underlying branching behavior.

  20. Statistical inference based on divergence measures

    CERN Document Server

    Pardo, Leandro

    2005-01-01

    The idea of using functionals of Information Theory, such as entropies or divergences, in statistical inference is not new. However, in spite of the fact that divergence statistics have become a very good alternative to the classical likelihood ratio test and the Pearson-type statistic in discrete models, many statisticians remain unaware of this powerful approach. Statistical Inference Based on Divergence Measures explores classical problems of statistical inference, such as estimation and hypothesis testing, on the basis of measures of entropy and divergence. The first two chapters form an overview, from a statistical perspective, of the most important measures of entropy and divergence and study their properties. The author then examines the statistical analysis of discrete multivariate data with emphasis on problems in contingency tables and loglinear models using phi-divergence test statistics as well as minimum phi-divergence estimators. The final chapter looks at testing in general populations, prese...

  1. Dynamic modeling of genetic networks using genetic algorithm and S-system.

    Science.gov (United States)

    Kikuchi, Shinichi; Tominaga, Daisuke; Arita, Masanori; Takahashi, Katsutoshi; Tomita, Masaru

    2003-03-22

    The modeling of system dynamics of genetic networks, metabolic networks or signal transduction cascades from time-course data is formulated as a reverse-problem. Previous studies focused on the estimation of only network structures, and they were ineffective in inferring a network structure with feedback loops. We previously proposed a method to predict not only the network structure but also its dynamics using a Genetic Algorithm (GA) and an S-system formalism. However, it could predict only a small number of parameters and could rarely obtain essential structures. In this work, we propose a unified extension of the basic method. Notable improvements are as follows: (1) an additional term in its evaluation function that aims at eliminating futile parameters; (2) a crossover method called Simplex Crossover (SPX) to improve its optimization ability; and (3) a gradual optimization strategy to increase the number of predictable parameters. The proposed method is implemented as a C program called PEACE1 (Predictor by Evolutionary Algorithms and Canonical Equations 1). Its performance was compared with the basic method. The comparison showed that: (1) the convergence rate increased about 5-fold; (2) the optimization speed was raised about 1.5-fold; and (3) the number of predictable parameters was increased about 5-fold. Moreover, we successfully inferred the dynamics of a small genetic network constructed with 60 parameters for 5 network variables and feedback loops using only time-course data of gene expression.
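
    The S-system formalism named in this abstract is conventionally written as dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, which for n variables has 2n(n+1) parameters (60 when n = 5, matching the count quoted above). The following is a minimal sketch of that right-hand side with a forward-Euler integrator; the function names, the step size and the clamping of states to a small positive floor are illustrative assumptions, not part of PEACE1.

```python
import math

def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]."""
    n = len(x)
    dx = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        dx.append(production - degradation)
    return dx

def count_parameters(n):
    """n alphas + n betas + n*n exponents g + n*n exponents h."""
    return 2 * n * (n + 1)

def simulate(x0, alpha, beta, g, h, dt=0.01, steps=1000):
    """Forward-Euler integration (an assumed, deliberately simple scheme)."""
    x = list(x0)
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, beta, g, h)
        # Keep states strictly positive so negative exponents stay defined.
        x = [max(xi + dt * dxi, 1e-9) for xi, dxi in zip(x, dx)]
    return x
```

    A genetic algorithm of the kind described would search over alpha, beta, g and h so that the simulated trajectories match the measured time course; that search is not shown here.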

  2. MetaGenyo: a web tool for meta-analysis of genetic association studies.

    Science.gov (United States)

    Martorell-Marugan, Jordi; Toro-Dominguez, Daniel; Alarcon-Riquelme, Marta E; Carmona-Saez, Pedro

    2017-12-16

    Genetic association studies (GAS) aim to evaluate the association between genetic variants and phenotypes. In the last few years, the number of studies of this type has increased exponentially, but the results are not always reproducible due to experimental designs, low sample sizes and other methodological errors. In this field, meta-analysis techniques are becoming very popular tools to combine results across studies to increase statistical power and to resolve discrepancies in genetic association studies. A meta-analysis summarizes research findings, increases statistical power and enables the identification of genuine associations between genotypes and phenotypes. Meta-analysis techniques are increasingly used in GAS, but the number of published meta-analyses containing errors is also increasing. Although there are several software packages that implement meta-analysis, none of them is specifically designed for genetic association studies, and in most cases their use requires advanced programming or scripting expertise. We have developed MetaGenyo, a web tool for meta-analysis in GAS. MetaGenyo implements a complete and comprehensive workflow that can be executed in an easy-to-use environment without programming knowledge. MetaGenyo has been developed to guide users through the main steps of a GAS meta-analysis, covering Hardy-Weinberg test, statistical association for different genetic models, analysis of heterogeneity, testing for publication bias, subgroup analysis and robustness testing of the results. MetaGenyo is a useful tool to conduct comprehensive genetic association meta-analysis. The application is freely available at http://bioinfo.genyo.es/metagenyo/ .
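
    Two of the workflow steps listed above, statistical association and analysis of heterogeneity, can be illustrated with a standard fixed-effect (inverse-variance) pooling of log odds ratios together with Cochran's Q and I². This is a generic textbook sketch, not MetaGenyo's code; the 2x2 table layout and all function names are assumptions made for the example.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance for one study.

    a, b = cases with / without the variant; c, d = controls with / without.
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def fixed_effect_pool(tables):
    """Inverse-variance pooled odds ratio with Cochran's Q and I^2 (percent)."""
    effects = [log_odds_ratio(*t) for t in tables]
    weights = [1 / v for _, v in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "Q": q,
        "I2": i2,
    }
```

    When I² is large, a random-effects model is usually preferred over this fixed-effect estimate.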

  3. Quantifying selection in evolving populations using time-resolved genetic data

    Science.gov (United States)

    Illingworth, Christopher J. R.; Mustonen, Ville

    2013-01-01

    Methods which uncover the molecular basis of the adaptive evolution of a population address some important biological questions. For example, the problem of identifying genetic variants which underlie drug resistance, a question of importance for the treatment of pathogens, and of cancer, can be understood as a matter of inferring selection. One difficulty in the inference of variants under positive selection is the potential complexity of the underlying evolutionary dynamics, which may involve an interplay between several contributing processes, including mutation, recombination and genetic drift. A source of progress may be found in modern sequencing technologies, which confer an increasing ability to gather information about evolving populations, granting a window into these complex processes. One particularly interesting development is the ability to follow evolution as it happens, by whole-genome sequencing of an evolving population at multiple time points. We here discuss how to use time-resolved sequence data to draw inferences about the evolutionary dynamics of a population under study. We begin by reviewing our earlier analysis of a yeast selection experiment, in which we used a deterministic evolutionary framework to identify alleles under selection for heat tolerance, and to quantify the selection acting upon them. Considering further the use of advanced intercross lines to measure selection, we here extend this framework to cover scenarios of simultaneous recombination and selection, and of two driver alleles with multiple linked neutral, or passenger, alleles, where the driver pair evolves under an epistatic fitness landscape. We conclude by discussing the limitations of the approach presented and outlining future challenges for such methodologies.

  4. Genetic inferences in common bean differential cultivars to Colletotrichum lindemuthianum race 69

    Directory of Open Access Journals (Sweden)

    Adilson R. Schuelter

    2006-06-01

    Anthracnose, caused by the fungus Colletotrichum lindemuthianum Sacc. et Magn., is one of the most important diseases of the common bean (Phaseolus vulgaris L.) and can result in heavy yield losses. Genetic inferences about the resistance of seven common bean differential cultivars (Michelite, Michigan Dark Red Kidney, Perry Marrow, Cornell 49-242, PI 207262, AB 136 and G 2333) and their 21 diallel hybrids were obtained in relation to the reaction to race 69 using Hayman's method. The results showed that dominance effects were greater than additive effects for resistance to this race. The order of the parents by concentration of dominant genes was: G 2333, AB 136, PI 207262, Cornell 49-242, Michigan Dark Red Kidney, Perry Marrow and Michelite. The parents G 2333, AB 136 and PI 207262 are the most suitable for breeding programs aimed at obtaining anthracnose-resistant cultivars.

  5. Inferring regulatory networks from expression data using tree-based methods.

    Directory of Open Access Journals (Sweden)

    Vân Anh Huynh-Thu

    2010-09-01

    One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high-throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods (Random Forests or Extra-Trees). The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It makes no assumptions about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.

  6. Inferring common cognitive mechanisms from brain blood-flow lateralization data: a new methodology for fTCD analysis.

    Science.gov (United States)

    Meyer, Georg F; Spray, Amy; Fairlie, Jo E; Uomini, Natalie T

    2014-01-01

    Current neuroimaging techniques with high spatial resolution constrain participant motion so that many natural tasks cannot be carried out. The aim of this paper is to show how a time-locked correlation-analysis of cerebral blood flow velocity (CBFV) lateralization data, obtained with functional TransCranial Doppler (fTCD) ultrasound, can be used to infer cerebral activation patterns across tasks. In a first experiment we demonstrate that the proposed analysis method results in data that are comparable with the standard Lateralization Index (LI) for within-task comparisons of CBFV patterns, recorded during cued word generation (CWG) at two difficulty levels. In the main experiment we demonstrate that the proposed analysis method shows correlated blood-flow patterns for two different cognitive tasks that are known to draw on common brain areas, CWG, and Music Synthesis. We show that CBFV patterns for Music and CWG are correlated only for participants with prior musical training. CBFV patterns for tasks that draw on distinct brain areas, the Tower of London and CWG, are not correlated. The proposed methodology extends conventional fTCD analysis by including temporal information in the analysis of cerebral blood-flow patterns to provide a robust, non-invasive method to infer whether common brain areas are used in different cognitive tasks. It complements conventional high resolution imaging techniques.

  7. Using DNA fingerprints to infer familial relationships within NHANES III households.

    Science.gov (United States)

    Katki, Hormuzd A; Sanders, Christopher L; Graubard, Barry I; Bergen, Andrew W

    2010-06-01

    Developing, targeting, and evaluating genomic strategies for population-based disease prevention require population-based data. In response to this urgent need, genotyping has been conducted within the Third National Health and Nutrition Examination Survey (NHANES III), the nationally representative household-interview health survey in the U.S. However, before these genetic analyses can occur, family relationships within households must be accurately ascertained. Unfortunately, reported family relationships within NHANES III households based on questionnaire data are incomplete and inconclusive with regard to the actual biological relatedness of family members. We inferred family relationships within households using DNA fingerprints (Identifiler®) that contain the DNA loci used by law enforcement agencies for forensic identification of individuals. However, performance of these loci for relationship inference is not well understood. We evaluated two competing statistical methods for relationship inference on pairs of household members: an exact likelihood ratio relying on allele frequencies and an Identical By State (IBS) likelihood ratio that only requires matching alleles. We modified these methods to account for genotyping errors and population substructure. The two methods usually agree on the rankings of the most likely relationships. However, the IBS method underestimates the likelihood ratio by not accounting for the informativeness of matching rare alleles. The likelihood ratio is sensitive to estimates of population substructure, and parent-child relationships are sensitive to the specified genotyping error rate. These loci were unable to distinguish second-degree relationships and cousins from being unrelated. The genetic data are also useful for verifying reported relationships and identifying data quality issues. An important by-product is the first explicitly nationally representative estimates of allele frequencies at these ubiquitous forensic loci.

  8. More than one kind of inference: re-examining what's learned in feature inference and classification.

    Science.gov (United States)

    Sweller, Naomi; Hayes, Brett K

    2010-08-01

    Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.

  9. Perceptual inference.

    Science.gov (United States)

    Aggelopoulos, Nikolaos C

    2015-08-01

    Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. SEMANTIC PATCH INFERENCE

    DEFF Research Database (Denmark)

    Andersen, Jesper

    2009-01-01

    Collateral evolution is the problem of updating several library-using programs in response to API changes in the used library. In this dissertation we address the issue of understanding collateral evolutions by automatically inferring a high-level specification of the changes evident in a given set ...... specifications inferred by spdiff in Linux are shown. We find that the inferred specifications concisely capture the actual collateral evolution performed in the examples....

  11. International Conference on Trends and Perspectives in Linear Statistical Inference

    CERN Document Server

    Rosen, Dietrich

    2018-01-01

    This volume features selected contributions on a variety of topics related to linear statistical inference. The peer-reviewed papers from the International Conference on Trends and Perspectives in Linear Statistical Inference (LinStat 2016) held in Istanbul, Turkey, 22-25 August 2016, cover topics in both theoretical and applied statistics, such as linear models, high-dimensional statistics, computational statistics, the design of experiments, and multivariate analysis. The book is intended for statisticians, Ph.D. students, and professionals who are interested in statistical inference.

  12. Causal inference in nonlinear systems: Granger causality versus time-delayed mutual information

    Science.gov (United States)

    Li, Songting; Xiao, Yanyang; Zhou, Douglas; Cai, David

    2018-05-01

    The Granger causality (GC) analysis has been extensively applied to infer causal interactions in dynamical systems arising from economy and finance, physics, bioinformatics, neuroscience, social science, and many other fields. In the presence of potential nonlinearity in these systems, the validity of the GC analysis in general is questionable. To illustrate this, here we first construct minimal nonlinear systems and show that the GC analysis fails to infer causal relations in these systems—it gives rise to all types of incorrect causal directions. In contrast, we show that the time-delayed mutual information (TDMI) analysis is able to successfully identify the direction of interactions underlying these nonlinear systems. We then apply both methods to neuroscience data collected from experiments and demonstrate that the TDMI analysis but not the GC analysis can identify the direction of interactions among neuronal signals. Our work exemplifies inference hazards in the GC analysis in nonlinear systems and suggests that the TDMI analysis can be an appropriate tool in such a case.
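
    As an illustration of the time-delayed mutual information idea discussed above, the sketch below computes a plug-in estimate of I(x_t ; y_{t+delay}) for discrete-valued series. The estimator, the function names and the restriction to already-discretized data are assumptions made for this example; the paper's own estimator and data are not reproduced here.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def tdmi(x, y, delay):
    """Time-delayed mutual information I(x_t ; y_{t+delay}) for delay >= 0."""
    if delay < 0:
        raise ValueError("delay must be non-negative; swap x and y instead")
    if delay == 0:
        return mutual_information(x, y)
    return mutual_information(x[:-delay], y[delay:])
```

    Comparing tdmi(x, y, d) with tdmi(y, x, d) over a range of delays gives a simple, assumption-laden indication of which series leads the other; continuous signals would first need binning or a different estimator.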

  13. Event History Analysis in Quantitative Genetics

    DEFF Research Database (Denmark)

    Maia, Rafael Pimentel

    Event history analysis is a class of statistical methods specially designed to analyze time-to-event characteristics, e.g. the time until death. The aim of the thesis was to present adequate multivariate versions of mixed survival models that properly represent the genetic aspects related to a given...

  14. Human Inferences about Sequences: A Minimal Transition Probability Model.

    Directory of Open Access Journals (Sweden)

    Florent Meyniel

    2016-12-01

    The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations includes explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge.
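
    One way to picture an observer that tracks time-varying transition probabilities with a single free parameter is a leaky count of observed transitions. The sketch below assumes that the parameter is a forgetting factor (leak) and that the prior is uniform; both are assumptions for illustration, since the abstract does not specify the model's form.

```python
import math

def leaky_transition_surprise(sequence, leak=0.9):
    """Surprise (in bits) of each item in a binary sequence given the previous item.

    Transition counts decay by `leak` at every step, so old observations are
    gradually forgotten; a uniform Beta(1, 1) prior is assumed for each row.
    """
    counts = {s: {t: 0.0 for t in (0, 1)} for s in (0, 1)}
    surprises = []
    for prev, nxt in zip(sequence, sequence[1:]):
        row = counts[prev]
        p_next = (row[nxt] + 1.0) / (row[0] + row[1] + 2.0)
        surprises.append(-math.log2(p_next))
        # Forget a little of everything, then record the new transition.
        for s in counts:
            for t in counts[s]:
                counts[s][t] *= leak
        counts[prev][nxt] += 1.0
    return surprises
```

    After a run of repetitions, an alternation yields a higher surprise than a further repetition, which is the qualitative behaviour such a model is meant to capture.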

  15. AD-LIBS: inferring ancestry across hybrid genomes using low-coverage sequence data.

    Science.gov (United States)

    Schaefer, Nathan K; Shapiro, Beth; Green, Richard E

    2017-04-04

    Inferring the ancestry of each region of admixed individuals' genomes is useful in studies ranging from disease gene mapping to speciation genetics. Current methods require high-coverage genotype data and phased reference panels, and are therefore inappropriate for many data sets. We present a software application, AD-LIBS, that uses a hidden Markov model to infer ancestry across hybrid genomes without requiring variant calling or phasing. This approach is useful for non-model organisms and in cases of low-coverage data, such as ancient DNA. We demonstrate the utility of AD-LIBS with synthetic data. We then use AD-LIBS to infer ancestry in two published data sets: European human genomes with Neanderthal ancestry and brown bear genomes with polar bear ancestry. AD-LIBS correctly infers 87-91% of ancestry in simulations and produces ancestry maps that agree with published results and global ancestry estimates in humans. In brown bears, we find more polar bear ancestry than has been published previously, using both AD-LIBS and an existing software application for local ancestry inference, HAPMIX. We validate AD-LIBS polar bear ancestry maps by recovering a geographic signal within bears that mirrors what is seen in SNP data. Finally, we demonstrate that AD-LIBS is more effective than HAPMIX at inferring ancestry when preexisting phased reference data are unavailable and genomes are sequenced to low coverage. AD-LIBS is an effective tool for ancestry inference that can be used even when few individuals are available for comparison or when genomes are sequenced to low coverage. AD-LIBS is therefore likely to be useful in studies of non-model or ancient organisms that lack large amounts of genomic DNA. AD-LIBS can therefore expand the range of studies in which admixture mapping is a viable tool.
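
    The hidden Markov model approach described above can be illustrated with a generic Viterbi decoder that assigns the most likely ancestry state to each genomic window. The two-state setup, the discrete observation symbols and all probabilities in the usage note are toy assumptions; AD-LIBS's actual emission and transition models are not reproduced here.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a sequence of discrete observations (log space)."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    backpointers = []
    for o in obs[1:]:
        current, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this window.
            best_prev = max(states, key=lambda r: scores[-1][r] + math.log(trans_p[r][s]))
            current[s] = (scores[-1][best_prev]
                          + math.log(trans_p[best_prev][s])
                          + math.log(emit_p[s][o]))
            ptr[s] = best_prev
        scores.append(current)
        backpointers.append(ptr)
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]
```

    With "sticky" transitions (for example 0.95 to stay in the same ancestry state and 0.05 to switch), isolated discordant windows are smoothed over while sustained runs produce a state change.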

  16. Phylodynamic Inference with Kernel ABC and Its Application to HIV Epidemiology.

    Science.gov (United States)

    Poon, Art F Y

    2015-09-01

    The shapes of phylogenetic trees relating virus populations are determined by the adaptation of viruses within each host, and by the transmission of viruses among hosts. Phylodynamic inference attempts to reverse this flow of information, estimating parameters of these processes from the shape of a virus phylogeny reconstructed from a sample of genetic sequences from the epidemic. A key challenge to phylodynamic inference is quantifying the similarity between two trees in an efficient and comprehensive way. In this study, I demonstrate that a new distance measure, based on a subset tree kernel function from computational linguistics, confers a significant improvement over previous measures of tree shape for classifying trees generated under different epidemiological scenarios. Next, I incorporate this kernel-based distance measure into an approximate Bayesian computation (ABC) framework for phylodynamic inference. ABC bypasses the need for an analytical solution of model likelihood, as it only requires the ability to simulate data from the model. I validate this "kernel-ABC" method for phylodynamic inference by estimating parameters from data simulated under a simple epidemiological model. Results indicate that kernel-ABC attained greater accuracy for parameters associated with virus transmission than leading software on the same data sets. Finally, I apply the kernel-ABC framework to study a recent outbreak of a recombinant HIV subtype in China. Kernel-ABC provides a versatile framework for phylodynamic inference because it can fit a broader range of models than methods that rely on the computation of exact likelihoods. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
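
    The property of ABC highlighted above, that it only needs the ability to simulate data from the model, is easiest to see in the basic rejection variant. The sketch below is that plain rejection sampler with a user-supplied distance; it stands in for, and is much simpler than, the kernel-based tree distance and the ABC scheme used in the study. All names and the toy binomial model in the usage note are assumptions.

```python
import random

def abc_rejection(observed, simulate, prior_sample, distance, tolerance, n_draws, rng):
    """Keep the prior draws whose simulated data fall within `tolerance` of the observation."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted

def simulate_successes(p, rng, trials=100):
    """Toy simulator: number of successes in `trials` Bernoulli(p) draws."""
    return sum(1 for _ in range(trials) if rng.random() < p)

def uniform_prior(rng):
    """Toy prior: p ~ Uniform(0, 1)."""
    return rng.random()

def abs_distance(simulated, observed):
    """Toy distance between two scalar summaries."""
    return abs(simulated - observed)
```

    Calling abc_rejection(30, simulate_successes, uniform_prior, abs_distance, 2, 5000, random.Random(1)) returns an approximate posterior sample for p; in the phylodynamic setting the simulator would generate trees and the distance would compare tree shapes.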

  17. Inferring the demographic history of European Ficedula flycatcher populations

    Directory of Open Access Journals (Sweden)

    Backström Niclas

    2013-01-01

    Background: Inference of population and species histories and population stratification using genetic data is important for discriminating between different speciation scenarios and for correct interpretation of genome scans for signs of adaptive evolution and trait association. Here we use data from 24 intronic loci re-sequenced in population samples of two closely related species, the pied flycatcher and the collared flycatcher. Results: We applied Isolation-Migration models and assignment analyses, and estimated the genetic differentiation and diversity between species and between populations within species. The data indicate a divergence time between the species of [...]. Conclusions: Our results provide further evidence for a divergence process where different genomic regions may be at different stages of speciation. We also conclude that forthcoming analyses of genotype-phenotype relations in these ecological model species should be designed to take population stratification into account.

  18. Inference with constrained hidden Markov models in PRISM

    DEFF Research Database (Denmark)

    Christiansen, Henning; Have, Christian Theil; Lassen, Ole Torp

    2010-01-01

    A Hidden Markov Model (HMM) is a common statistical model which is widely used for analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. De......_different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.

  19. SimHap GUI: an intuitive graphical user interface for genetic association analysis.

    Science.gov (United States)

    Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

    2008-12-25

    Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools such as the SimHap package for the R statistics language provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that allows anyone other than a professional statistician to use them effectively. We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis.

  20. A genetic epidemiological mega analysis of smoking initiation in adolescents

    NARCIS (Netherlands)

    Maes, H.H.; Prom-Wormley, E.; Eaves, L.J.; Rhee, S.H.; Hewitt, J.K.; Young, S.; Corley, R.; McGue, M.K.; Iacono, W.G.; Legrand, L.; Samek, D.; Murrelle, E.L.; Silberg, J.L.; Miles, D.; Schieken, R.M.; Beunen, G.P.; Thomis, M.; Rose, R.J.; Dick, D.M.; Boomsma, D.I.; Bartels, M.; Vink, J.M.; Lichtenstein, P.; White, V.; Kaprio, J.; Neale, M.C.

    2017-01-01

    Introduction. Previous studies in adolescents were not adequately powered to accurately disentangle genetic and environmental influences on smoking initiation across adolescence. Methods. Mega-analysis of pooled genetically informative data on smoking initiation was performed, with structural

  1. Diversification of the silverspot butterflies (Nymphalidae) in the Neotropics inferred from multi-locus DNA sequences.

    Science.gov (United States)

    Massardo, Darli; Fornel, Rodrigo; Kronforst, Marcus; Gonçalves, Gislene Lopes; Moreira, Gilson Rudinei Pires

    2015-01-01

    The tribe Heliconiini (Lepidoptera: Nymphalidae) is a diverse group of butterflies distributed throughout the Neotropics, which has been studied extensively, in particular the genus Heliconius. However, most of the other lineages, such as Dione, which are less diverse and considered basal within the group, have received little attention. Basic information, such as species limits and geographical distributions, remains uncertain for this genus. Here we used multilocus DNA sequence data and geographical distribution analysis across the entire range of Dione in the Neotropical region in order to make inferences on the evolutionary history of this poorly explored lineage. Bayesian time-tree reconstruction allows us to infer two major diversification events in this tribe around 25 Mya. Lineages thought to be ancient, such as Dione and Agraulis, are as recent as Heliconius. Dione formed a monophyletic clade, sister to the genus Agraulis. Dione juno, D. glycera and D. moneta were reciprocally monophyletic and formed genetic clusters, with the first two more closely related to each other than to the third. Divergence time estimates support the hypothesis that speciation in Dione coincided with both the rise of Passifloraceae (the host plants) and the uplift of the Andes. Since the sister species D. glycera and D. moneta are specialized feeders on passion-vine lineages that are endemic to areas located either within or adjacent to the Andes, we inferred that they co-speciated with their host plants during this vicariant event. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. Parameter determination for quantitative PIXE analysis using genetic algorithms

    International Nuclear Information System (INIS)

    Aspiazu, J.; Belmont-Moreno, E.

    1996-01-01

    For biological and environmental samples, the PIXE technique is particularly advantageous for elemental analysis, but quantitative analysis requires complex calculations that depend on more than a dozen parameters. Using a genetic algorithm, the authors give here an account of the procedure to obtain the best values for the parameters necessary to fit the efficiency of an X-ray detector. The values of some variables involved in quantitative PIXE analysis were manipulated in a way similar to how genetic information is treated in a biological process. The authors ran the algorithm until it reproduced, within the confidence interval, the elemental concentrations corresponding to a reference material.

  3. A neuro-fuzzy inference system for sensor failure detection using wavelet denoising, PCA and SPRT

    International Nuclear Information System (INIS)

    Na, Man Gyun

    2001-01-01

    In this work, a neuro-fuzzy inference system combined with wavelet denoising, PCA (principal component analysis) and SPRT (sequential probability ratio test) methods is developed to detect the relevant sensor failure using other sensor signals. The wavelet denoising technique is applied to remove noise components from the input signals to the neuro-fuzzy system. PCA is used to reduce the dimension of the input space without losing a significant amount of information, and it simplifies the selection of the input signals to the neuro-fuzzy system. Also, a lower-dimensional input space usually reduces the time necessary to train a neuro-fuzzy system. The parameters of the neuro-fuzzy inference system, which estimates the relevant sensor signal, are optimized by a genetic algorithm and a least-squares algorithm. The residuals between the estimated signals and the measured signals are used to detect whether the sensors have failed. The SPRT is used in this failure detection algorithm. The proposed sensor-monitoring algorithm was verified through applications to the pressurizer water level and the hot-leg flowrate sensors in pressurized water reactors.
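
    The residual-based detection step described above can be sketched as a Wald sequential probability ratio test. This is only an illustration under stated assumptions, not the author's implementation: the residuals are assumed Gaussian with known standard deviation `sigma`, a failure is assumed to shift the residual mean by `mean_shift`, and the function name and default error rates are hypothetical.

```python
import math


def sprt_mean_shift(residuals, mean_shift, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT on residuals (assumed Gaussian, illustrative only).

    H0: residual mean is 0 (sensor normal).
    H1: residual mean is `mean_shift` (sensor failed).
    alpha / beta are the assumed false-alarm and missed-alarm probabilities.
    Returns (decision, index of the sample at which the decision was made).
    """
    upper = math.log((1.0 - beta) / alpha)   # crossing it accepts H1
    lower = math.log(beta / (1.0 - alpha))   # crossing it accepts H0
    llr = 0.0                                # cumulative log-likelihood ratio
    for i, r in enumerate(residuals):
        # Log-likelihood ratio increment for a Gaussian mean shift.
        llr += (mean_shift / sigma ** 2) * (r - mean_shift / 2.0)
        if llr >= upper:
            return "failed", i
        if llr <= lower:
            return "normal", i
    return "undecided", len(residuals) - 1


if __name__ == "__main__":
    print(sprt_mean_shift([0.0] * 20, mean_shift=1.0, sigma=1.0))
    print(sprt_mean_shift([1.0] * 20, mean_shift=1.0, sigma=1.0))
```

    The test accumulates evidence sample by sample, so a decision is reached as soon as the residuals are informative enough rather than after a fixed window.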

  4. Multimodel inference and adaptive management

    Science.gov (United States)

    Rehme, S.E.; Powell, L.A.; Allen, Craig R.

    2011-01-01

    Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
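
    As a rough illustration of how multimodel inference weighs competing hypotheses, the sketch below computes Akaike weights from a set of AIC scores. The abstract does not state which criterion the reviewed papers used, so the choice of AIC, the function name and the example scores are all assumptions.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into relative model weights that sum to 1.

    Assumes every model was fitted to the same data set; the model with
    the lowest AIC receives the largest weight.
    """
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference to the best.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


if __name__ == "__main__":
    # Hypothetical AIC scores for three competing models.
    for aic, w in zip([100.0, 102.0, 110.0], akaike_weights([100.0, 102.0, 110.0])):
        print(f"AIC={aic:6.1f}  weight={w:.3f}")
```

    When no single model carries most of the weight, the inference is weak in the sense used above, which is the situation where model selection uncertainty should be reported alongside any recommendation.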

  5. Input data for inferring species distributions in Kyphosidae world-wide

    Directory of Open Access Journals (Sweden)

    Steen Wilhelm Knudsen

    2016-09-01

    Input data files for inferring the relationships within the family Kyphosidae, as presented in Knudsen and Clements (2016) [1], are provided here together with the resulting topologies, to allow the reader to explore the topologies in detail. The input data files comprise seven nexus files with sequence alignments of mtDNA and nDNA markers for performing Bayesian analysis. A matrix of recoded character states inferred from the morphology examined in museum specimens representing Dichistiidae, Girellidae, Kyphosidae, Microcanthidae and Scorpididae is also provided and can be used for performing a parsimony analysis to infer the relationships among these perciform families. The nucleotide input data files comprise both multiple and single representatives of the various species to allow for inference of the relationships among the species in Kyphosidae and between the families closely related to Kyphosidae. The ‘.xml’ files with various constrained relationships among the families potentially closely related to Kyphosidae are also provided to allow the reader to rerun and explore the results from the stepping-stone analysis. The resulting topologies are supplied in newick file format together with the input data files for Bayesian analysis and the ‘.xml’ files. Re-running the input data files in the appropriate software will enable readers to examine the log files and tree files themselves. Keywords: Sea chub, Drummer, Kyphosus, Scorpis, Girella

  6. Exploring Relationships Among Belief in Genetic Determinism, Genetics Knowledge, and Social Factors

    Science.gov (United States)

    Gericke, Niklas; Carver, Rebecca; Castéra, Jérémy; Evangelista, Neima Alice Menezes; Marre, Claire Coiffard; El-Hani, Charbel N.

    2017-12-01

    Genetic determinism can be described as the attribution of the formation of traits to genes, where genes are ascribed more causal power than what scientific consensus suggests. Belief in genetic determinism is an educational problem because it contradicts scientific knowledge, and is a societal problem because it has the potential to foster intolerant attitudes such as racism and prejudice against sexual orientation. In this article, we begin by investigating the very nature of belief in genetic determinism. Then, we investigate whether knowledge of genetics and genomics is associated with beliefs in genetic determinism. Finally, we explore the extent to which social factors such as gender, education, and religiosity are associated with genetic determinism. Methodologically, we gathered and analyzed data on beliefs in genetic determinism, knowledge of genetics and genomics, and social variables using the "Public Understanding and Attitudes towards Genetics and Genomics" (PUGGS) instrument. Our analyses of PUGGS responses from a sample of Brazilian university freshmen undergraduates indicated that (1) belief in genetic determinism was best characterized as a construct built up by two dimensions or belief systems: beliefs concerning social traits and beliefs concerning biological traits; (2) levels of belief in genetic determination of social traits were low, which contradicts prior work; (3) associations between knowledge of genetics and genomics and levels of belief in genetic determinism were low; and (4) social factors such as age and religiosity had stronger associations with beliefs in genetic determinism than knowledge. Although our study design precludes causal inferences, our results raise questions about whether enhancing genetic literacy will decrease or prevent beliefs in genetic determinism.

  7. Rich analysis and rational models: Inferring individual behavior from infant looking data

    Science.gov (United States)

    Piantadosi, Steven T.; Kidd, Celeste; Aslin, Richard

    2013-01-01

    Studies of infant looking times over the past 50 years have provided profound insights about cognitive development, but their dependent measures and analytic techniques are quite limited. In the context of infants' attention to discrete sequential events, we show how a Bayesian data analysis approach can be combined with a rational cognitive model to create a rich data analysis framework for infant looking times. We formalize (i) a statistical learning model (ii) a parametric linking between the learning model's beliefs and infants' looking behavior, and (iii) a data analysis model that infers parameters of the cognitive model and linking function for groups and individuals. Using this approach, we show that recent findings from Kidd, Piantadosi, and Aslin (2012) of a U-shaped relationship between look-away probability and stimulus complexity even holds within infants and is not due to averaging subjects with different types of behavior. Our results indicate that individual infants prefer stimuli of intermediate complexity, reserving attention for events that are moderately predictable given their probabilistic expectations about the world. PMID:24750256

  8. Genetic analysis of Myanmar Vigna species in responses to salt ...

    African Journals Online (AJOL)

    Genetic analysis of Myanmar Vigna species in responses to salt stress at the ... of reduction was highly dependent on different genotypes and salinity levels. ... the mechanism of salt tolerance and for the provision of genetic resources for ...

  9. Ground Field-Based Hyperspectral Imaging: A Preliminary Study to Assess the Potential of Established Vegetation Indices to Infer Variation in Water-Use Efficiency.

    Science.gov (United States)

    Pelech, E. A.; McGrath, J.; Pederson, T.; Bernacchi, C.

    2017-12-01

    Increases in the global average temperature will consequently induce a higher occurrence of severe environmental conditions such as drought on arable land. To mitigate these threats, crops for fuel and food must be bred for higher water-use efficiencies (WUE). Defining genomic variation through high-throughput phenotypic analysis in field conditions has the potential to relieve the major bottleneck in linking desirable genetic traits to the associated phenotypic response. This can subsequently enable breeders to create new agricultural germplasm that supports the need for higher water-use efficient crops. From satellites to field-based aerial and ground sensors, the reflectance properties of vegetation measured by hyperspectral imaging is becoming a rapid high-throughput phenotyping technique. A variety of physiological traits can be inferred by regression analysis with leaf reflectance which is controlled by the properties and abundance of water, carbon, nitrogen and pigments. Although, given that the current established vegetation indices are designed to accentuate these properties from spectral reflectance, it becomes a challenge to infer relative measurements of WUE at a crop canopy scale without ground-truth data collection. This study aims to correlate established biomass and canopy-water-content indices with ground-truth data. Five bioenergy sorghum genotypes (Sorghum bicolor L. Moench) that have differences in WUE and wild-type Tobacco (Nicotiana tabacum var. Samsun) under irrigated and rainfed field conditions were examined. A linear regression analysis was conducted to determine if variation in canopy water content and biomass, driven by natural genotypic and artificial treatment influences, can be inferred using established vegetation indices. The results from this study will elucidate the ability of ground field-based hyperspectral imaging to assess variation in water content, biomass and water-use efficiency. 
This can lead to improved opportunities to

  10. Long term human impacts on genetic structure of Italian walnut inferred by SSR markers

    Science.gov (United States)

    Paola Pollegioni; Keith Woeste; Irene Olimpieri; Danilo Marandola; Francesco Cannata; Maria E Malvolti

    2011-01-01

    Life history traits, historic factors, and human activities can all shape the genetic diversity of a species. In Italy, walnut (Juglans regia L.) has a long history of cultivation both for wood and edible nuts. To better understand the genetic variability of current Italian walnut resources, we analyzed the relationships among the genetic structure...

  11. Genetic variation of temperature-regulated curd induction in cauliflower: elucidation of floral transition by genome-wide association mapping and gene expression analysis

    Science.gov (United States)

    Matschegewski, Claudia; Zetzsche, Holger; Hasan, Yaser; Leibeguth, Lena; Briggs, William; Ordon, Frank; Uptmoor, Ralf

    2015-01-01

    Cauliflower (Brassica oleracea var. botrytis) is a vernalization-responsive crop. High ambient temperatures delay harvest time. The elucidation of the genetic regulation of floral transition is highly interesting for a precise harvest scheduling and to ensure stable market supply. This study aims at genetic dissection of temperature-dependent curd induction in cauliflower by genome-wide association studies and gene expression analysis. To assess temperature-dependent curd induction, two greenhouse trials under distinct temperature regimes were conducted on a diversity panel consisting of 111 cauliflower commercial parent lines, genotyped with 14,385 SNPs. Broad phenotypic variation and high heritability (0.93) were observed for temperature-related curd induction within the cauliflower population. GWA mapping identified a total of 18 QTL localized on chromosomes O1, O2, O3, O4, O6, O8, and O9 for curding time under two distinct temperature regimes. Among those, several QTL are localized within regions of promising candidate flowering genes. Inferring population structure and genetic relatedness among the diversity set assigned three main genetic clusters. Linkage disequilibrium (LD) patterns estimated global LD extent of r2 = 0.06 and a maximum physical distance of 400 kb for genetic linkage. Transcriptional profiling of flowering genes FLOWERING LOCUS C (BoFLC) and VERNALIZATION 2 (BoVRN2) was performed, showing increased expression levels of BoVRN2 in genotypes with faster curding. However, functional relevance of BoVRN2 and BoFLC2 could not consistently be supported, which probably suggests to act facultative and/or might evidence for BoVRN2/BoFLC-independent mechanisms in temperature-regulated floral transition in cauliflower. Genetic insights in temperature-regulated curd induction can underpin genetically informed phenology models and benefit molecular breeding strategies toward the development of thermo-tolerant cultivars. PMID:26442034

  12. Genetic analysis for grain quality traits in Pakistani wheat varieties

    International Nuclear Information System (INIS)

    Minhas, N.M.; Ajmal, S.U.; Iqbal, Z.; Munir, M.

    2014-01-01

    An eight-parent diallel involving seven commercial wheat cultivars and one breeding line was made to investigate the nature of gene action determining the inheritance pattern of grain quality characters. Highly significant differences were observed among the genotypes for 1000-grain weight, protein content, wet gluten and lysine content. Adequacy tests were employed to assess the fit of the data sets to the additive-dominance model. Both tests, i.e. analysis of uniformity of Wr, Vr and joint regression analysis, validated the data of these traits for genetic analysis. Gene actions for grain quality traits were ascertained following Hayman's analysis of variance. Results of the genetic analysis revealed that both additive and dominance genetic components were involved in the manifestation of the characters under study. However, additive gene effects were more pronounced in the genetic control of these traits. Non-significance of the b1, b2 and b3 values revealed the absence of directional dominance, symmetrical distribution of genes among the parental lines and absence of specific gene action, respectively, in all the traits. Maternal effects were also noted in 1000-grain weight, protein content and wet gluten percentage. It is concluded that additive effects are crucial in the expression of grain quality characters of wheat in the germplasm under study, and single plant selection may be recommended in segregating generations for effective improvement in these characters. (author)

  13. Inference of Cell Mechanics in Heterogeneous Epithelial Tissue Based on Multivariate Clone Shape Quantification

    Science.gov (United States)

    Tsuboi, Alice; Umetsu, Daiki; Kuranaga, Erina; Fujimoto, Koichi

    2017-01-01

    Cell populations in multicellular organisms show genetic and non-genetic heterogeneity, even in undifferentiated tissues of multipotent cells during development and tumorigenesis. The heterogeneity causes difference of mechanical properties, such as, cell bond tension or adhesion, at the cell–cell interface, which determine the shape of clonal population boundaries via cell sorting or mixing. The boundary shape could alter the degree of cell–cell contacts and thus influence the physiological consequences of sorting or mixing at the boundary (e.g., tumor suppression or progression), suggesting that the cell mechanics could help clarify the physiology of heterogeneous tissues. While precise inference of mechanical tension loaded at each cell–cell contacts has been extensively developed, there has been little progress on how to distinguish the population-boundary geometry and identify the cause of geometry in heterogeneous tissues. We developed a pipeline by combining multivariate analysis of clone shape with tissue mechanical simulations. We examined clones with four different genotypes within Drosophila wing imaginal discs: wild-type, tartan (trn) overexpression, hibris (hbs) overexpression, and Eph RNAi. Although the clones were previously known to exhibit smoothed or convoluted morphologies, their mechanical properties were unknown. By applying a multivariate analysis to multiple criteria used to quantify the clone shapes based on individual cell shapes, we found the optimal criteria to distinguish not only among the four genotypes, but also non-genetic heterogeneity from genetic one. The efficient segregation of clone shape enabled us to quantitatively compare experimental data with tissue mechanical simulations. As a result, we identified the mechanical basis contributed to clone shape of distinct genotypes. The present pipeline will promote the understanding of the functions of mechanical interactions in heterogeneous tissue in a non-invasive manner. PMID

  14. Genetic data analysis for plant and animal breeding

    Science.gov (United States)

    This book is an advanced textbook covering the application of quantitative genetics theory to analysis of actual data (both trait and DNA marker information) for breeding populations of crops, trees, and animals. Chapter 1 is an introduction to basic software used for trait data analysis. Chapter 2 ...

  15. On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits

    Directory of Open Access Journals (Sweden)

    Wang Tao

    2011-09-01

    Background: In genetic association study of quantitative traits using F∞ models, how to code the marker genotypes and interpret the model parameters appropriately is important for constructing hypothesis tests and making statistical inferences. Currently, the coding of marker genotypes in building F∞ models has mainly focused on the biallelic case. A thorough work on the coding of marker genotypes and interpretation of model parameters for F∞ models is needed especially for genetic markers with multiple alleles. Results: In this study, we will formulate F∞ genetic models under various regression model frameworks and introduce three genotype coding schemes for genetic markers with multiple alleles. Starting from an allele-based modeling strategy, we first describe a regression framework to model the expected genotypic values at given markers. Then, as extension from the biallelic case, we introduce three coding schemes for constructing fully parameterized one-locus F∞ models and discuss the relationships between the model parameters and the expected genotypic values. Next, under a simplified modeling framework for the expected genotypic values, we consider several reduced one-locus F∞ models from the three coding schemes on the estimability and interpretation of their model parameters. Finally, we explore some extensions of the one-locus F∞ models to two loci. Several fully parameterized as well as reduced two-locus F∞ models are addressed. Conclusions: The genotype coding schemes provide different ways to construct F∞ models for association testing of multi-allele genetic markers with quantitative traits. Which coding scheme should be applied depends on how convenient it can provide the statistical inferences on the parameters of our research interests. Based on these F∞ models, the standard regression model fitting tools can be used to estimate and test for various genetic effects through statistical contrasts with the

  16. Variable-number-of-tandem-repeats analysis of genetic diversity in Pasteuria ramosa.

    Science.gov (United States)

    Mouton, L; Ebert, D

    2008-05-01

    Variable-number-of-tandem-repeats (VNTR) markers are increasingly being used in population genetic studies of bacteria. They were recently developed for Pasteuria ramosa, an endobacterium that infects Daphnia species. In the present study, we genotyped P. ramosa in 18 infected hosts from the United Kingdom, Belgium, and two lakes in the United States using seven VNTR markers. Two Daphnia species were collected: D. magna and D. dentifera. Six loci showed length polymorphism, with as many as five alleles identified for a single locus. Similarity coefficient calculations showed that the extent of genetic variation between pairs of isolates within populations differed according to the population, but it was always less than the genetic distances among populations. Analysis of the genetic distances performed using principal component analysis revealed strong clustering by location of origin, but not by host Daphnia species. Our study demonstrated that the VNTR markers available for P. ramosa are informative in revealing genetic differences within and among populations and may therefore become an important tool for providing detailed analysis of population genetics and epidemiology.

  17. Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency

    Directory of Open Access Journals (Sweden)

    Yeh Cheng-Yu

    2009-12-01

    Background: Prostate cancer is a leading cancer worldwide and is characterized by its aggressive metastasis. According to the clinical heterogeneity, prostate cancer displays different stages and grades related to the aggressive metastasis disease. Although numerous studies used microarray analysis and traditional clustering methods to identify the individual genes during the disease processes, the important gene regulations remain unclear. We present a computational method for inferring genetic regulatory networks from microarray data automatically with transcription factor analysis and conditional independence testing to explore the potential significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer. Results: To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to determine the precise expression values. We applied web services technology to wrap the bioinformatics toolkits and databases to automatically extract the promoter regions of DNA sequences and predicted the transcription factors that regulate the gene expressions. We adopted microarray datasets consisting of 62 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results showed that the possible biomarker genes related to cancer and denoted the androgen functions and processes may be in the development of the prostate cancer and promote the cell death in cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to a high extent while ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. Gene SLC22A3 may explain clinically the differentiation associated with the high grade cancer compared with low grade cancer. Enhancer of Zeste Homolog 2 (EZH2) regulated by RUNX1 and STAT3 is correlated to the pathological stage

  18. Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency.

    Science.gov (United States)

    Yeh, Hsiang-Yuan; Cheng, Shih-Wu; Lin, Yu-Chun; Yeh, Cheng-Yu; Lin, Shih-Fang; Soo, Von-Wun

    2009-12-21

    Prostate cancer is a leading cancer worldwide and is characterized by its aggressive metastasis. According to the clinical heterogeneity, prostate cancer displays different stages and grades related to the aggressive metastasis disease. Although numerous studies used microarray analysis and traditional clustering methods to identify the individual genes during the disease processes, the important gene regulations remain unclear. We present a computational method for inferring genetic regulatory networks from microarray data automatically with transcription factor analysis and conditional independence testing to explore the potential significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer. To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to determine the precise expression values. We applied web services technology to wrap the bioinformatics toolkits and databases to automatically extract the promoter regions of DNA sequences and predicted the transcription factors that regulate the gene expressions. We adopted microarray datasets consisting of 62 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results showed that the possible biomarker genes related to cancer and denoted the androgen functions and processes may be in the development of the prostate cancer and promote the cell death in cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to a high extent while ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. Gene SLC22A3 may explain clinically the differentiation associated with the high grade cancer compared with low grade cancer. Enhancer of Zeste Homolog 2 (EZH2) regulated by RUNX1 and STAT3 is correlated to the pathological stage. We provide a computational framework to reconstruct
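
    The K-nearest-neighbors step for missing expression values mentioned above can be sketched as follows. The abstract gives no details of the authors' variant, so this is a generic illustration: rows are assumed to be genes, columns samples, missing values are marked with `None`, distances are root-mean-square differences over the columns both genes share, and the function name and toy matrix are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill missing values (None) with the mean of the k nearest rows.

    Distance between two rows is the root-mean-square difference over the
    columns where both rows have observed values. Neighbours are always
    taken from the original, unfilled matrix.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


if __name__ == "__main__":
    # Toy expression matrix: 4 genes x 3 samples, one missing value.
    toy = [
        [1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [1.1, 2.1, 5.0],
        [9.0, 9.0, 100.0],
    ]
    print(knn_impute(toy, k=2)[0])
```

    Averaging over similar genes keeps the imputed value close to the local expression pattern, and the distant fourth gene in the toy matrix has no influence on the result.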

  19. A roadmap for the genetic analysis of renal aging.

    Science.gov (United States)

    Noordmans, Gerda A; Hillebrands, Jan-Luuk; van Goor, Harry; Korstanje, Ron

    2015-10-01

    Several studies show evidence for the genetic basis of renal disease, which renders some individuals more prone than others to accelerated renal aging. Studying the genetics of renal aging can help us to identify genes involved in this process and to unravel the underlying pathways. First, this opinion article will give an overview of the phenotypes that can be observed in age-related kidney disease. Accurate phenotyping is essential in performing genetic analysis. For kidney aging, this could include both functional and structural changes. Subsequently, this article reviews the studies that report on candidate genes associated with renal aging in humans and mice. Several loci or candidate genes have been found associated with kidney disease, but identification of the specific genetic variants involved has proven to be difficult. CUBN, UMOD, and SHROOM3 were identified by human GWAS as being associated with albuminuria, kidney function, and chronic kidney disease (CKD). These are promising examples of genes that could be involved in renal aging, and were further mechanistically evaluated in animal models. Eventually, we will provide approaches for performing genetic analysis. We should leverage the power of mouse models, as testing in humans is limited. Mouse and other animal models can be used to explain the underlying biological mechanisms of genes and loci identified by human GWAS. Furthermore, mouse models can be used to identify genetic variants associated with age-associated histological changes, of which Far2, Wisp2, and Esrrg are examples. A new outbred mouse population with high genetic diversity will facilitate the identification of genes associated with renal aging by enabling high-resolution genetic mapping while also allowing the control of environmental factors, and by enabling access to renal tissues at specific time points for histology, proteomics, and gene expression. © 2015 The Authors. Aging Cell published by the Anatomical Society and John

  20. A roadmap for the genetic analysis of renal aging

    Science.gov (United States)

    Noordmans, Gerda A; Hillebrands, Jan-Luuk; van Goor, Harry; Korstanje, Ron

    2015-01-01

    Several studies show evidence for the genetic basis of renal disease, which renders some individuals more prone than others to accelerated renal aging. Studying the genetics of renal aging can help us to identify genes involved in this process and to unravel the underlying pathways. First, this opinion article will give an overview of the phenotypes that can be observed in age-related kidney disease. Accurate phenotyping is essential in performing genetic analysis. For kidney aging, this could include both functional and structural changes. Subsequently, this article reviews the studies that report on candidate genes associated with renal aging in humans and mice. Several loci or candidate genes have been found associated with kidney disease, but identification of the specific genetic variants involved has proven to be difficult. CUBN, UMOD, and SHROOM3 were identified by human GWAS as being associated with albuminuria, kidney function, and chronic kidney disease (CKD). These are promising examples of genes that could be involved in renal aging, and were further mechanistically evaluated in animal models. Eventually, we will provide approaches for performing genetic analysis. We should leverage the power of mouse models, as testing in humans is limited. Mouse and other animal models can be used to explain the underlying biological mechanisms of genes and loci identified by human GWAS. Furthermore, mouse models can be used to identify genetic variants associated with age-associated histological changes, of which Far2, Wisp2, and Esrrg are examples. A new outbred mouse population with high genetic diversity will facilitate the identification of genes associated with renal aging by enabling high-resolution genetic mapping while also allowing the control of environmental factors, and by enabling access to renal tissues at specific time points for histology, proteomics, and gene expression. PMID:26219736

  1. Optimal inference with suboptimal models: Addiction and active Bayesian inference

    Science.gov (United States)

    Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl

    2015-01-01

    When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321

  2. Computer models and the evidence of anthropogenic climate change: An epistemology of variety-of-evidence inferences and robustness analysis.

    Science.gov (United States)

    Vezér, Martin A

    2016-04-01

    To study climate change, scientists employ computer models, which approximate target systems with various levels of skill. Given the imperfection of climate models, how do scientists use simulations to generate knowledge about the causes of observed climate change? Addressing a similar question in the context of biological modelling, Levins (1966) proposed an account grounded in robustness analysis. Recent philosophical discussions dispute the confirmatory power of robustness, raising the question of how the results of computer modelling studies contribute to the body of evidence supporting hypotheses about climate change. Expanding on Staley's (2004) distinction between evidential strength and security, and Lloyd's (2015) argument connecting variety-of-evidence inferences and robustness analysis, I address this question with respect to recent challenges to the epistemology of robustness analysis. Applying this epistemology to case studies of climate change, I argue that, despite imperfections in climate models, and epistemic constraints on variety-of-evidence reasoning and robustness analysis, this framework accounts for the strength and security of evidence supporting climatological inferences, including the finding that global warming is occurring and its primary causes are anthropogenic. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Genetic subdivision and biogeography of the Danubian rheophilic barb Barbus petenyi inferred from phylogenetic analysis of mitochondrial DNA variation

    Czech Academy of Sciences Publication Activity Database

    Kotlík, Petr; Berrebi, P.

    2002-01-01

    Vol. 24 (2002), pp. 10-18. ISSN 1055-7903. R&D Projects: GA AV ČR IBS5045111; GA AV ČR KSK6005114. Keywords: biogeography * phylogeography * mtDNA. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 2.590, year: 2002

  4. Inference rule and problem solving

    Energy Technology Data Exchange (ETDEWEB)

    Goto, S

    1982-04-01

    Intelligent information processing aims to have human intellectual activity carried out on a computer, using inference rather than ordinary calculation as the basic operational mechanism. Many inference rules are derived from syllogisms in formal logic. The task of programming this inference function is referred to as problem solving. Although inference and problem solving are logically closely related, current computers have only limited capability for inference. To clarify the relation between inference and computers, nonmonotonic logic has been considered. The paper deals with the above topics. 16 references.

  5. Polyglot programming in applications used for genetic data analysis.

    Science.gov (United States)

    Nowak, Robert M

    2014-01-01

    Applications used for the analysis of genetic data process large volumes of data with complex algorithms. High performance, flexibility, and a user interface with a web browser are required by these solutions, which can be achieved by using multiple programming languages. In this study, I developed a freely available framework for building software to analyze genetic data, which uses C++, Python, JavaScript, and several libraries. This system was used to build a number of genetic data processing applications and it reduced the time and costs of development.

  6. Differential effects of historical migration, glaciations and human impact on the genetic structure and diversity of the mountain pasture weed Veratrum album L

    DEFF Research Database (Denmark)

    Treier, Urs; Müller-Schärer, H.

    2011-01-01

    Aim: Today’s genetic population structure and diversity of species can be understood as the result of range expansion from the area of origin, past climatic oscillations and contemporary processes. We examined the relative importance of these factors in Veratrum album L., a toxic weed of mountain...... grasslands. Location: Continental Europe. Methods: Forty populations from the Asian border (Urals and Caucasus) to Portugal were studied using amplified fragment length polymorphisms (AFLPs) combined with selected plant and population measures. The data were analysed with phylogenetic, population genetic...... and regression methods inferring both genetic structure and diversity from geographic and ecological factors. Results: Fragment frequency clines together with genetic distance clustering and principal coordinates analysis indicated an east–west direction in the genetic structure of V. album, suggesting ancient...

  7. Crater Lake Apoyo Revisited - Population Genetics of an Emerging Species Flock

    Science.gov (United States)

    Geiger, Matthias F.; McCrary, Jeffrey K.; Schliewen, Ulrich K.

    2013-01-01

    The polytypic Nicaraguan Midas cichlids (Amphilophus cf. citrinellus) have been established as a model system for studying the mechanisms of speciation and patterns of diversification in allopatry and sympatry. The species assemblage in Crater Lake Apoyo has been accepted as a textbook example for sympatric speciation. Here, we present a first comprehensive data set of population genetic (mtDNA & AFLPs) proxies of species level differentiation for a representative set of individuals of all six endemic Amphilophus species occurring in Crater Lake Apoyo. AFLP genetic differentiation was partitioned into a neutral and non-neutral component based on outlier-loci detection approaches, and patterns of species divergence were explored with Bayesian clustering methods. Substantial levels of admixture between species were detected, indicating different levels of reproductive isolation between the six species. Analysis of neutral genetic variation revealed several A. zaliosus as being introgressed by an unknown contributor, thereby rendering the sympatrically evolving L. Apoyo flock polyphyletic. This is contrasted by the mtDNA analysis delivering a clear monophyly signal with Crater Lake Apoyo private haplotypes characterising all six described species, but also demonstrating different demographic histories as inferred from pairwise mismatch distributions. PMID:24086393

  8. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update.

    Science.gov (United States)

    Peakall, Rod; Smouse, Peter E

    2012-10-01

    GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: G'(ST), G''(ST), Jost's D(est) and F'(ST) through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised. GenAlEx is written in VBA and provided as a Microsoft Excel Add-in (compatible with Excel 2003, 2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx and supporting documentation and tutorials are freely available at: http://biology.anu.edu.au/GenAlEx. Contact: rod.peakall@anu.edu.au.

  9. A longitudinal genetic survey identifies temporal shifts in the population structure of Dutch house sparrows

    Science.gov (United States)

    Cousseau, L; Husemann, M; Foppen, R; Vangestel, C; Lens, L

    2016-01-01

    Dutch house sparrow (Passer domesticus) densities have dropped by nearly 50% since the early 1980s, and similar collapses in population sizes have been reported across Europe. Whether, and to what extent, such relatively recent demographic changes are accompanied by concomitant shifts in the genetic population structure of this species needs further investigation. Therefore, we here explore temporal shifts in genetic diversity, genetic structure and effective sizes of seven Dutch house sparrow populations. To allow the most powerful statistical inference, historical populations were resampled at identical locations and each individual bird was genotyped using nine polymorphic microsatellites. Although the demographic history was not reflected by a reduction in genetic diversity, levels of genetic differentiation increased over time, and the original, panmictic population (inferred from the museum samples) diverged into two distinct genetic clusters. Reductions in census size were supported by a substantial reduction in effective population size, although to a smaller extent. As most studies of contemporary house sparrow populations have been unable to identify genetic signatures of recent population declines, results of this study underpin the importance of longitudinal genetic surveys to unravel cryptic genetic patterns. PMID:27273323

  10. On statistical inference in time series analysis of the evolution of road safety.

    Science.gov (United States)

    Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora

    2013-11-01

    Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations on the applicability of standard methods of statistical inference, which leads to an under- or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches, are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
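
    The abstract's point about serial dependency distorting the standard error can be illustrated with a minimal sketch. It assumes a stationary AR(1) series with lag-1 autocorrelation `phi`, which is an illustrative choice and not a model taken from the paper; the function names are hypothetical. The sketch computes how much larger (or smaller) the true variance of a sample mean is than the naive value that treats observations as independent.

```python
import math


def variance_inflation_ar1(phi: float, n: int) -> float:
    """Ratio of the true variance of the sample mean to the naive
    sigma^2 / n value, for a stationary AR(1) series of length n
    with lag-1 autocorrelation phi (assumed model, for illustration)."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Var(mean) = (sigma^2 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * phi**k]
    return 1.0 + 2.0 * sum((1.0 - k / n) * phi ** k for k in range(1, n))


def corrected_standard_error(naive_se: float, phi: float, n: int) -> float:
    """Rescale a naive standard error of the mean for AR(1) dependence."""
    return naive_se * math.sqrt(variance_inflation_ar1(phi, n))


if __name__ == "__main__":
    for phi in (-0.5, 0.0, 0.5, 0.8):
        factor = variance_inflation_ar1(phi, n=40)
        print(f"phi={phi:+.1f}  variance ratio={factor:.3f}")
```

    Under this assumption, positive autocorrelation gives a ratio above 1 (the naive standard error is too small), and negative autocorrelation gives a ratio below 1 (the naive standard error is too large), which matches the under- or overestimation the abstract describes.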

  11. Knowledge and inference

    CERN Document Server

    Nagao, Makoto

    1990-01-01

    Knowledge and Inference discusses an important problem for software systems: How do we treat knowledge and ideas on a computer and how do we use inference to solve problems on a computer? The book talks about the problems of knowledge and inference for the purpose of merging artificial intelligence and library science. The book begins by clarifying the concept of "knowledge" from many points of view, followed by a chapter on the current state of library science and the place of artificial intelligence in library science. Subsequent chapters cover central topics in the artificial intellig

  12. Patterns of genetic diversity of local pig populations in the State of Pernambuco, Brazil

    Directory of Open Access Journals (Sweden)

    Elizabete Cristina da Silva

    2011-08-01

    This study estimated the genetic diversity and structure of 12 genetic groups (GG) of locally adapted and specialized pigs in the state of Pernambuco using 22 microsatellite markers. Nine locally adapted breeds (Baé, Caruncho, Canastra, Canastrão, Mamelado, Moura, Nilo, Piau and UDB (Undefined Breed)) and 3 specialized breeds (Duroc, Landrace and Large White), totaling 190 animals, were analyzed. The Analysis of Molecular Variance (AMOVA) showed that 3.2% of the total variation was due to differences between genetic groups, and 3.6% to differences between local and commercial pigs. One hundred and ninety-eight alleles were identified and, apart from the Large White breed, all GG presented Hardy-Weinberg Equilibrium deviations for some loci. The total and effective allele means were lower for Duroc (3.65 and 3.01) and higher for UDB (8.89 and 4.53) and Canastra (8.61 and 4.58). Using Nei's standard genetic distance and the UPGMA method, it was possible to observe that the Landrace breed was grouped with the local genetic groups Canastra, Moura, Canastrão, Baé and Caruncho. Due to the complex admixture pattern, the genetic variability of the 12 genetic groups can be analyzed by distributing the individuals into two populations as demonstrated by a Bayesian analysis, corroborating the results from AMOVA, which revealed a low level of genetic differentiation between the inferred populations.
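
    The "effective allele" figures reported above are usually understood as the effective number of alleles, the reciprocal of the sum of squared allele frequencies at a locus. The following is a minimal sketch under that assumption; the abstract does not state which estimator or software was used, and the function name and example counts are hypothetical.

```python
def effective_number_of_alleles(allele_counts):
    """Effective number of alleles at one locus: 1 / sum(p_i ** 2),
    where p_i are allele frequencies derived from the given counts
    (or from frequencies that already sum to 1)."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive value")
    freqs = [count / total for count in allele_counts]
    return 1.0 / sum(p * p for p in freqs)


if __name__ == "__main__":
    # Hypothetical locus: four alleles, one of them dominant.
    print(effective_number_of_alleles([70, 10, 10, 10]))
    # Hypothetical locus: four equally common alleles.
    print(effective_number_of_alleles([25, 25, 25, 25]))
```

    When all alleles are equally common, the effective number equals the observed number; uneven frequencies pull it below the observed count, which is why the effective means in the abstract are lower than the total allele means.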

  13. Genetic diversity in two Japanese flounder populations from China seas inferred using microsatellite markers and COI sequences

    Science.gov (United States)

    Xu, Dongdong; Li, Sanlei; Lou, Bao; Zhang, Yurong; Zhan, Wei; Shi, Huilai

    2012-07-01

    Japanese flounder is one of the most important commercial species in China; however, information on the genetic background of natural populations in China seas is scarce. The lack of genetic data has hampered fishery management and aquaculture development programs for this species. In the present study, we have analyzed the genetic diversity in natural populations of Japanese flounder sampled from the Yellow Sea (Qingdao population, QD) and East China Sea (Zhoushan population, ZS) using 10 polymorphic microsatellite loci and cytochrome c oxidase subunit I (COI) sequencing data. A total of 68 different alleles were observed over 10 microsatellite loci. The total number of alleles per locus ranged from 2 to 9, and the number of genotypes per locus ranged from 3 to 45. The observed heterozygosity and expected heterozygosity in QD were 0.733 and 0.779, respectively, and in ZS the heterozygosity values were 0.708 and 0.783, respectively. Significant departures from Hardy-Weinberg equilibrium were observed in 7 of the 10 microsatellite loci in each of the two populations. The COI sequencing analysis revealed 25 polymorphic sites and 15 haplotypes in the two populations. The haplotype diversity and nucleotide diversity in the QD population were 0.746±0.0728 and 0.00334±0.00103, respectively, and in the ZS population the genetic diversity values were 0.712±0.0470 and 0.00318±0.00049, respectively. The microsatellite data (Fst = 0.0487, P < 0.001) and mitochondrial DNA data (Fst = 0.128, P < 0.001) both revealed significant genetic differentiation between the two populations. The information on the genetic variation and differentiation in Japanese flounder obtained in this study could be used to set up suitable guidelines for the management and conservation of this species, as well as for managing artificial selection programs. In future studies, more geographically diverse stocks should be used to obtain a deeper understanding of the population structure of Japanese
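
    The heterozygosity values in this record can be related with a short sketch. It assumes the textbook definition of expected heterozygosity under Hardy-Weinberg proportions (one minus the sum of squared allele frequencies) and a simple relative heterozygote deficit, 1 - Ho/He. Both function names are hypothetical, and applying the deficit to the reported multi-locus means is only a rough summary, not the per-locus test the authors ran.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus under Hardy-Weinberg
    proportions: 1 - sum(p_i ** 2). Frequencies must sum to 1."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def heterozygote_deficit(h_obs, h_exp):
    """Relative shortfall of observed versus expected heterozygosity.
    Positive values indicate fewer heterozygotes than expected."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


if __name__ == "__main__":
    # Mean observed and expected heterozygosity as reported in the abstract.
    for name, h_obs, h_exp in (("QD", 0.733, 0.779), ("ZS", 0.708, 0.783)):
        print(name, round(heterozygote_deficit(h_obs, h_exp), 4))
```

    Both populations show observed heterozygosity below expectation, which points in the same direction as the Hardy-Weinberg departures the abstract reports at 7 of the 10 loci.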

  14. A Continuous Correlated Beta Process Model for Genetic Ancestry in Admixed Populations.

    Science.gov (United States)

    Gompert, Zachariah

    2016-01-01

    Admixture and recombination create populations and genomes with genetic ancestry from multiple source populations. Analyses of genetic ancestry in admixed populations are relevant for trait and disease mapping, studies of speciation, and conservation efforts. Consequently, many methods have been developed to infer genome-average ancestry and to deconvolute ancestry into continuous local ancestry blocks or tracts within individuals. Current methods for local ancestry inference perform well when admixture occurred recently or hybridization is ongoing, or when admixture occurred in the distant past such that local ancestry blocks have fixed in the admixed population. However, methods to infer local ancestry frequencies in isolated admixed populations still segregating for ancestry do not exist. In the current paper, I develop and test a continuous correlated beta process model to fill this analytical gap. The method explicitly models autocorrelations in ancestry frequencies at the population-level and uses discriminant analysis of SNP windows to take advantage of ancestry blocks within individuals. Analyses of simulated data sets show that the method is generally accurate such that ancestry frequency estimates exhibited low root-mean-square error and were highly correlated with the true values, particularly when large (±10 or ±20) SNP windows were used. Along these lines, the proposed method outperformed post hoc inference of ancestry frequencies from a traditional hidden Markov model (i.e., the linkage model in structure), particularly when admixture occurred more distantly in the past with little on-going gene flow or was followed by natural selection. The reliability and utility of the method was further assessed by analyzing genetic ancestry in an admixed human population (Uyghur) and three populations from a hybrid zone between Mus domesticus and M. musculus. Considerable variation in ancestry frequencies was detected within and among chromosomes in the Uyghur

  15. Pitfalls in genetic analysis of pheochromocytomas/paragangliomas-case report.

    Science.gov (United States)

    Canu, Letizia; Rapizzi, Elena; Zampetti, Benedetta; Fucci, Rossella; Nesi, Gabriella; Richter, Susan; Qin, Nan; Giachè, Valentino; Bergamini, Carlo; Parenti, Gabriele; Valeri, Andrea; Ercolino, Tonino; Eisenhofer, Graeme; Mannelli, Massimo

    2014-07-01

    About 35% of patients with pheochromocytoma/paraganglioma carry a germline mutation in one of the 10 main susceptibility genes. The recent introduction of next-generation sequencing will allow the analysis of all these genes in one run. When positive, the analysis is generally unequivocal due to the association between a germline mutation and a concordant clinical presentation or positive family history. When genetic analysis reveals a novel mutation with no clinical correlates, particularly in the presence of a missense variant, the question arises whether the mutation is pathogenic or a rare polymorphism. We report the case of a 35-year-old patient operated for a pheochromocytoma who turned out to be a carrier of a novel SDHD (succinate dehydrogenase subunit D) missense mutation. With no positive family history or clinical correlates, we decided to perform additional analyses to test the clinical significance of the mutation. We performed in silico analysis, tissue loss of heterozygosity analysis, immunohistochemistry, Western blot analysis, SDH enzymatic assay, and measurement of the succinate/fumarate concentration ratio in the tumor tissue by tandem mass spectrometry. Although the in silico analysis gave contradictory results according to the different methods, all the other tests demonstrated that the SDH complex was conserved and normally active. We therefore came to the conclusion that the variant was a nonpathogenic polymorphism. Advancements in technology facilitate genetic analysis of patients with pheochromocytoma but also offer new challenges to the clinician who, in some cases, needs clinical correlates and/or functional tests to give significance to the results of the genetic assay.

  16. Personalized medicine and human genetic diversity.

    Science.gov (United States)

    Lu, Yi-Fan; Goldstein, David B; Angrist, Misha; Cavalleri, Gianpiero

    2014-07-24

    Human genetic diversity has long been studied both to understand how genetic variation influences risk of disease and infer aspects of human evolutionary history. In this article, we review historical and contemporary views of human genetic diversity, the rare and common mutations implicated in human disease susceptibility, and the relevance of genetic diversity to personalized medicine. First, we describe the development of thought about diversity through the 20th century and through more modern studies including genome-wide association studies (GWAS) and next-generation sequencing. We introduce several examples, such as sickle cell anemia and Tay-Sachs disease that are caused by rare mutations and are more frequent in certain geographical populations, and common treatment responses that are caused by common variants, such as hepatitis C infection. We conclude with comments about the continued relevance of human genetic diversity in medical genetics and personalized medicine more generally. Copyright © 2014 Cold Spring Harbor Laboratory Press; all rights reserved.

  17. Geometric statistical inference

    International Nuclear Information System (INIS)

    Periwal, Vipul

    1999-01-01

    A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined

  18. Genetic diversity analysis of common beans based on molecular markers

    Directory of Open Access Journals (Sweden)

    Homar R. Gill-Langarica

    2011-01-01

    A core collection of the common bean (Phaseolus vulgaris L.), representing genetic diversity in the entire Mexican holding, is kept at the INIFAP (Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias, Mexico) Germplasm Bank. After evaluation, the genetic structure of this collection (200 accessions) was compared with that of landraces from the states of Oaxaca, Chiapas and Veracruz (10 genotypes from each), as well as a further 10 cultivars, by means of four amplified fragment length polymorphisms (AFLP) +3/+3 primer combinations and seven simple sequence repeats (SSR) loci, in order to define genetic diversity, variability and mutual relationships. Data underwent cluster (UPGMA) and molecular variance (AMOVA) analyses. AFLP analysis produced 530 bands (88.5% polymorphic) while SSR primers amplified 174 alleles, all polymorphic (8.2 alleles per locus). AFLP indicated that the highest genetic diversity was to be found in ten commercial-seed classes from two major groups of accessions from Central Mexico and Chiapas, which seems to be an important center of diversity in the south. A third group included genotypes from Nueva Granada, Mesoamerica, Jalisco and Durango races. Here, SSR analysis indicated a reduced number of shared haplotypes among accessions, whereas the highest genetic components of AMOVA variation were found within accessions. Genetic diversity observed in the common-bean core collection represents an important sample of the total Phaseolus genetic variability at the main Germplasm Bank of INIFAP. Molecular marker strategies could contribute to a better understanding of the genetic structure of the core collection as well as to its improvement and validation.

  20. Genetic diversity analysis of common beans based on molecular markers.

    Science.gov (United States)

    Gill-Langarica, Homar R; Muruaga-Martínez, José S; Vargas-Vázquez, M L Patricia; Rosales-Serna, Rigoberto; Mayek-Pérez, Netzahualcoyotl

    2011-10-01

    A core collection of the common bean (Phaseolus vulgaris L.), representing genetic diversity in the entire Mexican holding, is kept at the INIFAP (Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias, Mexico) Germplasm Bank. After evaluation, the genetic structure of this collection (200 accessions) was compared with that of landraces from the states of Oaxaca, Chiapas and Veracruz (10 genotypes from each), as well as a further 10 cultivars, by means of four amplified fragment length polymorphisms (AFLP) +3/+3 primer combinations and seven simple sequence repeats (SSR) loci, in order to define genetic diversity, variability and mutual relationships. Data underwent cluster (UPGMA) and molecular variance (AMOVA) analyses. AFLP analysis produced 530 bands (88.5% polymorphic) while SSR primers amplified 174 alleles, all polymorphic (8.2 alleles per locus). AFLP indicated that the highest genetic diversity was to be found in ten commercial-seed classes from two major groups of accessions from Central Mexico and Chiapas, which seems to be an important center of diversity in the south. A third group included genotypes from Nueva Granada, Mesoamerica, Jalisco and Durango races. Here, SSR analysis indicated a reduced number of shared haplotypes among accessions, whereas the highest genetic components of AMOVA variation were found within accessions. Genetic diversity observed in the common-bean core collection represents an important sample of the total Phaseolus genetic variability at the main Germplasm Bank of INIFAP. Molecular marker strategies could contribute to a better understanding of the genetic structure of the core collection as well as to its improvement and validation.

  1. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics.

    Science.gov (United States)

    Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu

    2017-12-07

    Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  2. Genetic structure of Japanese Spanish mackerel ( Scomberomorus ...

    African Journals Online (AJOL)

    Genetic structure of Japanese Spanish mackerel ( Scomberomorus niphonius ) in the East China Sea and Yellow Sea inferred from AFLP data. ... Considering the high hydrological connectivity of this region and the species pelagic life history, retention of larvae, different migration route and different spawning season may ...

  3. Statistical causal inferences and their applications in public health research

    CERN Document Server

    Wu, Pan; Chen, Ding-Geng

    2016-01-01

    This book compiles and presents new developments in statistical causal inference. The accompanying data and computer programs are publicly available so readers may replicate the model development and data analysis presented in each chapter. In this way, methodology is taught so that readers may implement it directly. The book brings together experts engaged in causal inference research to present and discuss recent issues in causal inference methodological development. This is also a timely look at causal inference applied to scenarios that range from clinical trials to mediation and public health research more broadly. In an academic setting, this book will serve as a reference and guide to a course in causal inference at the graduate level (Master's or Doctorate). It is particularly relevant for students pursuing degrees in Statistics, Biostatistics and Computational Biology. Researchers and data analysts in public health and biomedical research will also find this book to be an important reference.

  4. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data.

    Science.gov (United States)

    Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S

    2015-02-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press.
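
    As background for the "expected frequency spectrum under the coalescent" mentioned above, here is a minimal sketch of the simplest case only: a constant-size population under the standard neutral coalescent, where the expected number of sites with derived-allele count i in a sample of n sequences is theta / i. This is a baseline illustration, not the authors' piecewise-exponential algorithm or their software; the function names and the `theta` default are hypothetical.

```python
def expected_sfs_constant_size(n, theta=1.0):
    """Expected site frequency spectrum for n sampled sequences under
    the standard neutral coalescent with constant population size:
    E[xi_i] = theta / i for derived-allele counts i = 1 .. n-1."""
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    if theta <= 0:
        raise ValueError("theta must be positive")
    return [theta / i for i in range(1, n)]


def normalize(sfs):
    """Convert expected counts into proportions of segregating sites."""
    total = sum(sfs)
    return [x / total for x in sfs]


if __name__ == "__main__":
    for i, share in enumerate(normalize(expected_sfs_constant_size(10)), start=1):
        print(f"derived count {i}: {share:.3f}")
```

    Demographic inference from the frequency spectrum works by comparing an observed spectrum with expectations like this one. Recent rapid growth, for example, produces an excess of rare variants relative to the constant-size baseline, which is the kind of signal the abstract says requires large samples to detect.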

  5. A genetic analysis of Trichuris trichiura and Trichuris suis from Ecuador.

    Science.gov (United States)

    Meekums, Hayley; Hawash, Mohamed B F; Sparks, Alexandra M; Oviedo, Yisela; Sandoval, Carlos; Chico, Martha E; Stothard, J Russell; Cooper, Philip J; Nejsum, Peter; Betson, Martha

    2015-03-19

    Since the nematodes Trichuris trichiura and T. suis are morphologically indistinguishable, genetic analysis is required to assess epidemiological cross-over between people and pigs. This study aimed to clarify the transmission biology of trichuriasis in Ecuador. Adult Trichuris worms were collected during a parasitological survey of 132 people and 46 pigs in Esmeraldas Province, Ecuador. Morphometric analysis of 49 pig worms and 64 human worms revealed significant variation. In discriminant analysis morphometric characteristics correctly classified male worms according to host species. In PCR-RFLP analysis of the ribosomal Internal Transcribed Spacer (ITS-2) and 18S DNA (59 pig worms and 82 human worms), nearly all Trichuris exhibited expected restriction patterns. However, two pig-derived worms showed a "heterozygous-type" ITS-2 pattern, with one also having a "heterozygous-type" 18S pattern. Phylogenetic analysis of the mitochondrial large ribosomal subunit partitioned worms by host species. Notably, some Ecuadorian T. suis clustered with porcine Trichuris from USA and Denmark and some with Chinese T. suis. This is the first study in Latin America to genetically analyse Trichuris parasites. Although T. trichiura does not appear to be zoonotic in Ecuador, there is evidence of genetic exchange between T. trichiura and T. suis warranting more detailed genetic sampling.

  6. A markerless protocol for genetic analysis of Aggregatibacter actinomycetemcomitans

    Science.gov (United States)

    Cheng, Ya-An; Jee, Jason; Hsu, Genie; Huang, Yanyan; Chen, Casey; Lin, Chun-Pin

    2015-01-01

    Background/Purpose: The genomes of different Aggregatibacter actinomycetemcomitans strains contain many strain-specific genes and genomic islands (defined as DNA found in some but not all strains) of unknown functions. Genetic analysis for the functions of these islands will be constrained by the limited availability of genetic markers and vectors for A. actinomycetemcomitans. In this study we tested a novel genetic approach of gene deletion and restoration in a naturally competent A. actinomycetemcomitans strain D7S-1. Methods: Deletion mutants of specific genes and mutants restored with the deleted genes were constructed by a markerless loxP/Cre system. In mutants with sequential deletion of multiple genes, loxP sites with different spacer regions were used to avoid unwanted recombination between loxP sites. Results: Eight single-gene deletion mutants, four multiple-gene deletion mutants, and two mutants with restored genes were constructed. No unintended non-specific deletion mutants were generated by this protocol. The protocol did not negatively affect the growth and biofilm formation of A. actinomycetemcomitans. Conclusion: The protocol described in this study is efficient and specific for genetic manipulation of A. actinomycetemcomitans, and will be amenable for functional analysis of multiple genes in A. actinomycetemcomitans. PMID:24530245

  7. Contradictory genetic make-up of Dutch harbour porpoises: Response to van der Plas-Duivesteijn et al.

    Science.gov (United States)

    Kopps, Anna M.; Palsbøll, Per J.

    2016-02-01

    The assessment of the status of endangered species or populations typically draws generously on the plethora of population genetic software available to detect population genetic structuring. However, despite the many available analytical approaches, population genetic inference methods [of neutral genetic variation] essentially capture three basic processes: migration, random genetic drift and mutation. Consequently, different analytical approaches essentially capture the same basic processes, and should yield consistent results.

  8. Privacy Threats and Practical Solutions for Genetic Risk Tests

    OpenAIRE

    Barman, Ludovic; El Graini, Mohammed-Taha; Raisaro, Jean Louis; Ayday, Erman; Hubaux, Jean-Pierre

    2015-01-01

    Recently, several solutions have been proposed to address the complex challenge of protecting individuals’ genetic data during personalized medicine tests. In this short paper, we analyze different privacy threats and propose simple countermeasures for the generic architecture mainly used in the literature. In particular, we present and evaluate a new practical solution against a critical attack of a malicious medical center trying to actively infer raw genetic information of patients.

  9. Goal inferences about robot behavior : goal inferences and human response behaviors

    NARCIS (Netherlands)

    Broers, H.A.T.; Ham, J.R.C.; Broeders, R.; De Silva, P.; Okada, M.

    2014-01-01

    This explorative research focused on the goal inferences human observers draw based on a robot's behavior, and the extent to which those inferences predict people's behavior in response to that robot. Results show that different robot behaviors cause different response behavior from people.

  10. Genetic analysis of PAX3 for diagnosis of Waardenburg syndrome type I.

    Science.gov (United States)

    Matsunaga, Tatsuo; Mutai, Hideki; Namba, Kazunori; Morita, Noriko; Masuda, Sawako

    2013-04-01

    PAX3 genetic analysis increased the diagnostic accuracy for Waardenburg syndrome type I (WS1). Analysis of the three-dimensional (3D) structure of PAX3 helped verify the pathogenicity of a missense mutation, and multiplex ligation-dependent probe amplification (MLPA) analysis of PAX3 increased the sensitivity of genetic diagnosis in patients with WS1. Clinical diagnosis of WS1 is often difficult in individual patients with isolated, mild, or non-specific symptoms. The objective of the present study was to facilitate the accurate diagnosis of WS1 through genetic analysis of PAX3 and to expand the spectrum of known PAX3 mutations. In two Japanese families with WS1, we conducted a clinical evaluation of symptoms and genetic analysis, which involved direct sequencing, MLPA analysis, quantitative PCR of PAX3, and analysis of the predicted 3D structure of PAX3. The normal-hearing control group comprised 92 subjects who had normal hearing according to pure tone audiometry. In one family, direct sequencing of PAX3 identified a heterozygous mutation, p.I59F. Analysis of PAX3 3D structures indicated that this mutation distorted the DNA-binding site of PAX3. In the other family, MLPA analysis and subsequent quantitative PCR detected a large, heterozygous deletion spanning 1759-2554 kb that eliminated 12-18 genes including the entire PAX3 gene.

  11. DMPD: The Toll-like receptors: analysis by forward genetic methods. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    The Toll-like receptors: analysis by forward genetic methods. Beutler B. Immunogenetics. 2005 Jul;57(6):385-92. PubmedID 16001129

  12. Analysis of genetic effects of nuclear-cytoplasmic interaction on quantitative traits: genetic model for diploid plants.

    Science.gov (United States)

    Han, Lide; Yang, Jian; Zhu, Jun

    2007-06-01

    A genetic model was proposed for simultaneously analyzing genetic effects of nuclear, cytoplasm, and nuclear-cytoplasmic interaction (NCI) as well as their genotype by environment (GE) interaction for quantitative traits of diploid plants. In the model, the NCI effects were further partitioned into additive and dominance nuclear-cytoplasmic interaction components. Mixed linear model approaches were used for statistical analysis. On the basis of diallel cross designs, Monte Carlo simulations showed that the genetic model was robust for estimating variance components under several situations without specific effects. Random genetic effects were predicted by an adjusted unbiased prediction (AUP) method. Data on four quantitative traits (boll number, lint percentage, fiber length, and micronaire) in Upland cotton (Gossypium hirsutum L.) were analyzed as a worked example to show the effectiveness of the model.

  13. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update

    Science.gov (United States)

    Peakall, Rod; Smouse, Peter E.

    2012-01-01

    Summary: GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: G′ST, G′′ST, Jost’s Dest and F′ST through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised. Availability and implementation: GenAlEx is written in VBA and provided as a Microsoft Excel Add-in (compatible with Excel 2003, 2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx, and supporting documentation and tutorials are freely available at: http://biology.anu.edu.au/GenAlEx. Contact: rod.peakall@anu.edu.au PMID:22820204
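To make the frequency-based statistics mentioned above more concrete, here is a minimal sketch of observed heterozygosity, expected heterozygosity (1 minus the sum of squared allele frequencies) and the fixation index F = 1 - Ho/He at a single diploid codominant locus. The function name and input layout are assumptions for illustration; this is not GenAlEx's VBA implementation and it omits the sample-size correction used for unbiased estimates.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He, F) for one diploid locus.

    genotypes: list of (allele, allele) tuples, one per individual.
    He is the uncorrected expected heterozygosity, 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total_alleles = 2 * n
    he = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    ho = sum(1 for a, b in genotypes if a != b) / n
    # F is undefined when the locus is monomorphic (He == 0).
    f = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, f


if __name__ == "__main__":
    sample = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
    print(heterozygosity(sample))
```

A positive F indicates fewer heterozygotes than Hardy-Weinberg proportions would predict; a value near zero is consistent with random mating at that locus.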

  14. Bayesian Inference for Functional Dynamics Exploring in fMRI Data

    Directory of Open Access Journals (Sweden)

    Xuan Guo

    2016-01-01

    This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. In particular, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, the Bayesian Magnitude Change Point Model (BMCPM), the Bayesian Connectivity Change Point Model (BCCPM), and the Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more refined Bayesian inference models will emerge and play increasingly important roles in modeling brain functions in the years to come.

  15. Inferring ontology graph structures using OWL reasoning

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  16. Inferring ontology graph structures using OWL reasoning.

    Science.gov (United States)

    Rodríguez-García, Miguel Ángel; Hoehndorf, Robert

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  17. EMBO Course “Formal Analysis of Genetic Regulation”

    CERN Document Server

    1979-01-01

    The EMBO course on "Formal Analysis of Genetic Regulation" A course entitled "Formal analysis of Genetic Regulation" was held at the University of Brussels from 6 to 16 September 1977 under the auspices of EMBO (European Molecular Biology Organization). As indicated by the title of the book (but not explicitly enough by the title of the course), the main emphasis was put on a dynamic analysis of systems using logical methods, that is, methods in which functions and variables take only a limited number of values - typically two. In this respect, this course was complementary to an EMBO course using continuous methods which was held some months later in Israel by Prof. Segel. People from four very different laboratories took an active part in teaching our course in Brussels: Drs Anne LEUSSLER and Philippe VAN HAM, from the Laboratory of Prof. Jean FLORINE (Laboratoire des Systemes logiques et numeriques, Faculte des Sciences appliquees, Universite Libre de Bruxelles). Dr Stuart KAUFFMAN (Dept. of Biochemist...
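The "logical methods" described here, in which each variable takes one of two values, can be illustrated with a toy Boolean network. The sketch below is an assumption-laden example, not material from the course: two hypothetical genes repress each other, the state is updated synchronously, and the function names are invented for illustration.

```python
def step(state):
    """Synchronous update of a two-gene mutual-repression circuit.

    Each gene is ON (1) at the next step only if its repressor is OFF (0) now.
    """
    a, b = state
    return (int(not b), int(not a))


def attractor(state, max_steps=16):
    """Follow the trajectory from `state` until a state repeats.

    Returns the cycle of states reached (a single state for a fixed point).
    """
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):] if state in seen else []


if __name__ == "__main__":
    for start in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(start, "->", attractor(start))
```

Under these assumed rules the states (1, 0) and (0, 1) are stable, while (0, 0) and (1, 1) alternate, an oscillation that is specific to synchronous updating and may disappear under other update schemes.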

  18. Entropic Inference

    OpenAIRE

    Caticha, Ariel

    2010-01-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEn...

  19. Behavior Intention Derivation of Android Malware Using Ontology Inference

    Directory of Open Access Journals (Sweden)

    Jian Jiao

    2018-01-01

    Previous research on Android malware has mainly focused on malware detection, and because malware keeps evolving, detection inevitably lags behind it. The information presented by these detection results (malice judgment, family classification, and behavior characterization) is limited for analysts. Therefore, a method is needed to restore the intention of malware, which reflects the relation between multiple behaviors of complex malware and its ultimate purpose. This paper proposes a novel description and derivation model of Android malware intention based on the theory of intention and malware reverse engineering. This approach creates an ontology for malware intention to model the semantic relation between behaviors and their objects and automates the process of intention derivation by using SWRL rules derived from the intention model and the Jess inference engine. Experiments on 75 typical samples show that the inference system can perform derivation of malware intention effectively, and 89.3% of the inference results are consistent with manual analysis, which supports the feasibility and effectiveness of our theory and inference system.

  20. Progression inference for somatic mutations in cancer

    Directory of Open Access Journals (Sweden)

    Leif E. Peterson

    2017-04-01

    Computational methods were employed to determine progression inference of genomic alterations in commonly occurring cancers. Using cross-sectional TCGA data, we computed evolutionary trajectories involving selectivity relationships among pairs of gene-specific genomic alterations such as somatic mutations, deletions, amplifications, downregulation, and upregulation among the top 20 driver genes associated with each cancer. Results indicate that the majority of hierarchies involved TP53, PIK3CA, ERBB2, APC, KRAS, EGFR, IDH1, VHL, etc. Research into the order and accumulation of genomic alterations among cancer driver genes will continue to increase as the costs of next-generation sequencing subside and personalized/precision medicine incorporates whole-genome scans into the diagnosis and treatment of cancer. Keywords: Oncology, Cancer research, Genetics, Computational biology

  1. Phenotypic and molecular genetic analysis of Pyruvate Kinase ...

    African Journals Online (AJOL)

    Phenotypic and molecular genetic analysis of Pyruvate Kinase deficiency in a Tunisian family. Jaouani Mouna, Hamdi Nadia, Chaouch Leila, Kalai Miniar, Mellouli Fethi, Darragi Imen, Boudriga Imen, Chaouachi Dorra, Bejaoui Mohamed, Abbes Salem ...

  2. Learning Convex Inference of Marginals

    OpenAIRE

    Domke, Justin

    2012-01-01

    Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main ...

  3. Population Genetic Structure of the Endangered Kaiser's Mountain Newt, Neurergus kaiseri (Amphibia: Salamandridae).

    Directory of Open Access Journals (Sweden)

    Hossein Farasat

    Species often exhibit different levels of genetic structuring correlated to their environment. However, understanding how environmental heterogeneity influences genetic variation is difficult because the effects of gene flow, drift and selection are confounded. We investigated the genetic variation and its ecological correlates in an endemic and critically endangered stream breeding mountain newt, Neurergus kaiseri, within its entire range in southwestern Iran. We identified two geographic regions based on phylogenetic relationships using Bayesian inference and maximum likelihood of 779 bp mtDNA (D-loop) in 111 individuals from ten of twelve known breeding populations. This analysis revealed a clear divergence between northern populations, located in more humid habitats at higher elevation, and southern populations, from drier habitats at lower elevations. Of the seven haplotypes found in these populations, none was shared between the two regions. Analysis of molecular variance (AMOVA) of N. kaiseri indicates that 94.03% of sequence variation is distributed among newt populations and 5.97% within them. Moreover, a high degree of genetic subdivision, mainly attributable to the existence of significant variance between the two regions, is shown (θCT = 0.94, P = 0.002). The positive and significant correlation between geographic and genetic distances (r = 0.61, P = 0.002) after controlling for environmental distance suggests an important influence of geographic divergence of the sites in shaping the genetic variation and may provide tools for a possible conservation-based prioritization policy for the endangered species.

  4. Working with sample data exploration and inference

    CERN Document Server

    Chaffe-Stengel, Priscilla

    2014-01-01

    Managers and analysts routinely collect and examine key performance measures to better understand their operations and make good decisions. Being able to render the complexity of operations data into a coherent account of significant events requires an understanding of how to work well with raw data and to make appropriate inferences. Although some statistical techniques for analyzing data and making inferences are sophisticated and require specialized expertise, there are methods that are understandable and applicable by anyone with basic algebra skills and the support of a spreadsheet package. By applying these fundamental methods themselves rather than turning over both the data and the responsibility for analysis and interpretation to an expert, managers will develop a richer understanding and potentially gain better control over their environment. This text is intended to describe these fundamental statistical techniques to managers, data analysts, and students. Statistical analysis of sample data is enh...

  5. Population genetic structure in wild and aquaculture populations of Hemibarbus maculates inferred from microsatellites markers

    Directory of Open Access Journals (Sweden)

    Linlin Li

    2017-03-01

    The objective of this study was to investigate four aquaculture populations, Shanghai (SH), Hangzhou (HZ), Kaihua (KH) and Xianju (XJ), and one wild population, Yingshan (YS), of spotted barbell (Hemibarbus maculates) to assess their genetic diversity level and investigate the genetic structure of the populations. The dendrogram and STRUCTURE revealed that the populations XJ, KH, and HZ jointly formed one cluster, to which the populations SH and YS were then joined in turn. The genetic diversity of the cultured populations was better maintained, possibly due to favourable hatchery conditions that decreased the effect of environmental selection present in wild populations. The results of the present study will contribute to the management of spotted barbell genetic resources, and also demonstrate how the genetic diversity of freshwater species is vulnerable to human activity.

  6. Indirect genetics effects and evolutionary constraint: an analysis of social dominance in red deer, Cervus elaphus.

    Science.gov (United States)

    Wilson, A J; Morrissey, M B; Adams, M J; Walling, C A; Guinness, F E; Pemberton, J M; Clutton-Brock, T H; Kruuk, L E B

    2011-04-01

    By determining access to limited resources, social dominance is often an important determinant of fitness. Thus, if heritable, standard theory predicts mean dominance should evolve. However, dominance is usually inferred from the tendency to win contests, and given one winner and one loser in any dyadic contest, the mean proportion won will always equal 0.5. Here, we argue that the apparent conflict between quantitative genetic theory and common sense is resolved by recognition of indirect genetic effects (IGEs). We estimate selection on, and genetic (co)variance structures for, social dominance, in a wild population of red deer Cervus elaphus, on the Scottish island of Rum. While dominance is heritable and positively correlated with lifetime fitness, contest outcomes depend as much on the genes carried by an opponent as on the genotype of a focal individual. We show how this dependency imposes an absolute evolutionary constraint on the phenotypic mean, thus reconciling theoretical predictions with common sense. More generally, we argue that IGEs likely provide a widespread but poorly recognized source of evolutionary constraint for traits influenced by competition. © 2011 The Authors. Journal of Evolutionary Biology © 2011 European Society For Evolutionary Biology.
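The bookkeeping argument in this abstract (one winner and one loser per dyadic contest) can be checked with a small simulation. The sketch below uses a hypothetical "ability" score and invented function names; it only illustrates the 0.5 constraint on the pooled proportion of contests won and does not reproduce the authors' quantitative genetic models.

```python
import random


def mean_proportion_won(n_individuals, n_contests, shift=0.0, seed=0):
    """Pooled proportion of contest participations that end in a win.

    Every individual gets a hypothetical ability score; `shift` raises the
    ability of the whole population, mimicking an evolved increase.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) + shift for _ in range(n_individuals)]
    wins = [0] * n_individuals
    fights = [0] * n_individuals
    for _ in range(n_contests):
        i, j = rng.sample(range(n_individuals), 2)
        winner = i if ability[i] > ability[j] else j
        wins[winner] += 1
        fights[i] += 1
        fights[j] += 1
    # Each contest adds exactly one win and two participations.
    return sum(wins) / sum(fights)


if __name__ == "__main__":
    print(mean_proportion_won(20, 500, shift=0.0))
    print(mean_proportion_won(20, 500, shift=10.0))
```

Raising every individual's ability leaves the pooled proportion unchanged, because an opponent's improvement cancels a focal individual's improvement. This is the intuition behind treating the opponent's genes as an indirect genetic effect.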

  7. Computational methods for analysis and inference of kinase/inhibitor relationships

    Directory of Open Access Journals (Sweden)

    Fabrizio Ferrè

    2014-06-01

    The central role of kinases in virtually all signal transduction networks is the driving motivation for the development of compounds modulating their activity. ATP-mimetic inhibitors are essential tools for elucidating signaling pathways and are emerging as promising therapeutic agents. However, off-target ligand binding and complex and sometimes unexpected kinase/inhibitor relationships can occur for seemingly unrelated kinases, stressing that computational approaches are needed for learning the interaction determinants and for the inference of the effect of small compounds on a given kinase. Recently published high-throughput profiling studies assessed the effects of thousands of small compound inhibitors, covering a substantial portion of the kinome. This wealth of data paved the road for computational resources and methods that can offer a major contribution in understanding the reasons for the inhibition, helping in the rational design of more specific molecules, in the in silico prediction of inhibition for those neglected kinases for which no systematic analysis has been carried out yet, in the selection of novel inhibitors with desired selectivity, and offering novel avenues of personalized therapies.

  8. Model-free information-theoretic approach to infer leadership in pairs of zebrafish.

    Science.gov (United States)

    Butail, Sachit; Mwaffo, Violet; Porfiri, Maurizio

    2016-04-01

    Collective behavior affords several advantages to fish in avoiding predators, foraging, mating, and swimming. Although fish schools have been traditionally considered egalitarian superorganisms, a number of empirical observations suggest the emergence of leadership in gregarious groups. Detecting and classifying leader-follower relationships is central to elucidate the behavioral and physiological causes of leadership and understand its consequences. Here, we demonstrate an information-theoretic approach to infer leadership from positional data of fish swimming. In this framework, we measure social interactions between fish pairs through the mathematical construct of transfer entropy, which quantifies the predictive power of a time series to anticipate another, possibly coupled, time series. We focus on the zebrafish model organism, which is rapidly emerging as a species of choice in preclinical research for its genetic similarity to humans and reduced neurobiological complexity with respect to mammals. To overcome experimental confounds and generate test data sets on which we can thoroughly assess our approach, we adapt and calibrate a data-driven stochastic model of zebrafish motion for the simulation of a coupled dynamical system of zebrafish pairs. In this synthetic data set, the extent and direction of the coupling between the fish are systematically varied across a wide parameter range to demonstrate the accuracy and reliability of transfer entropy in inferring leadership. Our approach is expected to aid in the analysis of collective behavior, providing a data-driven perspective to understand social interactions.
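A minimal plug-in estimate of transfer entropy may help make the quantity concrete. The sketch assumes the two time series have already been discretized into symbols (for example, binned turning directions) and uses a history length of one step; both choices, and the function name, are illustrative assumptions rather than the authors' procedure for positional fish data.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Transfer entropy from `source` to `target` in bits (history length 1).

    TE = sum over (x_next, x_prev, y_prev) of
         p(x_next, x_prev, y_prev) * log2(p(x_next | x_prev, y_prev) / p(x_next | x_prev))
    where x is the target series and y is the source series.
    """
    n = len(target) - 1
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    prev_pairs = Counter(zip(target[:-1], source[:-1]))
    self_pairs = Counter(zip(target[1:], target[:-1]))
    prev_single = Counter(target[:-1])
    te = 0.0
    for (x_next, x_prev, y_prev), count in triples.items():
        p_joint = count / n
        p_given_both = count / prev_pairs[(x_prev, y_prev)]
        p_given_self = self_pairs[(x_next, x_prev)] / prev_single[x_prev]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


if __name__ == "__main__":
    rng = random.Random(1)
    leader = [rng.randint(0, 1) for _ in range(2000)]
    follower = [0] + leader[:-1]  # follower copies the leader one step later
    print(transfer_entropy(leader, follower))  # close to 1 bit
    print(transfer_entropy(follower, leader))  # close to 0 bits
```

The asymmetry between the two directions is what would be read as evidence of leadership. Plug-in estimates are biased upward for short series, so in practice the values would be compared against a surrogate or shuffled baseline.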

  9. Nonparametric Bayesian inference in biostatistics

    CERN Document Server

    Müller, Peter

    2015-01-01

    As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...

  10. Inference of Causal Relationships between Biomarkers and Outcomes in High Dimensions

    Directory of Open Access Journals (Sweden)

    Felix Agakov

    2011-12-01

    We describe a unified computational framework for learning causal dependencies between genotypes, biomarkers, and phenotypic outcomes from large-scale data. In contrast to previous studies, our framework allows for noisy measurements, hidden confounders, missing data, and pleiotropic effects of genotypes on outcomes. The method exploits the use of genotypes as “instrumental variables” to infer causal associations between phenotypic biomarkers and outcomes, without requiring the assumption that genotypic effects are mediated only through the observed biomarkers. The framework builds on sparse linear methods developed in statistics and machine learning and modified here for inferring structures of richer networks with latent variables. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. To demonstrate our method, we examined effects of gene transcript levels in the liver on plasma HDL cholesterol levels in a sample of 260 mice from a heterogeneous stock.
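For readers unfamiliar with genotypes as instrumental variables, the sketch below shows the classical single-instrument Wald ratio on simulated data with a hidden confounder. This is deliberately the simpler textbook estimator, which does assume the genotype affects the outcome only through the biomarker; it is not the sparse latent-variable framework described in the abstract. All names and effect sizes are made up for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def wald_ratio(z, x, y):
    """Instrumental-variable estimate of the effect of x on y using instrument z."""
    return cov(z, y) / cov(z, x)


def simulate(n=5000, beta=0.5, seed=0):
    """Simulate genotype z, biomarker x, outcome y with a hidden confounder u."""
    rng = random.Random(seed)
    z = [rng.choice([0, 1, 2]) for _ in range(n)]   # allele count at one locus
    u = [rng.gauss(0, 1) for _ in range(n)]         # unobserved confounder
    x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print("naive regression slope:", cov(x, y) / cov(x, x))  # inflated by u
    print("Wald ratio estimate:   ", wald_ratio(z, x, y))    # near the true 0.5
```

The naive slope absorbs the confounder's contribution, while the ratio of genotype-outcome to genotype-biomarker covariance does not, provided the genotype is independent of the confounder and has no pleiotropic path to the outcome.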

  11. Genetic parameters of performance traits in Sul-Mato-Grossenses naturalized sheep

    Directory of Open Access Journals (Sweden)

    Daniele Portela de Oliveira

    2014-02-01

    Estimates of genetic parameters are important to study characteristics that are to be included in a breeding program of a genetic group. A total of 594 weight records from 211 lambs of a genetic group of naturalized Sul-mato-grossenses sheep, belonging to the Manoel de Barros Foundation and bred at the Centro Tecnologico de Ovinos of Anhanguera-Uniderp University, were used. Variance components were estimated in single-trait and two-trait analyses through Bayesian inference. Heritability estimates ranged from 0.22 to 0.47 in the single-trait analyses and from 0.13 to 0.78 in the two-trait analyses. The maternal permanent environmental effect was greatest for birth weight and average daily gain from birth to 50 days, accounting for 24.2% and 19.5% of the observed variation, respectively. Estimates of heritability, the contribution of the maternal permanent environmental effect, and phenotypic and genetic correlations indicate that selection for average daily gain from birth to 90 days would imply increases in weight at 50 days, weight at 90 days and average daily gain from 50 to 90 days of lambs, with no significant increase in birth weight and average daily gain from birth to 50 days.

  12. A novel genetic tool for clonal analysis of fourth chromosome mutations

    OpenAIRE

    Sousa-Neves, Rui; Schinaman, Joseph M.

    2012-01-01

    The fourth chromosome of Drosophila remains one of the most intractable regions of the fly genome to genetic analysis. The main difficulty posed to the genetic analyses of mutations on this chromosome arises from the fact that it does not undergo meiotic recombination, which makes recombination mapping impossible, and also prevents clonal analysis of mutations, a technique which relies on recombination to introduce the prerequisite recessive markers and FLP-recombinase recognition targets (FR...

  13. Mapping DNA damage-dependent genetic interactions in yeast via party mating and barcode fusion genetics.

    Science.gov (United States)

    Díaz-Mejía, J Javier; Celaj, Albi; Mellor, Joseph C; Coté, Atina; Balint, Attila; Ho, Brandon; Bansal, Pritpal; Shaeri, Fatemeh; Gebbia, Marinella; Weile, Jochen; Verby, Marta; Karkhanina, Anna; Zhang, YiFan; Wong, Cassandra; Rich, Justin; Prendergast, D'Arcy; Gupta, Gaurav; Öztürk, Sedide; Durocher, Daniel; Brown, Grant W; Roth, Frederick P

    2018-05-28

    Condition-dependent genetic interactions can reveal functional relationships between genes that are not evident under standard culture conditions. State-of-the-art yeast genetic interaction mapping, which relies on robotic manipulation of arrays of double-mutant strains, does not scale readily to multi-condition studies. Here, we describe barcode fusion genetics to map genetic interactions (BFG-GI), by which double-mutant strains generated via en masse "party" mating can also be monitored en masse for growth to detect genetic interactions. By using site-specific recombination to fuse two DNA barcodes, each representing a specific gene deletion, BFG-GI enables multiplexed quantitative tracking of double mutants via next-generation sequencing. We applied BFG-GI to a matrix of DNA repair genes under nine different conditions, including methyl methanesulfonate (MMS), 4-nitroquinoline 1-oxide (4NQO), bleomycin, zeocin, and three other DNA-damaging environments. BFG-GI recapitulated known genetic interactions and yielded new condition-dependent genetic interactions. We validated and further explored a subnetwork of condition-dependent genetic interactions involving MAG1, SLX4, and genes encoding the Shu complex, and inferred that loss of the Shu complex leads to an increase in the activation of the checkpoint protein kinase Rad53. © 2018 The Authors. Published under the terms of the CC BY 4.0 license.

  14. Analysis of genetic polymorphism of nine short tandem repeat loci in ...

    African Journals Online (AJOL)

    Yomi

    2012-03-15

    Mar 15, 2012 ... Key words: short tandem repeat, repeat motif, genetic polymorphism, Han population, forensic genetics. INTRODUCTION. Short tandem repeat (STR) is widely .... Data analysis. The exact test of Hardy-Weinberg equilibrium was conducted with. Arlequin version 3.5 software (Computational and Molecular.

  15. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

    Science.gov (United States)

    Pettengill, James B; Pightling, Arthur W; Baugher, Joseph D; Rand, Hugh; Strain, Errol

    2016-01-01

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
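    The k-mer profile distances named in the abstract above can be illustrated with a minimal sketch. It assumes exact, overlapping k-mer counting on short strings; the function names are hypothetical, and this is not the authors' pipeline or the Mash implementation. The last comparison shows how a truncated sequence produces a non-zero count-based distance without any nucleotide difference, which is the sense in which profile methods treat absent data as informative.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_distance(a: Counter, b: Counter) -> float:
    """1 - |A & B| / |A | B| on the sets of k-mers that are present."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def manhattan_distance(a: Counter, b: Counter) -> int:
    """Sum of absolute count differences over all k-mers seen in either profile."""
    return sum(abs(a[key] - b[key]) for key in set(a) | set(b))


def euclidean_distance(a: Counter, b: Counter) -> float:
    """Square root of the summed squared count differences."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in set(a) | set(b)))


if __name__ == "__main__":
    full = kmer_profile("ACGTACGT", 3)
    variant = kmer_profile("ACGTTCGT", 3)      # one substituted base
    truncated = kmer_profile("ACGTAC", 3)      # same sequence, two bases missing
    print(jaccard_distance(full, variant), manhattan_distance(full, variant))
    print(jaccard_distance(full, truncated), manhattan_distance(full, truncated))
```

    A site-based method would instead align the sequences and count differences only at positions present in both, at a higher computational cost.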

  16. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

    Directory of Open Access Journals (Sweden)

    James B Pettengill

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.

  17. Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

    Science.gov (United States)

    Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

    2018-01-01

    The genetic code is degenerate, and it is assumed that this redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still the subject of ongoing research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source. Homepage: http://www.gcat.bio/
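    One of the codon-set properties mentioned above, comma-freeness, can be sketched as follows. The sketch assumes the textbook definition (a set of trinucleotides is comma-free if no codon of the set appears in either shifted reading frame of any two concatenated codons of the set). The function name is hypothetical; this is not GCAT's own Java code.

```python
from itertools import product


def is_comma_free(codons) -> bool:
    """Return True if no codon of the set occurs in a shifted frame
    of any concatenation of two codons from the set."""
    code = {c.upper() for c in codons}
    if any(len(c) != 3 for c in code):
        raise ValueError("every codon must have exactly three letters")
    for first, second in product(code, repeat=2):
        joined = first + second
        # joined[1:4] and joined[2:5] are the two out-of-frame trinucleotides.
        if joined[1:4] in code or joined[2:5] in code:
            return False
    return True


if __name__ == "__main__":
    print(is_comma_free({"ACG", "TCG"}))   # no shifted-frame hit
    print(is_comma_free({"ACG", "CGA"}))   # ACGACG contains CGA out of frame
```

    A set that contains a homopolymer codon such as AAA can never be comma-free, because AAAAAA reads as AAA in every frame.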

  18. Probabilistic inductive inference: a survey

    OpenAIRE

    Ambainis, Andris

    2001-01-01

    Inductive inference is a recursion-theoretic theory of learning, first developed by E. M. Gold (1967). This paper surveys developments in probabilistic inductive inference. We mainly focus on finite inference of recursive functions, since this simple paradigm has produced the most interesting (and most complex) results.

  19. LAIT: a local ancestry inference toolkit.

    Science.gov (United States)

    Hui, Daniel; Fang, Zhou; Lin, Jerome; Duan, Qing; Li, Yun; Hu, Ming; Chen, Wei

    2017-09-06

    Inferring local ancestry in individuals of mixed ancestry has many applications, most notably in identifying disease-susceptible loci that vary among different ethnic groups. Many software packages are available for inferring local ancestry in admixed individuals. However, most of these existing software packages require specific formatted input files and generate output files in various types, yielding practical inconvenience. We developed a tool set, Local Ancestry Inference Toolkit (LAIT), which can convert standardized files into software-specific input file formats as well as standardize and summarize inference results for four popular local ancestry inference software: HAPMIX, LAMP, LAMP-LD, and ELAI. We tested LAIT using both simulated and real data sets and demonstrated that LAIT provides convenience to run multiple local ancestry inference software. In addition, we evaluated the performance of local ancestry software among different supported software packages, mainly focusing on inference accuracy and computational resources used. We provided a toolkit to facilitate the use of local ancestry inference software, especially for users with limited bioinformatics background.

  20. Bayesian statistical inference

    Directory of Open Access Journals (Sweden)

    Bruno De Finetti

    2017-04-01

    This work was translated into English and published in the volume: Bruno De Finetti, Induction and Probability, Biblioteca di Statistica, eds. P. Monari, D. Cocchi, Clueb, Bologna, 1993. Bayesian Statistical Inference is one of the last fundamental philosophical papers in which we can find the essence of De Finetti's approach to statistical inference.

  1. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology

    Science.gov (United States)

    Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen

    2018-03-01

    Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data-space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper we use massive asymptotically-optimal data compression to reduce the dimensionality of the data-space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parameterized model for joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate Density Estimation Likelihood-Free Inference with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological datasets.
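    The generic simulate-and-compare scheme described above can be illustrated with a toy rejection sampler of the traditional Approximate Bayesian Computation kind, which the abstract contrasts with DELFI. Everything in it is an assumption made for illustration: a Gaussian forward model with unknown mean, a flat prior on [-5, 5], the sample mean as the single summary statistic, and the hypothetical function names. It is not the DELFI method.

```python
import random
import statistics


def simulate_summary(theta: float, n_obs: int, rng: random.Random) -> float:
    """Forward-simulate n_obs draws from N(theta, 1) and compress them
    to one summary statistic (the sample mean)."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n_obs)])


def rejection_abc(observed_summary: float, n_proposals: int = 5000,
                  n_obs: int = 50, tolerance: float = 0.2, seed: int = 0):
    """Keep prior draws whose simulated summary lands within `tolerance`
    of the observed summary; the kept draws approximate the posterior."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_proposals):
        theta = rng.uniform(-5.0, 5.0)  # assumed flat prior
        if abs(simulate_summary(theta, n_obs, rng) - observed_summary) < tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    draws = rejection_abc(observed_summary=1.0)
    print(len(draws), statistics.fmean(draws))
```

    Most proposals are rejected in this scheme, which is why approaches that need far fewer forward simulations matter when each simulation is expensive.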

  2. Polytene chromosomal maps of 11 Drosophila species: the order of genomic scaffolds inferred from genetic and physical maps.

    Science.gov (United States)

    Schaeffer, Stephen W; Bhutkar, Arjun; McAllister, Bryant F; Matsuda, Muneo; Matzkin, Luciano M; O'Grady, Patrick M; Rohde, Claudia; Valente, Vera L S; Aguadé, Montserrat; Anderson, Wyatt W; Edwards, Kevin; Garcia, Ana C L; Goodman, Josh; Hartigan, James; Kataoka, Eiko; Lapoint, Richard T; Lozovsky, Elena R; Machado, Carlos A; Noor, Mohamed A F; Papaceit, Montserrat; Reed, Laura K; Richards, Stephen; Rieger, Tania T; Russo, Susan M; Sato, Hajime; Segarra, Carmen; Smith, Douglas R; Smith, Temple F; Strelets, Victor; Tobari, Yoshiko N; Tomimura, Yoshihiko; Wasserman, Marvin; Watts, Thomas; Wilson, Robert; Yoshida, Kiyohito; Markow, Therese A; Gelbart, William M; Kaufman, Thomas C

    2008-07-01

    The sequencing of the 12 genomes of members of the genus Drosophila was taken as an opportunity to reevaluate the genetic and physical maps for 11 of the species, in part to aid in the mapping of assembled scaffolds. Here, we present an overview of the importance of cytogenetic maps to Drosophila biology and to the concepts of chromosomal evolution. Physical and genetic markers were used to anchor the genome assembly scaffolds to the polytene chromosomal maps for each species. In addition, a computational approach was used to anchor smaller scaffolds on the basis of the analysis of syntenic blocks. We present the chromosomal map data from each of the 11 sequenced non-Drosophila melanogaster species as a series of sections. Each section reviews the history of the polytene chromosome maps for each species, presents the new polytene chromosome maps, and anchors the genomic scaffolds to the cytological maps using genetic and physical markers. The mapping data agree with Muller's idea that the majority of Drosophila genes are syntenic. Despite the conservation of genes within homologous chromosome arms across species, the karyotypes of these species have changed through the fusion of chromosomal arms followed by subsequent rearrangement events.

  3. Logic analysis and verification of n-input genetic logic circuits

    DEFF Research Database (Denmark)

    Baig, Hasan; Madsen, Jan

    2017-01-01

    Nature is using genetic logic circuits to regulate the fundamental processes of life. These genetic logic circuits are triggered by a combination of external signals, such as chemicals, proteins, light and temperature, to emit signals to control other gene expressions or metabolic pathways accordingly. As compared to electronic circuits, genetic circuits exhibit stochastic behavior and do not always behave as intended. Therefore, there is a growing interest in being able to analyze and verify the logical behavior of a genetic circuit model, prior to its physical implementation in a laboratory. In this paper, we present an approach to analyze and verify the Boolean logic of a genetic circuit from the data obtained through stochastic analog circuit simulations. The usefulness of this analysis is demonstrated through different case studies illustrating how our approach can be used to verify the expected...

  4. A parametric interpretation of Bayesian Nonparametric Inference from Gene Genealogies: Linking ecological, population genetics and evolutionary processes.

    Science.gov (United States)

    Ponciano, José Miguel

    2017-11-22

    Using a nonparametric Bayesian approach, Palacios and Minin (2013) dramatically improved the accuracy and precision of Bayesian inference of population size trajectories from gene genealogies. These authors proposed an extension of a Gaussian Process (GP) nonparametric inferential method for the intensity function of non-homogeneous Poisson processes. They found not only that the statistical properties of the estimators were improved with their method, but also that key aspects of the demographic histories were recovered. The authors' work represents the first Bayesian nonparametric solution to this inferential problem because they specify a convenient prior belief without a particular functional form on the population trajectory. Their approach works so well and provides such a profound understanding of the biological process that the question arises as to how truly "biology-free" their approach really is. Using well-known concepts of stochastic population dynamics, here I demonstrate that in fact, Palacios and Minin's GP model can be cast as a parametric population growth model with density dependence and environmental stochasticity. Making this link between population genetics and stochastic population dynamics modeling provides novel insights into eliciting biologically meaningful priors for the trajectory of the effective population size. The results presented here also bring novel understanding of GP as models for the evolution of a trait. Thus, the ecological principles foundation of Palacios and Minin (2013)'s prior adds to the conceptual and scientific value of these authors' inferential approach. I conclude this note by listing a series of insights brought about by this connection with Ecology. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.

  5. Analysis and design of a genetic circuit for dynamic metabolic engineering.

    Science.gov (United States)

    Anesiadis, Nikolaos; Kobayashi, Hideki; Cluett, William R; Mahadevan, Radhakrishnan

    2013-08-16

    Recent advances in synthetic biology have equipped us with new tools for bioprocess optimization at the genetic level. Previously, we have presented an integrated in silico design for the dynamic control of gene expression based on a density-sensing unit and a genetic toggle switch. In the present paper, analysis of a serine-producing Escherichia coli mutant shows that an instantaneous ON-OFF switch leads to a maximum theoretical productivity improvement of 29.6% compared to the mutant. To further the design, global sensitivity analysis is applied here to a mathematical model of serine production in E. coli coupled with a genetic circuit. The model of the quorum sensing and the toggle switch involves 13 parameters of which 3 are identified as having a significant effect on serine concentration. Simulations conducted in this reduced parameter space further identified the optimal ranges for these 3 key parameters to achieve productivity values close to the maximum theoretical values. This analysis can now be used to guide the experimental implementation of a dynamic metabolic engineering strategy and reduce the time required to design the genetic circuit components.

  6. Is there a hierarchy of social inferences? The likelihood and speed of inferring intentionality, mind, and personality.

    Science.gov (United States)

    Malle, Bertram F; Holbrook, Jess

    2012-04-01

    People interpret behavior by making inferences about agents' intentionality, mind, and personality. Past research studied such inferences 1 at a time; in real life, people make these inferences simultaneously. The present studies therefore examined whether 4 major inferences (intentionality, desire, belief, and personality), elicited simultaneously in response to an observed behavior, might be ordered in a hierarchy of likelihood and speed. To achieve generalizability, the studies included a wide range of stimulus behaviors, presented them verbally and as dynamic videos, and assessed inferences both in a retrieval paradigm (measuring the likelihood and speed of accessing inferences immediately after they were made) and in an online processing paradigm (measuring the speed of forming inferences during behavior observation). Five studies provide evidence for a hierarchy of social inferences-from intentionality and desire to belief to personality-that is stable across verbal and visual presentations and that parallels the order found in developmental and primate research. (c) 2012 APA, all rights reserved.

  7. Inferring influenza global transmission networks without complete phylogenetic information.

    Science.gov (United States)

    Aris-Brosou, Stéphane

    2014-03-01

    Influenza is one of the most severe respiratory infections affecting humans throughout the world, yet the dynamics of its global transmission network are still contentious. Here, I describe a novel combination of phylogenetics, time series, and graph theory to analyze 14.25 years of data stratified in space and in time, focusing on the main target of the human immune response, the hemagglutinin gene. While bypassing the complete phylogenetic inference of huge data sets, the method still extracts information suggesting that waves of genetic or of nucleotide diversity circulate continuously around the globe for subtypes that undergo sustained transmission over several seasons, such as H3N2 and pandemic H1N1/09, while diversity of prepandemic H1N1 viruses had until 2009 a noncontinuous transmission pattern consistent with a source/sink model. Irrespective of the shift in the structure of H1N1 diversity circulation with the emergence of the pandemic H1N1/09 strain, US prevalence peaks during the winter months when genetic diversity is at its lowest. This suggests that a dominant strain is generally responsible for epidemics and that monitoring genetic and/or nucleotide diversity in real time could provide public health agencies with an indirect estimate of prevalence.

  8. Population genetic analysis and trichothecene profiling of Fusarium graminearum from wheat in Uruguay.

    Science.gov (United States)

    Pan, D; Mionetto, A; Calero, N; Reynoso, M M; Torres, A; Bettucci, L

    2016-03-11

    Fusarium graminearum sensu stricto (F. graminearum s.s.) is the major causal agent of Fusarium head blight of wheat worldwide, and contaminates grains with trichothecene mycotoxins that cause serious threats to food safety and animal health. An important aspect of managing this pathogen and reducing mycotoxin contamination of wheat is knowledge regarding its population genetics. Therefore, isolates of F. graminearum s.s. from the major wheat-growing region of Uruguay were analyzed by amplified fragment length polymorphism assays, PCR genotyping, and chemical analysis of trichothecene production. Of the 102 isolates identified as having the 15-ADON genotype via PCR genotyping, all were DON producers, but only 41 strains were also 15-ADON producers, as determined by chemical analysis. The populations were genotypically diverse but genetically similar, with significant genetic exchange occurring between them. Analysis of molecular variance indicated that most of the genetic variability resulted from differences between isolates within populations. Multilocus linkage disequilibrium analysis suggested that the isolates had a panmictic population genetic structure and that significant recombination occurs in F. graminearum s.s. In conclusion, our findings provide the first detailed description of the genetic structure and trichothecene production of populations of F. graminearum s.s. from Uruguay, and expand our understanding of the agroecology of F. graminearum and of the correlation between genotypes and trichothecene chemotypes.

  9. Demographic history and biologically relevant genetic variation of Native Mexicans inferred from whole-genome sequencing

    OpenAIRE

    Romero-Hidalgo, Sandra; Ochoa-Leyva, Adrián; Garcíarrubio, Alejandro; Acuña-Alonzo, Victor; Antúnez-Argüelles, Erika; Balcazar-Quintero, Martha; Barquera-Lozano, Rodrigo; Carnevale, Alessandra; Cornejo-Granados, Fernanda; Fernández-López, Juan Carlos; García-Herrera, Rodrigo; García-Ortíz, Humberto; Granados-Silvestre, Ángeles; Granados, Julio; Guerrero-Romero, Fernando

    2017-01-01

    Understanding the genetic structure of Native American populations is important to clarify their diversity, demographic history, and to identify genetic factors relevant for biomedical traits. Here, we show a demographic history reconstruction from 12 Native American whole genomes belonging to six distinct ethnic groups representing the three main described genetic clusters of Mexico (Northern, Southern, and Maya). Effective population size estimates of all Native American groups remained bel...

  10. The Recombination Landscape in Wild House Mice Inferred Using Population Genomic Data.

    Science.gov (United States)

    Booker, Tom R; Ness, Rob W; Keightley, Peter D

    2017-09-01

    Characterizing variation in the rate of recombination across the genome is important for understanding several evolutionary processes. Previous analysis of the recombination landscape in laboratory mice has revealed that the different subspecies have different suites of recombination hotspots. It is unknown, however, whether hotspots identified in laboratory strains reflect the hotspot diversity of natural populations or whether broad-scale variation in the rate of recombination is conserved between subspecies. In this study, we constructed fine-scale recombination rate maps for a natural population of the Eastern house mouse, Mus musculus castaneus. We performed simulations to assess the accuracy of recombination rate inference in the presence of phase errors, and we used a novel approach to quantify phase error. The spatial distribution of recombination events is strongly positively correlated between our castaneus map and a map constructed using inbred lines derived predominantly from M. m. domesticus. Recombination hotspots in wild castaneus show little overlap, however, with the locations of double-strand breaks in wild-derived house mouse strains. Finally, we also find that genetic diversity in M. m. castaneus is positively correlated with the rate of recombination, consistent with pervasive natural selection operating in the genome. Our study suggests that recombination rate variation is conserved at broad scales between house mouse subspecies, but it is not strongly conserved at fine scales. Copyright © 2017 by the Genetics Society of America.

  11. Genetic diversity and population structure analysis of the tropical pasture grass Brachiaria humidicola based on microsatellites, cytogenetics, morphological traits, and geographical origin.

    Science.gov (United States)

    Jungmann, L; Vigna, B B Z; Boldrini, K R; Sousa, A C B; do Valle, C B; Resende, R M S; Pagliarini, M S; Zucchi, M I; de Souza, A P

    2010-09-01

    Brachiaria humidicola (Rendle) Schweick. is a warm-season grass commonly used as forage in the tropics. Accessions of this species were collected in eastern Africa and massively introduced into South America in the 1980s. Several of these accessions form a germplasm collection at the Brazilian Agricultural Research Corporation. However, apomixis, ploidy, and limited knowledge of the genetic basis of this germplasm collection have constrained breeding activities. The objectives of this work were to identify genetic variability in the Brazilian B. humidicola germplasm collection using microsatellite markers and to compare the results with information on the following: (1) collection sites of the accessions; (2) reproductive mode and ploidy levels; and (3) genetic diversity revealed by morphological traits. The evaluated germplasm population is highly structured into four major groups. The sole sexual accession did not group with any of the clusters. Genetic dissimilarities did not correlate with either geographic distances or genetic distances inferred from morphological descriptors. Additionally, the genetic structure identified in this collection did not correspond to differences in ploidy level. Alleles exclusive to either sexual or apomictic accessions were identified, suggesting that further evaluation of the association of these loci with apospory should be carried out.

  12. INFERENCE BUILDING BLOCKS

    Science.gov (United States)

    2018-02-15

    expressed a variety of inference techniques on discrete and continuous distributions: exact inference, importance sampling, Metropolis-Hastings (MH...without redoing any math or rewriting any code. And although our main goal is composable reuse, our performance is also good because we can use...control paths. • The Hakaru language can express mixtures of discrete and continuous distributions, but the current disintegration transformation

  13. Practical Bayesian Inference

    Science.gov (United States)

    Bailer-Jones, Coryn A. L.

    2017-04-01

    Preface; 1. Probability basics; 2. Estimation and uncertainty; 3. Statistical models and inference; 4. Linear models, least squares, and maximum likelihood; 5. Parameter estimation: single parameter; 6. Parameter estimation: multiple parameters; 7. Approximating distributions; 8. Monte Carlo methods for inference; 9. Parameter estimation: Markov chain Monte Carlo; 10. Frequentist hypothesis testing; 11. Model comparison; 12. Dealing with more complicated problems; References; Index.

  14. Using adaptive network based fuzzy inference system to forecast regional electricity loads

    International Nuclear Information System (INIS)

    Ying, L.-C.; Pan, M.-C.

    2008-01-01

    Since accurate regional load forecasting is very important for improvement of the management performance of the electric industry, various regional load forecasting methods have been developed. The purpose of this study is to apply the adaptive network based fuzzy inference system (ANFIS) model to forecast the regional electricity loads in Taiwan and demonstrate the forecasting performance of this model. Based on the mean absolute percentage errors and statistical results, we can see that the ANFIS model has better forecasting performance than the regression model, artificial neural network (ANN) model, support vector machines with genetic algorithms (SVMG) model, recurrent support vector machines with genetic algorithms (RSVMG) model and hybrid ellipsoidal fuzzy systems for time series forecasting (HEFST) model. Thus, the ANFIS model is a promising alternative for forecasting regional electricity loads.
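    The comparison criterion named above, the mean absolute percentage error (MAPE), can be written out as a short sketch. The function name and the example load values are hypothetical and are not taken from the study; the formula is the standard one, 100/n times the sum of |actual - forecast| / |actual|.

```python
def mean_absolute_percentage_error(actual, forecast) -> float:
    """Standard MAPE in percent; undefined when any actual value is zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Made-up regional loads and two made-up forecasts, for illustration only.
    actual_load = [100.0, 200.0, 400.0]
    model_a = [110.0, 180.0, 400.0]
    model_b = [120.0, 160.0, 360.0]
    print(mean_absolute_percentage_error(actual_load, model_a))
    print(mean_absolute_percentage_error(actual_load, model_b))
```

    The model with the lower MAPE is the one judged to forecast better under this criterion.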

  15. Using adaptive network based fuzzy inference system to forecast regional electricity loads

    Energy Technology Data Exchange (ETDEWEB)

    Ying, Li-Chih [Department of Marketing Management, Central Taiwan University of Science and Technology, 11, Pu-tzu Lane, Peitun, Taichung City 406 (China); Pan, Mei-Chiu [Graduate Institute of Management Sciences, Nanhua University, 32, Chung Keng Li, Dalin, Chiayi 622 (China)

    2008-02-15

    Since accurate regional load forecasting is very important for improvement of the management performance of the electric industry, various regional load forecasting methods have been developed. The purpose of this study is to apply the adaptive network based fuzzy inference system (ANFIS) model to forecast the regional electricity loads in Taiwan and demonstrate the forecasting performance of this model. Based on the mean absolute percentage errors and statistical results, we can see that the ANFIS model has better forecasting performance than the regression model, artificial neural network (ANN) model, support vector machines with genetic algorithms (SVMG) model, recurrent support vector machines with genetic algorithms (RSVMG) model and hybrid ellipsoidal fuzzy systems for time series forecasting (HEFST) model. Thus, the ANFIS model is a promising alternative for forecasting regional electricity loads. (author)

  16. Genetic Diversity of Rose germplasm based on RAPD analysis

    African Journals Online (AJOL)

    AHSAN IQBAL

    2012-06-12

    Jun 12, 2012 ... identification and analysis of genetic variation within a collection of 4 species and 30 accessions of rose using RAPD analysis technique. The results showed the molecular distinctions among the ... that range in colour from white and yellow to many shades of pink and red have been developed. Since.

  17. Genetic characterization and evolutionary inference of TNF-α through computational analysis

    Directory of Open Access Journals (Sweden)

    Gauri Awasthi

    TNF-α is an important human cytokine that imparts dualism in malaria pathogenicity. At high dosages, TNF-α is believed to provoke pathogenicity in cerebral malaria, while at lower dosages TNF-α is protective against severe human malaria. In order to understand the human TNF-α gene and to ascertain evolutionary aspects of its dualistic nature for malaria pathogenicity, we characterized this gene in detail in six different mammalian taxa. The avian taxon Gallus gallus was also included in our study; as TNF-α is not present in birds, a tandemly placed duplicate of TNF-α (LT-α or TNF-β) was included. A comparative study was made of nucleotide length variations, intron and exon sizes and number variations, differential compositions of coding to non-coding bases, etc., to look for similarities/dissimilarities in the TNF-α gene across all seven taxa. A phylogenetic analysis revealed the pattern found in other genes, as humans, chimpanzees and rhesus monkeys were placed in a single clade, and rats and mice in another; the chicken was in a clearly separate branch. We further focused on the three primate taxa and aligned the amino acid sequences; there were small differences between humans and chimpanzees, and both differed more from the rhesus monkey. Further, comparison of coding and non-coding nucleotide length variations and of the coding to non-coding nucleotide ratio between TNF-α and TNF-β among these three mammalian taxa provided a first-hand indication of the role of the TNF-α gene, but not of TNF-β, in the dualistic nature of TNF-α in malaria pathogenicity.

  18. Comparison of Urban Human Movements Inferring from Multi-Source Spatial-Temporal Data

    Science.gov (United States)

    Cao, Rui; Tu, Wei; Cao, Jinzhou; Li, Qingquan

    2016-06-01

    Quantifying human movement is difficult because traditional data are sparse and data collection is labour-intensive. Recently, abundant spatial-temporal data have given us an opportunity to observe human movement. This research investigates the relationship between city-wide human movements inferred from two types of spatial-temporal data at the traffic analysis zone (TAZ) level. The first type of human movement is inferred from long-term smart card transaction data recording boarding actions. The second type is extracted from citywide time-sequenced mobile phone data at 30-minute intervals. Travel volume, travel distance and travel time are used to measure aggregated human movements in the city. To further examine the relationship between the two types of inferred movements, a linear correlation analysis is conducted on the hourly travel volume. The results show that human movements inferred from smart card data and mobile phone data have a correlation of 0.635. However, there are still some non-negligible differences in some special areas. This research not only reveals citywide spatial-temporal human dynamics but also improves understanding of the reliability of inferring human movements from big spatial-temporal data.
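    The linear correlation analysis mentioned in this record can be illustrated with a Pearson coefficient on two hourly series. This is only a sketch: the hourly volumes are invented for illustration and the function name `pearson` is an assumption, not something taken from the study.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical hourly travel volumes for one zone (illustrative only).
smart_card_volume = [120, 340, 560, 410, 300, 280]
mobile_phone_volume = [150, 300, 600, 390, 350, 260]
print(round(pearson(smart_card_volume, mobile_phone_volume), 3))
```

    A value near 1 would mean the two data sources rise and fall together hour by hour; the reported 0.635 indicates moderate agreement.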

  19. COMPARISON OF URBAN HUMAN MOVEMENTS INFERRING FROM MULTI-SOURCE SPATIAL-TEMPORAL DATA

    Directory of Open Access Journals (Sweden)

    R. Cao

    2016-06-01

    Quantifying human movement is difficult because traditional data are sparse and data collection is labour-intensive. Recently, abundant spatial-temporal data have given us an opportunity to observe human movement. This research investigates the relationship between city-wide human movements inferred from two types of spatial-temporal data at the traffic analysis zone (TAZ) level. The first type of human movement is inferred from long-term smart card transaction data recording boarding actions. The second type is extracted from citywide time-sequenced mobile phone data at 30-minute intervals. Travel volume, travel distance and travel time are used to measure aggregated human movements in the city. To further examine the relationship between the two types of inferred movements, a linear correlation analysis is conducted on the hourly travel volume. The results show that human movements inferred from smart card data and mobile phone data have a correlation of 0.635. However, there are still some non-negligible differences in some special areas. This research not only reveals citywide spatial-temporal human dynamics but also improves understanding of the reliability of inferring human movements from big spatial-temporal data.

  20. Inference of Transcriptional Network for Pluripotency in Mouse Embryonic Stem Cells

    International Nuclear Information System (INIS)

    Aburatani, S

    2015-01-01

    In embryonic stem cells, various transcription factors (TFs) maintain pluripotency. To gain insights into the regulatory system controlling pluripotency, I inferred the regulatory relationships between the TFs expressed in ES cells. In this study, I applied a method based on structural equation modeling (SEM), combined with factor analysis, to 649 expression profiles of 19 TF genes measured in mouse Embryonic Stem Cells (ESCs). The factor analysis identified 19 TF genes that were regulated by several unmeasured factors. Since the known cell reprogramming TF genes (Pou5f1, Sox2 and Nanog) are regulated by different factors, each estimated factor is considered to be an input for signal transduction to control pluripotency in mouse ESCs. In the inferred network model, TF proteins were also arranged as unmeasured factors that control other TFs. The interpretation of the inferred network model revealed the regulatory mechanism for controlling pluripotency in ES cells

  1. Evolutionary history of the third chromosome gene arrangements of Drosophila pseudoobscura inferred from inversion breakpoints.

    Science.gov (United States)

    Wallace, Andre G; Detweiler, Don; Schaeffer, Stephen W

    2011-08-01

    The third chromosome of Drosophila pseudoobscura is polymorphic for numerous gene arrangements that form classical clines in North America. The polytene salivary chromosomes isolated from natural populations revealed changes in gene order that allowed the different gene arrangements to be linked together by paracentric inversions representing one of the first cases where genetic data were used to construct a phylogeny. Although the inversion phylogeny can be used to determine the relationships among the gene arrangements, the cytogenetic data are unable to infer the ancestral arrangement or the age of the different chromosome types. These are both important properties if one is to infer the evolutionary forces responsible for the spread and maintenance of the chromosomes. Here, we employ the nucleotide sequences of 18 regions distributed across the third chromosome in 80-100 D. pseudoobscura strains to test whether five gene arrangements are of unique or multiple origin, what the ancestral arrangement was, and what are the ages of the different arrangements. Each strain carried one of six commonly found gene arrangements and the sequences were used to infer their evolutionary relationships. Breakpoint regions in the center of the chromosome supported monophyly of the gene arrangements, whereas regions at the ends of the chromosome gave phylogenies that provided less support for monophyly of the chromosomes either because the individual markers did not have enough phylogenetically informative sites or genetic exchange scrambled information among the gene arrangements. A data set where the genetic markers were concatenated strongly supported a unique origin of the different gene arrangements. The inversion polymorphism of D. pseudoobscura is estimated to be about a million years old. We have also shown that the generated phylogeny is consistent with the cytological phylogeny of this species. 
In addition, the data presented here support hypothetical as the ancestral

  2. Bayesian Inference for Neural Electromagnetic Source Localization: Analysis of MEG Visual Evoked Activity

    International Nuclear Information System (INIS)

    George, J.S.; Schmidt, D.M.; Wood, C.C.

    1999-01-01

    We have developed a Bayesian approach to the analysis of neural electromagnetic (MEG/EEG) data that can incorporate or fuse information from other imaging modalities and addresses the ill-posed inverse problem by sampling the many different solutions which could have produced the given data. From these samples one can draw probabilistic inferences about regions of activation. Our source model assumes a variable number of variable size cortical regions of stimulus-correlated activity. An active region consists of locations on the cortical surface, within a sphere centered on some location in cortex. The number and radii of active regions can vary to defined maximum values. The goal of the analysis is to determine the posterior probability distribution for the set of parameters that govern the number, location, and extent of active regions. Markov Chain Monte Carlo is used to generate a large sample of sets of parameters distributed according to the posterior distribution. This sample is representative of the many different source distributions that could account for given data, and allows identification of probable (i.e. consistent) features across solutions. Examples of the use of this analysis technique with both simulated and empirical MEG data are presented.
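    The idea of drawing a sample of parameter sets from a posterior with Markov Chain Monte Carlo can be sketched with a random-walk Metropolis sampler. This is a deliberately simplified illustration, not the authors' method: it assumes a single scalar parameter (the mean of Gaussian data with known noise and a flat prior), whereas the record describes a model with a variable number of regions. All names and numbers below are hypothetical.

```python
import math
import random


def log_posterior(theta, data, sigma):
    """Log posterior (up to a constant) for a Gaussian mean with a flat prior."""
    return -sum((x - theta) ** 2 for x in data) / (2.0 * sigma ** 2)


def metropolis(data, sigma, n_samples, step, rng, start=0.0, burn_in=1000):
    """Random-walk Metropolis: propose a nearby value, accept with prob. min(1, ratio)."""
    theta = start
    current = log_posterior(theta, data, sigma)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data, sigma)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:  # discard the initial transient
            samples.append(theta)
    return samples


# Hypothetical measurements (illustrative only).
data = [1.8, 2.1, 2.4, 1.9, 2.3]
samples = metropolis(data, sigma=0.5, n_samples=20000, step=0.3, rng=random.Random(1))
posterior_mean = sum(samples) / len(samples)
print(f"posterior mean is approximately {posterior_mean:.2f}")
```

    Features that recur across the retained samples are the ones the posterior considers probable, which is the sense in which the abstract speaks of consistent features across solutions.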

  3. A Continuous Correlated Beta Process Model for Genetic Ancestry in Admixed Populations.

    Directory of Open Access Journals (Sweden)

    Zachariah Gompert

    Admixture and recombination create populations and genomes with genetic ancestry from multiple source populations. Analyses of genetic ancestry in admixed populations are relevant for trait and disease mapping, studies of speciation, and conservation efforts. Consequently, many methods have been developed to infer genome-average ancestry and to deconvolute ancestry into continuous local ancestry blocks or tracts within individuals. Current methods for local ancestry inference perform well when admixture occurred recently or hybridization is ongoing, or when admixture occurred in the distant past such that local ancestry blocks have fixed in the admixed population. However, methods to infer local ancestry frequencies in isolated admixed populations still segregating for ancestry do not exist. In the current paper, I develop and test a continuous correlated beta process model to fill this analytical gap. The method explicitly models autocorrelations in ancestry frequencies at the population level and uses discriminant analysis of SNP windows to take advantage of ancestry blocks within individuals. Analyses of simulated data sets show that the method is generally accurate such that ancestry frequency estimates exhibited low root-mean-square error and were highly correlated with the true values, particularly when large (±10 or ±20) SNP windows were used. Along these lines, the proposed method outperformed post hoc inference of ancestry frequencies from a traditional hidden Markov model (i.e., the linkage model in structure), particularly when admixture occurred more distantly in the past with little on-going gene flow or was followed by natural selection. The reliability and utility of the method were further assessed by analyzing genetic ancestry in an admixed human population (Uyghur) and three populations from a hybrid zone between Mus domesticus and M. musculus. Considerable variation in ancestry frequencies was detected within and among

  4. Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference

    Energy Technology Data Exchange (ETDEWEB)

    Beggs, W.J.

    1981-02-01

    This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.

  5. Logical inference and evaluation

    International Nuclear Information System (INIS)

    Perey, F.G.

    1981-01-01

    Most methodologies of evaluation currently used are based upon the theory of statistical inference. It is generally perceived that this theory is not capable of dealing satisfactorily with what are called systematic errors. Theories of logical inference should be capable of treating all of the information available, including that not involving frequency data. A theory of logical inference is presented as an extension of deductive logic via the concept of plausibility and the application of group theory. Some conclusions, based upon the application of this theory to evaluation of data, are also given

  6. Hierarchical linear modeling of longitudinal pedigree data for genetic association analysis

    DEFF Research Database (Denmark)

    Tan, Qihua; B Hjelmborg, Jacob V; Thomassen, Mads

    2014-01-01

    Genetic association analysis on complex phenotypes under a longitudinal design involving pedigrees encounters the problem of correlation within pedigrees, which could affect statistical assessment of the genetic effects. Approaches have been proposed to integrate kinship correlation into the mixed-effect models to explicitly model the genetic relationship. These have proved to be an efficient way of dealing with sample clustering in pedigree data. Although current algorithms implemented in popular statistical packages are useful for adjusting relatedness in the mixed modeling of genetic effects ... associated with blood pressure with estimated inflation factors of 0.99, suggesting that our modeling of random effects efficiently handles the genetic relatedness in pedigrees. Application to simulated data captures important variants specified in the simulation. Our results show that the method is useful ...

  7. RAPD markers for genetic diversity analysis in rice (Oryza sativa L)

    Energy Technology Data Exchange (ETDEWEB)

    Alvarez, A; Fuentes, Jorge L [Centro de Estudios Aplicados al Desarrollo Nuclear, La Habana (Cuba); Deus, Juan E [Instituto de Investigaciones del Arroz, Habana (Cuba); Duque, Maria C [Centro Internacional de la Agricultura Tropical. Proyecto de Arroz , Cali (Colombia)

    1999-07-01

    Establishing the relationships between genotypes held in gene banks that may be used in new crosses, and the genetic diversity of available germplasm, is very useful for plant breeders. In this work, a genetic diversity analysis among 20 varieties of the Cuban rice germplasm bank was performed using RAPD markers. Twenty-four decamer primers were screened, which produced 61 polymorphic bands out of 105 consistent and reproducible amplified fragments (58.1%). The proportion of polymorphic bands varied for each primer, with an average of 3 polymorphic bands per primer; these results agreed with previous reports on RAPD polymorphism in rice germplasm. Depending on the primer, 1 to 7 distinct patterns were obtained among the screened genotypes. Pair-wise genetic distances between genotypes were computed based on Dice's coefficient. Three major, statistically robust groups (A, B and C) were obtained in the UPGMA dendrogram, which clearly corresponded to different genetic pools. Additionally, more insight could be gained from the sub-grouping pattern within group A, which included the principal semi-dwarf commercial varieties. The present study demonstrated the efficiency of RAPD markers for genetic diversity analysis in closely related germplasm, particularly for the semi-dwarf Cuban commercial rice cultivars. It also confirmed the existence of a narrow genetic base among these varieties, pointing to the urgent need to widen it.
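    The pair-wise distances based on Dice's coefficient can be sketched as follows. The sketch assumes each genotype is scored as a 0/1 vector of band absence/presence across the same set of RAPD bands; the variety names and band profiles are hypothetical and are not data from the study.

```python
def dice_similarity(profile_a, profile_b):
    """Dice coefficient: 2 * shared bands / (bands in a + bands in b)."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    total = sum(profile_a) + sum(profile_b)
    if total == 0:
        raise ValueError("both profiles have no bands; similarity is undefined")
    return 2.0 * shared / total


def dice_distance_matrix(profiles):
    """Pairwise genetic distances (1 - Dice similarity), keyed by pair of names."""
    names = sorted(profiles)
    return {
        (first, second): 1.0 - dice_similarity(profiles[first], profiles[second])
        for i, first in enumerate(names)
        for second in names[i + 1:]
    }


# Hypothetical band-presence profiles (1 = band present, 0 = absent).
band_profiles = {
    "variety_1": [1, 1, 0, 1, 0, 1],
    "variety_2": [1, 0, 0, 1, 0, 1],
    "variety_3": [0, 1, 1, 0, 1, 0],
}
for pair, distance in dice_distance_matrix(band_profiles).items():
    print(pair, round(distance, 3))
```

    A distance matrix of this kind is the usual input to a UPGMA clustering step; the clustering itself is not shown here.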

  8. RAPD markers for genetic diversity analysis in rice (Oryza sativa L)

    International Nuclear Information System (INIS)

    Alvarez, A.; Fuentes, Jorge L.; Deus, Juan E.; Duque, Maria C.

    1999-01-01

    Establishing the relationships between genotypes held in gene banks that may be used in new crosses, and the genetic diversity of available germplasm, is very useful for plant breeders. In this work, a genetic diversity analysis among 20 varieties of the Cuban rice germplasm bank was performed using RAPD markers. Twenty-four decamer primers were screened, which produced 61 polymorphic bands out of 105 consistent and reproducible amplified fragments (58.1%). The proportion of polymorphic bands varied for each primer, with an average of 3 polymorphic bands per primer; these results agreed with previous reports on RAPD polymorphism in rice germplasm. Depending on the primer, 1 to 7 distinct patterns were obtained among the screened genotypes. Pair-wise genetic distances between genotypes were computed based on Dice's coefficient. Three major, statistically robust groups (A, B and C) were obtained in the UPGMA dendrogram, which clearly corresponded to different genetic pools. Additionally, more insight could be gained from the sub-grouping pattern within group A, which included the principal semi-dwarf commercial varieties. The present study demonstrated the efficiency of RAPD markers for genetic diversity analysis in closely related germplasm, particularly for the semi-dwarf Cuban commercial rice cultivars. It also confirmed the existence of a narrow genetic base among these varieties, pointing to the urgent need to widen it.

  9. Genetic Algorithm-Based Optimization to Match Asteroid Energy Deposition Curves

    Science.gov (United States)

    Tarano, Ana; Mathias, Donovan; Wheeler, Lorien; Close, Sigrid

    2018-01-01

    An asteroid entering Earth's atmosphere deposits energy along its path due to thermal ablation and dissipative forces that can be measured by ground-based and spaceborne instruments. Inference of pre-entry asteroid properties and characterization of the atmospheric breakup is facilitated by using an analytic fragment-cloud model (FCM) in conjunction with a Genetic Algorithm (GA). This optimization technique is used to inversely solve for the asteroid's entry properties, such as diameter, density, strength, velocity, entry angle, and strength scaling, from simulations using FCM. Fitness evaluation of these parameters involves minimizing the error between the physics-based calculated energy deposition and that of the observed meteors to ascertain the best match. This steady-state GA provided sets of solutions agreeing with the literature for the meteors over Chelyabinsk, Russia in 2013 and Tagish Lake, Canada in 2000, which were used as case studies to validate the optimization routine. The assisted exploration and exploitation of this multi-dimensional search space enables inference and uncertainty analysis that can inform studies of near-Earth asteroids and consequently improve risk assessment.
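    The inverse fit described here, in which a steady-state GA searches for parameters whose modelled curve best matches an observed one, can be sketched as below. This is not the FCM and not the authors' routine: the Gaussian-shaped `energy_curve`, the three parameters, their bounds, and the GA settings are all assumptions chosen to keep the example small.

```python
import math
import random


def energy_curve(params, altitudes):
    """Toy Gaussian-shaped deposition curve (a stand-in, NOT the fragment-cloud model)."""
    peak, peak_altitude, width = params
    return [peak * math.exp(-0.5 * ((h - peak_altitude) / width) ** 2) for h in altitudes]


def squared_error(params, altitudes, observed):
    """Fitness to minimize: sum of squared differences between model and observation."""
    return sum((m - o) ** 2 for m, o in zip(energy_curve(params, altitudes), observed))


def steady_state_ga(error_fn, bounds, rng, pop_size=30, iterations=2000):
    """Steady-state GA: one child per iteration replaces the worst member if it is better."""
    def tournament(errors):
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)
        return i if errors[i] < errors[j] else j

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    errors = [error_fn(individual) for individual in population]
    history = [min(errors)]
    for _ in range(iterations):
        mother = population[tournament(errors)]
        father = population[tournament(errors)]
        child = []
        for (lo, hi), a, b in zip(bounds, mother, father):
            weight = rng.random()
            gene = weight * a + (1.0 - weight) * b       # blend crossover
            gene += rng.gauss(0.0, 0.05 * (hi - lo))     # Gaussian mutation
            child.append(max(lo, min(hi, gene)))         # keep within bounds
        child_error = error_fn(child)
        worst = max(range(pop_size), key=errors.__getitem__)
        if child_error < errors[worst]:
            population[worst], errors[worst] = child, child_error
        history.append(min(errors))
    best = min(range(pop_size), key=errors.__getitem__)
    return population[best], history


# Synthetic "observation" generated from hypothetical true parameters.
altitudes = [float(h) for h in range(20, 61, 2)]
observed = energy_curve((100.0, 35.0, 5.0), altitudes)
bounds = [(10.0, 200.0), (20.0, 60.0), (1.0, 20.0)]
best, history = steady_state_ga(
    lambda p: squared_error(p, altitudes, observed), bounds, random.Random(0)
)
print("best parameters:", [round(v, 2) for v in best], "error:", history[-1])
```

    Because only the worst member is ever replaced, the best error in the population never increases; repeating the run with different seeds gives a set of candidate solutions rather than a single answer, which is one simple way to gauge uncertainty.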

  10. Genomic inferences of domestication events are corroborated by written records in Brassica rapa.

    Science.gov (United States)

    Qi, Xinshuai; An, Hong; Ragsdale, Aaron P; Hall, Tara E; Gutenkunst, Ryan N; Chris Pires, J; Barker, Michael S

    2017-07-01

    Demographic modelling is often used with population genomic data to infer the relationships and ages among populations. However, relatively few analyses are able to validate these inferences with independent data. Here, we leverage written records that describe distinct Brassica rapa crops to corroborate demographic models of domestication. Brassica rapa crops are renowned for their outstanding morphological diversity, but the relationships and order of domestication remain unclear. We generated genomewide SNPs from 126 accessions collected globally using high-throughput transcriptome data. Analyses of more than 31,000 SNPs across the B. rapa genome revealed evidence for five distinct genetic groups and supported a European-Central Asian origin of B. rapa crops. Our results supported the traditionally recognized South Asian and East Asian B. rapa groups with evidence that pak choi, Chinese cabbage and yellow sarson are likely monophyletic groups. In contrast, the oil-type B. rapa subsp. oleifera and brown sarson were polyphyletic. We also found no evidence to support the contention that rapini is the wild type or the earliest domesticated subspecies of B. rapa. Demographic analyses suggested that B. rapa was introduced to Asia 2,400-4,100 years ago, and that Chinese cabbage originated 1,200-2,100 years ago via admixture of pak choi and European-Central Asian B. rapa. We also inferred significantly different levels of founder effect among the B. rapa subspecies. Written records from antiquity that document these crops are consistent with these inferences. The concordance between our age estimates of domestication events with historical records provides unique support for our demographic inferences. © 2017 John Wiley & Sons Ltd.

  11. Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models.

    Directory of Open Access Journals (Sweden)

    Richard R Stein

    2015-07-01

    Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.
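    For continuous variables, the maximum-entropy distribution that matches given means and covariances is a multivariate Gaussian, and its pairwise couplings are read off the inverse covariance (precision) matrix. The sketch below illustrates why this separates direct from indirect associations. It assumes a hypothetical three-variable chain in which variables 1 and 3 are correlated only through variable 2; the covariance values and function names are illustrative and not taken from the review.

```python
import math


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlation(precision, i, j):
    """Direct association between i and j after conditioning on all other variables."""
    return -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])


# Hypothetical covariance of a chain 1 - 2 - 3: variables 1 and 3 correlate (0.25)
# only because both are linked to variable 2.
covariance = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
precision = invert(covariance)
print("coupling 1-2:", round(partial_correlation(precision, 0, 1), 3))
print("coupling 1-3:", round(partial_correlation(precision, 0, 2), 3))
```

    The 1-3 entry of the precision matrix vanishes even though the raw covariance is non-zero, which is the sense in which such models report direct rather than transitive interactions.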

  12. Microsatellite DNA analysis of northern pike ( Esox lucius L.) populations: insights into the genetic structure and demographic history of a genetically depauperate species

    DEFF Research Database (Denmark)

    Jacobsen, B. H.; Hansen, Michael Møller; Loeschcke, V.

    2005-01-01

    The northern pike Esox lucius L. is a freshwater fish exhibiting pronounced population subdivision and low genetic variability. However, there is limited knowledge on phylogeographical patterns within the species, and it is not known whether the low genetic variability reflects primarily current low effective population sizes or historical bottlenecks. We analysed six microsatellite loci in ten populations from Europe and North America. Genetic variation was low, with the average number of alleles within populations ranging from 2.3 to 4.0 per locus. Genetic differentiation among populations was high (overall theta(ST) = 0.51; overall rho(ST) = 0.50). Multidimensional scaling analysis of genetic distances between populations and spatial analysis of molecular variance suggested a single phylogeographical race within the sampled populations from northern Europe, whereas North American...

  13. Low genetic diversity and minimal population substructure in the endangered Florida manatee: implications for conservation

    Science.gov (United States)

    Tucker, Kimberly Pause; Hunter, Margaret E.; Bonde, Robert K.; Austin, James D.; Clark, Ann Marie; Beck, Cathy A.; McGuire, Peter M.; Oli, Madan K.

    2012-01-01

    Species of management concern that have been affected by human activities typically are characterized by low genetic diversity, which can adversely affect their ability to adapt to environmental changes. We used 18 microsatellite markers to genotype 362 Florida manatees (Trichechus manatus latirostris), and investigated genetic diversity, population structure, and estimated genetically effective population size (Ne). The observed and expected heterozygosity and average number of alleles were 0.455 ± 0.04, 0.479 ± 0.04, and 4.77 ± 0.51, respectively. All measures of Florida manatee genetic diversity were less than averages reported for placental mammals, including fragmented or nonideal populations. Overall estimates of differentiation were low, though significantly greater than zero, and analysis of molecular variance revealed that over 95% of the total variance was among individuals within predefined management units or among individuals along the coastal subpopulations, with only minor portions of variance explained by between group variance. Although genetic issues, as inferred by neutral genetic markers, appear not to be critical at present, the Florida manatee continues to face demographic challenges due to anthropogenic activities and stochastic factors such as red tides, oil spills, and disease outbreaks; these can further reduce genetic diversity of the manatee population.
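    The diversity measures quoted in this record (observed heterozygosity, expected heterozygosity, number of alleles) can be computed per locus as sketched below. The genotypes are hypothetical allele sizes, not manatee data, and the expected heterozygosity here is the simple 1 - sum(p_i^2) form without a small-sample correction, which may differ from the estimator the authors used.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (observed heterozygosity, expected heterozygosity, allele count) for one locus.

    `genotypes` is a list of (allele_1, allele_2) pairs, one per diploid individual.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Expected under Hardy-Weinberg: 1 minus the sum of squared allele frequencies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in counts.values())
    return observed, expected, len(counts)


# Hypothetical microsatellite genotypes at one locus (allele sizes in base pairs).
genotypes = [(150, 150), (150, 154), (154, 154), (150, 154)]
h_obs, h_exp, n_alleles = locus_diversity(genotypes)
print(f"Ho = {h_obs:.3f}, He = {h_exp:.3f}, alleles = {n_alleles}")
```

    Averaging these per-locus values over all loci gives study-level summaries of the kind reported above.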

  14. Models for probability and statistical inference theory and applications

    CERN Document Server

    Stapleton, James H

    2007-01-01

    This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readers. Models for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping. Ideal as a textbook for a two-semester sequence on probability and statistical inference, early chapters provide coverage on probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...

  15. Analysis of conditional genetic effects and variance components in developmental genetics.

    Science.gov (United States)

    Zhu, J

    1995-12-01

    A genetic model with additive-dominance effects and genotype x environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at previous time (t-1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects.

  16. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    (This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives’, edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by Clarendon Press, Oxford, and planned to appear as Section 4.1 with the title ‘Inference’.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation-free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...

  17. Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities

    Science.gov (United States)

    Royle, J. Andrew; Dorazio, Robert M.

    2008-01-01

    A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including: occurrence or occupancy models for estimating species distribution; abundance models based on many sampling protocols, including distance sampling; capture-recapture models with individual effects; spatial capture-recapture models based on camera trapping and related methods; population and metapopulation dynamic models; and models of biodiversity, community structure and dynamics.

  18. Multivariate Survival Mixed Models for Genetic Analysis of Longevity Traits

    DEFF Research Database (Denmark)

    Pimentel Maia, Rafael; Madsen, Per; Labouriau, Rodrigo

    2014-01-01

    A class of multivariate mixed survival models for continuous and discrete time with a complex covariance structure is introduced in a context of quantitative genetic applications. The methods introduced can be used in many applications in quantitative genetics, although the discussion presented concentrates on longevity studies. The framework presented allows models based on continuous time to be combined with models based on discrete time in a joint analysis. The continuous time models are approximations of the frailty model in which the hazard function will be assumed to be piece-wise constant... applications. The methods presented are implemented in such a way that large and complex quantitative genetic data can be analyzed.

  19. Multivariate Survival Mixed Models for Genetic Analysis of Longevity Traits

    DEFF Research Database (Denmark)

    Pimentel Maia, Rafael; Madsen, Per; Labouriau, Rodrigo

    2013-01-01

    A class of multivariate mixed survival models for continuous and discrete time with a complex covariance structure is introduced in a context of quantitative genetic applications. The methods introduced can be used in many applications in quantitative genetics although the discussion presented concentrates on longevity studies. The framework presented allows models based on continuous time to be combined with models based on discrete time in a joint analysis. The continuous time models are approximations of the frailty model in which the hazard function will be assumed to be piece-wise constant... The methods presented are implemented in such a way that large and complex quantitative genetic data can be analyzed...

  20. Lower complexity bounds for lifted inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred

    2015-01-01

    instances of the model. Numerous approaches for such “lifted inference” techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show...... the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established earlier in Jaeger (2000; Jaeger, M. 2000. On the complexity of inference about...... that under the assumption that NETIME≠ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier-, and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference and for knowledge bases not containing...

  1. Multivariate Meta-Analysis of Genetic Association Studies: A Simulation Study.

    Directory of Open Access Journals (Sweden)

    Binod Neupane

    In a meta-analysis with multiple end points of interest that are correlated between or within studies, a multivariate approach to meta-analysis has the potential to produce more precise estimates of effects by exploiting the correlation structure between end points. However, under the random-effects assumption, multivariate estimation is more complex (as it involves estimating more parameters simultaneously) than univariate estimation, and can sometimes produce unrealistic parameter estimates. The usefulness of the multivariate approach to meta-analysis of the effects of a genetic variant on two or more correlated traits is not well understood in the area of genetic association studies. In such studies, genetic variants are expected to roughly maintain Hardy-Weinberg equilibrium within studies, and their effects on complex traits are generally very small to modest and could be heterogeneous across studies for genuine reasons. We carried out extensive simulations to compare the performance of the multivariate approach with the most commonly used univariate inverse-variance weighted approach under the random-effects assumption in various realistic meta-analytic scenarios of genetic association studies of correlated end points. We evaluated performance with respect to relative mean bias percentage and root mean square error (RMSE) of the estimate, and coverage probability of the corresponding 95% confidence interval of the effect for each end point. Our simulation results suggest that the multivariate approach performs similarly to or better than the univariate method when correlations between end points within or between studies are at least moderate and between-study variation is similar to or larger than average within-study variation for meta-analyses of 10 or more genetic studies. The multivariate approach produces estimates with smaller bias and RMSE, especially for the end point that has randomly or informatively missing summary data in some individual studies, when

  2. A Cautionary Analysis of STAPLE Using Direct Inference of Segmentation Truth

    DEFF Research Database (Denmark)

    Van Leemput, Koen; Sabuncu, Mert R.

    2014-01-01

    In this paper we analyze the properties of the well-known segmentation fusion algorithm STAPLE, using a novel inference technique that analytically marginalizes out all model parameters. We demonstrate both theoretically and empirically that when the number of raters is large, or when consensus r...

  3. Genetic affinities of north and northeastern populations of India: inference from HLA-based study.

    Science.gov (United States)

    Agrawal, S; Srivastava, S K; Borkar, M; Chaudhuri, T K

    2008-08-01

    India is like a microcosm of the world in terms of its diversity of religion, climate and ethnicity, which leads to genetic variation among its populations. As a highly polymorphic marker, the human leukocyte antigen (HLA) system plays an important role in genetic differentiation studies. To assess the genetic diversity of HLA class II loci, we studied a total of 1336 individuals from north India using DNA-based techniques. The study included four endogamous castes (Kayastha, Mathurs, Rastogies and Vaishyas), two inbreeding Muslim populations (Shias and Sunnis) from north India and three northeast Indian populations (Lachung, Mech and Rajbanshi). A total of 36 alleles were observed at the DRB1 locus in both Hindu castes and Muslims from north India, while 21 alleles were seen in northeast Indians. At the DQA1 locus, the number of alleles ranged from 11 to 17 in the studied populations. The total number of alleles at DQB1 was 19, 12 and 20 in the studied castes, Muslims and northeastern populations, respectively. The most frequent haplotypes observed in all the studied populations were DRB1*0701-DQA1*0201-DQB1*0201 and DRB1*1501-DQA1*0103-DQB1*0601. Upon comparing our results with other world populations, we observed the presence of a Caucasoid element in the north Indian population. However, differential admixture among Sunnis and Shias with the other north Indians was evident. Northeastern populations showed genetic affinity with Mongoloids from southeast Asia. When genetic distances were calculated, we found the north Indians and northeastern populations to be markedly unrelated.

  4. Variations on Bayesian Prediction and Inference

    Science.gov (United States)

    2016-05-09

    inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle

  5. Evaluation of mature cow weight: genetic correlations with traits used in selection indices, correlated responses, and genetic trends in Nelore cattle.

    Science.gov (United States)

    Boligon, A A; Carvalheiro, R; Albuquerque, L G

    2013-01-01

    Genetic correlations of selection indices and the traits considered in these indices with mature weight (MW) of Nelore females and correlated responses were estimated to determine whether current selection practices will result in an undesired correlated response in MW. Genetic trends for weaning and yearling indices and MW were also estimated. Data from 612,244 Nelore animals born between 1984 and 2010, belonging to different beef cattle evaluation programs from Brazil and Paraguay, were used. The following traits were studied: weaning conformation (WC), weaning precocity (WP), weaning muscling (WM), yearling conformation (YC), yearling precocity (YP), yearling muscling (YM), weaning and yearling indices, BW gain from birth to weaning (BWG), postweaning BW gain (PWG), scrotal circumference (SC), and MW. The variance and covariance components were estimated by Bayesian inference in a multitrait analysis, including all traits in the same analysis, using a nonlinear (threshold) animal model for visual scores and a linear animal model for the other traits. The mean direct heritabilities were 0.21±0.007 (WC), 0.22±0.007 (WP), 0.20±0.007 (WM), 0.43±0.005 (YC), 0.40±0.005 (YP), 0.40±0.005 (YM), 0.17±0.003 (BWG), 0.21±0.004 (PWG), 0.32±0.001 (SC), and 0.44±0.018 (MW). The genetic correlations between MW and weaning and yearling indices were positive and of medium magnitude (0.30±0.01 and 0.31±0.01, respectively). The genetic changes in weaning index, yearling index, and MW, expressed as units of genetic SD per year, were 0.26, 0.27, and 0.01, respectively. The genetic trend for MW was nonsignificant, suggesting no negative correlated response. The selection practice based on the use of sires with high final index giving preference for those better ranked for yearling precocity and muscling than for conformation generates only a minimal correlated response in MW.

  6. Analysis of genetic diversity in Bolivian llama populations using microsatellites.

    Science.gov (United States)

    Barreta, J; Gutiérrez-Gil, B; Iñiguez, V; Romero, F; Saavedra, V; Chiri, R; Rodríguez, T; Arranz, J J

    2013-08-01

    South American camelids (SACs) have a major role in the maintenance and potential future of rural Andean human populations. More than 60% of the 3.7 million llamas living worldwide are found in Bolivia. Due to the lack of studies focusing on genetic diversity in Bolivian llamas, this analysis investigates both the genetic diversity and structure of 12 regional groups of llamas that span the greater part of the range of distribution for this species in Bolivia. The analysis of 42 microsatellite markers in the considered regional groups showed that, in general, there were high levels of polymorphism (a total of 506 detected alleles; average PIC per marker: 0.66), which are comparable with those reported for other populations of domestic SACs. The estimated diversity parameters indicated that there was high intrapopulational genetic variation (average number of alleles and average expected heterozygosity per marker: 12.04 and 0.68, respectively) and weak genetic differentiation among populations (FST range: 0.003-0.052). In agreement with these estimates, Bolivian llamas showed a weak genetic structure and an intense gene flow between all the studied regional groups, which is due to the exchange of reproductive males between the different flocks. Interestingly, the groups for which the largest pairwise FST estimates were observed, Sud Lípez and Nor Lípez, showed a certain level of genetic differentiation that is probably due to the pattern of geographic isolation and limited communication infrastructures of these southern localities. Overall, the population parameters reported here may serve as a reference when establishing conservation policies that address Bolivian llama populations. © 2012 Blackwell Verlag GmbH.
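
    The per-marker diversity measures quoted in this abstract (expected heterozygosity and PIC) are standard functions of allele frequencies. The sketch below, in Python, uses the textbook definitions He = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²; the function names and the example frequencies are hypothetical, and the study itself may have applied sample-size corrections that are not shown here.

```python
import math


def _check_frequencies(freqs):
    """Allele frequencies at one marker must be non-negative and sum to 1."""
    if any(p < 0 for p in freqs):
        raise ValueError("frequencies must be non-negative")
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("frequencies must sum to 1")


def expected_heterozygosity(freqs):
    """Expected heterozygosity He = 1 - sum(p_i^2) (no small-sample correction)."""
    _check_frequencies(freqs)
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    _check_frequencies(freqs)
    sum_sq = sum(p * p for p in freqs)
    cross = 0.0
    for i in range(len(freqs) - 1):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - sum_sq - cross


# Hypothetical marker with four equally frequent alleles.
example_freqs = [0.25, 0.25, 0.25, 0.25]
```

    PIC is never larger than He for the same marker, because it subtracts an additional non-negative term.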

  7. A genetic analysis of segregation distortion revealed by molecular ...

    Indian Academy of Sciences (India)

    Journal of Genetics, Vol. 90, No. ... Segregation analysis was based on 64 molecular markers, including 26 .... FHB of RIL populations was controlled by quantitative trait ... The authors acknowledge financial support by the National Basic.

  8. Genetic evidence for a Paleolithic human population expansion in Africa

    Science.gov (United States)

    Reich, David E.; Goldstein, David B.

    1998-01-01

    Human populations have undergone dramatic expansions in size, but other than the growth associated with agriculture, the dates and magnitudes of those expansions have never been resolved. Here, we introduce two new statistical tests for population expansion, which use variation at a number of unlinked genetic markers to study the demographic histories of natural populations. By analyzing genetic variation in various aboriginal populations from throughout the world, we show highly significant evidence for a major human population expansion in Africa, but no evidence of expansion outside of Africa. The inferred African expansion is estimated to have occurred between 49,000 and 640,000 years ago, certainly before the Neolithic expansions, and probably before the splitting of African and non-African populations. In showing a significant difference between African and non-African populations, our analysis supports the unique role of Africa in human evolutionary history, as has been suggested by most other genetic work. In addition, the missing signal in non-African populations may be the result of a population bottleneck associated with the emergence of these populations from Africa, as postulated in the “Out of Africa” model of modern human origins. PMID:9653150

  9. Comparison between genetic fuzzy system and neuro fuzzy system to select oil wells for hydraulic fracturing; Comparacao entre genetic fuzzy system e neuro fuzzy system para selecao de pocos de petroleo para fraturamento hidraulico

    Energy Technology Data Exchange (ETDEWEB)

    Castro, Antonio Orestes de Salvo [PETROBRAS, Rio de Janeiro, RJ (Brazil); Ferreira Filho, Virgilio Jose Martins [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2004-07-01

    The hydraulic fracturing operation is widely used to increase oil well production and to reduce formation damage. Reservoir studies and engineering analyses are made to select the wells for this kind of operation. As the reservoir parameters have some diffuse characteristics, Fuzzy Inference Systems (SIF) have been tested for this selection process in the last few years. This paper compares the performance of a neuro fuzzy system and a genetic fuzzy system used for hydraulic fracture well selection, with knowledge acquisition from an operational database to set the SIF membership functions. The training data and the validation data used were the same for both systems. We concluded that, although the genetic fuzzy system is a newer technique, it achieved better results than the neuro fuzzy system. Another conclusion was that, as the genetic fuzzy system can work with constraints, the membership function settings preserved the consistency of the variables' linguistic values. (author)

  10. Analysis of CYP3A4 genetic polymorphisms in Han Chinese.

    Science.gov (United States)

    Zhou, Qing; Yu, Xiaomin; Shu, Chang; Cai, Yimei; Gong, Wei; Wang, Xumin; Wang, Duen-mei; Hu, Songnian

    2011-06-01

    Our study aimed to comprehensively investigate the genetic polymorphisms of CYP3A4 in Han Chinese. We sequenced the gene regions of CYP3A4, including its promoter, exons, surrounding introns and 3' untranslated region (3'UTR), from 100 unrelated healthy Han Chinese individuals. We detected 11 SNPs, three of which are novel. According to in silico functional prediction of novel variants, 20148 A>G in exon 10, resulting in substitution of Tyr319 with Cys (CYP3A4*21), may induce dramatic alteration of protein conformation, and 26908 G>A in the 3'UTR may disrupt post-transcriptional regulation. We identified five alleles in Han Chinese; the allele frequencies of CYP3A4*1, *5, *6, *18 and *21 are 97, 0.5, 1, 1 and 0.5%, respectively. Haplotype inference revealed 14 haplotypes, of which the major haplotype CYP3A4*1A constitutes 59% of the total chromosomes. We also examined the possible role of natural selection in shaping the variation of CYP3A4 and confirmed a trend consistent with the action of positive selection. We systematically screened the genetic polymorphisms of CYP3A4 in Han Chinese, highlighted possible functional impairment of the novel allele and summarized the distinct allele and haplotype frequency distribution, with an emphasis on detecting the footprint of recent positive selection on the CYP3A4 gene in Han Chinese.

  11. R Package multiPIM: A Causal Inference Approach to Variable Importance Analysis

    Directory of Open Access Journals (Sweden)

    Stephan J Ritter

    2014-04-01

    We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including super learner, a meta-learner which combines several different algorithms into one. We describe a simulation in which the double robust TMLE is compared to the graphical computation estimator. We also provide example analyses using two data sets which are included with the package.

  12. Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis.

    Science.gov (United States)

    Cheng, Feon W; Gao, Xiang; Bao, Le; Mitchell, Diane C; Wood, Craig; Sliwinski, Martin J; Smiciklas-Wright, Helen; Still, Christopher D; Rolston, David D K; Jensen, Gordon L

    2017-07-01

    To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. The conditional inference tree analysis, a data mining approach, was used to construct a risk stratification algorithm for developing functional limitation based on BMI and other potential risk factors for disability in 1,951 older adults without functional limitations at baseline (baseline age 73.1 ± 4.2 y). We also analyzed the data with multivariate stepwise logistic regression and compared the two approaches (e.g., cross-validation). Over a mean of 9.2 ± 1.7 years of follow-up, 221 individuals developed functional limitation. Higher BMI, age, and comorbidity were consistently identified as significant risk factors for functional decline using both methods. Based on these factors, individuals were stratified into four risk groups via the conditional inference tree analysis. Compared to the low-risk group, all other groups had a significantly higher risk of developing functional limitation. The odds ratio comparing two extreme categories was 9.09 (95% confidence interval: 4.68, 17.6). Higher BMI, age, and comorbid disease were consistently identified as significant risk factors for functional decline among older individuals across all approaches and analyses. © 2017 The Obesity Society.
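
    The abstract reports an odds ratio with a 95% confidence interval; the study's estimate most likely comes from a fitted model, which is not reproduced here. As a hedged illustration of how such an interval is commonly formed, the Python sketch below computes an odds ratio from a 2×2 table with a Wald interval on the log scale. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
example_or, example_lower, example_upper = odds_ratio_wald_ci(20, 80, 10, 90)
```

    Because the interval is symmetric on the log scale, the product of its limits equals the squared odds ratio; the reported limits (4.68, 17.6) around 9.09 follow the same pattern up to rounding.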

  13. Bayesian inference in probabilistic risk assessment-The current state of the art

    International Nuclear Information System (INIS)

    Kelly, Dana L.; Smith, Curtis L.

    2009-01-01

    Markov chain Monte Carlo (MCMC) approaches to sampling directly from the joint posterior distribution of aleatory model parameters have led to tremendous advances in Bayesian inference capability in a wide variety of fields, including probabilistic risk analysis. The advent of freely available software coupled with inexpensive computing power has catalyzed this advance. This paper examines where the risk assessment community is with respect to implementing modern computational-based Bayesian approaches to inference. Through a series of examples in different topical areas, it introduces salient concepts and illustrates the practical application of Bayesian inference via MCMC sampling to a variety of important problems.

  14. Approximate maximum likelihood estimation for population genetic inference.

    Science.gov (United States)

    Bertl, Johanna; Ewing, Gregory; Kosiol, Carolin; Futschik, Andreas

    2017-11-27

    In many population genetic problems, parameter estimation is obstructed by an intractable likelihood function. Therefore, approximate estimation methods have been developed, and with growing computational power, sampling-based methods became popular. However, these methods such as Approximate Bayesian Computation (ABC) can be inefficient in high-dimensional problems. This led to the development of more sophisticated iterative estimation methods like particle filters. Here, we propose an alternative approach that is based on stochastic approximation. By moving along a simulated gradient or ascent direction, the algorithm produces a sequence of estimates that eventually converges to the maximum likelihood estimate, given a set of observed summary statistics. This strategy does not sample much from low-likelihood regions of the parameter space, and is fast, even when many summary statistics are involved. We put considerable efforts into providing tuning guidelines that improve the robustness and lead to good performance on problems with high-dimensional summary statistics and a low signal-to-noise ratio. We then investigate the performance of our resulting approach and study its properties in simulations. Finally, we re-estimate parameters describing the demographic history of Bornean and Sumatran orang-utans.

  15. A Visual Analysis Approach for Inferring Personal Job and Housing Locations Based on Public Bicycle Data

    Directory of Open Access Journals (Sweden)

    Xiaoying Shi

    2017-07-01

    Information concerning the home and workplace of residents is the basis of analyzing the urban job-housing spatial relationship. Traditional methods conduct time-consuming user surveys to obtain personal job and housing location information. Some new methods define rules to detect personal places based on human mobility data. However, because the travel patterns of residents are variable, simple rule-based methods are unable to generalize highly changing and complex travel modes. In this paper, we propose a visual analysis approach to assist the analyzer in inferring personal job and housing locations interactively based on public bicycle data. All users are first clustered to find potential commuting users. Then, several visual views are designed to find the key candidate stations for a specific user, and the visited temporal pattern of stations and the user’s hire behavior are analyzed, which helps with the inference of station semantic meanings. Finally, a number of users’ job and housing locations are detected by the analyzer and visualized. Our approach can manage the complex and diverse cycling habits of users. The effectiveness of the approach is shown through case studies based on a real-world public bicycle dataset.

  16. Inference of neuronal network spike dynamics and topology from calcium imaging data

    Directory of Open Access Journals (Sweden)

    Henry Lütcke

    2013-12-01

    Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence ('spike trains') from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties.

  17. Genetic Analysis of Oncorhynchus Nerka: Life History and Genetic Analysis of Redfish Lake Oncorhynchus Nerka, 1993-1994 Completion Report.

    Energy Technology Data Exchange (ETDEWEB)

    Brannon, E.L.; Thorgaard, G.H.; Cummings, S.A.

    1994-10-01

    The study has shown through life history examination and DNA analysis that three forms of O. nerka are present in Redfish Lake. The three forms are closely related, but may be sufficiently different to be considered three separate stocks. Fishhook Creek kokanee are temporally isolated from the beach spawners, and may represent the gene pool most similar to the historic sockeye population that once spawned there. Fishhook Creek offers the best spawning area available in the lake system, and should be considered for use in reestablishing an anadromous Fishhook Creek sockeye strain. The resident beach spawning strain of O. nerka is likewise the most similar genetic form of the companion anadromous beach spawning O. nerka, and needs to be considered the most appropriate genetic source to help minimize reduced fitness of the sockeye from inbreeding.

  18. Analysis of genetic diversity of Piper spp. in Hainan Island (China ...

    African Journals Online (AJOL)

    Inter-simple sequence repeat (ISSR) analysis was used to evaluate the genetic variation of Piper spp. from Hainan, China. 247 polymorphic bands out of a total of 248 (99.60%) were generated from 74 individual plants of Piper spp. The overall level of genetic diversity among Piper spp. in Hainan was high, with the mean ...

  19. Postmarital residence and within-sex genetic diversity among the Urubu-Ka'apor Indians, Brazilian Amazon.

    Science.gov (United States)

    Aguiar, G F; Neves, W A

    1991-08-01

    The analysis of biologic variation in prehistoric human populations separately by sex has been used as a tool to recover post-marital residential rules. These studies, which focus on the sexual distribution of skeletal traits, assume that the degree of intragroup or intergroup biologic diversity is higher in one sex with regard to unilocality (uxori- or virilocality). Despite a recent attempt to interpret this phenomenon in terms of population genetics (Konigsberg 1988), the main assumption has never been tested in situations in which the real residential practice of an indigenous population is known and in which genetic rather than phenotypic data are available. We investigated the within-group and between-group genetic variability among males and females from 4 villages of an uxorilocal Amazonian tribe, the Urubu-Ka'apor, on the basis of 20 polymorphic loci. The results were only partly concordant with expectations. Individual mean per locus heterozygosities were not different between the sexes, and the analysis of genetic heterogeneity showed similar gene frequencies for males and females in all villages. On the other hand, the intergroup approach detected a level of variation significantly greater among females than among males. The ethnographic evidence shows that three of the four subgroups studied belong to the same gamic unity, with the fourth subgroup belonging to another gamic network. Within-sex differences in intergroup analysis turned out to be more evident; yet, when those 3 villages were investigated separately, the female FST (0.0609) proved to be significantly higher than the male FST (0.0218). Such results suggest that the intergroup analysis is more sensitive to the genetic effects of differential migration rates between the sexes. In prehistoric contexts, therefore, an intergroup genetic approach can provide more reliable grounds for sociocultural inferences.
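
    The sex-specific FST values in this abstract measure how much allele frequencies differ among villages. As a rough sketch only, the Python function below computes a simplified FST = (HT - HS) / HT for a single biallelic locus with equally weighted subpopulations; the function name and the example frequencies are hypothetical, and the study's multi-locus estimator and significance test are not reproduced.

```python
def fst_biallelic(sub_freqs):
    """Simplified FST for one biallelic locus.

    sub_freqs: frequency of the same allele in each subpopulation,
               all subpopulations weighted equally.
    HS is the mean within-subpopulation expected heterozygosity,
    HT the expected heterozygosity at the pooled mean frequency.
    """
    if not sub_freqs:
        raise ValueError("need at least one subpopulation")
    if any(p < 0.0 or p > 1.0 for p in sub_freqs):
        raise ValueError("frequencies must lie in [0, 1]")
    p_bar = sum(sub_freqs) / len(sub_freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall; no differentiation to measure
    h_within = sum(2.0 * p * (1.0 - p) for p in sub_freqs) / len(sub_freqs)
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two villages.
example_fst = fst_biallelic([0.4, 0.6])
```

    Identical frequencies in every subpopulation give 0, and fixation of different alleles in different subpopulations gives 1; values such as the reported 0.0218 and 0.0609 sit near the low end of that range.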

  20. Genetic analysis of a consanguineous Pakistani family with Leber ...

    Indian Academy of Sciences (India)

    2014-08-01

    Aug 1, 2014 ... RESEARCH NOTE. Genetic analysis of a consanguineous Pakistani family with Leber .... representation of the deleterious mutation at genomic and protein level. ... In the last couple of years, numerous mutations in. GUCY2D ...

  1. A theoretical analysis of population genetics of plants on restored habitats

    Energy Technology Data Exchange (ETDEWEB)

    Bogoliubov, A.G. [Botanical Institute, Russian Academy of Science, St. Petersburg (Russian Federation); Loehle, C. [Argonne National Lab., IL (United States)

    1995-02-01

    Seed and propagules used for habitat restoration are not likely to be closely adapted to local site conditions. Rapid changes of genotype frequencies on local microsites and/or microevolution would allow plants to become better adapted to a site. These same factors would help to maintain genetic diversity and ensure the survival of small endangered populations. We used population genetics models to examine the selection of genotypes during establishment on restored sites. Vegetative spread was shown to affect selection and significantly reduce genetic diversity. To study general microevolution, we linked a model of resource usage with a genetics model and analyzed competition between genotypes. A complex suite of feasible ecogenetic states was shown to result. The state actually resulting would depend strongly on initial conditions. This analysis indicated that genetic structure can vary locally and can produce overall genetic variability that is not simply the result of microsite adaptations. For restoration activities, the implication is that small differences in seed source could lead to large differences in local genetic structure after selection.

  2. A theoretical analysis of population genetics of plants on restored habitats

    Energy Technology Data Exchange (ETDEWEB)

    Bogoliubov, A.G. [Russian Academy of Science, St. Petersburg (Russian Federation). Botanical Inst.; Loehle, C. [Argonne National Lab., IL (United States). Environmental Research Div.

    1997-07-01

    Seed and propagules used for habitat restoration are not likely to be closely adapted to local site conditions. Rapid changes of genotype frequencies on local microsites and/or microevolution would allow plants to become better adapted to a site. These same factors would help to maintain genetic diversity and ensure the survival of small endangered populations. The authors used population genetics models to examine the selection of genotypes during establishment on restored sites. Vegetative spread was shown to affect selection and significantly reduce genetic diversity. To study general microevolution, the authors linked a model of resource usage with a genetics model and analyzed competition between genotypes. A complex suite of feasible ecogenetic states was shown to result. The state actually resulting would depend strongly on initial conditions. This analysis indicated that genetic structure can vary locally and can produce overall genetic variability that is not simply the result of microsite adaptations. For restoration activities, the implication is that small differences in seed source could lead to large differences in local genetic structure after selection.

  3. Role of Speaker Cues in Attention Inference

    Directory of Open Access Journals (Sweden)

    Jin Joo Lee

    2017-10-01

    Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of the sole individual without reference to contextual elements such as the co-presence of the partner. In this paper, we demonstrate that the accurate inference of listeners’ social-emotional state of attention depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in attention inference, we conduct investigations into real-world interactions of children (5–6 years old) storytelling with their peers. Through in-depth analysis of human–human interaction data, we first identify nonverbal speaker cues (i.e., backchannel-inviting cues) and listener responses (i.e., backchannel feedback). We then demonstrate how speaker cues can modify the interpretation of attention-related backchannels as well as serve as a means to regulate the responsiveness of listeners. We discuss the design implications of our findings toward our primary goal of developing attention recognition models for storytelling robots, and we argue that social robots can proactively use speaker cues to form more accurate inferences about the attentive state of their human partners.

  4. Shallow Population Genetic Structures of Thread-sail Filefish (Stephanolepis cirrhifer) Populations from Korean Coastal Waters

    Directory of Open Access Journals (Sweden)

    M. Yoon

    2012-02-01

    Full Text Available Genetic diversities, population genetic structures and demographic histories of the thread-sail filefish Stephanolepis cirrhifer were investigated by nucleotide sequencing of 336 base pairs of the mitochondrial DNA (mtDNA) control region in 111 individuals collected from six populations in Korean coastal waters. A total of 70 haplotypes were defined by 58 variable nucleotide sites. The neighbor-joining tree of the 70 haplotypes was shallow and did not provide evidence of geographical associations. Expansion of S. cirrhifer populations began approximately 51,000 to 102,000 years before present, correlating with the period of sea level rise since the late Pleistocene glacial maximum. High levels of haplotype diversities (0.974±0.029 to 1.000±0.076) and nucleotide diversities (0.014 to 0.019), and low levels of genetic differentiation among populations inferred from pairwise population FST values (−0.007 to 0.107), support an expansion of the S. cirrhifer population. Hierarchical analysis of molecular variance (AMOVA) revealed weak but significant genetic structures among three groups (FCT = 0.028, p<0.05), and no genetic variation within groups (0.53%; FSC = 0.005, p = 0.23). These results may help establish appropriate fishery management strategies for stocks of S. cirrhifer and related species.
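
    The haplotype diversities quoted in this record can be illustrated with a short calculation. The abstract does not say which estimator was used; the sketch below assumes Nei's unbiased estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies), and the function name and sample labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    Assumption: this estimator is chosen for illustration only; the record
    does not state which formula the authors applied.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies in the sample.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: five sequences carrying four distinct haplotypes.
sample = ["H1", "H1", "H2", "H3", "H4"]
print(round(haplotype_diversity(sample), 3))
```

    When every sampled sequence carries a different haplotype the estimator returns 1, which is consistent with the upper end of the range reported above.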

  5. Adaptive Inference on General Graphical Models

    OpenAIRE

    Acar, Umut A.; Ihler, Alexander T.; Mettu, Ramgopal; Sumer, Ozgur

    2012-01-01

    Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional ...

  6. Reward Behavior by Male and Female Leaders: A Causal Inference Analysis.

    Science.gov (United States)

    Szilagyi, Andrew D.

    1980-01-01

    Investigated causal inferences between leader reward behavior and subordinate goal attainment, absenteeism, and work satisfaction. Results revealed that no significant differences were attributed to sex and that the leader reward behavior and subordinate attitudes and behavior were independent of the effects of sex of supervisor or subordinate.…

  7. Using AFLP markers and the Geneland program for the inference of population genetic structure

    DEFF Research Database (Denmark)

    Guillot, Gilles; Santos, Filipe

    2010-01-01

    the computer program Geneland designed to infer population structure has been adapted to deal with dominant markers; and (ii) we use Geneland for numerical comparison of dominant and codominant markers to perform clustering. AFLP markers lead to less accurate results than bi-allelic codominant markers...... such as single nucleotide polymorphisms (SNP) markers but this difference becomes negligible for data sets of common size (number of individuals n≥100, number of markers L≥200). The latest Geneland version (3.2.1) handling dominant markers is freely available as an R package with a fully clickable graphical...

  8. The Use of Carcasses for the Analysis of Cetacean Population Genetic Structure: A Comparative Study in Two Dolphin Species

    Science.gov (United States)

    Bilgmann, Kerstin; Möller, Luciana M.; Harcourt, Robert G.; Kemper, Catherine M.; Beheregaray, Luciano B.

    2011-01-01

    Advances in molecular techniques have enabled the study of genetic diversity and population structure in many different contexts. Studies that assess the genetic structure of cetacean populations often use biopsy samples from free-ranging individuals and tissue samples from stranded animals or individuals that became entangled in fishery or aquaculture equipment. This leads to the question of how representative the location of a stranded or entangled animal is with respect to its natural range, and whether similar results would be obtained when comparing carcass samples with samples from free-ranging individuals in studies of population structure. Here we use tissue samples from carcasses of dolphins that stranded or died as a result of bycatch in South Australia to investigate spatial population structure in two species: coastal bottlenose (Tursiops sp.) and short-beaked common dolphins (Delphinus delphis). We compare these results with those previously obtained from biopsy sampled free-ranging dolphins in the same area to test whether carcass samples yield similar patterns of genetic variability and population structure. Data from dolphin carcasses were gathered using seven microsatellite markers and a fragment of the mitochondrial DNA control region. Analyses based on carcass samples alone failed to detect genetic structure in Tursiops sp., a species previously shown to exhibit restricted dispersal and moderate genetic differentiation across a small spatial scale in this region. However, genetic structure was correctly inferred in D. delphis, a species previously shown to have reduced genetic structure over a similar geographic area. We propose that in the absence of corroborating data, and when population structure is assessed over relatively small spatial scales, the sole use of carcasses may lead to an underestimate of genetic differentiation. This can lead to a failure in identifying management units for conservation. Therefore, this risk should be carefully

  9. Estimating mountain basin-mean precipitation from streamflow using Bayesian inference

    Science.gov (United States)

    Henn, Brian; Clark, Martyn P.; Kavetski, Dmitri; Lundquist, Jessica D.

    2015-10-01

    Estimating basin-mean precipitation in complex terrain is difficult due to uncertainty in the topographical representativeness of precipitation gauges relative to the basin. To address this issue, we use Bayesian methodology coupled with a multimodel framework to infer basin-mean precipitation from streamflow observations, and we apply this approach to snow-dominated basins in the Sierra Nevada of California. Using streamflow observations, forcing data from lower-elevation stations, the Bayesian Total Error Analysis (BATEA) methodology and the Framework for Understanding Structural Errors (FUSE), we infer basin-mean precipitation, and compare it to basin-mean precipitation estimated using topographically informed interpolation from gauges (PRISM, the Parameter-elevation Regression on Independent Slopes Model). The BATEA-inferred spatial patterns of precipitation show agreement with PRISM in terms of the rank of basins from wet to dry but differ in absolute values. In some of the basins, these differences may reflect biases in PRISM, because some implied PRISM runoff ratios may be inconsistent with the regional climate. We also infer annual time series of basin precipitation using a two-step calibration approach. Assessment of the precision and robustness of the BATEA approach suggests that uncertainty in the BATEA-inferred precipitation is primarily related to uncertainties in hydrologic model structure. Despite these limitations, time series of inferred annual precipitation under different model and parameter assumptions are strongly correlated with one another, suggesting that this approach is capable of resolving year-to-year variability in basin-mean precipitation.

  10. Inferring regulatory networks from experimental morphological phenotypes: a computational method reverse-engineers planarian regeneration.

    Directory of Open Access Journals (Sweden)

    Daniel Lobo

    2015-06-01

    Full Text Available Transformative applications in biomedicine require the discovery of complex regulatory networks that explain the development and regeneration of anatomical structures, and reveal what external signals will trigger desired changes of large-scale pattern. Despite recent advances in bioinformatics, extracting mechanistic pathway models from experimental morphological data is a key open challenge that has resisted automation. The fundamental difficulty of manually predicting emergent behavior of even simple networks has limited the models invented by human scientists to pathway diagrams that show necessary subunit interactions but do not reveal the dynamics that are sufficient for complex, self-regulating pattern to emerge. To finally bridge the gap between high-resolution genetic data and the ability to understand and control patterning, it is critical to develop computational tools to efficiently extract regulatory pathways from the resultant experimental shape phenotypes. For example, planarian regeneration has been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model has yet been found by human scientists that explains more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. We present a method to infer the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. We demonstrated our approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature; By analyzing all the datasets together, our system inferred the first systems-biology comprehensive dynamical model explaining patterning in planarian regeneration. This method

  11. Genetic analysis of floating Enteromorpha prolifera in the Yellow Sea with AFLP marker

    Science.gov (United States)

    Liu, Cui; Zhang, Jing; Sun, Xiaoyu; Li, Jian; Zhang, Xi; Liu, Tao

    2011-09-01

    Extremely large accumulations of the green alga Enteromorpha prolifera have floated along China's coastal region of the Yellow Sea ever since the summer of 2008. Amplified Fragment Length Polymorphism (AFLP) analysis was applied to assess the genetic diversity and relationships among E. prolifera samples collected from 9 affected areas of the Yellow Sea. Two hundred reproducible fragments were generated with 8 AFLP primer combinations, of which 194 (97%) were polymorphic. The average Nei's genetic diversity, the coefficient of genetic differentiation (Gst), and the average gene flow estimated from Gst in the 9 populations were 0.4018, 0.6404 and 0.2807 respectively. Cluster analysis based on the unweighted pair group method with arithmetic averages (UPGMA) showed that the genetic relationships within one population or among different populations were all related to their collecting locations and sampling time. Large genetic differentiation was detected among the populations. The E. prolifera originated from different areas and were undergoing a course of mixing.
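
    The relationship between the reported Gst and gene flow values can be checked numerically. The abstract does not state which formula was used; the sketch below assumes the estimator Nm = 0.5 * (1 - Gst) / Gst, which appears in some population-genetics software, and the function name is hypothetical.

```python
def gene_flow_from_gst(gst):
    """Estimate gene flow (Nm) from Gst.

    Assumption: Nm = 0.5 * (1 - Gst) / Gst. The record does not name the
    estimator, so this form is used for illustration only.
    """
    if not 0.0 < gst <= 1.0:
        raise ValueError("Gst must lie in (0, 1]")
    return 0.5 * (1.0 - gst) / gst


# Gst as reported in the abstract above.
nm = gene_flow_from_gst(0.6404)
print(f"Nm = {nm:.4f}")
```

    Under this assumed formula the result is about 0.281, in line with the reported 0.2807. A value well below 1 is usually read as limited gene flow, consistent with the large differentiation the authors describe.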

  12. A Genetic Analysis of Mortality in Pigs

    DEFF Research Database (Denmark)

    Varona, Luis; Sorensen, Daniel

    2010-01-01

    to investigate whether there is support for genetic variation for mortality and to study the quality of fit and predictive properties of the various models. In both breeds, the model that provided the best fit to the data was the standard binomial hierarchical model. The model that performed best in terms......An analysis of mortality is undertaken in two breeds of pigs: Danish Landrace and Yorkshire. Zero-inflated and standard versions of hierarchical Poisson, binomial, and negative binomial Bayesian models were fitted using Markov chain Monte Carlo (MCMC). The objectives of the study were...... of the ability to predict the distribution of stillbirths was the hierarchical zero-inflated negative binomial model. The best fit of the binomial hierarchical model and of the zero-inflated hierarchical negative binomial model was obtained when genetic variation was included as a parameter. For the hierarchical...
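
    As an illustration of the zero-inflated count models named in this record, the sketch below evaluates the probability mass function of a plain zero-inflated Poisson distribution. It is not the hierarchical Bayesian model fitted by the authors; the parameter names `lam` and `pi_zero` and their values are hypothetical.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """Probability of count k under a zero-inflated Poisson distribution.

    pi_zero is the probability of a structural zero; with probability
    (1 - pi_zero) the count comes from a Poisson distribution with mean lam.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from both the structural-zero and the Poisson component.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


# Hypothetical parameter values, chosen only to show the excess of zeros.
lam, pi_zero = 2.0, 0.3
probs = [zip_pmf(k, lam, pi_zero) for k in range(50)]
print(round(probs[0], 4), round(sum(probs), 6))
```

    Compared with an ordinary Poisson distribution with the same mean parameter, the zero class carries extra mass, which is the feature that motivates zero-inflated models for counts such as stillbirths.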

  13. sick: The Spectroscopic Inference Crank

    Science.gov (United States)

    Casey, Andrew R.

    2016-03-01

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  14. SICK: THE SPECTROSCOPIC INFERENCE CRANK

    Energy Technology Data Exchange (ETDEWEB)

    Casey, Andrew R., E-mail: arc@ast.cam.ac.uk [Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA (United Kingdom)

    2016-03-15

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  15. SICK: THE SPECTROSCOPIC INFERENCE CRANK

    International Nuclear Information System (INIS)

    Casey, Andrew R.

    2016-01-01

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  16. Molecular genetic analysis of activation-tagged transcription factors thought to be involved in photomorphogenesis

    Energy Technology Data Exchange (ETDEWEB)

    Neff, Michael M.

    2011-06-23

    This is a final report for Department of Energy Grant No. DE-FG02-08ER15927 entitled “Molecular Genetic Analysis of Activation-Tagged Transcription Factors Thought to be Involved in Photomorphogenesis”. Based on our preliminary photobiological and genetic analysis of the sob1-D mutant, we hypothesized that OBP3 is a transcription factor involved in both phytochrome and cryptochrome-mediated signal transduction. In addition, we hypothesized that OBP3 is involved in auxin signaling and root development. Based on our preliminary photobiological and genetic analysis of the sob2-D mutant, we also hypothesized that a related gene, LEP, is involved in hormone signaling and seedling development.

  17. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    Science.gov (United States)

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.
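
    The precision-recall evaluation mentioned in this record can be sketched in a few lines. This is not GeneNetWeaver's own code or API; the function names, gene labels, and the ranked edge list below are hypothetical and serve only to show how precision and recall are computed for a predicted network against a gold-standard network.

```python
def precision_recall(predicted_edges, gold_edges):
    """Precision and recall of a set of predicted edges against a gold standard."""
    predicted = set(predicted_edges)
    gold = set(gold_edges)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def pr_curve(ranked_edges, gold_edges):
    """Precision and recall after each cut-off of a confidence-ranked edge list."""
    return [
        precision_recall(ranked_edges[:k], gold_edges)
        for k in range(1, len(ranked_edges) + 1)
    ]


# Hypothetical gold-standard network and a ranked prediction (most confident first).
gold = {("G1", "G2"), ("G2", "G3"), ("G3", "G4")}
ranked = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G4", "G1")]

for k, (p, r) in enumerate(pr_curve(ranked, gold), start=1):
    print(k, round(p, 2), round(r, 2))
```

    Edges are treated as directed (regulator, target) pairs here, so a prediction with the direction reversed counts as a false positive; that choice is an assumption of the sketch.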

  18. Analysis of the Threat of Genetically Modified Organisms for Biological Warfare

    Science.gov (United States)

    2011-05-01

    biological warfare. The primary focus of the framework are those aspects of the technology directly affecting humans by inducing virulent infectious disease...applications. Simple organisms such as fruit flies have been used to study the effects of genetic changes across generations. Transgenic mice are...Analysis * Multi-cell pathogens * Toxins (Chemical products of living cells.) * Fungi (Robust organism; no genetic manipulation needed

  19. Phylogenetic Inference of HIV Transmission Clusters

    Directory of Open Access Journals (Sweden)

    Vlad Novitsky

    2017-10-01

    Full Text Available Better understanding the structure and dynamics of HIV transmission networks is essential for designing the most efficient interventions to prevent new HIV transmissions, and ultimately for gaining control of the HIV epidemic. The inference of phylogenetic relationships and the interpretation of results rely on the definition of the HIV transmission cluster. The definition of the HIV cluster is complex and dependent on multiple factors, including the design of sampling, accuracy of sequencing, precision of sequence alignment, evolutionary models, the phylogenetic method of inference, and specified thresholds for cluster support. While the majority of studies focus on clusters, non-clustered cases could also be highly informative. A new dimension in the analysis of the global and local HIV epidemics is the concept of phylogenetically distinct HIV sub-epidemics. The identification of active HIV sub-epidemics reveals spreading viral lineages and may help in the design of targeted interventions. HIV clustering can also be affected by sampling density. Obtaining a proper sampling density may increase statistical power and reduce sampling bias, so sampling density should be taken into account in study design and in interpretation of phylogenetic results. Finally, recent advances in long-range genotyping may enable more accurate inference of HIV transmission networks. If performed in real time, it could both inform public-health strategies and be clinically relevant (e.g., drug-resistance testing).

  20. Causal inference of asynchronous audiovisual speech

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2013-11-01

    Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
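
    The causal inference step described in this record can be sketched with Bayes' rule. The abstract does not give the model's equations, so everything below is an assumption for illustration: measured asynchrony is taken to be Gaussian under both hypotheses, narrow when voice and face share a cause and broad when they do not, and all parameter values and names are hypothetical. Cue reliability, which the authors include, is not modeled here.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def p_common_cause(asynchrony_ms, prior_common=0.5,
                   sd_common=100.0, sd_separate=400.0):
    """Posterior probability that voice and face share a common cause.

    Assumption: asynchrony is zero-mean Gaussian under both hypotheses,
    with a narrower spread when the cues come from the same talker.
    """
    like_common = normal_pdf(asynchrony_ms, 0.0, sd_common)
    like_separate = normal_pdf(asynchrony_ms, 0.0, sd_separate)
    numerator = prior_common * like_common
    return numerator / (numerator + (1.0 - prior_common) * like_separate)


# Small asynchronies favor a common cause; large ones favor separate causes.
for asynchrony in (0.0, 150.0, 500.0):
    print(asynchrony, round(p_common_cause(asynchrony), 3))
```

    An observer following this rule would report "synchronous" when the posterior exceeds one half, which yields a synchrony judgment curve whose width depends on the interpretable parameters rather than on a curve fitted after the fact.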

  1. On the Ability To Infer Deficiency in Mathematics From Performance in Physics Using Hierarchies

    Science.gov (United States)

    Riban, David M.

    1971-01-01

    Presents the procedures, results, and conclusions of a study designed to see if mathematical deficiencies can be inferred from PSSC students' performance by using a hierarchical model of requisite skills. Assuming inferences were possible, remediation was given. No effect due to remediation was observed but analysis indicated incidental learning…

  2. Introductory statistical inference

    CERN Document Server

    Mukhopadhyay, Nitis

    2014-01-01

    This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques. Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist

  3. SecureMA: protecting participant privacy in genetic association meta-analysis.

    Science.gov (United States)

    Xie, Wei; Kantarcioglu, Murat; Bush, William S; Crawford, Dana; Denny, Joshua C; Heatherly, Raymond; Malin, Bradley A

    2014-12-01

    Sharing genomic data is crucial to support scientific investigation such as genome-wide association studies. However, recent investigations suggest the privacy of the individual participants in these studies can be compromised, leading to serious concerns and consequences, such as overly restricted access to data. We introduce a novel cryptographic strategy to securely perform meta-analysis for genetic association studies in large consortia. Our methodology is useful for supporting joint studies among disparate data sites, where privacy or confidentiality is of concern. We validate our method using three multisite association studies. Our research shows that genetic associations can be analyzed efficiently and accurately across substudy sites, without leaking information on individual participants and site-level association summaries. Our software for secure meta-analysis of genetic association studies, SecureMA, is publicly available at http://github.com/XieConnect/SecureMA. Our customized secure computation framework is also publicly available at http://github.com/XieConnect/CircuitService. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes

    Directory of Open Access Journals (Sweden)

    Shuibo Hu

    2018-03-01

    Full Text Available The size of phytoplankton not only influences its physiology, metabolic rates and marine food web, but also serves as an indicator of phytoplankton functional roles in ecological and biogeochemical processes. Therefore, some algorithms have been developed to infer the synoptic distribution of phytoplankton cell size, denoted as phytoplankton size classes (PSCs), in surface ocean waters, by means of remotely sensed variables. This study, using the NASA bio-Optical Marine Algorithm Data set (NOMAD) high performance liquid chromatography (HPLC) database, and satellite match-ups, aimed to compare the effectiveness of modeling techniques, including partial least square (PLS), artificial neural networks (ANN), support vector machine (SVM) and random forests (RF), and feature selection techniques, including genetic algorithm (GA), successive projection algorithm (SPA) and recursive feature elimination based on support vector machine (SVM-RFE), for inferring PSCs from remote sensing data. Results showed that: (1) SVM-RFE worked better in selecting sensitive features; (2) RF performed better than PLS, ANN and SVM in calibrating PSCs retrieval models; (3) machine learning techniques produced better performance than the chlorophyll-a based three-component method; (4) sea surface temperature, wind stress, and spectral curvature derived from the remote sensing reflectance at 490, 510, and 555 nm were among the most sensitive features to PSCs; and (5) the combination of SVM-RFE feature selection techniques and random forests regression was recommended for inferring PSCs. This study demonstrated the effectiveness of machine learning techniques in selecting sensitive features and calibrating models for PSCs estimations with remote sensing.

  5. Analysis of genetic polymorphism and genetic distance among four ...

    African Journals Online (AJOL)

    use

    2011-11-21

    Nov 21, 2011 ... The genomes of 4 sheep populations {Yuanqu white Tan sheep (YWT), Baozhongchang white Tan sheep. (BWT), black Tan sheep (BT) and small-tailed Han sheep (Han)} were screened using 10 microsatellite. DNA markers to estimate the genetic diversities and genetic distances among these ...

  6. Analysis of the genetic basis of disease in the context of worldwide human relationships and migration.

    Directory of Open Access Journals (Sweden)

    Erik Corona

    2013-05-01

    Full Text Available Genetic diversity across different human populations can enhance understanding of the genetic basis of disease. We calculated the genetic risk of 102 diseases in 1,043 unrelated individuals across 51 populations of the Human Genome Diversity Panel. We found that genetic risk for type 2 diabetes and pancreatic cancer decreased as humans migrated toward East Asia. In addition, biliary liver cirrhosis, alopecia areata, bladder cancer, inflammatory bowel disease, membranous nephropathy, systemic lupus erythematosus, systemic sclerosis, ulcerative colitis, and vitiligo have undergone genetic risk differentiation. This analysis represents a large-scale attempt to characterize genetic risk differentiation in the context of migration. We anticipate that our findings will enable detailed analysis pertaining to the driving forces behind genetic risk differentiation.

  7. Improving the extraction of complex regulatory events from scientific text by using ontology-based inference.

    Science.gov (United States)

    Kim, Jung-Jae; Rebholz-Schuhmann, Dietrich

    2011-10-06

    The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks: First, when tested against a corpus with manually annotated events, the inference module of our system contributes 53.2% of correct extractions, but does not cause any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (up to 85.0% precision) and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimum adaptations to the identification of cell activity regulation events, confirming that the inference improves the performance of the system also on this task. Our research shows that the inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential in recognizing the complex concepts of such biomedical ontologies as Gene Ontology in the literature.

  8. Improving the extraction of complex regulatory events from scientific text by using ontology-based inference

    Directory of Open Access Journals (Sweden)

    Kim Jung-jae

    2011-10-01

    Background: The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. Results: We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks: First, when tested against a corpus with manually annotated events, the inference module of our system contributes 53.2% of correct extractions, but does not cause any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (up to 85.0% precision) and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimum adaptations to the identification of cell activity regulation events, confirming that the inference improves the performance of the system also on this task. Conclusions: Our research shows that the inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential in recognizing the complex concepts of such biomedical ontologies as Gene Ontology in the literature.

  9. Active inference, communication and hermeneutics.

    Science.gov (United States)

    Friston, Karl J; Frith, Christopher D

    2015-07-01

    Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others--during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions--both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then--in principle--they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  10. Optimization of analytical parameters for inferring relationships among Escherichia coli isolates from repetitive-element PCR by maximizing correspondence with multilocus sequence typing data.

    Science.gov (United States)

    Goldberg, Tony L; Gillespie, Thomas R; Singer, Randall S

    2006-09-01

    Repetitive-element PCR (rep-PCR) is a method for genotyping bacteria based on the selective amplification of repetitive genetic elements dispersed throughout bacterial chromosomes. The method has great potential for large-scale epidemiological studies because of its speed and simplicity; however, objective guidelines for inferring relationships among bacterial isolates from rep-PCR data are lacking. We used multilocus sequence typing (MLST) as a "gold standard" to optimize the analytical parameters for inferring relationships among Escherichia coli isolates from rep-PCR data. We chose 12 isolates from a large database to represent a wide range of pairwise genetic distances, based on the initial evaluation of their rep-PCR fingerprints. We conducted MLST with these same isolates and systematically varied the analytical parameters to maximize the correspondence between the relationships inferred from rep-PCR and those inferred from MLST. Methods that compared the shapes of densitometric profiles ("curve-based" methods) yielded consistently higher correspondence values between data types than did methods that calculated indices of similarity based on shared and different bands (maximum correspondences of 84.5% and 80.3%, respectively). Curve-based methods were also markedly more robust in accommodating variations in user-specified analytical parameter values than were "band-sharing coefficient" methods, and they enhanced the reproducibility of rep-PCR. Phylogenetic analyses of rep-PCR data yielded trees with high topological correspondence to trees based on MLST and high statistical support for major clades. These results indicate that rep-PCR yields accurate information for inferring relationships among E. coli isolates and that accuracy can be enhanced with the use of analytical methods that consider the shapes of densitometric profiles.
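
    The contrast between band-based and curve-based comparison can be sketched in a few lines. The abstract does not name the specific coefficients used, so the Dice coefficient (for band sharing) and the Pearson correlation (for densitometric profiles) are assumptions chosen as common representatives of each family; the band sizes and profiles are invented.

```python
from math import sqrt


def dice_coefficient(bands_a, bands_b):
    """Band-sharing similarity: 2 * shared bands / (bands in A + bands in B)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def pearson_similarity(curve_a, curve_b):
    """Curve-based similarity: Pearson correlation of two intensity profiles."""
    n = len(curve_a)
    if n != len(curve_b) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_a = sum(curve_a) / n
    mean_b = sum(curve_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(curve_a, curve_b))
    var_a = sum((x - mean_a) ** 2 for x in curve_a)
    var_b = sum((y - mean_b) ** 2 for y in curve_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    return cov / sqrt(var_a * var_b)


# Invented band sizes (bp) for two isolates: three of four bands are shared.
band_similarity = dice_coefficient([300, 450, 800, 1200], [300, 450, 900, 1200])

# Invented intensity profiles: same shape, different overall intensity.
curve_similarity = pearson_similarity([0, 1, 2, 3, 4], [0, 2, 4, 6, 8])

print(band_similarity, curve_similarity)
```

    The band-based score depends on how bands are called and matched, whereas the correlation uses the whole profile, which is one plausible reason curve-based methods would be less sensitive to user-specified parameters.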

  11. Genetic Diversity Analysis of Iranian Jujube Ecotypes (Ziziphus spp.) Using RAPD Molecular Marker

    Directory of Open Access Journals (Sweden)

    S Abbasi

    2012-12-01

    Jujube (Ziziphus jujuba Mill.) is a valuable medicinal plant that is important in Iranian traditional medicine. Although regional plants such as jujube play an important role in our economy, they are neglected in research and technology. Considering the economic and medicinal importance of jujube, the first step in breeding programs is to determine the genetic diversity among individuals. Thirty-four ecotypes of jujube, collected from eight provinces of Iran, were used in this study. The genetic relationships of Iranian jujube ecotypes were analyzed using Random Amplified Polymorphic DNA (RAPD) markers. Six of the 15 random decamer primers applied for RAPD analysis showed informative polymorphism. According to cluster analysis using the UPGMA method, the ecotypes were classified into two major groups at the 0.81 level of genetic similarity. The highest similarity coefficient (0.92) was detected between the Mazandaran and Golestan ecotypes, and the greatest genetic diversity was observed in the ecotypes of Khorasan-Jonoubi. The affinity of the Khorasan-Jonoubi and Esfahan ecotypes indicated a possible common origin for the variation in these areas. The results indicated that RAPD analysis can be used successfully to estimate genetic diversity among Ziziphus ecotypes and can be useful for further investigations.

  12. Conflation of Short Identity-by-Descent Segments Bias Their Inferred Length Distribution

    Directory of Open Access Journals (Sweden)

    Charleston W. K. Chiang

    2016-05-01

    Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to share an IBD segment if that segment is inherited from a recent shared common ancestor without intervening recombination. Segments several cM long can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample, and there are currently efforts to detect shorter segments from sequencing. Here, we study a problem of identifiability: because existing approaches detect IBD based on contiguous segments of identity-by-state, inferred long segments of IBD may arise from the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that significant proportions of inferred segments 1–2 cM long are results of conflations of two or more shorter segments, each at least 0.2 cM or longer, under demographic scenarios typical for modern humans for all programs tested. The impact of such conflation is much smaller for longer (> 2 cM) segments. This biases the inferred IBD segment length distribution, and so can affect downstream inferences that depend on the assumption that each segment of IBD derives from a single common ancestor. As an example, we present and analyze an estimator of the de novo mutation rate using IBD segments, and demonstrate that unmodeled conflation leads to underestimates of the ages of the common ancestors on these segments, and hence a significant overestimate of the mutation rate. Understanding the conflation effect in detail will make its correction in future methods more tractable.
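
    The conflation effect can be illustrated with a toy interval-merging routine. This is a minimal sketch, not any IBD-detection program: it assumes, purely for illustration, that a detector cannot resolve gaps shorter than a hypothetical `max_gap` and therefore reports neighbouring true segments as one. The segment coordinates are invented.

```python
def conflate(segments, max_gap):
    """Merge (start, end) segments, in cM, separated by gaps <= max_gap.

    Mimics a detector that reports contiguous identity-by-state and so
    cannot tell two nearby true IBD segments apart.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the previous reported segment over the unresolved gap.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Invented true segments: two short neighbours and one isolated segment.
true_segments = [(0.0, 0.6), (0.7, 1.4), (5.0, 5.5)]
detected = conflate(true_segments, max_gap=0.2)
lengths = [round(end - start, 2) for start, end in detected]
print(detected, lengths)
```

    Here two true segments of 0.6 and 0.7 cM are reported as a single 1.4 cM segment, so the inferred length distribution gains a spurious entry in the 1–2 cM range, which is the kind of bias the abstract describes.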

  13. Genetic analysis and molecular detection of the corn endosperm mutants induced by space flight

    International Nuclear Information System (INIS)

    Zhang Caibo; Zhou Yuanyuan; Wang Hanyu; Wang Hongwei; Wang Shengqing; Rong Tingzhao; Cao Moju

    2013-01-01

    In this study, two maize inbred lines, 08-641 and 18-599, were carried into space by the recoverable satellite 'Shijian 8'. Mutants with shrunken transparent and shrunken opaque grains were selected as experimental materials, and the soluble sugar content of their kernels was measured by anthrone colorimetry. The soluble sugar content in mutant st1 kernels began to rise 10 days after pollination, peaked at 25 days, and was significantly higher than in the control 08-641, while in mutant sol kernels it began to rise 10 days after pollination, peaked at 20 days, and was significantly higher than in the control 18-599. Genetic analysis and allelism tests showed that the trait in both mutants was controlled by a single recessive gene; the mutant st1 was allelic to su1 and the mutant sol was allelic to sh2. DNA sequence alignment found 2 single-base mutations in exons 2 and 13 of the su1 gene in the mutant st1, and 3 single-base mutations in exons 2, 5 and 16 of the sh2 gene in mutant so1, leading to changes in the amino acid sequences. It is therefore inferred that starch biosynthesis in the mutants may be blocked by these mutations, leading to the increase in soluble sugar content in the kernel. (authors)

  14. Bacterial Population Genetics in a Forensic Context

    Energy Technology Data Exchange (ETDEWEB)

    Velsko, S P

    2009-11-02

    This report addresses the recent Department of Homeland Security (DHS) call for a Phase I study to (1) assess gaps in the forensically relevant knowledge about the population genetics of eight bacterial agents of concern, (2) formulate a technical roadmap to address those gaps, and (3) identify new bioinformatics tools that would be necessary to analyze and interpret population genetic data in a forensic context. The eight organisms that were studied are B. anthracis, Y. pestis, F. tularensis, Brucella spp., E. coli O157/H7, Burkholderia mallei, Burkholderia pseudomallei, and C. botulinum. Our study focused on the use of bacterial population genetics by forensic investigators to test hypotheses about the possible provenance of an agent that was used in a crime or act of terrorism. Just as human population genetics underpins the calculations of match probabilities for human DNA evidence, bacterial population genetics determines the level of support that microbial DNA evidence provides for or against certain well-defined hypotheses about the origins of an infecting strain. Our key findings are: (1) Bacterial population genetics is critical for answering certain types of questions in a probabilistic manner, akin (but not identical) to 'match probabilities' in DNA forensics. (2) A basic theoretical framework for calculating likelihood ratios or posterior probabilities for forensic hypotheses based on microbial genetic comparisons has been formulated. This 'inference-on-networks' framework has deep but simple connections to the population genetics of mtDNA and Y-STRs in human DNA forensics. (3) The 'phylogeographic' approach to identifying microbial sources is not an adequate basis for understanding bacterial population genetics in a forensic context, and has limited utility, even for generating 'leads' with respect to strain origin. (4) A collection of genotyped isolates obtained opportunistically from international locations

  15. Genetic analysis for two Italian siblings with Usher syndrome and schizophrenia.

    Science.gov (United States)

    Domanico, Daniela; Fragiotta, Serena; Trabucco, Paolo; Nebbioso, Marcella; Vingolo, Enzo Maria

    2012-01-01

    Usher syndrome is a group of autosomal recessive genetic disorders characterized by deafness, retinitis pigmentosa, and sometimes vestibular areflexia. The relationship between Usher syndrome and mental disorders, most commonly a "schizophrenia-like" psychosis, is sometimes described in the literature. The etiology of psychiatric expression of Usher syndrome is still unclear. We reported a case of two natural siblings with congenital hypoacusis, retinitis pigmentosa, and psychiatric symptoms. Clinical features and genetic analysis were also reported. We analyzed possible causes to explain the high prevalence of psychiatric manifestations in Usher syndrome: genetic factors, brain damage, and "stress-related" hypothesis.

  16. Genetic Analysis for Two Italian Siblings with Usher Syndrome and Schizophrenia

    Directory of Open Access Journals (Sweden)

    Daniela Domanico

    2012-01-01

    Usher syndrome is a group of autosomal recessive genetic disorders characterized by deafness, retinitis pigmentosa, and sometimes vestibular areflexia. The relationship between Usher syndrome and mental disorders, most commonly a “schizophrenia-like” psychosis, is sometimes described in the literature. The etiology of psychiatric expression of Usher syndrome is still unclear. We reported a case of two natural siblings with congenital hypoacusis, retinitis pigmentosa, and psychiatric symptoms. Clinical features and genetic analysis were also reported. We analyzed possible causes to explain the high prevalence of psychiatric manifestations in Usher syndrome: genetic factors, brain damage, and “stress-related” hypothesis.

  17. Bayesian inferences suggest that Amazon Yunga Natives diverged from Andeans less than 5000 ybp: implications for South American prehistory.

    Science.gov (United States)

    Scliar, Marilia O; Gouveia, Mateus H; Benazzo, Andrea; Ghirotto, Silvia; Fagundes, Nelson J R; Leal, Thiago P; Magalhães, Wagner C S; Pereira, Latife; Rodrigues, Maira R; Soares-Souza, Giordano B; Cabrera, Lilia; Berg, Douglas E; Gilman, Robert H; Bertorelle, Giorgio; Tarazona-Santos, Eduardo

    2014-09-30

    Archaeology reports millenary cultural contacts between Peruvian Coast-Andes and the Amazon Yunga, a rainforest transitional region between Andes and Lower Amazonia. To clarify the relationships between cultural and biological evolution of these populations, in particular between Amazon Yungas and Andeans, we used DNA-sequence data, a model-based Bayesian approach and several statistical validations to infer a set of demographic parameters. We found that the genetic diversity of the Shimaa (an Amazon Yunga population) is a subset of that of Quechuas from Central-Andes. Using the Isolation-with-Migration population genetics model, we inferred that the Shimaa ancestors were a small subgroup that split less than 5300 years ago (after the development of complex societies) from an ancestral Andean population. After the split, the most plausible scenario compatible with our results is that the ancestors of Shimaas moved toward the Peruvian Amazon Yunga and incorporated the culture and language of some of their neighbors, but not a substantial amount of their genes. We validated our results using Approximate Bayesian Computations, posterior predictive tests and the analysis of pseudo-observed datasets. We presented a case study in which model-based Bayesian approaches, combined with necessary statistical validations, shed light into the prehistoric demographic relationship between Andeans and a population from the Amazon Yunga. Our results offer a testable model for the peopling of this large transitional environmental region between the Andes and the Lower Amazonia. However, studies on larger samples and involving more populations of these regions are necessary to confirm if the predominant Andean biological origin of the Shimaas is the rule, and not the exception.

  18. Population and genomic lessons from genetic analysis of two Indian populations.

    Science.gov (United States)

    Juyal, Garima; Mondal, Mayukh; Luisi, Pierre; Laayouni, Hafid; Sood, Ajit; Midha, Vandana; Heutink, Peter; Bertranpetit, Jaume; Thelma, B K; Casals, Ferran

    2014-10-01

    Indian demographic history includes special features such as founder effects, interpopulation segregation, complex social structure with a caste system and elevated frequency of consanguineous marriages. It also presents a higher frequency for some rare mendelian disorders and in the last two decades increased prevalence of some complex disorders. Despite the fact that India represents about one-sixth of the human population, deep genetic studies from this terrain have been scarce. In this study, we analyzed high-density genotyping and whole-exome sequencing data of a North and a South Indian population. Indian populations show higher differentiation levels than those reported between populations of other continents. In this work, we have analyzed its consequences, by specifically assessing the transferability of genetic markers from or to Indian populations. We show that there is limited genetic marker portability from available genetic resources such as HapMap or the 1,000 Genomes Project to Indian populations, which also present an excess of private rare variants. Conversely, tagSNPs show a high level of portability between the two Indian populations, in contrast to the common belief that North and South Indian populations are genetically very different. By estimating kinship from mates and consanguinity in our data from trios, we also describe different patterns of assortative mating and inbreeding in the two populations, in agreement with distinct mating preferences and social structures. In addition, this analysis has allowed us to describe genomic regions under recent adaptive selection, indicating differential adaptive histories for North and South Indian populations. Our findings highlight the importance of considering demography for design and analysis of genetic studies, as well as the need for extending human genetic variation catalogs to new populations and particularly to those with particular demographic histories.

  19. SSR Analysis of Genetic Diversity Among 192 Diploid Potato Cultivars

    Directory of Open Access Journals (Sweden)

    Xiaoyan Song

    2016-05-01

    In potato breeding, it is difficult to improve the traits of interest at the tetraploid level due to tetrasomic inheritance. A promising alternative is diploid breeding. Thus it is necessary to assess the genetic diversity of diploid potato germplasm for efficient exploration and deployment of desirable traits. In this study, we used SSR markers to evaluate the genetic diversity of diploid potato cultivars. To screen polymorphic SSR markers, 55 pairs of SSR primers were employed to amplify 39 cultivars with relatively distant genetic relationships. Among them, 12 SSR markers with high polymorphism located on 12 chromosomes were chosen to evaluate the genetic diversity of 192 diploid potato cultivars. The primers produced 6 to 18 bands with an average of 8.2 bands per primer. In total, 98 bands were amplified from 192 cultivars, and 97 of them were polymorphic. Cluster analysis using UPGMA showed the genetic relationships of all accessions tested: 186 of the 192 accessions could be distinguished by only 12 pairs of SSR primers, and the 192 diploid cultivars were divided into 11 groups, with 83.3% falling in the first group. Clustering results showed relatively low genetic diversity among the 192 diploid cultivars, with closer relationships at the molecular level. The results can provide a molecular basis for diploid potato breeding.
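
    The summary figures in this abstract can be checked with simple arithmetic. The sketch below uses only the counts stated in the abstract (12 primers, 98 bands, 97 polymorphic bands, 186 of 192 accessions distinguished); the variable names are ours.

```python
# Counts taken from the abstract; variable names are illustrative.
n_primers = 12
total_bands = 98
polymorphic_bands = 97
n_accessions = 192
n_distinguished = 186

# Mean number of bands per primer pair.
mean_bands = total_bands / n_primers

# Share of amplified bands that were polymorphic, in percent.
polymorphic_pct = 100 * polymorphic_bands / total_bands

# Share of accessions that could be told apart, in percent.
distinguished_pct = 100 * n_distinguished / n_accessions

print(round(mean_bands, 1), round(polymorphic_pct, 1), round(distinguished_pct, 1))
```

    The mean of about 8.2 bands per primer agrees with the value reported in the abstract; the two percentages are derived here and are not stated in the source.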

  20. Diagnostic and therapeutic implications of genetic heterogeneity in myeloid neoplasms uncovered by comprehensive mutational analysis

    Directory of Open Access Journals (Sweden)

    Sarah M. Choi

    2017-01-01

    While growing use of comprehensive mutational analysis has led to the discovery of innumerable genetic alterations associated with various myeloid neoplasms, the under-recognized phenomenon of genetic heterogeneity within such neoplasms creates a potential for diagnostic confusion. Here, we describe two cases where expanded mutational testing led to amendment of an initial diagnosis of chronic myelogenous leukemia with subsequent altered treatment of each patient. We demonstrate the power of comprehensive testing in ensuring appropriate classification of genetically heterogeneous neoplasms, and emphasize thoughtful analysis of molecular and genetic data as an essential component of diagnosis and management.

  1. Functional networks inference from rule-based machine learning models.

    Science.gov (United States)

    Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume

    2016-01-01

    Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables that are often different from, or complementary to, those found by similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes that genes used together within a rule-based machine learning model to classify the samples might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in the context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. The
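
    The core assumption, that genes appearing together in a classification rule may be functionally related, can be sketched as a co-occurrence count over rules. This is not the FuNeL protocol itself: the rule representation (a list of gene names per rule), the `min_count` threshold, and the gene labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(rules, min_count=1):
    """Build weighted edges between genes that appear in the same rule.

    `rules` is a list of rules, each given as the list of genes it tests.
    An edge weight is the number of rules in which both genes occur;
    edges seen fewer than `min_count` times are dropped.
    """
    edges = Counter()
    for genes in rules:
        # Sort so that each unordered pair has one canonical key.
        for gene_a, gene_b in combinations(sorted(set(genes)), 2):
            edges[(gene_a, gene_b)] += 1
    return {edge: count for edge, count in edges.items() if count >= min_count}


# Invented rule set: each inner list stands for one learned rule.
rules = [["g1", "g2", "g3"], ["g2", "g3"], ["g1", "g4"]]
network = cooccurrence_network(rules, min_count=2)
print(network)
```

    Unlike a co-expression network, an edge here says nothing about whether two genes have similar expression levels, only that a model found them useful together, which is why the two kinds of network can be complementary.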

  2. Evidence of high genetic connectivity for the longnose spurdog Squalus blainville in the Mediterranean Sea

    Directory of Open Access Journals (Sweden)

    V. KOUSTENI

    2015-04-01

    Squalus blainville is one of the least studied Mediterranean shark species. Despite being intensively fished in several locations, biological knowledge is limited and no genetic structure information is available. This is the first study to examine the genetic structure of S. blainville in the Mediterranean Sea. Considering the high dispersal potential inferred for other squalid sharks, the hypothesis of panmixia was tested based on a 585 bp fragment of the mitochondrial DNA cytochrome c oxidase subunit I gene from 107 individuals and six nuclear microsatellite loci from 577 individuals. Samples were collected across the Ionian, Aegean and Libyan Seas and off the Balearic Islands. Twenty three additional sequences of Mediterranean and South African origin were retrieved from GenBank and included in the mitochondrial DNA analysis. The overall haplotype diversity was high, in contrast to the low nucleotide diversity. Low and non-significant pairwise ΦST and FST values along with a Bayesian cluster analysis suggested high connectivity with subsequent genetic homogeneity among the populations studied, and thus a high dispersal potential for S. blainville similar to other squalids. The historical demography of the species was also assessed, revealing a pattern of population expansion since the middle Pleistocene. These findings could be considered in species-specific conservation plans, although sampling over a larger spatial scale and more genetic markers are required to fully elucidate the genetic structure and dispersal potential of S. blainville.

  3. Statistical inference and visualization in scale-space for spatially dependent images

    KAUST Repository

    Vaughan, Amy

    2012-03-01

    SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical scale-space visualization tool that allows for statistical inferences. In this paper we develop a spatial SiZer for finding significant features and conducting goodness-of-fit tests for spatially dependent images. The spatial SiZer utilizes a family of kernel estimates of the image and provides not only exploratory data analysis but also statistical inference with spatial correlation taken into account. It is also capable of comparing the observed image with a specific null model being tested by adjusting the statistical inference using an assumed covariance structure. Pixel locations having statistically significant differences between the image and a given null model are highlighted by arrows. The spatial SiZer is compared with the existing independent SiZer via the analysis of simulated data with and without signal on both planar and spherical domains. We apply the spatial SiZer method to the decadal temperature change over some regions of the Earth. © 2011 The Korean Statistical Society.

  4. Inferring Drosophila gap gene regulatory network: Pattern analysis of simulated gene expression profiles and stability analysis

    NARCIS (Netherlands)

    Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.

    2009-01-01

    Background: Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori

  5. Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations

    Directory of Open Access Journals (Sweden)

    Omberg Larsson

    2012-06-01

    Background: Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes and colonization history of recent centuries. Results: Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method "SupportMix" to infer loci-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is the first admixture inference method that can efficiently scale for simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information. Conclusions: By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally identifies the specific ancestry of the Persian group to populations sampled in Greater Persia rather than from China and the ancestry of the African group to sub-Saharan origin and not Southern African Bantu origin as previously thought.

  6. Inferring species interactions through joint mark–recapture analysis

    Science.gov (United States)

    Yackulic, Charles B.; Korman, Josh; Yard, Michael D.; Dzul, Maria C.

    2018-01-01

    Introduced species are frequently implicated in declines of native species. In many cases, however, evidence linking introduced species to native declines is weak. Failure to make strong inferences regarding the role of introduced species can hamper attempts to predict population viability and delay effective management responses. For many species, mark–recapture analysis is the more rigorous form of demographic analysis. However, to our knowledge, there are no mark–recapture models that allow for joint modeling of interacting species. Here, we introduce a two‐species mark–recapture population model in which the vital rates (and capture probabilities) of one species are allowed to vary in response to the abundance of the other species. We use a simulation study to explore bias and choose an approach to model selection. We then use the model to investigate species interactions between endangered humpback chub (Gila cypha) and introduced rainbow trout (Oncorhynchus mykiss) in the Colorado River between 2009 and 2016. In particular, we test hypotheses about how two environmental factors (turbidity and temperature), intraspecific density dependence, and rainbow trout abundance are related to survival, growth, and capture of juvenile humpback chub. We also project the long‐term effects of different rainbow trout abundances on adult humpback chub abundances. Our simulation study suggests this approach has minimal bias under potentially challenging circumstances (i.e., low capture probabilities) that characterized our application and that model selection using indicator variables could reliably identify the true generating model even when process error was high. When the model was applied to rainbow trout and humpback chub, we identified negative relationships between rainbow trout abundance and the survival, growth, and capture probability of juvenile humpback chub. Effects on interspecific interactions on survival and capture probability were strongly

  7. Optimization methods for logical inference

    CERN Document Server

    Chandru, Vijay

    2011-01-01

    Merging logic and mathematics in deductive inference: an innovative, cutting-edge approach. Optimization methods for logical inference? Absolutely, say Vijay Chandru and John Hooker, two major contributors to this rapidly expanding field. And even though "solving logical inference problems with optimization methods may seem a bit like eating sauerkraut with chopsticks. . . it is the mathematical structure of a problem that determines whether an optimization model can help solve it, not the context in which the problem occurs." Presenting powerful, proven optimization techniques for logic in

  8. Patterns of genetic differentiation at MHC class I genes and microsatellites identify conservation units in the giant panda.

    Science.gov (United States)

    Zhu, Ying; Wan, Qiu-Hong; Yu, Bin; Ge, Yun-Fa; Fang, Sheng-Guo

    2013-10-22

    Evaluating patterns of genetic variation is important to identify conservation units (i.e., evolutionarily significant units [ESUs], management units [MUs], and adaptive units [AUs]) in endangered species. While neutral markers could be used to infer population history, their application in the estimation of adaptive variation is limited. The capacity to adapt to various environments is vital for the long-term survival of endangered species. Hence, analysis of adaptive loci, such as the major histocompatibility complex (MHC) genes, is critical for conservation genetics studies. Here, we investigated 4 classical MHC class I genes (Aime-C, Aime-F, Aime-I, and Aime-L) and 8 microsatellites to infer patterns of genetic variation in the giant panda (Ailuropoda melanoleuca) and to further define conservation units. Overall, we identified 24 haplotypes (9 for Aime-C, 1 for Aime-F, 7 for Aime-I, and 7 for Aime-L) from 218 individuals obtained from 6 populations of giant panda. We found that the Xiaoxiangling population had the highest genetic variation at microsatellites among the 6 giant panda populations and higher genetic variation at Aime-MHC class I genes than other larger populations (Qinling, Qionglai, and Minshan populations). Differentiation index (FST)-based phylogenetic and Bayesian clustering analyses for Aime-MHC-I and microsatellite loci both supported that most populations were highly differentiated. The Qinling population was the most genetically differentiated. The giant panda showed a relatively higher level of genetic diversity at MHC class I genes compared with endangered felids. Using all of the loci, we found that the 6 giant panda populations fell into 2 ESUs: Qinling and non-Qinling populations. We defined 3 MUs based on microsatellites: Qinling, Minshan-Qionglai, and Daxiangling-Xiaoxiangling-Liangshan. We also recommended 3 possible AUs based on MHC loci: Qinling, Minshan-Qionglai, and Daxiangling-Xiaoxiangling-Liangshan. 
Furthermore, we recommend

  9. Genetic diversity analysis of Jatropha curcas L. (Euphorbiaceae) based on methylation-sensitive amplification polymorphism.

    Science.gov (United States)

    Kanchanaketu, T; Sangduen, N; Toojinda, T; Hongtrakul, V

    2012-04-13

    Genetic analysis of 56 samples of Jatropha curcas L. collected from Thailand and other countries was performed using the methylation-sensitive amplification polymorphism (MSAP) technique. Nine primer combinations were used to generate MSAP fingerprints. When the data were interpreted as amplified fragment length polymorphism (AFLP) markers, 471 markers were scored. All 56 samples were classified into three major groups: γ-irradiated, non-toxic and toxic accessions. Genetic similarity among the samples was extremely high, ranging from 0.95 to 1.00, which indicated very low genetic diversity in this species. The MSAP fingerprint was further analyzed for DNA methylation polymorphisms. The results revealed differences in the DNA methylation level among the samples. However, the samples collected from saline areas and some species hybrids showed specific DNA methylation patterns. AFLP data were used, together with methylation-sensitive AFLP (MS-AFLP) data, to construct a phylogenetic tree, resulting in higher efficiency to distinguish the samples. This combined analysis separated samples previously grouped in the AFLP analysis. This analysis also distinguished some hybrids. Principal component analysis was also performed; the results confirmed the separation in the phylogenetic tree. Some polymorphic bands, involving both nucleotide and DNA methylation polymorphism, that differed between toxic and non-toxic samples were identified, cloned and sequenced. BLAST analysis of these fragments revealed differences in DNA methylation in some known genes and nucleotide polymorphism in chloroplast DNA. We conclude that MSAP is a powerful technique for the study of genetic diversity for organisms that have a narrow genetic base.

  10. Reliability of dose volume constraint inference from clinical data

    Science.gov (United States)

    Lutz, C. M.; Møller, D. S.; Hoffmann, L.; Knap, M. M.; Alber, M.

    2017-04-01

    Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an ‘ideal’ cohort was generated where the most predictive model was equal to the postulated model. A bootstrap and a Cohort Replication Monte Carlo (CoRepMC) approach were applied to create 1000 equally sized populations each. The cohorts were then analyzed to establish inference frequency distributions. This was applied to nine scenarios for cohort sizes of 102 (1), 500 (2) to 2000 (3) patients (by sampling with replacement) and three postulated DVHP models. The Bootstrap was repeated for a ‘non-ideal’ cohort, where the most predictive model did not coincide with the postulated model. The Bootstrap produced chaotic results for all models of cohort size 1 for both the ideal and non-ideal cohorts. For cohort size 2 and 3, the distributions for all populations were more concentrated around the postulated DVHP. For the CoRepMC, the inference frequency increased with cohort size and incidence rate. Correct inference rates  >85 % were only achieved by cohorts with more than 500 patients. Both Bootstrap and CoRepMC indicate that inference of the correct or approximate DVHP for typical cohort sizes is highly uncertain. CoRepMC results were less spurious than Bootstrap results, demonstrating the large influence that randomness in dose-response has on the statistical analysis.
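
    The resampling step described above (drawing equally sized cohorts with replacement) can be sketched as follows. This is only an illustration of bootstrap resampling, not the authors' DVHP inference pipeline: the base cohort, the 0/1 outcome coding, and the function names are assumptions, and the logistic-regression model fit is replaced by a simple incidence estimate.

```python
import random
import statistics


def bootstrap_replicates(cohort, size, n_replicates=1000, seed=0):
    """Draw n_replicates cohorts of the given size by sampling with replacement."""
    rng = random.Random(seed)
    return [rng.choices(cohort, k=size) for _ in range(n_replicates)]


def incidence(cohort):
    """Fraction of patients with the complication (outcomes coded 0/1)."""
    return sum(cohort) / len(cohort)


# Hypothetical base cohort: 102 patients, 20 of whom developed the complication.
base_cohort = [1] * 20 + [0] * 82

# Spread of the estimated incidence across replicates for each cohort size.
for size in (102, 500, 2000):
    rates = [incidence(rep) for rep in bootstrap_replicates(base_cohort, size)]
    print(size, round(statistics.stdev(rates), 4))
```

    The spread of the estimate narrows as the resampled cohort grows, which is in line with the abstract's observation that results concentrate around the postulated model only for the larger cohort sizes.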

  11. Inference in 'poor' languages

    Energy Technology Data Exchange (ETDEWEB)

    Petrov, S.

    1996-10-01

    Languages with a solvable implication problem but without complete and consistent systems of inference rules ('poor' languages) are considered. The problem of the existence of a finite, complete and consistent inference rule system for a 'poor' language is stated independently of the language or rule syntax. Several properties of the problem are proved. An application of the results to the language of join dependencies is given.

  12. Population genetics of Leishmania infantum in Israel and the Palestinian Authority through microsatellite analysis.

    Science.gov (United States)

    Amro, Ahmad; Schönian, Gabriele; Al-Sharabati, Mohamed Barakat; Azmi, Kifaya; Nasereddin, Abedelmajeed; Abdeen, Ziad; Schnur, Lionel F; Baneth, Gad; Jaffe, Charles L; Kuhls, Katrin

    2009-04-01

    Multilocus microsatellite typing (MLMT) was used to investigate the genetic variation among 44 Israeli and Palestinian strains of L. infantum isolated from infected dogs and human cases to determine their population structure and to compare them with strains isolated from different European countries. Most of the Israeli and Palestinian strains had their own individual MLMT profiles; a few shared the same profile. A Bayesian model-based approach and phylogenetic reconstructions based on genetic distances inferred two main populations that were significantly different from the European strains: population A, containing 16 strains from places in the West Bank and 11 strains from central Israel; and population B, containing 7 strains from northern Israel, 9 from central Israel, and one Palestinian strain from the Jenin District. Geographically distributed sub-populations were detected within population B. These results demonstrate similar disease dynamics in Israel and the Palestinian Authority. The re-emergence of VL in the case of population A is more likely owing to increased dog and human contact with sylvatic cycles of parasitic infection rather than to recent introduction from the older foci of northern Israel. The latter scenario could be true for population B found in few foci of Central Israel. (c) 2009 Elsevier Masson SAS. All rights reserved.

  13. Genetic diversity and structure analysis in wild and landraces of barley from Jordan by using ISJ markers

    International Nuclear Information System (INIS)

    Baloch, A. W.; Balogh, M. J.; Baloch, M.; Baloch, I. A.

    2016-01-01

    The present experiment was carried out to estimate genetic diversity and genetic structure in cultivated and wild barley populations collected from Jordan, which is considered the primary gene pool of barley. In total, 94 cultivated barley accessions from 4 populations and 52 wild barley accessions from 3 populations were used for genetic analysis with 7 Intron Splice Junction (ISJ) markers. The genetic diversity index (He) of cultivated barley ranged between 0.049 and 0.060, whereas that of wild barley populations ranged between 0.084 and 0.146, suggesting that wild barley harbors greater genetic diversity than its domesticated counterpart and that barley domestication involved a genetic bottleneck. Analysis of molecular variance showed higher genetic variation among than within populations, indicating high genetic differentiation of barley populations caused by genetic and geographical separation of the populations in the harsh growing conditions of the Fertile Crescent. Principal coordinate, clustering and structure analyses separated not only cultivated from wild barley but also each single population, reflecting their genetic basis and original sampling site. The results also revealed limited genetic exchange between cultivated and wild barley under natural environments. The current findings can be exploited for the collection and utilization of plant germplasm. (author)

  14. EI: A Program for Ecological Inference

    Directory of Open Access Journals (Sweden)

    Gary King

    2004-09-01

    The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997). Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available. Ecological inferences are required in political science research when individual-level surveys are unavailable (e.g., local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). They are also required in numerous areas of major significance in public policy (e.g., for applying the Voting Rights Act) and other academic disciplines ranging from epidemiology and marketing to sociology and quantitative history.

  15. Bryozoans are returning home: recolonization of freshwater ecosystems inferred from phylogenetic relationships.

    Science.gov (United States)

    Koletić, Nikola; Novosel, Maja; Rajević, Nives; Franjević, Damjan

    2015-01-01

    Bryozoans are aquatic invertebrates that inhabit all types of aquatic ecosystems. They are small animals that form large colonies by asexual budding. Colonies can reach the size of several tens of centimeters, while individual units within a colony are the size of a few millimeters. Each individual within a colony works as a separate zooid and is genetically identical to each other individual within the same colony. Most freshwater species of bryozoans belong to the Phylactolaemata class, while several species that tolerate brackish water belong to the Gymnolaemata class. Tissue samples for this study were collected in the rivers of Adriatic and Danube basin and in the wetland areas in the continental part of Croatia (Europe). Freshwater and brackish taxons of bryozoans were genetically analyzed for the purpose of creating phylogenetic relationships between freshwater and brackish taxons of the Phylactolaemata and Gymnolaemata classes and determining the role of brackish species in colonizing freshwater and marine ecosystems. Phylogenetic relationships inferred on the genes for 18S rRNA, 28S rRNA, COI, and ITS2 region confirmed Phylactolaemata bryozoans as radix bryozoan group. Phylogenetic analysis proved Phylactolaemata bryozoan's close relations with taxons from Phoronida phylum as well as the separation of the Lophopodidae family from other families within the Plumatellida genus. Comparative analysis of existing knowledge about the phylogeny of bryozoans and the expansion of known evolutionary hypotheses is proposed with the model of settlement of marine and freshwater ecosystems by the bryozoans group during their evolutionary past. In this case study, brackish bryozoan taxons represent a link for this ecological phylogenetic hypothesis. Comparison of brackish bryozoan species Lophopus crystallinus and Conopeum seurati confirmed a dual colonization of freshwater ecosystems throughout evolution of this group of animals.

  16. Analysis of genetic variation and potential applications in genome-scale metabolic modeling

    DEFF Research Database (Denmark)

    Cardoso, Joao; Andersen, Mikael Rørdam; Herrgard, Markus

    2015-01-01

    Genetic variation is the motor of evolution and allows organisms to overcome the environmental challenges they encounter. It can be both beneficial and harmful in the process of engineering cell factories for the production of proteins and chemicals. Throughout the history of biotechnology, there have been efforts to exploit genetic variation in our favor to create strains with favorable phenotypes. Genetic variation can either be present in natural populations or it can be artificially created by mutagenesis and selection or adaptive laboratory evolution. On the other hand, unintended genetic... scale and resolution by re-sequencing thousands of strains systematically. In this article, we review challenges in the integration and analysis of large-scale re-sequencing data, present an extensive overview of bioinformatics methods for predicting the effects of genetic variants on protein function...

  17. Genotyping-By-Sequencing for Plant Genetic Diversity Analysis: A Lab Guide for SNP Genotyping

    Directory of Open Access Journals (Sweden)

    Gregory W. Peterson

    2014-10-01

    Genotyping-by-sequencing (GBS) has recently emerged as a promising genomic approach for exploring plant genetic diversity on a genome-wide scale. However, many uncertainties and challenges remain in the application of GBS, particularly in non-model species. Here, we present a GBS protocol we developed and use for plant genetic diversity analysis. It uses two restriction enzymes to reduce genome complexity, applies Illumina multiplexing indexes for barcoding and has a custom bioinformatics pipeline for genotyping. This genetic diversity-focused GBS (gd-GBS) protocol can serve as an easy-to-follow lab guide to assist a researcher through every step of a GBS application with five main components: sample preparation, library assembly, sequencing, SNP calling and diversity analysis. Specifically, in this presentation, we provide a brief overview of the GBS approach, describe the gd-GBS procedures, illustrate it with an application to analyze genetic diversity in 20 flax (Linum usitatissimum L.) accessions and discuss related issues in GBS application. Following these lab bench procedures and using the custom bioinformatics pipeline, one could generate genome-wide SNP genotype data for a conventional genetic diversity analysis of a non-model plant species.

  18. Systems genetics identifies a convergent gene network for cognition and neurodevelopmental disease.

    Science.gov (United States)

    Johnson, Michael R; Shkura, Kirill; Langley, Sarah R; Delahaye-Duriez, Andree; Srivastava, Prashant; Hill, W David; Rackham, Owen J L; Davies, Gail; Harris, Sarah E; Moreno-Moral, Aida; Rotival, Maxime; Speed, Doug; Petrovski, Slavé; Katz, Anaïs; Hayward, Caroline; Porteous, David J; Smith, Blair H; Padmanabhan, Sandosh; Hocking, Lynne J; Starr, John M; Liewald, David C; Visconti, Alessia; Falchi, Mario; Bottolo, Leonardo; Rossetti, Tiziana; Danis, Bénédicte; Mazzuferi, Manuela; Foerch, Patrik; Grote, Alexander; Helmstaedter, Christoph; Becker, Albert J; Kaminski, Rafal M; Deary, Ian J; Petretto, Enrico

    2016-02-01

    Genetic determinants of cognition are poorly characterized, and their relationship to genes that confer risk for neurodevelopmental disease is unclear. Here we performed a systems-level analysis of genome-wide gene expression data to infer gene-regulatory networks conserved across species and brain regions. Two of these networks, M1 and M3, showed replicable enrichment for common genetic variants underlying healthy human cognitive abilities, including memory. Using exome sequence data from 6,871 trios, we found that M3 genes were also enriched for mutations ascertained from patients with neurodevelopmental disease generally, and intellectual disability and epileptic encephalopathy in particular. M3 consists of 150 genes whose expression is tightly developmentally regulated, but which are collectively poorly annotated for known functional pathways. These results illustrate how systems-level analyses can reveal previously unappreciated relationships between neurodevelopmental disease-associated genes in the developed human brain, and provide empirical support for a convergent gene-regulatory network influencing cognition and neurodevelopmental disease.

  19. Preimplantation genetic diagnosis of X-linked diseases examined by indirect linkage analysis.

    Science.gov (United States)

    Borgulova, I; Putzova, M; Soldatova, I; Krautova, L; Pecnova, L; Mika, J; Kren, R; Potuznikova, P; Stejskal, D

    2015-01-01

    Many centers of assisted reproduction in the Czech Republic offer preimplantation genetic diagnosis with fluorescent in situ hybridization (FISH) to couples requiring preimplantation genetic diagnosis (PGD) of X-linked diseases. However, this process results in discarding all male embryos and is not able to distinguish a carrier from a healthy female embryo in X-linked recessive disorders. The main aim of this study was to summarize a six-year period of PGD of X-linked monogenic diseases using indirect linkage analysis. We wanted to accentuate the advantage of indirect analysis in PGD using multiple displacement amplification (MDA) followed by short tandem repeat (STR) analysis. We present forty-six PGD cycles, including a pre-case haplotyping (PGH) panel, for fifteen X-linked diseases. Embryo transfer was made thirty-eight times and gravidity was confirmed in thirteen female probands, with a success rate of pregnancy calculated at 42 %. The PGD procedure using MDA amplification followed by STR analysis helps to identify genetic defects within embryos prior to implantation. The reliability of the method was also supported by a high pregnancy rate compared to other publications, which commonly achieved a 30-35 % success rate (Tab. 2, Fig. 1, Ref. 33).

  20. Genetic structure in two northern muriqui populations (Brachyteles hypoxanthus, Primates, Atelidae) as inferred from fecal DNA

    Directory of Open Access Journals (Sweden)

    Valéria Fagundes

    2008-01-01

    We assessed the genetic diversity of two northern muriqui (Brachyteles hypoxanthus, Primates, Atelidae) populations, the Feliciano Miguel Abdala population (FMA, n = 108) in the Brazilian state of Minas Gerais (19°44' S, 41°49' W) and the Santa Maria de Jetibá population (SMJ, n = 18) in the Brazilian state of Espírito Santo (20°01' S, 40°44' W). Fecal DNA was isolated and PCR-RFLP analysis used to analyze 2160 bp of mitochondrial DNA, made up of an 820 bp segment of the gene cytochrome c oxidase subunit 2 (cox2, EC 1.9.3.1), an 880 bp segment of the gene cytochrome b (cytb, EC 1.10.2.2) and 460 bp of the hypervariable segment of the mtDNA control region (HVRI). The cox2 and cytb sequences were monomorphic within and between populations whereas the HVRI revealed three different population-exclusive haplotypes, one unique to the SMJ population and two, present at similar frequencies, in the FMA population. Overall haplotype diversity (h = 0.609) and nucleotide diversity (pi = 0.181) were high but reduced within populations. The populations were genetically structured with a high fixation index (F_ST = 0.725), possibly due to historical subdivision. These findings have conservation implications because they seem to indicate that the populations are distinct management units.
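
    Haplotype diversity (h) as reported above is commonly computed with Nei's unbiased estimator, h = n/(n - 1) * (1 - sum of squared haplotype frequencies). The sketch below assumes that estimator (the abstract does not state which one the authors used) and uses made-up haplotype counts, not the study's data, to show why pooled diversity can exceed within-population diversity when haplotypes are population-exclusive.

```python
def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity from a list of haplotype counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical counts (illustration only): population A carries two
# haplotypes at equal frequency, population B carries a single third one.
pop_a = [6, 6]
pop_b = [4]
pooled = pop_a + pop_b

print(round(haplotype_diversity(pop_a), 3))   # within population A
print(round(haplotype_diversity(pop_b), 3))   # within population B (monomorphic)
print(round(haplotype_diversity(pooled), 3))  # both populations pooled
```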

  1. Linking unfounded beliefs to genetic dopamine availability

    Science.gov (United States)

    Schmack, Katharina; Rössler, Hannes; Sekutowicz, Maria; Brandl, Eva J.; Müller, Daniel J.; Petrovic, Predrag; Sterzer, Philipp

    2015-01-01

    Unfounded convictions involving beliefs in the paranormal, grandiosity ideas or suspicious thoughts are endorsed at varying degrees among the general population. Here, we investigated the neurobiopsychological basis of the observed inter-individual variability in the propensity toward unfounded beliefs. One hundred two healthy individuals were genotyped for four polymorphisms in the COMT gene (rs6269, rs4633, rs4818, and rs4680, also known as val158met) that define common functional haplotypes with substantial impact on synaptic dopamine degradation, completed a questionnaire measuring unfounded beliefs, and took part in a behavioral experiment assessing perceptual inference. We found that greater dopamine availability was associated with a stronger propensity toward unfounded beliefs, and that this effect was statistically mediated by an enhanced influence of expectations on perceptual inference. Our results indicate that genetic differences in dopaminergic neurotransmission account for inter-individual differences in perceptual inference linked to the formation and maintenance of unfounded beliefs. Thus, dopamine might be critically involved in the processes underlying one's interpretation of the relationship between the self and the world. PMID:26483654

  2. Linking unfounded beliefs to genetic dopamine availability

    Directory of Open Access Journals (Sweden)

    Katharina Schmack

    2015-09-01

    Unfounded convictions involving beliefs in the paranormal, grandiosity ideas or suspicious thoughts are endorsed at varying degrees among the general population. Here, we investigated the neurobiopsychological basis of the observed inter-individual variability in the propensity towards unfounded beliefs. 109 healthy individuals were genotyped for four polymorphisms in the COMT gene (rs6269, rs4633, rs4818 and rs4680, also known as val158met) that define common functional haplotypes with substantial impact on synaptic dopamine degradation, completed a questionnaire measuring unfounded beliefs, and took part in a behavioural experiment assessing perceptual inference. We found that greater dopamine availability was associated with a stronger propensity towards unfounded beliefs, and that this effect was mediated by an enhanced influence of expectations on perceptual inference. Our results indicate that genetic differences in dopaminergic neurotransmission account for inter-individual differences in perceptual inference linked to the formation and maintenance of unfounded beliefs. Thus, dopamine might be critically involved in the processes underlying one's interpretation of the relationship between the self and the world.

  3. Genetic analysis of human parainfluenza virus type 3 obtained in Croatia, 2011-2015.

    Science.gov (United States)

    Košutić-Gulija, Tanja; Slovic, Anamarija; Ljubin-Sternak, Sunčanica; Mlinarić-Galinović, Gordana; Forčić, Dubravko

    2017-04-01

    This study investigated the HPIV3 circulating strains in Croatia and whether the other parts of the HPIV3 genome (F gene and HN 582 nucleotides fragment) could be equally suitable for genetic and phylogenetic analysis. Clinical materials were collected in the period 2011-2015 from children suffering from respiratory illnesses. In positive HPIV3 samples the viral genome was partially amplified and sequenced for HN and F genes. Obtained sequences were analysed by phylogenetic analysis and genetic characterization was performed. All samples from this study belonged to subcluster C and over a short period of time, genetic lineage C3a gained prevalence over the other C genetic lineages, from 39 % in 2011 to more than 90 % in 2013 and 2014. Phylogenetic classifications of HPIV3 based on the entire HN gene, HN 582 nt fragment and entire fusion (F) gene showed identical classification results for Croatian strains and the reference strains. Molecular analysis of the F and HN glycoproteins showed their similar nucleotide diversity (Fcds P=0.0244 and HNcds P=0.0231) and similar Ka/Ks ratios (F Ka/Ks=0.0553 and HN Ka/Ks=0.0428). Potential N-glycosylation sites, cysteine residues and antigenic sites are generally strongly conserved in HPIV3 glycoproteins from both our and the reference samples. The HPIV3 subcluster C3 (genetic lineage C3a) became the most detected circulating HPIV3 strain in Croatia. The results indicated that the HN 582 nt and the entire F gene sequences were as good for phylogenetic analysis as the entire HN gene sequence.

  4. Population genetics of Enterocytozoon bieneusi in captive giant pandas of China.

    Science.gov (United States)

    Li, Wei; Song, Yuan; Zhong, Zhijun; Huang, Xiangming; Wang, Chengdong; Li, Caiwu; Yang, Haidi; Liu, Haifeng; Ren, Zhihua; Lan, Jingchao; Wu, Kongju; Peng, Guangneng

    2017-10-18

    Most studies on Enterocytozoon bieneusi are conducted based on the internal transcribed spacer (ITS) region of the rRNA gene, whereas some have examined E. bieneusi population structures. Currently, the population genetics of this pathogen in giant panda remains unknown. The objective of this study was to determine the E. bieneusi population in captive giant pandas in China. We examined 69 E. bieneusi-positive specimens from captive giant pandas in China using five loci (ITS, MS1, MS3, MS4 and MS7) to infer E. bieneusi population genetics. For multilocus genotype (MLG) analysis of E. bieneusi-positive isolates, the MS1, MS3, MS4, and MS7 microsatellite and minisatellite loci were amplified and sequenced in 48, 45, 50 and 47 specimens, respectively, generating ten, eight, nine and five types. We successfully amplified 36 specimens and sequenced all five loci, forming 24 MLGs. Multilocus sequence analysis revealed a strong and significant linkage disequilibrium (LD), indicating a clonal population. This result was further supported by measurements of pairwise intergenic LD and a standardized index of association (I^S_A) from allelic profile data. The analysis in STRUCTURE suggested three subpopulations in E. bieneusi, further confirmed using Wright's fixation index (F_ST). Subpopulations 1 and 2 exhibited an epidemic structure, whereas subpopulation 3 had a clonal structure. Our results describe E. bieneusi population genetics in giant pandas for the first time, improving the current understanding of E. bieneusi epidemiology in the studied region. These data also benefit future studies exploring potential transmission risks from pandas to other animals, including humans.

  5. Analysis of genetic diversity in mango ( Mangifera indica L.) using ...

    African Journals Online (AJOL)

    Analysis of genetic diversity in mango ( Mangifera indica L.) using isozymetic polymorphism. ... All the isozymes, used in the present study showed polymorphism for mango. A total of 25 different electrophoretic ...

  6. Smoking and caffeine consumption: a genetic analysis of their association.

    Science.gov (United States)

    Treur, Jorien L; Taylor, Amy E; Ware, Jennifer J; Nivard, Michel G; Neale, Michael C; McMahon, George; Hottenga, Jouke-Jan; Baselmans, Bart M L; Boomsma, Dorret I; Munafò, Marcus R; Vink, Jacqueline M

    2017-07-01

    Smoking and caffeine consumption show a strong positive correlation, but the mechanism underlying this association is unclear. Explanations include shared genetic/environmental factors or causal effects. This study employed three methods to investigate the association between smoking and caffeine. First, bivariate genetic models were applied to data of 10 368 twins from the Netherlands Twin Register in order to estimate genetic and environmental correlations between smoking and caffeine use. Second, from the summary statistics of meta-analyses of genome-wide association studies on smoking and caffeine, the genetic correlation was calculated by LD-score regression. Third, causal effects were tested using Mendelian randomization analysis in 6605 Netherlands Twin Register participants and 5714 women from the Avon Longitudinal Study of Parents and Children. Through twin modelling, a genetic correlation of r = 0.47 and an environmental correlation of r = 0.30 were estimated between current smoking (yes/no) and coffee use (high/low). Between current smoking and total caffeine use, this was r = 0.44 and r = 0.00, respectively. LD-score regression also indicated sizeable genetic correlations between smoking and coffee use (r = 0.44 between smoking heaviness and cups of coffee per day, r = 0.28 between smoking initiation and coffee use and r = 0.25 between smoking persistence and coffee use). Consistent with the relatively high genetic correlations and lower environmental correlations, Mendelian randomization provided no evidence for causal effects of smoking on caffeine or vice versa. Genetic factors thus explain most of the association between smoking and caffeine consumption. These findings suggest that quitting smoking may be more difficult for heavy caffeine consumers, given their genetic susceptibility. © 2016 The Authors. Addiction Biology published by John Wiley & Sons Ltd on behalf of Society for the Study of Addiction.

  7. Bayesian inference for the genetic control of water deficit tolerance in spring wheat by stochastic search variable selection.

    Science.gov (United States)

    Safari, Parviz; Danyali, Syyedeh Fatemeh; Rahimi, Mehdi

    2018-06-02

    Drought is the main abiotic stress seriously influencing wheat production. Information about the inheritance of drought tolerance is necessary to determine the most appropriate strategy to develop tolerant cultivars and populations. In this study, generation means analysis to identify the genetic effects controlling grain yield inheritance in water deficit and normal conditions was considered as a model selection problem in a Bayesian framework. Stochastic search variable selection (SSVS) was applied to identify the most important genetic effects and the best fitted models using different generations obtained from two crosses under two water regimes in two growing seasons. SSVS evaluates the effect of each variable on the dependent variable via posterior variable inclusion probabilities. The model with the highest posterior probability is selected as the best model. In this study, grain yield was controlled by the main effects (additive and non-additive effects) and epistatic effects. The results demonstrate that breeding methods such as recurrent selection followed by the pedigree method, and hybrid production, can be useful to improve grain yield.
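
    The two summaries mentioned above (posterior variable inclusion probabilities and the model with the highest posterior probability) can both be read off the sampled indicator vectors of an SSVS run. The sketch below covers only that post-processing step; the sampler itself is not shown, and the effect names and sampled vectors are invented for illustration.

```python
from collections import Counter


def inclusion_probabilities(samples):
    """Share of posterior samples in which each effect's indicator equals 1."""
    n = len(samples)
    n_terms = len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(n_terms)]


def most_probable_model(samples):
    """Most frequently visited indicator vector and its estimated probability."""
    counts = Counter(tuple(s) for s in samples)
    model, hits = counts.most_common(1)[0]
    return model, hits / len(samples)


# Hypothetical effects and sampled indicator vectors (1 = effect included).
effects = ["additive", "dominance", "additive x additive"]
samples = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]

for name, prob in zip(effects, inclusion_probabilities(samples)):
    print(name, prob)
print(most_probable_model(samples))
```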

  8. Analysis of the genetic diversity of selected East African sweet potato

    African Journals Online (AJOL)

    The genetic relationship of the germplasm was evaluated using Jaccard's coefficient for dissimilarity analysis, an unweighted pair group method with arithmetic means (UPGMA) tree and principal coordinate analysis (PCoA) in DARwin software, while summary statistics were computed using PowerMarker and Popgene ...
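    As a rough illustration of the dissimilarity step mentioned in this record, the sketch below computes pairwise Jaccard dissimilarities from binary (present/absent) marker profiles. It is not taken from the study and does not reproduce DARwin's implementation; the accession names and marker scores are invented purely for the example.

```python
def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity between two 0/1 marker profiles.

    Defined as 1 - (bands present in both) / (bands present in either).
    Two profiles with no bands at all are treated as identical (0.0).
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def dissimilarity_matrix(profiles):
    """Return a dict-of-dicts of pairwise Jaccard dissimilarities."""
    names = list(profiles)
    return {
        p: {q: jaccard_dissimilarity(profiles[p], profiles[q]) for q in names}
        for p in names
    }


# Hypothetical accessions scored at six marker bands (1 = band present).
profiles = {
    "acc_A": [1, 1, 0, 1, 0, 1],
    "acc_B": [1, 0, 0, 1, 0, 1],
    "acc_C": [0, 1, 1, 0, 1, 0],
}

matrix = dissimilarity_matrix(profiles)
for name, row in matrix.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

    A matrix of this kind is the usual input to an average-linkage (UPGMA) clustering or a principal coordinate analysis.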

  9. Genetic analysis of Schizosaccharomyces pombe

    DEFF Research Database (Denmark)

    Ekwall, Karl; Thon, Genevieve

    2017-01-01

    In this introduction we discuss some basic genetic tools and techniques that are used with the fission yeast Schizosaccharomyces pombe. Genes commonly used for selection or as reporters are discussed, with an emphasis on genes that permit counterselection, intragenic complementation, or colony-color assays. S. pombe is most stable as a haploid organism. We describe its mating-type system, how to perform genetic crosses and methods for selecting and propagating diploids. We discuss the relative merits of tetrad dissection and random spore preparation in strain construction and genetic analyses...

  10. The genetic analysis of repeated measures I: Simplex models

    NARCIS (Netherlands)

    Molenaar, P.C.M.; Boomsma, D.I.

    1987-01-01

    Extends the simplex model to a model that may be used for the genetic and environmental analysis of covariance (ANCOVA) structures. This "double" simplex structure can be specified as a linear structural relationships model. It is shown that data that give rise to a simplex correlation structure,

  11. Phylogeny and genetic diversity of Bridgeoporus nobilissimus inferred using mitochondrial and nuclear rDNA sequences

    Science.gov (United States)

    Redberg, G.L.; Hibbett, D.S.; Ammirati, J.F.; Rodriguez, R.J.

    2003-01-01

    The genetic diversity and phylogeny of Bridgeoporus nobilissimus have been analyzed. DNA was extracted from spores collected from individual fruiting bodies representing six geographically distinct populations in Oregon and Washington. Spore samples collected contained low levels of bacteria, yeast and a filamentous fungal species. Using taxon-specific PCR primers, it was possible to discriminate among rDNA from bacteria, yeast, a filamentous associate and B. nobilissimus. Nuclear rDNA internal transcribed spacer (ITS) region sequences of B. nobilissimus were compared among individuals representing six populations and were found to have less than 2% variation. These sequences also were used to design dual and nested PCR primers for B. nobilissimus-specific amplification. Mitochondrial small-subunit rDNA sequences were used in a phylogenetic analysis that placed B. nobilissimus in the hymenochaetoid clade, where it was associated with Oxyporus and Schizopora.

  12. On the criticality of inferred models

    Science.gov (United States)

    Mastromatteo, Iacopo; Marsili, Matteo

    2011-10-01

    Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.

  13. On the criticality of inferred models

    International Nuclear Information System (INIS)

    Mastromatteo, Iacopo; Marsili, Matteo

    2011-01-01

    Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality

  14. Vertically Integrated Seismological Analysis II : Inference

    Science.gov (United States)

    Arora, N. S.; Russell, S.; Sudderth, E.

    2009-12-01

    Methods for automatically associating detected waveform features with hypothesized seismic events, and localizing those events, are a critical component of efforts to verify the Comprehensive Test Ban Treaty (CTBT). As outlined in our companion abstract, we have developed a hierarchical model which views detection, association, and localization as an integrated probabilistic inference problem. In this abstract, we provide more details on the Markov chain Monte Carlo (MCMC) methods used to solve this inference task. MCMC generates samples from a posterior distribution π(x) over possible worlds x by defining a Markov chain whose states are the worlds x, and whose stationary distribution is π(x). In the Metropolis-Hastings (M-H) method, transitions in the Markov chain are constructed in two steps. First, given the current state x, a candidate next state x′ is generated from a proposal distribution q(x′ | x), which may be (more or less) arbitrary. Second, the transition to x′ is not automatic, but occurs with an acceptance probability α(x′ | x) = min(1, [π(x′) q(x | x′)] / [π(x) q(x′ | x)]). The seismic event model outlined in our companion abstract is quite similar to those used in multitarget tracking, for which MCMC has proved very effective. In this model, each world x is defined by a collection of events, a list of properties characterizing those events (times, locations, magnitudes, and types), and the association of each event to a set of observed detections. The target distribution is π(x) = P(x | y), the posterior distribution over worlds x given the observed waveform data y at all stations. Proposal distributions then implement several types of moves between worlds. For example, birth moves create new events; death moves delete existing events; split moves partition the detections for an event into two new events; merge moves combine event pairs; swap moves modify the properties and associations for pairs of events. Importantly, the rules for
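    The two-step Metropolis-Hastings transition described in this abstract can be sketched on a toy problem. The example below is only an illustration and has nothing to do with the authors' seismic model: the target is assumed to be an unnormalised Gamma(3, 1) density on the positive reals, and the proposal is a multiplicative log-normal step, chosen because it is asymmetric and so the proposal ratio in the acceptance probability does not cancel. All function and parameter names are hypothetical.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density of the toy Gamma(shape=3, rate=1) target."""
    return 2.0 * math.log(x) - x if x > 0 else -math.inf


def log_q(x_to, x_from, sigma):
    """Log-density of the log-normal proposal q(x_to | x_from), up to a constant."""
    z = (math.log(x_to) - math.log(x_from)) / sigma
    return -0.5 * z * z - math.log(x_to)


def metropolis_hastings(n_steps, x0=1.0, sigma=0.5, seed=0):
    """Run the chain; return the visited states and the acceptance rate."""
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        # Step 1: draw a candidate from the (asymmetric) proposal.
        x_new = x * math.exp(sigma * rng.gauss(0.0, 1.0))
        # Step 2: accept with probability
        # min(1, target(x_new) q(x | x_new) / (target(x) q(x_new | x))).
        log_alpha = (log_target(x_new) + log_q(x, x_new, sigma)
                     - log_target(x) - log_q(x_new, x, sigma))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = x_new, accepted + 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_steps


samples, rate = metropolis_hastings(20000)
kept = samples[2000:]  # discard an initial burn-in segment
print("acceptance rate:", round(rate, 3))
print("sample mean (the Gamma(3, 1) mean is 3):", round(sum(kept) / len(kept), 3))
```

    Working in log space avoids overflow in the density ratio, and only ratios of the target are needed, so its normalising constant never has to be computed.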

  15. An Inference Language for Imaging

    DEFF Research Database (Denmark)

    Pedemonte, Stefano; Catana, Ciprian; Van Leemput, Koen

    2014-01-01

    We introduce iLang, a language and software framework for probabilistic inference. The iLang framework enables the definition of directed and undirected probabilistic graphical models and the automated synthesis of high performance inference algorithms for imaging applications. The iLang framewor...

  16. Metis: A Pure Metropolis Markov Chain Monte Carlo Bayesian Inference Library

    Energy Technology Data Exchange (ETDEWEB)

    Bates, Cameron Russell [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Mckigney, Edward Allen [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2018-01-09

    The use of Bayesian inference in data analysis has become the standard for large scientific experiments [1, 2]. The Monte Carlo Codes Group (XCP-3) at Los Alamos has developed a simple set of algorithms currently implemented in C++ and Python to easily perform flat-prior Markov Chain Monte Carlo Bayesian inference with pure Metropolis sampling. These implementations are designed to be user friendly and extensible for customization based on specific application requirements. This document describes the algorithmic choices made and presents two use cases.
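    To make the idea of flat-prior inference with pure Metropolis sampling concrete, here is a minimal sketch. It is not the Metis API and does not reflect that library's algorithmic choices; the Gaussian likelihood, the bounds, the data values and every name below are assumptions made for illustration. With a flat prior on a bounded interval and a symmetric proposal, the acceptance ratio reduces to the likelihood ratio inside the bounds, and candidates outside the bounds are always rejected.

```python
import math
import random


def log_likelihood(mu, data, sigma=1.0):
    """Gaussian log-likelihood of the data for mean mu (constants dropped)."""
    return -0.5 * sum((y - mu) ** 2 for y in data) / sigma ** 2


def metropolis_flat_prior(data, lower, upper, n_steps, step=0.5, seed=0):
    """Sample the posterior of mu under a flat prior on [lower, upper]."""
    rng = random.Random(seed)
    mu = 0.5 * (lower + upper)  # start in the middle of the prior range
    log_l = log_likelihood(mu, data)
    chain = []
    for _ in range(n_steps):
        cand = mu + rng.gauss(0.0, step)  # symmetric random-walk proposal
        if lower <= cand <= upper:  # prior density is zero outside the bounds
            cand_log_l = log_likelihood(cand, data)
            # Flat prior and symmetric proposal: only the likelihood ratio remains.
            if rng.random() < math.exp(min(0.0, cand_log_l - log_l)):
                mu, log_l = cand, cand_log_l
        chain.append(mu)
    return chain


data = [1.8, 2.1, 2.4, 1.9, 2.3]  # made-up measurements
chain = metropolis_flat_prior(data, lower=-10.0, upper=10.0, n_steps=20000)
kept = chain[2000:]  # discard burn-in
print("posterior mean estimate:", round(sum(kept) / len(kept), 3))
```

    For this toy likelihood the posterior mean should lie close to the sample mean of the data, which gives a simple sanity check on the sampler.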

  17. Analysis of the genetic diversity of four rabbit genotypes using ...

    African Journals Online (AJOL)

    Dr.Ola

    2013-05-15

    May 15, 2013 ... consumption and low cost, it has been widely utilized in genetics analysis in ... isozyme variation among the selected individuals within each rabbit genotype. ... with different embryo survival (Bolet and Theau-Clement, 1994).

  18. The Distinct Genetics of Carbonaceous and Non-Carbonaceous Meteorites Inferred from Molybdenum Isotopes

    Science.gov (United States)

    Budde, G.; Burkhardt, C.; Kleine, T.

    2017-07-01

    Mo isotope systematics manifest a fundamental dichotomy in the genetic heritage of carbonaceous and non-carbonaceous meteorites. We discuss its implications in light of the most recent literature data and new isotope data for primitive achondrites.

  19. Genetic Counseling, Professional Values, and Habitus: An Analysis of Disability Narratives in Textbooks.

    Science.gov (United States)

    Reed, Amy R

    2016-10-19

    This article analyzes narrative illustrations in genetic counseling textbooks as a way of understanding professional habitus--the dispositions that motivate professional behavior. In particular, this analysis shows that there are significant differences in how the textbooks' expository and narrative portions represent Down syndrome, genetic counseling practice, and patient behaviors. While the narrative portions of the text position the genetic counseling profession as working in service to the values of genetic medicine, the expository portions represent genetic counselors as neutral parties. Ultimately, this article argues that this ambiguity is harmful to the production of a professional habitus that is consistent with espoused professional values concerning respect for persons with disabilities and the promotion of psychosocial counseling.

  20. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    2010-01-01

    Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with Markov chain Monte Carlo (MCMC) techniques. Due to space limitations the focus is on spatial point processes....