Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
Statistically significant relational data mining :
Energy Technology Data Exchange (ETDEWEB)
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
On statistical significance of signal
International Nuclear Information System (INIS)
Zhu Yongsheng
2006-01-01
A definition for the statistical significance of a signal in an experiment is proposed by establishing a correlation between the observed p-value and the normal distribution integral probability, which is suitable for both counting experiment and continuous test statistics. The explicit expressions to calculate the statistical significance for both cases are given. (author)
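A minimal sketch of the correspondence the abstract describes, assuming the usual one-sided Gaussian-tail convention (the function names are mine, not the author's):

```python
from scipy.stats import norm

def significance_from_pvalue(p):
    """Gaussian-equivalent significance S: one-sided tail area beyond S equals p."""
    return norm.isf(p)  # inverse survival function of N(0, 1)

def pvalue_from_significance(s):
    return norm.sf(s)

print(significance_from_pvalue(2.87e-7))  # about 5 sigma
```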
Research Pearls: The Significance of Statistics and Perils of Pooling. Part 2: Predictive Modeling.
Hohmann, Erik; Wetzler, Merrick J; D'Agostino, Ralph B
2017-07-01
The focus of predictive modeling or predictive analytics is to use statistical techniques to predict outcomes and/or the results of an intervention or observation for patients that are conditional on a specific set of measurements taken on the patients prior to the outcomes occurring. Statistical methods to estimate these models include using such techniques as Bayesian methods; data mining methods, such as machine learning; and classical statistical models of regression such as logistic (for binary outcomes), linear (for continuous outcomes), and survival (Cox proportional hazards) for time-to-event outcomes. A Bayesian approach incorporates a prior estimate that the outcome of interest is true, which is made prior to data collection, and then this prior probability is updated to reflect the information provided by the data. In principle, data mining uses specific algorithms to identify patterns in data sets and allows a researcher to make predictions about outcomes. Regression models describe the relations between 2 or more variables where the primary difference among methods concerns the form of the outcome variable, whether it is measured as a binary variable (i.e., success/failure), continuous measure (i.e., pain score at 6 months postop), or time to event (i.e., time to surgical revision). The outcome variable is the variable of interest, and the predictor variable(s) are used to predict outcomes. The predictor variable is also referred to as the independent variable and is assumed to be something the researcher can modify in order to see its impact on the outcome (i.e., using one of several possible surgical approaches). Survival analysis investigates the time until an event occurs. This can be an event such as failure of a medical device or death. It allows the inclusion of censored data, meaning that not all patients need to have the event (i.e., die) prior to the study's completion. Copyright © 2017 Arthroscopy Association of North America. Published by
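The regression families listed above are easy to illustrate. Below is a hypothetical sketch on synthetic patient data using scikit-learn; the predictors (age, bmi) and outcomes are invented for illustration and are not from the article:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n = 200
age = rng.normal(55, 10, n)     # predictors measured before the outcome occurs
bmi = rng.normal(27, 4, n)
X = np.column_stack([age, bmi])

# binary outcome (success/failure) -> logistic regression
p = 1 / (1 + np.exp(-(-10 + 0.15 * age + 0.05 * bmi)))
success = rng.binomial(1, p)
logit = LogisticRegression().fit(X, success)

# continuous outcome (e.g., pain score at 6 months) -> linear regression
pain = 8 - 0.05 * age + rng.normal(0, 1, n)
linear = LinearRegression().fit(X, pain)

print(logit.predict_proba(X[:3])[:, 1], linear.predict(X[:3]))
```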
Statistical Significance for Hierarchical Clustering
Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.
2017-01-01
Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
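The flavor of such a Monte Carlo test can be conveyed with a toy version (this is not the authors' sequential procedure): compare the observed strength of a two-way split against splits of data simulated from a single-cluster Gaussian null.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_index(X):
    """Within-cluster over total sum of squares for a two-way Ward split."""
    labels = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
    within = sum(((X[labels == k] - X[labels == k].mean(0)) ** 2).sum() for k in (1, 2))
    return within / ((X - X.mean(0)) ** 2).sum()  # small => strong split

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(2, 1, (30, 5))])
obs = cluster_index(X)
# null: data drawn from one Gaussian with the pooled mean and scale
null = [cluster_index(rng.normal(X.mean(0), X.std(0), X.shape)) for _ in range(200)]
pval = (1 + sum(s <= obs for s in null)) / (len(null) + 1)
print(obs, pval)
```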
Statistical significance of cis-regulatory modules
Directory of Open Access Journals (Sweden)
Smith Andrew D
2007-01-01
Background: It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome scale scanning. Results: We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to produce data structures that can be used to estimate p-values for binding site matches. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module scanned for has defined organizational parameters, the probability of the module is corrected to account for organizational constraints. The statistical significance of single site matches and the architecture of sites within the module can be combined to provide an overall estimation of statistical significance of cis-regulatory module sites. Conclusion: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.
Directory of Open Access Journals (Sweden)
E. A. Tatokchin
2017-01-01
The development of modern educational technologies, driven by the broad introduction of computer-based testing and the growth of distance education, makes it necessary to revise the methods used to examine students. This work shows the need to move to mathematical criteria for examining knowledge that are free of subjectivity. The article reviews the problems arising in this task and proposes approaches for solving them. The greatest attention is paid to the problem of objectively transforming an expert's rating estimates onto the scale of student assessments. The discussion concludes that the solution to this problem lies in the creation of specialized intelligent systems. The basis for constructing such an intelligent system is a mathematical model of a self-organizing, non-equilibrium dissipative system, which here is a group of students. The article assumes that dissipation is provided by the constant influx of new test items from the expert, and non-equilibrium by the individual psychological characteristics of the students in the group. As a result, the system should self-organize into stable patterns. These patterns make it possible, relying on large amounts of data, to obtain a statistically significant assessment of a student. To justify the proposed approach, the work presents a statistical analysis of test results from a large sample of students (>90). Conclusions from this statistical analysis allowed the development of an intelligent system for statistically significant examination of student performance. It is based on a data clustering algorithm (k-means) over three key parameters. It is shown that this approach makes it possible to track dynamics and to form an objective expert evaluation.
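A rough sketch of the clustering step the article describes, with invented feature names standing in for the three key parameters (not the authors' implementation):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# invented features: fraction correct, mean response time (s), attempts used
students = np.column_stack([
    rng.beta(5, 2, 120),
    rng.gamma(4.0, 15.0, 120),
    rng.integers(1, 4, 120).astype(float),
])
Z = StandardScaler().fit_transform(students)           # put features on one scale
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))  # sizes of the three stable "patterns"
```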
The thresholds for statistical and clinical significance
DEFF Research Database (Denmark)
Jakobsen, Janus Christian; Gluud, Christian; Winkel, Per
2014-01-01
BACKGROUND: Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore, assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid. METHODS: Several methodologies for assessing the statistical and clinical significance of intervention effects in randomised clinical trials were considered. Balancing simplicity ...
The insignificance of statistical significance testing
Johnson, Douglas H.
1999-01-01
Despite their use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.
Assessing statistical significance in causal graphs.
Chindelevitch, Leonid; Loh, Po-Ru; Enayetallah, Ahmed; Berger, Bonnie; Ziemek, Daniel
2012-02-20
Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing. First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely are the classifications to agree to the same extent under the null distribution of the observed classification being randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds.Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology but keeping fixed its basic structure: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. We also highlight theoretical challenges unique to signed causal graphs
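A brute-force Monte Carlo stand-in for the first problem (not the authors' efficient algorithms): permute the observed ternary classification and recompute the ternary dot product to estimate a p-value. All data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
predicted = rng.choice([-1, 0, 1], size=500)            # model: down / neither / up
observed = np.where(rng.random(500) < 0.7, predicted,   # mostly agrees with model
                    rng.choice([-1, 0, 1], size=500))

obs_score = int(predicted @ observed)                   # the ternary dot product
# null: randomize the observed classification, keeping its composition fixed
null = [int(predicted @ rng.permutation(observed)) for _ in range(2000)]
pval = (1 + sum(s >= obs_score for s in null)) / (len(null) + 1)
print(obs_score, pval)
```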
Swiss solar power statistics 2007 - Significant expansion
International Nuclear Information System (INIS)
Hostettler, T.
2008-01-01
This article presents and discusses the 2007 statistics for solar power in Switzerland. A significant number of new installations is noted, as are the high production figures from newer installations. The basics behind the compilation of the Swiss solar power statistics are briefly reviewed and an overview for the period 1989 to 2007 is presented which includes figures on the number of photovoltaic plant in service and installed peak power. Typical production figures in kilowatt-hours (kWh) per installed kilowatt-peak power (kWp) are presented and discussed for installations of various sizes. Increased production after inverter replacement in older installations is noted. Finally, the general political situation in Switzerland as far as solar power is concerned is briefly discussed, as are international developments.
Statistical Model for Content Extraction
DEFF Research Database (Denmark)
Qureshi, Pir Abdul Rasool; Memon, Nasrullah
2011-01-01
We present a statistical model for content extraction from HTML documents. The model operates on Document Object Model (DOM) tree of the corresponding HTML document. It evaluates each tree node and associated statistical features to predict significance of the node towards overall content of the document. ... We also describe the significance of the model in the domain of counterterrorism and open source intelligence.
Multilevel Statistical Models
Goldstein, Harvey
2011-01-01
This book provides a clear introduction to this important area of statistics. The author provides wide coverage of different kinds of multilevel models, and of how to interpret different statistical methodologies and algorithms applied to such models. This 4th edition reflects the growth and interest in this area and is updated to include new chapters on multilevel models with mixed response types, smoothing and multilevel data, models with correlated random effects, and modelling with variance.
Sampling, Probability Models and Statistical Reasoning – Statistical Inference
Indian Academy of Sciences (India)
Sampling, Probability Models and Statistical Reasoning – Statistical Inference. Mohan Delampady, V R Padmawar. General Article, Resonance – Journal of Science Education, Volume 1, Issue 5, May 1996, pp. 49-58.
Diffeomorphic Statistical Deformation Models
DEFF Research Database (Denmark)
Hansen, Michael Sass; Hansen, Mads Fogtmann; Larsen, Rasmus
2007-01-01
In this paper we present a new method for constructing diffeomorphic statistical deformation models in arbitrary dimensional images with a nonlinear generative model and a linear parameter space. Our deformation model is a modified version of the diffeomorphic model introduced by Cootes et al. The modifications ensure that no boundary restriction has to be enforced on the parameter space to prevent folds or tears in the deformation field. For straightforward statistical analysis, principal component analysis and sparse methods, we assume that the parameters for a class of deformations lie on a linear ...
Caveats for using statistical significance tests in research assessments
DEFF Research Database (Denmark)
Schneider, Jesper Wiborg
2013-01-01
This article raises concerns about the advantages of using statistical significance tests in research assessments as has recently been suggested in the debate about proper normalization procedures for citation indicators by Opthof and Leydesdorff (2010). Statistical significance tests are highly controversial ... whether differences are important or not. On the contrary, their use may be harmful. Like many other critics, we generally believe that statistical significance tests are over- and misused in the empirical sciences including scientometrics, and we encourage a reform on these matters.
The Use of Meta-Analytic Statistical Significance Testing
Polanin, Joshua R.; Pigott, Terri D.
2015-01-01
Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…
On detection and assessment of statistical significance of Genomic Islands
Directory of Open Access Journals (Sweden)
Chaudhuri Probal
2008-04-01
Full Text Available Abstract Background Many of the available methods for detecting Genomic Islands (GIs in prokaryotic genomes use markers such as transposons, proximal tRNAs, flanking repeats etc., or they use other supervised techniques requiring training datasets. Most of these methods are primarily based on the biases in GC content or codon and amino acid usage of the islands. However, these methods either do not use any formal statistical test of significance or use statistical tests for which the critical values and the P-values are not adequately justified. We propose a method, which is unsupervised in nature and uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory, and consequently, the resulting P-values are quite reliable for making the decision. Results Our algorithm (named Design-Island, an acronym for Detection of Statistically Significant Genomic Island runs in two phases. Some 'putative GIs' are identified in the first phase, and those are refined into smaller segments containing horizontally acquired genes in the refinement phase. This method is applied to Salmonella typhi CT18 genome leading to the discovery of several new pathogenicity, antibiotic resistance and metabolic islands that were missed by earlier methods. Many of these islands contain mobile genetic elements like phage-mediated genes, transposons, integrase and IS elements confirming their horizontal acquirement. Conclusion The proposed method is based on statistical tests supported by precise distribution theory and reliable P-values along with a technique for visualizing statistically significant islands. The performance of our method is better than many other well known methods in terms of their sensitivity and accuracy, and in terms of specificity, it is comparable to other methods.
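The unsupervised Monte Carlo idea generalizes easily. Here is a toy version based on GC content alone (Design-Island itself uses more features and a two-phase refinement); the genome and window coordinates are synthetic:

```python
import random

def gc(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def window_pvalue(genome, start, length, n_samples=1000, seed=0):
    """Empirical p-value for a window's GC deviation vs. randomly chosen segments."""
    rng = random.Random(seed)
    base = gc(genome)
    obs_dev = abs(gc(genome[start:start + length]) - base)
    hits = 0
    for _ in range(n_samples):
        i = rng.randrange(len(genome) - length)
        if abs(gc(genome[i:i + length]) - base) >= obs_dev:
            hits += 1
    return (hits + 1) / (n_samples + 1)

genome = "".join(random.Random(1).choice("ACGT") for _ in range(50000))
print(window_pvalue(genome, start=1000, length=2000))
```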
Methods of statistical model estimation
Hilbe, Joseph
2013-01-01
Methods of Statistical Model Estimation examines the most important and popular methods used to estimate parameters for statistical models and provide informative model summary statistics. Designed for R users, the book is also ideal for anyone wanting to better understand the algorithms used for statistical model fitting. The text presents algorithms for the estimation of a variety of regression procedures using maximum likelihood estimation, iteratively reweighted least squares regression, the EM algorithm, and MCMC sampling. Fully developed, working R code is constructed for each method.
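As a generic illustration of the first of those methods (not code from the book), maximum likelihood estimation can be written as direct minimization of a negative log-likelihood; the sketch below fits a logistic regression with SciPy on simulated data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(300), rng.normal(size=300)])  # intercept + covariate
beta_true = np.array([-0.5, 1.2])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def negloglik(beta):
    eta = X @ beta
    return np.sum(np.log1p(np.exp(eta)) - y * eta)  # Bernoulli negative log-likelihood

fit = minimize(negloglik, x0=np.zeros(2), method="BFGS")
print(fit.x)  # maximum likelihood estimate, close to beta_true
```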
Exclusion statistics and integrable models
International Nuclear Information System (INIS)
Mashkevich, S.
1998-01-01
The definition of exclusion statistics that was given by Haldane admits a 'statistical interaction' between distinguishable particles (multispecies statistics). For such statistics, thermodynamic quantities can be evaluated exactly; explicit expressions are presented here for cluster coefficients. Furthermore, single-species exclusion statistics is realized in one-dimensional integrable models of the Calogero-Sutherland type. The interesting questions of generalizing this correspondence to the higher-dimensional and the multispecies cases remain essentially open; however, our results provide some hints as to searches for the models in question
Sensometrics: Thurstonian and Statistical Models
DEFF Research Database (Denmark)
Christensen, Rune Haubo Bojesen
This thesis is concerned with the development and bridging of Thurstonian and statistical models for sensory discrimination testing as applied in the scientific discipline of sensometrics. In sensory discrimination testing sensory differences between products are detected and quantified by the use of generalized linear mixed models, cumulative link models and cumulative link mixed models. The relation between the Wald, likelihood and score statistics is expanded upon using the shape of the (profile) likelihood function as common reference.
Statistical modeling for degradation data
Lio, Yuhlong; Ng, Hon; Tsai, Tzong-Ru
2017-01-01
This book focuses on the statistical aspects of the analysis of degradation data. In recent years, degradation data analysis has come to play an increasingly important role in different disciplines such as reliability, public health sciences, and finance. For example, information on products’ reliability can be obtained by analyzing degradation data. In addition, statistical modeling and inference techniques have been developed on the basis of different degradation measures. The book brings together experts engaged in statistical modeling and inference, presenting and discussing important recent advances in degradation data analysis and related applications. The topics covered are timely and have considerable potential to impact both statistics and reliability engineering.
ASSA: Fast identification of statistically significant interactions between long RNAs.
Antonov, Ivan; Marakhonov, Andrey; Zamkova, Maria; Medvedeva, Yulia
2018-02-01
The discovery of thousands of long noncoding RNAs (lncRNAs) in mammals raises a question about their functionality. It has been shown that some of them are involved in post-transcriptional regulation of other RNAs and form inter-molecular duplexes with their targets. Sequence alignment tools have been used for transcriptome-wide prediction of RNA-RNA interactions. However, such approaches have poor prediction accuracy since they ignore RNA's secondary structure. Application of the thermodynamics-based algorithms to long transcripts is not computationally feasible on a large scale. Here, we describe a new computational pipeline ASSA that combines sequence alignment and thermodynamics-based tools for efficient prediction of RNA-RNA interactions between long transcripts. To measure the hybridization strength, the sum energy of all the putative duplexes is computed. The main novelty implemented in ASSA is the ability to quickly estimate the statistical significance of the observed interaction energies. Most of the functional hybridizations between long RNAs were classified as statistically significant. ASSA outperformed 11 other tools in terms of the Area Under the Curve on two out of four test sets. Additionally, our results emphasized a unique property of the [Formula: see text] repeats with respect to the RNA-RNA interactions in the human transcriptome. ASSA is available at https://sourceforge.net/projects/assa/.
Statistical modelling with quantile functions
Gilchrist, Warren
2000-01-01
Galton used quantiles more than a hundred years ago in describing data. Tukey and Parzen used them in the 60s and 70s in describing populations. Since then, the authors of many papers, both theoretical and practical, have used various aspects of quantiles in their work. Until now, however, no one put all the ideas together to form what turns out to be a general approach to statistics. Statistical Modelling with Quantile Functions does just that. It systematically examines the entire process of statistical modelling, starting with using the quantile function to define continuous distributions. The author shows that by using this approach, it becomes possible to develop complex distributional models from simple components. A modelling kit can be developed that applies to the whole model - deterministic and stochastic components - and this kit operates by adding, multiplying, and transforming distributions rather than data. Statistical Modelling with Quantile Functions adds a new dimension to the practice of stati...
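The central idea is easy to demonstrate: because a sum of nondecreasing quantile functions is again a quantile function, simple components can be added to form new models and sampled by inverse transform. A small sketch under that assumption (the component choices are mine, not the book's):

```python
import numpy as np

def q_exponential(p, scale=1.0):
    return -scale * np.log(1.0 - p)          # quantile function of Exp(scale)

def q_logistic(p, loc=0.0, scale=1.0):
    return loc + scale * np.log(p / (1.0 - p))

def q_combined(p):
    # a sum of nondecreasing quantile functions is itself a quantile function
    return q_exponential(p, scale=0.5) + q_logistic(p, scale=0.3)

u = np.random.default_rng(5).uniform(1e-12, 1 - 1e-12, 10000)
sample = q_combined(u)                        # inverse-transform sampling
print(sample.mean(), np.quantile(sample, [0.25, 0.5, 0.75]))
```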
Statistical validation of stochastic models
Energy Technology Data Exchange (ETDEWEB)
Hunter, N.F. [Los Alamos National Lab., NM (United States). Engineering Science and Analysis Div.; Barney, P.; Paez, T.L. [Sandia National Labs., Albuquerque, NM (United States). Experimental Structural Dynamics Dept.; Ferregut, C.; Perez, L. [Univ. of Texas, El Paso, TX (United States). Dept. of Civil Engineering
1996-12-31
It is common practice in structural dynamics to develop mathematical models for system behavior, and the authors are now capable of developing stochastic models, i.e., models whose parameters are random variables. Such models have random characteristics that are meant to simulate the randomness in characteristics of experimentally observed systems. This paper suggests a formal statistical procedure for the validation of mathematical models of stochastic systems when data taken during operation of the stochastic system are available. The statistical characteristics of the experimental system are obtained using the bootstrap, a technique for the statistical analysis of non-Gaussian data. The authors propose a procedure to determine whether or not a mathematical model is an acceptable model of a stochastic system with regard to user-specified measures of system behavior. A numerical example is presented to demonstrate the application of the technique.
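A stripped-down version of the proposed ingredients (synthetic data; the acceptance criterion is an arbitrary placeholder): bootstrap a feature of the measured response, then check whether the model's prediction falls within the resulting interval.

```python
import numpy as np

rng = np.random.default_rng(6)
measured = rng.gumbel(2.0, 0.3, size=40)       # stand-in for measured response peaks

# bootstrap the sampling distribution of the mean peak response
boot_means = np.array([
    rng.choice(measured, size=measured.size, replace=True).mean()
    for _ in range(5000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])  # user-specified acceptance interval

model_prediction = 2.1                           # hypothetical model output
print(f"model accepted: {lo <= model_prediction <= hi} ({lo:.3f}, {hi:.3f})")
```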
A tutorial on hunting statistical significance by chasing N
Directory of Open Access Journals (Sweden)
Denes Szucs
2016-09-01
There is increasing concern about the replicability of studies in psychology and cognitive neuroscience. Hidden data dredging (also called p-hacking) is a major contributor to this crisis because it substantially increases Type I error resulting in a much larger proportion of false positive findings than the usually expected 5%. In order to build better intuition to avoid, detect and criticise some typical problems, here I systematically illustrate the large impact of some easy to implement and so, perhaps frequent data dredging techniques on boosting false positive findings. I illustrate several forms of two special cases of data dredging. First, researchers may violate the data collection stopping rules of null hypothesis significance testing by repeatedly checking for statistical significance with various numbers of participants. Second, researchers may group participants post-hoc along potential but unplanned independent grouping variables. The first approach 'hacks' the number of participants in studies, the second approach 'hacks' the number of variables in the analysis. I demonstrate the high amount of false positive findings generated by these techniques with data from true null distributions. I also illustrate that it is extremely easy to introduce strong bias into data by very mild selection and re-testing. Similar, usually undocumented data dredging steps can easily lead to having 20-50%, or more false positives.
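The first technique is simple to reproduce. The simulation below (my own, mirroring the article's scenario rather than reusing its code) tests repeatedly as participants accrue under a true null and shows the inflated false positive rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def peeking_trial(n_start=10, n_max=60, step=5):
    """One 'study' under a true null, testing after every batch of participants."""
    x = rng.normal(size=n_max)  # population mean is exactly zero
    for n in range(n_start, n_max + 1, step):
        if stats.ttest_1samp(x[:n], 0.0).pvalue < 0.05:
            return True          # declared significant at some interim look
    return False

rate = sum(peeking_trial() for _ in range(2000)) / 2000
print(rate)  # substantially above the nominal 0.05
```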
DEFF Research Database (Denmark)
Engsted, Tom
I comment on the controversy between McCloskey & Ziliak and Hoover & Siegler on statistical versus economic significance, in the March 2008 issue of the Journal of Economic Methodology. I argue that while McCloskey & Ziliak are right in emphasizing 'real error', i.e. non-sampling error that cannot be eliminated through specification testing, they fail to acknowledge those areas in economics, e.g. rational expectations macroeconomics and asset pricing, where researchers clearly distinguish between statistical and economic significance and where statistical testing plays a relatively minor role in model evaluation. ... reliable estimates, and I argue that significance tests are useful tools in those cases where a statistical model serves as input in the quantification of an economic model. Finally, I provide a specific example from economics - asset return predictability - where the distinction between statistical ...
Statistical Models for Social Networks
Snijders, Tom A. B.; Cook, KS; Massey, DS
2011-01-01
Statistical models for social networks as dependent variables must represent the typical network dependencies between tie variables such as reciprocity, homophily, transitivity, etc. This review first treats models for single (cross-sectionally observed) networks and then for network dynamics.
Statistical Model of Extreme Shear
DEFF Research Database (Denmark)
Hansen, Kurt Schaldemose; Larsen, Gunner Chr.
2005-01-01
In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function (PDF) of turbulence driven short-term extreme wind shear events, conditioned on the mean wind speed, for an arbitrary recurrence period. The model is based on an asymptotic expansion, and only a few and easily accessible parameters are needed as input. The model of the extreme PDF is supplemented by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of full-scale measurements recorded with a high sampling rate ...
Statistical Model of Extreme Shear
DEFF Research Database (Denmark)
Larsen, Gunner Chr.; Hansen, Kurt Schaldemose
2004-01-01
In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function (PDF) of turbulence driven short-term extreme wind shear events, conditioned on the mean wind speed, for an arbitrary recurrence period. The model is based on an asymptotic expansion, and only a few and easily accessible parameters are needed as input. The model of the extreme PDF is supplemented by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements ...
A Statistical Programme Assignment Model
DEFF Research Database (Denmark)
Rosholm, Michael; Staghøj, Jonas; Svarer, Michael
When treatment effects of active labour market programmes are heterogeneous in an observable way across the population, the allocation of the unemployed into different programmes becomes a particularly important issue. In this paper, we present a statistical model designed to improve the present...
Textual information access statistical models
Gaussier, Eric
2013-01-01
This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications aiming at facilitating information access: information extraction and retrieval; text classification and clustering; opinion mining; and comprehension aids (automatic summarization, machine translation, visualization). In order to give the reader as complete a description as possible, the focus is placed on the probability models used in the applications.
Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks
2016-04-26
We extend the work of Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex networks. Contributions include: (1) establishing, a priori, the need for triangle motif-based clustering; (2) an algorithm for clustering undirected networks, where the triangle configuration was ... Applications to real networks include the 2012 FBS football schedule network.
Conducting tests for statistically significant differences using forest inventory data
James A. Westfall; Scott A. Pugh; John W. Coulston
2013-01-01
Many forest inventory and monitoring programs are based on a sample of ground plots from which estimates of forest resources are derived. In addition to evaluating metrics such as number of trees or amount of cubic wood volume, it is often desirable to make comparisons between resource attributes. To properly conduct statistical tests for differences, it is imperative...
Uncertainty the soul of modeling, probability & statistics
Briggs, William
2016-01-01
This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science. The ultimate goal is to call into question many standard tenets and lay the philosophical and probabilistic groundwork and infrastructure for statistical modeling. It is the first book devoted to the philosophy of data aimed at working scientists and calls for a new consideration in the practice of probability and statistics to eliminate what has been referred to as the "Cult of Statistical Significance". The book explains the philosophy of these ideas and not the mathematics, though there are a handful of mathematical examples. The topics are logically laid out, starting with basic philosophy as related to probability, statistics, and science, and stepping through the key probabilistic ideas and concepts, and ending with statistical models. Its jargon-free approach asserts that standard methods, such ...
Improved model for statistical alignment
Energy Technology Data Exchange (ETDEWEB)
Miklos, I.; Toroczkai, Z. (Zoltan)
2001-01-01
The statistical approach to molecular sequence evolution involves the stochastic modeling of the substitution, insertion and deletion processes. Substitution has been modeled in a reliable way for more than three decades by using finite Markov-processes. Insertion and deletion, however, seem to be more difficult to model, and the recent approaches cannot acceptably deal with multiple insertions and deletions. A new method based on a generating function approach is introduced to describe the multiple insertion process. The presented algorithm computes the approximate joint probability of two sequences in O(l³) running time, where l is the geometric mean of the sequence lengths.
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
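For reference, the a priori sample size calculation mentioned above is a one-liner with statsmodels; the effect size, alpha, and power below are conventional placeholder choices, not values from the article:

```python
from statsmodels.stats.power import TTestIndPower

# assumptions: two-sided alpha = .05, power = .80, expected Cohen's d = 0.5
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.80, alternative="two-sided")
print(round(n_per_group))  # roughly 64 participants per group
```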
Evaluating clinical significance: incorporating robust statistics with normative comparison tests.
van Wieringen, Katrina; Cribbie, Robert A
2014-05-01
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non-normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann-Yuen test of equivalence, which utilizes trimmed means, with that of the previously recommended Schuirmann and Schuirmann-Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann-Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann-Welch tests, and the power of the Schuirmann-Yuen was substantially greater than that of the Schuirmann or Schuirmann-Welch tests when distributions were skewed or outliers were present. The Schuirmann-Yuen test is recommended for assessing clinical significance with normative comparisons. © 2013 The British Psychological Society.
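A hedged reconstruction of the test's ingredients (not the authors' code): 20% trimmed means with Yuen-type standard errors from winsorized variances, plugged into the two one-sided Schuirmann tests; the equivalence margin below is arbitrary.

```python
import numpy as np
from scipy import stats

def schuirmann_yuen(x, y, margin, trim=0.2):
    """Two one-sided tests for equivalence of trimmed means within +/- margin."""
    def winsorized_part(a):
        n = len(a)
        g = int(trim * n)                 # observations trimmed per tail
        h = n - 2 * g                     # effective sample size
        w = np.asarray(stats.mstats.winsorize(a, limits=trim))
        d = (n - 1) * np.var(w, ddof=1) / (h * (h - 1))  # Yuen's variance term
        return d, h - 1
    d1, df1 = winsorized_part(x)
    d2, df2 = winsorized_part(y)
    diff = stats.trim_mean(x, trim) - stats.trim_mean(y, trim)
    se = np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / df1 + d2 ** 2 / df2)
    p_lower = stats.t.sf((diff + margin) / se, df)   # H0: diff <= -margin
    p_upper = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
    return p_lower, p_upper                          # both small => equivalence

rng = np.random.default_rng(8)
print(schuirmann_yuen(rng.normal(0, 1, 50), rng.normal(0.1, 1, 50), margin=0.5))
```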
EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance
DEFF Research Database (Denmark)
Larsen, Thomas Schou; Krogh, Anders Stærmose
2003-01-01
Background: Contrary to other areas of sequence analysis, a measure of statistical significance of a putative gene has not been devised to help in discriminating real genes from the masses of random Open Reading Frames (ORFs) in prokaryotic genomes. Therefore, many genomes have too many short ORFs annotated as genes. Results: In this paper, we present a new automated gene-finding method, EasyGene, which estimates the statistical significance of a predicted gene. The gene finder is based on a hidden Markov model (HMM) that is automatically estimated for a new genome. Using extensions of similarities in Swiss-Prot, a high quality training set of genes is automatically extracted from the genome and used to estimate the HMM. Putative genes are then scored with the HMM, and based on score and length of an ORF, the statistical significance is calculated. The measure of statistical significance for an ORF is the expected number of ORFs in one megabase of random sequence at the same significance level or better, where the random sequence has the same statistics as the genome in the sense of a third order Markov chain. Conclusions: The result is a flexible gene finder whose overall performance matches or exceeds ...
Statistical significance of trends in monthly heavy precipitation over the US
Mahajan, Salil
2011-05-11
Trends in monthly heavy precipitation, defined by a return period of one year, are assessed for statistical significance in observations and Global Climate Model (GCM) simulations over the contiguous United States using Monte Carlo non-parametric and parametric bootstrapping techniques. The results from the two Monte Carlo approaches are found to be similar to each other, and also to the traditional non-parametric Kendall's τ test, implying the robustness of the approach. Two different observational data-sets are employed to test for trends in monthly heavy precipitation and are found to exhibit consistent results. Both data-sets demonstrate upward trends, one of which is found to be statistically significant at the 95% confidence level. Upward trends similar to observations are observed in some climate model simulations of the twentieth century, but their statistical significance is marginal. For projections of the twenty-first century, a statistically significant upwards trend is observed in most of the climate models analyzed. The change in the simulated precipitation variance appears to be more important in the twenty-first century projections than changes in the mean precipitation. Stochastic fluctuations of the climate system are found to dominate monthly heavy precipitation, as some GCM simulations show a downwards trend even in the twenty-first century projections when the greenhouse gas forcings are strong. © 2011 Springer-Verlag.
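A toy version of such a Monte Carlo significance test (synthetic annual series; a permutation variant, whereas the study compares parametric and non-parametric bootstraps):

```python
import numpy as np

rng = np.random.default_rng(9)
years = np.arange(1950, 2010)
heavy = 50 + 0.08 * (years - years[0]) + rng.gamma(2.0, 3.0, years.size)

slope_obs = np.polyfit(years, heavy, 1)[0]      # observed linear trend
# null distribution: slopes after destroying any temporal structure
null_slopes = np.array([
    np.polyfit(years, rng.permutation(heavy), 1)[0] for _ in range(5000)
])
pval = (1 + np.sum(np.abs(null_slopes) >= abs(slope_obs))) / (null_slopes.size + 1)
print(slope_obs, pval)
```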
Statistical Analysis by Statistical Physics Model for the STOCK Markets
Wang, Tiansong; Wang, Jun; Fan, Bingli
A new stochastic stock price model of stock markets based on the contact process of statistical physics systems is presented in this paper. The contact model is a continuous-time Markov process; one interpretation of this model is as a model for the spread of an infection. Through this model, the statistical properties of the Shanghai Stock Exchange (SSE) and the Shenzhen Stock Exchange (SZSE) are studied. In the present paper, the data of the SSE Composite Index and the data of the SZSE Component Index are analyzed, and the corresponding simulation is made by computer computation. Further, we investigate the statistical properties, fat-tail phenomena, the power-law distributions, and the long memory of returns for these indices. The techniques of skewness-kurtosis test, Kolmogorov-Smirnov test, and R/S analysis are applied to study the fluctuation characters of the stock price returns.
EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance
Directory of Open Access Journals (Sweden)
Larsen Thomas
2003-06-01
Background: Contrary to other areas of sequence analysis, a measure of statistical significance of a putative gene has not been devised to help in discriminating real genes from the masses of random Open Reading Frames (ORFs) in prokaryotic genomes. Therefore, many genomes have too many short ORFs annotated as genes. Results: In this paper, we present a new automated gene-finding method, EasyGene, which estimates the statistical significance of a predicted gene. The gene finder is based on a hidden Markov model (HMM) that is automatically estimated for a new genome. Using extensions of similarities in Swiss-Prot, a high quality training set of genes is automatically extracted from the genome and used to estimate the HMM. Putative genes are then scored with the HMM, and based on score and length of an ORF, the statistical significance is calculated. The measure of statistical significance for an ORF is the expected number of ORFs in one megabase of random sequence at the same significance level or better, where the random sequence has the same statistics as the genome in the sense of a third order Markov chain. Conclusions: The result is a flexible gene finder whose overall performance matches or exceeds other methods. The entire pipeline of computer processing from the raw input of a genome or set of contigs to a list of putative genes with significance is automated, making it easy to apply EasyGene to newly sequenced organisms. EasyGene with pre-trained models can be accessed at http://www.cbs.dtu.dk/services/EasyGene.
Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.
Breunig, Nancy A.
Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…
Statistical modelling of fish stocks
DEFF Research Database (Denmark)
Kvist, Trine
1999-01-01
... for modelling the dynamics of a fish population is suggested. A new approach is introduced to analyse the sources of variation in age composition data, which is one of the most important sources of information in the cohort based models for estimation of stock abundances and mortalities. The approach combines ... and it is argued that an approach utilising stochastic differential equations might be advantageous in fish stock assessments.
Tsilidis, Konstantinos K; Papatheodorou, Stefania I; Evangelou, Evangelos; Ioannidis, John P A
2012-12-19
Numerous biomarkers have been associated with cancer risk. We assessed whether there is evidence for excess statistical significance in results of cancer biomarker studies, suggesting biases. We systematically searched PubMed for meta-analyses of nongenetic biomarkers and cancer risk. The number of observed studies with statistically significant results was compared with the expected number, based on the statistical power of each study under different assumptions for the plausible effect size. We also evaluated small-study effects using asymmetry tests. All statistical tests were two-sided. We included 98 meta-analyses with 847 studies. Forty-three meta-analyses (44%) found nominally statistically significant summary effects (random effects). The proportion of meta-analyses with statistically significant effects was highest for infectious agents (86%), inflammatory (67%), and insulin-like growth factor (IGF)/insulin system (52%) biomarkers. Overall, 269 (32%) individual studies observed nominally statistically significant results. A statistically significant excess of the observed over the expected number of studies with statistically significant results was seen in 20 meta-analyses. An excess of observed vs expected was observed in studies of IGF/insulin (P ≤ .04) and inflammation systems (P ≤ .02). Only 12 meta-analyses (12%) had a statistically significant summary effect size, more than 1000 case patients, and no hints of small-study effects or excess statistical significance; only four of them had large effect sizes, three of which pertained to infectious agents (Helicobacter pylori, hepatitis and human papilloma viruses). Most well-documented biomarkers of cancer risk without evidence of bias pertain to infectious agents. Conversely, an excess of statistically significant findings was observed in studies of IGF/insulin and inflammation systems, suggesting reporting biases.
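The observed-versus-expected comparison works roughly as follows (a sketch in the spirit of the excess-significance test, with invented study standard errors, an assumed plausible effect size, and a Poisson tail approximation standing in for the formal test):

```python
import numpy as np
from scipy import stats

def power_two_sided(se, effect, alpha=0.05):
    """Power of a two-sided z-test given a study's standard error."""
    z = stats.norm.isf(alpha / 2)
    return stats.norm.sf(z - effect / se) + stats.norm.cdf(-z - effect / se)

ses = np.array([0.10, 0.15, 0.20, 0.25, 0.30])  # hypothetical SEs of log odds ratios
observed_significant = 4                         # hypothetical observed count
expected = power_two_sided(ses, effect=np.log(1.3)).sum()
# Poisson tail approximation for P(observed >= 4 | expected)
pval = stats.poisson.sf(observed_significant - 1, expected)
print(f"expected {expected:.2f}, observed {observed_significant}, p = {pval:.3f}")
```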
Statistical lung model for microdosimetry
International Nuclear Information System (INIS)
Fisher, D.R.; Hadley, R.T.
1984-03-01
To calculate the microdosimetry of plutonium in the lung, a mathematical description is needed of lung tissue microstructure that defines source-site parameters. Beagle lungs were expanded using a glutaraldehyde fixative at 30 cm water pressure. Tissue specimens, five microns thick, were stained with hematoxylin and eosin then studied using an image analyzer. Measurements were made along horizontal lines through the magnified tissue image. The distribution of air space and tissue chord lengths and locations of epithelial cell nuclei were recorded from about 10,000 line scans. The distribution parameters constituted a model of lung microstructure for predicting the paths of random alpha particle tracks in the lung and the probability of traversing biologically sensitive sites. This lung model may be used in conjunction with established deposition and retention models for determining the microdosimetry in the pulmonary lung for a wide variety of inhaled radioactive materials
Actuarial statistics with generalized linear mixed models
Antonio, K.; Beirlant, J.
2007-01-01
Over the last decade the use of generalized linear models (GLMs) in actuarial statistics has received a lot of attention, starting from the actuarial illustrations in the standard text by McCullagh and Nelder [McCullagh, P., Nelder, J.A., 1989. Generalized linear models. In: Monographs on Statistics
Statistical Modeling of Bivariate Data.
1982-08-01
to one. Following Crain (1974), one may consider order-m approximators
$$\log f_m(x) = \sum_{k=-m}^{m} \theta_k \phi_k(x) - c(\theta), \quad a \le x \le b, \qquad (4.4.5)$$
and attempt to find ... literature. Consider the approximate model
$$\log f_n(x) = \sum_{k=-m}^{m} \theta_k \phi_k(x) + \sigma G(x), \quad a \le x \le b, \qquad (4.4.8)$$
where G(x) is a Gaussian process and n is a
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos
2015-10-01
To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significant findings. The results of this study were compared with similar studies in dentistry with random-effects meta-analyses. From the 2323 included studies 71% of them reported statistically significant results, with the significant results ranging from 47% to 86%. Multivariable modeling identified that geographical area and involvement of a statistician were predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results were increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to various study designs (difference: 30%, 95% CI: 11-49%). Likewise the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this result did not reach statistical significance (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.
Statistical Models and Methods for Lifetime Data
Lawless, Jerald F
2011-01-01
Praise for the First Edition"An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ."-Choice"This is an important book, which will appeal to statisticians working on survival analysis problems."-Biometrics"A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook."-Statistics in MedicineThe statistical analysis of lifetime or response time data is a key tool in engineering,
Accelerated life models modeling and statistical analysis
Bagdonavicius, Vilijandas
2001-01-01
Failure Time Distributions: Introduction; Parametric Classes of Failure Time Distributions. Accelerated Life Models: Introduction; Generalized Sedyakin's Model; Accelerated Failure Time Model; Proportional Hazards Model; Generalized Proportional Hazards Models; Generalized Additive and Additive-Multiplicative Hazards Models; Changing Shape and Scale Models; Generalizations; Models Including Switch-Up and Cycling Effects; Heredity Hypothesis; Summary. Accelerated Degradation Models: Introduction; Degradation Models; Modeling the Influence of Explanatory Varia...
Statistical tests of simple earthquake cycle models
Devries, Phoebe M. R.; Evans, Eileen
2016-01-01
A central goal of observing and modeling the earthquake cycle is to forecast when a particular fault may generate an earthquake: a fault late in its earthquake cycle may be more likely to generate an earthquake than a fault early in its earthquake cycle. Models that can explain geodetic observations throughout the entire earthquake cycle may be required to gain a more complete understanding of relevant physics and phenomenology. Previous efforts to develop unified earthquake models for strike-slip faults have largely focused on explaining both preseismic and postseismic geodetic observations available across a few faults in California, Turkey, and Tibet. An alternative approach leverages the global distribution of geodetic and geologic slip rate estimates on strike-slip faults worldwide. Here we use the Kolmogorov-Smirnov test for similarity of distributions to infer, in a statistically rigorous manner, viscoelastic earthquake cycle models that are inconsistent with 15 sets of observations across major strike-slip faults. We reject a large subset of two-layer models incorporating Burgers rheologies at a significance level of α = 0.05 (those with long-term Maxwell viscosities η_M ~ 4.6 × 10^20 Pa s) but cannot reject models on the basis of transient Kelvin viscosity η_K. Finally, we examine the implications of these results for the predicted earthquake cycle timing of the 15 faults considered and compare these predictions to the geologic and historical record.
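The distribution test named above is standard; a generic two-sample Kolmogorov-Smirnov comparison looks like this (synthetic numbers, not the study's slip-rate data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
observed_ratios = rng.lognormal(0.0, 0.3, 15)  # e.g., geodetic/geologic slip-rate ratios
model_ratios = rng.lognormal(0.4, 0.3, 15)     # predictions of one candidate rheology

stat, pval = stats.ks_2samp(observed_ratios, model_ratios)
print(f"D = {stat:.3f}, p = {pval:.4f}, reject at 0.05: {pval < 0.05}")
```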
Bayesian models: A statistical primer for ecologists
Hobbs, N. Thompson; Hooten, Mevin B.
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. The book presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians; covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more; deemphasizes computer coding in favor of basic principles; and explains how to write out properly factored statistical expressions representing Bayesian models.
Automated statistical modeling of analytical measurement systems
International Nuclear Information System (INIS)
Jacobson, J.J.
1992-01-01
The statistical modeling of analytical measurement systems at the Idaho Chemical Processing Plant (ICPP) has been completely automated through computer software. The statistical modeling of analytical measurement systems is one part of a complete quality control program used by the Remote Analytical Laboratory (RAL) at the ICPP. The quality control program is an integration of automated data input, measurement system calibration, database management, and statistical process control. The quality control program and statistical modeling program meet the guidelines set forth by the American Society for Testing and Materials and the American National Standards Institute. A statistical model is a set of mathematical equations describing any systematic bias inherent in a measurement system and the precision of a measurement system. A statistical model is developed from data generated from the analysis of control standards. Control standards are samples which are made up at precise known levels by an independent laboratory and submitted to the RAL. The RAL analysts who process control standards do not know the values of those control standards. The object behind statistical modeling is to describe real process samples in terms of their bias and precision and to verify that a measurement system is operating satisfactorily. The processing of control standards provides this ability.
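The bias-and-precision idea generalizes readily; the following is a minimal sketch (not the RAL software) of estimating systematic bias and precision from hypothetical control-standard data:

    import numpy as np

    # Hypothetical control-standard data: known values prepared by an
    # independent laboratory vs. values measured by the instrument.
    known = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
    measured = np.array([1.04, 2.07, 5.02, 10.31, 20.4])

    # Systematic bias modeled as a straight line: measured = a + b * known.
    b, a = np.polyfit(known, measured, 1)

    # Precision estimated from the scatter of residuals about the bias line.
    residuals = measured - (a + b * known)
    precision = residuals.std(ddof=2)  # two parameters were estimated

    print(f"bias model: measured = {a:.3f} + {b:.3f} * known")
    print(f"precision (residual std dev): {precision:.3f}")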
Brouwer, D.; Meijer, R.R.; Zevalkink, D.J.
2013-01-01
Several researchers have emphasized that item response theory (IRT)-based methods should be preferred over classical approaches in measuring change for individual patients. In the present study we discuss and evaluate the use of IRT-based statistics to measure statistically significant individual
Statistical modelling of citation exchange between statistics journals.
Varin, Cristiano; Cattelan, Manuela; Firth, David
2016-01-01
Rankings of scholarly journals based on citation data are often met with scepticism by the scientific community. Part of the scepticism is due to disparity between the common perception of journals' prestige and their ranking based on citation counts. A more serious concern is the inappropriate use of journal rankings to evaluate the scientific influence of researchers. The paper focuses on analysis of the table of cross-citations among a selection of statistics journals. Data are collected from the Web of Science database published by Thomson Reuters. Our results suggest that modelling the exchange of citations between journals is useful to highlight the most prestigious journals, but also that journal citation data are characterized by considerable heterogeneity, which needs to be properly summarized. Inferential conclusions require care to avoid potential overinterpretation of insignificant differences between journal ratings. Comparison with published ratings of institutions from the UK's research assessment exercise shows strong correlation at aggregate level between assessed research quality and journal citation 'export scores' within the discipline of statistics.
Topology for statistical modeling of petascale data.
Energy Technology Data Exchange (ETDEWEB)
Pascucci, Valerio (University of Utah, Salt Lake City, UT); Mascarenhas, Ajith Arthur; Rusek, Korben (Texas A&M University, College Station, TX); Bennett, Janine Camille; Levine, Joshua (University of Utah, Salt Lake City, UT); Pebay, Philippe Pierre; Gyulassy, Attila (University of Utah, Salt Lake City, UT); Thompson, David C.; Rojas, Joseph Maurice (Texas A&M University, College Station, TX)
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
Xu, Kuan-Man
2006-01-01
A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
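A hedged sketch of the proposed procedure, using the Euclidean distance and a pooled bootstrap on synthetic placeholder measurements rather than the cloud-object satellite data:

    import numpy as np

    rng = np.random.default_rng(1)
    bins = np.linspace(0.0, 1.0, 11)

    def summary_histogram(samples):
        # Normalized histogram summed over an ensemble of measurements.
        h, _ = np.histogram(samples, bins=bins)
        return h / h.sum()

    # Hypothetical footprint-level measurements for two size categories.
    cat1 = rng.beta(2.0, 5.0, size=3000)
    cat2 = rng.beta(2.2, 5.0, size=2000)

    observed = np.linalg.norm(summary_histogram(cat1) - summary_histogram(cat2))

    # Bootstrap under the null: both categories drawn from the pooled data.
    pooled = np.concatenate([cat1, cat2])
    n_boot, count = 2000, 0
    for _ in range(n_boot):
        resampled = rng.choice(pooled, size=pooled.size, replace=True)
        b1, b2 = resampled[:cat1.size], resampled[cat1.size:]
        d = np.linalg.norm(summary_histogram(b1) - summary_histogram(b2))
        if d >= observed:
            count += 1
    print(f"observed distance = {observed:.4f}, bootstrap p = {count / n_boot:.4f}")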
Health significance and statistical uncertainty. The value of P-value.
Consonni, Dario; Bertazzi, Pier Alberto
2017-10-27
The P-value is widely used as a summary statistic of scientific results. Unfortunately, there is a widespread tendency to dichotomize its value into "P<0.05" ("statistically significant") and "P>0.05" ("statistically not significant"), with the former implying a "positive" result and the latter a "negative" one. Our aim is to show the unsuitability of such an approach when evaluating the effects of environmental and occupational risk factors. We provide examples of distorted use of the P-value and of the negative consequences for science and public health of such a black-and-white vision. The rigid interpretation of the P-value as a dichotomy favors the confusion between health relevance and statistical significance, discourages thoughtful thinking, and diverts attention from what really matters, the health significance. A much better way to express and communicate scientific results involves reporting effect estimates (e.g., risks, risk ratios or risk differences) and their confidence intervals (CI), which summarize and convey both health significance and statistical uncertainty. Unfortunately, many researchers do not usually consider the whole interval of the CI but only examine whether it includes the null value, thereby degrading this procedure to the same P-value dichotomy (statistically significant or not). In reporting statistical results of scientific research, present effect estimates with their confidence intervals and do not qualify the P-value as "significant" or "not significant".
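As an illustration of the recommended reporting style, here is a short sketch computing a risk ratio with a 95% confidence interval from an invented 2x2 table (log-RR normal approximation):

    import math

    # Hypothetical 2x2 table: exposed/unexposed vs. cases/non-cases.
    a, n1 = 30, 200   # cases and total among exposed
    c, n0 = 18, 220   # cases and total among unexposed

    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)
    lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
    hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

    # Report the effect estimate with its CI, not a significant/not verdict.
    print(f"risk ratio = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")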
Statistical Significance, Power, and Effect Size: A Response to the Reexamination of Reviewer Bias.
Wampold, Bruce E.; And Others
1983-01-01
Examines the influence of statistical significance on reviewers' recommendations for the acceptance or rejection of a manuscript for publication. Discusses the concept of power and the importance of effect size in the evaluation of research findings. (WAS)
Infinite Random Graphs as Statistical Mechanical Models
DEFF Research Database (Denmark)
Durhuus, Bergfinnur Jøgvan; Napolitano, George Maria
2011-01-01
We discuss two examples of infinite random graphs obtained as limits of finite statistical mechanical systems: a model of two-dimensional discretized quantum gravity defined in terms of causal triangulated surfaces, and the Ising model on generic random trees. For the former model we describe...
Review of statistical models for nuclear reactions
International Nuclear Information System (INIS)
Igarasi, Sin-iti
1991-01-01
Statistical model calculations have been widely performed for nuclear data evaluations. These were based on the models of Hauser-Feshbach, Weisskopf-Ewing and their modifications. Since the 1940s, non-compound nuclear phenomena have been observed, stimulating many nuclear physicists to study compound and non-compound nuclear reaction mechanisms. Concerning compound nuclear reactions, they investigated problems on the basis of fundamental properties of the S-matrix, statistical distributions of resonance pole parameters, random matrix elements of the nuclear Hamiltonian, and so forth, and have presented many sophisticated results. But the old statistical models have remained useful, because they are simple and easy to use. In this report, these old and new models are briefly reviewed with a view to their application to nuclear data evaluation, and the applicability of the new models is examined. (author)
Matrix Tricks for Linear Statistical Models
Puntanen, Simo; Styan, George PH
2011-01-01
In teaching linear statistical models to first-year graduate students or to final-year undergraduate students there is no way to proceed smoothly without matrices and related concepts of linear algebra; their use is really essential. Our experience is that making some particular matrix tricks very familiar to students can substantially increase their insight into linear statistical models (and also multivariate statistical analysis). In matrix algebra, there are handy, sometimes even very simple "tricks" which simplify and clarify the treatment of a problem - both for the student and
Daily precipitation statistics in regional climate models
DEFF Research Database (Denmark)
Frei, Christoph; Christensen, Jens Hesselbjerg; Déqué, Michel
2003-01-01
... The 15-year integrations were forced from reanalyses and observed sea surface temperature and sea ice (global model from sea surface only). The observational reference is based on 6400 rain gauge records (10-50 stations per grid box). Evaluation statistics encompass mean precipitation, wet-day frequency, ... for other statistics. In summer, all models underestimate precipitation intensity (by 16-42%) and there is a too low frequency of heavy events. This bias reflects too dry summer mean conditions in three of the models, while it is partly compensated by too many low-intensity events in the other two models...
Statistical Shape Modeling of Cam Femoroacetabular Impingement
Energy Technology Data Exchange (ETDEWEB)
Harris, Michael D.; Dater, Manasi; Whitaker, Ross; Jurrus, Elizabeth R.; Peters, Christopher L.; Anderson, Andrew E.
2013-10-01
In this study, statistical shape modeling (SSM) was used to quantify three-dimensional (3D) variation and morphologic differences between femurs with and without cam femoroacetabular impingement (FAI). 3D surfaces were generated from CT scans of femurs from 41 controls and 30 cam FAI patients. SSM correspondence particles were optimally positioned on each surface using a gradient descent energy function. Mean shapes for control and patient groups were defined from the resulting particle configurations. Morphological differences between group mean shapes and between the control mean and individual patients were calculated. Principal component analysis was used to describe anatomical variation present in both groups. The first 6 modes (or principal components) captured statistically significant shape variations, which comprised 84% of cumulative variation among the femurs. Shape variation was greatest in femoral offset, greater trochanter height, and the head-neck junction. The mean cam femur shape protruded above the control mean by a maximum of 3.3 mm with sustained protrusions of 2.5-3.0 mm along the anterolateral head-neck junction and distally along the anterior neck, corresponding well with reported cam lesion locations and soft-tissue damage. This study provides initial evidence that SSM can describe variations in femoral morphology in both controls and cam FAI patients and may be useful for developing new measurements of pathological anatomy. SSM may also be applied to characterize cam FAI severity and provide templates to guide patient-specific surgical resection of bone.
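A rough sketch of the principal component step on correspondence particles; the shape matrix here is random placeholder data, not the CT-derived femur surfaces:

    import numpy as np

    rng = np.random.default_rng(2)
    # Hypothetical stand-in: 71 surfaces, each described by 1024
    # correspondence particles in 3D, flattened to one row per shape.
    n_shapes, n_particles = 71, 1024
    shapes = rng.normal(size=(n_shapes, n_particles * 3))

    # Principal component analysis of the centered shape matrix.
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    _, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values**2 / (n_shapes - 1)
    cumulative = np.cumsum(variance) / variance.sum()

    # Cumulative shape variation captured by the leading modes.
    print("first 6 modes capture {:.0%} of total variation".format(cumulative[5]))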
Distributions with given marginals and statistical modelling
Fortiana, Josep; Rodriguez-Lallena, José
2002-01-01
This book contains a selection of the papers presented at the meeting `Distributions with given marginals and statistical modelling', held in Barcelona (Spain), July 17-20, 2000. In 24 chapters, this book covers topics such as the theory of copulas and quasi-copulas, the theory and compatibility of distributions, models for survival distributions and other well-known distributions, time series, categorical models, definition and estimation of measures of dependence, monotonicity and stochastic ordering, shape and separability of distributions, hidden truncation models, diagonal families, orthogonal expansions, tests of independence, and goodness of fit assessment. These topics share the use and properties of distributions with given marginals, this being the fourth specialised text on this theme. The innovative aspect of the book is the inclusion of statistical aspects such as modelling, Bayesian statistics, estimation, and tests.
Statistical Modeling for Radiation Hardness Assurance
Ladbury, Raymond L.
2014-01-01
We cover the models and statistics associated with single event effects (and total ionizing dose), why we need them, and how to use them. We discuss which models are used, what errors exist in real test data, and what the models allow us to say about the device under test (DUT). In addition, we cover how to use other sources of data, such as historical, heritage, and similar-part data, and how to apply experience, physics, and expert opinion to the analysis. Also included are concepts of Bayesian statistics, data fitting, and bounding rates.
Performance modeling, loss networks, and statistical multiplexing
Mazumdar, Ravi
2009-01-01
This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I
Simple statistical model for branched aggregates
DEFF Research Database (Denmark)
Lemarchand, Claire; Hansen, Jesper Schmidt
2015-01-01
We propose a statistical model that can reproduce the size distribution of any branched aggregate, including amylopectin, dendrimers, molecular clusters of monoalcohols, and asphaltene nanoaggregates. It is based on the conditional probability for one molecule to form a new bond with a molecule, given that it already has bonds with others. The model is applied here to asphaltene nanoaggregates observed in molecular dynamics simulations of Cooee bitumen. The variation with temperature of the probabilities deduced from this model is discussed in terms of statistical mechanics arguments. The relevance of the statistical model in the case of asphaltene nanoaggregates is checked by comparing the predicted value of the probability for one molecule to have exactly i bonds with the same probability directly measured in the molecular dynamics simulations. The agreement is satisfactory...
Statistical Model Checking for Stochastic Hybrid Systems
DEFF Research Database (Denmark)
David, Alexandre; Du, Dehui; Larsen, Kim Guldstrand
2012-01-01
This paper presents novel extensions and applications of the UPPAAL-SMC model checker. The extensions allow for statistical model checking of stochastic hybrid systems. We show how our race-based stochastic semantics extends to networks of hybrid systems, and indicate the integration technique ap...
Advances in statistical models for data analysis
Minerva, Tommaso; Vichi, Maurizio
2015-01-01
This edited volume focuses on recent research results in classification, multivariate statistics and machine learning and highlights advances in statistical models for data analysis. The volume provides both methodological developments and contributions to a wide range of application areas such as economics, marketing, education, social sciences and environment. The papers in this volume were first presented at the 9th biennial meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in September 2013 at the University of Modena and Reggio Emilia, Italy.
Probabilistic statistical modeling of air pollution from vehicles
Adikanova, Saltanat; Malgazhdarov, Yerzhan A.; Madiyarov, Muratkan N.; Temirbekov, Nurlan M.
2017-09-01
The aim of this work is to create a probabilistic-statistical mathematical model for the distribution of emissions from vehicles. We propose using a probabilistic and statistical approach for modeling the distribution of harmful impurities in the atmosphere from vehicles, with the city of Ust-Kamenogorsk as an example. Using a simplified methodology of stochastic modeling, it is possible to construct efficient numerical computational algorithms that significantly reduce the amount of computation without losing accuracy.
Zhang, Zhang
2012-03-22
Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions. © 2012 Zhang et al; licensee BioMed Central Ltd.
Statistical physics of pairwise probability models
DEFF Research Database (Denmark)
Roudi, Yasser; Aurell, Erik; Hertz, John
2009-01-01
Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying and using pairwise models. We build on our previous work on the subject and study the relation between different methods for fitting these models and evaluating their quality. In particular, using data from simulated cortical networks we study how the quality of various approximate methods for inferring...
Growth curve models and statistical diagnostics
Pan, Jian-Xin
2002-01-01
Growth-curve models are generalized multivariate analysis-of-variance models. These models are especially useful for investigating growth problems over short time periods in economics, biology, medical research, and epidemiology. This book systematically introduces the theory of the GCM with particular emphasis on their multivariate statistical diagnostics, which are based mainly on recent developments made by the authors and their collaborators. The authors provide complete proofs of theorems as well as practical data sets and MATLAB code.
DEFF Research Database (Denmark)
Jakobsen, Janus Christian; Wetterslev, Jorn; Winkel, Per
2014-01-01
BACKGROUND: Thresholds for statistical significance when assessing meta-analysis results are insufficiently demonstrated by traditional 95% confidence intervals and P-values. Assessment of intervention effects in systematic reviews with meta-analysis deserves greater rigour. METHODS: Methodologies for assessing statistical and clinical significance of intervention effects in systematic reviews were considered. Balancing simplicity and comprehensiveness, an operational procedure was developed, based mainly on The Cochrane Collaboration methodology and the Grading of Recommendations... conservative results as the main results. (2) Explore the reasons behind substantial statistical heterogeneity using sensitivity analyses. (3) To take account of problems with multiplicity, adjust the thresholds for significance according to the number of primary outcomes. (4) Calculate required information...
Topology for Statistical Modeling of Petascale Data
Energy Technology Data Exchange (ETDEWEB)
Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Bremer, P. -T. [Univ. of Utah, Salt Lake City, UT (United States)
2013-10-31
Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our 3 groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project; the Texas A&M group develops new algebraic geometry algorithms, in particular with fewnomial theory; and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videoconference every 2 weeks, so there is much synergy of ideas between the groups. The remainder of this document focuses on the contributions that had greater direct involvement from the team at the University of Utah in Salt Lake City.
A statistical model for predicting muscle performance
Byerly, Diane Leslie De Caix
The objective of these studies was to develop a capability for predicting muscle performance and fatigue to be utilized for both space- and ground-based applications. To develop this predictive model, healthy test subjects performed a defined, repetitive dynamic exercise to failure using a Lordex spinal machine. Throughout the exercise, surface electromyography (SEMG) data were collected from the erector spinae using a Mega Electronics ME3000 muscle tester and surface electrodes placed on both sides of the back muscle. These data were analyzed using a 5th order Autoregressive (AR) model and statistical regression analysis. It was determined that an AR derived parameter, the mean average magnitude of AR poles, significantly correlated with the maximum number of repetitions (designated Rmax) that a test subject was able to perform. Using the mean average magnitude of AR poles, a test subject's performance to failure could be predicted as early as the sixth repetition of the exercise. This predictive model has the potential to provide a basis for improving post-space flight recovery, monitoring muscle atrophy in astronauts and assessing the effectiveness of countermeasures, monitoring astronaut performance and fatigue during Extravehicular Activity (EVA) operations, providing pre-flight assessment of the ability of an EVA crewmember to perform a given task, improving the design of training protocols and simulations for strenuous International Space Station assembly EVA, and enabling EVA work task sequences to be planned enhancing astronaut performance and safety. Potential ground-based, medical applications of the predictive model include monitoring muscle deterioration and performance resulting from illness, establishing safety guidelines in the industry for repetitive tasks, monitoring the stages of rehabilitation for muscle-related injuries sustained in sports and accidents, and enhancing athletic performance through improved training protocols while reducing
Directory of Open Access Journals (Sweden)
Melissa Coulson
2010-07-01
Full Text Available A statistically significant result and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST) or confidence intervals (CIs). Authors of articles published in psychology, behavioural neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs, respondents who mentioned NHST were 60% likely to conclude, unjustifiably, that the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, that the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST.
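To see how two similar results can straddle the p = .05 line while their CIs agree, consider this sketch with invented summary numbers (large-sample normal approximation):

    from scipy import stats

    def summary(mean_diff, se):
        z = mean_diff / se  # z-approximation for large samples
        p = 2 * (1 - stats.norm.cdf(abs(z)))
        return p, (mean_diff - 1.96 * se, mean_diff + 1.96 * se)

    # Two fictitious studies with similar effects but different precision.
    for name, diff, se in [("Study A", 4.0, 1.9), ("Study B", 3.6, 2.0)]:
        p, ci = summary(diff, se)
        print(f"{name}: diff = {diff}, p = {p:.3f}, "
              f"95% CI [{ci[0]:.1f}, {ci[1]:.1f}]")
    # The CIs overlap heavily: the two results are consistent, even though
    # one is "significant" (p < .05) and the other is not.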
An R companion to linear statistical models
Hay-Jahans, Christopher
2011-01-01
Focusing on user-developed programming, An R Companion to Linear Statistical Models serves two audiences: those who are familiar with the theory and applications of linear statistical models and wish to learn or enhance their skills in R; and those who are enrolled in an R-based course on regression and analysis of variance. For those who have never used R, the book begins with a self-contained introduction to R that lays the foundation for later chapters.This book includes extensive and carefully explained examples of how to write programs using the R programming language. These examples cove
STATISTICAL MODELS OF REPRESENTING INTELLECTUAL CAPITAL
Directory of Open Access Journals (Sweden)
Andreea Feraru
2016-06-01
Full Text Available This article, entitled Statistical Models of Representing Intellectual Capital, approaches and analyses the concept of intellectual capital, as well as the main models which can support entrepreneurs/managers in evaluating and quantifying the advantages of intellectual capital. Most authors examine intellectual capital from a static perspective and focus on the development of its various evaluation models. In this chapter we surveyed the classical static models: Sveiby, Edvinsson, Balanced Scorecard, as well as the canonical model of intellectual capital. Among the group of static models for evaluating organisational intellectual capital, the canonical model stands out. This model enables the structuring of organisational intellectual capital into human capital, structural capital and relational capital. Although the model is widely spread, it is a static one and can thus create a series of errors in the process of evaluation, because the three entities mentioned above are not independent from the viewpoint of their contents, as any logic of structuring complex entities requires.
Statistical Model Checking for Product Lines
DEFF Research Database (Denmark)
ter Beek, Maurice H.; Legay, Axel; Lluch Lafuente, Alberto
2016-01-01
We report on the suitability of statistical model checking for the analysis of quantitative properties of product line models by an extended treatment of earlier work by the authors. The type of analysis that can be performed includes the likelihood of specific product behaviour, the expected average cost of products (in terms of the attributes of the products' features) and the probability of features to be (un)installed at runtime. The product lines must be modelled in QFLan, which extends the probabilistic feature-oriented language PFLan with novel quantitative constraints among features... behaviour converge in a discrete-time Markov chain semantics, enabling the analysis of quantitative properties. Technically, a Maude implementation of QFLan, integrated with Microsoft's SMT constraint solver Z3, is combined with the distributed statistical model checker MultiVeStA, developed by one...
(AJST) Statistical mechanics model for orientational motion of two-dimensional rigid rotator
African Journals Online (AJOL)
African Journal of Science and Technology (AJST), Science and Engineering Series, Vol. 6, No. 2 (December 2005), pp. 94-101. Statistical Mechanics Model for Orientational Motion of Two-Dimensional Rigid Rotator. Malo, J.O., Department of Physics, University of Nairobi, P.O. Box 30197 ...
Probing NWP model deficiencies by statistical postprocessing
DEFF Research Database (Denmark)
Rosgaard, Martin Haubjerg; Nielsen, Henrik Aalborg; Nielsen, Torben S.
2016-01-01
The objective in this article is twofold. On one hand, a Model Output Statistics (MOS) framework for improved wind speed forecast accuracy is described and evaluated. On the other hand, the approach explored identifies unintuitive explanatory value from a diagnostic variable in an operational numerical weather prediction (NWP) model...
Topology for Statistical Modeling of Petascale Data
Energy Technology Data Exchange (ETDEWEB)
Bennett, Janine Camille [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pebay, Philippe Pierre [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Rojas, Maurice [Texas A&M Univ., College Station, TX (United States)]
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
DEFF Research Database (Denmark)
Jones, Allan; Sommerlund, Bo
2007-01-01
The uses of null hypothesis significance testing (NHST) and statistical power analysis within psychological research are critically discussed. The article looks at the problems of relying solely on NHST when dealing with small and large sample sizes. The use of power-analysis in estimating...
Spinella, Sarah
2011-01-01
Because result replicability is essential to science and difficult to achieve through external replication, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…
Linting, Marielle; van Os, Bart Jan; Meulman, Jacqueline J.
2011-01-01
In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by the use of permutation tests. We compare a new strategy to a strategy used in previous research consisting of permuting the columns (variables) of a data matrix…
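A simplified sketch of one permutation strategy (permuting a single variable to break its association with the others; the paper compares several such strategies):

    import numpy as np

    rng = np.random.default_rng(3)
    # Hypothetical data: 100 observations of 5 variables with some structure.
    X = rng.normal(size=(100, 5))
    X[:, 0] += X[:, 1]  # variables 0 and 1 correlate, driving the first PC

    def first_pc_loading(data, var):
        # Loading of one variable on the first principal component.
        data = (data - data.mean(0)) / data.std(0)
        _, _, vt = np.linalg.svd(data, full_matrices=False)
        return abs(vt[0, var])

    observed = first_pc_loading(X, 0)

    # Null distribution: permute variable 0, breaking its ties to the others.
    n_perm, exceed = 999, 0
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, 0] = rng.permutation(Xp[:, 0])
        if first_pc_loading(Xp, 0) >= observed:
            exceed += 1
    print(f"permutation p-value = {(exceed + 1) / (n_perm + 1):.3f}")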
P-Value, a true test of statistical significance? a cautionary note ...
African Journals Online (AJOL)
While it was not the intention of the founders of significance testing and hypothesis testing to have the two ideas intertwined as if they were complementary, the inconvenient marriage of the two practices into one coherent, convenient, incontrovertible and misinterpreted practice has dotted our standard statistics textbooks and ...
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample-size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
Recent Literature on Whether Statistical Significance Tests Should or Should Not Be Banned.
Deegear, James
This paper summarizes the literature regarding statistical significance testing, with an emphasis on recent literature in various disciplines and literature exploring why researchers have demonstrably failed to be influenced by the American Psychological Association publication manual's encouragement to report effect sizes. Also considered are…
van Tulder, M.W.; Malmivaara, A.; Hayden, J.; Koes, B.
2007-01-01
STUDY DESIGN. Critical appraisal of the literature. OBJECTIVES. The objective of this study was to assess whether results of back pain trials are statistically significant and clinically important. SUMMARY OF BACKGROUND DATA. There seems to be a discrepancy between conclusions reported by authors and
Statistical models for competing risk analysis
International Nuclear Information System (INIS)
Sather, H.N.
1976-08-01
Research results are presented on three new models for potential applications in competing risks problems. One section covers the basic statistical relationships underlying the subsequent competing risks model development. Another discusses the problem of comparing cause-specific risk structure by competing risks theory in two homogeneous populations, P1 and P2. Weibull models, which allow more generality than the Berkson and Elveback models, are studied for the effect of time on the hazard function. The use of concomitant information for modeling single-risk survival is extended to the multiple-failure-mode domain of competing risks. The model used to illustrate the use of this methodology is a life table model which has constant hazards within pre-designated intervals of the time scale. Two parametric models for bivariate dependent competing risks, which provide interesting alternatives, are proposed and examined
Performance modeling, stochastic networks, and statistical multiplexing
Mazumdar, Ravi R
2013-01-01
This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of introducing an appropriate mathematical framework for modeling and analysis as well as understanding the phenomenon of statistical multiplexing. The models, techniques, and results presented form the core of traffic engineering methods used to design, control and allocate resources in communication networks.The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the importan
Statistical physics of pairwise probability models
Directory of Open Access Journals (Sweden)
Yasser Roudi
2009-11-01
Full Text Available Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying and using pairwise models. We build on our previous work on the subject and study the relation between different methods for fitting these models and evaluating their quality. In particular, using data from simulated cortical networks we study how the quality of various approximate methods for inferring the parameters in a pairwise model depends on the time bin chosen for binning the data. We also study the effect of the size of the time bin on the model quality itself, again using simulated data. We show that using finer time bins increases the quality of the pairwise model. We offer new ways of deriving the expressions reported in our previous work for assessing the quality of pairwise models.
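One widely used approximate fitting method from this literature is naive mean-field inversion, sketched below on synthetic binary data; this illustrates the class of approximations such papers evaluate, not the authors' exact code:

    import numpy as np

    rng = np.random.default_rng(4)
    # Hypothetical binarized spike data: samples x units, entries in {-1, +1}.
    s = np.where(rng.random((5000, 8)) < 0.4, 1, -1).astype(float)

    means = s.mean(axis=0)
    # Connected correlation matrix C_ij = <s_i s_j> - <s_i><s_j>.
    C = np.cov(s, rowvar=False)

    # Naive mean-field inversion: couplings from the inverse correlations.
    J = -np.linalg.inv(C)
    np.fill_diagonal(J, 0.0)

    # Matching fields for P(s) ~ exp(sum_i h_i s_i + sum_ij J_ij s_i s_j).
    h = np.arctanh(means) - J @ means
    print("inferred couplings J:\n", np.round(J, 3))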
Statistical models of petrol engines vehicles dynamics
Ilie, C. O.; Marinescu, M.; Alexa, O.; Vilău, R.; Grosu, D.
2017-10-01
This paper focuses on studying statistical models of vehicle dynamics. A one-year testing program was designed and performed, using many cars of the same type with gasoline engines and different mileages. Experimental data were collected from onboard sensors and from sensors on the engine test stand. A database containing data from 64 tests was created. Several mathematical models were developed using the database and the system identification method. Each model is a SISO or MISO linear predictive ARMAX (AutoRegressive-Moving-Average with eXogenous inputs) model, representing a difference equation with constant coefficients. For each dependency, such as engine torque as output with engine load and intake manifold pressure as inputs, 64 equations were built, yielding strings of 64 values for each model coefficient. The final models were obtained using the average values of the coefficients. The accuracy of the models was assessed.
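A minimal sketch of fitting one such ARMAX dependency with statsmodels (SARIMAX with exogenous inputs); the load, pressure and torque series are simulated stand-ins for the test-stand data:

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(5)
    n = 400
    # Hypothetical stand-ins: engine load and intake manifold pressure as
    # inputs, engine torque as output with AR(2) dynamics plus noise.
    load = rng.uniform(0.2, 1.0, n)
    pressure = 0.5 * load + rng.normal(0, 0.05, n)
    torque = np.zeros(n)
    for t in range(2, n):
        torque[t] = (0.5 * torque[t-1] - 0.2 * torque[t-2]
                     + 80 * load[t] + 40 * pressure[t] + rng.normal(0, 1.0))

    exog = np.column_stack([load, pressure])
    # ARMAX(2, 1): AR order 2, MA order 1, exogenous load and pressure.
    model = SARIMAX(torque, exog=exog, order=(2, 0, 1))
    result = model.fit(disp=False)
    print(result.params)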
Equilibrium statistical mechanics of lattice models
Lavis, David A
2015-01-01
Most interesting and difficult problems in equilibrium statistical mechanics concern models which exhibit phase transitions. For graduate students and more experienced researchers this book provides an invaluable reference source of approximate and exact solutions for a comprehensive range of such models. Part I contains background material on classical thermodynamics and statistical mechanics, together with a classification and survey of lattice models. The geometry of phase transitions is described and scaling theory is used to introduce critical exponents and scaling laws. An introduction is given to finite-size scaling, conformal invariance and Schramm-Loewner evolution. Part II contains accounts of classical mean-field methods. The parallels between Landau expansions and catastrophe theory are discussed and Ginzburg-Landau theory is introduced. The extension of mean-field theory to higher-orders is explored using the Kikuchi-Hijmans-De Boer hierarchy of approximations. In Part III the use of alge...
Statistical Models of Adaptive Immune populations
Sethna, Zachary; Callan, Curtis; Walczak, Aleksandra; Mora, Thierry
The availability of large (10^4-10^6 sequences) datasets of B or T cell populations from a single individual allows reliable fitting of complex statistical models for naïve generation, somatic selection, and hypermutation. It is crucial to utilize a probabilistic/informational approach when modeling these populations. The inferred probability distributions allow for population characterization, calculation of probability distributions of various hidden variables (e.g. number of insertions), as well as statistical properties of the distribution itself (e.g. entropy). In particular, the differences between the T cell populations of embryonic and mature mice will be examined as a case study. Comparing these populations, as well as proposed mixed populations, provides a concrete exercise in model creation, comparison, choice, and validation.
Statistical shape and appearance models of bones.
Sarkalkan, Nazli; Weinans, Harrie; Zadpoor, Amir A
2014-03-01
When applied to bones, statistical shape models (SSM) and statistical appearance models (SAM) respectively describe the mean shape and mean density distribution of bones within a certain population as well as the main modes of variations of shape and density distribution from their mean values. The availability of this quantitative information regarding the detailed anatomy of bones provides new opportunities for diagnosis, evaluation, and treatment of skeletal diseases. The potential of SSM and SAM has been recently recognized within the bone research community. For example, these models have been applied for studying the effects of bone shape on the etiology of osteoarthritis, improving the accuracy of clinical osteoporotic fracture prediction techniques, design of orthopedic implants, and surgery planning. This paper reviews the main concepts, methods, and applications of SSM and SAM as applied to bone. Copyright © 2013 Elsevier Inc. All rights reserved.
Cellular automata and statistical mechanical models
International Nuclear Information System (INIS)
Rujan, P.
1987-01-01
The authors elaborate on the analogy between the transfer matrix of usual lattice models and the master equation describing the time development of cellular automata. Transient and stationary properties of probabilistic automata are linked to surface and bulk properties, respectively, of restricted statistical mechanical systems. It is demonstrated that methods of statistical physics can be successfully used to describe the dynamic and the stationary behavior of such automata. Some exact results are derived, including duality transformations, exact mappings, disorder, and linear solutions. Many examples are worked out in detail to demonstrate how to use statistical physics in order to construct cellular automata with desired properties. This approach is considered to be a first step toward the design of fully parallel, probabilistic systems whose computational abilities rely on the cooperative behavior of their components
Statistical modeling of geopressured geothermal reservoirs
Ansari, Esmail; Hughes, Richard; White, Christopher D.
2017-06-01
Identifying attractive candidate reservoirs for producing geothermal energy requires predictive models. In this work, inspectional analysis and statistical modeling are used to create simple predictive models for a line drive design. Inspectional analysis on the partial differential equations governing this design yields a minimum number of fifteen dimensionless groups required to describe the physics of the system. These dimensionless groups are explained and confirmed using models with similar dimensionless groups but different dimensional parameters. This study models dimensionless production temperature and thermal recovery factor as the responses of a numerical model. These responses are obtained by a Box-Behnken experimental design. An uncertainty plot is used to segment the dimensionless time and develop a model for each segment. The important dimensionless numbers for each segment of the dimensionless time are identified using the Boosting method. These selected numbers are used in the regression models. The developed models are reduced to have a minimum number of predictors and interactions. The reduced final models are then presented and assessed using testing runs. Finally, applications of these models are offered. The presented workflow is generic and can be used to translate the output of a numerical simulator into simple predictive models in other research areas involving numerical simulation.
Directory of Open Access Journals (Sweden)
Mashhood Ahmed Sheikh
2017-08-01
Full Text Available The life course perspective, the risky families model, and stress-and-coping models provide the rationale for assessing the role of smoking as a mediator in the association between childhood adversity and anxious and depressive symptomatology (ADS) in adulthood. However, no previous study has assessed the independent mediating role of smoking in the association between childhood adversity and ADS in adulthood. Moreover, the importance of mediator-response confounding variables has rarely been demonstrated empirically in social and psychiatric epidemiology. The aim of this paper was to (i) assess the mediating role of smoking in adulthood in the association between childhood adversity and ADS in adulthood, and (ii) assess the change in estimates due to different mediator-response confounding factors (education, alcohol intake, and social support). The present analysis used data collected from 1994 to 2008 within the framework of the Tromsø Study (N = 4,530), a representative prospective cohort study of men and women. Seven childhood adversities (low mother's education, low father's education, low financial conditions, exposure to passive smoke, psychological abuse, physical abuse, and substance abuse distress) were used to create a childhood adversity score. Smoking status was measured at a mean age of 54.7 years (Tromsø IV), and ADS in adulthood was measured at a mean age of 61.7 years (Tromsø V). Mediation analysis was used to assess the indirect effect and the proportion of mediated effect (%) of childhood adversity on ADS in adulthood via smoking in adulthood. The test-retest reliability of smoking was good (Kappa: 0.67, 95% CI: 0.63; 0.71) in this sample. Childhood adversity was associated with a 10% increased risk of smoking in adulthood (Relative risk: 1.10, 95% CI: 1.03; 1.18), and both childhood adversity and smoking in adulthood were associated with greater levels of ADS in adulthood (p < 0.001). Smoking in adulthood did not significantly
Statistical Modelling of Wind Profiles - Data Analysis and Modelling
DEFF Research Database (Denmark)
Jónsson, Tryggvi; Pinson, Pierre
The aim of the analysis presented in this document is to investigate whether statistical models can be used to make very short-term predictions of wind profiles.
Logarithmic transformed statistical models in calibration
International Nuclear Information System (INIS)
Zeis, C.D.
1975-01-01
A general type of statistical model is described, used for calibrating instruments that have the property that the standard deviations of the observed values increase as a function of the mean value. The application, to the Helix Counter at the Rocky Flats Plant, is treated primarily from a theoretical point of view. The Helix Counter measures the amount of plutonium in certain types of chemicals. The method described can also be used for other calibrations. (U.S.)
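A hedged sketch of the general idea: when the scatter grows with the mean, fitting the calibration on the log scale stabilizes the variance (illustrative data, not the Helix Counter model):

    import numpy as np

    # Hypothetical calibration standards: known amounts and instrument
    # responses whose scatter grows with the mean response.
    known = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
    response = np.array([0.9, 2.1, 3.8, 8.4, 15.2, 33.1])

    # Log transform stabilizes the variance, so ordinary least squares applies.
    slope, intercept = np.polyfit(np.log(known), np.log(response), 1)

    def predict_amount(new_response):
        # Invert the fitted log-log calibration line.
        return np.exp((np.log(new_response) - intercept) / slope)

    print(f"log-scale fit: log(response) = {intercept:.3f} "
          f"+ {slope:.3f} * log(amount)")
    print(f"estimated amount for response 10.0: {predict_amount(10.0):.2f}")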
Statistical model for high energy inclusive processes
International Nuclear Information System (INIS)
Pomorisac, B.
1980-01-01
We propose a statistical model of inclusive processes. The model is an extension of the model proposed by Scalapino and Sugar for inclusive distributions in rapidity. It is defined in terms of a random variable on the full phase space of the produced particles and a Lorentz-invariant probability distribution. We suggest that the Lorentz invariance is broken spontaneously; this may describe the observed anisotropy of the inclusive distributions. Based on this model we calculate the distribution in transverse momentum. An explicit calculation is given of the one-particle inclusive cross sections and the two-particle correlation. The results give a fair representation of the shape of one-particle inclusive cross sections, and positive correlation for the particles emitted. The relevance of our results to experiments is discussed
Farrell, Mary Beth
2018-02-02
This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance and variance are discussed. Throughout the article, actual statistics are pulled from published literature. Part two begins by explaining two methods for quantifying interpretive accuracy: inter-reader and intra-reader reliability. Agreement among readers can simply be expressed by percentage. However, Cohen's kappa is a more robust measure of agreement that accounts for chance. The higher the kappa score, the more agreement between readers. When three or more readers are being compared, Fleiss' kappa is used. Significance testing determines whether the difference between two conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a P-value. Calculation of the P-value is beyond the scope of this review. However, knowing how to interpret P-values is important for understanding scientific literature. Generally, a P-value less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation, confidence intervals and standard error explain the dispersion of data around a mean of a sample drawn from a population. Standard deviation is commonly reported in the literature. A small standard deviation indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within one standard deviation, 95% will fall between two standard deviations, and 99.7% of the data will fall within three standard deviations. Confidence intervals define the range of possible values within which the population parameter is likely to lie and give an idea of
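A small sketch of the inter-reader agreement statistics described above, with hypothetical readings; percent agreement and Cohen's kappa via scikit-learn:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical scan interpretations by two readers (1 = abnormal, 0 = normal).
    reader_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    reader_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

    # Simple percent agreement ignores agreement expected by chance alone.
    agreement = sum(a == b for a, b in zip(reader_a, reader_b)) / len(reader_a)

    # Cohen's kappa corrects for chance agreement.
    kappa = cohen_kappa_score(reader_a, reader_b)
    print(f"percent agreement = {agreement:.0%}, Cohen's kappa = {kappa:.2f}")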
Directory of Open Access Journals (Sweden)
Dagmar Sigmundová
2012-01-01
Full Text Available BACKGROUND: An adequate statistical analysis of obtained data is a necessary condition for the right interpretation of results, followed by correct formulation of conclusions. The use of appropriate statistical procedures is repeatedly a subject of justified criticism and recommendations in our and other international publications. For example, during the interpretation of results we often "blindly" rely on statistical significance, and the practical significance of the research is ignored. OBJECTIVE: The main aim of this study is to highlight the formulation of the practical significance of results and its correct interpretation. We used an analysis of international studies and our own findings from physical activity monitoring. Another aim is to introduce an applicable effect size coefficient as a guide for assessing practical significance. METHODS: Materials for formulating rules to assess practical significance and for introducing effect size coefficients comprised 29 international and Czech studies about the level of weekly and short-term (PE lesson, training, exercise lesson) physical activity of Czech children and adolescents (1,129 girls and 938 boys) and adults (5,727 females and 5,426 males), with use of accelerometers, pedometers and IPAQ questionnaires during 2000-2010. RESULTS: Rules for assessing practical significance consider measurement error, data variability and the size of position from the beginning of the measurement scale. The formulation of practical significance should include (a) a determination of the minimal values of a certain measurement unit that will be limiting for assessing the significance of the difference; (b) the determination of a minimal size of mutual relationship between the expected results and findings. The presentation of the "effect size" coefficients (d, r, r², K², ω²) comprises their definition, the conditions for use, as well as the calculation and interpretation of
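A brief sketch of computing one of the listed effect size coefficients (Cohen's d) next to a p-value, on simulated step-count data, showing how practical and statistical significance can diverge:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    # Hypothetical daily step counts for two groups of adolescents.
    girls = rng.normal(9500, 2500, size=400)
    boys = rng.normal(10100, 2600, size=380)

    t, p = stats.ttest_ind(girls, boys)

    # Cohen's d: mean difference scaled by the pooled standard deviation.
    n1, n2 = girls.size, boys.size
    pooled_sd = np.sqrt(((n1 - 1) * girls.var(ddof=1)
                         + (n2 - 1) * boys.var(ddof=1)) / (n1 + n2 - 2))
    d = (boys.mean() - girls.mean()) / pooled_sd

    # With large samples, p may be tiny even when d signals a small effect.
    print(f"p = {p:.4f}, Cohen's d = {d:.2f}")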
Applications of spatial statistical network models to stream data
Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal
2014-01-01
Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.
Statistical Modelling of the Soil Dielectric Constant
Usowicz, Boguslaw; Marczewski, Wojciech; Usowicz, Jerzy Bogdan; Lipiec, Jerzy
2010-05-01
The dielectric constant of soil is a physical property that is very sensitive to water content. It underlies several electrical techniques for determining water content by direct means (TDR, FDR, and others based on electrical conductance and/or capacitance effects) and by indirect RS (Remote Sensing) methods. This work is devoted to a statistical approach to modelling the dielectric constant as a property that accounts for a wide range of soil compositions, porosities and mass densities over the unsaturated water content range. Usually, such models are determined for a few particular soil types, and when the soil type changes one has to switch the model to another type or adjust it by reparametrizing the soil compounds; it is therefore difficult to compare and transfer results between models. The presented model was developed for a generic representation of soil as a hypothetical mixture of spheres, each representing a soil fraction in its proper phase state. The model generates a serial-parallel mesh of conductive and capacitive paths, which is analysed for its total conductive or capacitive property. The model was first developed to determine the thermal conductivity and is now extended to the dielectric constant by analysing the capacitive mesh. The analysis is carried out by statistical means obeying physical laws related to the serial-parallel branching of the representative electrical mesh. The physical relevance of the analysis is established electrically, but the definition of the electrical mesh is controlled statistically by the parametrization of compound fractions, by determining the number of representative spheres per unit volume per fraction, and by determining the number of fractions. In that way the model can cover the properties of nearly all possible soil types and all phase states, within recognition of the Lorenz and Knudsen conditions. In effect the model allows generating a hypothetical representative of…
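For intuition, the serial and parallel paths correspond to the classical harmonic and arithmetic mixing bounds for a composite's permittivity. A toy sketch (a crude stand-in for the statistical mesh model, with invented fractions):

```python
import numpy as np

def series_parallel_eps(fractions, eps, weight=0.5):
    """Toy mixing rule: weighted combination of the parallel (arithmetic)
    and serial (harmonic) means of the component permittivities."""
    f, e = np.asarray(fractions), np.asarray(eps)
    parallel = np.sum(f * e)           # all paths side by side
    serial = 1.0 / np.sum(f / e)       # all paths in series
    return weight * parallel + (1 - weight) * serial

# water (~80), solid matrix (~5), air (~1) at 25% volumetric water content
print(series_parallel_eps([0.25, 0.45, 0.30], [80.0, 5.0, 1.0]))
```

The model in the abstract replaces the single weight by a statistically parametrized serial-parallel mesh of many representative spheres, which is what lets it span soil types instead of being refitted per type.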
Encoding Dissimilarity Data for Statistical Model Building.
Wahba, Grace
2010-12-01
We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332-12337. G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128-8133. F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison.
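The embedding step can be illustrated with a much simpler relative of the convex cone codes described above: classical multidimensional scaling, which also turns pairwise dissimilarities into Euclidean coordinates while controlling the dimension. A sketch (toy dissimilarity matrix, not data from the papers):

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed objects with dissimilarity matrix D into a
    Euclidean space of the given dimension."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]           # keep the largest eigenvalues
    vals = np.clip(vals[idx], 0, None)
    return vecs[:, idx] * np.sqrt(vals)          # one coordinate row per object

D = np.array([[0, 1, 2, 3],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [3, 2, 1, 0]], dtype=float)
print(classical_mds(D))
```

The cone-optimization formulation in the papers additionally handles noisy, incomplete dissimilarities and supplies the "newbie" embedding for new objects, which plain MDS does not.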
Statistical Model Checking for Biological Systems
DEFF Research Database (Denmark)
David, Alexandre; Larsen, Kim Guldstrand; Legay, Axel
2014-01-01
Statistical Model Checking (SMC) is a highly scalable simulation-based verification approach for testing and estimating the probability that a stochastic system satisfies a given linear temporal property. The technique has been applied to (discrete and continuous time) Markov chains, stochastic… timed automata and most recently hybrid systems using the tool Uppaal SMC. In this paper we enable the application of SMC to complex biological systems, by combining Uppaal SMC with ANIMO, a plugin of the tool Cytoscape used by biologists, as well as with SimBiology®, a plugin of Matlab to simulate…
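At its core, SMC estimates a satisfaction probability by running many simulations and counting how often the property holds, with a statistical bound fixing the number of runs. A minimal sketch of that idea (a toy random-walk safety property, not Uppaal SMC itself; the Chernoff-Hoeffding bound picks the run count):

```python
import math
import random

def smc_estimate(simulate_once, eps=0.05, delta=0.01):
    """Monte Carlo estimate of P(property holds), within +/- eps with
    confidence 1 - delta (Chernoff-Hoeffding bound on the run count)."""
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(simulate_once() for _ in range(n))
    return hits / n, n

def run():
    """Toy stochastic system: a random walk that must stay below 10
    for 50 steps (the 'property')."""
    x = 0
    for _ in range(50):
        x += random.choice((-1, 1))
        if x >= 10:
            return False
    return True

p_hat, runs = smc_estimate(run)
print(p_hat, runs)
```

Tools like Uppaal SMC implement the same principle with sequential tests and far richer modelling languages (timed and hybrid automata), which is what the ANIMO and SimBiology couplings expose to biologists.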
Average Nuclear properties based on statistical model
International Nuclear Information System (INIS)
El-Jaick, L.J.
1974-01-01
The rough properties of nuclei were investigated with a statistical model, in systems with equal and with different numbers of protons and neutrons, treated separately, considering the Coulomb energy in the latter system. Some average nuclear properties were calculated from the energy density of nuclear matter, using the Weizsäcker-Bethe semiempirical mass formula generalized for compressible nuclei. In the study of the surface energy coefficient, the great influence exercised by the Coulomb energy and nuclear compressibility was verified. For a good fit of the beta stability lines and mass excesses, the surface symmetry energy was established. (M.C.K.) [pt
Digital Repository Service at National Institute of Oceanography (India)
Gopinath, A.; Muraleedharan, N.S.; Chandramohanakumar, N.; Jayalakshmy, K.V.
of the relative health and condition of subtropical and tropical estuarine ecosystems (Kemp, 2000). This article reports the statistical significance of the base-line data obtained from the analysis of some marine macro algae for their heavy metal contents... the highest levels of contamination within an aquatic ecosystem. In this study, all the metals studied were well below the limits prescribed by FAO, so their sources are not a cause for concern. Apart from the population factor, agro-based activities can also...
Statistical modeling to support power system planning
Staid, Andrea
This dissertation focuses on data-analytic approaches that improve our understanding of power system applications to promote better decision-making. It tackles issues of risk analysis, uncertainty management, resource estimation, and the impacts of climate change. Tools of data mining and statistical modeling are used to bring new insight to a variety of complex problems facing today's power system. The overarching goal of this research is to improve the understanding of the power system risk environment for improved operation, investment, and planning decisions. The first chapter introduces some challenges faced in planning for a sustainable power system. Chapter 2 analyzes the driving factors behind the disparity in wind energy investments among states with a goal of determining the impact that state-level policies have on incentivizing wind energy. Findings show that policy differences do not explain the disparities; physical and geographical factors are more important. Chapter 3 extends conventional wind forecasting to a risk-based focus of predicting maximum wind speeds, which are dangerous for offshore operations. Statistical models are presented that issue probabilistic predictions for the highest wind speed expected in a three-hour interval. These models achieve a high degree of accuracy and their use can improve safety and reliability in practice. Chapter 4 examines the challenges of wind power estimation for onshore wind farms. Several methods for wind power resource assessment are compared, and the weaknesses of the Jensen model are demonstrated. For two onshore farms, statistical models outperform other methods, even when very little information is known about the wind farm. Lastly, chapter 5 focuses on the power system more broadly in the context of the risks expected from tropical cyclones in a changing climate. Risks to U.S. power system infrastructure are simulated under different scenarios of tropical cyclone behavior that may result from climate change.
Statistical mechanics of helical wormlike chain model
Liu, Ya; Pérez, Toni; Li, Wei; Gunton, J. D.; Green, Amanda
2011-02-01
We investigate the statistical mechanics of polymers with bending and torsional elasticity described by the helical wormlike model. Noticing that the energy function is factorizable, we provide a numerical method to solve the model using a transfer matrix formulation. The tangent-tangent and binormal-binormal correlation functions have been calculated and display rich profiles which are sensitive to the combination of the temperature and the equilibrium torsion. Their behaviors indicate that there is no finite temperature Lifshitz point between the disordered and helical phases. The asymptotic behavior at low temperature has been investigated theoretically and the predictions fit the numerical results very well. Our analysis could be used to understand the statics of dsDNA and other chiral polymers.
Statistical Mechanics of Helical Wormlike Model
Liu, Ya; Perez, Toni; Li, Wei; Gunton, James; Green, Amanda
2011-03-01
The bending and torsional elasticities are crucial in determining the static and dynamic properties of biopolymers such as dsDNA and sickle hemoglobin. We investigate the statistical mechanics of stiff polymers described by the helical wormlike model. We provide a numerical method to solve the model using a transfer matrix formulation. The correlation functions have been calculated and display rich profiles which are sensitive to the combination of the temperature and the equilibrium torsion. The asymptotic behavior at low temperature has been investigated theoretically and the predictions fit the numerical results very well. Our analysis could be used to understand the statics of dsDNA and other chiral polymers. This work is supported by grants from the NSF and Mathers Foundation.
Understanding and forecasting polar stratospheric variability with statistical models
Directory of Open Access Journals (Sweden)
C. Blume
2012-07-01
Full Text Available The variability of the north-polar stratospheric vortex is a prominent aspect of the middle atmosphere. This work investigates a wide class of statistical models with respect to their ability to model geopotential and temperature anomalies, representing variability in the polar stratosphere. Four partly nonstationary, nonlinear models are assessed: linear discriminant analysis (LDA); a cluster method based on finite elements (FEM-VARX); a neural network, namely the multi-layer perceptron (MLP); and support vector regression (SVR). These methods model time series by incorporating all significant external factors simultaneously, including ENSO, QBO, the solar cycle and volcanoes, and then quantify their statistical importance. We show that variability in reanalysis data from 1980 to 2005 is successfully modeled. The period from 2005 to 2011 can be hindcast to a certain extent, with MLP performing significantly better than the remaining models. However, variability remains that cannot be statistically hindcast within the current framework, such as the unexpected major warming in January 2009. Finally, the statistical model with the best generalization performance is used to predict the winter of 2011/12, with warm and weak vortex conditions. A vortex breakdown is predicted for late January, early February 2012.
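As a sketch of the hindcasting setup, one can fit a small multi-layer perceptron on the early part of a series and score it on held-out later years. The snippet below uses scikit-learn with synthetic stand-ins for the external drivers (nothing here reproduces the paper's reanalysis data):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(312, 4))     # monthly proxies for ENSO, QBO, solar, volcanic
beta = np.array([1.0, -0.5, 0.3, 0.0])
y = X @ beta + 0.2 * rng.normal(size=312)     # synthetic anomaly series

train, test = slice(0, 240), slice(240, 312)  # fit early years, hindcast the rest
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(8,),
                                   max_iter=5000, random_state=0))
model.fit(X[train], y[train])
print("hindcast R^2:", model.score(X[test], y[test]))
```

The paper's comparison amounts to repeating this train/hindcast split for LDA, FEM-VARX, MLP and SVR and asking which generalizes best out of sample.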
Atmospheric corrosion: statistical validation of models
International Nuclear Information System (INIS)
Diaz, V.; Martinez-Luaces, V.; Guineo-Cobs, G.
2003-01-01
In this paper we discuss two different methods for the validation of regression models, applied to corrosion data. One of them is based on the correlation coefficient and the other is the statistical lack-of-fit test. Both methods are used here to analyse the fit of the bilogarithmic model for predicting corrosion of very low carbon steel substrates in rural and urban-industrial atmospheres in Uruguay. Results for parameters A and n of the bilogarithmic model are reported here. For this purpose, all repeated values were used instead of the usual average values. Modelling is carried out using experimental data corresponding to steel substrates under the same initial meteorological conditions (in fact, they were put in the rack at the same time). Results for the correlation coefficient are compared with the lack-of-fit test at two different significance levels (α=0.01 and α=0.05). Unexpected differences between them are explained and finally it is possible to conclude, at least for the studied atmospheres, that the bilogarithmic model does not fit the experimental data properly. (Author) 18 refs
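The bilogarithmic model is C = A·t^n, i.e. log C = log A + n·log t, and with replicate racks per exposure time the lack-of-fit test splits the residual sum of squares into lack-of-fit and pure-error parts. A hedged sketch (invented mass-loss values, not the Uruguayan data):

```python
import numpy as np
from scipy import stats

t = np.repeat([1.0, 2.0, 3.0, 4.0], 3)     # exposure years, 3 replicates each
C = np.array([12, 14, 13, 22, 25, 24, 30, 34, 32, 41, 44, 39], float)

x, y = np.log(t), np.log(C)
n_, logA = np.polyfit(x, y, 1)             # slope n and intercept log A

groups = [y[t == v] for v in np.unique(t)]
sse = sum(((g - (logA + n_ * np.log(v))) ** 2).sum()
          for v, g in zip(np.unique(t), groups))         # residual SS about the line
sspe = sum(((g - g.mean()) ** 2).sum() for g in groups)  # pure-error SS
m, N = len(groups), len(y)
F = ((sse - sspe) / (m - 2)) / (sspe / (N - m))
print("lack-of-fit p =", stats.f.sf(F, m - 2, N - m))
```

A small p-value says the scatter about the fitted line exceeds what replicate variability allows, which is the sense in which the authors conclude the bilogarithmic model does not fit their atmospheres.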
Significance, Errors, Power, and Sample Size: The Blocking and Tackling of Statistics.
Mascha, Edward J; Vetter, Thomas R
2018-02-01
Inferential statistics relies heavily on the central limit theorem and the related law of large numbers. According to the central limit theorem, regardless of the distribution of the source population, a sample estimate of that population will have a normal distribution, but only if the sample is large enough. The related law of large numbers holds that the central limit theorem is valid as random samples become large enough, usually defined as n ≥ 30. In research-related hypothesis testing, the term "statistically significant" is used to describe when an observed difference or association has met a certain threshold. This significance threshold or cut-point is denoted as alpha (α) and is typically set at .05. When the observed P value is less than α, one rejects the null hypothesis (H0) and accepts the alternative. Clinical significance is even more important than statistical significance, so treatment effect estimates and confidence intervals should be regularly reported. A type I error occurs when the H0 of no difference or no association is rejected, when in fact the H0 is true. A type II error occurs when the H0 is not rejected, when in fact there is a true population effect. Power is the probability of detecting a true difference, effect, or association if it truly exists. Sample size justification and power analysis are key elements of a study design. Ethical concerns arise when studies are poorly planned or underpowered. When calculating sample size for comparing groups, 4 quantities are needed: α, type II error, the difference or effect of interest, and the estimated variability of the outcome variable. Sample size increases for increasing variability and power, and for decreasing α and decreasing difference to detect. Sample size for a given relative reduction in proportions depends heavily on the proportion in the control group itself, and increases as the proportion decreases. Sample size for single-group studies estimating an unknown parameter…
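For the two-group comparison of means, the four quantities combine in the familiar normal-approximation formula n ≈ 2·σ²·(z(1-α/2) + z(1-β))²/δ² per group. A small sketch (illustrative numbers only):

```python
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sample comparison of
    means: delta = difference to detect, sd = outcome standard deviation."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * (sd * (z_a + z_b) / delta) ** 2

# detect a 10-point difference with SD = 20, alpha = .05, power = .80
print(round(n_per_group(10, 20)))   # about 63 per group
```

The formula makes the article's qualitative statements visible: halving δ quadruples n, and raising power or lowering α pushes the z-quantiles, and hence n, upward.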
Modelling vocal anatomy's significant effect on speech
de Boer, B.
2010-01-01
This paper investigates the effect of larynx position on the articulatory abilities of a humanlike vocal tract. Previous work has investigated models that were built to resemble the anatomy of existing species or fossil ancestors. This has led to conflicting conclusions about the relation between…
McKay, J. Lucas; Welch, Torrence D. J.; Vidakovic, Brani
2013-01-01
We developed wavelet-based functional ANOVA (wfANOVA) as a novel approach for comparing neurophysiological signals that are functions of time. Temporal resolution is often sacrificed by analyzing such data in large time bins, increasing statistical power by reducing the number of comparisons. We performed ANOVA in the wavelet domain because differences between curves tend to be represented by a few temporally localized wavelets, which we transformed back to the time domain for visualization. We compared wfANOVA and ANOVA performed in the time domain (tANOVA) on both experimental electromyographic (EMG) signals from responses to perturbation during standing balance across changes in peak perturbation acceleration (3 levels) and velocity (4 levels) and on simulated data with known contrasts. In experimental EMG data, wfANOVA revealed the continuous shape and magnitude of significant differences over time without a priori selection of time bins. However, tANOVA revealed only the largest differences at discontinuous time points, resulting in features with later onsets and shorter durations than those identified using wfANOVA (P < 0.02). Furthermore, wfANOVA required significantly fewer (∼¼×; P < 0.015) significant F tests than tANOVA, resulting in post hoc tests with increased power. In simulated EMG data, wfANOVA identified known contrast curves with a high level of precision (r² = 0.94 ± 0.08) and performed better than tANOVA across noise levels (P < 0.01). Therefore, wfANOVA may be useful for revealing differences in the shape and magnitude of neurophysiological signals (e.g., EMG, firing rates) across multiple conditions with both high temporal resolution and high statistical power. PMID:23100136
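A rough sketch of the wavelet-domain idea, reduced to one factor with two conditions (PyWavelets and SciPy; the db4 wavelet and Bonferroni threshold are illustrative choices, not necessarily those of the paper):

```python
import numpy as np
import pywt
from scipy import stats

def wfanova_contrast(group_a, group_b, wavelet="db4", alpha=0.05):
    """F-test each wavelet coefficient across trials, zero the
    non-significant ones, and reconstruct the significant contrast."""
    wa = np.array([pywt.coeffs_to_array(pywt.wavedec(x, wavelet))[0]
                   for x in group_a])
    wb = np.array([pywt.coeffs_to_array(pywt.wavedec(x, wavelet))[0]
                   for x in group_b])
    _, slices = pywt.coeffs_to_array(pywt.wavedec(group_a[0], wavelet))
    F, p = stats.f_oneway(wa, wb)                   # per-coefficient ANOVA
    contrast = np.where(p < alpha / wa.shape[1],    # Bonferroni correction
                        wb.mean(0) - wa.mean(0), 0.0)
    coeffs = pywt.array_to_coeffs(contrast, slices, output_format="wavedec")
    return pywt.waverec(coeffs, wavelet)

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 128)
a = [np.sin(8 * np.pi * t) + 0.3 * rng.normal(size=128) for _ in range(12)]
b = [np.sin(8 * np.pi * t) + (t > 0.5) + 0.3 * rng.normal(size=128)
     for _ in range(12)]
print(wfanova_contrast(a, b))   # near zero early, near one after t = 0.5
```

Because differences concentrate in a few coefficients, far fewer tests reach significance than in a point-by-point tANOVA, which is the power advantage the abstract reports.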
MSMBuilder: Statistical Models for Biomolecular Dynamics.
Harrigan, Matthew P; Sultan, Mohammad M; Hernández, Carlos X; Husic, Brooke E; Eastman, Peter; Schwantes, Christian R; Beauchamp, Kyle A; McGibbon, Robert T; Pande, Vijay S
2017-01-10
MSMBuilder is a software package for building statistical models of high-dimensional time-series data. It is designed with a particular focus on the analysis of atomistic simulations of biomolecular dynamics such as protein folding and conformational change. MSMBuilder is named for its ability to construct Markov state models (MSMs), a class of models that has gained favor among computational biophysicists. In addition to both well-established and newer MSM methods, the package includes complementary algorithms for understanding time-series data such as hidden Markov models and time-structure based independent component analysis. MSMBuilder boasts an easy-to-use command-line interface, as well as clear and consistent abstractions through its Python application programming interface. MSMBuilder was developed with careful consideration for compatibility with the broader machine learning community by following the design of scikit-learn. The package is used primarily by practitioners of molecular dynamics, but is just as applicable to other computational or experimental time-series measurements. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
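The core MSM construction is simple to state: discretize trajectories into states, count transitions at a chosen lag time, and row-normalize. A self-contained numpy sketch (a toy two-state chain stands in for clustered molecular dynamics data; MSMBuilder's own API is not shown):

```python
import numpy as np

def estimate_msm(dtrajs, n_states, lag=1):
    """Count transitions at the given lag and row-normalize into a
    transition probability matrix."""
    C = np.zeros((n_states, n_states))
    for traj in dtrajs:
        for i, j in zip(traj[:-lag], traj[lag:]):
            C[i, j] += 1
    return C / C.sum(axis=1, keepdims=True)

def implied_timescales(T, lag=1):
    """Relaxation timescales from the transition-matrix eigenvalues."""
    ev = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(ev[1:])     # skip the stationary eigenvalue 1

rng = np.random.default_rng(1)
P = np.array([[0.99, 0.01], [0.02, 0.98]])   # toy metastable chain
state, traj = 0, []
for _ in range(20000):
    traj.append(state)
    state = rng.choice(2, p=P[state])
T = estimate_msm([np.array(traj)], n_states=2)
print(T, implied_timescales(T))
```

MSMBuilder wraps this estimation (with reversible and Bayesian variants) together with featurization, tICA and clustering in scikit-learn-style estimators.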
Spherical Process Models for Global Spatial Statistics
Jeong, Jaehong
2017-11-28
Statistical models used in geophysical, environmental, and climate science applications must reflect the curvature of the spatial domain in global data. Over the past few decades, statisticians have developed covariance models that capture the spatial and temporal behavior of these global data sets. Though the geodesic distance is the most natural metric for measuring distance on the surface of a sphere, mathematical limitations have compelled statisticians to use the chordal distance to compute the covariance matrix in many applications instead, which may cause physically unrealistic distortions. Therefore, covariance functions directly defined on a sphere using the geodesic distance are needed. We discuss the issues that arise when dealing with spherical data sets on a global scale and provide references to recent literature. We review the current approaches to building process models on spheres, including the differential operator, the stochastic partial differential equation, the kernel convolution, and the deformation approaches. We illustrate realizations obtained from Gaussian processes with different covariance structures and the use of isotropic and nonstationary covariance models through deformations and geographical indicators for global surface temperature data. To assess the suitability of each method, we compare their log-likelihood values and prediction scores, and we end with a discussion of related research problems.
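To see why the metric matters, here is a small numpy sketch that builds a covariance matrix from great-circle (geodesic) distances using the exponential model, one of the families that remains valid (positive definite) with the geodesic metric on the sphere; the site coordinates and range are arbitrary examples:

```python
import numpy as np

def geodesic_distance(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance between points on the sphere, in km."""
    phi1, phi2, dlon = map(np.radians, (lat1, lat2, lon2 - lon1))
    c = (np.sin(phi1) * np.sin(phi2) +
         np.cos(phi1) * np.cos(phi2) * np.cos(dlon))
    return radius * np.arccos(np.clip(c, -1.0, 1.0))

def exponential_cov(sites, sill=1.0, range_km=2000.0):
    """Exponential covariance evaluated with the geodesic metric."""
    lat, lon = sites[:, 0], sites[:, 1]
    d = geodesic_distance(lat[:, None], lon[:, None],
                          lat[None, :], lon[None, :])
    return sill * np.exp(-d / range_km)

sites = np.array([[0.0, 0.0], [10.0, 20.0], [-30.0, 120.0]])  # lat, lon
print(exponential_cov(sites))
```

Substituting geodesic for chordal distance in an arbitrary covariance family is not safe in general; for instance, Matérn models keep positive definiteness on the sphere only for smoothness up to 1/2, which is the kind of limitation the review discusses.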
Statistical determination of significant curved I-girder bridge seismic response parameters
Seo, Junwon
2013-06-01
Curved steel bridges are commonly used at interchanges in transportation networks, and more of these structures continue to be designed and built in the United States. Though the use of these bridges continues to increase in locations that experience high seismicity, the effects of curvature and other parameters on their seismic behavior have been neglected in current risk assessment tools. These tools can evaluate the seismic vulnerability of a transportation network using fragility curves. One critical component of fragility curve development for curved steel bridges is the completion of sensitivity analyses that help identify influential parameters related to their seismic response. In this study, an accessible inventory of existing curved steel girder bridges located primarily in the Mid-Atlantic United States (MAUS) was used to establish statistical characteristics used as inputs for a seismic sensitivity study. Critical seismic response quantities were captured using 3D nonlinear finite element models. Influential parameters for these quantities were identified using statistical tools that incorporate experimental Plackett-Burman Design (PBD), including Pareto optimal plots and prediction profiler techniques. The findings revealed that the influential parameters included the number of spans, radius of curvature, maximum span length, girder spacing, and cross-frame spacing. These parameters showed varying levels of influence on the critical bridge response.
Statistical Validation of Normal Tissue Complication Probability Models
Energy Technology Data Exchange (ETDEWEB)
Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Veld, Aart A. van 't; Langendijk, Johannes A. [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schilstra, Cornelis [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Radiotherapy Institute Friesland, Leeuwarden (Netherlands)
2012-09-01
Purpose: To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. Methods and Materials: A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Results: Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Conclusion: Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use.
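A compact scikit-learn sketch of the same workflow: fit an L1-penalized (LASSO-type) logistic model, score it by cross-validated AUC, and judge significance by refitting on permuted outcomes (synthetic toy data, not the xerostomia cohort):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))        # dose-volume style predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=120) > 0).astype(int)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
proba = cross_val_predict(lasso, X, y, cv=5, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)

# permutation null: refit on shuffled outcomes
null = []
for _ in range(200):
    yp = rng.permutation(y)
    pp = cross_val_predict(lasso, X, yp, cv=5, method="predict_proba")[:, 1]
    null.append(roc_auc_score(yp, pp))
print("AUC =", auc, " permutation p =", np.mean(np.array(null) >= auc))
```

Repeating the cross-validation over many random splits, as the authors do, additionally exposes the instability of the selected models, not just their average performance.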
Energy Technology Data Exchange (ETDEWEB)
Crow, C.J.
1985-01-01
Middle Ordovician age Chickamauga Group carbonates crop out along the Birmingham and Murphrees Valley anticlines in central Alabama. The macrofossil contents on exposed surfaces of seven bioherms have been counted to determine their various paleontologic characteristics. Twelve groups of organisms are present in these bioherms. Dominant organisms include bryozoans, algae, brachiopods, sponges, pelmatozoans, stromatoporoids and corals. Minor accessory fauna include predators, scavengers and grazers such as gastropods, ostracods, trilobites, cephalopods and pelecypods. Vertical and horizontal niche zonation has been detected for some of the bioherm dwelling fauna. No one bioherm of those studied exhibits all 12 groups of organisms; rather, individual bioherms display various subsets of the total diversity. Statistical treatment (G-test) of the diversity data indicates a lack of statistical homogeneity of the bioherms, both within and between localities. Between-locality population heterogeneity can be ascribed to differences in biologic responses to such gross environmental factors as water depth and clarity, and energy levels. At any one locality, gross aspects of the paleoenvironments are assumed to have been more uniform. Significant differences among bioherms at any one locality may have resulted from patchy distribution of species populations, differential preservation and other factors.
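The G-test applied to such diversity counts is available in SciPy as the log-likelihood-ratio variant of the contingency-table test. A sketch with invented counts (organism groups by bioherm):

```python
import numpy as np
from scipy.stats import chi2_contingency

# rows: organism groups; columns: bioherms (all counts are illustrative)
counts = np.array([[40, 22, 31],    # bryozoans
                   [25, 30, 12],    # algae
                   [15, 18, 27]])   # brachiopods

g, p, dof, expected = chi2_contingency(counts, lambda_="log-likelihood")
print(f"G = {g:.2f}, dof = {dof}, p = {p:.4f}")   # small p: heterogeneous
```

A small p-value rejects homogeneity across bioherms, which is the statistical sense in which the abstract reports between- and within-locality heterogeneity.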
Van Aert, Robbie C M; van Assen, Marcel A.L.M.
2017-01-01
The unrealistically high rate of positive results within psychology has increased the attention to replication research. However, researchers who conduct a replication and want to statistically combine the results of their replication with a statistically significant original study encounter…
Van Aert, R.C.M.; Van Assen, M.A.L.M.
2018-01-01
The unrealistically high rate of positive results within psychology has increased the attention to replication research. However, researchers who conduct a replication and want to statistically combine the results of their replication with a statistically significant original study encounter…
Statistical significance of non-reproducibility of cross sections in dissipative reactions
International Nuclear Information System (INIS)
Wang Qi; Dong Yuchuan; Li Songlin; Tian Wendong
2003-01-01
Two independent excitation function measurements have been performed for the reaction system ¹⁹F + ⁹³Nb using two target foils of the same nominal thickness. The authors measured the dissipative reaction products at incident energies from 102 to 108 MeV in steps of 250 keV. The variance of the energy autocorrelation functions of the reaction products was found to be three times that originating from the randomized counting rates. By analysing the probability distributions of the deviations in the measured cross sections, the authors found that about 20% of all deviations exceed three standard deviations. This indicates that the non-reproducibility of the cross sections in the two independent measurements is statistically significant and does not originate from random fluctuations of the counting rates
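The underlying check is easy to reproduce: standardize the differences between the two measurements by their combined counting uncertainty and compare the tail fraction with the roughly 0.3% expected beyond 3σ from counting noise alone. A sketch (random toy cross sections, not the ¹⁹F + ⁹³Nb data):

```python
import numpy as np

def excess_tail_fraction(s1, s2, e1, e2):
    """Fraction of standardized deviations |z| > 3 between two independent
    measurements; about 0.003 is expected from counting statistics alone."""
    z = np.abs(s1 - s2) / np.sqrt(e1 ** 2 + e2 ** 2)
    return np.mean(z > 3)

rng = np.random.default_rng(5)
true = 100 + 5 * np.sin(np.linspace(0, 6, 25))    # toy excitation function
err = np.full(25, 2.0)
m1 = true + rng.normal(0, err)                          # counting noise only
m2 = true + rng.normal(0, err) + rng.normal(0, 3, 25)   # extra scatter added
print(excess_tail_fraction(m1, m2, err, err))
```

A tail fraction near the reported 20% instead of 0.3% signals non-statistical variance, the paper's evidence that the non-reproducibility is real.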
Statistical model for OCT image denoising
Li, Muxingzi
2017-08-01
Optical coherence tomography (OCT) is a non-invasive technique with a large array of applications in clinical imaging and biological tissue visualization. However, the presence of speckle noise affects the analysis of OCT images and their diagnostic utility. In this article, we introduce a new OCT denoising algorithm. The proposed method is founded on a numerical optimization framework based on maximum-a-posteriori estimate of the noise-free OCT image. It combines a novel speckle noise model, derived from local statistics of empirical spectral domain OCT (SD-OCT) data, with a Huber variant of total variation regularization for edge preservation. The proposed approach exhibits satisfying results in terms of speckle noise reduction as well as edge preservation, at reduced computational cost.
Current algebra, statistical mechanics and quantum models
Vilela Mendes, R.
2017-11-01
Results obtained in the past for free boson systems at zero and nonzero temperatures are revisited to clarify the physical meaning of current algebra reducible functionals which are associated to systems with density fluctuations, leading to observable effects on phase transitions. To use current algebra as a tool for the formulation of quantum statistical mechanics amounts to the construction of unitary representations of diffeomorphism groups. Two mathematical equivalent procedures exist for this purpose. One searches for quasi-invariant measures on configuration spaces, the other for a cyclic vector in Hilbert space. Here, one argues that the second approach is closer to the physical intuition when modelling complex systems. An example of application of the current algebra methodology to the pairing phenomenon in two-dimensional fermion systems is discussed.
Kellerer-Pirklbauer, Andreas
2016-04-01
Longer data series (e.g. >10 a) of ground temperatures in alpine regions are helpful to improve the understanding of the effects of present climate change on the distribution and thermal characteristics of seasonal frost- and permafrost-affected areas. Beginning in 2004 - and more intensively since 2006 - a permafrost and seasonal frost monitoring network was established in Central and Eastern Austria by the University of Graz. This network consists of c. 60 ground temperature (surface and near-surface) monitoring sites located at 1922-3002 m a.s.l., at latitude 46°55'-47°22'N and longitude 12°44'-14°41'E. These data allow conclusions about general ground thermal conditions, potential permafrost occurrence, trends during the observation period, and regional patterns of change. Calculations and analyses of several different temperature-related parameters were accomplished. At an annual scale, region-wide statistically significant warming during the observation period was revealed by e.g. an increase in mean annual temperature values (mean, maximum) and a significant lowering of the surface frost number (F+). At a seasonal scale, no significant trend of any temperature-related parameter was revealed in most cases for spring (MAM) and autumn (SON). Winter (DJF) shows only weak warming. In contrast, the summer (JJA) season in general reveals significant warming, as confirmed by several different temperature-related parameters such as mean seasonal temperature, number of thawing degree days, number of freezing degree days, or days without night frost. On a monthly basis, August shows the statistically most robust and strongest warming of all months, although regional differences occur. Although general ground temperature warming during the last decade is confirmed by the field data in the study region, complications in trend analyses arise from temperature anomalies (e.g. the warm winter 2006/07) or substantial variations in the winter…
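A minimal sketch of the kind of trend test involved: an ordinary least-squares trend on a seasonal-mean series via scipy.stats.linregress (the synthetic series below is not the Austrian data):

```python
import numpy as np
from scipy.stats import linregress

years = np.arange(2006, 2016)
rng = np.random.default_rng(9)
jja_mean = 8.0 + 0.08 * (years - 2006) + rng.normal(0, 0.3, years.size)

res = linregress(years, jja_mean)   # summer (JJA) mean ground temperature
print(f"trend = {res.slope:.3f} degC/yr, p = {res.pvalue:.3f}")
```

With only about a decade of data, single warm or cold anomalies move both the slope and its p-value noticeably, which is the complication the abstract mentions for the winter 2006/07 anomaly.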
New advances in statistical modeling and applications
Santos, Rui; Oliveira, Maria; Paulino, Carlos
2014-01-01
This volume presents selected papers from the XIXth Congress of the Portuguese Statistical Society, held in the town of Nazaré, Portugal, from September 28 to October 1, 2011. All contributions were selected after a thorough peer-review process. It covers a broad range of papers in the areas of statistical science, probability and stochastic processes, extremes and statistical applications.
Directory of Open Access Journals (Sweden)
Sadreyev Ruslan I
2004-08-01
Full Text Available Abstract Background Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) an MSA position and a set of predicted residue frequencies, and (2) two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. Results For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by its ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. Conclusion The proposed computational method is of significant potential value for the analysis of protein families.
Directory of Open Access Journals (Sweden)
Ernesto Iacucci
2012-02-01
Full Text Available High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an interesting set of genes—say, genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speed-ups in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover gold standard annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
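The dynamic-programming idea fits in a few lines: for integer gene weights, count the subsets of each size by weight sum, then read the exact tail probability off the table. A sketch under the simplest null (gene sets of size k drawn uniformly; the weights are toy values and the paper's speed-up tricks are omitted):

```python
from math import comb

def exact_pvalue(weights, k, s_obs):
    """Exact P(S >= s_obs) for S = sum of integer weights over a uniformly
    drawn k-subset of genes, by dynamic programming over genes."""
    total = sum(weights)
    # dp[j][s] = number of j-subsets of the genes seen so far with sum s
    dp = [[0] * (total + 1) for _ in range(k + 1)]
    dp[0][0] = 1
    for w in weights:
        for j in range(k, 0, -1):          # backwards: each gene used once
            row, prev = dp[j], dp[j - 1]
            for s in range(total, w - 1, -1):
                row[s] += prev[s - w]
    return sum(dp[k][s_obs:]) / comb(len(weights), k)

# 10 genes with integer interest weights; a set of 4 scoring 9 or more
print(exact_pvalue([3, 1, 0, 2, 1, 0, 4, 1, 0, 2], k=4, s_obs=9))
```

With 0/1 weights this reduces to the usual hypergeometric enrichment test; arbitrary integer weights are what let prior knowledge and hierarchy enter the statistic.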
Deng, Nina; Allison, Jeroan J; Fang, Hua Julia; Ash, Arlene S; Ware, John E
2013-05-31
Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of the RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, the denominator F-statistic, the correlation between comparator and reference measures, and the number of bootstrap replicates. The statistical significance of the RV increased as the magnitude of the denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r … 0.9).
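A sketch of the bootstrap scheme for the RV statistic: resample patients with replacement within each clinical group, keeping each patient's comparator and reference scores paired, and recompute the F-ratio (the group arrays below are hypothetical stand-ins for the CKD data):

```python
import numpy as np
from scipy.stats import f_oneway

def bootstrap_rv_ci(groups_ref, groups_cmp, n_boot=500, seed=0):
    """95% bootstrap CI for RV = F(comparator) / F(reference) across
    clinically defined groups; scores are paired within patients."""
    rng = np.random.default_rng(seed)
    rvs = []
    for _ in range(n_boot):
        ref, cmp_ = [], []
        for gr, gc in zip(groups_ref, groups_cmp):
            idx = rng.integers(0, len(gr), len(gr))
            ref.append(gr[idx])
            cmp_.append(gc[idx])
        rvs.append(f_oneway(*cmp_).statistic / f_oneway(*ref).statistic)
    return np.percentile(rvs, [2.5, 97.5])

rng = np.random.default_rng(7)
ref = [rng.normal(m, 1.0, 40) for m in (0.0, 0.8, 1.6)]        # reference
cmp_scores = [r * 0.7 + rng.normal(0, 0.7, 40) for r in ref]   # comparator
print(bootstrap_rv_ci(ref, cmp_scores))
```

If the interval excludes 1, the comparator discriminates significantly worse (or better) than the reference measure, which is the decision the RV is meant to support.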
Directory of Open Access Journals (Sweden)
Blalock Eric M
2007-07-01
Full Text Available Abstract Background Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may fail to be detected using these approaches. Finally, different patterns may have different probabilities of occurring by chance, importantly influencing researchers' conclusions about a pattern and its constituent genes. Results We developed a four-step, post hoc pattern matching (PPM) algorithm to automate single channel gene expression pattern identification/significance. First, 1-Way Analysis of Variance (ANOVA), coupled with post hoc 'all pairwise' comparisons, is calculated for all genes. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is identified as that pattern's 'actual' frequency. Third, using Monte Carlo simulations, those patterns' frequencies are estimated in random data (the 'random' gene pattern frequency). Fourth, a Z-score for overrepresentation of the pattern is calculated ('actual' against 'random' gene pattern frequencies). We wrote a Visual Basic program (StatiGen) that automates the PPM procedure, constructs an Excel workbook with standardized graphs of overrepresented patterns, and lists of…
A Statistical Model for Energy Intensity
Directory of Open Access Journals (Sweden)
Marjaneh Issapour
2012-12-01
Full Text Available A promising approach to improving scientific literacy with regard to global warming and climate change is using a simulation as part of a science education course. The simulation needs to employ scientific analysis of actual data from internationally accepted and reputable databases to demonstrate the reality of the current climate change situation. One of the most important criteria for using a simulation in a science education course is the fidelity of the model. The realism of the events and consequences modeled in the simulation is significant as well. Therefore, all underlying equations and algorithms used in the simulation must have a real-world scientific basis. The "Energy Choices" simulation is one such simulation. The focus of this paper is the development of a mathematical model for "Energy Intensity" as part of the overall system dynamics in the "Energy Choices" simulation. This model will define "Energy Intensity" as a function of other independent variables that can be manipulated by users of the simulation. The relationship discovered by this research will be applied to an algorithm in the "Energy Choices" simulation.
Directory of Open Access Journals (Sweden)
Helmut Kern
2012-03-01
Full Text Available Aging is a multifactorial process that is characterized by decline in muscle mass and performance. Several factors, including reduced exercise, poor nutrition and modified hormonal metabolism, are responsible for changes in the rates of protein synthesis and degradation that drive skeletal muscle mass reduction, with a consequent decline of force generation and mobility performance. Seniors with a normal lifestyle were enrolled: two groups in Vienna (n=32) and two groups in Bratislava (n=19). All subjects were healthy and declared not to have any specific physical/disease problems. The two Vienna groups of seniors exercised for 10 weeks with two different types of training (leg press at the hospital or home-based functional electrical stimulation, h-b FES). Demographic data (age, height and weight) were recorded, and before and after the training period the patients underwent mobility functional analyses and muscle biopsies. The mobility functional analyses were: 1. gait speed (10 m test at fastest speed, in m/s); 2. the time the subject needed to rise from a chair five times (5x Chair-Rise, in s); 3. Timed-Up-and-Go Test, in s; 4. Stair Test, in s; 5. isometric measurement of quadriceps force (torque/kg, in Nm/kg); and 6. dynamic balance, in mm. Preliminary analyses of muscle biopsies from the quadriceps in some of the Vienna and Bratislava patients present morphometric results consistent with their functional behaviour. The statistically significant improvements in functional tests reported here demonstrate the effectiveness of h-b FES and strongly support h-b FES as a safe home-based method to improve the contractility and performance of ageing muscles.
Statistical model selection with “Big Data”
Directory of Open Access Journals (Sweden)
Jurgen A. Doornik
2015-12-01
Full Text Available Big Data offer potential benefits for statistical modelling, but confront problems including an excess of false positives, mistaking correlations for causes, ignoring sampling biases and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem), while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.
Statistical Challenges in Modeling Big Brain Signals
Yu, Zhaoxia
2017-11-01
Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible solutions, and highlight future research directions.
Statistical Learning Theory: Models, Concepts, and Results
von Luxburg, Ulrike; Schoelkopf, Bernhard
2008-01-01
Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms. In this article we attempt to give a gentle, non-technical overview over the key ideas and insights of statistical learning theory. We target a broad audience, not necessarily machine learning researchers. This paper can serve as a starting point for people who want to get an overview on the field before diving into technical details.
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of, and complex relationships in, data. Rich varieties of advanced and recent statistical modelling are mostly available in open source software (one of them is R). However, these advanced statistical models are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling is readily available, accessible and applicable on the web. We have previously made interfaces in the form of e-tutorials for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including modelling using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes the statistical modelling easier to apply and easier to compare in order to find the most appropriate model for the data.
Risk prediction model: Statistical and artificial neural network approach
Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim
2017-04-01
Prediction models are increasingly gaining popularity and have been used in numerous areas of study to complement and support clinical reasoning and decision making. The adoption of such models assists physicians' decision making and individuals' behaviour, and consequently improves individual outcomes and the cost-effectiveness of care. The objective of this paper is to review articles related to risk prediction models in order to understand the suitable approaches to, and the development and validation process of, risk prediction models. A qualitative review of the aims, methods and significant main outcomes of nineteen published articles that developed risk prediction models in numerous fields was done. This paper also reviews how researchers develop and validate risk prediction models based on statistical and artificial neural network approaches. From the review, some methodological recommendations for developing and validating prediction models are highlighted. According to the studies reviewed, artificial neural network approaches to developing prediction models were more accurate than statistical approaches. However, currently only limited published literature discusses which approach is more accurate for risk prediction model development.
Statistical modelling of transcript profiles of differentially regulated genes
Directory of Open Access Journals (Sweden)
Sergeant Martin J
2008-07-01
allowed 11% of the Escherichia coli features to be fitted by an exponential function, and 25% of the Rattus norvegicus features could be described by the critical exponential model, all with statistical significance of p … Conclusion The statistical non-linear regression approaches presented in this study provide detailed, biologically oriented descriptions of individual gene expression profiles, using biologically variable data to generate a set of defining parameters. These approaches have application to the modelling and greater interpretation of profiles obtained across a wide range of platforms, such as microarrays. Through careful choice of appropriate model forms, such statistical regression approaches allow an improved comparison of gene expression profiles, and may provide an approach for the greater understanding of common regulatory mechanisms between genes.
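A sketch of fitting one such profile with SciPy, assuming the critical exponential takes a rise-and-decay form (b0 + b1·t)·exp(-k·t) + c; both the functional form written here and the data are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def critical_exponential(t, b0, b1, k, c):
    """Rise-and-decay transcript profile."""
    return (b0 + b1 * t) * np.exp(-k * t) + c

t = np.linspace(0, 10, 12)                  # sampling times
rng = np.random.default_rng(2)
y = critical_exponential(t, 1.0, 3.0, 0.6, 0.5) + 0.1 * rng.normal(size=t.size)

params, cov = curve_fit(critical_exponential, t, y, p0=[1, 1, 0.5, 0])
print(params, np.sqrt(np.diag(cov)))        # estimates and standard errors
```

The fitted parameters (initial level, rise rate, decay rate, baseline) are exactly the kind of defining parameters the abstract proposes for comparing expression profiles across genes and platforms.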
Linear Mixed Models in Statistical Genetics
R. de Vlaming (Ronald)
2017-01-01
One of the goals of statistical genetics is to elucidate the genetic architecture of phenotypes (i.e., observable individual characteristics) that are affected by many genetic variants (e.g., single-nucleotide polymorphisms; SNPs). A particular aim is to identify specific SNPs that…
Statistical models and methods for reliability and survival analysis
Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Limnios, Nikolaos; Gerville-Reache, Leo
2013-01-01
Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, stochastic processes for system reliability, amongst others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts - Statistical Models and Methods, Statistical…
Spatio-temporal statistical models with applications to atmospheric processes
International Nuclear Information System (INIS)
Wikle, C.K.
1996-01-01
This doctoral dissertation is presented as three self-contained papers. An introductory chapter considers traditional spatio-temporal statistical methods used in the atmospheric sciences from a statistical perspective. Although this section is primarily a review, many of the statistical issues considered have not been considered in the context of these methods and several open questions are posed. The first paper attempts to determine a means of characterizing the semiannual oscillation (SAO) spatial variation in the northern hemisphere extratropical height field. It was discovered that the midlatitude SAO in 500hPa geopotential height could be explained almost entirely as a result of spatial and temporal asymmetries in the annual variation of stationary eddies. It was concluded that the mechanism for the SAO in the northern hemisphere is a result of land-sea contrasts. The second paper examines the seasonal variability of mixed Rossby-gravity waves (MRGW) in the lower stratosphere over the equatorial Pacific. Advanced cyclostationary time series techniques were used for the analysis. It was found that there are significant twice-yearly peaks in MRGW activity. Analyses also suggested a convergence of horizontal momentum flux associated with these waves. In the third paper, a new spatio-temporal statistical model is proposed that attempts to consider the influence of both temporal and spatial variability. This method is mainly concerned with prediction in space and time, and provides a spatially descriptive and temporally dynamic model.
Geometric modeling in probability and statistics
Calin, Ovidiu
2014-01-01
This book covers topics of Informational Geometry, a field which deals with the differential geometric study of the manifold probability density functions. This is a field that is increasingly attracting the interest of researchers from many different areas of science, including mathematics, statistics, geometry, computer science, signal processing, physics and neuroscience. It is the authors’ hope that the present book will be a valuable reference for researchers and graduate students in one of the aforementioned fields. This textbook is a unified presentation of differential geometry and probability theory, and constitutes a text for a course directed at graduate or advanced undergraduate students interested in applications of differential geometry in probability and statistics. The book contains over 100 proposed exercises meant to help students deepen their understanding, and it is accompanied by software that is able to provide numerical computations of several information geometric objects. The reader...
Statistical Model Checking of Rich Models and Properties
DEFF Research Database (Denmark)
Poulsen, Danny Bøgsted
Software is increasingly embedded within safety- and business-critical processes of society. Errors in these embedded systems can lead to human casualties or severe monetary loss. Model checking technology has proven formal methods capable of finding and correcting errors in software… motivates why existing model checking technology should be supplemented by new techniques. It also contains a brief introduction to probability theory and concepts covered by the six papers making up the second part. The first two papers are concerned with developing online monitoring techniques… systems. The fifth paper shows how stochastic hybrid automata are useful for modelling biological systems and the final paper is concerned with showing how statistical model checking is efficiently distributed. In parallel with developing the theory contained in the papers, a substantial part of this work…
A statistical model of future human actions
International Nuclear Information System (INIS)
Woo, G.
1992-02-01
A critical review has been carried out of models of future human actions during the long term post-closure period of a radioactive waste repository. Various Markov models have been considered as alternatives to the standard Poisson model, and the problems of parameterisation have been addressed. Where the simplistic Poisson model unduly exaggerates the intrusion risk, some form of Markov model may have to be introduced. This situation may well arise for shallow repositories, but it is less likely for deep repositories. Recommendations are made for a practical implementation of a computer based model and its associated database. (Author)
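The baseline Poisson assumption is easy to make concrete: with a constant annual intrusion rate λ, the probability of at least one event in T years is 1 - exp(-λT). A sketch (the rate is an arbitrary illustrative number, not a recommended value):

```python
import math

def p_at_least_one_intrusion(rate_per_year, horizon_years):
    """Poisson model: P(at least one intrusion) over the horizon."""
    return 1.0 - math.exp(-rate_per_year * horizon_years)

# e.g. an assumed 1e-5 events/year over a 10,000-year assessment horizon
print(p_at_least_one_intrusion(1e-5, 1e4))   # ~0.095
```

The Markov alternatives discussed in the report effectively let the rate depend on an evolving societal state instead of staying constant, which lowers the computed risk when the constant-rate assumption is too pessimistic.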
A two-component rain model for the prediction of attenuation statistics
Crane, R. K.
1982-01-01
A two-component rain model has been developed for calculating attenuation statistics. In contrast to most other attenuation prediction models, the two-component model calculates the occurrence probability for volume cells or debris attenuation events. The model performed significantly better than the International Radio Consultative Committee model when used for predictions on earth-satellite paths. It is expected that the model will have applications in modeling the joint statistics required for space diversity system design, the statistics of interference due to rain scatter at attenuating frequencies, and the duration statistics for attenuation events.
Statistical models of shape optimisation and evaluation
Davies, Rhodri; Taylor, Chris
2014-01-01
Deformable shape models have wide application in computer vision and biomedical image analysis. This book addresses a key issue in shape modelling: establishment of a meaningful correspondence between a set of shapes. Full implementation details are provided.
Enhanced surrogate models for statistical design exploiting space mapping technology
DEFF Research Database (Denmark)
Koziel, Slawek; Bandler, John W.; Mohamed, Achmed S.
2005-01-01
We present advances in microwave and RF device modeling exploiting Space Mapping (SM) technology. We propose new SM modeling formulations utilizing input mappings, output mappings, frequency scaling and quadratic approximations. Our aim is to enhance circuit models for statistical analysis...
Experimental investigation of statistical models describing distribution of counts
International Nuclear Information System (INIS)
Salma, I.; Zemplen-Papp, E.
1992-01-01
The binomial, Poisson and modified Poisson models, which are used for describing the statistical nature of the distribution of counts, are compared theoretically, and conclusions for application are considered. The validity of the Poisson and the modified Poisson statistical distribution for observing k events in a short time interval is investigated experimentally for various measuring times. The experiments to measure the influence of significant radioactive decay were performed with 89mY (T1/2 = 16.06 s), using a multichannel analyser (4096 channels) in the multiscaling mode. According to the results, Poisson statistics describe the counting experiment for short measuring times (up to T = 0.5 T1/2) and its application is recommended. However, analysis of the data demonstrated, with confidence, that for long measurements (T ≥ T1/2) the Poisson distribution is not valid and the modified Poisson function is preferable. The practical implications for calculating uncertainties and optimizing the measuring time are discussed. Differences between the standard deviations evaluated on the basis of the Poisson and binomial models are especially significant for experiments with long measuring times (T/T1/2 ≥ 2) and/or large detection efficiency (ε > 0.30). Optimization of the measuring time for paired observations yields the same solution for either the binomial or the Poisson distribution. (orig.)
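The quoted thresholds can be checked numerically: with N0 initial atoms, each is recorded with probability p = ε·(1 - exp(-λT)), so the binomial standard deviation √(Np(1-p)) falls below the Poisson value √(Np) as p grows. A sketch (illustrative N0):

```python
import numpy as np

def count_sd(n0, eff, t_over_half_life):
    """Standard deviation of recorded counts: binomial model vs. the
    Poisson approximation, for a source decaying during the measurement."""
    lam_t = np.log(2) * t_over_half_life
    p = eff * (1 - np.exp(-lam_t))   # P(an atom decays and is detected)
    mean = n0 * p
    return np.sqrt(mean * (1 - p)), np.sqrt(mean)

sd_binom, sd_pois = count_sd(n0=1e5, eff=0.35, t_over_half_life=2)
print(sd_binom, sd_pois)   # the models diverge for long runs, high efficiency
```

At T/T1/2 = 2 and ε = 0.35 the two standard deviations already differ by more than 10%, matching the regime the abstract flags.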
The issue of statistical power for overall model fit in evaluating structural equation models
Directory of Open Access Journals (Sweden)
Richard HERMIDA
2015-06-01
Statistical power is an important concept for psychological research. However, examining the power of a structural equation model (SEM) is rare in practice. This article provides an accessible review of the concept of statistical power for the Root Mean Square Error of Approximation (RMSEA) index of overall model fit in structural equation modeling. By way of example, we examine the current state of power in the literature by reviewing studies in top Industrial-Organizational (I/O) Psychology journals using SEMs. Results indicate that in many studies, power is very low, which implies acceptance of invalid models. Additionally, we examined methodological situations which may have an influence on the statistical power of SEMs. Results showed that power varies significantly as a function of model type and whether or not the model is the main model for the study. Finally, results indicated that power is significantly related to the model fit statistics used in evaluating SEMs. The results from this quantitative review imply that researchers should be more vigilant with respect to power in structural equation modeling. We therefore conclude by offering methodological best practices to increase confidence in the interpretation of structural equation modeling results with respect to statistical power issues.
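The RMSEA power computation reviewed here follows the noncentral chi-square approach of MacCallum, Browne and Sugawara (1996). A minimal sketch; the sample size, degrees of freedom and RMSEA values below are placeholders, not figures from the article:

    from scipy import stats

    def rmsea_power(n, df, eps0=0.05, eps_a=0.08, alpha=0.05):
        # Power of the test of close fit (H0: RMSEA = eps0 vs Ha: RMSEA = eps_a).
        nc0 = (n - 1) * df * eps0 ** 2   # noncentrality under H0
        nca = (n - 1) * df * eps_a ** 2  # noncentrality under Ha
        crit = stats.ncx2.ppf(1 - alpha, df, nc0)
        return stats.ncx2.sf(crit, df, nca)

    print(f"power = {rmsea_power(n=200, df=40):.2f}")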
Statistical Tests for Mixed Linear Models
Khuri, André I; Sinha, Bimal K
2011-01-01
An advanced discussion of linear models with mixed or random effects. In recent years a breakthrough has occurred in our ability to draw inferences from exact and optimum tests of variance component models, generating much research activity that relies on linear models with mixed and random effects. This volume covers the most important research of the past decade as well as the latest developments in hypothesis testing. It compiles all currently available results in the area of exact and optimum tests for variance component models and offers the only comprehensive treatment for these models a
Statistical image processing and multidimensional modeling
Fieguth, Paul
2010-01-01
Images are all around us! The proliferation of low-cost, high-quality imaging devices has led to an explosion in acquired images. When these images are acquired from a microscope, telescope, satellite, or medical imaging device, there is a statistical image processing task: the inference of something - an artery, a road, a DNA marker, an oil spill - from imagery, possibly noisy, blurry, or incomplete. A great many textbooks have been written on image processing. However this book does not so much focus on images, per se, but rather on spatial data sets, with one or more measurements taken over
Directory of Open Access Journals (Sweden)
Dominic Beaulieu-Prévost
2006-03-01
For the last 50 years of research in quantitative social sciences, the empirical evaluation of scientific hypotheses has been based on the rejection or not of the null hypothesis. However, more than 300 articles have demonstrated that this method is problematic. In summary, null hypothesis testing (NHT) is unfalsifiable, its results depend directly on sample size, and the null hypothesis is both improbable and implausible. Consequently, alternatives to NHT such as confidence intervals (CI) and measures of effect size are starting to be used in scientific publications. The purpose of this article is, first, to provide the conceptual tools necessary to implement an approach based on confidence intervals, and second, to briefly demonstrate why such an approach is an interesting alternative to an approach based on NHT. As demonstrated in the article, the proposed CI approach avoids most problems related to a NHT approach and can often improve the scientific and contextual relevance of the statistical interpretations by testing range hypotheses instead of a point hypothesis and by defining the minimal value of a substantial effect. The main advantage of such a CI approach is that it replaces the notion of statistical power by an easily interpretable three-value logic (probable presence of a substantial effect, probable absence of a substantial effect, and probabilistic undetermination). The demonstration includes a complete example.
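A minimal sketch of the three-value logic described above, for a two-sample comparison with an assumed minimal substantial effect of 0.2 standard deviations (both the threshold and the rough standard error are illustrative simplifications):

    import numpy as np
    from scipy import stats

    def ci_verdict(x, y, min_effect=0.2, conf=0.95):
        # Standardized mean difference and a rough large-sample standard error.
        d = (x.mean() - y.mean()) / np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
        se = np.sqrt(1 / len(x) + 1 / len(y))
        z = stats.norm.ppf(1 - (1 - conf) / 2)
        lo, hi = d - z * se, d + z * se
        if lo > min_effect or hi < -min_effect:
            return "probable presence of a substantial effect"
        if -min_effect < lo and hi < min_effect:
            return "probable absence of a substantial effect"
        return "probabilistic undetermination"

    rng = np.random.default_rng(1)
    print(ci_verdict(rng.normal(0.5, 1, 80), rng.normal(0.0, 1, 80)))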
DEFF Research Database (Denmark)
Madsen, Tobias
2017-01-01
. These challenges include model selection and multiple testing problems. On the other hand, in large datasets we can often exploit hierarchical structures to improve inference: E.g. in differential expression studies with multiple genes, it is natural to define a distribution of the variability of each gene......-point approximation and an importance sampling scheme that are fast to evaluate yet accurate. We demonstrate the methods on multiple models including the Poisson-binomial model, a high-order Markov chain motif model and phylogenetic trees of sequence evolution. The methods are implemented in a publicly available R...
Statistical modeling and extrapolation of carcinogenesis data
International Nuclear Information System (INIS)
Krewski, D.; Murdoch, D.; Dewanji, A.
1986-01-01
Mathematical models of carcinogenesis are reviewed, including pharmacokinetic models for metabolic activation of carcinogenic substances. Maximum likelihood procedures for fitting these models to epidemiological data are discussed, including situations where the time to tumor occurrence is unobservable. The plausibility of different possible shapes of the dose response curve at low doses is examined, and a robust method for linear extrapolation to low doses is proposed and applied to epidemiological data on radiation carcinogenesis.
Statistical Model Selection for TID Hardness Assurance
Ladbury, R.; Gorelick, J. L.; McClure, S.
2010-01-01
Radiation Hardness Assurance (RHA) methodologies against Total Ionizing Dose (TID) degradation impose rigorous statistical treatments for data from a part's Radiation Lot Acceptance Test (RLAT) and/or its historical performance. However, no similar methods exist for using "similarity" data - that is, data for similar parts fabricated in the same process as the part under qualification. This is despite the greater difficulty and potential risk in interpreting similarity data. In this work, we develop methods to disentangle part-to-part, lot-to-lot and part-type-to-part-type variation. The methods we develop apply not just for qualification decisions, but also for quality control and detection of process changes and other "out-of-family" behavior. We begin by discussing the data used in the study and the challenges of developing a statistic providing a meaningful measure of degradation across multiple part types, each with its own performance specifications. We then develop analysis techniques and apply them to the different data sets.
Multivariate statistical modelling based on generalized linear models
Fahrmeir, Ludwig
1994-01-01
This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...
Directory of Open Access Journals (Sweden)
Simone Fiori
2007-07-01
Bivariate statistical modeling from incomplete data is a useful statistical tool that allows one to discover the model underlying two data sets when the data in the two sets correspond neither in size nor in ordering. Such a situation may occur when the sizes of the two data sets do not match (i.e., there are “holes” in the data) or when the data sets have been acquired independently. Also, statistical modeling is useful when the amount of available data is enough to show relevant statistical features of the phenomenon underlying the data. We propose to tackle the problem of statistical modeling via a neural (nonlinear) system that is able to match its input-output statistic to the statistic of the available data sets. A key point of the new implementation proposed here is that it is based on look-up-table (LUT) neural systems, which guarantee a computationally advantageous way of implementing neural systems. A number of numerical experiments, performed on both synthetic and real-world data sets, illustrate the features of the proposed modeling procedure.
Statistical Modelling of Extreme Rainfall in Taiwan
L-F. Chu (Lan-Fen); M.J. McAleer (Michael); C-C. Chang (Ching-Chung)
2012-01-01
In this paper, the annual maximum daily rainfall data from 1961 to 2010 are modelled for 18 stations in Taiwan. We fit the rainfall data with stationary and non-stationary generalized extreme value distributions (GEV), and estimate their future behaviour based on the best fitting model.
Statistical Modelling of Extreme Rainfall in Taiwan
L. Chu (LanFen); M.J. McAleer (Michael); C-H. Chang (Chu-Hsiang)
2013-01-01
In this paper, the annual maximum daily rainfall data from 1961 to 2010 are modelled for 18 stations in Taiwan. We fit the rainfall data with stationary and non-stationary generalized extreme value distributions (GEV), and estimate their future behaviour based on the best fitting model.
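The core fitting step in both versions of this paper can be sketched with scipy's generalized extreme value distribution; the synthetic annual maxima below stand in for the Taiwan station records, which are not reproduced in the abstracts:

    import numpy as np
    from scipy import stats

    # Synthetic stand-in for 50 years of annual maximum daily rainfall (mm).
    annual_max = stats.genextreme.rvs(c=-0.1, loc=150, scale=40, size=50,
                                      random_state=np.random.default_rng(2))

    # Fit a stationary GEV; scipy's shape c is the negative of the usual xi.
    c, loc, scale = stats.genextreme.fit(annual_max)
    r100 = stats.genextreme.ppf(1 - 1 / 100, c, loc, scale)  # 100-year return level
    print(f"xi = {-c:.2f}, loc = {loc:.1f}, scale = {scale:.1f}, 100-yr level = {r100:.0f} mm")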
Statistical modelling of traffic safety development
DEFF Research Database (Denmark)
Christens, Peter
2004-01-01
Road safety is a major concern for society and individuals. Although road safety has improved in recent years, the number of road fatalities is still unacceptably high. In 2000, road accidents killed over 40,000 people in the European Union and injured more than 1.7 million. In 2001 in Denmark...... there were 6861 injury traffic accidents reported by the police, resulting in 4519 minor injuries, 3946 serious injuries, and 431 fatalities. The general purpose of the research was to improve the insight into aggregated road safety methodology in Denmark. The aim was to analyse advanced statistical methods......, that were designed to study developments over time, including effects of interventions. This aim has been achieved by investigating variations in aggregated Danish traffic accident series and by applying state of the art methodologies to specific case studies. The thesis comprises an introduction...
A Noise Robust Statistical Texture Model
DEFF Research Database (Denmark)
Hilger, Klaus Baggesen; Stegmann, Mikkel Bille; Larsen, Rasmus
2002-01-01
This paper presents a novel approach to the problem of obtaining a low dimensional representation of texture (pixel intensity) variation present in a training set after alignment using a Generalised Procrustes analysis. We extend the conventional analysis of training textures in the Active...... Appearance Models segmentation framework. This is accomplished by augmenting the model with an estimate of the covariance of the noise present in the training data. This results in a more compact model maximising the signal-to-noise ratio, thus favouring subspaces rich in signal, but low on noise....... Differences in the methods are illustrated on a set of left cardiac ventricles obtained using magnetic resonance imaging....
Statistical models for nuclear decay from evaporation to vaporization
Cole, A J
2000-01-01
Elements of equilibrium statistical mechanics: Introduction. Microstates and macrostates. Sub-systems and convolution. The Boltzmann distribution. Statistical mechanics and thermodynamics. The grand canonical ensemble. Equations of state for ideal and real gases. Pseudo-equilibrium. Statistical models of nuclear decay. Nuclear physics background: Introduction. Elements of the theory of nuclear reactions. Quantum mechanical description of scattering from a potential. Decay rates and widths. Level and state densities in atomic nuclei. Angular momentum in quantum mechanics. History of statistical
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Estimating Predictive Variance for Statistical Gas Distribution Modelling
International Nuclear Information System (INIS)
Lilienthal, Achim J.; Asadi, Sahar; Reggente, Matteo
2009-01-01
Recent publications in statistical gas distribution modelling have proposed algorithms that model the mean and variance of a distribution. This paper argues that estimating the predictive concentration variance is not merely a gradual improvement but a significant step for advancing the field. First, such models better fit the particular structure of gas distributions, which exhibit strong fluctuations with considerable spatial variations as a result of the intermittent character of gas dispersal. Second, estimating the predictive variance makes it possible to evaluate model quality in terms of the data likelihood. This offers a solution to the problem of ground-truth evaluation, which has always been a critical issue for gas distribution modelling. It also enables solid comparisons of different modelling approaches, and provides the means to learn meta-parameters of the model, to determine when the model should be updated or re-initialised, or to suggest new measurement locations based on the current model. We also point out directions of related ongoing or potential future research work.
Determining the statistical significance of particle precipitation related to EMIC waves.
Shin, D. K.; Lee, D. Y.; Noh, S. J.; Hwang, J.; Lee, J.
2016-12-01
One of the particle loss processes in the magnetosphere is precipitation into the Earth's atmosphere caused by electromagnetic ion cyclotron (EMIC) waves through pitch angle scattering. These precipitating particles can affect the dynamics of ring current protons (tens of keV) and radiation belt electrons (MeV) in the inner magnetosphere. Although there have been many reports supporting precipitation by EMIC waves, its effectiveness has not been demonstrated statistically. In this study, we use Van Allen Probes observations to identify a large number of EMIC waves, for which we then determine their association with relativistic electron and energetic (30-80 keV) proton precipitation observed at NOAA low-Earth-orbit satellites. We find that the detection rate of precipitation given EMIC waves in space strongly depends on the number of available low-altitude satellites: the average detection rates by one low-altitude satellite are 8.4 % for electrons and 22.2 % for protons, and they increase by a factor of > 2 if one uses observations from five NOAA satellites. This implies a strong MLT dependence of precipitation given EMIC waves in space. To demonstrate this we determine the MLT distribution of precipitation as a function of the MLT of the identified EMIC wave locations. Finally, we determine the relationship between electron and proton precipitation, and the dependence of EMIC waves and precipitation on the solar cycle phase.
Introduction to statistical modelling: linear regression.
Lunt, Mark
2015-07-01
In many studies we wish to assess how a range of variables are associated with a particular outcome and also determine the strength of such relationships so that we can begin to understand how these factors relate to each other at a population level. Ultimately, we may also be interested in predicting the outcome from a series of predictive factors available at, say, a routine clinic visit. In a recent article in Rheumatology, Desai et al. did precisely that when they studied the prediction of hip and spine BMD from hand BMD and various demographic, lifestyle, disease and therapy variables in patients with RA. This article aims to introduce the statistical methodology that can be used in such a situation and explain the meaning of some of the terms employed. It will also outline some common pitfalls encountered when performing such analyses. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
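As a companion to this introduction, a bare-bones multiple linear regression; the outcome and predictors are invented stand-ins, not the BMD variables of Desai et al.:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 120
    hand = rng.normal(0.45, 0.08, n)     # invented predictor
    age = rng.uniform(40, 80, n)         # invented predictor
    spine = 0.6 + 1.1 * hand - 0.004 * age + rng.normal(0, 0.05, n)

    X = np.column_stack([np.ones(n), hand, age])      # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, spine, rcond=None)  # ordinary least squares
    r2 = 1 - (spine - X @ beta).var() / spine.var()
    print(f"intercept = {beta[0]:.2f}, slopes = {beta[1]:.2f}, {beta[2]:.4f}, R^2 = {r2:.2f}")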
Krumbholz, Aniko; Anielski, Patricia; Gfrerer, Lena; Graw, Matthias; Geyer, Hans; Schänzer, Wilhelm; Dvorak, Jiri; Thieme, Detlef
2014-01-01
Clenbuterol is a well-established β2-agonist, which is prohibited in sports and strictly regulated for use in the livestock industry. During the last few years clenbuterol-positive results in doping controls and in samples from residents or travellers from a high-risk country were suspected to be related to the illegal use of clenbuterol for fattening. A sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS) method was developed to detect low clenbuterol residues in hair with a detection limit of 0.02 pg/mg. A sub-therapeutic application study and a field study with volunteers, who have a high risk of contamination, were performed. For the application study, a total dosage of 30 µg clenbuterol was applied to 20 healthy volunteers on 5 subsequent days. One month after the beginning of the application, clenbuterol was detected in the proximal hair segment (0-1 cm) in concentrations between 0.43 and 4.76 pg/mg. For the second part, samples of 66 Mexican soccer players were analyzed. In 89% of these volunteers, clenbuterol was detectable in their hair at concentrations between 0.02 and 1.90 pg/mg. A comparison of both parts showed no statistical difference between sub-therapeutic application and contamination. In contrast, discrimination from a typical abuse of clenbuterol is apparently possible. Due to these findings, results of real doping control samples can be evaluated. Copyright © 2014 John Wiley & Sons, Ltd.
Latent domain models for statistical machine translation
Hoàng, C.
2017-01-01
A data-driven approach to model translation suffers from the data mismatch problem and demands domain adaptation techniques. Given parallel training data originating from a specific domain, training an MT system on the data would result in a rather suboptimal translation for other domains. But does
Statistical modelling of fine red wine production
Directory of Open Access Journals (Sweden)
María Rosa Castro
2010-01-01
Producing wine is a very important economic activity in the province of San Juan in Argentina; it is therefore most important to predict production from the quantity of raw material needed. This work was aimed at obtaining a model relating kilograms of crushed grape to the litres of wine so produced. Such a model will be used for predicting precise future values and confidence intervals for given quantities of crushed grapes. Data from a vineyard in the province of San Juan was thus used in this work. The sample correlation coefficient was calculated and a scatter diagram was then constructed; this indicated a linear relationship between the litres of wine obtained and the kilograms of crushed grape. Two linear models were then adopted, and analysis of variance was carried out because the data came from normal populations having the same variance. The most appropriate model was obtained from this analysis; it was validated with experimental values, a good approximation being obtained.
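The prediction-interval calculation this abstract describes is the textbook one for simple linear regression; a sketch with invented (grape, wine) pairs in place of the San Juan vineyard data:

    import numpy as np
    from scipy import stats

    kg = np.array([1200., 1500, 1800, 2100, 2500, 2900, 3300, 3800])
    litres = 0.65 * kg + np.random.default_rng(4).normal(0, 40, kg.size)

    n = kg.size
    slope, intercept = np.polyfit(kg, litres, 1)
    resid = litres - (intercept + slope * kg)
    s = np.sqrt((resid ** 2).sum() / (n - 2))        # residual standard error

    x0 = 2000.0                                      # new crush, kg
    se_pred = s * np.sqrt(1 + 1 / n + (x0 - kg.mean()) ** 2 / ((kg - kg.mean()) ** 2).sum())
    t = stats.t.ppf(0.975, n - 2)
    y0 = intercept + slope * x0
    print(f"predicted litres: {y0:.0f} +/- {t * se_pred:.0f} (95% prediction interval)")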
Behavioral and statistical models of educational inequality
DEFF Research Database (Denmark)
Holm, Anders; Breen, Richard
2016-01-01
This paper addresses the question of how students and their families make educational decisions. We describe three types of behavioral model that might underlie decision-making and we show that they have consequences for what decisions are made. Our study thus has policy implications if we wish...
Statistical model semiquantitatively approximates arabinoxylooligosaccharides' structural diversity
DEFF Research Database (Denmark)
Dotsenko, Gleb; Nielsen, Michael Krogsgaard; Lange, Lene
2016-01-01
(wheat flour arabinoxylan (arabinose/xylose, A/X = 0.47); grass arabinoxylan (A/X = 0.24); wheat straw arabinoxylan (A/X = 0.15); and hydrothermally pretreated wheat straw arabinoxylan (A/X = 0.05)), is semiquantitatively approximated using the proposed model. The suggested approach can be applied...
A STATISTICAL MODEL FOR STOCK ASSESSMENT OF ...
African Journals Online (AJOL)
Assessment of the status of southern bluefin tuna (SBT) by Australia and Japan has used a method (ADAPT) that imposes a number of structural restrictions, and is ... over time within the bounds of specific structure, and (3) autocorrelation in recruitment processes is considered within the likelihood framework of the model.
Model output statistics applied to wind power prediction
Energy Technology Data Exchange (ETDEWEB)
Joensen, A.; Giebel, G.; Landberg, L. [Risoe National Lab., Roskilde (Denmark); Madsen, H.; Nielsen, H.A. [The Technical Univ. of Denmark, Dept. of Mathematical Modelling, Lyngby (Denmark)
1999-03-01
Being able to predict the output of a wind farm online for a day or two in advance has significant advantages for utilities, such as better possibility to schedule fossil fuelled power plants and a better position on electricity spot markets. In this paper prediction methods based on Numerical Weather Prediction (NWP) models are considered. The spatial resolution used in NWP models implies that these predictions are not valid locally at a specific wind farm. Furthermore, due to the non-stationary nature and complexity of the processes in the atmosphere, and occasional changes of NWP models, the deviation between the predicted and the measured wind will be time dependent. If observational data is available, and if the deviation between the predictions and the observations exhibits systematic behavior, this should be corrected for; if statistical methods are used, this approach is usually referred to as MOS (Model Output Statistics). The influence of atmospheric turbulence intensity, topography, prediction horizon length and auto-correlation of wind speed and power is considered, and to take the time-variations into account, adaptive estimation methods are applied. Three estimation techniques are considered and compared: Extended Kalman Filtering, recursive least squares and a new modified recursive least squares algorithm. (au) EU-JOULE-3. 11 refs.
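Of the three estimators compared, recursive least squares is compact enough to sketch; the forgetting factor stands in for the adaptivity of the paper's modified algorithm, whose exact form the abstract does not give, and the data are synthetic:

    import numpy as np

    def rls_update(theta, P, x, y, lam=0.995):
        # One recursive least squares step with forgetting factor lam.
        x = x.reshape(-1, 1)
        k = P @ x / (lam + x.T @ P @ x)          # gain vector
        theta = theta + (k * (y - x.T @ theta)).ravel()
        P = (P - k @ x.T @ P) / lam
        return theta, P

    # Adaptively correct NWP wind forecasts toward farm observations.
    rng = np.random.default_rng(5)
    theta, P = np.zeros(2), np.eye(2) * 100.0
    for _ in range(500):
        nwp = rng.uniform(3, 15)                       # forecast wind speed, m/s
        obs = 0.8 * nwp + 1.2 + rng.normal(0, 0.5)     # assumed local relation
        theta, P = rls_update(theta, P, np.array([nwp, 1.0]), obs)
    print(f"estimated MOS correction: obs ~ {theta[0]:.2f}*nwp + {theta[1]:.2f}")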
Williams, Arnold C.; Pachowicz, Peter W.
2004-09-01
Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms is of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently overtrained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision Laboratory and Electronic Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets, care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained by processing this multilook data for the high-resolution SAR data of the Veridian X-band radar. We discuss the implications of these results for mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.
van Aert, Robbie C M; van Assen, Marcel A L M
2017-09-21
The unrealistically high rate of positive results within psychology has increased the attention to replication research. However, researchers who conduct a replication and want to statistically combine the results of their replication with a statistically significant original study encounter problems when using traditional meta-analysis techniques. The original study's effect size is most probably overestimated because it is statistically significant, and this bias is not taken into consideration in traditional meta-analysis. We have developed a hybrid method that does take the statistical significance of an original study into account and enables (a) accurate effect size estimation, (b) estimation of a confidence interval, and (c) testing of the null hypothesis of no effect. We analytically approximate the performance of the hybrid method and describe its statistical properties. By applying the hybrid method to data from the Reproducibility Project: Psychology (Open Science Collaboration, 2015), we demonstrate that the conclusions based on the hybrid method are often in line with those of the replication, suggesting that many published psychological studies have smaller effect sizes than those reported in the original study, and that some effects may even be absent. We offer hands-on guidelines for how to statistically combine an original study and replication, and have developed a Web-based application ( https://rvanaert.shinyapps.io/hybrid ) for applying the hybrid method.
Fluctuations of offshore wind generation: Statistical modelling
DEFF Research Database (Denmark)
Pinson, Pierre; Christensen, Lasse E.A.; Madsen, Henrik
2007-01-01
The magnitude of power fluctuations at large offshore wind farms has a significant impact on the control and management strategies of their power output. If focusing on the minute scale, one observes successive periods with smaller and larger power fluctuations. It seems that different regimes yi...
Modeling statistical properties of written text.
Directory of Open Access Journals (Sweden)
M Angeles Serrano
Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the nontrivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
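Two of the regularities named here take only a few lines to measure on any corpus; a toy check, assuming a plain-text file of the reader's choosing at the hypothetical path corpus.txt:

    import re
    from collections import Counter

    words = re.findall(r"[a-z']+", open("corpus.txt", encoding="utf-8").read().lower())

    # Zipf: frequency of the r-th most common word, to compare against ~ 1/r.
    print("top-10 frequencies:", [c for _, c in Counter(words).most_common(10)])

    # Heaps: vocabulary size V(n) after n tokens, expected to grow sublinearly.
    seen, growth = set(), []
    step = max(1, len(words) // 10)
    for i, w in enumerate(words, 1):
        seen.add(w)
        if i % step == 0:
            growth.append((i, len(seen)))
    print("(tokens, vocabulary):", growth)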
Advanced data analysis in neuroscience integrating statistical and computational models
Durstewitz, Daniel
2017-01-01
This book is intended for use in advanced graduate courses in statistics / machine learning, as well as for all experimental neuroscientists seeking to understand statistical methods at a deeper level, and theoretical neuroscientists with a limited background in statistics. It reviews almost all areas of applied statistics, from basic statistical estimation and test theory, linear and nonlinear approaches for regression and classification, to model selection and methods for dimensionality reduction, density estimation and unsupervised clustering. Its focus, however, is linear and nonlinear time series analysis from a dynamical systems perspective, based on which it aims to convey an understanding also of the dynamical mechanisms that could have generated observed time series. Further, it integrates computational modeling of behavioral and neural dynamics with statistical estimation and hypothesis testing. This way computational models in neuroscience are not only explanatory frameworks, but become powerfu...
Domain analysis and modeling to improve comparability of health statistics.
Okada, M; Hashimoto, H; Ohida, T
2001-01-01
Health statistics are an essential element in improving the ability of managers of health institutions, healthcare researchers, policy makers, and health professionals to formulate appropriate courses of action and to make decisions based on evidence. To ensure adequate health statistics, standards are of critical importance. A study on healthcare statistics domain analysis is underway in an effort to improve the usability and comparability of health statistics. The ongoing study focuses on structuring the domain knowledge and making the knowledge explicit with a data element dictionary being the core. Supplemental to the dictionary are a domain term list, a terminology dictionary, and a data model to help organize the concepts constituting the health statistics domain.
Tests of Statistical Significance and Background Estimation in Gamma-Ray Air Shower Experiments
Fleysher, R.; Fleysher, L.; Nemethy, P.; Mincer, A. I.; Haines, T. J.
2004-03-01
In this paper we discuss established methods of significance calculation for testing the existence of a signal in the presence of unknown background and point out the limits of their applicability. We then introduce a new self-consistent scheme for source detection and discuss some of its properties. The method overcomes weaknesses of those used previously and allows incorporating background anisotropies by vetoing existing localized sources and sinks on the sky and compensating for known large-scale anisotropies. By giving an example using the Milagro gamma-ray observatory data, we demonstrate how the method can be employed to relax the detector stability assumption. The new method is universal and can be used with any large field-of-view detector, in which the object of investigation, steady or transient, point or extended, traverses its field of view.
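For orientation, the conventional starting point that such schemes refine is the Li and Ma (1983, Eq. 17) likelihood-ratio significance for on/off counting; this sketch shows that baseline, not the new self-consistent method introduced in the paper:

    import numpy as np

    def li_ma_significance(n_on, n_off, alpha):
        # alpha = on-source exposure / off-source exposure.
        term_on = n_on * np.log((1 + alpha) / alpha * n_on / (n_on + n_off))
        term_off = n_off * np.log((1 + alpha) * n_off / (n_on + n_off))
        return np.sqrt(2.0 * (term_on + term_off))

    # 130 on-source counts against 1000 off-source counts with alpha = 0.1
    # (an expected background of 100, i.e. an excess of 30 counts).
    print(f"{li_ma_significance(130, 1000, 0.1):.2f} sigma")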
Directory of Open Access Journals (Sweden)
Márcio Mourão
We investigated commonly used methods (Autocorrelation, Enright, and Discrete Fourier Transform) to estimate the periodicity of oscillatory data and determine which method most accurately estimated periods while being least vulnerable to the presence of noise. Both simulated and experimental data were used in the analysis performed. We determined the significance of calculated periods by applying these methods to several random permutations of the data and then calculating the probability of obtaining the period's peak in the corresponding periodograms. Our analysis suggests that the Enright method is the most accurate for estimating the period of oscillatory data. We further show that to accurately estimate the period of oscillatory data, it is necessary that at least five cycles of data are sampled, using at least four data points per cycle. These results suggest that the Enright method should be more widely applied in order to improve the analysis of oscillatory data.
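The permutation scheme described above is straightforward for the Fourier case: compare the observed periodogram peak against peaks from shuffled copies of the series. A sketch with simulated data honouring the five-cycles, four-points-per-cycle guideline:

    import numpy as np

    rng = np.random.default_rng(6)
    t = np.arange(200)                       # 8 cycles of period 25, 25 points/cycle
    y = np.sin(2 * np.pi * t / 25) + rng.normal(0, 1.0, t.size)

    def peak_power(x):
        p = np.abs(np.fft.rfft(x - x.mean())) ** 2
        return p[1:].max()                   # ignore the zero-frequency bin

    obs = peak_power(y)
    null = [peak_power(rng.permutation(y)) for _ in range(999)]
    p_value = (1 + sum(n >= obs for n in null)) / (1 + len(null))
    print(f"permutation p-value for the dominant period: {p_value:.3f}")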
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Spectral statistics in particles-rotor model and cranking model
Zhou Xian Rong; Zhao En Guang; Guo Lu
2002-01-01
Spectral statistics for six particles in single-j and two-j models coupled with a deformed core are studied in the frameworks of the particles-rotor model and the cranking shell model. The nearest-neighbor distribution of energy levels and the spectral rigidity are studied as a function of the spin or the cranking frequency, respectively. The results for the single-j shell are compared with those for the two-j case. The system becomes more regular when the single-j space (i13/2) is replaced by the two-j shell (g7/2 + d5/2), although the basis size of the configuration space is unchanged. However, the degree of chaoticity of the system changes only slightly when the configuration space is enlarged by extending the single-j shell (i13/2) to the two-j shell (i13/2 + g9/2). Nuclear chaotic behavior is studied when the authors take the two-body interaction to be a delta force or a pairing interaction, respectively.
Saha, Ranajit; Pan, Sudip; Chattaraj, Pratim K
2016-11-05
The validity of the maximum hardness principle (MHP) is tested in the cases of 50 chemical reactions, most of which are organic in nature and exhibit anomeric effect. To explore the effect of the level of theory on the validity of MHP in an exothermic reaction, B3LYP/6-311++G(2df,3pd) and LC-BLYP/6-311++G(2df,3pd) (def2-QZVP for iodine and mercury) levels are employed. Different approximations like the geometric mean of hardness and combined hardness are considered in case there are multiple reactants and/or products. It is observed that, based on the geometric mean of hardness, while 82% of the studied reactions obey the MHP at the B3LYP level, 84% of the reactions follow this rule at the LC-BLYP level. Most of the reactions possess the hardest species on the product side. A 50% null hypothesis is rejected at a 1% level of significance.
A Statistical Graphical Model of the California Reservoir System
Taeb, A.; Reager, J. T.; Turmon, M.; Chandrasekaran, V.
2017-11-01
The recent California drought has highlighted the potential vulnerability of the state's water management infrastructure to multiyear dry intervals. Due to the high complexity of the network, dynamic storage changes in California reservoirs on a state-wide scale have previously been difficult to model using either traditional statistical or physical approaches. Indeed, although there is a significant line of research on exploring models for single (or a small number of) reservoirs, these approaches are not amenable to a system-wide modeling of the California reservoir network due to the spatial and hydrological heterogeneities of the system. In this work, we develop a state-wide statistical graphical model to characterize the dependencies among a collection of 55 major California reservoirs across the state; this model is defined with respect to a graph in which the nodes index reservoirs and the edges specify the relationships or dependencies between reservoirs. We obtain and validate this model in a data-driven manner based on reservoir volumes over the period 2003-2016. A key feature of our framework is a quantification of the effects of external phenomena that influence the entire reservoir network. We further characterize the degree to which physical factors (e.g., state-wide Palmer Drought Severity Index (PDSI), average temperature, snow pack) and economic factors (e.g., consumer price index, number of agricultural workers) explain these external influences. As a consequence of this analysis, we obtain a system-wide health diagnosis of the reservoir network as a function of PDSI.
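In much-reduced form, the dependency-graph estimation this abstract describes can be illustrated with a graphical lasso; the authors' estimator additionally models latent system-wide factors, which this sketch omits, and the volumes here are synthetic rather than the 55-reservoir record:

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(7)
    common = rng.normal(size=(168, 1))       # shared driver, e.g. state-wide wetness
    X = 0.8 * common + 0.5 * rng.normal(size=(168, 8))   # 8 "reservoirs", 168 months

    precision = GraphicalLasso(alpha=0.05).fit(X).precision_
    edges = np.abs(precision) > 1e-3         # nonzero entries = conditional dependencies
    np.fill_diagonal(edges, False)
    print(f"{edges.sum() // 2} edges among 8 nodes")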
Statistical modelling in biostatistics and bioinformatics selected papers
Peng, Defen
2014-01-01
This book presents selected papers on statistical model development related mainly to the fields of Biostatistics and Bioinformatics. The coverage of the material falls squarely into the following categories: (a) Survival analysis and multivariate survival analysis, (b) Time series and longitudinal data analysis, (c) Statistical model development and (d) Applied statistical modelling. Innovations in statistical modelling are presented throughout each of the four areas, with some intriguing new ideas on hierarchical generalized non-linear models and on frailty models with structural dispersion, just to mention two examples. The contributors include distinguished international statisticians such as Philip Hougaard, John Hinde, Il Do Ha, Roger Payne and Alessandra Durio, among others, as well as promising newcomers. Some of the contributions have come from researchers working in the BIO-SI research programme on Biostatistics and Bioinformatics, centred on the Universities of Limerick and Galway in Ireland and fu...
Functional summary statistics for the Johnson-Mehl model
DEFF Research Database (Denmark)
Møller, Jesper; Ghorbani, Mohammad
The Johnson-Mehl germination-growth model is a spatio-temporal point process model which, among other things, has been used for the description of neurotransmitter datasets. However, for such datasets parametric Johnson-Mehl models fitted by maximum likelihood have not yet been evaluated by means...... of functional summary statistics. This paper therefore invents four functional summary statistics adapted to the Johnson-Mehl model, with two of them based on the second-order properties and the other two on the nuclei-boundary distances for the associated Johnson-Mehl tessellation. The functional summary...... statistics' theoretical properties are investigated, non-parametric estimators are suggested, and their usefulness for model checking is examined in a simulation study. The functional summary statistics are also used for checking fitted parametric Johnson-Mehl models for a neurotransmitter dataset.
Fitting statistical models in bivariate allometry.
Packard, Gary C; Birchard, Geoffrey F; Boardman, Thomas J
2011-08-01
Several attempts have been made in recent years to formulate a general explanation for what appear to be recurring patterns of allometric variation in morphology, physiology, and ecology of both plants and animals (e.g. the Metabolic Theory of Ecology, the Allometric Cascade, the Metabolic-Level Boundaries hypothesis). However, published estimates for parameters in allometric equations often are inaccurate, owing to undetected bias introduced by the traditional method for fitting lines to empirical data. The traditional method entails fitting a straight line to logarithmic transformations of the original data and then back-transforming the resulting equation to the arithmetic scale. Because of fundamental changes in distributions attending transformation of predictor and response variables, the traditional practice may cause influential outliers to go undetected, and it may result in an underparameterized model being fitted to the data. Also, substantial bias may be introduced by the insidious rotational distortion that accompanies regression analyses performed on logarithms. Consequently, the aforementioned patterns of allometric variation may be illusions, and the theoretical explanations may be wide of the mark. Problems attending the traditional procedure can be largely avoided in future research simply by performing preliminary analyses on arithmetic values and by validating fitted equations in the arithmetic domain. The goal of most allometric research is to characterize relationships between biological variables and body size, and this is done most effectively with data expressed in the units of measurement. Back-transforming from a straight line fitted to logarithms is not a generally reliable way to estimate an allometric equation in the original scale. © 2010 The Authors. Biological Reviews © 2010 Cambridge Philosophical Society.
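The bias the authors describe is easy to demonstrate: fit y = a*x^b on logarithms and back-transform, then fit in the arithmetic domain directly, and compare. Parameters and noise level below are invented:

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(8)
    x = rng.uniform(1, 100, 300)
    y = np.abs(2.0 * x ** 0.75 * (1 + rng.normal(0, 0.25, x.size)))

    # Traditional: straight line on logs, then back-transform.
    b_log, log_a = np.polyfit(np.log(x), np.log(y), 1)
    print(f"log-domain fit:        a = {np.exp(log_a):.2f}, b = {b_log:.3f}")

    # Alternative: nonlinear least squares in the original scale.
    (a_nl, b_nl), _ = curve_fit(lambda x, a, b: a * x ** b, x, y, p0=(1.0, 1.0))
    print(f"arithmetic-domain fit: a = {a_nl:.2f}, b = {b_nl:.3f}")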
International Nuclear Information System (INIS)
2005-01-01
For the years 2004 and 2005 the figures shown in the tables of Energy Review are partly preliminary. The annual statistics published in Energy Review are presented in more detail in a publication called Energy Statistics that comes out yearly. Energy Statistics also includes historical time-series over a longer period of time (see e.g. Energy Statistics, Statistics Finland, Helsinki 2004.) The applied energy units and conversion coefficients are shown in the back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supplies and total consumption of electricity (GWh), Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes, precautionary stock fees and oil pollution fees
Mixed deterministic statistical modelling of regional ozone air pollution
Kalenderski, Stoitchko
2011-03-17
We develop a physically motivated statistical model for regional ozone air pollution by separating the ground-level pollutant concentration field into three components, namely: transport, local production and large-scale mean trend mostly dominated by emission rates. The model is novel in the field of environmental spatial statistics in that it is a combined deterministic-statistical model, which gives a new perspective to the modelling of air pollution. The model is presented in a Bayesian hierarchical formalism, and explicitly accounts for advection of pollutants, using the advection equation. We apply the model to a specific case of regional ozone pollution, the Lower Fraser Valley of British Columbia, Canada. As a predictive tool, we demonstrate that the model vastly outperforms existing, simpler modelling approaches. Our study highlights the importance of simultaneously considering different aspects of an air pollution problem as well as taking into account the physical bases that govern the processes of interest. © 2011 John Wiley & Sons, Ltd.
Adams, James; Howsmon, Daniel P; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen
2017-01-01
A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. "Leave-one-out" cross-validation was used to ensure statistical independence of results. Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate Speech), but significant associations were found
Adams, James; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen
2017-01-01
Introduction A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. “Leave-one-out” cross-validation was used to ensure statistical independence of results. Results and Discussion Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate
Reliability of the ADI-R for the single case-part II: clinical versus statistical significance.
Cicchetti, Domenic V; Lord, Catherine; Koenig, Kathy; Klin, Ami; Volkmar, Fred R
2014-12-01
In an earlier investigation, the authors assessed the reliability of the ADI-R when multiple clinicians evaluated a single case, here a female 3-year-old toddler suspected of having an autism spectrum disorder (Cicchetti et al. in J Autism Dev Disord 38:764-770, 2008). Applying the clinical criteria of Cicchetti and Sparrow (Am J Ment Def 86:127-137, 1981) and those of Cicchetti et al. (Child Neuropsychol 126-137, 1995): 74 % of the ADI-R items showed 100 % agreement; 6 % showed excellent agreement; 7 % showed good agreement; 3 % manifested average agreement; and the remaining 10 % evidenced poor agreement. In this follow-up investigation, the authors described and applied a novel method for determining levels of statistical significance of the reliability coefficients obtained in the earlier investigation. It is based upon a modification of the Z test for comparing a given level of inter-examiner reliability with a lower limit value of 70 % (Dixon and Massey in Introduction to statistical analysis. McGraw-Hill, New York, 1957). Results indicated that every item producing a clinically acceptable level of inter-examiner reliability was also statistically significant. However, the reverse was not true, since a number of the items with statistically significant reliability levels did not reach levels of agreement that were clinically meaningful. This indicated that clinical significance was an accurate marker of statistical significance. The generalization of these findings to other areas of diagnostic interest and importance is also examined.
International Nuclear Information System (INIS)
2000-01-01
For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g., Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-March 2000, Energy exports by recipient country in January-March 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
International Nuclear Information System (INIS)
1999-01-01
For the years 1998 and 1999, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-June 1999, Energy exports by recipient country in January-June 1999, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
International Nuclear Information System (INIS)
2001-01-01
For the year 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions from the use of fossil fuels, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in 2000, Energy exports by recipient country in 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
Bayesian Test of Significance for Conditional Independence: The Multinomial Model
Directory of Open Access Journals (Sweden)
Pablo de Morais Andrade
2014-03-01
Conditional independence tests have received special attention lately in the machine learning and computational intelligence literature as an important indicator of the relationship among the variables used by their models. In the field of probabilistic graphical models, which includes Bayesian network models, conditional independence tests are especially important for the task of learning the probabilistic graphical model structure from data. In this paper, we propose the full Bayesian significance test for tests of conditional independence for discrete datasets. The full Bayesian significance test is a powerful Bayesian test for precise hypotheses, as an alternative to frequentist significance tests (characterized by the calculation of the p-value).
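For contrast with the Bayesian test proposed here, the frequentist p-value approach it is positioned against can be sketched as a stratified chi-square test: test X independent of Y within each level of Z and pool the statistics (a standard construction, not the paper's method):

    import numpy as np
    from scipy import stats

    def ci_chi2(table):
        # Chi-square test of X _||_ Y | Z for a (|Z|, |X|, |Y|) count array.
        stat = dof = 0.0
        for stratum in table:                # one X-by-Y table per z value
            s, _, d, _ = stats.chi2_contingency(stratum, correction=False)
            stat, dof = stat + s, dof + d
        return stat, dof, stats.chi2.sf(stat, dof)

    rng = np.random.default_rng(9)
    counts = rng.integers(5, 40, size=(3, 2, 2))   # synthetic 2x2 tables at 3 z levels
    print("stat = %.2f, dof = %d, p = %.3f" % ci_chi2(counts))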
Kolmogorov complexity, pseudorandom generators and statistical models testing
Czech Academy of Sciences Publication Activity Database
Šindelář, Jan; Boček, Pavel
2002-01-01
Roč. 38, č. 6 (2002), s. 747-759 ISSN 0023-5954 R&D Projects: GA ČR GA102/99/1564 Institutional research plan: CEZ:AV0Z1075907 Keywords : Kolmogorov complexity * pseudorandom generators * statistical models testing Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.341, year: 2002
Role of scaling in the statistical modelling of finance
Indian Academy of Sciences (India)
Economics and mathematical finance are multidisciplinary fields in which the tendency of statistical physicists to focus on universal laws has been criticized some- ..... is coherent and catches the essential statistical features of a long index history. A very important test for the proposed model concerns the scaling of the ...
Allan, G Michael; Finley, Caitlin R; McCormack, James; Kumar, Vivek; Kwong, Simon; Braschi, Emelie; Korownyk, Christina; Kolber, Michael R; Lindblad, Adriennne J; Babenko, Oksana; Garrison, Scott
2017-03-20
While journals and reporting guidelines recommend the presentation of confidence intervals, many authors adhere strictly to statistical significance testing. Our objective was to determine what proportions of not statistically significant (NSS) cardiovascular trials include potentially clinically meaningful effects in primary outcomes and if these are associated with authors' conclusions. Cardiovascular studies published in six high-impact journals between 1 January 2010 and 31 December 2014 were identified via PubMed. Two independent reviewers selected trials with major adverse cardiovascular events (stroke, myocardial infarction, or cardiovascular death) as primary outcomes and extracted data on trial characteristics, quality, and primary outcome. Potentially clinically meaningful effects were defined broadly as a relative risk point estimate ≤0.94 (based on the effects of ezetimibe) and/or a lower confidence interval ≤0.75 (based on the effects of statins). We identified 127 randomized trial comparisons from 3200 articles. The primary outcomes were statistically significant (SS) favoring treatment in 21% (27/127), NSS in 72% (92/127), and SS favoring control in 6% (8/127). In 61% of NSS trials (56/92), the point estimate and/or lower confidence interval included potentially meaningful effects. Both point estimate and confidence interval included potentially meaningful effects in 67% of trials (12/18) in which authors concluded that treatment was superior, in 28% (16/58) with a neutral conclusion, and in 6% (1/16) in which authors concluded that control was superior. In a sensitivity analysis, 26% of NSS trials would include potentially meaningful effects with relative risk thresholds of point estimate ≤0.85 and/or a lower confidence interval ≤0.65. Point estimates and/or confidence intervals included potentially clinically meaningful effects in up to 61% of NSS cardiovascular trials. Authors' conclusions often reflect potentially meaningful results of
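The screening rule used in this review reduces to a one-line check per trial, with the thresholds taken from the abstract itself:

    def potentially_meaningful(rr_point, rr_ci_lower, point_cut=0.94, lower_cut=0.75):
        # Broad definition from the review: relative risk point estimate <= 0.94
        # and/or lower confidence limit <= 0.75.
        return rr_point <= point_cut or rr_ci_lower <= lower_cut

    # A not-statistically-significant trial: RR 0.90 (95% CI 0.78 to 1.04).
    print(potentially_meaningful(0.90, 0.78))      # True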
Assessing risk factors for dental caries: a statistical modeling approach.
Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella
2015-01-01
The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most cases (i) these studies focus on a very specific subset of risk factors, and (ii) in the statistical modeling only a few candidate models are considered, and model selection is at best marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis is only marginally exploited; and inferences could be biased due to the omission of potentially important confounding variables from the model's specification. In this paper we argue that these limitations can be overcome by considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which requires inferences to be adjusted for 'optimism'.
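As a concrete illustration of this kind of model-space exploration, the sketch below ranks every subset of a small set of candidate predictors by AIC. It is a minimal sketch under stated assumptions only: the outcome and predictor names are hypothetical, the paper's actual model class and criteria may differ, and an exhaustive search over millions of models would in practice need pruning or parallelization.

```python
# Sketch of exhaustive model-space exploration with an information criterion.
# Illustrative only: 'caries_index' and the candidate predictor names are
# hypothetical, not the study's actual variables.
import itertools
import pandas as pd
import statsmodels.api as sm

def best_subset_by_aic(df: pd.DataFrame, outcome: str, candidates: list):
    """Fit an OLS model for every non-empty subset of candidate predictors
    and return (AIC, subset) pairs sorted so the lowest AIC comes first."""
    results = []
    for k in range(1, len(candidates) + 1):
        for subset in itertools.combinations(candidates, k):
            X = sm.add_constant(df[list(subset)])
            fit = sm.OLS(df[outcome], X).fit()
            results.append((fit.aic, subset))
    return sorted(results)

# Example usage with hypothetical risk factors:
# ranked = best_subset_by_aic(data, "caries_index",
#                             ["sugar_intake", "brushing_freq", "fluoride", "age"])
# print(ranked[0])   # best (lowest-AIC) candidate model
```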
Statistical Damage Detection of Civil Engineering Structures using ARMAV Models
DEFF Research Database (Denmark)
Andersen, P.; Kirkegaard, Poul Henning
In this paper a statistically based damage detection of a lattice steel mast is performed. By estimating the modal parameters and their uncertainties it is possible to detect whether some of the modal parameters have changed with statistical significance. The estimation of the uncertainties ...
Improving statistical reasoning theoretical models and practical implications
Sedlmeier, Peter
1999-01-01
This book focuses on how statistical reasoning works and on training programs that can exploit people's natural cognitive capabilities to improve their statistical reasoning. Training programs that take into account findings from evolutionary psychology and instructional theory are shown to have substantially larger effects that are more stable over time than previous training regimens. The theoretical implications are traced in a neural network model of human performance on statistical reasoning problems. This book appeals to judgment and decision making researchers and other cognitive scientists, as well as to teachers of statistics and probabilistic reasoning.
Schedulability of Herschel revisited using statistical model checking
DEFF Research Database (Denmark)
David, Alexandre; Larsen, Kim Guldstrand; Legay, Axel
2015-01-01
… to obtain some guarantee on the (un)schedulability of the model even in the presence of undecidability. Two methods are considered: symbolic model checking and statistical model checking. Since the model uses stop-watches, the reachability problem becomes undecidable, so we use an over-approximation technique. We can safely conclude that the system is schedulable for varying values of BCET. For the cases where deadlines are violated, we use polyhedra to try to confirm the witnesses. Our alternative method to confirm non-schedulability uses statistical model checking (SMC) to generate counterexamples.
Some remarks on the statistical model of heavy ion collisions
International Nuclear Information System (INIS)
Koch, V.
2003-01-01
This contribution is an attempt to assess what can be learned from the remarkable success of the statistical model in describing ratios of particle abundances in ultra-relativistic heavy ion collisions.
International Nuclear Information System (INIS)
2003-01-01
For the year 2002, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 2001, Statistics Finland, Helsinki 2002). The applied energy units and conversion coefficients are shown on the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: changes in GDP, energy consumption and electricity consumption; carbon dioxide emissions from fossil fuel use; coal consumption; consumption of natural gas; peat consumption; domestic oil deliveries; import prices of oil; consumer prices of principal oil products; fuel prices in heat production; fuel prices in electricity production; price of electricity by type of consumer; average monthly spot prices at the Nord Pool power exchange; total energy consumption by source and CO2 emissions; supply and total consumption of electricity (GWh); energy imports by country of origin in January-June 2003; energy exports by recipient country in January-June 2003; consumer prices of liquid fuels; consumer prices of hard coal, natural gas and indigenous fuels; price of natural gas by type of consumer; price of electricity by type of consumer; price of district heating by type of consumer; excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources; and excise taxes and precautionary stock and oil pollution fees on energy products.
International Nuclear Information System (INIS)
2000-01-01
For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: changes in the volume of GNP and energy consumption; changes in the volume of GNP and electricity; coal consumption; natural gas consumption; peat consumption; domestic oil deliveries; import prices of oil; consumer prices of principal oil products; fuel prices for heat production; fuel prices for electricity production; carbon dioxide emissions; total energy consumption by source and CO2 emissions; electricity supply; energy imports by country of origin in January-June 2000; energy exports by recipient country in January-June 2000; consumer prices of liquid fuels; consumer prices of hard coal, natural gas and indigenous fuels; average electricity price by type of consumer; price of district heating by type of consumer; excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources; and energy taxes and precautionary stock fees on oil products.
International Nuclear Information System (INIS)
2004-01-01
For the years 2003 and 2004, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot, Statistics Finland, Helsinki 2003, ISSN 0785-3165). The applied energy units and conversion coefficients are shown on the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: changes in GDP, energy consumption and electricity consumption; carbon dioxide emissions from fossil fuel use; coal consumption; consumption of natural gas; peat consumption; domestic oil deliveries; import prices of oil; consumer prices of principal oil products; fuel prices in heat production; fuel prices in electricity production; price of electricity by type of consumer; average monthly spot prices at the Nord Pool power exchange; total energy consumption by source and CO2 emissions; supply and total consumption of electricity (GWh); energy imports by country of origin in January-March 2004; energy exports by recipient country in January-March 2004; consumer prices of liquid fuels; consumer prices of hard coal, natural gas and indigenous fuels; price of natural gas by type of consumer; price of electricity by type of consumer; price of district heating by type of consumer; excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources; and excise taxes and precautionary stock and oil pollution fees.
Possibilities of the Statistical Scoring Models' Application at Lithuanian Banks
Dzidzevičiūtė, Laima
2013-01-01
The goal of this dissertation is to develop a rating system for Lithuanian companies based on a statistical scoring model and to assess the possibilities of applying this system at Lithuanian banks. The dissertation consists of three chapters. Development and application peculiarities of rating systems based on statistical scoring models are described in the first chapter. The second chapter presents the results of a survey of commercial banks and foreign bank branches operating in the coun...
A nonextensive statistical model for the nucleon structure function
Energy Technology Data Exchange (ETDEWEB)
Trevisan, Luis A. [Departamento de Matematica e Estatistica, Universidade Estadual de Ponta Grossa, 84010-790, Ponta Grossa, PR (Brazil); Mirez, Carlos [Instituto de Ciencia, Engenharia e Tecnologia - ICET, Universidade Federal dos Vales do Jequitinhonha e Mucuri - UFVJM, Campus do Mucuri, Rua do Cruzeiro 01, Jardim Sao Paulo, 39803-371, Teofilo Otoni, Minas Gerais (Brazil)
2013-03-25
We studied an application of nonextensive thermodynamics to describe the structure function of the nucleon, in a model where the usual Fermi-Dirac and Bose-Einstein energy distributions are replaced by their q-statistics equivalents. The parameters of the model are an effective temperature T, the q parameter (from Tsallis statistics), and two chemical potentials given by the corresponding up (u) and down (d) quark normalization in the nucleon.
Improved analyses using function datasets and statistical modeling
John S. Hogland; Nathaniel M. Anderson
2014-01-01
Raster modeling is an integral component of spatial analysis. However, conventional raster modeling techniques can require a substantial amount of processing time and storage space and have limited statistical functionality and machine learning algorithms. To address this issue, we developed a new modeling framework using C# and ArcObjects and integrated that framework...
Thiessen, Erik D
2017-01-05
Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274:1926-1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105:2745-2750; Thiessen & Yee 2010 Child Development 81:1287-1303; Saffran 2002 Journal of Memory and Language 47:172-196; Misyak & Christiansen 2012 Language Learning 62:302-331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39:246-263; Thiessen et al. 2013 Psychological Bulletin 139:792-814). From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik …
Flashover of a vacuum-insulator interface: A statistical model
Directory of Open Access Journals (Sweden)
W. A. Stygar
2004-07-01
Full Text Available We have developed a statistical model for the flashover of a 45° vacuum-insulator interface (such as would be found in an accelerator) subject to a pulsed electric field. The model assumes that the initiation of a flashover plasma is a stochastic process, that the characteristic statistical component of the flashover delay time is much greater than the plasma formative time, and that the average rate at which flashovers occur is a power-law function of the instantaneous value of the electric field. Under these conditions, we find that the flashover probability is given by 1 − exp(−E_p^β t_eff C / k^β), where E_p is the peak value in time of the spatially averaged electric field E(t), t_eff ≡ ∫[E(t)/E_p]^β dt is the effective pulse width, C is the insulator circumference, k ∝ exp(λ/d), and β and λ are constants. We define E(t) as V(t)/d, where V(t) is the voltage across the insulator and d is the insulator thickness. Since the model assumes that flashovers occur at random azimuthal locations along the insulator, it does not apply to systems that have a significant defect, i.e., a location contaminated with debris or compromised by an imperfection at which flashovers repeatedly take place, and which prevents a random spatial distribution. The model is consistent with flashover measurements to within 7% for pulse widths between 0.5 ns and 10 μs, and to within a factor of 2 between 0.5 ns and 90 s (a span of over 11 orders of magnitude). For these measurements, E_p ranges from 64 to 651 kV/cm, d from 0.50 to 4.32 cm, and C from 4.96 to 95.74 cm. The model is significantly more accurate, and is valid over a wider range of parameters, than the J. C. Martin flashover relation that has been in use since 1971 [J. C. Martin on Pulsed Power, edited by T. H. Martin, A. H. Guenther, and M. Kristiansen (Plenum, New York, 1996)]. We have generalized the statistical model to estimate the total-flashover probability of an …
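For readers who want to experiment with the quoted expression, the following is a minimal numerical sketch: it evaluates P = 1 − exp(−E_p^β t_eff C / k^β) for an assumed pulse shape. The waveform and the constants β, λ and the proportionality constant in k are illustrative placeholders, not the paper's fitted values.

```python
# Numerical sketch of the flashover-probability model quoted above.
# Pulse shape and constants (beta, lam, k0) are illustrative assumptions.
import numpy as np

def flashover_probability(t, E, d, C, beta, lam, k0):
    """t: time grid [s]; E: field waveform E(t) = V(t)/d [kV/cm];
    d: insulator thickness [cm]; C: circumference [cm]."""
    E_p = E.max()                              # peak spatially averaged field
    t_eff = np.trapz((E / E_p) ** beta, t)     # effective pulse width
    k = k0 * np.exp(lam / d)                   # k proportional to exp(lambda/d)
    return 1.0 - np.exp(-(E_p ** beta) * t_eff * C / k ** beta)

# Illustrative half-sine pulse, 1 microsecond wide, 300 kV/cm peak:
t = np.linspace(0.0, 1e-6, 1000)
E = 300.0 * np.sin(np.pi * t / 1e-6)
print(flashover_probability(t, E, d=2.0, C=50.0, beta=10.0, lam=1.0, k0=500.0))
```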
Statistics Based Models for the Dynamics of Chernivtsi Children Disease
Directory of Open Access Journals (Sweden)
Igor G. Nesteruk
2017-10-01
Full Text Available Background. Simple mathematical models of contamination and SIR-model of spreading an infection were used to simulate the time dynamics of the unknown before children disease, which occurred in Chernivtsi (Ukraine. The cause of many cases of alopecia, which began in this city in August 1988 is still not fully clarified. According to the official report of the governmental commission, the last new cases occurred in the middle of November 1988, and the reason of the illness was reported as chemical exogenous intoxication. Later this illness became the name “Chernivtsi chemical disease”. Nevertheless, the significantly increased number of new cases of the local alopecia was registered almost three years and is still not clarified. Objective. The comparison of two different versions of the disease: chemical exogenous intoxication and infection. Identification of the parameters of mathematical models and prediction of the disease development. Methods. Analytical solutions of the contamination models and SIR-model for an epidemic are obtained. The optimal values of parameters with the use of linear regression were found. Results. The optimal values of the models parameters with the use of statistical approach were identified. The calculations showed that the infectious version of the disease is more reliable in comparison with the popular contamination one. The possible date of the epidemic beginning was estimated. Conclusions. The optimal parameters of SIR-model allow calculating the realistic number of victims and other characteristics of possible epidemic. They also show that increased number of cases of local alopecia could be a part of the same epidemic as “Chernivtsi chemical disease”.
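Since the record leans on the classic SIR formulation, a compact sketch of that model may help; the initial conditions and rates below are arbitrary illustrations, not the fitted Chernivtsi values.

```python
# Minimal SIR sketch; parameters are illustrative, not the study's estimates.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    S, I, R = y
    N = S + I + R
    dS = -beta * S * I / N              # new infections
    dI = beta * S * I / N - gamma * I   # infections minus removals
    dR = gamma * I                      # recoveries/removals
    return [dS, dI, dR]

t = np.linspace(0, 180, 181)            # days
sol = odeint(sir, [990.0, 10.0, 0.0], t, args=(0.25, 0.1))
print("peak number infected:", sol[:, 1].max())
```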
A Statistical Model for Regional Tornado Climate Studies.
Directory of Open Access Journals (Sweden)
Thomas H Jagger
Full Text Available Tornado reports are locally rare, often clustered, and of variable quality, making it difficult to use them directly to describe regional tornado climatology. Here a statistical model is demonstrated that overcomes some of these difficulties and produces a smoothed regional-scale climatology of tornado occurrences. The model is applied to data aggregated at the county level. These data include annual population, annual tornado counts and an index of terrain roughness. The model has a term to capture the smoothed frequency relative to the state average. The model is used to examine whether terrain roughness is related to tornado frequency and whether there are differences in tornado activity by County Warning Area (CWA). A key finding is that tornado reports increase by 13% for a two-fold increase in population across Kansas, after accounting for improvements in rating procedures. Independent of this relationship, tornadoes have been increasing at an annual rate of 1.9%. Another finding is the pattern of correlated residuals showing more Kansas tornadoes in a corridor of counties running roughly north to south across the west-central part of the state, consistent with the dryline climatology. The model is significantly improved by adding terrain roughness. The effect amounts to an 18% reduction in the number of tornadoes for every ten-meter increase in elevation standard deviation. The model indicates that tornadoes are 51% more likely to occur in counties served by the CWAs of DDC and GID than elsewhere in the state. The flexibility of the model is illustrated by fitting it to data from Illinois, Mississippi, South Dakota, and Ohio.
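The county-count structure described here is natural territory for a Poisson regression with a population term; below is a minimal sketch under that assumption. The toy data and column names are invented, and the published model (with its smoothing and residual-correlation terms) is richer than this.

```python
# Sketch of a county-level Poisson regression in the spirit of the model
# described: tornado counts regressed on log-population and a terrain
# roughness covariate. Data and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "tornado_count": [3, 1, 5, 0, 2, 4],
    "population":    [12e3, 8e3, 30e3, 5e3, 15e3, 22e3],
    "elev_sd":       [12.0, 40.0, 8.0, 55.0, 20.0, 10.0],  # terrain roughness
})
X = sm.add_constant(np.log(df[["population"]]).join(df[["elev_sd"]]))
fit = sm.GLM(df["tornado_count"], X, family=sm.families.Poisson()).fit()
print(fit.summary())
# exp(coef) on log(population): multiplicative change in reports per unit of
# log-population; exp(coef) on elev_sd: effect per unit of terrain roughness.
```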
Models for probability and statistical inference theory and applications
Stapleton, James H
2007-01-01
This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readers. Models for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping. Ideal as a textbook for a two-semester sequence on probability and statistical inference, the early chapters provide coverage of probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...
Kim, Sung-Min; Choi, Yosoon
2017-06-18
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and the proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
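A compact sketch of the Getis-Ord Gi* statistic underlying this hot spot analysis follows; it uses the standard textbook form of Gi* with a toy binary weights matrix, and the sample values are invented for illustration.

```python
# Getis-Ord Gi* z-scores for a set of sample locations (standard formula;
# the weights matrix here is a toy 1-D neighborhood, including self-weight).
import numpy as np

def getis_ord_gi_star(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x: values at n locations; w: n-by-n spatial weights matrix that
    includes the self-weight (Gi*). Returns one z-score per location."""
    n = len(x)
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    wsum = w.sum(axis=1)
    num = w @ x - xbar * wsum
    den = s * np.sqrt((n * (w ** 2).sum(axis=1) - wsum ** 2) / (n - 1))
    return num / den

# Toy example: 5 sample points, each neighboring its immediate neighbors.
x = np.array([1.0, 2.0, 9.0, 8.0, 1.5])        # e.g. normalized PTE content
w = np.eye(5) + np.eye(5, k=1) + np.eye(5, k=-1)
print(getis_ord_gi_star(x, w))                 # high z-scores flag hot spots
```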
Monsarrat, Paul; Vergnes, Jean-Noel
2018-01-01
In medicine, effect sizes (ESs) allow the effects of independent variables (including risk/protective factors or treatment interventions) on dependent variables (e.g., health outcomes) to be quantified. Given that many public health decisions and health care policies are based on ES estimates, it is important to assess how ESs are used in the biomedical literature and to investigate potential trends in their reporting over time. Through a big data approach, the text mining process automatically extracted 814 120 ESs from 13 322 754 PubMed abstracts. Eligible ESs were risk ratio, odds ratio, and hazard ratio, along with their confidence intervals. Here we show a remarkable decrease of ES values in PubMed abstracts between 1990 and 2015 while, concomitantly, results become more often statistically significant. Medians of ES values have decreased over time for both "risk" and "protective" values. This trend was found in nearly all fields of biomedical research, with the most marked downward tendency in genetics. Over the same period, the proportion of statistically significant ESs increased regularly: among the abstracts with at least 1 ES, 74% were statistically significant in 1990-1995, vs 85% in 2010-2015. Whereas decreasing ESs could be an intrinsic evolution in biomedical research, the concomitant increase of statistically significant results is more intriguing. Although it is likely that growing sample sizes in biomedical research could explain these results, another explanation may lie in the "publish or perish" context of scientific research, with the probability of a growing orientation toward sensationalism in research reports. Important provisions must be made to improve the credibility of biomedical research and limit waste of resources. © The Authors 2017. Published by Oxford University Press.
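A hedged sketch of the kind of text-mining step described here, pulling effect sizes and confidence intervals out of abstract text with a regular expression, is shown below. The pattern and example string are illustrative; the study's actual extraction pipeline is not described in enough detail to reproduce.

```python
# Illustrative extraction of effect sizes (OR/RR/HR) and 95% CIs from text.
import re

# Matches e.g. "HR 0.78, 95% CI 0.65-0.94" or "OR = 1.35 (95% CI: 1.10 to 1.65)"
ES_RE = re.compile(
    r"\b(OR|RR|HR)\s*[=:]?\s*(\d+\.\d+)"                          # label, estimate
    r"[^0-9]{0,10}?95%\s*CI[:,]?\s*(\d+\.\d+)\s*(?:to|-|–)\s*(\d+\.\d+)"  # CI bounds
)

text = "Treatment lowered risk (HR 0.78, 95% CI 0.65-0.94) vs placebo."
for m in ES_RE.finditer(text):
    label, es, lo, hi = m.groups()
    print(label, float(es), (float(lo), float(hi)))
```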
Statistical detection model for eddy-current systems
International Nuclear Information System (INIS)
Martinez, J.R.; Bahr, A.J.
1984-01-01
This chapter presents a detailed analysis of some measured noise data and the results of using those data with a probe-flaw interaction model to compute the surface-crack detection characteristics of two different air-core coil probes. The objective is to develop a statistical model for determining the probability of detecting a given flaw using an eddy-current system. The basis for developing a statistical detection model is a measurement model that relates the output voltage of the system to its various signal and noise components. Topics considered include statistics of the measured background voltage, calibration of the probe-flaw interaction model and signal-to-noise ratio (SNR) definition, the operating characteristic, and a comparison of air-core probes
Linear mixed models a practical guide using statistical software
West, Brady T; Galecki, Andrzej T
2006-01-01
Simplifying the often confusing array of software programs for fitting linear mixed models (LMMs), Linear Mixed Models: A Practical Guide Using Statistical Software provides a basic introduction to primary concepts, notation, software implementation, model interpretation, and visualization of clustered and longitudinal data. This easy-to-navigate reference details the use of procedures for fitting LMMs in five popular statistical software packages: SAS, SPSS, Stata, R/S-plus, and HLM. The authors introduce basic theoretical concepts, present a heuristic approach to fitting LMMs based on bo
Statistical Model and the mesonic-baryonic transition region
Oeschler, H.; Redlich, K.; Wheaton, S.
2009-01-01
The statistical model assuming chemical equilibrium and local strangeness conservation describes most of the observed features of strange particle production from SIS up to RHIC. Deviations are found in that the maximum in the measured K+/pi+ ratio is much sharper than in the model calculations. At the incident energy of the maximum, the statistical model shows that freeze-out changes regime from one dominated by baryons at the lower energies toward one dominated by mesons. It will be shown how deviations from the usual freeze-out curve influence the various particle ratios. Furthermore, other observables also exhibit changes in just this energy regime.
Multiple commodities in statistical microeconomics: Model and market
Baaquie, Belal E.; Yu, Miao; Du, Xin
2016-11-01
A statistical generalization of microeconomics was made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed and it was shown that market data provide strong support for the statistical microeconomic description of commodity prices. Here the case of multiple commodities is studied and a parsimonious generalization of the single-commodity model is made for the multi-commodity case. Market data show that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics, and one that is independent of the mainstream formulation of microeconomics.
Multi-region Statistical Shape Model for Cochlear Implantation
DEFF Research Database (Denmark)
Romera, Jordi; Kjer, H. Martin; Piella, Gemma
2016-01-01
Statistical shape models are commonly used to analyze the variability between similar anatomical structures and their use is established as a tool for analysis and segmentation of medical images. However, using a global model to capture the variability of complex structures is not enough to achie...
Evaluation of Statistical Models for Analysis of Insect, Disease and ...
African Journals Online (AJOL)
It is concluded that LMMs and GLMs simultaneously consider the effect of treatments and heterogeneity of variance and hence are more appropriate for the analysis of abundance and incidence data than ordinary ANOVA. Keywords: Mixed Models; Generalized Linear Models; Statistical Power. East African Journal of Sciences ...
Complex Data Modeling and Computationally Intensive Statistical Methods
Mantovan, Pietro
2010-01-01
The last years have seen the advent and development of many devices able to record and store an always increasing amount of complex and high dimensional data; 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real time financial data, system control datasets. The analysis of this data poses new challenging problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici
Validation of statistical models for creep rupture by parametric analysis
Energy Technology Data Exchange (ETDEWEB)
Bolton, J., E-mail: john.bolton@uwclub.net [65, Fisher Ave., Rugby, Warks CV22 5HW (United Kingdom)
2012-01-15
Statistical analysis is an efficient method for the optimisation of any candidate mathematical model of creep rupture data, and for the comparative ranking of competing models. However, when a series of candidate models has been examined and the best of the series has been identified, there is no statistical criterion to determine whether a yet more accurate model might be devised. Hence there remains some uncertainty that the best of any series examined is sufficiently accurate to be considered reliable as a basis for extrapolation. This paper proposes that models should be validated primarily by parametric graphical comparison to rupture data and rupture gradient data. It proposes that no mathematical model should be considered reliable for extrapolation unless the visible divergence between model and data is so small as to leave no apparent scope for further reduction. This study is based on the data for a 12% Cr alloy steel used in BS PD6605:1998 to exemplify its recommended statistical analysis procedure. The models considered in this paper include (a) a relatively simple model, (b) the PD6605 recommended model and (c) a more accurate model of somewhat greater complexity. Highlights: • The paper discusses the validation of creep rupture models derived from statistical analysis. • It demonstrates that models can be satisfactorily validated by a visual-graphic comparison of models to data. • The method proposed utilises test data both as conventional rupture stress and as rupture stress gradient. • The approach is shown to be more reliable than a well-established and widely used method (BS PD6605).
2015-09-30
… information on fish school distributions by monitoring the direction of birds returning to the colony or the behavior of other birds at sea through … active sonar. Toward this goal, fundamental advances in the understanding of fish behavior, especially in aggregations, will be made under conditions … relevant to the echo statistics problem. OBJECTIVES: To develop new models of the behavior of fish aggregations, including the fission/fusion process.
Pulit, Sara L; de With, Sera A J; de Bakker, Paul I W
2017-02-01
Genome-wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype-phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping-based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome-wide significance thresholds for various analysis scenarios. Using whole-genome sequence data, we simulated sequencing-based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome-wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome-wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple-test correction will be crucial in avoiding irreproducible claims of association. © 2016 WILEY PERIODICALS, INC.
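A back-of-envelope illustration of how such thresholds arise from a Bonferroni-style correction (alpha divided by the effective number of independent tests) follows; the test counts are illustrative assumptions, not the paper's simulation results.

```python
# Bonferroni-style genome-wide thresholds: alpha / effective number of tests.
alpha = 0.05
eff_tests_gwas = 1_000_000        # classic GWAS assumption -> the familiar 5e-8
eff_tests_seq = 10_000_000        # denser sequencing data  -> roughly 5e-9
print(alpha / eff_tests_gwas)     # 5e-08
print(alpha / eff_tests_seq)      # 5e-09
```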
Statistical Validation of Engineering and Scientific Models: Background
International Nuclear Information System (INIS)
Hills, Richard G.; Trucano, Timothy G.
1999-01-01
A tutorial is presented discussing the basic issues associated with propagation-of-uncertainty analysis and statistical validation of engineering and scientific models. The propagation-of-uncertainty tutorial illustrates the use of the sensitivity method and the Monte Carlo method to evaluate the uncertainty in predictions for linear and nonlinear models. Four example applications are presented: a linear model, a model for the behavior of a damped spring-mass system, a transient thermal conduction model, and a nonlinear transient convective-diffusive model based on Burgers' equation. Correlated and uncorrelated model input parameters are considered. The model validation tutorial builds on the material presented in the propagation-of-uncertainty tutorial and uses the damped spring-mass system as the example application. The validation tutorial illustrates several concepts associated with the application of statistical inference to test model predictions against experimental observations. Several validation methods are presented, including error-band-based, multivariate, sum-of-squares-of-residuals, and optimization methods. After completion of the tutorial, a survey of statistical model validation literature is presented and recommendations for future work are made
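In the spirit of the tutorial's damped spring-mass example, a minimal Monte Carlo propagation-of-uncertainty sketch follows; the parameter means and uncertainties are invented for illustration, not taken from the report.

```python
# Monte Carlo propagation of parameter uncertainty through a nonlinear model:
# the damped natural frequency of a spring-mass-damper system.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
m = rng.normal(1.0, 0.05, N)      # mass [kg]          (illustrative mean/sd)
c = rng.normal(0.2, 0.02, N)      # damping [N s/m]
k = rng.normal(25.0, 1.0, N)      # spring constant [N/m]

# Model output: damped natural frequency w_d = sqrt(k/m - (c/(2m))^2)
wd = np.sqrt(k / m - (c / (2 * m)) ** 2)
print(f"mean = {wd.mean():.3f} rad/s, std = {wd.std():.3f} rad/s")
```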
Modern statistical models for forensic fingerprint examinations: a critical review.
Abraham, Joshua; Champod, Christophe; Lennard, Chris; Roux, Claude
2013-10-10
Over the last decade, the development of statistical models in support of forensic fingerprint identification has been the subject of increasing research attention, spurred on recently by commentators who claim that the scientific basis for fingerprint identification has not been adequately demonstrated. Such models are increasingly seen as useful tools in support of the fingerprint identification process within, or in addition to, the ACE-V framework. This paper provides a critical review of recent statistical models from both a practical and a theoretical perspective. This includes analysis of models from two different methodologies: Probability of Random Correspondence (PRC) models, which focus on calculating probabilities of the occurrence of fingerprint configurations in a given population, and Likelihood Ratio (LR) models, which use analysis of corresponding features of fingerprints to derive a likelihood value representing the evidential weight for a potential source. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Growth Curve Models and Applications : Indian Statistical Institute
2017-01-01
Growth curve models in longitudinal studies are widely used to model population size, body height, biomass, fungal growth, and other variables in the biological sciences, but these statistical methods for modeling growth curves and analyzing longitudinal data also extend to general statistics, economics, public health, demographics, epidemiology, SQC, sociology, nano-biotechnology, fluid mechanics, and other applied areas. There is no one-size-fits-all approach to growth measurement. The selected papers in this volume build on presentations from the GCM workshop held at the Indian Statistical Institute, Giridih, on March 28-29, 2016. They represent recent trends in GCM research on different subject areas, both theoretical and applied. This book includes tools and possibilities for further work through new techniques and modification of existing ones. The volume includes original studies, theoretical findings and case studies from a wide range of applied work, and these contributions have been externally r...
Statistical modelling for recurrent events: an application to sports injuries.
Ullah, Shahid; Gabbett, Tim J; Finch, Caroline F
2014-09-01
Injuries are often recurrent, with subsequent injuries influenced by previous occurrences and hence correlation between events needs to be taken into account when analysing such data. This paper compares five different survival models (Cox proportional hazards (CoxPH) model and the following generalisations to recurrent event data: Andersen-Gill (A-G), frailty, Wei-Lin-Weissfeld total time (WLW-TT) marginal, Prentice-Williams-Peterson gap time (PWP-GT) conditional models) for the analysis of recurrent injury data. Empirical evaluation and comparison of different models were performed using model selection criteria and goodness-of-fit statistics. Simulation studies assessed the size and power of each model fit. The modelling approach is demonstrated through direct application to Australian National Rugby League recurrent injury data collected over the 2008 playing season. Of the 35 players analysed, 14 (40%) players had more than 1 injury and 47 contact injuries were sustained over 29 matches. The CoxPH model provided the poorest fit to the recurrent sports injury data. The fit was improved with the A-G and frailty models, compared to WLW-TT and PWP-GT models. Despite little difference in model fit between the A-G and frailty models, in the interest of fewer statistical assumptions it is recommended that, where relevant, future studies involving modelling of recurrent sports injury data use the frailty model in preference to the CoxPH model or its other generalisations. The paper provides a rationale for future statistical modelling approaches for recurrent sports injury. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
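For readers wanting a starting point in code, one way to lay out recurrent injury data is the counting-process (start, stop] format used by Andersen-Gill-style models; the sketch below fits such a model with the lifelines package. This is an assumption-laden illustration (invented data, a hypothetical training-load covariate), not the paper's analysis; the frailty model the authors recommend adds a per-player random effect and is more commonly fit with R's survival or coxme packages.

```python
# Andersen-Gill-style recurrent-event layout: one row per at-risk interval.
import pandas as pd
from lifelines import CoxTimeVaryingFitter

df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],           # player; multiple rows = recurrent intervals
    "start": [0, 5, 12, 0, 9],          # match-time interval start
    "stop":  [5, 12, 29, 9, 29],        # interval end (injury or censoring)
    "event": [1, 1, 0, 1, 0],           # 1 = injury occurred at 'stop'
    "load":  [3.2, 4.1, 2.8, 5.0, 3.5]  # hypothetical training-load covariate
})
ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()
```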
DEFF Research Database (Denmark)
Serviss, Jason T.; Gådin, Jesper R.; Eriksson, Per
2017-01-01
Summary: Multi-dimensional data generated via high-throughput experiments are increasingly used in conjunction with dimensionality reduction methods to ascertain whether the resulting separations of the data correspond with known classes. This is particularly useful to determine whether a subset of the variables, e.g. genes in a specific pathway, alone can separate samples into these established classes. Despite this, the evaluation of class separations is often subjective and performed via visualization. Here we present the ClusterSignificance package: a set of tools designed to assess the statistical …
The Statistical Modeling of the Trends Concerning the Romanian Population
Directory of Open Access Journals (Sweden)
Gabriela OPAIT
2014-11-01
Full Text Available This paper presents the statistical modeling of the trends in the resident population of Romania, i.e. the total Romanian population, by means of the least squares method. Any country develops through growth of its population, and hence of its workforce, which is a factor influencing the growth of gross domestic product (GDP). The least squares method is a statistical technique for determining the best-fit trend line for a model.
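A minimal least-squares trend line of the kind described can be fit in a few lines; the annual population figures below are illustrative placeholders, not the paper's data.

```python
# Least-squares trend line for an annual population series (illustrative data).
import numpy as np

years = np.arange(2005, 2015)
pop_m = np.array([21.3, 21.2, 21.1, 20.9, 20.8,
                  20.6, 20.2, 20.1, 20.0, 19.9])   # millions, invented

slope, intercept = np.polyfit(years, pop_m, deg=1)  # minimizes squared residuals
print(f"trend: {slope:.3f} million/year")
print("fitted value for 2015:", slope * 2015 + intercept)
```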
Statistical Model of the 2001 Czech Census for Interactive Presentation
Czech Academy of Sciences Publication Activity Database
Grim, Jiří; Hora, Jan; Boček, Pavel; Somol, Petr; Pudil, Pavel
Vol. 26, č. 4 (2010), s. 1-23 ISSN 0282-423X R&D Projects: GA ČR GA102/07/1594; GA MŠk 1M0572 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Interactive statistical model * census data presentation * distribution mixtures * data modeling * EM algorithm * incomplete data * data reproduction accuracy * data mining Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.492, year: 2010 http://library.utia.cas.cz/separaty/2010/RO/grim-0350513.pdf
Applied systems ecology: models, data, and statistical methods
Energy Technology Data Exchange (ETDEWEB)
Eberhardt, L L
1976-01-01
In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.
Analyzing sickness absence with statistical models for survival data
DEFF Research Database (Denmark)
Christensen, Karl Bang; Andersen, Per Kragh; Smith-Hansen, Lars
2007-01-01
OBJECTIVES: Sickness absence is the outcome in many epidemiologic studies and is often based on summary measures such as the number of sickness absences per year. In this study the use of modern statistical methods was examined by making better use of the available information. Since sickness absence data deal with events occurring over time, the use of statistical models for survival data has been reviewed, and the use of frailty models has been proposed for the analysis of such data. METHODS: Three methods for analyzing data on sickness absences were compared using a simulation study involving the following: (i) Poisson regression using a single outcome variable (number of sickness absences), (ii) analysis of time to first event using the Cox proportional hazards model, and (iii) frailty models, which are random-effects proportional hazards models. Data from a study of the relation...
A Review of Modeling Bioelectrochemical Systems: Engineering and Statistical Aspects
Directory of Open Access Journals (Sweden)
Shuai Luo
2016-02-01
Full Text Available Bioelectrochemical systems (BES) are promising technologies to convert organic compounds in wastewater to electrical energy through a series of complex physical-chemical, biological and electrochemical processes. Representative BES such as microbial fuel cells (MFCs) have been studied and advanced for energy recovery. Substantial experimental and modeling efforts have been made to investigate the processes involved in electricity generation toward improving BES performance for practical applications. However, many parameters can potentially affect these processes, making optimization of system performance hard to achieve. Mathematical models, including engineering models and statistical models, are powerful tools to help understand the interactions among the parameters in BES and to optimize BES configuration and operation. This review paper introduces and discusses recent developments in BES modeling from engineering and statistical aspects, including analysis of model structure, description of application cases and sensitivity analysis of various parameters. It is expected to serve as a compass for integrating engineering and statistical modeling strategies to improve model accuracy for BES development.
Linking statistical bias description to multiobjective model calibration
Reichert, P.; Schuwirth, N.
2012-09-01
In the absence of model deficiencies, simulation results at the correct parameter values lead to an unbiased description of observed data with remaining deviations due to observation errors only. However, this ideal cannot be reached in the practice of environmental modeling, because the required simplified representation of the complex reality by the model and errors in model input lead to errors that are reflected in biased model output. This leads to two related problems: First, ignoring bias of output in the statistical model description leads to bias in parameter estimates, model predictions and, in particular, in the quantification of their uncertainty. Second, as there is no objective choice of how much bias to accept in which output variable, it is not possible to design an "objective" model calibration procedure. The first of these problems has been addressed by introducing a statistical (Bayesian) description of bias, the second by suggesting the use of multiobjective calibration techniques that cannot easily be used for uncertainty analysis. We merge the ideas of these two approaches by using the prior of the statistical bias description to quantify the importance of multiple calibration objectives. This leads to probabilistic inference and prediction while still taking multiple calibration objectives into account. The ideas and technical details of the suggested approach are outlined and a didactical example as well as an application to environmental data are provided to demonstrate its practical feasibility and computational efficiency.
Automated robust generation of compact 3D statistical shape models
Vrtovec, Tomaz; Likar, Bostjan; Tomazevic, Dejan; Pernus, Franjo
2004-05-01
Ascertaining the detailed shape and spatial arrangement of anatomical structures is important not only within diagnostic settings but also in the areas of planning, simulation, intraoperative navigation, and tracking of pathology. Robust, accurate and efficient automated segmentation of anatomical structures is difficult because of their complexity and inter-patient variability. Furthermore, the position of the patient during image acquisition, the imaging device and protocol, image resolution, and other factors induce additional variations in shape and appearance. Statistical shape models (SSMs) have proven quite successful in capturing structural variability. A possible approach to obtain a 3D SSM is to extract reference voxels by precisely segmenting the structure in one, reference image. The corresponding voxels in other images are determined by registering the reference image to each other image. The SSM obtained in this way describes statistically plausible shape variations over the given population as well as variations due to imperfect registration. In this paper, we present a completely automated method that significantly reduces shape variations induced by imperfect registration, thus allowing a more accurate description of variations. At each iteration, the derived SSM is used for coarse registration, which is further improved by describing finer variations of the structure. The method was tested on 64 lumbar spinal column CT scans, from which 23, 38, 45, 46 and 42 volumes of interest containing vertebra L1, L2, L3, L4 and L5, respectively, were extracted. Separate SSMs were generated for each vertebra. The results show that the method is capable of reducing the variations induced by registration errors.
Statistical learning modeling method for space debris photometric measurement
Sun, Wenjing; Sun, Jinqiu; Zhang, Yanning; Li, Haisen
2016-03-01
Photometric measurement is an important way to identify space debris, but present photometric measurement methods impose many constraints on the star image and require complex image processing. To address these problems, a statistical learning modeling method for space debris photometric measurement is proposed based on the global consistency of the star image, and the statistical information of star images is used to eliminate measurement noise. First, the known stars in the star image are divided into training stars and testing stars. Then, the training stars are used to fit the parameters of the photometric measurement model by least squares, and the testing stars are used to calculate the measurement accuracy of the model. Experimental results show that the accuracy of the proposed photometric measurement model is about 0.1 magnitudes.
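The train/fit/test pattern described here can be sketched compactly; the linear model form, the synthetic magnitudes and the 30/10 split below are assumptions for illustration, not the paper's actual model.

```python
# Least-squares photometric calibration sketch: fit on "training" stars,
# evaluate residuals on held-out "testing" stars (all data synthetic).
import numpy as np

rng = np.random.default_rng(1)
instr = rng.uniform(8.0, 14.0, 40)                        # instrumental magnitudes
catalog = 1.02 * instr + 0.5 + rng.normal(0, 0.05, 40)    # "true" catalog magnitudes

train, test = slice(0, 30), slice(30, 40)
A = np.column_stack([instr[train], np.ones(30)])          # linear model: a*m + b
coef, *_ = np.linalg.lstsq(A, catalog[train], rcond=None)

pred = coef[0] * instr[test] + coef[1]
print("test RMS error [mag]:", np.sqrt(np.mean((pred - catalog[test]) ** 2)))
```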
Statistical, Morphometric, Anatomical Shape Model (Atlas) of Calcaneus
Melinska, Aleksandra U.; Romaszkiewicz, Patryk; Wagel, Justyna; Sasiadek, Marek; Iskander, D. Robert
2015-01-01
The aim was to develop a morphometric and anatomically accurate atlas (statistical shape model) of the calcaneus. The model is based on 18 left-foot and 18 right-foot computed tomography studies of 28 male individuals aged from 17 to 62 years, with no known foot pathology. The procedure for automatic atlas generation included extraction and identification of common features, averaging of feature positions, obtaining the mean geometry, mathematical shape description and variability analysis. Expert manual assistance was included for the model to fulfil the accuracy sought by medical professionals. The proposed statistical shape model of the calcaneus, the first of its kind, could be of value in many orthopaedic applications, including providing support in diagnosing pathological lesions, pre-operative planning, classification and treatment of calcaneus fractures, as well as for the development of future implant procedures. PMID:26270812
Workshop on Model Uncertainty and its Statistical Implications
1988-01-01
In this book problems related to the choice of models in such diverse fields as regression, covariance structure, time series analysis and multinomial experiments are discussed. The emphasis is on the statistical implications for model assessment when the assessment is done with the same data that generated the model. This is a problem of long standing, notorious for its difficulty. Some contributors discuss this problem in an illuminating way. Others, and this is a truly novel feature, investigate systematically whether sample re-use methods like the bootstrap can be used to assess the quality of estimators or predictors in a reliable way given the initial model uncertainty. The book should prove to be valuable for advanced practitioners and statistical methodologists alike.
Statistical Modeling for Radiation Hardness Assurance: Toward Bigger Data
Ladbury, R.; Campola, M. J.
2015-01-01
New approaches to statistical modeling in radiation hardness assurance are discussed. These approaches yield quantitative bounds on flight-part radiation performance even in the absence of conventional data sources. This allows the analyst to bound radiation risk at all stages and for all decisions in the RHA process. It also allows optimization of RHA procedures for the project's risk tolerance.
Interactive comparison of hypothesis tests for statistical model checking
de Boer, Pieter-Tjerk; Reijsbergen, D.P.; Scheinhardt, Willem R.W.
2015-01-01
We present a web-based interactive comparison of hypothesis tests as are used in statistical model checking, providing users and tool developers with more insight into their characteristics. Parameters can be modified easily and their influence is visualized in real time; an integrated simulation
Syntactic discriminative language model rerankers for statistical machine translation
Carter, S.; Monz, C.
2011-01-01
This article describes a method that successfully exploits syntactic features for n-best translation candidate reranking using perceptrons. We motivate the utility of syntax by demonstrating the superior performance of parsers over n-gram language models in differentiating between Statistical
Hierarchical modelling for the environmental sciences statistical methods and applications
Clark, James S
2006-01-01
New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.
Using statistical compatibility to derive advanced probabilistic fatigue models
Czech Academy of Sciences Publication Activity Database
Fernández-Canteli, A.; Castillo, E.; López-Aenlle, M.; Seitl, Stanislav
2010-01-01
Roč. 2, č. 1 (2010), s. 1131-1140 E-ISSN 1877-7058. [Fatigue 2010. Praha, 06.06.2010-11.06.2010] Institutional research plan: CEZ:AV0Z20410507 Keywords : Fatigue models * Statistical compatibility * Functional equations Subject RIV: JL - Materials Fatigue, Friction Mechanics
Modelling geographical graduate job search using circular statistics
Faggian, Alessandra; Corcoran, Jonathan; McCann, Philip
Theory suggests that the spatial patterns of migration flows are contingent on both individual human capital and underlying geographical structures. Here we demonstrate these features by using circular statistics in an econometric modelling framework applied to the flows of UK university graduates.
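The basic circular-statistics quantities behind such an analysis (mean direction and resultant length of a set of bearings) are easy to compute; the angles below are invented for illustration.

```python
# Circular mean direction and resultant length for a set of movement bearings.
import numpy as np

angles = np.deg2rad([10.0, 25.0, 355.0, 340.0, 15.0])    # illustrative bearings
C, S = np.cos(angles).mean(), np.sin(angles).mean()
mean_direction = np.rad2deg(np.arctan2(S, C)) % 360      # degrees
resultant_length = np.hypot(C, S)                        # 1 = perfectly aligned flows
print(f"mean direction = {mean_direction:.1f} deg, R = {resultant_length:.2f}")
```

Note that a naive arithmetic mean of the raw degrees (145° here) would be badly wrong for bearings straddling 0°/360°, which is precisely why circular statistics are needed for directional flow data.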
Statistical Modeling of Energy Production by Photovoltaic Farms
Czech Academy of Sciences Publication Activity Database
Brabec, Marek; Pelikán, Emil; Krč, Pavel; Eben, Kryštof; Musílek, P.
2011-01-01
Roč. 5, č. 9 (2011), s. 785-793 ISSN 1934-8975 Grant - others:GA AV ČR(CZ) M100300904 Institutional research plan: CEZ:AV0Z10300504 Keywords : electrical energy * solar energy * numerical weather prediction model * nonparametric regression * beta regression Subject RIV: BB - Applied Statistics, Operational Research
Two-dimensional models in statistical mechanics and field theory
International Nuclear Information System (INIS)
Koberle, R.
1980-01-01
Several features of two-dimensional models in statistical mechanics and field theory, such as lattice quantum chromodynamics, Z(N), Gross-Neveu and CP^(N-1) models, are discussed. The problems of confinement and dynamical mass generation are also analyzed. (L.C.) [pt
Statistical properties of the nuclear shell-model Hamiltonian
International Nuclear Information System (INIS)
Dias, H.; Hussein, M.S.; Oliveira, N.A. de
1986-01-01
The statistical properties of realistic nuclear shell-model Hamiltonians are investigated in sd-shell nuclei. The probability distribution of the basis-vector amplitude is calculated and compared with the Porter-Thomas distribution. The relevance of the results to the calculation of the giant resonance mixing parameter is pointed out. (Author) [pt
Eigenfunction statistics for Anderson model with Hölder continuous ...
Indian Academy of Sciences (India)
… continuous (0 < α ≤ 1) single-site distribution. In the localized regime, we study the distribution of eigenfunctions in space and energy simultaneously. In a certain scaling limit, we prove the limit points are Poisson. Keywords. Anderson model; Hölder continuous measure; Poisson statistics. 2010 Mathematics Subject Classification …
Casati, Michele
2014-05-01
The assertion that solar activity may play a significant role in triggering large volcanic eruptions has long been discussed by geophysicists. Numerous scientific papers have established a possible correlation between these events and the electromagnetic coupling between the Earth and the Sun, but none of them has been able to highlight a statistically significant relationship between large volcanic eruptions and any of the series, such as geomagnetic activity, solar wind, or sunspot number. In our research, we compare the 148 volcanic eruptions with index VEI4 and the 37 major historical volcanic eruptions of index VEI5 or greater, recorded from 1610 to 2012, with the corresponding sunspot numbers. Taking as the threshold value a monthly sunspot number (SSN) of 46 (recorded during the great VEI6 eruption of Krakatoa, August 1883), we note some possible relationships and conduct a statistical test. • Of the 31 historical large volcanic eruptions with index VEI5+, recorded between 1610 and 1955, 29 occurred when the SSN < 46. The remaining 2 eruptions occurred not when SSN < 46 but during the solar maxima of the cycle of 1739 and of solar cycle No. 14 (the Shikotsu eruption of 1739 and Ksudach in 1907). • Of the 8 historical large volcanic eruptions with index VEI6+, recorded from 1610 to the present, 7 occurred with SSN < 46 and, more specifically, within the three large solar minima known as the Maunder minimum (1645-1710), the Dalton minimum (1790-1830), and the solar minima between 1880 and 1920. The only exception is the eruption of Pinatubo in June 1991, recorded at the solar maximum of cycle 22. • Of the 6 major historical volcanic eruptions with index VEI5+ recorded after 1955, 5 occurred not during periods of low solar activity but during the solar maxima of cycles 19, 21 and 22. The significance tests, conducted with the chi-square statistic (χ² = 7.782), detect a…
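A hedged sketch of the kind of chi-square computation reported above; the 2-by-2 regrouping of the eruption counts is an assumption on my part, so the statistic need not reproduce the quoted 7.782:

```python
from scipy.stats import chi2_contingency

# Counts regrouped from the abstract (VEI5+ eruptions, low vs. high
# monthly sunspot number, before vs. after 1955); the exact table
# behind the reported chi-square of 7.782 is an assumption here.
table = [[29, 2],   # 1610-1955: SSN < 46, SSN >= 46
         [1, 5]]    # after 1955
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, dof = {dof}")
```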
Integration of Advanced Statistical Analysis Tools and Geophysical Modeling
2012-08-01
[Extraction residue from a cued MetalMapper target record; only the cited references are recoverable:] Statistical classification of buried unexploded ordnance using nonparametric prior models. IEEE Trans. Geosci. Remote Sensing, 45:2794-2806, 2007. T. Bell and B. Barrow. Subsurface discrimination using electromagnetic induction sensors. IEEE Trans. Geosci. Remote Sensing, 39:1286-1293, 2001.
A Statistical Model for Synthesis of Detailed Facial Geometry
Golovinskiy, Aleksey; Matusik, Wojciech; Pfister, Hanspeter; Rusinkiewicz, Szymon; Funkhouser, Thomas
2006-01-01
Detailed surface geometry contributes greatly to the visual realism of 3D face models. However, acquiring high-resolution face geometry is often tedious and expensive. Consequently, most face models used in games, virtual reality, or computer vision look unrealistically smooth. In this paper, we introduce a new statistical technique for the analysis and synthesis of small three-dimensional facial features, such as wrinkles and pores. We acquire high-resolution face geometry for people across ...
Statistical and RBF NN models : providing forecasts and risk assessment
Marček, Milan
2009-01-01
Forecast accuracy of economic and financial processes is a popular measure for quantifying the risk in decision making. In this paper, we develop forecasting models based on statistical (stochastic) methods, sometimes called hard computing, and on a soft method using granular computing. We consider the accuracy of forecasting models as a measure for risk evaluation. It is found that the risk estimation process based on soft methods is simplified and less critical to the question w...
Advances on statistical/thermodynamical models for unpolarized structure functions
Energy Technology Data Exchange (ETDEWEB)
Trevisan, Luis A. [Departamento de Matematica e Estatistica, Universidade Estadual de Ponta Grossa, 84010-790, Ponta Grossa, PR (Brazil); Mirez, Carlos [Universidade Federal dos Vales do Jequitinhonha e Mucuri, Campus do Mucuri, 39803-371, Teofilo Otoni, Minas Gerais (Brazil); Tomio, Lauro [Instituto de Fisica Teorica, Universidade Estadual Paulista, R. Dr. Bento Teobaldo Ferraz 271, Bl II Barra Funda, 01140070, Sao Paulo, SP (Brazil)
2013-03-25
During the eighties and nineties, many statistical/thermodynamical models were proposed to describe the nucleons' structure functions and the distribution of the quarks in the hadrons. Most of these models describe the quarks and gluons inside the nucleon as a Fermi gas and a Bose gas, respectively, confined in an MIT bag with continuous energy levels. Other models consider a discrete spectrum. Some interesting features of the nucleons are obtained by these models, like the sea asymmetries anti-d/anti-u and anti-d − anti-u.
Directory of Open Access Journals (Sweden)
Ioan Catalin VLAD
2012-11-01
Full Text Available Purpose: In recent studies perineural invasion (PNI) is associated with poor survival rates in rectal cancer, but the impact of PNI is still controversial. We assessed PNI as a potential prognostic factor in rectal cancer. Patients and Methods: We analyzed 317 patients with rectal cancer resected at The Oncology Institute "Prof. Dr. Ion Chiricuţă" Cluj-Napoca, between January 2000 and December 2008. Tumors were reviewed for PNI by a pathologist. Patient data were reviewed and entered into a comprehensive database. The statistical analysis in our study was carried out in the R environment for statistical computing and graphics, version 1.15.1. Overall and disease-free survival were determined using the Kaplan-Meier method, and multivariate analysis using the Cox proportional hazards model. Results were compared using the log-rank test. Results: In our study PNI was identified in 19% of tumors. The 5-year disease-free survival rate was higher for patients with PNI-negative tumors versus those with PNI-positive tumors (57.31% vs. 36.99%, p=0.009. The 5-year overall survival rate was 59.15% for PNI-negative tumors versus 39.19% for PNI-positive tumors (p=0.014. On multivariate analysis, PNI was an independent prognostic factor for overall survival (Hazard Ratio = 0.6; 95% CI = 0.41 to 0.87; p = 0.0082. Conclusions: PNI can be considered an independent prognostic factor of outcomes in patients with rectal cancer. PNI should be taken into account when selecting patients for adjuvant treatment. The R environment for statistical computing and graphics is complex yet easy-to-use software that has proven to be efficient in our clinical study.
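A minimal sketch of the analysis pipeline named in the abstract (Kaplan-Meier, log-rank, Cox), here in Python with the lifelines package and synthetic data rather than the study's R code and patient records:

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(1)
n = 317
# Synthetic stand-in for the study data: PNI-positive tumours get a
# shorter survival scale; real effect sizes come from the paper, not here.
pni = rng.binomial(1, 0.19, n)
time = rng.exponential(np.where(pni == 1, 40, 70))
event = rng.binomial(1, 0.7, n)
df = pd.DataFrame({"time": time, "event": event, "pni": pni})

neg, pos = df[df["pni"] == 0], df[df["pni"] == 1]
km = KaplanMeierFitter().fit(neg["time"], neg["event"], label="PNI-negative")
print("median survival (PNI-negative):", km.median_survival_time_)

res = logrank_test(pos["time"], neg["time"], pos["event"], neg["event"])
print("log-rank p =", res.p_value)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cph.print_summary()  # hazard ratio for PNI with 95% CI
```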
Applications of spatial statistical network models to stream data
Daniel J. Isaak; Erin E. Peterson; Jay M. Ver Hoef; Seth J. Wenger; Jeffrey A. Falke; Christian E. Torgersen; Colin Sowder; E. Ashley Steel; Marie-Josee Fortin; Chris E. Jordan; Aaron S. Ruesch; Nicholas Som; Pascal. Monestiez
2014-01-01
Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for...
Bilingual Cluster Based Models for Statistical Machine Translation
Yamamoto, Hirofumi; Sumita, Eiichiro
We propose a domain-specific model for statistical machine translation. It is well known that domain-specific language models perform well in automatic speech recognition. We show that domain-specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain-specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided; however, in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.
Directory of Open Access Journals (Sweden)
Leitner Dietmar
2005-04-01
Full Text Available Abstract Background A reliable prediction of the Xaa-Pro peptide bond conformation would be a useful tool for many protein structure calculation methods. We have analyzed the Protein Data Bank and show that the combined use of sequential and structural information has a predictive value for the assessment of the cis versus trans peptide bond conformation of Xaa-Pro within proteins. For the analysis of the data sets different statistical methods such as the calculation of the Chou-Fasman parameters and occurrence matrices were used. Furthermore we analyzed the relationship between the relative solvent accessibility and the relative occurrence of prolines in the cis and in the trans conformation. Results One of the main results of the statistical investigations is the ranking of the secondary structure and sequence information with respect to the prediction of the Xaa-Pro peptide bond conformation. We observed a significant impact of secondary structure information on the occurrence of the Xaa-Pro peptide bond conformation, while the sequence information of amino acids neighboring proline is of little predictive value for the conformation of this bond. Conclusion In this work, we present an extensive analysis of the occurrence of the cis and trans proline conformation in proteins. Based on the data set, we derived patterns and rules for a possible prediction of the proline conformation. Upon adoption of the Chou-Fasman parameters, we are able to derive statistically relevant correlations between the secondary structure of amino acid fragments and the Xaa-Pro peptide bond conformation.
International Nuclear Information System (INIS)
Weathers, J.B.; Luck, R.; Weathers, J.W.
2009-01-01
The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exist in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
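A toy Monte Carlo sketch of the covariance-matrix estimation step described above, under assumed input uncertainties and a made-up two-output model (the paper's actual model and uncertainty budgets are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

# Propagate random and systematic uncertainties through a model by
# Monte Carlo and build the covariance matrix of the comparison error
# E = prediction - nominal prediction.
N = 100_000
x_nom = np.array([1.0, 2.0])             # nominal inputs
sys_err = rng.normal(0.0, 0.05, (N, 1))  # systematic: shared perturbation
rnd_err = rng.normal(0.0, 0.02, (N, 2))  # random: independent per input

def model(x):
    # toy two-output model standing in for the engineering code
    return np.column_stack([x[:, 0] + x[:, 1], x[:, 0] * x[:, 1]])

x_mc = x_nom + sys_err + rnd_err          # perturbed inputs
E = model(x_mc) - model(x_nom[None, :])   # comparison-error samples

cov = np.cov(E, rowvar=False)             # covariance matrix for the MVN pdf
print(cov)
# A 95% constant-probability contour is then the ellipse of Mahalanobis
# radius sqrt(chi2.ppf(0.95, df=2)) under the fitted multivariate normal.
```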
WE-A-201-02: Modern Statistical Modeling
International Nuclear Information System (INIS)
Niemierko, A.
2016-01-01
Chris Marshall: Memorial Introduction Donald Edmonds Herbert Jr., or Don to his colleagues and friends, exemplified the “big tent” vision of medical physics, specializing in Applied Statistics and Dynamical Systems theory. He saw, more clearly than most, that “Making models is the difference between doing science and just fooling around [ref Woodworth, 2004]”. Don developed an interest in chemistry at school by “reading a book” - a recurring theme in his story. He was awarded a Westinghouse Science scholarship and attended the Carnegie Institute of Technology (later Carnegie Mellon University) where his interest turned to physics and led to a BS in Physics after transfer to Northwestern University. After (voluntary) service in the Navy he earned his MS in Physics from the University of Oklahoma, which led him to Johns Hopkins University in Baltimore to pursue a PhD. The early death of his wife led him to take a salaried position in the Physics Department of Colorado College in Colorado Springs so as to better care for their young daughter. There, a chance invitation from Dr. Juan del Regato to teach physics to residents at the Penrose Cancer Hospital introduced him to Medical Physics, and he decided to enter the field. He received his PhD from the University of London (UK) under Prof. Joseph Rotblat, where I first met him, and where he taught himself statistics. He returned to Penrose as a clinical medical physicist, also largely self-taught. In 1975 he formalized an evolving interest in statistical analysis as Professor of Radiology and Head of the Division of Physics and Statistics at the College of Medicine of the University of South Alabama in Mobile, AL where he remained for the rest of his career. He also served as the first Director of their Bio-Statistics and Epidemiology Core Unit working in part on a sickle-cell disease. After retirement he remained active as Professor Emeritus. Don served for several years as a consultant to the Nuclear
WE-A-201-02: Modern Statistical Modeling
Energy Technology Data Exchange (ETDEWEB)
Niemierko, A.
2016-06-15
Chris Marshall: Memorial Introduction Donald Edmonds Herbert Jr., or Don to his colleagues and friends, exemplified the “big tent” vision of medical physics, specializing in Applied Statistics and Dynamical Systems theory. He saw, more clearly than most, that “Making models is the difference between doing science and just fooling around [ref Woodworth, 2004]”. Don developed an interest in chemistry at school by “reading a book” - a recurring theme in his story. He was awarded a Westinghouse Science scholarship and attended the Carnegie Institute of Technology (later Carnegie Mellon University) where his interest turned to physics and led to a BS in Physics after transfer to Northwestern University. After (voluntary) service in the Navy he earned his MS in Physics from the University of Oklahoma, which led him to Johns Hopkins University in Baltimore to pursue a PhD. The early death of his wife led him to take a salaried position in the Physics Department of Colorado College in Colorado Springs so as to better care for their young daughter. There, a chance invitation from Dr. Juan del Regato to teach physics to residents at the Penrose Cancer Hospital introduced him to Medical Physics, and he decided to enter the field. He received his PhD from the University of London (UK) under Prof. Joseph Rotblat, where I first met him, and where he taught himself statistics. He returned to Penrose as a clinical medical physicist, also largely self-taught. In 1975 he formalized an evolving interest in statistical analysis as Professor of Radiology and Head of the Division of Physics and Statistics at the College of Medicine of the University of South Alabama in Mobile, AL where he remained for the rest of his career. He also served as the first Director of their Bio-Statistics and Epidemiology Core Unit working in part on a sickle-cell disease. After retirement he remained active as Professor Emeritus. Don served for several years as a consultant to the Nuclear
Organism-level models: When mechanisms and statistics fail us
Phillips, M. H.; Meyer, J.; Smith, W. P.; Rockhill, J. K.
2014-03-01
Purpose: To describe the unique characteristics of models that represent the entire course of radiation therapy at the organism level and to highlight the uses to which such models can be put. Methods: At the level of an organism, traditional model-building runs into severe difficulties. We do not have sufficient knowledge to devise a complete biochemistry-based model. Statistical model-building fails due to the vast number of variables and the inability to control many of them in any meaningful way. Finally, building surrogate models, such as animal-based models, can result in excluding some of the most critical variables. Bayesian probabilistic models (Bayesian networks) provide a useful alternative that has the advantages of being mathematically rigorous, incorporating the knowledge that we do have, and being practical. Results: Bayesian networks representing radiation therapy pathways for prostate cancer and head & neck cancer were used to highlight the important aspects of such models and some techniques of model-building. A more specific model representing the treatment of occult lymph nodes in head & neck cancer was provided as an example of how such a model can inform clinical decisions. A model of the possible role of PET imaging in brain cancer was used to illustrate the means by which clinical trials can be modelled in order to arrive at a trial design that will have meaningful outcomes. Conclusions: Probabilistic models are currently the most useful approach to representing the entire therapy outcome process.
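A minimal Bayesian-network sketch in the spirit of the abstract — exact inference by enumeration over one hidden node; the structure and the probabilities are hypothetical, not the authors' prostate or head & neck models:

```python
# Hidden node "occult disease" influences tumour control; we compute
# P(control | treatment) by marginalising over the hidden state.
p_occult = 0.25                          # prior P(occult = True), invented
p_ctrl = {                               # P(control | occult, treatment), invented
    (True,  "RT"):   0.70,
    (True,  "none"): 0.30,
    (False, "RT"):   0.95,
    (False, "none"): 0.90,
}

def p_control(treatment):
    # exact inference by enumeration over the hidden variable
    return (p_occult * p_ctrl[(True, treatment)]
            + (1 - p_occult) * p_ctrl[(False, treatment)])

for t in ("RT", "none"):
    print(t, round(p_control(t), 3))
```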
Experimental, statistical, and biological models of radon carcinogenesis
International Nuclear Information System (INIS)
Cross, F.T.
1991-09-01
Risk models developed for underground miners have not been consistently validated in studies of populations exposed to indoor radon. Imprecision in risk estimates results principally from differences between exposures in mines as compared to domestic environments and from uncertainties about the interaction between cigarette-smoking and exposure to radon decay products. Uncertainties in extrapolating miner data to domestic exposures can be reduced by means of a broad-based health effects research program that addresses the interrelated issues of exposure, respiratory tract dose, carcinogenesis (molecular/cellular and animal studies, plus developing biological and statistical models), and the relationship of radon to smoking and other copollutant exposures. This article reviews experimental animal data on radon carcinogenesis observed primarily in rats at Pacific Northwest Laboratory. Recent experimental and mechanistic carcinogenesis models of exposures to radon, uranium ore dust, and cigarette smoke are presented with statistical analyses of animal data. 20 refs., 1 fig
Experimental, statistical and biological models of radon carcinogenesis
International Nuclear Information System (INIS)
Cross, F.T.
1992-01-01
Risk models developed for underground miners have not been consistently validated in studies of populations exposed to indoor radon. Imprecision in risk estimates results principally from differences between exposures in mines as compared with domestic environments and from uncertainties about the interaction between cigarette smoking and exposure to radon decay products. Uncertainties in extrapolating miner data to domestic exposures can be reduced by means of a broad-based health effects research programme that addresses the interrelated issues of exposure, respiratory tract dose, carcinogenesis (molecular/cellular and animal studies, plus developing biological and statistical models) and the relationship of radon to smoking and other co-pollutant exposures. This article reviews experimental animal data on radon carcinogenesis observed primarily in rats at Pacific Northwest Laboratory. Recent experimental and mechanistic carcinogenesis models of exposures to radon, uranium ore dust, and cigarette smoke are presented with statistical analyses of animal data. (author)
Statistical 3D damage accumulation model for ion implant simulators
International Nuclear Information System (INIS)
Hernandez-Mangas, J.M.; Lazaro, J.; Enriquez, L.; Bailon, L.; Barbolla, J.; Jaraiz, M.
2003-01-01
A statistical 3D damage accumulation model, based on the modified Kinchin-Pease formula, for ion implant simulation has been included in our physically based ion implantation code. It has only one fitting parameter for electronic stopping and uses 3D electron density distributions for different types of targets including compound semiconductors. Also, a statistical noise reduction mechanism based on the dose division is used. The model has been adapted to be run under parallel execution in order to speed up the calculation in 3D structures. Sequential ion implantation has been modelled including previous damage profiles. It can also simulate the implantation of molecular and cluster projectiles. Comparisons of simulated doping profiles with experimental SIMS profiles are presented. Also comparisons between simulated amorphization and experimental RBS profiles are shown. An analysis of sequential versus parallel processing is provided
Statistical 3D damage accumulation model for ion implant simulators
Hernandez-Mangas, J M; Enriquez, L E; Bailon, L; Barbolla, J; Jaraiz, M
2003-01-01
A statistical 3D damage accumulation model, based on the modified Kinchin-Pease formula, for ion implant simulation has been included in our physically based ion implantation code. It has only one fitting parameter for electronic stopping and uses 3D electron density distributions for different types of targets including compound semiconductors. Also, a statistical noise reduction mechanism based on the dose division is used. The model has been adapted to be run under parallel execution in order to speed up the calculation in 3D structures. Sequential ion implantation has been modelled including previous damage profiles. It can also simulate the implantation of molecular and cluster projectiles. Comparisons of simulated doping profiles with experimental SIMS profiles are presented. Also comparisons between simulated amorphization and experimental RBS profiles are shown. An analysis of sequential versus parallel processing is provided.
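For orientation on the two records above, here is a sketch of the modified Kinchin-Pease (NRT) displacement estimate on which such damage models are based; the displacement-threshold default is an illustrative, material-dependent assumption:

```python
def nrt_displacements(T_dam_eV, E_d_eV=15.0):
    """Modified Kinchin-Pease (NRT) estimate of Frenkel pairs produced by
    a cascade with damage energy T_dam; E_d is the displacement threshold
    (the 15 eV default is illustrative only)."""
    if T_dam_eV < E_d_eV:
        return 0.0
    if T_dam_eV < 2.5 * E_d_eV:
        return 1.0
    return 0.8 * T_dam_eV / (2.0 * E_d_eV)

print(nrt_displacements(1000.0))  # ~26.7 displacements for a 1 keV cascade
```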
SoS contract verification using statistical model checking
Directory of Open Access Journals (Sweden)
Alessandro Mignogna
2013-11-01
Full Text Available Exhaustive formal verification for systems of systems (SoS) is impractical and cannot be applied on a large scale. In this paper we propose to use statistical model checking for efficient verification of SoS. We address three relevant aspects for systems of systems: (1) the model of the SoS, which includes stochastic aspects; (2) the formalization of the SoS requirements in the form of contracts; (3) the tool-chain to support statistical model checking for SoS. We adapt the SMC technique for application to heterogeneous SoS. We extend the UPDM/SysML specification language to express the SoS requirements that the implemented strategies over the SoS must satisfy. The requirements are specified with a new contract language specifically designed for SoS, targeting a high-level English-pattern language, but relying on an accurate semantics given by the standard temporal logics. The contracts are verified against the UPDM/SysML specification using the Statistical Model Checker (SMC) PLASMA combined with the simulation engine DESYRE, which integrates heterogeneous behavioral models through the functional mock-up interface (FMI) standard. The tool-chain allows computing an estimation of the satisfiability of the contracts by the SoS. The results help the system architect to trade off different solutions to guide the evolution of the SoS.
Tsujimoto, Yasushi; Tsutsumi, Yusuke; Kataoka, Yuki; Tsujimoto, Hiraku; Yamamoto, Yosuke; Papola, Davide; Guyatt, Gordon H; Fukuhara, Shunichi; Furukawa, Toshi A
2017-10-22
Many studies have indicated the impact of bias in dissemination and publication in medical research. The existence of such bias among clinical trials has been repeatedly pointed out, but it has not been well studied in the field of systematic reviews (SRs). We therefore aim to investigate whether time lag bias and publication bias, based on the statistical significance of results, exist in SRs. In addition, we will examine at what stage of the paper publication process such bias, if any, creeps in. The present study is a meta-epidemiological study. We will include all SRs of interventions registered in the international prospective register of SRs (PROSPERO) before December 2014 if the SR has completed its analysis, irrespective of its publication status. All contact authors of eligible SRs will be asked to participate in a survey administered through the Internet. Our primary outcome is time from protocol registration to full publication of the SR as a journal article, defined as time from the registration date to the acceptance date among all the relevant SRs. We will examine the impact of statistically significant findings on the primary outcomes through time-to-event analyses. Ethics approval will be obtained from the Ethical Committee of the Kyoto University Graduate School of Medicine. This protocol has been registered in the University Hospital Medical Information Network Clinical Trials Registry. We will publish our findings in a peer-reviewed journal and may also present them at conferences. UMIN000028325. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Perneger, Thomas V; Combescure, Christophe
2017-07-01
Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P < 0.001) … These results suggest that P-values published in journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results. Copyright © 2017 Elsevier Inc. All rights reserved.
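One simple way to make the "random sample of null and alternative hypotheses" idea operational is a Storey-type estimate of the true-null fraction from the upper tail of the P-value distribution; this is a standard technique, offered as a sketch rather than as the study's actual method:

```python
import numpy as np

def pi0_estimate(pvals, lam=0.5):
    """Storey-style estimate of the fraction of true null hypotheses:
    under the null, P-values above lam are uniform, so their count
    divided by (1 - lam) * m estimates pi0."""
    pvals = np.asarray(pvals)
    return np.mean(pvals > lam) / (1.0 - lam)

# Example: mixture of 30% uniform nulls and 70% small alternative P-values
rng = np.random.default_rng(3)
p = np.concatenate([rng.uniform(size=300), rng.beta(0.3, 5.0, size=700)])
print(round(pi0_estimate(p), 2))
```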
Nuclear EMC effect in non-extensive statistical model
Energy Technology Data Exchange (ETDEWEB)
Trevisan, Luis A. [Departamento de Matematica e Estatistica, Universidade Estadual de Ponta Grossa, 84010-790, Ponta Grossa, PR (Brazil); Mirez, Carlos [ICET, Universidade Federal dos Vales do Jequitinhonha e Mucuri - UFVJM, Campus do Mucuri, Rua do Cruzeiro 01, Jardim Sao Paulo, 39803-371, Teofilo Otoni, MG (Brazil)
2013-05-06
In the present work, we attempt to describe the nuclear EMC effect by using the proton structure functions obtained from the non-extensive statistical quark model. We recall that this model has three fundamental variables: the temperature T, the radius, and the Tsallis parameter q. By combining different small changes, a good agreement with the experimental data may be obtained. Another interesting point of the model is that it allows phenomenological interpretation, for instance, keeping q constant while changing the radius and the temperature, or changing the radius and q while keeping the temperature fixed.
Martin, Justin D.
2017-01-01
This essay presents data from a census of statistics requirements and offerings at all 4-year journalism programs in the United States (N = 369) and proposes a model of a potential course in statistics for journalism majors. The author proposes that three philosophies underlie a statistics course for journalism students. Such a course should (a)…
Statistical modelling of a new global potential vegetation distribution
Levavasseur, G.; Vrac, M.; Roche, D. M.; Paillard, D.
2012-12-01
The potential natural vegetation (PNV) distribution is required for several studies in environmental sciences. Most of the available databases are quite subjective or depend on vegetation models. We have built a new high-resolution world-wide PNV map using an objective statistical methodology based on multinomial logistic models. Our method appears to be a fast and robust alternative in vegetation modelling, independent of any vegetation model. In comparison with other databases, our method provides a realistic PNV distribution in good agreement with BIOME 6000 data. Among several advantages, the use of probabilities allows us to estimate the uncertainty, bringing some confidence in the modelled PNV, and to highlight the regions needing more data to improve the PNV modelling. Although our PNV map is highly dependent on the distribution of data points, it is easily updatable as soon as additional data are available and provides very useful additional information for further applications.
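A toy sketch of the multinomial-logistic step with scikit-learn on synthetic covariates (the real model uses landscape and climate covariates and BIOME 6000 data); the per-class probabilities are what supports the uncertainty mapping mentioned above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Synthetic stand-in for the PNV setup: predict a vegetation class from
# climate-like covariates (e.g. temperature, precipitation, elevation).
X = rng.normal(size=(500, 3))
logits = X @ rng.normal(size=(3, 4))
y = logits.argmax(axis=1)        # 4 pseudo-biomes, invented labels

# The default lbfgs solver fits a multinomial (softmax) model.
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X[:5])  # per-class probabilities per grid cell
print(proba.round(2))
# The per-cell probabilities double as an uncertainty map, as in the abstract.
```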
Olive mill wastewater characteristics: modelling and statistical analysis
Directory of Open Access Journals (Sweden)
Martins-Dias, Susete
2004-09-01
Full Text Available A synthesis of the work carried out on Olive Mill Wastewater (OMW) characterisation is given, covering articles published over the last 50 years. Data on OMW characterisation found in the literature are summarised, and correlations between them and with phenolic compounds content are sought. This permits the characteristics of an OMW to be estimated from one simple measurement: the phenolic compounds concentration. A model based on OMW characterisations from six countries was developed, along with a model for Portuguese OMW. The statistical analysis of the correlations obtained indicates that the Chemical Oxygen Demand of a given OMW is a second-degree polynomial function of its phenolic compounds concentration. Tests to evaluate the significance of the regressions were carried out, based on multivariable ANOVA, on the visual distribution of standardised residuals and on their means, for confidence levels of 95 and 99%, clearly validating these models. This modelling work will help in the future planning, operation and monitoring of OMW treatment plants.
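The reported relationship is easy to operationalize: a second-degree polynomial fit of COD on phenolic concentration, here with invented numbers purely to show the mechanics:

```python
import numpy as np

# Hypothetical OMW data: phenolic compounds (g/L) vs. COD (g O2/L);
# the values are made up to illustrate the second-degree fit reported.
phenols = np.array([1.0, 2.5, 4.0, 6.0, 8.0, 10.0])
cod     = np.array([45., 70., 95., 130., 175., 230.])

coeffs = np.polyfit(phenols, cod, deg=2)   # COD = a*x^2 + b*x + c
print("a, b, c =", np.round(coeffs, 2))
print("predicted COD at 5 g/L:", round(np.polyval(coeffs, 5.0), 1))
```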
Bayesian statistical methods and their application in probabilistic simulation models
Directory of Open Access Journals (Sweden)
Sergio Iannazzo
2007-03-01
Full Text Available Bayesian statistical methods are facing a rapidly growing level of interest and acceptance in the field of health economics. The reasons for this success are probably to be found in the theoretical foundations of the discipline, which make these techniques more appealing to decision analysis. To this should be added modern IT progress, which has produced several flexible and powerful statistical software frameworks. Among them, probably one of the most notable is the BUGS language project and its standalone application for MS Windows, WinBUGS. The scope of this paper is to introduce the subject and to show some interesting applications of WinBUGS in developing complex economic models based on Markov chains. The advantages of this approach reside in the elegance of the code produced and in its capability to easily develop probabilistic simulations. Moreover, an example of the integration of Bayesian inference models in a Markov model is shown. This last feature lets the analyst conduct statistical analyses on the available sources of evidence and exploit them directly as inputs in the economic model.
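A sketch of a probabilistic Markov cohort model of the kind typically built in WinBUGS, written here in Python with hypothetical states, costs, and Dirichlet-distributed transition rows standing in for posterior samples:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical three-state cohort model; all numbers are invented.
states = ["well", "sick", "dead"]
n_sims, n_cycles = 1000, 20
costs = np.array([100.0, 1000.0, 0.0])   # per-cycle cost by state

total_costs = np.empty(n_sims)
for s in range(n_sims):
    P = np.vstack([
        rng.dirichlet([85, 10, 5]),      # transitions from "well"
        rng.dirichlet([20, 60, 20]),     # transitions from "sick"
        [0.0, 0.0, 1.0],                 # "dead" is absorbing
    ])
    cohort = np.array([1.0, 0.0, 0.0])   # everyone starts well
    c = 0.0
    for _ in range(n_cycles):
        cohort = cohort @ P
        c += cohort @ costs
    total_costs[s] = c

print(f"expected cost {total_costs.mean():.0f}, 95% interval "
      f"({np.percentile(total_costs, 2.5):.0f}, "
      f"{np.percentile(total_costs, 97.5):.0f})")
```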
Can spatial statistical river temperature models be transferred between catchments?
Jackson, Faye L.; Fryer, Robert J.; Hannah, David M.; Malcolm, Iain A.
2017-09-01
There has been increasing use of spatial statistical models to understand and predict river temperature (Tw) from landscape covariates. However, it is not financially or logistically feasible to monitor all rivers and the transferability of such models has not been explored. This paper uses Tw data from four river catchments collected in August 2015 to assess how well spatial regression models predict the maximum 7-day rolling mean of daily maximum Tw (Twmax) within and between catchments. Models were fitted for each catchment separately using (1) landscape covariates only (LS models) and (2) landscape covariates and an air temperature (Ta) metric (LS_Ta models). All the LS models included upstream catchment area and three included a river network smoother (RNS) that accounted for unexplained spatial structure. The LS models transferred reasonably to other catchments, at least when predicting relative levels of Twmax. However, the predictions were biased when mean Twmax differed between catchments. The RNS was needed to characterise and predict finer-scale spatially correlated variation. Because the RNS was unique to each catchment and thus non-transferable, predictions were better within catchments than between catchments. A single model fitted to all catchments found no interactions between the landscape covariates and catchment, suggesting that the landscape relationships were transferable. The LS_Ta models transferred less well, with particularly poor performance when the relationship with the Ta metric was physically implausible or required extrapolation outside the range of the data. A single model fitted to all catchments found catchment-specific relationships between Twmax and the Ta metric, indicating that the Ta metric was not transferable. These findings improve our understanding of the transferability of spatial statistical river temperature models and provide a foundation for developing new approaches for predicting Tw at unmonitored locations across
Directory of Open Access Journals (Sweden)
Karen Larwin
2014-02-01
Full Text Available The present study examined students' statistics-related self-efficacy, as measured with the current statistics self-efficacy (CSSE) inventory developed by Finney and Schraw (2003). Structural equation modeling was used to conduct a confirmatory factor analysis of the one-dimensional factor structure of the CSSE. Once confirmed, this factor was used to test whether a significant link to prior mathematics experiences exists. Additionally, a new post-structural equation modeling (SEM) application was employed to compute error-free latent variable scores for CSSE in an effort to examine the ancillary effects of gender, age, ethnicity, department, degree level, hours completed, expected course grade, number of college-level math classes, and current GPA on students' CSSE scores. Results support the one-dimensional construct and, as expected, the model demonstrated a significant link between prior mathematics experiences and CSSE scores. Additionally, the students' department, expected grade, and number of prior math classes were found to have a significant effect on students' CSSE scores.
Statistical volumetric model for characterization and visualization of prostate cancer
Lu, Jianping; Srikanchana, Rujirutana; McClain, Maxine A.; Wang, Yue J.; Xuan, Jian Hua; Sesterhenn, Isabell A.; Freedman, Matthew T.; Mun, Seong K.
2000-04-01
To reveal the spatial pattern of localized prostate cancer distribution, a 3D statistical volumetric model, showing the probability map of prostate cancer distribution together with the anatomical structure of the prostate, has been developed from 90 digitally-imaged surgical specimens. Through an enhanced virtual environment with various visualization modes, this master model permits for the first time an accurate characterization and understanding of prostate cancer distribution patterns. The construction of the statistical volumetric model is characterized by mapping all of the individual models onto a generic prostate site model, in which a self-organizing scheme is used to decompose a group of contours representing multifold tumors into localized tumor elements. The next crucial step in creating the master model is the development of an accurate multi-object and non-rigid registration/warping scheme incorporating the various variations among these individual models in true 3D. This is achieved with a multi-object based principal-axis alignment followed by an affine transform, and further fine-tuned by a thin-plate spline interpolation driven by surface-based deformable warping dynamics. Based on the accurately mapped tumor distribution, a standard finite normal mixture is used to model the cancer volumetric distribution statistics, whose parameters are estimated using both the K-means and expectation-maximization algorithms under information-theoretic criteria. Given the desired number of tissue samplings, the prostate needle biopsy site selection is optimized through a probabilistic self-organizing map, thus achieving a maximum likelihood of cancer detection. We describe the details of our theory and methodology, and report our pilot results and an evaluation of the effectiveness of the algorithm in characterizing prostate cancer distributions and optimizing needle biopsy techniques.
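The "finite normal mixture under information-theoretic criteria" step can be sketched with scikit-learn: fit Gaussian mixtures of increasing order to synthetic 3D points and select the order by BIC (the paper itself uses K-means plus EM on the mapped tumour data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)

# Synthetic 3D "tumour element" coordinates; two clusters by construction.
pts = np.vstack([rng.normal([0, 0, 0], 0.5, (200, 3)),
                 rng.normal([2, 1, 0], 0.4, (150, 3))])

# Fit mixtures of order 1..5 and keep the one with the lowest BIC.
best = min((GaussianMixture(n_components=k, random_state=0).fit(pts)
            for k in range(1, 6)),
           key=lambda g: g.bic(pts))
print("selected components:", best.n_components,
      "BIC:", round(best.bic(pts), 1))
```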
Spin studies of nucleons in a statistical model
International Nuclear Information System (INIS)
Singh, J P; Upadhyay, Alka
2004-01-01
We decompose various quark-gluon Fock states of a nucleon into a set of states in which the three-quark core and the rest of the state, termed the sea, each appear with definite spin and colour quantum numbers, their weights being determined, statistically, from their multiplicities. The expansion coefficients in the quark-gluon Fock state expansion have been taken from a recently proposed statistical model. We have also considered two modifications of this model with a view to reducing the contributions of the sea components with higher multiplicities. With certain approximations, we have calculated the quark contributions to the spin of the nucleon, the ratio of the magnetic moments of the nucleons, their weak decay constant and the ratio of SU(3) reduced matrix elements for the axial current.
Statistical inference to advance network models in epidemiology.
Welch, David; Bansal, Shweta; Hunter, David R
2011-03-01
Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight on the role that heterogeneity in host contact patterns plays on infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data. Copyright © 2011 Elsevier B.V. All rights reserved.
Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.
1999-01-01
Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc, at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8, and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, respectively, when estimated with a normalized product measure of the empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40%, and five were predicted by M8-MSc in 13%, of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier
A statistical method for descriminating between alternative radiobiological models
International Nuclear Information System (INIS)
Kinsella, I.A.; Malone, J.F.
1977-01-01
Radiobiological models assist understanding of the development of radiation damage, and may provide a basis for extrapolating dose-effect curves from high- to low-dose regions. Many models have been proposed, such as multitarget and its modifications, enzymatic models, and those with a quadratic dose-response relationship (i.e. αD + βD² forms). It is difficult to distinguish between these because the statistical techniques used are almost always limited, in that one method can rarely be applied to the whole range of models. A general statistical procedure for parameter estimation (the maximum likelihood method) has been found applicable to a wide range of radiobiological models. The curve parameters are estimated using a computerised search that continues until the set of values most likely to fit the data is obtained. When the search is complete, two procedures are carried out. First, a goodness-of-fit test is applied which examines the applicability of an individual model to the data. Secondly, an index is derived which provides an indication of the adequacy of any model compared with alternative models. Thus the models may be ranked according to how well they fit the data. For example, with one set of data, multitarget types were found to be more suitable than quadratic types (αD + βD²). This method should be of assistance in evaluating various models. It may also be profitably applied to the selection of the most appropriate model to use when it is necessary to extrapolate from high to low doses.
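A hedged sketch of the procedure: fit two candidate dose-response models by maximum likelihood (here Gaussian errors on log-survival, so least squares) and rank them with an index such as AIC; the data and starting values below are synthetic, and AIC stands in for whichever adequacy index the paper derives:

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic log-survival data at doses D (Gy)
D = np.array([0., 1., 2., 4., 6., 8.])
lnS = np.array([0., -0.35, -0.85, -2.3, -4.4, -7.1])

lq = lambda d, a, b: -(a * d + b * d**2)                # alpha*D + beta*D^2
mt = lambda d, n, d0: np.log(1 - (1 - np.exp(-d / d0))**n + 1e-12)  # multitarget

def aic(model, p0):
    # maximum likelihood under Gaussian errors = least squares
    popt, _ = curve_fit(model, D, lnS, p0=p0)
    rss = np.sum((lnS - model(D, *popt))**2)
    return len(D) * np.log(rss / len(D)) + 2 * len(p0)

print("AIC LQ:", round(aic(lq, [0.2, 0.05]), 1))
print("AIC multitarget:", round(aic(mt, [2.0, 1.5]), 1))  # lower ranks higher
```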
A statistical model of structure functions and quantum chromodynamics
International Nuclear Information System (INIS)
Mac, E.; Ugaz, E.; Universidad Nacional de Ingenieria, Lima
1989-01-01
We consider a model for the x-dependence of the quark distributions in the proton. Within the context of simple statistical assumptions, we obtain the parton densities in the infinite momentum frame. In a second step, lowest order QCD corrections are incorporated into these distributions. Crude, but reasonable, agreement with experiment is found for the F2, valence and q, anti-q distributions for x ≳ 0.2. (orig.)
A Statistical Model for Soliton Particle Interaction in Plasmas
DEFF Research Database (Denmark)
Dysthe, K. B.; Pécseli, Hans; Truelsen, J.
1986-01-01
A statistical model for soliton-particle interaction is presented. A master equation is derived for the time evolution of the particle velocity distribution as induced by resonant interaction with Korteweg-de Vries solitons. The detailed energy balance during the interaction subsequently determines the evolution of the soliton amplitude distribution. The analysis applies equally well for weakly nonlinear plasma waves in a strongly magnetized waveguide, or for ion acoustic waves propagating in one-dimensional systems.
Physical-Statistical Model of Thermal Conductivity of Nanofluids
Directory of Open Access Journals (Sweden)
B. Usowicz
2014-01-01
Full Text Available A physical-statistical model for predicting the effective thermal conductivity of nanofluids is proposed. The volumetric unit of nanofluids in the model consists of solid, liquid, and gas particles and is treated as a system made up of regular geometric figures (spheres) filling the volumetric unit by layers. The model assumes that connections between layers of the spheres and between neighbouring spheres in a layer are represented by serial and parallel connections of thermal resistors, respectively. The model is expressed in terms of the thermal resistance of nanoparticles and fluids and the multinomial distribution of particles in the nanofluids. Results for the predicted and measured effective thermal conductivity of several nanofluids (Al2O3/ethylene glycol-based and Al2O3/water-based; CuO/ethylene glycol-based and CuO/water-based; and TiO2/ethylene glycol-based) are presented. The physical-statistical model shows reasonably good agreement with the experimental results and gives more accurate predictions for the effective thermal conductivity of nanofluids than existing classical models.
Statistical modeling of global geogenic fluoride contamination in groundwaters.
Amini, Manouchehr; Mueller, Kim; Abbaspour, Karim C; Rosenberg, Thomas; Afyuni, Majid; Møller, Klaus N; Sarr, Mamadou; Johnson, C Annette
2008-05-15
The use of groundwater with high fluoride concentrations poses a health threat to millions of people around the world. This study aims at providing a global overview of potentially fluoride-rich groundwaters by modeling fluoride concentration. A large database of worldwide fluoride concentrations as well as available information on related environmental factors such as soil properties, geological settings, and climatic and topographical information on a global scale have all been used in the model. The modeling approach combines geochemical knowledge with statistical methods to devise a rule-based statistical procedure, which divides the world into 8 different "process regions". For each region a separate predictive model was constructed. The end result is a global probability map of fluoride concentration in the groundwater. Comparisons of the modeled and measured data indicate that 60-70% of the fluoride variation could be explained by the models in six process regions, while in two process regions only 30% of the variation in the measured data was explained. Furthermore, the global probability map corresponded well with fluorotic areas described in the international literature. Although the probability map should not replace fluoride testing, it can give a first indication of possible contamination and thus may support the planning process of new drinking water projects.
Statistical model of the powder flow regulation by nanomaterials
Kurfess, D.; Hinrichsen, H.; Zimmermann, I.
2005-01-01
Fine powders often tend to agglomerate due to van der Waals forces between the particles. These forces can be reduced significantly by covering the particles with nanoscaled adsorbates, as shown by recent experiments. In the present work a quantitative statistical analysis of the effect of powder flow regulating nanomaterials on the adhesive forces in powders is given. Covering two spherical powder particles randomly with nanoadsorbates we compute the decrease of the mutual van der Waals forc...
UPPAAL-SMC: Statistical Model Checking for Priced Timed Automata
DEFF Research Database (Denmark)
Bulychev, Petr; David, Alexandre; Larsen, Kim Guldstrand
2012-01-01
This survey presents UPPAAL-SMC, based on a series of extensions of the statistical model checking approach generalized to handle real-time systems and estimate undecidable problems. UPPAAL-SMC comes together with a friendly user interface that allows a user to specify complex problems in an efficient manner, as well as to get feedback in the form of probability distributions and to compare probabilities to analyze performance aspects of systems. The focus of the survey is on the evolution of the tool – including modeling and specification formalisms as well as techniques applied – together with applications of the tool to case studies.
Statistical mechanics of attractor neural network models with synaptic depression
International Nuclear Information System (INIS)
Igarashi, Yasuhiko; Oizumi, Masafumi; Otsubo, Yosuke; Nagata, Kenji; Okada, Masato
2009-01-01
Synaptic depression is known to control gain for presynaptic inputs. Since cortical neurons receive thousands of presynaptic inputs, and their outputs are fed into thousands of other neurons, the synaptic depression should influence macroscopic properties of neural networks. We employ simple neural network models to explore the macroscopic effects of synaptic depression. Systems with the synaptic depression cannot be analyzed due to asymmetry of connections with the conventional equilibrium statistical-mechanical approach. Thus, we first propose a microscopic dynamical mean field theory. Next, we derive macroscopic steady state equations and discuss the stabilities of steady states for various types of neural network models.
Linguistically motivated statistical machine translation models and algorithms
Xiong, Deyi
2015-01-01
This book provides a wide variety of algorithms and models to integrate linguistic knowledge into Statistical Machine Translation (SMT). It helps advance conventional SMT to linguistically motivated SMT by enhancing the following three essential components: translation, reordering and bracketing models. It also serves the purpose of promoting the in-depth study of the impacts of linguistic knowledge on machine translation. Finally it provides a systematic introduction of Bracketing Transduction Grammar (BTG) based SMT, one of the state-of-the-art SMT formalisms, as well as a case study of linguistically motivated SMT on a BTG-based platform.
Efficient Parallel Statistical Model Checking of Biochemical Networks
Directory of Open Access Journals (Sweden)
Paolo Ballarini
2009-12-01
Full Text Available We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches, such as CSL/PCTL model checking, are undermined by a huge computational demand which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds for a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples; however, there are three key aspects that improve the efficiency. First, the sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P holding is based on an efficient variant of the Wilson method, which ensures faster convergence. Third, the whole methodology is designed in a parallel fashion, and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
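The Wilson interval mentioned above is a short closed form; here is a sketch, with the number of satisfying simulation traces treated as a binomial count (the paper's variant may differ in detail):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion; used here, as in
    statistical model checking, to bound the probability that property P
    holds from n simulation runs with `successes` satisfying P."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

print(wilson_interval(870, 1000))  # e.g. 870 of 1000 sampled traces satisfy P
```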
Statistical models for expert judgement and wear prediction
International Nuclear Information System (INIS)
Pulkkinen, U.
1994-01-01
This thesis studies the statistical analysis of expert judgements and the prediction of wear. The point of view adopted is that of information theory and Bayesian statistics. A general Bayesian framework for analyzing both expert judgements and wear prediction is presented. Information-theoretic interpretations are given for some averaging techniques used in the determination of consensus distributions. Further, information-theoretic models are compared with a Bayesian model. The general Bayesian framework is then applied in analyzing expert judgements based on ordinal comparisons. In this context, the value of information lost in the ordinal comparison process is analyzed by applying decision-theoretic concepts. As a generalization of the Bayesian framework, stochastic filtering models for wear prediction are formulated. These models utilize the information from condition monitoring measurements in updating the residual life distribution of mechanical components. Finally, the application of stochastic control models in optimizing operational strategies for inspected components is studied. Monte Carlo simulation methods, such as the Gibbs sampler and the stochastic quasi-gradient method, are applied in the determination of posterior distributions and in the solution of stochastic optimization problems. (orig.) (57 refs., 7 figs., 1 tab.)
Model-generated air quality statistics for application in vegetation response models in Alberta
International Nuclear Information System (INIS)
McVehil, G.E.; Nosal, M.
1990-01-01
To test and apply vegetation response models in Alberta, air pollution statistics representative of various parts of the Province are required. At this time, air quality monitoring data of the requisite accuracy and time resolution are not available for most parts of Alberta. Therefore, there exists a need to develop appropriate air quality statistics. The objectives of the work reported here were to determine the applicability of model-generated air quality statistics and to develop, by modelling, realistic and representative time series of hourly SO2 concentrations that could be used to generate the statistics demanded by vegetation response models
The Impact of Statistical Leakage Models on Design Yield Estimation
Directory of Open Access Journals (Sweden)
Rouwaida Kanj
2011-01-01
Full Text Available Device mismatch and process variation models play a key role in determining the functionality and yield of sub-100 nm designs. Average characteristics are often of interest, such as the average leakage current or the average read delay. However, detecting rare functional fails is critical for memory design, and designers often seek techniques that enable such events to be modeled accurately. Extremely leaky devices can inflict functionality fails. The plurality of leaky devices on a bitline increases the dimensionality of the yield estimation problem. Simplified models are possible by adopting approximations to the underlying sum of lognormals. The implications of such approximations on tail probabilities may in turn bias the yield estimate. We review different closed-form approximations and compare them against the CDF matching method, which is shown to be the most effective method for accurate statistical leakage modeling.
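One of the closed-form approximations alluded to is Wilkinson-style moment matching of a sum of independent lognormals by a single lognormal; here is a sketch (the article's comparison also covers the CDF matching method, which this does not implement):

```python
import numpy as np

def wilkinson_lognormal_sum(mus, sigmas):
    """Moment-matching (Wilkinson-style) approximation of a sum of
    independent lognormals by a single lognormal: match the exact mean
    and variance of the sum, then invert to (mu_S, sigma_S)."""
    mus, sigmas = np.asarray(mus), np.asarray(sigmas)
    mean = np.sum(np.exp(mus + sigmas**2 / 2))
    var = np.sum((np.exp(sigmas**2) - 1) * np.exp(2 * mus + sigmas**2))
    sigma_s2 = np.log(1 + var / mean**2)
    mu_s = np.log(mean) - sigma_s2 / 2
    return mu_s, np.sqrt(sigma_s2)

# e.g. total bitline leakage as a sum of per-device lognormal currents
print(wilkinson_lognormal_sum([0.0, 0.1, -0.2], [0.8, 0.9, 1.0]))
```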
The GNASH preequilibrium-statistical nuclear model code
International Nuclear Information System (INIS)
Arthur, E. D.
1988-01-01
The following report is based on materials presented in a series of lectures at the International Center for Theoretical Physics, Trieste, which were designed to describe the GNASH preequilibrium statistical model code and its use. An overview of the code is provided, with emphasis upon the code's calculational capabilities and the theoretical models that have been implemented in it. Two sample problems are discussed: the first deals with neutron reactions on ⁵⁸Ni; the second illustrates the fission model capabilities implemented in the code and involves n + ²³⁵U reactions. Finally, a description is provided of theoretical model and code development currently underway. Examples of calculated results using these new capabilities are also given. 19 refs., 17 figs., 3 tabs
Govindan, R. B.; Al-Shargabi, Tareq; Andescavage, Nickie N.; Metzler, Marina; Lenin, R. B.; Plessis, Adré du
2017-01-01
Phase differences of two signals in perfect synchrony exhibit a narrow band distribution, whereas the phase differences of two asynchronous signals exhibit a uniform distribution. We assess the statistical significance of the phase synchronization between two signals by using a signed rank test to compare the distribution of their phase differences to the theoretically expected uniform distribution for two asynchronous signals. Using numerical simulation of a second order autoregressive (AR2) process, we show that the proposed approach correctly identifies the coupling between the AR2 process and the driving white noise. We also identify the optimal p-value that distinguishes coupled scenarios from uncoupled ones. To identify the limiting cases, in a second simulation we study the phase synchronization between two independent white noises as a function of the bandwidth of the filter. We identify the frequency bandwidth below which the proposed approach fails and suggest using a data-driven approach for those scenarios. Finally, we demonstrate the application of this approach to study the coupling between beat-to-beat cardiac intervals and continuous blood pressure obtained from critically-ill infants to characterize the baroreflex function.
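A minimal sketch of this procedure, assuming NumPy/SciPy: instantaneous phases come from the Hilbert transform, and the wrapped phase differences are compared against the uniform distribution expected for asynchronous signals. A Kolmogorov-Smirnov test stands in here for the paper's signed-rank comparison.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.stats import kstest

def phase_sync_pvalue(x, y):
    """Small p-value: phase differences deviate from uniformity,
    suggesting synchronization. Inputs should be band-limited signals."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    dphi = np.mod(dphi + np.pi, 2 * np.pi) - np.pi   # wrap to [-pi, pi)
    return kstest(dphi, 'uniform', args=(-np.pi, 2 * np.pi)).pvalue
```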
International Nuclear Information System (INIS)
Shakespeare, T.P.; Mukherjee, R.K.; Gebski, V.J.
2003-01-01
Confidence levels, clinical significance curves, and risk-benefit contours are tools that improve the analysis of clinical studies and minimize misinterpretation of published results; however, no software has been available for their calculation. The objective was to develop software to help clinicians utilize these tools. Excel 2000 spreadsheets were designed using only built-in functions, without macros. The workbook was protected and encrypted so that users can modify only input cells. The workbook has 4 spreadsheets for use in studies comparing two patient groups. Sheet 1 comprises instructions and graphic examples for use. Sheet 2 allows the user to input the main study results (e.g. survival rates) into a 2-by-2 table. Confidence intervals (95%), the p-value, and the confidence level for Treatment A being better than Treatment B are automatically generated. An additional input cell allows the user to determine the confidence associated with a specified level of benefit. For example, if the user wishes to know the confidence that Treatment A is at least 10% better than B, 10% is entered. Sheet 2 automatically displays clinical significance curves, graphically illustrating confidence levels for all possible benefits of one treatment over the other. Sheet 3 allows input of toxicity data, and calculates the confidence that one treatment is more toxic than the other. It also determines the confidence that the relative toxicity of the most effective arm does not exceed user-defined tolerability. Sheet 4 automatically calculates risk-benefit contours, displaying the confidence associated with a specified scenario of minimum benefit and maximum risk of one treatment arm over the other. The spreadsheet is freely downloadable at www.ontumor.com/professional/statistics.htm. A simple, self-explanatory, freely available spreadsheet calculator was developed using Excel 2000. The incorporated decision-making tools can be used for data analysis and improve the reporting of results of any
Austin, Peter C
2018-01-01
The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest.
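The kind of Monte Carlo power estimation described can be sketched as follows, assuming the lifelines package for Cox fitting and its scaled-Schoenfeld-residual test of the proportional hazards assumption. The simulation settings (group-specific Weibull shapes, censoring rate, sample size) are illustrative, not those of the paper.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

rng = np.random.default_rng(0)

def one_trial(n=500):
    """One simulated data set with a binary covariate whose hazard ratio
    varies over time (group-specific Weibull shapes violate PH)."""
    x = rng.integers(0, 2, n)
    t = rng.weibull(np.where(x == 0, 1.0, 1.8))      # non-proportional hazards
    c = rng.exponential(2.0, n)                      # independent censoring
    df = pd.DataFrame({"T": np.minimum(t, c), "E": (t <= c).astype(int), "x": x})
    cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
    res = proportional_hazard_test(cph, df, time_transform="rank")
    return res.summary["p"].iloc[0] < 0.05

power = np.mean([one_trial() for _ in range(200)])   # Monte Carlo power estimate
```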
Edwards, Phil; Shakur, Haleema; Barnetson, Lin; Prieto, David; Evans, Stephen; Roberts, Ian
2014-06-01
Background: The purpose of monitoring in clinical trials is to ensure the rights, safety, and well-being of trial patients and the accuracy of the trial data. In the Clinical Randomisation of an Antifibrinolytic in Significant Haemorrhage (CRASH-2) trial, which recruited over 20,000 adult trauma patients worldwide, the nature and extent of monitoring was based on a risk assessment undertaken before recruitment started. Purpose: We report the methods used for central and statistical monitoring in the CRASH-2 trial and explain how central monitoring was used to target on-site investigations. Methods: To ensure that trial participants met the inclusion criteria, we monitored event rates for the primary (death) and secondary outcomes (blood transfusion given). We monitored four quantitative variables (systolic blood pressure (SBP), heart rate (HR), respiratory rate, and capillary refill time) as indicators of the severity of bleeding. We used the coefficient of variation (CV) to identify sites with too much or too little variability. To ensure the accuracy of the data on side effects, we monitored thromboembolic events at each site. Sites with higher or lower than expected event rates were identified for further evaluation. Results: A total of 274 sites recruited patients: 145 sites recruited ≥20 patients, and 52 sites recruited ≥100 patients. Sites with low case fatality and low blood transfusion rates were found to be including patients with relatively mild haemorrhage. One site with a high rate of thromboembolic events was found to be using clinical judgement alone. Measurements of SBP and HR varied by about one-fifth of their average value, and capillary refill time measurements varied by around one-third of their average; between-site variation was lowest for blood pressure. Limitations: A comparison of mean and median CV indicated that the distributions are slightly skewed to the right. Our simple approach to calculating 95% confidence intervals for the CV may be
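The CV-based screening can be sketched as below, assuming NumPy; the flagging rule and threshold are illustrative, and the trial's own confidence interval calculation for the CV (truncated above) is not reproduced.

```python
import numpy as np

def site_cv(values):
    """Coefficient of variation of a quantitative variable (e.g. SBP) at one site."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

def flag_sites(per_site_values, z=1.96):
    """Indices of sites whose CV is unusually high or low relative to all sites."""
    cvs = np.array([site_cv(v) for v in per_site_values])
    return np.where(np.abs(cvs - cvs.mean()) > z * cvs.std(ddof=1))[0]
```

Too little variability at a site can indicate rounded or fabricated measurements, while too much can indicate unit or transcription errors, which is why both tails are flagged.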
Editorial to: Six papers on Dynamic Statistical Models
DEFF Research Database (Denmark)
2014-01-01
statistical methodology and theory for large and complex data sets that included biostatisticians and mathematical statisticians from three faculties at the University of Copenhagen. The satellite meeting took place August 17–19, 2011. Its purpose was to bring together researchers in statistics and related … Group-Sequential Covariate-Adjusted Randomized Clinical Trials by Antoine Chambaz and Mark J. van der Laan, and Estimation of Causal Odds of Concordance using the Aalen Additive Model by Torben Martinussen and Christian Bressen Pipper. We would like to acknowledge the financial support from the University of Copenhagen Program of Excellence and Elsevier. We would also like to thank the authors for contributing interesting papers, the referees for their helpful reports, and the present and previous editors of SJS for their support of the publication of the papers from the satellite meeting.
The Statistical Multifragmentation Model with Skyrme Effective Interactions
Carlson, B V; Donangelo, R; Lynch, W G; Steiner, A W; Tsang, M B
2010-01-01
The Statistical Multifragmentation Model is modified to incorporate Helmholtz free energies calculated in the finite temperature Thomas-Fermi approximation using Skyrme effective interactions. In this formulation, the density of the fragments at the freeze-out configuration corresponds to the equilibrium value obtained in the Thomas-Fermi approximation at the given temperature. The behavior of the nuclear caloric curve, at constant volume, is investigated in the micro-canonical ensemble and a plateau is observed for excitation energies between 8 and 10 MeV per nucleon. A small kink in the caloric curve is found at the onset of this gas transition, indicating the existence of negative heat capacity, even in this case in which the system is constrained to a fixed volume, in contrast to former statistical calculations.
Bayesian Sensitivity Analysis of Statistical Models with Missing Data.
Zhu, Hongtu; Ibrahim, Joseph G; Tang, Niansheng
2014-04-01
Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures.
A statistical model for interpreting computerized dynamic posturography data
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100 as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score: zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
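A simplified stand-in for this latent-variable construction, assuming SciPy: falls (ES recorded as zero) are treated as left-censoring of a latent normal score, giving a tobit-style likelihood. The paper's model is richer (loss of balance occurs with a probability depending on the latent ES), but the censored-likelihood mechanics are the same in spirit.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(theta, y, X):
    """Latent ES = X @ beta + eps, eps ~ N(0, sigma^2); y == 0 marks a fall,
    treated here as left-censoring of the latent score at zero."""
    beta, sigma = theta[:-1], np.exp(theta[-1])      # log-sigma keeps sigma > 0
    mu = X @ beta
    fell = y == 0
    ll = norm.logpdf(y[~fell], mu[~fell], sigma).sum()   # observed scores
    ll += norm.logcdf(-mu[fell] / sigma).sum()           # censored falls
    return -ll

# fit: res = minimize(neg_loglik, np.zeros(X.shape[1] + 1), args=(y, X))
```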
Monthly to seasonal low flow prediction: statistical versus dynamical models
Ionita-Scholz, Monica; Klein, Bastian; Meissner, Dennis; Rademacher, Silke
2016-04-01
the Alfred Wegener Institute a purely statistical scheme to generate streamflow forecasts for several months ahead. Instead of directly using teleconnection indices (e.g. NAO, AO), the idea is to identify regions with stable teleconnections between different global climate information (e.g. sea surface temperature, geopotential height, etc.) and streamflow at different gauges relevant for inland waterway transport. So-called stability (correlation) maps are generated, showing regions where streamflow and a climate variable from previous months are significantly correlated in a 21 (31) year moving window. Finally, the optimal forecast model is established based on a multiple regression analysis of the stable predictors. We will present current results of the aforementioned approaches with a focus on the River Rhine (being one of the world's most frequented waterways and the backbone of the European inland waterway network) and the Elbe River. Overall, our analysis reveals the existence of valuable predictability of low flows at monthly and seasonal time scales, a result that may be useful for water resources management. Given that all predictors used in the models are available at the end of each month, the forecast scheme can be used operationally to predict extreme events and to provide early warnings for upcoming low flows.
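A naive sketch of the stability (correlation) map construction, assuming NumPy/SciPy; the window length, significance level, and stability rule are simplified readings of the procedure described above.

```python
import numpy as np
from scipy.stats import pearsonr

def stability_map(climate, flow, win=21, alpha=0.05):
    """climate: (years, ny, nx) predictor fields; flow: (years,) streamflow.
    A cell is 'stable' if its correlation with streamflow is significant,
    with a consistent sign, in every `win`-year moving window."""
    T, ny, nx = climate.shape
    stable = np.ones((ny, nx), dtype=bool)
    sign = np.zeros((ny, nx))
    for s in range(T - win + 1):
        for i in range(ny):
            for j in range(nx):
                r, p = pearsonr(climate[s:s + win, i, j], flow[s:s + win])
                if sign[i, j] == 0:
                    sign[i, j] = np.sign(r)
                stable[i, j] &= (p < alpha) and (np.sign(r) == sign[i, j])
    return stable
```

Cells that survive all windows supply the stable predictors entering the multiple regression.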
Dynamic statistical models of biological cognition: insights from communications theory
Wallace, Rodrick
2014-10-01
Maturana's cognitive perspective on the living state, Dretske's insight on how information theory constrains cognition, the Atlan/Cohen cognitive paradigm, and models of intelligence without representation, permit construction of a spectrum of dynamic necessary conditions statistical models of signal transduction, regulation, and metabolism at and across the many scales and levels of organisation of an organism and its context. Nonequilibrium critical phenomena analogous to physical phase transitions, driven by crosstalk, will be ubiquitous, representing not only signal switching, but the recruitment of underlying cognitive modules into tunable dynamic coalitions that address changing patterns of need and opportunity at all scales and levels of organisation. The models proposed here, while certainly providing much conceptual insight, should be most useful in the analysis of empirical data, much as are fitted regression equations.
MASKED AREAS IN SHEAR PEAK STATISTICS: A FORWARD MODELING APPROACH
Energy Technology Data Exchange (ETDEWEB)
Bard, D. [KIPAC, SLAC National Accelerator Laboratory, 2575 Sand Hill Rd, Menlo Park, CA 94025 (United States); Kratochvil, J. M. [Astrophysics and Cosmology Research Unit, University of KwaZulu-Natal, Westville, Durban 4000 (South Africa); Dawson, W., E-mail: djbard@slac.stanford.edu [Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA 94550 (United States)
2016-03-10
The statistics of shear peaks have been shown to provide valuable cosmological information beyond the power spectrum, and will be an important constraint on models of cosmology in forthcoming astronomical surveys. Surveys include masked areas due to bright stars, bad pixels, etc., which must be accounted for in producing constraints on cosmology from shear maps. We advocate a forward-modeling approach, where the impacts of masking and other survey artifacts are accounted for in the theoretical prediction of cosmological parameters, rather than correcting survey data to remove them. We use masks based on the Deep Lens Survey, and explore the impact of up to 37% of the survey area being masked on LSST and DES-scale surveys. By reconstructing maps of aperture mass, the masking effect is smoothed out, resulting in up to 14% smaller statistical uncertainties compared to simply reducing the survey area by the masked area. We show that, even in the presence of large survey masks, the bias in cosmological parameter estimation produced in the forward-modeling process is ≈1%, dominated by bias caused by limited simulation volume. We also explore how this potential bias scales with survey area and evaluate how much small survey areas are impacted by the differences in cosmological structure in the data and simulated volumes, due to cosmic variance.
International Nuclear Information System (INIS)
Tumur, Odgerel; Soon, Kean; Brown, Fraser; Mykytowycz, Marcus
2013-01-01
The aims of our study were to evaluate the effect of application of the Adaptive Statistical Iterative Reconstruction (ASIR) algorithm on the radiation dose of coronary computed tomography angiography (CCTA) and its effects on the image quality of CCTA, and to evaluate the effects of various patient and CT scanning factors on the radiation dose of CCTA. This was a retrospective study that included 347 consecutive patients who underwent CCTA at a tertiary university teaching hospital between 1 July 2009 and 20 September 2011. Analysis was performed comparing patient demographics, scan characteristics, radiation dose and image quality in two groups of patients in whom conventional Filtered Back Projection (FBP) or ASIR was used for image reconstruction. There were 238 patients in the FBP group and 109 patients in the ASIR group. There was no difference between the groups in the use of prospective gating, scan length or tube voltage. In the ASIR group, a significantly lower tube current was used compared with the FBP group, 550 mA (450–600) vs. 650 mA (500–711.25) (median (interquartile range)), respectively, P<0.001. There was a 27% effective radiation dose reduction in the ASIR group compared with the FBP group, 4.29 mSv (2.84–6.02) vs. 5.84 mSv (3.88–8.39) (median (interquartile range)), respectively, P<0.001. Although ASIR was associated with increased image noise compared with FBP (39.93±10.22 vs. 37.63±18.79 (mean ± standard deviation), respectively, P<0.001), it did not affect the signal intensity, signal-to-noise ratio, contrast-to-noise ratio or the diagnostic quality of CCTA. Application of ASIR reduces the radiation dose of CCTA without affecting the image quality.
Development of modelling algorithm of technological systems by statistical tests
Shemshura, E. A.; Otrokov, A. V.; Chernyh, V. G.
2018-03-01
The paper tackles the problem of economic assessment of design efficiency regarding various technological systems at the stage of their operation. The modelling algorithm of a technological system, based on statistical tests and taking account of the reliability index, allows estimating the level of machinery technical excellence and assessing the efficiency of design reliability against its performance. The economic feasibility of its application is determined on the basis of the service quality of a technological system, with further forecasting of the volume and range of spare parts supply.
A Tensor Statistical Model for Quantifying Dynamic Functional Connectivity.
Zhu, Yingying; Zhu, Xiaofeng; Kim, Minjeong; Yan, Jin; Wu, Guorong
2017-06-01
Functional connectivity (FC) has been widely investigated in many imaging-based neuroscience and clinical studies. Since the functional Magnetic Resonance Imaging (fMRI) signal is only an indirect reflection of brain activity, it is difficult to accurately quantify FC strength based on signal correlation alone. To address this limitation, we propose a learning-based tensor model to derive high-sensitivity and high-specificity connectome biomarkers at the individual level from resting-state fMRI images. First, we propose a learning-based approach to estimate the intrinsic functional connectivity. In addition to the low-level region-to-region signal correlation, latent module-to-module connection is also estimated and used to provide high-level heuristics for measuring connectivity strength. Furthermore, a sparsity constraint is employed to automatically remove spurious connections, thus alleviating the issue of searching for an optimal threshold. Second, we integrate our learning-based approach with the sliding-window technique to further reveal the dynamics of functional connectivity. Specifically, we stack the functional connectivity matrix within each sliding window and form a 3D tensor where the third dimension denotes time. Then we obtain dynamic functional connectivity (dFC) for each individual subject by simultaneously estimating the within-sliding-window functional connectivity and characterizing the across-sliding-window temporal dynamics. Third, in order to enhance the robustness of the connectome patterns extracted from dFC, we extend the individual-based 3D tensors to a population-based 4D tensor (with the fourth dimension standing for the training subjects) and learn the statistics of connectome patterns via 4D tensor analysis. Since our 4D tensor model jointly (1) optimizes dFC for each training subject and (2) captures the principal connectome patterns, our statistical model gains more statistical power of representing new subjects than current state
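The sliding-window tensor construction in the second step can be sketched with NumPy; plain Pearson correlation stands in here for the paper's learning-based connectivity estimate.

```python
import numpy as np

def dfc_tensor(bold, win=30, step=1):
    """bold: (T, R) resting-state series for R regions. Returns an (R, R, W)
    tensor stacking the connectivity matrix of each sliding window."""
    T, _ = bold.shape
    return np.stack([np.corrcoef(bold[s:s + win].T)
                     for s in range(0, T - win + 1, step)], axis=-1)

# population-level 4D tensor: stack the training subjects along a fourth axis
# tensor4d = np.stack([dfc_tensor(b) for b in subjects], axis=-1)
```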
Error statistics of hidden Markov model and hidden Boltzmann model results
Directory of Open Access Journals (Sweden)
Newberg Lee A
2009-07-01
Full Text Available Background: Hidden Markov models and hidden Boltzmann models are employed in computational biology and a variety of other scientific fields for analyses of sequential data. Whether the associated algorithms are used to compute an actual probability or, more generally, an odds ratio or some other score, a frequent requirement is that the error statistics of a given score be known. What is the chance that random data would achieve that score or better? What is the chance that a real signal would achieve a given score threshold? Results: Here we present a novel general approach to estimating these false positive and true positive rates that is significantly more efficient than existing general approaches. We validate the technique via an implementation within the HMMER 3.0 package, which scans DNA or protein sequence databases for patterns of interest, using a profile-HMM. Conclusion: The new approach is faster than general naïve sampling approaches, and more general than other current approaches. It provides an efficient mechanism by which to estimate error statistics for hidden Markov model and hidden Boltzmann model results.
Statistical Models for Inferring Vegetation Composition from Fossil Pollen
Paciorek, C.; McLachlan, J. S.; Shang, Z.
2011-12-01
Fossil pollen provides information about vegetation composition that can be used to help understand how vegetation has changed in the past. However, these data have not traditionally been analyzed in a way that allows for statistical inference about spatio-temporal patterns and trends. We build a Bayesian hierarchical model called STEPPS (Spatio-Temporal Empirical Prediction from Pollen in Sediments) that predicts forest composition in southern New England, USA, over the last two millennia based on fossil pollen. The critical relationships between abundances of tree taxa in the pollen record and abundances in actual vegetation are estimated using modern (Forest Inventory Analysis) data and (witness tree) data from colonial records. This gives us two time points at which both pollen and direct vegetation data are available. Based on these relationships, and incorporating our uncertainty about them, we predict forest composition using fossil pollen. We estimate the spatial distribution and relative abundances of tree species and draw inference about how these patterns have changed over time. Finally, we describe ongoing work to extend the modeling to the upper Midwest of the U.S., including an approach to infer tree density and thereby estimate the prairie-forest boundary in Minnesota and Wisconsin. This work is part of the PalEON project, which brings together a team of ecosystem modelers, paleoecologists, and statisticians with the goal of reconstructing vegetation responses to climate during the last two millennia in the northeastern and midwestern United States. The estimates from the statistical modeling will be used to assess and calibrate ecosystem models that are used to project ecological changes in response to global change.
Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.
2017-12-01
Hyperparameterization of statistical models, i.e., automated model scoring and selection via evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm uses the NSGA-2 multi-objective optimization algorithm to optimize the statistical preprocessing of forcing data and improve goodness-of-fit for statistical models (i.e., feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used to automate the selection of soil moisture forecast statistical models for North America.
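As a small generic illustration of one search strategy named above (randomized search), assuming scikit-learn and SciPy; Elm's NSGA-2 pipeline additionally searches over preprocessing steps.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=500, n_features=20, noise=0.5, random_state=0)

# sample 50 hyperparameter settings and keep the best cross-validated one
search = RandomizedSearchCV(
    Ridge(),
    param_distributions={"alpha": loguniform(1e-4, 1e2)},
    n_iter=50,
    cv=5,
    random_state=0,
).fit(X, y)
print(search.best_params_, search.best_score_)
```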
Fisher, Aaron; Anderson, G Brooke; Peng, Roger; Leek, Jeff
2014-01-01
Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified significant relationships less reliably than non-significant ones, identifying 74.6% (95% CI [72.5%-76.6%]) of the latter. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values, which is available here: http://glimmer.rstudio.com/afisher/EDA/.
Statistical Agent Based Modelization of the Phenomenon of Drug Abuse
di Clemente, Riccardo; Pietronero, Luciano
2012-07-01
We introduce a statistical agent-based model to describe the phenomenon of drug abuse and its dynamical evolution at the individual and global level. The agents are heterogeneous with respect to their intrinsic inclination to drugs, their budget attitude, and their social environment. The various levels of drug use were inspired by the professional description of the phenomenon, and this permits a direct comparison with all available data. We show that certain elements have a great importance in starting drug use, for example rare events in personal experience that allow one to occasionally overcome the barrier to drug use. The analysis of how the system reacts to perturbations is very important for understanding its key elements, and it provides strategies for effective policy making. The present model represents the first step of a realistic description of this phenomenon and can be easily generalized in various directions.
A statistical model of Rift Valley fever activity in Egypt.
Drake, John M; Hassan, Ali N; Beier, John C
2013-12-01
Rift Valley fever (RVF) is a viral disease of animals and humans and a global public health concern due to its ecological plasticity, adaptivity, and potential for spread to countries with a temperate climate. In many places, outbreaks are episodic and linked to climatic, hydrologic, and socioeconomic factors. Although outbreaks of RVF have occurred in Egypt since 1977, attempts to identify risk factors have been limited. Using a statistical learning approach (lasso-regularized generalized linear model), we tested the hypotheses that outbreaks in Egypt are linked to (1) River Nile conditions that create a mosquito vector habitat, (2) entomologic conditions favorable to transmission, (3) socio-economic factors (Islamic festival of Greater Bairam), and (4) recent history of transmission activity. Evidence was found for effects of rainfall and river discharge and recent history of transmission activity. There was no evidence for an effect of Greater Bairam. The model predicted RVF activity correctly in 351 of 358 months (98.0%). This is the first study to statistically identify risk factors for RVF outbreaks in a region of unstable transmission. © 2013 The Society for Vector Ecology.
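A lasso-regularized GLM of the kind described can be sketched with scikit-learn; the predictor and outcome arrays below are synthetic stand-ins for the paper's monthly covariates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(358, 4))     # 358 months, 4 candidate risk factors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=358) > 1.5).astype(int)

# L1 regularization shrinks uninformative coefficients exactly to zero,
# performing the risk-factor selection described above
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
print(model.coef_)
```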
Statistical modeling of global geogenic arsenic contamination in groundwater.
Amini, Manouchehr; Abbaspour, Karim C; Berg, Michael; Winkel, Lenny; Hug, Stephan J; Hoehn, Eduard; Yang, Hong; Johnson, C Annette
2008-05-15
Contamination of groundwaters with geogenic arsenic poses a major health risk to millions of people. Although the main geochemical mechanisms of arsenic mobilization are well understood, the worldwide scale of affected regions is still unknown. In this study we used a large database of measured arsenic concentrations in groundwaters (around 20,000 data points) from around the world as well as digital maps of physical characteristics such as soil, geology, climate, and elevation to model probability maps of global arsenic contamination. A novel rule-based statistical procedure was used to combine the physical data and expert knowledge to delineate two process regions for arsenic mobilization: "reducing" and "high-pH/oxidizing". Arsenic concentrations were modeled in each region using regression analysis and adaptive neuro-fuzzy inferencing, followed by Latin hypercube sampling for uncertainty propagation to produce probability maps. The derived global arsenic models could benefit from more accurate geologic information and aquifer chemical/physical information. Using some proxy surface information, however, the models explained 77% of arsenic variation in reducing regions and 68% of arsenic variation in high-pH/oxidizing regions. The probability maps based on the above models correspond well with the known contaminated regions around the world and delineate new untested areas that have a high probability of arsenic contamination. Notable among these regions are the southeast and northwest of China in Asia, central Australia, New Zealand, northern Afghanistan, and northern Mali and Zambia in Africa.
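The uncertainty propagation step can be illustrated with SciPy's quasi-Monte Carlo module; the two model inputs and their distributions are hypothetical, not the study's actual parameters.

```python
from scipy.stats import norm, qmc

sampler = qmc.LatinHypercube(d=2, seed=0)
u = sampler.random(n=1000)                       # stratified samples in [0, 1)^2
pH = qmc.scale(u[:, :1], [6.5], [9.0])           # scale to a plausible pH range
log_k = norm(loc=-2.0, scale=0.5).ppf(u[:, 1])   # map to a normal input
# run the arsenic model on each (pH, log_k) draw to build probability maps
```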
Drobny, Jon; Curreli, Davide; Ruzic, David; Lasa, Ane; Green, David; Canik, John; Younkin, Tim; Blondel, Sophie; Wirth, Brian
2017-10-01
Surface roughness greatly impacts material erosion, and thus plays an important role in Plasma-Surface Interactions. Developing strategies for efficiently introducing rough surfaces into ion-solid interaction codes will be an important step towards whole-device modeling of plasma devices and future fusion reactors such as ITER. Fractal TRIDYN (F-TRIDYN) is an upgraded version of the Monte Carlo, BCA program TRIDYN developed for this purpose that includes an explicit fractal model of surface roughness and extended input and output options for file-based code coupling. Code coupling with both plasma and material codes has been achieved and allows for multi-scale, whole-device modeling of plasma experiments. These code coupling results will be presented. F-TRIDYN has been further upgraded with an alternative, statistical model of surface roughness. The statistical model is significantly faster than and compares favorably to the fractal model. Additionally, the statistical model compares well to alternative computational surface roughness models and experiments. Theoretical links between the fractal and statistical models are made, and further connections to experimental measurements of surface roughness are explored. This work was supported by the PSI-SciDAC Project funded by the U.S. Department of Energy through contract DOE-DE-SC0008658.
Statistical power of likelihood ratio and Wald tests in latent class models with covariates
Gudicha, D.W.; Schmittmann, V.D.; Vermunt, J.K.
2017-01-01
This paper discusses power and sample-size computation for likelihood ratio and Wald testing of the significance of covariate effects in latent class models. For both tests, asymptotic distributions can be used; that is, the test statistic can be assumed to follow a central Chi-square under the null hypothesis.
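Under this asymptotic setup the power calculation is direct: the statistic is central Chi-square under the null and noncentral Chi-square under the alternative, with the noncentrality growing with sample size. A minimal sketch assuming SciPy, with an illustrative effect size:

```python
from scipy.stats import chi2, ncx2

def wald_power(noncentrality, df=1, alpha=0.05):
    """Asymptotic power of a Wald or likelihood ratio test."""
    return ncx2.sf(chi2.ppf(1 - alpha, df), df, noncentrality)

# the noncentrality scales linearly with n; e.g. n = 500 at 0.016 per observation
print(wald_power(noncentrality=500 * 0.016))
```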
Linear mixed models a practical guide using statistical software
West, Brady T; Galecki, Andrzej T
2014-01-01
Highly recommended by JASA, Technometrics, and other journals, the first edition of this bestseller showed how to easily perform complex linear mixed model (LMM) analyses via a variety of software programs. Linear Mixed Models: A Practical Guide Using Statistical Software, Second Edition continues to lead readers step by step through the process of fitting LMMs. This second edition covers additional topics on the application of LMMs that are valuable for data analysts in all fields. It also updates the case studies using the latest versions of the software procedures and provides up-to-date information on the options and features of the software procedures available for fitting LMMs in SAS, SPSS, Stata, R/S-plus, and HLM.New to the Second Edition A new chapter on models with crossed random effects that uses a case study to illustrate software procedures capable of fitting these models Power analysis methods for longitudinal and clustered study designs, including software options for power analyses and suggest...
Stochastic Spatial Models in Ecology: A Statistical Physics Approach
Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.
2017-11-01
Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions, and how concepts developed in statistical physics translate into population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.
The epistemology of mathematical and statistical modeling: a quiet methodological revolution.
Rodgers, Joseph Lee
2010-01-01
A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the modeling revolution obviated the NHST argument. I begin with a history of NHST and modeling and their relation to one another. Next, I define and illustrate principles involved in developing and evaluating mathematical models. Following that, I discuss the difference between using statistical procedures within a rule-based framework and building mathematical models from a scientific epistemology. Only the former is treated carefully in most psychology graduate training. The pedagogical implications of this imbalance and the revised pedagogy required to account for the modeling revolution are described. To conclude, I discuss how attention to modeling implies shifting statistical practice in certain progressive ways. The epistemological basis of statistics has moved away from being a set of procedures, applied mechanistically, and moved toward building and evaluating statistical and scientific models. Copyright 2009 APA, all rights reserved.
Multiresolution wavelet-ANN model for significant wave height forecasting.
Digital Repository Service at National Institute of Oceanography (India)
Deka, P.C.; Mandal, S.; Prahlada, R.
(ANN) modeling. The transformed output data are used as inputs to ANN models. Various decomposition levels have been tried for a db3 wavelet to obtain optimal results. It is found that the performance of hybrid WLNN is better than that of ANN when lead...
A Statistical Toolbox For Mining And Modeling Spatial Data
Directory of Open Access Journals (Sweden)
D’Aubigny Gérard
2016-12-01
Full Text Available Most data mining projects in spatial economics start with an evaluation of a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation, based on Moran's and Geary's coefficients, the adequacy of which is rarely challenged, despite the fact that when reporting on their properties, many users seem likely to make mistakes and to foster confusion. My paper begins with a critical appraisal of the classical definition and rationale of these indices. I argue that while intuitively founded, they are plagued by an inconsistency in their conception. Then, I propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship, and opens the way to an augmented toolbox of statistical methods of dimension reduction and data visualization, also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to our definition of corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here are easily implementable in existing (free) software, yield methods useful for exploiting the proposed corrections in spatial data analysis practice, and, from a mathematical point of view, have an asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, that suggests they possess qualities of robustness and a limited sensitivity to the Modifiable Areal Unit Problem (MAUP), valuable in exploratory spatial data analysis.
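For reference, the classical coefficient whose rationale the paper challenges can be computed in a few lines with NumPy; the corrected coefficients proposed in the paper are not reproduced here.

```python
import numpy as np

def morans_i(x, W):
    """Classical Moran's I for attribute vector x and spatial weight matrix W."""
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)
```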
Statistical Models and Methods for Network Meta-Analysis.
Madden, L V; Piepho, H-P; Paul, P A
2016-08-01
Meta-analysis, the methodology for analyzing the results from multiple independent studies, has grown tremendously in popularity over the last four decades. Although most meta-analyses involve a single effect size (summary result, such as a treatment difference) from each study, there are often multiple treatments of interest across the network of studies in the analysis. Multi-treatment (or network) meta-analysis can be used for simultaneously analyzing the results from all the treatments. However, the methodology is considerably more complicated than for the analysis of a single effect size, and there have not been adequate explanations of the approach for agricultural investigations. We review the methods and models for conducting a network meta-analysis based on frequentist statistical principles, and demonstrate the procedures using a published multi-treatment plant pathology data set. A major advantage of network meta-analysis is that correlations of estimated treatment effects are automatically taken into account when an appropriate model is used. Moreover, treatment comparisons may be possible in a network meta-analysis that are not possible in a single study because all treatments of interest may not be included in any given study. We review several models that consider the study effect as either fixed or random, and show how to interpret model-fitting output. We further show how to model the effect of moderator variables (study-level characteristics) on treatment effects, and present one approach to test for the consistency of treatment effects across the network. Online supplemental files give explanations on fitting the network meta-analytical models using SAS.
A statistical analysis based recommender model for heart disease patients.
Mustaqeem, Anam; Anwar, Syed Muhammad; Khan, Abdul Rashid; Majid, Muhammad
2017-12-01
An intelligent information technology based system could have a positive impact on the life-style of patients suffering from chronic diseases by providing useful health recommendations. In this paper, we have proposed a hybrid model that provides disease prediction and medical recommendations to cardiac patients. The first part aims at implementing a prediction model that can identify the disease of a patient and classify it into one of four output classes, i.e., non-cardiac chest pain, silent ischemia, angina, and myocardial infarction. Following the disease prediction, the second part of the model provides general medical recommendations to patients. The recommendations are generated by assessing the severity of clinical features of patients, estimating the risk associated with clinical features and disease, and calculating the probability of occurrence of disease. The purpose of this model is to build an intelligent and adaptive recommender system for heart disease patients. The experiments for the proposed recommender system are conducted on a clinical data set collected and labelled in consultation with medical experts from a known hospital. The performance of the proposed prediction model is evaluated using accuracy and kappa statistics as evaluation measures. The medical recommendations are generated based on information collected from a knowledge base created with the help of physicians. The results of the recommendation model are evaluated using a confusion matrix and give an accuracy of 97.8%. The proposed system exhibits good prediction and recommendation accuracies and promises to be a useful contribution in the field of e-health and medical informatics. Copyright © 2017 Elsevier B.V. All rights reserved.
A statistical downscaling model for summer rainfall over Pakistan
Kazmi, Dildar Hussain; Li, Jianping; Ruan, Chengqing; Zhao, Sen; Li, Yanjie
2016-10-01
A statistical approach is utilized to construct an interannual model for summer (July-August) rainfall over the western parts of the South Asian Monsoon region. Observed monthly rainfall data for selected stations of Pakistan for the last 55 years (1960-2014) are taken as the predictand. Recommended climate indices, along with oceanic and atmospheric data on global scales for the period April-June, are employed as predictors. The first 40 years of data are taken as the training period and the rest as the validation period. A cross-validated stepwise regression approach is adopted to select robust predictors. Upper tropospheric zonal wind at 200 hPa over the northeastern Atlantic is finally selected as the best predictor for the interannual model. In addition, the next candidate, geopotential height in the upper troposphere, is taken as an indirect predictor, being a source of energy transport from the core region (northeast Atlantic/western Europe) to the study area. The model performed well for both the training and the validation period, with a correlation coefficient of 0.71 and tolerable root mean square errors. Cross-validation of the model is performed by incorporating JRA-55 data for the potential predictors in addition to NCEP data, and by fragmenting the study period into five non-overlapping test samples. Subsequently, to verify the outcome of the model on physical grounds, observational analyses as well as model simulations are incorporated. It is revealed that zonally dominant waves, originating from the jet exit region through large vorticity gradients, may transport energy and momentum downstream to west-central Asia, ultimately affecting the interannual variability of the specific rainfall. It has been detected that both the circumglobal teleconnection and Rossby wave propagation play vital roles in modulating the proposed mechanism.
STATISTICAL MECHANICS MODELING OF MESOSCALE DEFORMATION IN METALS
Energy Technology Data Exchange (ETDEWEB)
Anter El-Azab
2013-04-08
The research under this project focused on theoretical and computational modeling of the dislocation dynamics of mesoscale deformation of metal single crystals. Specifically, the work aimed to implement a continuum statistical theory of dislocations to understand strain hardening and cell structure formation under monotonic loading. These aspects of crystal deformation are manifestations of the evolution of the underlying dislocation system under mechanical loading. The project had three research tasks: 1) investigating the statistical characteristics of dislocation systems in deformed crystals; 2) formulating kinetic equations of dislocations and coupling these kinetic equations with crystal mechanics; 3) computational solution of the coupled crystal mechanics and dislocation kinetics. Comparison of dislocation dynamics predictions with experimental results in the area of statistical properties of dislocations and their fields was also a part of the proposed effort. The first research task used the dislocation dynamics simulation method to investigate the spatial, orientation, velocity, and temporal statistics of dynamical dislocation systems, and used the results of this investigation to complete the kinetic description of dislocations. The second task focused on completing the formulation of a kinetic theory of dislocations that respects the discrete nature of crystallographic slip and the physics of dislocation motion and dislocation interaction in the crystal. Part of this effort also targeted the theoretical basis for establishing the connection between discrete and continuum representations of dislocations and the analysis of discrete dislocation simulation results within the continuum framework. This part of the research enables the enrichment of the kinetic description with information representing the behavior of discrete dislocation systems. The third task focused on the development of physics-inspired numerical methods for the solution of the coupled
Adams, James; Howsmon, Daniel P.; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen
2017-01-01
Introduction: A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods: In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine sampl...
Smooth extrapolation of unknown anatomy via statistical shape models
Grupp, R. B.; Chiang, H.; Otake, Y.; Murphy, R. J.; Gordon, C. R.; Armand, M.; Taylor, R. H.
2015-03-01
Several methods to perform extrapolation of unknown anatomy were evaluated. The primary application is to enhance surgical procedures that may use partial medical images or medical images of incomplete anatomy. Le Fort-based face-jaw-teeth transplant is one such procedure. From CT data of 36 skulls and 21 mandibles, separate Statistical Shape Models of the anatomical surfaces were created. Using the Statistical Shape Models, incomplete surfaces were projected to obtain complete surface estimates. The surface estimates exhibit non-zero error in regions where the true surface is known; it is desirable to keep the true surface and seamlessly merge the estimated unknown surface. Existing extrapolation techniques produce non-smooth transitions from the true surface to the estimated surface, resulting in additional error and a less aesthetically pleasing result. The three extrapolation techniques evaluated were: copying and pasting of the surface estimate (non-smooth baseline), a feathering between the patient surface and surface estimate, and an estimate generated via a Thin Plate Spline trained from displacements between the surface estimate and corresponding vertices of the known patient surface. Feathering and Thin Plate Spline approaches both yielded smooth transitions. However, feathering corrupted known vertex values. Leave-one-out analyses were conducted, with 5% to 50% of known anatomy removed from the left-out patient and estimated via the proposed approaches. The Thin Plate Spline approach yielded smaller errors than the other two approaches, with an average vertex error improvement of 1.46 mm and 1.38 mm for the skull and mandible respectively, over the baseline approach.
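A minimal sketch of the Thin Plate Spline approach, assuming SciPy's RBF interpolator: the spline is trained on the displacements between the shape-model estimate and the known patient vertices, then evaluated over the whole mesh so the estimated region bends smoothly onto the true surface. The vertex arrays and indexing are schematic.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_blend(est_vertices, known_idx, true_known_vertices):
    """est_vertices: (V, 3) shape-model estimate; known_idx: indices of
    vertices where the patient surface is known; returns the blended mesh."""
    disp = true_known_vertices - est_vertices[known_idx]
    tps = RBFInterpolator(est_vertices[known_idx], disp,
                          kernel="thin_plate_spline")
    return est_vertices + tps(est_vertices)   # known vertices map exactly
```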
Uncertainty analysis in statistical modeling of extreme hydrological events
Xu, YuePing; Booij, Martijn J.; Tong, Yang-Bin
2010-01-01
With the increase in both the magnitude and frequency of hydrological extreme events such as droughts and floods, the significance of adequately modeling them is fully recognized. Estimation of extreme rainfall/flood for various return periods is of prime importance for
Optimizing DNA assembly based on statistical language modelling.
Fang, Gang; Zhang, Shemin; Dong, Yafei
2017-12-15
By successively assembling genetic parts such as BioBricks according to grammatical models, complex genetic constructs composed of dozens of functional blocks can be built. However, every category of genetic parts usually includes a few or many parts. With an increasing quantity of genetic parts, the process of assembling more than a few sets of these parts can be expensive, time consuming, and error prone. At the last step of assembly it is somewhat difficult to decide which part should be selected. Based on a statistical language model, which is a probability distribution P(S) over strings S that attempts to reflect how frequently a string S occurs as a sentence, the most commonly used parts will be selected. Then, a dynamic programming algorithm was designed to find the solution of maximum probability. The algorithm optimizes the results of a genetic design based on a grammatical model and finds an optimal solution. In this way, redundant operations can be reduced and the time and cost required for conducting biological experiments can be minimized. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
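The maximum-probability selection can be sketched as a Viterbi-style dynamic program over part categories; the bigram log-probabilities are hypothetical placeholders for scores estimated from a corpus of existing designs.

```python
import math

def best_assembly(categories, logp):
    """categories: list of candidate-part lists, one per slot in the design;
    logp: dict (prev_part, part) -> log P(part | prev_part).
    Returns the part sequence maximizing the product of bigram probabilities."""
    score = {p: 0.0 for p in categories[0]}
    back = [{} for _ in categories]
    for i in range(1, len(categories)):
        new = {}
        for p in categories[i]:
            prev, s = max(((q, score[q] + logp.get((q, p), -math.inf))
                           for q in categories[i - 1]), key=lambda t: t[1])
            new[p], back[i][p] = s, prev
        score = new
    part = max(score, key=score.get)
    path = [part]
    for i in range(len(categories) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1], score[part]
```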
Critical, statistical, and thermodynamical properties of lattice models
Energy Technology Data Exchange (ETDEWEB)
Varma, Vipin Kerala
2013-10-15
In this thesis we investigate zero temperature and low temperature properties - critical, statistical and thermodynamical - of lattice models in the contexts of bosonic cold atom systems, magnetic materials, and non-interacting particles on various lattice geometries. We study quantum phase transitions in the Bose-Hubbard model with higher body interactions, as relevant for optical lattice experiments of strongly interacting bosons, in one and two dimensions; the universality of the Mott insulator to superfluid transition is found to remain unchanged for even large three body interaction strengths. A systematic renormalization procedure is formulated to fully re-sum these higher (three and four) body interactions into the two body terms. In the strongly repulsive limit, we analyse the zero and low temperature physics of interacting hard-core bosons on the kagome lattice at various fillings. Evidence for a disordered phase in the Ising limit of the model is presented; in the strong coupling limit, the transition between the valence bond solid and the superfluid is argued to be first order at the tip of the solid lobe.
Terminal-Dependent Statistical Inference for the FBSDEs Models
Directory of Open Access Journals (Sweden)
Yunquan Song
2014-01-01
Full Text Available The original stochastic differential equations (OSDEs) and forward-backward stochastic differential equations (FBSDEs) are often used to model complex dynamic processes that arise in financial, ecological, and many other areas. The main difference between OSDEs and FBSDEs is that the latter are designed to depend on a terminal condition, which is a key factor in some financial and ecological circumstances. It is interesting but challenging to estimate FBSDE parameters from noisy data and the terminal condition. However, to the best of our knowledge, terminal-dependent statistical inference for such a model has not been explored in the existing literature. We propose a nonparametric terminal control variables estimation method to address this problem. The reason why we use terminal control variables is that the newly proposed inference procedures inherit the terminal-dependent characteristic. Through this proposed method, the estimators of the functional coefficients of the FBSDEs model are obtained. The asymptotic properties of the estimators are also discussed. Simulation studies show that the proposed method gives satisfying estimates for the FBSDE parameters from noisy data and the terminal condition. A simulation is performed to test the feasibility of our method.
The statistical multifragmentation model: Origins and recent advances
Energy Technology Data Exchange (ETDEWEB)
Donangelo, R., E-mail: donangel@fing.edu.uy [Instituto de Física, Facultad de Ingeniería, Universidad de la República, Julio Herrera y Reissig 565, 11300, Montevideo (Uruguay); Instituto de Física, Universidade Federal do Rio de Janeiro, C.P. 68528, 21941-972 Rio de Janeiro - RJ (Brazil); Souza, S. R., E-mail: srsouza@if.ufrj.br [Instituto de Física, Universidade Federal do Rio de Janeiro, C.P. 68528, 21941-972 Rio de Janeiro - RJ (Brazil); Instituto de Física, Universidade Federal do Rio Grande do Sul, C.P. 15051, 91501-970 Porto Alegre - RS (Brazil)
2016-07-07
We review the Statistical Multifragmentation Model (SMM), which considers a generalization of the liquid-drop model for hot nuclei and allows one to calculate thermodynamic quantities characterizing the nuclear ensemble at the disassembly stage. We show how to determine probabilities of definite partitions of finite nuclei and how to determine, through Monte Carlo calculations, observables such as the caloric curve, multiplicity distributions, and heat capacity, among others. Some experimental measurements of the caloric curve confirmed the SMM predictions made over 10 years earlier, leading to a surge of interest in the model. However, the experimental determination of the fragmentation temperatures relies on the yields of different isotopic species, which were not correctly calculated in the schematic, liquid-drop picture employed in the SMM. This led to a series of improvements in the SMM, in particular to a more careful choice of nuclear masses and energy densities, especially for the lighter nuclei. With these improvements the SMM is able to make quantitative determinations of isotope production. We show the application of the SMM to the production of exotic nuclei through multifragmentation. These preliminary calculations demonstrate the need for a careful choice of the system size and excitation energy to attain maximum yields.
Multivariate Statistical Modelling of Drought and Heat Wave Events
Manning, Colin; Widmann, Martin; Vrac, Mathieu; Maraun, Douglas; Bevaqua, Emanuele
2016-04-01
Multivariate Statistical Modelling of Drought and Heat Wave Events C. Manning1,2, M. Widmann1, M. Vrac2, D. Maraun3, E. Bevaqua2,3 1. School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, UK 2. Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL), Centre d'Etudes de Saclay, Gif-sur-Yvette, France 3. Wegener Center for Climate and Global Change, University of Graz, Brandhofgasse 5, 8010 Graz, Austria Compound extreme events are a combination of two or more contributing events which in themselves may not be extreme but through their joint occurrence produce an extreme impact. Compound events are noted in the latest IPCC report as an important type of extreme event that has been given little attention so far. As part of the CE:LLO project (Compound Events: muLtivariate statisticaL mOdelling) we are developing a multivariate statistical model to gain an understanding of the dependence structure of certain compound events. One focus of this project is on the interaction between drought and heat wave events. Soil moisture has both a local and non-local effect on the occurrence of heat waves, since it strongly controls the latent heat flux affecting the transfer of sensible heat to the atmosphere. These processes can create a feedback whereby a heat wave may be amplified or suppressed by the soil moisture preconditioning and, vice versa, the heat wave may in turn have an effect on soil conditions. An aim of this project is to capture this dependence in order to correctly describe the joint probabilities of these conditions and the resulting probability of their compound impact. We will show an application of Pair Copula Constructions (PCCs) to study the aforementioned compound event. PCCs allow, in theory, for the formulation of multivariate dependence structures in any dimension, where the PCC is a decomposition of a multivariate distribution into a product of bivariate components modelled using copulas. A
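The pair-copula machinery itself is not reproduced here, but the basic idea of modelling joint drought-heat exceedances through a copula can be sketched with a single bivariate Gaussian copula; all margins, parameters, and thresholds below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Sample from a Gaussian copula linking soil moisture and temperature,
# then compare the compound-event probability with the independence case.
rng = np.random.default_rng(1)
rho = -0.6                      # assumed dependence: drier soil, hotter days
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=100_000)
u = stats.norm.cdf(z)           # uniform margins = the copula sample

# Map to physical margins (assumed distributions, for illustration only)
soil_moisture = stats.beta(2, 5).ppf(u[:, 0])
temperature = stats.gumbel_r(loc=25, scale=3).ppf(u[:, 1])

# Joint probability of a compound event: dry soil AND a hot day
p_compound = np.mean((soil_moisture < 0.1) & (temperature > 32))
p_indep = np.mean(soil_moisture < 0.1) * np.mean(temperature > 32)
print(f"joint {p_compound:.4f} vs independence {p_indep:.4f}")
```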
Towards a Statistical Model of Tropical Cyclone Genesis
Fernandez, A.; Kashinath, K.; McAuliffe, J.; Prabhat, M.; Stark, P. B.; Wehner, M. F.
2017-12-01
Tropical Cyclones (TCs) are important extreme weather phenomena that have a strong impact on humans. TC forecasts are largely based on global numerical models that produce TC-like features. Aspects of Tropical Cyclones such as their formation/genesis, evolution, intensification and dissipation over land are important and challenging problems in climate science. This study investigates the environmental conditions associated with Tropical Cyclone Genesis (TCG) by testing how accurately a statistical model can predict TCG in the CAM5.1 climate model. TCG events are defined using the TECA software (Prabhat et al., "TECA: Petascale Pattern Recognition for Climate Science", Computer Analysis of Images and Patterns, Springer, 2015, pp. 426-436) to extract TC trajectories from CAM5.1. L1-regularized logistic regression (L1LR) is applied to the CAM5.1 output. The predictions have nearly perfect accuracy for data not associated with TC tracks and high accuracy differentiating between high vorticity and low vorticity systems. The model's active variables largely correspond to current hypotheses about important factors for TCG, such as wind field patterns and local pressure minima, and suggest new routes for investigation. Furthermore, our model's predictions of TC activity are competitive with the output of an instantaneous version of Emanuel and Nolan's Genesis Potential Index (GPI) (Emanuel and Nolan, "Tropical cyclone activity and the global climate system", 26th Conference on Hurricanes and Tropical Meteorology, 2004, pp. 240-241).
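A minimal sketch of the L1LR step with scikit-learn follows; the data are synthetic stand-ins (the real predictors would be CAM5.1 environmental fields along TECA-extracted tracks).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# L1-regularized logistic regression: the penalty shrinks most coefficients
# to exactly zero, leaving a sparse set of "active variables".
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 20))          # stand-in environmental predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000)) > 1

l1lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1lr.fit(X, y)

active = np.flatnonzero(l1lr.coef_[0])   # predictors surviving the shrinkage
print("active predictors:", active)
```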
Feature and Statistical Model Development in Structural Health Monitoring
Kim, Inho
All structures suffer wear and tear because of impact, excessive load, fatigue, corrosion, etc., in addition to inherent defects introduced during manufacturing and exposure to various environmental effects. These structural degradations are often imperceptible, but they can severely affect the structural performance of a component, thereby severely decreasing its service life. Although previous studies of Structural Health Monitoring (SHM) have produced extensive knowledge on parts of the SHM process, such as operational evaluation, data processing, and feature extraction, few studies have approached SHM from a systematic perspective: statistical model development. The first part of this dissertation reviews ultrasonic guided-wave-based structural health monitoring problems in terms of the characteristics of inverse scattering problems, such as ill-posedness and nonlinearity. The distinctive features and the selection of the domain of analysis are investigated by analytically searching for the conditions under which solutions of the ill-posed problem are unique, and are validated experimentally. Based on these distinctive features, a novel wave packet tracing (WPT) method for damage localization and size quantification is presented. This method involves creating time-space representations of the guided Lamb waves (GLWs), collected at a series of locations with a spatially dense distribution along paths at pre-selected angles with respect to the direction normal to the direction of wave propagation. The fringe patterns due to wave dispersion, which depends on the phase velocity, are selected as the primary features carrying information about wave propagation and scattering. The following part of this dissertation presents a novel damage-localization framework using a fully automated process. In order to construct the statistical model for autonomous damage localization, deep-learning techniques, such as restricted Boltzmann machine and deep belief network
A scan statistic for continuous data based on the normal probability model
Directory of Open Access Journals (Sweden)
Huang Lan
2009-10-01
Full Text Available Temporal, spatial and space-time scan statistics are commonly used to detect and evaluate the statistical significance of temporal and/or geographical disease clusters, without any prior assumptions on the location, time period or size of those clusters. Scan statistics are mostly used for count data, such as disease incidence or mortality. Sometimes there is an interest in looking for clusters with respect to a continuous variable, such as lead levels in children or low birth weight. For such continuous data, we present a scan statistic where the likelihood is calculated using the normal probability model. It may also be used for other distributions, while still maintaining the correct alpha level. In an application of the new method, we look for geographical clusters of low birth weight in New York City.
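A toy one-dimensional version of the idea, assuming a common known variance, scans all windows and keeps the maximal normal-model log-likelihood ratio; in practice significance would be assessed by Monte Carlo replication under the null.

```python
import numpy as np

# Scan every window, compare the in-window mean against the rest, and
# return the window maximizing the normal-model log-likelihood ratio.
def normal_scan_llr(x, min_len=5):
    n = len(x)
    best = (0.0, None)
    var = x.var()  # common-variance plug-in (a simplifying assumption)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            inside, outside = x[i:j], np.concatenate([x[:i], x[j:]])
            if outside.size == 0:
                continue
            llr = ((j - i) * (inside.mean() - x.mean()) ** 2
                   + outside.size * (outside.mean() - x.mean()) ** 2) / (2 * var)
            if llr > best[0]:
                best = (llr, (i, j))
    return best

rng = np.random.default_rng(3)
data = rng.normal(size=100)
data[40:55] -= 1.0  # an embedded low-mean cluster (e.g. low birth weight)
print(normal_scan_llr(data))
```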
Hu, J.; Emile-Geay, J.; Partin, J. W.; Dee, S.
2016-12-01
The oxygen isotope composition of speleothem calcite is commonly used as a paleoclimate proxy due to its potential for accurate dating and high temporal resolution, but its interpretation can be complex. Here we present a cautionary tale from Crystal Cave, CA, which exemplifies the danger of statistical calibrations as the sole basis for proxy interpretation. The Crystal Cave δ18O record [1] was interpreted as a proxy for sea surface temperature in the Kuroshio Extension region on the basis of an apparently high correlation. Here we show, by considering serial autocorrelation, test multiplicity (the "look elsewhere" effect), and age uncertainties, that this interpretation is potentially based on a statistical artifact. Using the published age model, we first revisit the correlation analysis by considering the effect of serial correlation. The resulting degrees of freedom decrease, raising the bar for significance. Consideration of the false discovery rate due to the multiplicity problem [2] further reduces the number of gridpoints exhibiting significant correlations. Finally, we quantify age uncertainties using Bchron [3], providing an ensemble of 1,000 possible realizations of the δ18O time series. A statistical analysis of correlations with this ensemble challenges the published interpretation of the Crystal Cave record [1], finding no robust relationship to sea-surface temperature. Our study cautions against "correlation-fishing" as a basis for paleoclimate interpretation, and reaffirms the importance of mechanistic studies as a foundation of this interpretation. Accordingly, we propose a new interpretation of the Crystal Cave δ18O record based on an isotope-enabled climate model [4] and a proxy system model for speleothem calcite [5]. References [1] McCabe-Glynn, S., et al., 2013. Nat. Geosci. 6, 617-621. [2] Benjamini, Y., Hochberg, Y., 1995. J. R. Stat. Soc. Ser. B (Methodological) 57, 289-300. [3] Haslett, J., Parnell, A., 2008. J. R. Stat. Soc. Ser. C
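The serial-correlation step can be illustrated with a common effective-sample-size adjustment in the style of Bretherton et al. (1999); this shows only the first of the paper's three corrections, and the details are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Shrink the sample size to an effective N before testing a correlation
# between two autocorrelated series; a naive test would overstate confidence.
def lag1_autocorr(x):
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

def corr_test_adjusted(x, y):
    r, _ = stats.pearsonr(x, y)
    r1x, r1y = lag1_autocorr(x), lag1_autocorr(y)
    n_eff = len(x) * (1 - r1x * r1y) / (1 + r1x * r1y)  # effective sample size
    t = r * np.sqrt((n_eff - 2) / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), df=n_eff - 2)
    return r, n_eff, p

rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=200)) * 0.1   # strongly autocorrelated series
y = np.cumsum(rng.normal(size=200)) * 0.1
print(corr_test_adjusted(x, y))
```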
Statistical emulation of a tsunami model for sensitivity analysis and uncertainty quantification
Directory of Open Access Journals (Sweden)
A. Sarri
2012-06-01
Full Text Available Due to the catastrophic consequences of tsunamis, early warnings need to be issued quickly in order to mitigate the hazard. Additionally, there is a need to represent the uncertainty in the predictions of tsunami characteristics corresponding to the uncertain trigger features (e.g. either position, shape and speed of a landslide, or sea floor deformation associated with an earthquake). Unfortunately, computer models are expensive to run. This leads to significant delays in predictions and makes the uncertainty quantification impractical. Statistical emulators run almost instantaneously and may represent well the outputs of the computer model. In this paper, we use the outer product emulator to build a fast statistical surrogate of a landslide-generated tsunami computer model. This Bayesian framework enables us to build the emulator by combining prior knowledge of the computer model properties with a few carefully chosen model evaluations. The good performance of the emulator is validated using the leave-one-out method.
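The outer product emulator is not reproduced here, but the emulate-then-validate workflow can be sketched with a plain Gaussian-process surrogate standing in for it; the toy simulator and design below are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import LeaveOneOut

# X holds assumed landslide parameters (scaled); the "simulator" is a cheap
# toy function in place of the expensive tsunami code.
rng = np.random.default_rng(5)
X = rng.uniform(size=(30, 2))                    # e.g. (position, speed)
y = np.sin(3 * X[:, 0]) * X[:, 1] + 0.01 * rng.normal(size=30)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-4)

# Leave-one-out validation, as in the paper: refit without each design point
# and check the prediction at the held-out run.
errors = []
for train, test in LeaveOneOut().split(X):
    gp.fit(X[train], y[train])
    errors.append((y[test] - gp.predict(X[test])).item())
print("LOO RMSE:", float(np.sqrt(np.mean(np.square(errors)))))
```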
Statistical osteoporosis models using composite finite elements: a parameter study.
Wolfram, Uwe; Schwen, Lars Ole; Simon, Ulrich; Rumpf, Martin; Wilke, Hans-Joachim
2009-09-18
Osteoporosis is a widely spread disease with severe consequences for patients and high costs for health care systems. The disease is characterised by a loss of bone mass which induces a loss of mechanical performance and structural integrity. It was found that transverse trabeculae are thinned and perforated while vertical trabeculae stay intact. For understanding these phenomena and the mechanisms leading to fractures of trabecular bone due to osteoporosis, numerous researchers employ micro-finite element models. To avoid the disadvantages of setting up classical finite element models, composite finite elements (CFE) can be used. The aim of this study is to test the potential of CFE. For that, a parameter study on numerical lattice samples with statistically simulated, simplified osteoporosis is performed. These samples are subjected to compression and shear loading. Results show that the largest drop in compressive stiffness occurs for transverse isotropic structures losing 32% of the trabeculae (minus 89.8% stiffness). The largest drop in shear stiffness is found for an isotropic structure also losing 32% of the trabeculae (minus 67.3% stiffness). The study indicates that losing trabeculae leads to a greater drop in macroscopic stiffness than thinning of trabeculae. The results further demonstrate the advantages of CFE for simulating micro-structured samples.
Patch-based generative shape model and MDL model selection for statistical analysis of archipelagos
DEFF Research Database (Denmark)
Ganz, Melanie; Nielsen, Mads; Brandt, Sami
2010-01-01
We propose a statistical generative shape model for archipelago-like structures. These kinds of structures occur, for instance, in medical images, where our intention is to model the appearance and shapes of calcifications in x-ray radiographs. The generative model is constructed by (1) learning a patch-based dictionary for possible shapes, (2) building up a time-homogeneous Markov model to model the neighbourhood correlations between the patches, and (3) automatic selection of the model complexity by the minimum description length principle. The generative shape model is proposed as a probability distribution of a binary image where the model is intended to facilitate sequential simulation. Our results show that a relatively simple model is able to generate structures visually similar to calcifications. Furthermore, we used the shape model as a shape prior in the statistical segmentation…
Earthquake statistics in a Block Slider Model and a fully dynamic Fault Model
Directory of Open Access Journals (Sweden)
D. Weatherley
2004-01-01
Full Text Available We examine the event statistics obtained from two differing simplified models for earthquake faults. The first model is a reproduction of the Block-Slider model of Carlson et al. (1991), a model often employed in seismicity studies. The second model is an elastodynamic fault model based upon the Lattice Solid Model (LSM) of Mora and Place (1994). We performed simulations in which the fault length was varied in each model and generated synthetic catalogs of event sizes and times. From these catalogs, we constructed interval event size distributions and inter-event time distributions. The larger, localised events in the Block-Slider model displayed the same scaling behaviour as events in the LSM; however, the distribution of inter-event times was markedly different. The analysis of both event size and inter-event time statistics is an effective method for comparative studies of differing simplified models for earthquake faults.
Directory of Open Access Journals (Sweden)
Justin London
2010-01-01
Full Text Available In “National Metrical Types in Nineteenth Century Art Song” Leigh Van Handel gives a sympathetic critique of William Rothstein’s claim that in western classical music of the late 18th and 19th centuries there are discernible differences in the phrasing and metrical practice of German versus French and Italian composers. This commentary (a) examines just what Rothstein means in terms of his proposed metrical typology, (b) questions Van Handel on how she has applied it to a purely melodic framework, (c) amplifies Van Handel’s critique of Rothstein, and then (d) concludes with a rumination on the reach of quantitative (i.e., statistically-driven) versus qualitative claims regarding such things as “national metrical types.”
Directory of Open Access Journals (Sweden)
Kok-Yong Seng
2008-01-01
Full Text Available Currently, statistical techniques for the analysis of microarray-generated data sets have deficiencies due to limited understanding of errors inherent in the data. A generalized likelihood ratio (GLR) test based on an error model has recently been proposed to identify differentially expressed genes from microarray experiments. However, the use of different error structures under the GLR test has not been evaluated, nor has this method been compared to commonly used statistical tests such as the parametric t-test. The concomitant effects of varying data signal-to-noise ratio and replication number on the performance of statistical tests also remain largely unexplored. In this study, we compared the effects of different underlying statistical error structures on the GLR test’s power in identifying differentially expressed genes in microarray data. We evaluated such variants of the GLR test as well as the one-sample t-test based on simulated data by means of receiver operating characteristic (ROC) curves. Further, we used bootstrapping of ROC curves to assess the statistical significance of differences between the areas under the curves. Our results showed that (i) the GLR tests outperformed the t-test for detecting differential gene expression, (ii) the identity of the underlying error structure was important in determining the GLR tests’ performance, and (iii) signal-to-noise ratio was a more important contributor than sample replication in identifying statistically significant differential gene expression.
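The bootstrap comparison of areas under ROC curves can be sketched as follows; the scores and labels are synthetic stand-ins for the GLR and t-test statistics.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Bootstrap the difference in AUC between two test statistics scored on the
# same simulated genes; a CI excluding zero suggests a significant difference.
rng = np.random.default_rng(6)
truth = rng.integers(0, 2, size=500)                 # 1 = truly differential
score_glr = truth * 1.0 + rng.normal(scale=1.0, size=500)  # stand-in scores
score_t = truth * 0.7 + rng.normal(scale=1.0, size=500)

diffs = []
for _ in range(2000):
    idx = rng.integers(0, 500, size=500)
    if truth[idx].min() == truth[idx].max():
        continue  # skip degenerate resamples with only one class
    diffs.append(roc_auc_score(truth[idx], score_glr[idx])
                 - roc_auc_score(truth[idx], score_t[idx]))
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"AUC difference 95% CI: [{lo:.3f}, {hi:.3f}]")
```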
Local yield stress statistics in model amorphous solids
Barbot, Armand; Lerbinger, Matthias; Hernandez-Garcia, Anier; García-García, Reinaldo; Falk, Michael L.; Vandembroucq, Damien; Patinet, Sylvain
2018-03-01
We develop and extend a method presented by Patinet, Vandembroucq, and Falk [Phys. Rev. Lett. 117, 045501 (2016), 10.1103/PhysRevLett.117.045501] to compute the local yield stresses at the atomic scale in model two-dimensional Lennard-Jones glasses produced via differing quench protocols. This technique allows us to sample the plastic rearrangements in a nonperturbative manner for different loading directions on a well-controlled length scale. Plastic activity upon shearing correlates strongly with the locations of low yield stresses in the quenched states. This correlation is higher in more structurally relaxed systems. The distribution of local yield stresses is also shown to strongly depend on the quench protocol: the more relaxed the glass, the higher the local plastic thresholds. Analysis of the magnitude of local plastic relaxations reveals that stress drops follow exponential distributions, justifying the hypothesis of an average characteristic amplitude often conjectured in mesoscopic or continuum models. The amplitude of the local plastic rearrangements increases on average with the yield stress, regardless of the system preparation. The local yield stress varies with the shear orientation tested and strongly correlates with the plastic rearrangement locations when the system is sheared correspondingly. It is thus argued that plastic rearrangements are the consequence of shear transformation zones encoded in the glass structure that possess weak slip planes along different orientations. Finally, we justify the length scale employed in this work and extract the yield threshold statistics as a function of the size of the probing zones. This method makes it possible to derive physically grounded models of plasticity for amorphous materials by directly revealing the relevant details of the shear transformation zones that mediate this process.
Viswanathan, Sharadha; Pope, Stephen B.
2007-11-01
Probability density function (PDF) calculations are reported for the dispersion from line sources in isotropic turbulence. These flows pose a significant challenge to statistical models, because the scalar length scale (of the initial plume) is much smaller than the turbulence integral scale. The PDF calculations are based on a new near-neighbor implementation of the interaction by exchange with the conditional mean (IECM) mixing model. The calculations are compared to the experimental data of Warhaft (1984) on single and pairs of line sources, and with the previous calculations of Sawford (2004). This establishes the accuracy of the new implementation of IECM. An array of line sources is also considered with comparison to the experimental data of Warhaft & Lumley (1978), which show the dependence of the scalar variance decay rate on the array spacing relative to the turbulence integral scale. The near-neighbor implementation is applicable to other local mixing models, as arise, for example, in multiple mapping conditioning (Klimenko & Pope 2003). In the particle method used to solve the modeled PDF equation, the near-neighbor implementation results in a particle's mixing with just one or two near neighbors (in the relevant space), and hence maximizes the localness of mixing.
Extension of the Wald statistic to models with dependent observations
Czech Academy of Sciences Publication Activity Database
Morales, D.; Pardo, L.; Pardo, M. C.; Vajda, Igor
2000-01-01
Roč. 52, č. 2 (2000), s. 97-113 ISSN 0026-1335 R&D Projects: GA ČR GA102/99/1137 Grant - others:DGES(ES) PB-960635; GV(ES) 99/159/01 Institutional research plan: AV0Z1075907 Keywords : composite parametric hypotheses * generalized likelihood ratio statistic * generalized Wald statistic Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.212, year: 2000
Statistical power analysis a simple and general model for traditional and modern hypothesis tests
Murphy, Kevin R; Wolach, Allen
2014-01-01
Noted for its accessible approach, this text applies the latest approaches of power analysis to both null hypothesis and minimum-effect testing using the same basic unified model. Through the use of a few simple procedures and examples, the authors show readers with little expertise in statistical analysis how to obtain the values needed to carry out the power analysis for their research. Illustrations of how these analyses work and how they can be used to choose the appropriate criterion for defining statistically significant outcomes are sprinkled throughout. The book presents a simple and g
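In the same spirit, a basic power analysis takes only a few lines; this uses statsmodels rather than the book's own procedures, with conventional example values.

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the per-group sample size needed to detect a medium effect
# (Cohen's d = 0.5) with 80% power at alpha = .05 in a two-sample t-test.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"required n per group: {n_per_group:.1f}")   # roughly 64

# The same call solves for any one missing quantity, e.g. achieved power:
power = analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05)
print(f"power at n = 64: {power:.3f}")
```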
DEFF Research Database (Denmark)
A methodology is presented that combines modelling based on first principles and data-based modelling into a modelling cycle that facilitates fast decision-making based on statistical methods. A strong feature of this methodology is that, given a first principles model along with process data, … the corresponding modelling cycle model of the given system for a given purpose. A computer-aided tool, which integrates the elements of the modelling cycle, is also presented, and an example is given of modelling a fed-batch bioreactor.
Ivanov, Martin; Warrach-Sagi, Kirsten; Wulfmeyer, Volker
2018-04-01
A new approach for rigorous spatial analysis of the downscaling performance of regional climate model (RCM) simulations is introduced. It is based on a multiple comparison of the local tests at the grid cells and is also known as 'field' or 'global' significance. The block length for the local resampling tests is precisely determined to adequately account for the time series structure. New performance measures for estimating the added value of downscaled data relative to the large-scale forcing fields are developed. The methodology is applied, by way of example, to a standard EURO-CORDEX hindcast simulation with the Weather Research and Forecasting (WRF) model coupled with the land surface model NOAH at 0.11° grid resolution. Daily precipitation climatology for the 1990-2009 period is analysed for Germany for winter and summer in comparison with high-resolution gridded observations from the German Weather Service. The field significance test controls the proportion of falsely rejected local tests in a meaningful way and is robust to spatial dependence. Hence, the spatial patterns of the statistically significant local tests are also meaningful. We interpret them from a process-oriented perspective. While the downscaled precipitation distributions are statistically indistinguishable from the observed ones in most regions in summer, the biases of some distribution characteristics are significant over large areas in winter. WRF-NOAH generates appropriate stationary fine-scale climate features in the daily precipitation field over regions of complex topography in both seasons and appropriate transient fine-scale features almost everywhere in summer. As the added value of global climate model (GCM)-driven simulations cannot be smaller than this perfect-boundary estimate, this work demonstrates in a rigorous manner the clear additional value of dynamical downscaling over global climate simulations. The evaluation methodology has a broad spectrum of applicability as it is
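The multiple-comparison step can be illustrated with false-discovery-rate control over a map of local p-values; the p-values here are synthetic, whereas in the paper they come from block-resampling tests at each grid cell.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Field significance sketch: control the false discovery rate across all
# local tests on the grid, so the surviving spatial pattern is interpretable.
rng = np.random.default_rng(7)
p_local = rng.uniform(size=(50, 60))                           # null cells
p_local[10:20, 15:30] = rng.uniform(high=0.01, size=(10, 15))  # signal region

reject, p_adj, _, _ = multipletests(p_local.ravel(), alpha=0.05,
                                    method="fdr_bh")
reject_map = reject.reshape(p_local.shape)
print("locally significant cells after FDR control:", int(reject_map.sum()))
```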
Two statistical approaches, weighted regression on time, discharge, and season (WRTDS) and generalized additive models (GAMs), have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and...
Dataset of coded handwriting features for use in statistical modelling
Directory of Open Access Journals (Sweden)
Anna Agius
2018-02-01
Full Text Available The data presented here are related to the article titled “Using handwriting to infer a writer's country of origin for forensic intelligence purposes” (Agius et al., 2017) [1]. This article reports original writer, spatial and construction characteristic data for thirty-seven English Australian writers and thirty-seven Vietnamese writers. All of these characteristics were coded and recorded in Microsoft Excel 2013 (version 15.31). The construction characteristics coded were only extracted from seven characters, which were: ‘g’, ‘h’, ‘th’, ‘M’, ‘0’, ‘7’ and ‘9’. The coded format of the writer, spatial and construction characteristics is made available in this Data in Brief in order to allow others to perform statistical analyses and modelling to investigate whether there is a relationship between the handwriting features and the nationality of the writer, and whether the two nationalities can be differentiated. Furthermore, it allows others to employ mathematical techniques that are capable of characterising the extracted features from each participant.
Increased Statistical Efficiency in a Lognormal Mean Model
Directory of Open Access Journals (Sweden)
Grant H. Skrepnek
2014-01-01
Full Text Available Within the context of clinical and other scientific research, a substantial need exists for an accurate determination of the point estimate in a lognormal mean model, given that highly skewed data are often present. As such, logarithmic transformations are often advocated to achieve the assumptions of parametric statistical inference. Despite this, existing approaches that utilize only a sample’s mean and variance may not necessarily yield the most efficient estimator. The current investigation developed and tested an improved efficient point estimator for a lognormal mean by capturing more complete information via the sample’s coefficient of variation. Results of an empirical simulation study across varying sample sizes and population standard deviations indicated relative improvements in efficiency of up to 129.47 percent compared to the usual maximum likelihood estimator and up to 21.33 absolute percentage points above the efficient estimator presented by Shen and colleagues (2006). The relative efficiency of the proposed estimator increased particularly as a function of decreasing sample size and increasing population standard deviation.
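A small simulation in the spirit of the study, comparing the naive sample mean with the usual ML-type lognormal mean estimator; the paper's CV-based estimator itself is not reproduced, and the parameters are assumptions.

```python
import numpy as np

# Compare mean-squared errors of two point estimators of a lognormal mean:
# the naive sample mean versus exp(mean(log x) + var(log x)/2).
rng = np.random.default_rng(8)
mu, sigma, n, reps = 0.0, 1.5, 20, 20_000
true_mean = np.exp(mu + sigma**2 / 2)

mse_naive, mse_ml = 0.0, 0.0
for _ in range(reps):
    x = rng.lognormal(mu, sigma, size=n)
    logx = np.log(x)
    est_naive = x.mean()
    est_ml = np.exp(logx.mean() + logx.var() / 2)   # ML-type estimator
    mse_naive += (est_naive - true_mean) ** 2
    mse_ml += (est_ml - true_mean) ** 2

print("relative efficiency (naive/ML MSE):", mse_naive / mse_ml)
```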
Statistical physics of medical diagnostics: Study of a probabilistic model
Mashaghi, Alireza; Ramezanpour, Abolfazl
2018-03-01
We study a diagnostic strategy which is based on the anticipation of the diagnostic process by simulation of the dynamical process starting from the initial findings. We show that such a strategy could result in more accurate diagnoses compared to a strategy that is solely based on the direct implications of the initial observations. We demonstrate this by employing the mean-field approximation of statistical physics to compute the posterior disease probabilities for a given subset of observed signs (symptoms) in a probabilistic model of signs and diseases. A Monte Carlo optimization algorithm is then used to maximize an objective function of the sequence of observations, which favors the more decisive observations resulting in more polarized disease probabilities. We see how the observed signs change the nature of the macroscopic (Gibbs) states of the sign and disease probability distributions. The structure of these macroscopic states in the configuration space of the variables affects the quality of any approximate inference algorithm (so the diagnostic performance) which tries to estimate the sign-disease marginal probabilities. In particular, we find that the simulation (or extrapolation) of the diagnostic process is helpful when the disease landscape is not trivial and the system undergoes a phase transition to an ordered phase.
Definitions and Models of Statistical Literacy: A Literature Review
Sharma, Sashi
2017-01-01
Despite statistical literacy being relatively new in statistics education research, it needs special attention as attempts are being made to enhance the teaching, learning and assessing of this sub-strand. It is important that teachers and researchers are aware of the challenges of teaching this literacy. In this article, the growing importance of…
Statistical model of stress corrosion cracking based on extended ...
Indian Academy of Sciences (India)
In the previous paper (Pramana – J. Phys. 81(6), 1009 (2013)), the mechanism of stress corrosion cracking (SCC) based on a non-quadratic form of the Dirichlet energy was proposed and its statistical features were discussed. Following those results, we discuss here how SCC propagates on a pipe wall statistically. It reveals ...
Maximum entropy principle and hydrodynamic models in statistical mechanics
International Nuclear Information System (INIS)
Trovato, M.; Reggiani, L.
2012-01-01
This review presents the state of the art of the maximum entropy principle (MEP) in its classical and quantum (QMEP) formulation. Within the classical MEP we overview a general theory able to provide, in a dynamical context, the macroscopic relevant variables for carrier transport in the presence of electric fields of arbitrary strength. For the macroscopic variables the linearized maximum entropy approach is developed including full-band effects within a total energy scheme. Under spatially homogeneous conditions, we construct a closed set of hydrodynamic equations for the small-signal (dynamic) response of the macroscopic variables. The coupling between the driving field and the energy dissipation is analyzed quantitatively by using an arbitrary number of moments of the distribution function. Analogously, the theoretical approach is applied to many one-dimensional n⁺nn⁺ submicron Si structures by using different band structure models, different doping profiles, and different applied biases, and is validated by comparing numerical calculations with ensemble Monte Carlo simulations and with available experimental data. Within the quantum MEP we introduce a quantum entropy functional of the reduced density matrix, and the principle of quantum maximum entropy is then asserted as a fundamental principle of quantum statistical mechanics. Accordingly, we have developed a comprehensive theoretical formalism to construct rigorously a closed quantum hydrodynamic transport within a Wigner function approach. The theory is formulated both in thermodynamic equilibrium and nonequilibrium conditions, and the quantum contributions are obtained by only assuming that the Lagrange multipliers can be expanded in powers of ħ², ħ being the reduced Planck constant. In particular, by using an arbitrary number of moments, we prove that: (i) on a macroscopic scale all nonlocal effects, compatible with the uncertainty principle, are imputable to high-order spatial derivatives both of the
Directory of Open Access Journals (Sweden)
Hongshan Zhao
2012-05-01
Full Text Available Short-term solar irradiance forecasting (STSIF) is of great significance for the optimal operation and power prediction of grid-connected photovoltaic (PV) plants. However, STSIF is very complex to handle due to the random and nonlinear characteristics of solar irradiance under changeable weather conditions. Artificial Neural Networks (ANNs) are suitable for STSIF modeling and many research works on this topic are presented, but the conciseness and robustness of the existing models still need to be improved. After discussing the relation between weather variations and irradiance, the characteristics of the statistical feature parameters of irradiance under different weather conditions are figured out. A novel ANN model using statistical feature parameters (ANN-SFP) for STSIF is proposed in this paper. The input vector is reconstructed with several statistical feature parameters of irradiance and ambient temperature. Thus sufficient information can be effectively extracted from relatively few inputs and the model complexity is reduced. The model structure is determined by cross-validation (CV), and the Levenberg-Marquardt algorithm (LMA) is used for the network training. Simulations are carried out to validate and compare the proposed model with the conventional ANN model using historical data series (ANN-HDS), and the results indicated that the forecast accuracy is obviously improved under variable weather conditions.
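A rough sketch of the ANN-SFP idea with scikit-learn follows; the features, data, and network size are synthetic assumptions, and scikit-learn trains with Adam rather than the paper's Levenberg-Marquardt algorithm.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A small vector of statistical feature parameters (assumed here: mean, std,
# and range of recent irradiance plus ambient temperature) predicts the
# next-interval irradiance. All data below are synthetic placeholders.
rng = np.random.default_rng(9)
n = 1000
features = np.column_stack([
    rng.uniform(200, 800, n),   # window mean irradiance (W/m^2)
    rng.uniform(10, 150, n),    # window std (variability)
    rng.uniform(20, 400, n),    # window range
    rng.uniform(5, 35, n),      # ambient temperature (deg C)
])
target = 0.9 * features[:, 0] - 0.5 * features[:, 1] + rng.normal(0, 20, n)

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                                   random_state=0))
model.fit(features[:800], target[:800])
print("holdout R^2:", round(model.score(features[800:], target[800:]), 3))
```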
Statistical models for thermal ageing of steel materials in nuclear power plants
International Nuclear Information System (INIS)
Persoz, M.
1996-01-01
Some categories of steel materials in nuclear power plants may be subject to thermal ageing, whose extent depends on the steel's chemical composition and the ageing parameters, i.e. temperature and duration. This ageing affects the 'impact strength' of the materials, which is a mechanical property. In order to assess the residual lifetime of these components, a probabilistic study has been launched which takes into account the scatter over the input parameters of the mechanical model. Predictive formulae for estimating the impact strength of aged materials are important input data for the model. A data base has been created with impact strength results obtained from a laboratory ageing programme, and statistical treatments have been undertaken. Two kinds of model have been developed with non-linear regression methods (PROC NLIN, available in SAS/STAT). The first one, using a hyperbolic tangent function, is partly based on physical considerations; the second one, of an exponential type, is purely statistically built. The difficulties consist in selecting the significant parameters and attributing initial values to the coefficients, which is a requirement of the NLIN procedure. This global statistical analysis has led to general models that are functions of the chemical variables and the ageing parameters. These models are as precise as (if not more precise than) local models that had been developed earlier for specific values of ageing temperature and ageing duration. This paper describes the data and the methodology used to build the models and analyses the results given by the SAS system. (author)
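The hyperbolic-tangent model class can be fitted with scipy in place of SAS PROC NLIN; the parametrisation, data, and starting values below are assumptions, not the report's actual model, but they illustrate why initial values matter for convergence.

```python
import numpy as np
from scipy.optimize import curve_fit

# Impact strength modelled as a saturating tanh function of ageing time t
# (an assumed parametrisation, echoing the report's first model class).
def tanh_model(t, a, b, c, d):
    return a + b * np.tanh((np.log(t) - c) / d)

rng = np.random.default_rng(10)
t = np.logspace(1, 5, 60)                       # ageing duration (hours)
y = tanh_model(t, 60, -25, 8, 1.5) + rng.normal(0, 2, 60)

# Initial values matter for convergence, exactly as with PROC NLIN.
popt, pcov = curve_fit(tanh_model, t, y, p0=[50, -20, 7, 1])
print("fitted coefficients:", np.round(popt, 2))
```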
Directory of Open Access Journals (Sweden)
Chifflard Peter
2018-03-01
Full Text Available We examine the feasibility and added value of upscaling point data of soil moisture from a small-scale to a mesoscale catchment for the purpose of single-event flood prediction. We test the hypothesis that in a given catchment, the present soil moisture status is a key factor governing peak discharge, flow volume and flood duration. Multiple regression analyses of rainfall, pre-event discharge, single point soil moisture profiles from representative locations and peak discharge, discharge duration, and discharge volume are discussed. The soil moisture profiles are selected along a convergent slope connected to the groundwater in the flood plain within the small-scale catchment Husten (2.6 km²), which is a headwater catchment of the larger Hüppcherhammer catchment (47.2 km²), Germany. Results show that the number of explanatory variables in the regression models is higher in summer (up to 8 variables) than in winter (up to 3 variables) and higher in the meso-scale catchment than in the small-scale catchment (up to 2 variables). Soil moisture data from selected key locations in the small catchment improve the quality of regression models established for the meso-scale catchment. For the target variables peak discharge, discharge duration and discharge volume, adding the soil moisture from the flood plain and the lower slope as an explanatory variable improves the quality of the regression model by 15%, 20% and 10%, respectively, especially during the summer season. In the winter season the improvement is smaller (up to 6%) and the regression models mainly include rainfall characteristics as explanatory variables. The appearance of the soil moisture variables in the stepwise regression indicates their varying importance, depending on which characteristics of the discharge are focused on. Thus, we conclude that point data for soil moisture in functional landscape elements describe the catchments’ initial conditions very well and may yield valuable
Statistical behaviour of adaptive multilevel splitting algorithms in simple models
International Nuclear Information System (INIS)
Rolland, Joran; Simonnet, Eric
2015-01-01
Adaptive multilevel splitting algorithms have been introduced rather recently for estimating tail distributions in a fast and efficient way. In particular, they can be used for computing the so-called reactive trajectories corresponding to direct transitions from one metastable state to another. The algorithm is based on successive selection-mutation steps performed on the system in a controlled way. It has two intrinsic parameters, the number of particles/trajectories and the reaction coordinate used for discriminating good or bad trajectories. We first investigate the convergence in law of the algorithm as a function of the timestep for several simple stochastic models. Second, we consider the average duration of reactive trajectories, for which no theoretical predictions exist. The most important aspect of this work concerns systems with two degrees of freedom. They are studied in detail as a function of the reaction coordinate in the asymptotic regime where the number of trajectories goes to infinity. We show that during phase transitions, the statistics of the algorithm deviate significantly from known theoretical results when using non-optimal reaction coordinates. In this case, the variance of the algorithm peaks at the transition and the convergence of the algorithm can be much slower than the usual expected central limit behaviour. The duration of trajectories is affected as well. Moreover, reactive trajectories do not correspond to the most probable ones. Such behaviour disappears when using the optimal reaction coordinate, called the committor, as predicted by the theory. We finally investigate a three-state Markov chain which reproduces this phenomenon and show logarithmic convergence of the trajectory durations.
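A minimal adaptive multilevel splitting sketch (one particle killed per iteration) for a toy tail probability follows; the running maximum serves as reaction coordinate, and all parameters are illustrative.

```python
import numpy as np

# Estimate P(max_t X_t > a) for a discrete random walk: kill the worst
# particle each iteration and rebranch it from a survivor, starting at the
# survivor's first crossing above the killed level.
rng = np.random.default_rng(11)
a, n_particles, n_steps, step_sd = 4.0, 100, 200, 0.3

def simulate_from(prefix):
    # complete a trajectory from a given prefix with fresh noise
    start = prefix[-1] if len(prefix) else 0.0
    rest = start + np.cumsum(rng.normal(0.0, step_sd, n_steps - len(prefix)))
    return np.concatenate([prefix, rest])

paths = [simulate_from(np.empty(0)) for _ in range(n_particles)]
scores = np.array([p.max() for p in paths])
prob = 1.0

while scores.min() < a:
    worst = int(scores.argmin())
    level = scores[worst]
    prob *= 1.0 - 1.0 / n_particles      # each kill multiplies the estimate
    donor = worst
    while donor == worst:                # mutation: branch from another particle
        donor = int(rng.integers(n_particles))
    cut = int(np.argmax(paths[donor] > level)) + 1  # first crossing above level
    paths[worst] = simulate_from(paths[donor][:cut])
    scores[worst] = paths[worst].max()

print("AMS estimate of P(max > a):", prob)
```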
Modelling malaria treatment practices in Bangladesh using spatial statistics
Directory of Open Access Journals (Sweden)
Haque Ubydul
2012-03-01
Full Text Available Background: Malaria treatment-seeking practices vary worldwide and Bangladesh is no exception. Individuals from 88 villages in Rajasthali were asked about their treatment-seeking practices. A portion of these households preferred malaria treatment from the National Control Programme, but a large number of households continued to use drug vendors, and approximately one fourth of the individuals surveyed relied exclusively on non-control-programme treatments. The risks of low control-programme usage include incomplete malaria treatment, possible misuse of anti-malarial drugs, and an increased potential for drug resistance. Methods: The spatial patterns of treatment-seeking practices were first examined using hot-spot analysis (local Getis-Ord Gi statistic) and then modelled using regression. Ordinary least squares (OLS) regression identified key factors explaining more than 80% of the variation in control-programme and vendor treatment preferences. Geographically weighted regression (GWR) was then used to assess where each factor was a strong predictor of treatment-seeking preferences. Results: Several factors, including tribal affiliation, housing materials, household densities, education levels, and proximity to the regional urban centre, were found to be effective predictors of malaria treatment-seeking preferences. The predictive strength of each of these factors, however, varied across the study area. While education, for example, was a strong predictor in some villages, it was less important for predicting treatment-seeking outcomes in other villages. Conclusion: Understanding where each factor is a strong predictor of treatment-seeking outcomes may help in planning targeted interventions aimed at increasing control-programme usage. Suggested strategies include providing additional training for the Building Resources across Communities (BRAC) health workers, implementing educational programmes, and addressing economic factors.
Computational algebraic geometry for statistical modeling FY09Q2 progress.
Energy Technology Data Exchange (ETDEWEB)
Thompson, David C.; Rojas, Joseph Maurice; Pebay, Philippe Pierre
2009-03-01
This is a progress report on polynomial system solving for statistical modeling. This quarter we have developed our first model of shock response data and an algorithm for identifying the chamber cone containing a polynomial system in n variables with n+k terms within polynomial time - a significant improvement over previous algorithms, all having exponential worst-case complexity. We have implemented and verified the chamber cone algorithm for n+3 and are working to extend the implementation to handle arbitrary k. Later sections of this report explain chamber cones in more detail; the next section provides an overview of the project and how the current progress fits into it.
Improving statistical reasoning: theoretical models and practical implications
National Research Council Canada - National Science Library
Sedlmeier, Peter
1999-01-01
Statistical literacy, the art of drawing reasonable inferences from an abundance of numbers provided daily by...
A New Statistic for Evaluating Item Response Theory Models for Ordinal Data. CRESST Report 839
Cai, Li; Monroe, Scott
2014-01-01
We propose a new limited-information goodness-of-fit test statistic C₂ for ordinal IRT models. The construction of the new statistic lies formally between the M₂ statistic of Maydeu-Olivares and Joe (2006), which utilizes first and second order marginal probabilities, and the M₂* statistic of Cai and Hansen…
Computational and Statistical Models: A Comparison for Policy Modeling of Childhood Obesity
Mabry, Patricia L.; Hammond, Ross; Ip, Edward Hak-Sing; Huang, Terry T.-K.
As systems science methodologies have begun to emerge as a set of innovative approaches to address complex problems in behavioral, social science, and public health research, some apparent conflicts with traditional statistical methodologies for public health have arisen. Computational modeling is an approach set in context that integrates diverse sources of data to test the plausibility of working hypotheses and to elicit novel ones. Statistical models are reductionist approaches geared towards testing the null hypothesis. While these two approaches may seem contrary to each other, we propose that they are in fact complementary and can be used jointly to advance solutions to complex problems. Outputs from statistical models can be fed into computational models, and outputs from computational models can lead to further empirical data collection and statistical models. Together, this presents an iterative process that refines the models and contributes to a greater understanding of the problem and its potential solutions. The purpose of this panel is to foster communication and understanding between statistical and computational modelers. Our goal is to shed light on the differences between the approaches and convey what kinds of research inquiries each one is best for addressing and how they can serve complementary (and synergistic) roles in the research process, to mutual benefit. For each approach the panel will cover the relevant "assumptions" and how the differences in what is assumed can foster misunderstandings. The interpretations of the results from each approach will be compared and contrasted and the limitations for each approach will be delineated. We will use illustrative examples from CompMod, the Comparative Modeling Network for Childhood Obesity Policy. The panel will also incorporate interactive discussions with the audience on the issues raised here.
International Nuclear Information System (INIS)
Xu Chengjian; Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van’t
2012-01-01
Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
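The repeated cross-validation protocol used to compare the learners can be sketched as follows; an L1-penalised logistic model stands in for the LASSO step, and the predictors are synthetic stand-ins (dose-volume and clinical variables are assumed).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Repeated stratified cross-validation gives a fair, low-variance estimate
# of predictive power for comparing model-building strategies.
rng = np.random.default_rng(12)
X = rng.normal(size=(300, 15))                  # candidate predictors
y = (X[:, 0] + 0.8 * X[:, 2] + rng.normal(scale=1.0, size=300)) > 0.5

lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
aucs = cross_val_score(lasso_lr, X, y, cv=cv, scoring="roc_auc")
print(f"repeated-CV AUC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```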
Information Geometric Complexity of a Trivariate Gaussian Statistical Model
Directory of Open Access Journals (Sweden)
Domenico Felice
2014-05-01
Full Text Available We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about systems in the presence of limited information. Specifically, we observe that the complexity of such entropic inferences not only depends on the amount of available pieces of information but also on the manner in which such pieces are correlated. Finally, we uncover that, for certain correlational structures, the impossibility of reaching the most favorable configuration from an entropic inference viewpoint seems to lead to an information geometric analog of the well-known frustration effect that occurs in statistical physics.
Fang, Yongxiang; Wit, Ernst
2008-01-01
Fisher’s combined probability test is the most commonly used method to test the overall significance of a set of independent p-values. However, Fisher’s statistic is clearly more sensitive to small p-values than to large ones, so a single small p-value may overrule the other p-values
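Fisher's method is available directly in scipy, which makes this sensitivity to a single small p-value easy to see; Stouffer's method is shown for contrast.

```python
from scipy import stats

# Combine a set of independent p-values; note how one very small p-value
# dominates Fisher's combined statistic.
p_values = [0.8, 0.9, 0.7, 0.0001]
chi2, p_fisher = stats.combine_pvalues(p_values, method="fisher")
print(f"Fisher: chi2 = {chi2:.2f}, combined p = {p_fisher:.5f}")

# Stouffer's method weighs the evidence more evenly across the set.
z, p_stouffer = stats.combine_pvalues(p_values, method="stouffer")
print(f"Stouffer: combined p = {p_stouffer:.5f}")
```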
Directory of Open Access Journals (Sweden)
Anita Lindmark
Full Text Available When profiling hospital performance, quality indicators are commonly evaluated through hospital-specific adjusted means with confidence intervals. When identifying deviations from a norm, large hospitals can have statistically significant results even for clinically irrelevant deviations, while important deviations in small hospitals can remain undiscovered. We have used data from the Swedish Stroke Register (Riksstroke) to illustrate the properties of a benchmarking method that integrates considerations of both clinical relevance and level of statistical significance. The performance measure used was case-mix adjusted risk of death or dependency in activities of daily living within 3 months after stroke. A hospital was labeled as having outlying performance if its case-mix adjusted risk exceeded a benchmark value with a specified statistical confidence level. The benchmark was expressed relative to the population risk and should reflect the clinically relevant deviation that is to be detected. A simulation study based on Riksstroke patient data from 2008-2009 was performed to investigate the effect of the choice of the statistical confidence level and benchmark value on the diagnostic properties of the method. Simulations were based on 18,309 patients in 76 hospitals. The widely used setting, comparing 95% confidence intervals to the national average, resulted in low sensitivity (0.252) and high specificity (0.991). There were large variations in sensitivity and specificity for different requirements of statistical confidence. Lowering statistical confidence improved sensitivity with a relatively smaller loss of specificity. Variations due to different benchmark values were smaller, especially for sensitivity. This allows the choice of a clinically relevant benchmark to be driven by clinical factors without major concerns about sufficiently reliable evidence. The study emphasizes the importance of combining clinical relevance and level of statistical
Validation of the measure automobile emissions model : a statistical analysis
2000-09-01
The Mobile Emissions Assessment System for Urban and Regional Evaluation (MEASURE) model provides an external validation capability for the hot stabilized option; the model is one of several new modal emissions models designed to predict hot stabilized e...
Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study
Cer, Daniel
2011-01-01
The goal of this dissertation is to determine the best way to train a statistical machine translation system. I first develop a state-of-the-art machine translation system called Phrasal and then use it to examine a wide variety of potential learning algorithms and optimization criteria and arrive at two very surprising results. First, despite the…
Monte Carlo simulation of quantum statistical lattice models
Raedt, Hans De; Lagendijk, Ad
1985-01-01
In this article we review recent developments in computational methods for quantum statistical lattice problems. We begin by giving the necessary mathematical basis, the generalized Trotter formula, and discuss the computational tools, exact summations and Monte Carlo simulation, that will be used
On cumulative process model and its statistical analysis
Czech Academy of Sciences Publication Activity Database
Volf, Petr
2000-01-01
Roč. 36, č. 2 (2000), s. 165-176 ISSN 0023-5954 R&D Projects: GA ČR GA201/97/0354; GA ČR GA402/98/0742 Institutional research plan: AV0Z1075907 Subject RIV: BB - Applied Statistics, Operational Research
Learning Statistical Patterns in Relational Data Using Probabilistic Relational Models
National Research Council Canada - National Science Library
Koller, Daphne
2005-01-01
.... This effort focused on developing undirected probabilistic models for representing and learning graph patterns, learning patterns involving links between objects, learning discriminative models...
Statistical description of tropospheric delay for InSAR : Overview and a new model
DEFF Research Database (Denmark)
Merryman Boncori, John Peter; Mohr, Johan Jacob
2007-01-01
This paper focuses on statistical modeling of water vapor fluctuations for InSAR. The structure function and power spectral density approaches are reviewed, summarizing their assumptions and results. The linking equations between these modeling techniques are reported. A structure function model … of these, to atmospheric statistics. The latter approach is used to compare the derived model with previously published results.
Directory of Open Access Journals (Sweden)
Maurice H. ter Beek
2015-04-01
Full Text Available We investigate the suitability of statistical model checking techniques for analysing quantitative properties of software product line models with probabilistic aspects. For this purpose, we enrich the feature-oriented language FLan with action rates, which specify the likelihood of exhibiting particular behaviour or of installing features at a specific moment or in a specific order. The enriched language (called PFLan) allows us to specify models of software product lines with probabilistic configurations and behaviour, e.g. by considering a PFLan semantics based on discrete-time Markov chains. The Maude implementation of PFLan is combined with the distributed statistical model checker MultiVeStA to perform quantitative analyses of a simple product line case study. The presented analyses include the likelihood of certain behaviour of interest (e.g. product malfunctioning) and the expected average cost of products.
Gallagher, H. Colin; Robins, Garry
2015-01-01
As part of the shift within second language acquisition (SLA) research toward complex systems thinking, researchers have called for investigations of social network structure. One strand of social network analysis yet to receive attention in SLA is network statistical models, whereby networks are explained in terms of smaller substructures of…
Study on Semi-Parametric Statistical Model of Safety Monitoring of Cracks in Concrete Dams
Gu, Chongshi; Qin, Dong; Li, Zhanchao; Zheng, Xueqin
2013-01-01
Cracks are one of the hidden dangers in concrete dams. The study on safety monitoring models of concrete dam cracks has always been difficult. Using the parametric statistical model of safety monitoring of cracks in concrete dams, with the help of the semi-parametric statistical theory, and considering the abnormal behaviors of these cracks, the semi-parametric statistical model of safety monitoring of concrete dam cracks is established to overcome the limitation of the parametric model in expressing the objective model.
Analytical model of SiPM time resolution and order statistics with crosstalk
Energy Technology Data Exchange (ETDEWEB)
Vinogradov, S., E-mail: Sergey.Vinogradov@liverpool.ac.uk [University of Liverpool and Cockcroft Institute, Sci-Tech Daresbury, Keckwick Lane, Warrington WA4 4AD (United Kingdom); P.N. Lebedev Physical Institute of the Russian Academy of Sciences, 119991 Leninskiy Prospekt 53, Moscow (Russian Federation)
2015-07-01
Time resolution is the most important parameter of photon detectors in a wide range of time-of-flight and time correlation applications within the areas of high energy physics, medical imaging, and others. Silicon photomultipliers (SiPM) have been initially recognized as perfect photon-number-resolving detectors; now they also provide outstanding results in the scintillator timing resolution. However, crosstalk and afterpulsing introduce false secondary non-Poissonian events, and SiPM time resolution models are experiencing significant difficulties with that. This study presents an attempt to develop an analytical model of the timing resolution of an SiPM taking into account the statistics of secondary events resulting from crosstalk. Two approaches have been utilized to derive an analytical expression for time resolution: the first one based on statistics of independent identically distributed detection event times and the second one based on order statistics of these times. The first approach is found to be more straightforward and “analytical-friendly” to model analog SiPMs. Comparisons of coincidence resolving times predicted by the model with the known experimental results from a LYSO:Ce scintillator and a Hamamatsu MPPC are presented.
Analytical model of SiPM time resolution and order statistics with crosstalk
International Nuclear Information System (INIS)
Vinogradov, S.
2015-01-01
Time resolution is the most important parameter of photon detectors in a wide range of time-of-flight and time correlation applications within the areas of high energy physics, medical imaging, and others. Silicon photomultipliers (SiPM) have been initially recognized as perfect photon-number-resolving detectors; now they also provide outstanding results in the scintillator timing resolution. However, crosstalk and afterpulsing introduce false secondary non-Poissonian events, and SiPM time resolution models are experiencing significant difficulties with that. This study presents an attempt to develop an analytical model of the timing resolution of an SiPM taking into account the statistics of secondary events resulting from crosstalk. Two approaches have been utilized to derive an analytical expression for time resolution: the first one based on statistics of independent identically distributed detection event times and the second one based on order statistics of these times. The first approach is found to be more straightforward and “analytical-friendly” to model analog SiPMs. Comparisons of coincidence resolving times predicted by the model with the known experimental results from a LYSO:Ce scintillator and a Hamamatsu MPPC are presented.
Ten Years of Cloud Properties from MODIS: Global Statistics and Use in Climate Model Evaluation
Platnick, Steven E.
2011-01-01
The NASA Moderate Resolution Imaging Spectroradiometer (MODIS), launched onboard the Terra and Aqua spacecraft, began Earth observations on February 24, 2000 and June 24, 2002, respectively. Among the algorithms developed and applied to this sensor, a suite of cloud products includes cloud masking/detection, cloud-top properties (temperature, pressure), and optical properties (optical thickness, effective particle radius, water path, and thermodynamic phase). All cloud algorithms underwent numerous changes and enhancements for the latest Collection 5 production version; this process continues with the current Collection 6 development. We will show example MODIS Collection 5 cloud climatologies derived from global spatial and temporal aggregations provided in the archived gridded Level-3 MODIS atmosphere team product (product names MOD08 and MYD08 for MODIS Terra and Aqua, respectively). Data sets in this Level-3 product include scalar statistics as well as 1- and 2-D histograms of many cloud properties, allowing for higher order information and correlation studies. In addition to these statistics, we will show trends and statistical significance in annual and seasonal means for a variety of the MODIS cloud properties, as well as the time required for detection given assumed trends. To assist in climate model evaluation, we have developed a MODIS cloud simulator with an accompanying netCDF file containing subsetted monthly Level-3 statistical data sets that correspond to the simulator output. Correlations of cloud properties with ENSO offer the potential to evaluate model cloud sensitivity; initial results will be discussed.
Statistical modeling of total crash frequency at highway intersections
Directory of Open Access Journals (Sweden)
Arash M. Roshandeh
2016-04-01
Intersection-related crashes account for a high proportion of accidents involving drivers, occupants, pedestrians, and cyclists. In general, the purpose of intersection safety analysis is to determine the impact of safety-related variables on pedestrians, cyclists and vehicles, so as to facilitate the design of effective and efficient countermeasure strategies to improve safety at intersections. This study investigates the effects of traffic, environmental, intersection geometric and pavement-related characteristics on total crash frequencies at intersections. A random-parameter Poisson model was used with crash data from 357 signalized intersections in Chicago from 2004 to 2010. The results indicate that, of the identified factors, evening peak period traffic volume, pavement condition, and unlighted intersections have the greatest effects on crash frequencies. Overall, the results suggest that, in order to deliver effective highway safety countermeasures at intersections, significant attention must be paid to ensuring that pavements are adequately maintained and intersections are well lighted. It should be noted that projects may have been implemented at and around the study intersections during the study period (7 years), which could affect crash frequency over time; this is an important variable that future studies could investigate, together with the marginal effects of safety-related works on crash frequency at signalized intersections.
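Illustrative sketch only: the study's random-parameter Poisson model is not reproduced here; instead, a plain fixed-coefficient Poisson regression on synthetic data shows the general form of such a crash-frequency model. All variable names, coefficients, and data below are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 357  # signalized intersections, as in the study
X = np.column_stack([
    rng.normal(3.5, 0.5, n),   # log evening-peak traffic volume (hypothetical)
    rng.uniform(1, 5, n),      # pavement condition rating (hypothetical)
    rng.integers(0, 2, n),     # 1 = intersection lighted (hypothetical)
])
# Synthetic "true" crash rates, then Poisson-distributed crash counts
mu = np.exp(0.2 + 0.8 * X[:, 0] - 0.3 * X[:, 1] - 0.4 * X[:, 2])
y = rng.poisson(mu)

# Fixed-parameter Poisson GLM; exponentiated coefficients are rate ratios
model = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
print(model.summary())
```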
Ivanov, Martin; Warrach-Sagi, Kirsten; Wulfmeyer, Volker
2018-04-01
A new approach for rigorous spatial analysis of the downscaling performance of regional climate model (RCM) simulations is introduced. It is based on a multiple comparison of the local tests at the grid cells and is also known as "field" or "global" significance. New performance measures for estimating the added value of downscaled data relative to the large-scale forcing fields are developed. The methodology is exemplarily applied to a standard EURO-CORDEX hindcast simulation with the Weather Research and Forecasting (WRF) model coupled with the land surface model NOAH at 0.11° grid resolution. Monthly temperature climatology for the 1990-2009 period is analysed for Germany for winter and summer in comparison with high-resolution gridded observations from the German Weather Service. The field significance test controls the proportion of falsely rejected local tests in a meaningful way and is robust to spatial dependence. Hence, the spatial patterns of the statistically significant local tests are also meaningful. We interpret them from a process-oriented perspective. In winter and in most regions in summer, the downscaled distributions are statistically indistinguishable from the observed ones. A systematic cold summer bias occurs in deep river valleys due to overestimated elevations, in coastal areas probably due to enhanced sea breeze circulation, and over large lakes due to the interpolation of water temperatures. Urban areas in concave topography forms have a warm summer bias due to the strong heat islands, not reflected in the observations. WRF-NOAH generates appropriate fine-scale features in the monthly temperature field over regions of complex topography, but over spatially homogeneous areas even small biases can lead to significant deteriorations relative to the driving reanalysis. As the added value of global climate model (GCM)-driven simulations cannot be smaller than this perfect-boundary estimate, this work demonstrates in a rigorous manner the
Linear System Models for Ultrasonic Imaging: Intensity Signal Statistics.
Abbey, Craig K; Zhu, Yang; Bahramian, Sara; Insana, Michael F
2017-04-01
Despite a great deal of work characterizing the statistical properties of radio frequency backscattered ultrasound signals, less is known about the statistical properties of demodulated intensity signals. Analysis of intensity is made more difficult by a strong nonlinearity that arises in the process of demodulation. This limits our ability to characterize the spatial resolution and noise properties of B-mode ultrasound images. In this paper, we generalize earlier results on two-point intensity covariance using a multivariate systems approach. We derive the mean and autocovariance function of the intensity signal under Gaussian assumptions on both the object scattering function and acquisition noise, and with the assumption of a locally shift-invariant pulse-echo system function. We investigate the limiting cases of point statistics and a uniform scattering field with a stationary distribution. Results from validation studies using simulation and data from a real system applied to a uniform scattering phantom are presented. In the simulation studies, we find errors less than 10% between the theoretical mean and variance, and sample estimates of these quantities. Prediction of the intensity power spectrum (PS) in the real system exhibits good qualitative agreement (errors less than 3.5 dB for frequencies between 0.1 and 10 cyc/mm, but with somewhat higher error outside this range that may be due to the use of a window in the PS estimation procedure). We also replicate the common finding that the intensity mean is equal to its standard deviation (i.e., signal-to-noise ratio = 1) for fully developed speckle. We show how the derived statistical properties can be used to characterize the quality of an ultrasound linear array for low-contrast patterns using generalized noise-equivalent quanta directly on the intensity signal.
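A quick numerical check (illustrative, not from the paper) of the "signal-to-noise ratio = 1 for fully developed speckle" property mentioned above: for a circularly Gaussian backscattered field, the demodulated intensity is exponentially distributed, so its mean equals its standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Circular complex Gaussian field, as for fully developed speckle
field = rng.normal(size=100_000) + 1j * rng.normal(size=100_000)
intensity = np.abs(field) ** 2          # demodulated intensity ~ exponential
print(intensity.mean() / intensity.std())  # ~1.0, i.e. SNR = 1
```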
International Nuclear Information System (INIS)
Lim, Gyeong Hui
2008-03-01
This book consists of 15 chapters, covering: basic concepts and the meaning of statistical thermodynamics, Maxwell-Boltzmann statistics, ensembles, thermodynamic functions and fluctuations, statistical dynamics of independent particle systems, ideal molecular systems, chemical equilibrium and chemical reaction rates in ideal gas mixtures, classical statistical thermodynamics, the ideal lattice model, lattice statistics and non-ideal lattice models, imperfect gas theory applied to liquids, the theory of solutions, statistical thermodynamics of interfaces, statistical thermodynamics of macromolecular systems, and quantum statistics
Statistical analysis and model validation of automobile emissions
2000-09-01
The article discusses the development of a comprehensive modal emissions model that is currently being integrated with a variety of transportation models as part of National Cooperative Highway Research Program project 25-11. Described is the second-...
Cross-Lingual Lexical Triggers in Statistical Language Modeling
National Research Council Canada - National Science Library
Kim, Woosung; Khudanpur, Sanjeev
2003-01-01
.... We achieve this through an extension of the method of lexical triggers to the cross-language problem, and by developing a likelihoodbased adaptation scheme for combining a trigger model with an N-gram model...
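Illustrative sketch, not the paper's method: the simplest way to combine a trigger model with an N-gram model is linear interpolation; the likelihood-based adaptation scheme mentioned above would instead tune the mixing weight on held-out data. All probabilities and the fixed weight below are toy values.

```python
# Toy bigram and cross-lingual trigger probabilities (hypothetical values)
ngram = {("the", "model"): 0.2}
trigger = {"model": 0.05}

def combined_prob(word, history, lam=0.7):
    # lam would be fitted by maximizing held-out likelihood in a real system
    p_ngram = ngram.get((history, word), 1e-6)    # backoff floor for unseen pairs
    p_trigger = trigger.get(word, 1e-6)
    return lam * p_ngram + (1 - lam) * p_trigger

print(combined_prob("model", "the"))
```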
Energy Technology Data Exchange (ETDEWEB)
Lovejoy, S., E-mail: lovejoy@physics.mcgill.ca [Physics Department, McGill University, Montreal, Quebec H3A 2T8 (Canada); Lima, M. I. P. de [Institute of Marine Research (IMAR) and Marine and Environmental Sciences Centre (MARE), Coimbra (Portugal); Department of Civil Engineering, University of Coimbra, 3030-788 Coimbra (Portugal)
2015-07-15
Over the range of time scales from about 10 days to 30–100 years, in addition to the familiar weather and climate regimes, there is an intermediate “macroweather” regime characterized by negative temporal fluctuation exponents: implying that fluctuations tend to cancel each other out so that averages tend to converge. We show theoretically and numerically that macroweather precipitation can be modeled by a stochastic weather-climate model (the Climate Extended Fractionally Integrated Flux model, CEFIF) first proposed for macroweather temperatures and we show numerically that a four parameter space-time CEFIF model can approximately reproduce eight or so empirical space-time exponents. In spite of this success, CEFIF is theoretically and numerically difficult to manage. We therefore propose a simplified stochastic model in which the temporal behavior is modeled as a fractional Gaussian noise but the spatial behaviour as a multifractal (climate) cascade: a spatial extension of the recently introduced ScaLIng Macroweather Model, SLIMM. Both the CEFIF and this spatial SLIMM model have a property often implicitly assumed by climatologists that climate statistics can be “homogenized” by normalizing them with the standard deviation of the anomalies. Physically, it means that the spatial macroweather variability corresponds to different climate zones that multiplicatively modulate the local, temporal statistics. This simplified macroweather model provides a framework for macroweather forecasting that exploits the system's long range memory and spatial correlations; for it, the forecasting problem has been solved. We test this factorization property and the model with the help of three centennial, global scale precipitation products that we analyze jointly in space and in time.
Lovejoy, S; de Lima, M I P
2015-07-01
Over the range of time scales from about 10 days to 30-100 years, in addition to the familiar weather and climate regimes, there is an intermediate "macroweather" regime characterized by negative temporal fluctuation exponents: implying that fluctuations tend to cancel each other out so that averages tend to converge. We show theoretically and numerically that macroweather precipitation can be modeled by a stochastic weather-climate model (the Climate Extended Fractionally Integrated Flux model, CEFIF) first proposed for macroweather temperatures and we show numerically that a four parameter space-time CEFIF model can approximately reproduce eight or so empirical space-time exponents. In spite of this success, CEFIF is theoretically and numerically difficult to manage. We therefore propose a simplified stochastic model in which the temporal behavior is modeled as a fractional Gaussian noise but the spatial behaviour as a multifractal (climate) cascade: a spatial extension of the recently introduced ScaLIng Macroweather Model, SLIMM. Both the CEFIF and this spatial SLIMM model have a property often implicitly assumed by climatologists that climate statistics can be "homogenized" by normalizing them with the standard deviation of the anomalies. Physically, it means that the spatial macroweather variability corresponds to different climate zones that multiplicatively modulate the local, temporal statistics. This simplified macroweather model provides a framework for macroweather forecasting that exploits the system's long range memory and spatial correlations; for it, the forecasting problem has been solved. We test this factorization property and the model with the help of three centennial, global scale precipitation products that we analyze jointly in space and in time.
Regional temperature models are needed for characterizing and mapping stream thermal regimes, establishing reference conditions, predicting future impacts and identifying critical thermal refugia. Spatial statistical models have been developed to improve regression modeling techn...
Human turnover dynamics during sleep: Statistical behavior and its modeling
Yoneyama, Mitsuru; Okuma, Yasuyuki; Utsumi, Hiroya; Terashi, Hiroo; Mitoma, Hiroshi
2014-03-01
Turnover is a typical intermittent body movement during sleep. Exploring its behavior may provide insights into the mechanisms and management of sleep. However, little is understood about the dynamic nature of turnover in healthy humans and how it can be modified in disease. Here we present a detailed analysis of turnover signals that are collected by accelerometry from healthy elderly subjects and age-matched patients with neurodegenerative disorders such as Parkinson's disease. In healthy subjects, the time intervals between consecutive turnover events exhibit a well-separated bimodal distribution with one mode at ⩽10 s and the other at ⩾100 s, whereas such bimodality tends to disappear in neurodegenerative patients. The bimodality and fine temporal structures (⩽10 s) are not revealed by conventional sleep recordings with coarser time resolution (≈30 s). Moreover, we estimate the scaling exponent of the interval fluctuations, which also shows a clear difference between healthy subjects and patients. We incorporate these experimental results into a computational model of human decision making. A decision is to be made at each simulation step between two choices: to keep on sleeping or to make a turnover, the selection of which is determined dynamically by comparing a pair of random numbers assigned to each choice. This decision is weighted by a single parameter that reflects the depth of sleep. The resulting simulated behavior accurately replicates many aspects of observed turnover patterns, including the appearance or disappearance of bimodality, and leads to several predictions, suggesting that the depth parameter may be useful as a quantitative measure for differentiating between normal and pathological sleep. These findings have significant clinical implications and may pave the way for the development of practical sleep assessment technologies.
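A minimal sketch of the decision rule described above, on synthetic terms: at each time step two random scores are drawn, one per choice, with the "keep sleeping" score weighted by a depth-of-sleep parameter. The particular weighting scheme below is an assumption for illustration, not the authors' exact model.

```python
import numpy as np

rng = np.random.default_rng(2)

def turnover_intervals(depth, n_steps=100_000):
    """Each step, compare a 'keep sleeping' score (weighted by sleep depth)
    against a 'turn over' score; record intervals between turnover events."""
    intervals, last = [], 0
    for t in range(n_steps):
        if (1 - depth) * rng.random() > depth * rng.random():
            intervals.append(t - last)
            last = t
    return np.array(intervals)

deep = turnover_intervals(depth=0.99)
light = turnover_intervals(depth=0.90)
print(deep.mean(), light.mean())  # deeper sleep -> longer gaps between turnovers
```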
PyEvolve: a toolkit for statistical modelling of molecular evolution
Directory of Open Access Journals (Sweden)
Wakefield Matthew J
2004-01-01
Background: Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences, ignoring the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. Results: Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpGs, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10 species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real world performance for parameter rich models with a large data set, reducing the time required for optimisation from ~10 days to ~6 hours. Conclusion: PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the
Central Limit Theorem for Exponentially Quasi-local Statistics of Spin Models on Cayley Graphs
Reddy, Tulasi Ram; Vadlamani, Sreekar; Yogeshwaran, D.
2018-04-01
Central limit theorems for linear statistics of lattice random fields (including spin models) are usually proven under suitable mixing conditions or quasi-associativity. Many interesting examples of spin models do not satisfy mixing conditions, and on the other hand, it does not seem easy to show central limit theorem for local statistics via quasi-associativity. In this work, we prove general central limit theorems for local statistics and exponentially quasi-local statistics of spin models on discrete Cayley graphs with polynomial growth. Further, we supplement these results by proving similar central limit theorems for random fields on discrete Cayley graphs taking values in a countable space, but under the stronger assumptions of α-mixing (for local statistics) and exponential α-mixing (for exponentially quasi-local statistics). All our central limit theorems assume a suitable variance lower bound like many others in the literature. We illustrate our general central limit theorem with specific examples of lattice spin models and statistics arising in computational topology, statistical physics and random networks. Examples of clustering spin models include quasi-associated spin models with fast decaying covariances like the off-critical Ising model, level sets of Gaussian random fields with fast decaying covariances like the massive Gaussian free field and determinantal point processes with fast decaying kernels. Examples of local statistics include intrinsic volumes, face counts, component counts of random cubical complexes while exponentially quasi-local statistics include nearest neighbour distances in spin models and Betti numbers of sub-critical random cubical complexes.
Oseloka Ezepue, Patrick; Ojo, Adegbola
2012-12-01
A challenging problem in some developing countries such as Nigeria is inadequate training of students in effective problem solving using the core concepts of their disciplines. Related to this is a disconnection between their learning and socio-economic development agenda of a country. These problems are more vivid in statistical education which is dominated by textbook examples and unbalanced assessment 'for' and 'of' learning within traditional curricula. The problems impede the achievement of socio-economic development objectives such as those stated in the Nigerian Vision 2020 blueprint and United Nations Millennium Development Goals. They also impoverish the ability of (statistics) graduates to creatively use their knowledge in relevant business and industry sectors, thereby exacerbating mass graduate unemployment in Nigeria and similar developing countries. This article uses a case study in statistical modelling to discuss the nature of innovations in statistics education vital to producing new kinds of graduates who can link their learning to national economic development goals, create wealth and alleviate poverty through (self) employment. Wider implications of the innovations for repositioning mathematical sciences education globally are explored in this article.
Modelling the statistical dependence of rainfall event variables through copula functions
Directory of Open Access Journals (Sweden)
M. Balistrocchi
2011-06-01
In many hydrological models, such as those derived by analytical probabilistic methods, the precipitation stochastic process is represented by means of individual storm random variables which are supposed to be independent of each other. However, several proposals have been advanced to develop joint probability distributions able to account for the observed statistical dependence. The traditional technique of multivariate statistics is nevertheless affected by several drawbacks, the most evident of which is the unavoidable subordination of the dependence structure assessment to the marginal distribution fitting. Conversely, the copula approach can overcome this limitation by dividing the problem into two distinct parts. Furthermore, goodness-of-fit tests have recently been made available and a significant improvement in the function selection reliability has been achieved. Herein the dependence structure of the rainfall event volume, the wet weather duration and the interevent time is assessed and verified by test statistics with respect to three long time series recorded in different Italian climates. Paired analyses revealed a non-negligible dependence between volume and duration, while the interevent period proved to be substantially independent of the other variables. A unique copula model seems to be suitable for representing this dependence structure, despite the sensitivity demonstrated by its parameter towards the threshold utilized in the procedure for extracting the independent events. The joint probability function was finally developed by adopting a Weibull model for the marginal distributions.
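A minimal sketch of one common way to fit a one-parameter copula, shown here with a Gumbel copula calibrated by the method of moments through Kendall's tau (for the Gumbel family, tau = 1 - 1/theta). The synthetic volume/duration data are hypothetical stand-ins for the Italian rainfall series; the paper's goodness-of-fit testing and copula selection are not reproduced.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(3)
# Hypothetical correlated event variables: wet-weather duration and volume
duration = rng.gamma(2.0, 4.0, 500)
volume = duration * rng.lognormal(0.0, 0.5, 500)

tau, _ = kendalltau(volume, duration)   # rank dependence, margin-free
theta = 1.0 / (1.0 - tau)               # Gumbel parameter via tau = 1 - 1/theta
print(f"Kendall tau = {tau:.2f}, Gumbel theta = {theta:.2f}")
```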
A Statistical Model for Natural Gas Standardized Load Profiles
Czech Academy of Sciences Publication Activity Database
Brabec, Marek; Konár, Ondřej; Malý, Marek; Pelikán, Emil; Vondráček, Jiří
2009-01-01
Roč. 58, č. 1 (2009), s. 123-139 ISSN 0035-9254 R&D Projects: GA AV ČR 1ET400300513 Institutional research plan: CEZ:AV0Z10300504 Keywords : disaggregation * generalized additive models * multiplicative model * non-linear effects * segmentation * semiparametric regression model Subject RIV: JE - Non-nuclear Energetics, Energy Consumption ; Use Impact factor: 1.060, year: 2009
Carrier Statistics and Quantum Capacitance Models of Graphene Nanoscroll
Directory of Open Access Journals (Sweden)
M. Khaledian
2014-01-01
schematic perfect scroll-like Archimedes spiral. The DOS model was derived at first, while it was later applied to compute the carrier concentration and quantum capacitance model. Furthermore, the carrier concentration and quantum capacitance were modeled for both degenerate and nondegenerate regimes, along with examining the effect of structural parameters and chirality number on the density of state and carrier concentration. Latterly, the temperature effect on the quantum capacitance was studied too.
Mark Burden, Adrian; Lewis, Sandra Elizabeth; Willcox, Emma
2014-12-01
Numerous ways exist to process raw electromyograms (EMGs). However, the effect of altering processing methods on peak and mean EMG has seldom been investigated. The aim of this study was to investigate the effect of using different root mean square (RMS) window lengths and overlaps on the amplitude, reliability and inter-individual variability of gluteus maximus EMGs recorded during the clam exercise, and on the statistical significance and clinical relevance of amplitude differences between two exercise conditions. Mean and peak RMS of 10 repetitions from 17 participants were obtained using processing window lengths of 0.01, 0.15, 0.2, 0.25 and 1 s, with no overlap and overlaps of 25, 50 and 75% of window length. The effect of manipulating window length on reliability and inter-individual variability (coefficient of variation [CV]) was greater for peak EMG than for mean EMG, with some windows generally displaying the lowest variability. As a consequence, neither statistical significance nor clinical relevance (effect size [ES]) of mean EMG was affected by manipulation of window length. Statistical significance of peak EMG was more sensitive to changes in window length, with lower p-values generally being recorded for the 1 s window. As use of different window lengths has a greater effect on the variability and statistical significance of peak EMG, clinicians should use the mean EMG. They should also be aware that use of different numbers of exercise repetitions and participants can have a greater effect on EMG parameters than the length of the processing window. Copyright © 2014 Elsevier Ltd. All rights reserved.
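A minimal sketch of the windowed-RMS processing the study varies, applied to a synthetic signal; the sampling rate, window length and overlap values below are arbitrary choices for illustration, not the study's settings.

```python
import numpy as np

def moving_rms(emg, fs, window_s=0.25, overlap=0.5):
    """RMS-smooth a raw EMG with a given window length (s) and fractional
    overlap; larger overlap means a smaller step between windows."""
    win = int(window_s * fs)
    step = max(1, int(win * (1 - overlap)))
    starts = range(0, len(emg) - win + 1, step)
    return np.array([np.sqrt(np.mean(emg[s:s + win] ** 2)) for s in starts])

rng = np.random.default_rng(4)
# Synthetic amplitude-modulated noise standing in for a raw EMG at 1 kHz
emg = rng.normal(0, 0.1, 10_000) * (1 + np.sin(np.linspace(0, 3 * np.pi, 10_000)) ** 2)
rms = moving_rms(emg, fs=1000, window_s=1.0, overlap=0.75)
print(rms.max(), rms.mean())  # peak vs. mean EMG respond differently to window choice
```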
Role of scaling in the statistical modelling of finance
Indian Academy of Sciences (India)
Modelling the evolution of a financial index as a stochastic process is a problem awaiting a full, satisfactory solution since it was first formulated by Bachelier in 1900. Here it is shown that the scaling with time of the return probability density function sampled from the historical series suggests a successful model.
Statistical shape model with random walks for inner ear segmentation
DEFF Research Database (Denmark)
Pujadas, Esmeralda Ruiz; Kjer, Hans Martin; Piella, Gemma
2016-01-01
Cochlear implants can restore hearing to completely or partially deaf patients. The intervention planning can be aided by providing a patient-specific model of the inner ear. Such a model has to be built from high resolution images with accurate segmentations. Thus, a precise segmentation is requ...
Recent advances in importance sampling for statistical model checking
Reijsbergen, D.P.; de Boer, Pieter-Tjerk; Scheinhardt, Willem R.W.; Haverkort, Boudewijn R.H.M.
2013-01-01
In the following work we present an overview of recent advances in rare event simulation for model checking made at the University of Twente. The overview is divided into the several model classes for which we propose algorithms, namely multicomponent systems, Markov chains and stochastic Petri nets.
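A toy illustration of the core idea behind rare-event importance sampling (not one of the paper's algorithms): to estimate P(Z > 5) for a standard normal Z, sample from a proposal centred at the threshold and reweight each draw by the likelihood ratio.

```python
import numpy as np

rng = np.random.default_rng(5)
a, n = 5.0, 100_000
z = rng.normal(a, 1.0, n)                 # tilted proposal N(a, 1)
weights = np.exp(-a * z + 0.5 * a**2)     # likelihood ratio N(0,1)/N(a,1)
est = np.mean((z > a) * weights)
print(est)   # ~2.87e-7; naive Monte Carlo would almost never hit the event
```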
Role of scaling in the statistical modelling of finance
Indian Academy of Sciences (India)
Modelling the evolution of a financial index as a stochastic process is a problem awaiting a full, satisfactory solution since it was first formulated by Bachelier in 1900. Here it is shown that the scaling with time of the return probability density function sampled from the historical series suggests a successful model.
Statistical model of stress corrosion cracking based on extended
Indian Academy of Sciences (India)
The mechanism of stress corrosion cracking (SCC) has been discussed for decades. Here I propose a model of SCC reflecting the feature of fracture in brittle manner based on the variational principle under approximately supposed thermal equilibrium. In that model the functionals are expressed with extended forms of ...
Statistical model of stress corrosion cracking based on extended ...
Indian Academy of Sciences (India)
The mechanism of stress corrosion cracking (SCC) has been discussed for decades. Here I propose a model of SCC reflecting the feature of fracture in brittle manner based on the variational principle under approximately supposed thermal equilibrium. In that model the functionals are expressed with extended forms of ...
Statistical model of stress corrosion cracking based on extended ...
Indian Academy of Sciences (India)
2013-12-01
The mechanism of stress corrosion cracking (SCC) has been discussed for decades. Here I propose a model of SCC reflecting the feature of fracture in brittle manner based on the variational principle under approximately supposed thermal equilibrium. In that model the functionals are expressed ...
Bias in iterative reconstruction of low-statistics PET data: benefits of a resolution model
Energy Technology Data Exchange (ETDEWEB)
Walker, M D; Asselin, M-C; Julyan, P J; Feldmann, M; Matthews, J C [School of Cancer and Enabling Sciences, Wolfson Molecular Imaging Centre, MAHSC, University of Manchester, Manchester M20 3LJ (United Kingdom); Talbot, P S [Mental Health and Neurodegeneration Research Group, Wolfson Molecular Imaging Centre, MAHSC, University of Manchester, Manchester M20 3LJ (United Kingdom); Jones, T, E-mail: matthew.walker@manchester.ac.uk [Academic Department of Radiation Oncology, Christie Hospital, University of Manchester, Manchester M20 4BX (United Kingdom)
2011-02-21
Iterative image reconstruction methods such as ordered-subset expectation maximization (OSEM) are widely used in PET. Reconstructions via OSEM are however reported to be biased for low-count data. We investigated this and considered the impact for dynamic PET. Patient listmode data were acquired in [11C]DASB and [15O]H2O scans on the HRRT brain PET scanner. These data were subsampled to create many independent, low-count replicates. The data were reconstructed and the images from low-count data were compared to the high-count originals (from the same reconstruction method). This comparison enabled low-statistics bias to be calculated for the given reconstruction, as a function of the noise-equivalent counts (NEC). Two iterative reconstruction methods were tested, one with and one without an image-based resolution model (RM). Significant bias was observed when reconstructing data of low statistical quality, for both subsampled human and simulated data. For human data, this bias was substantially reduced by including a RM. For [11C]DASB the low-statistics bias in the caudate head at 1.7 M NEC (approx. 30 s) was -5.5% and -13% with and without RM, respectively. We predicted biases in the binding potential of -4% and -10%. For quantification of cerebral blood flow for the whole-brain grey- or white-matter, using [15O]H2O and the PET autoradiographic method, a low-statistics bias of <2.5% and <4% was predicted for reconstruction with and without the RM. The use of a resolution model reduces low-statistics bias and can hence be beneficial for quantitative dynamic PET.
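A minimal 1-D MLEM sketch (equivalent to OSEM with a single subset) on a synthetic system, showing the multiplicative update the abstract refers to; in the paper's setting a resolution model would enter as an additional blur inside the system matrix. All dimensions and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 64
# Simple banded system matrix, column-normalized
A = (np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 2).astype(float)
A /= A.sum(axis=0)
x_true = np.zeros(n); x_true[20:30] = 10.0
y = rng.poisson(A @ x_true)            # low-count projection data

x = np.ones(n)
sens = A.T @ np.ones(n)                # sensitivity image
for _ in range(50):                    # MLEM multiplicative update
    x *= (A.T @ (y / np.maximum(A @ x, 1e-12))) / sens
print(x[20:30].mean(), x_true[20:30].mean())
```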
Study on Semi-Parametric Statistical Model of Safety Monitoring of Cracks in Concrete Dams
Directory of Open Access Journals (Sweden)
Chongshi Gu
2013-01-01
Cracks are one of the hidden dangers in concrete dams. The study on safety monitoring models of concrete dam cracks has always been difficult. Using the parametric statistical model of safety monitoring of cracks in concrete dams, with the help of the semi-parametric statistical theory, and considering the abnormal behaviors of these cracks, the semi-parametric statistical model of safety monitoring of concrete dam cracks is established to overcome the limitation of the parametric model in expressing the objective model. Previous projects show that the semi-parametric statistical model fits the data more closely and explains cracks in concrete dams better than the parametric statistical model. When used for forecasting, however, the forecast capability of the semi-parametric statistical model is equivalent to that of the parametric statistical model. The semi-parametric statistical model is simple to build, rests on a sound principle, and is highly practical, with good prospects for application in actual projects.
Model unspecific search in CMS. Treatment of insufficient Monte Carlo statistics
Energy Technology Data Exchange (ETDEWEB)
Lieb, Jonas; Albert, Andreas; Duchardt, Deborah; Hebbeker, Thomas; Knutzen, Simon; Meyer, Arnd; Pook, Tobias; Roemer, Jonas [III. Physikalisches Institut A, RWTH Aachen University (Germany)
2016-07-01
In 2015, the CMS detector recorded proton-proton collisions at an unprecedented center of mass energy of √s = 13 TeV. The Model Unspecific Search in CMS (MUSiC) offers an analysis approach to these data which is complementary to dedicated analyses: by taking all produced final states into consideration, MUSiC is sensitive to indicators of new physics appearing in final states that are usually not investigated. In a two-step process, MUSiC first classifies events according to their physics content and then searches kinematic distributions for the most significant deviations between Monte Carlo simulations and observed data. Such a general approach introduces its own set of challenges. One of them is the treatment of situations with insufficient Monte Carlo statistics. Complementing introductory presentations on the MUSiC event selection and classification, this talk will present a method of dealing with the issue of low Monte Carlo statistics.
Reflections on the Baron and Kenny model of statistical mediation
Directory of Open Access Journals (Sweden)
Antonio Pardo
2013-05-01
In the 25 years since Baron and Kenny (1986) published their ideas on how to analyze and interpret statistical mediation, few works have been more cited or have, perhaps, so decisively influenced the way applied researchers understand and analyze mediation in social and health sciences. However, the widespread use of a procedure does not necessarily make it a safe or reliable strategy. In fact, during these years, many researchers have pointed out the limitations of the procedure Baron and Kenny proposed for demonstrating mediation. The twofold aim of this paper is to (1) carry out a review of the limitations of the Baron and Kenny method, with particular attention to the weakness in the confirmatory logic of the procedure, and (2) provide an empirical example showing that, in applying the method, data obtained from the same theoretical scenario (i.e., with or without the presence of mediation) can be compatible with both the mediation and no-mediation hypotheses.
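For reference, a minimal sketch of the regression steps in the Baron and Kenny procedure, plus a Sobel test of the indirect effect, run on synthetic data; this illustrates the procedure under discussion and inherits exactly the limitations the paper reviews.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)                    # predictor
m = 0.5 * x + rng.normal(size=n)          # mediator (synthetic true mediation)
y = 0.4 * m + 0.1 * x + rng.normal(size=n)

a = sm.OLS(m, sm.add_constant(x)).fit()                          # X -> M
b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()    # X + M -> Y
ab = a.params[1] * b.params[2]                                   # indirect effect a*b
se = np.sqrt(a.params[1]**2 * b.bse[2]**2 + b.params[2]**2 * a.bse[1]**2)
print(f"indirect effect = {ab:.3f}, Sobel z = {ab / se:.2f}")
```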
Statistical modelling of Poisson/log-normal data
International Nuclear Information System (INIS)
Miller, G.
2007-01-01
In statistical data fitting, self-consistency is checked by examining the closeness of the quantity χ²/NDF to 1, where χ² is the sum of squares of (data minus fit) divided by the standard deviation, and NDF is the number of data points minus the number of fit parameters. In order to calculate χ² one needs an expression for the standard deviation. In this note several alternative expressions for the standard deviation of data distributed according to a Poisson/log-normal distribution are proposed and evaluated by Monte Carlo simulation. Two preferred alternatives are identified. The use of replicate data to obtain uncertainty is problematic for a small number of replicates. A method to correct this problem is proposed. The log-normal approximation is good for sufficiently positive data. A modification of the log-normal approximation is proposed, which allows it to be used to test the hypothesis that the true value is zero. (authors)
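A hedged illustration of the self-consistency check described above: fit a constant to Poisson-distributed data and compare χ²/NDF to 1, using sqrt(counts) as the standard deviation, which is only one of the alternative expressions the note evaluates.

```python
import numpy as np

rng = np.random.default_rng(8)
data = rng.poisson(100.0, size=50).astype(float)
fit = data.mean()                      # one fitted parameter (a constant)
sigma = np.sqrt(data)                  # Poisson standard deviation estimate
chi2 = np.sum(((data - fit) / sigma) ** 2)
ndf = data.size - 1                    # number of data minus fit parameters
print(chi2 / ndf)                      # close to 1 if the model is self-consistent
```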
Short-run and Current Analysis Model in Statistics
Directory of Open Access Journals (Sweden)
Constantin Anghelache
2006-01-01
Using short-run statistical indicators is a compulsory requirement of current analysis. A system of EUROSTAT short-run indicators has been set up for this purpose and is recommended for use by the member countries. On the basis of these indicators, regular (usually monthly) analyses are carried out concerning: the dynamics of production; the evaluation of the short-run investment volume; the development of the turnover; wage evolution; employment; price indexes and the consumer price index (inflation); and the volume of exports and imports, the extent to which imports are covered by exports, and the balance of trade. The EUROSTAT system of conjuncture indicators is conceived as an open system, so that it can at any moment be extended or restricted, allowing indicators to be amended or even removed, depending on the domestic users' requirements as well as on the specific requirements of harmonization and integration. For short-run analysis there is also the World Bank system of conjuncture indicators, which relies on the data sources offered by the World Bank, the World Institute for Resources and the statistics of other international organizations. That system comprises indicators of social and economic development and focuses on indicators for three fields: human resources, environment and economic performance. At the end of the paper there is a case study on the situation of Romania, for which all these indicators are used.
Statistical Texture Model for mass Detection in Mammography
Directory of Open Access Journals (Sweden)
Nicolás Gallego-Ortiz
2013-12-01
In the context of image processing algorithms for mass detection in mammography, texture is a key feature for distinguishing abnormal tissue from normal tissue. Recently, a texture model based on a multivariate Gaussian mixture was proposed, the parameters of which are learned in an unsupervised way from the pixel intensities of images. The model produces images that are probabilistic maps of texture normality and it was proposed as a visualization aid for diagnosis by clinical experts. In this paper, the usability of the model is studied for automatic mass detection. A segmentation strategy is proposed and evaluated using 79 mammography cases.
Computational modeling of neural activities for statistical inference
Kolossa, Antonio
2016-01-01
This authored monograph supplies empirical evidence for the Bayesian brain hypothesis by modeling event-related potentials (ERP) of the human electroencephalogram (EEG) during successive trials in cognitive tasks. The employed observer models are useful to compute probability distributions over observable events and hidden states, depending on which are present in the respective tasks. Bayesian model selection is then used to choose the model which best explains the ERP amplitude fluctuations. Thus, this book constitutes a decisive step towards a better understanding of the neural coding and computing of probabilities following Bayesian rules. The target audience primarily comprises research experts in the field of computational neurosciences, but the book may also be beneficial for graduate students who want to specialize in this field.
A Statistical Model of Current Loops and Magnetic Monopoles
International Nuclear Information System (INIS)
Ayyer, Arvind
2015-01-01
We formulate a natural model of loops and isolated vertices for arbitrary planar graphs, which we call the monopole-dimer model. We show that the partition function of this model can be expressed as a determinant. We then extend the method of Kasteleyn and Temperley-Fisher to calculate the partition function exactly in the case of rectangular grids. This partition function turns out to be a square of a polynomial with positive integer coefficients when the grid lengths are even. Finally, we analyse this formula in the infinite volume limit and show that the local monopole density, free energy and entropy can be expressed in terms of well-known elliptic functions. Our technique is a novel determinantal formula for the partition function of a model of isolated vertices and loops for arbitrary graphs
Statistical model based gender prediction for targeted NGS clinical panels
Directory of Open Access Journals (Sweden)
Palani Kannan Kandavel
2017-12-01
The reference test dataset is used to test the model. The sensitivity of gender prediction has been increased relative to the current “genotype composition in ChrX”-based approach. In addition, the prediction score given by the model can be used to evaluate the quality of a clinical dataset: a higher prediction score towards the respective gender indicates higher-quality sequenced data.
Illness-death model: statistical perspective and differential equations.
Brinks, Ralph; Hoyer, Annika
2018-01-27
The aim of this work is to relate the theory of stochastic processes with the differential equations associated with multistate (compartment) models. We show that the Kolmogorov Forward Differential Equations can be used to derive a relation between the prevalence and the transition rates in the illness-death model. Then, we prove mathematical well-definedness and epidemiological meaningfulness of the prevalence of the disease. As an application, we derive the incidence of diabetes from a series of cross-sections.
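A minimal numerical companion to the result described above: integrating the Kolmogorov forward equations of a simple illness-death model (healthy, ill, dead; no recovery) and computing the prevalence of disease among the living. The constant rates below are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

i, m0, m1 = 0.02, 0.01, 0.03      # incidence, mortality of healthy / ill (per year)

def forward(t, s):
    s0, s1 = s                    # probabilities of being healthy / ill
    return [-(i + m0) * s0, i * s0 - m1 * s1]

sol = solve_ivp(forward, (0, 60), [1.0, 0.0], dense_output=True)
t = np.linspace(0, 60, 7)
s0, s1 = sol.sol(t)
print(np.round(s1 / (s0 + s1), 3))   # prevalence among the living over time
```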
Statistical geological discrete fracture network model. Forsmark modelling stage 2.2
International Nuclear Information System (INIS)
Fox, Aaron; La Pointe, Paul; Simeonov, Assen; Hermanson, Jan; Oehman, Johan
2007-11-01
The Swedish Nuclear Fuel and Waste Management Company (SKB) is performing site characterization at two different locations, Forsmark and Laxemar, in order to locate a site for a final geologic repository for spent nuclear fuel. The program is built upon the development of Site Descriptive Models (SDMs) at specific timed data freezes. Each SDM is formed from discipline-specific reports from across the scientific spectrum. This report describes the methods, analyses, and conclusions of the geological modeling team with respect to a geological and statistical model of fractures and minor deformation zones (henceforth referred to as the geological DFN), version 2.2, at the Forsmark site. The geological DFN builds upon the work of other geological modelers, including the deformation zone (DZ), rock domain (RD), and fracture domain (FD) models. The geological DFN is a statistical model for stochastically simulating rock fractures and minor deformation zones at a scale of less than 1,000 m (the lower cut-off of the DZ models). The geological DFN is valid within four specific fracture domains inside the local model region, and encompassing the candidate volume at Forsmark: FFM01, FFM02, FFM03, and FFM06. The models are built using data from detailed surface outcrop maps and the cored borehole record at Forsmark. The conceptual model for the Forsmark 2.2 geological DFN revolves around the concept of orientation sets; for each fracture domain, other model parameters such as size and intensity are tied to the orientation sets. Two classes of orientation sets were described: Global sets, which are encountered everywhere in the model region, and Local sets, which represent highly localized stress environments. Orientation sets were described in terms of their general cardinal direction (NE, NW, etc.). Two alternatives are presented for fracture size modeling: - the tectonic continuum approach (TCM, TCMF) described by coupled size-intensity scaling following power law distributions
Statistical geological discrete fracture network model. Forsmark modelling stage 2.2
Energy Technology Data Exchange (ETDEWEB)
Fox, Aaron; La Pointe, Paul [Golder Associates Inc (United States); Simeonov, Assen [Swedish Nuclear Fuel and Waste Management Co., Stockholm (Sweden); Hermanson, Jan; Oehman, Johan [Golder Associates AB, Stockholm (Sweden)
2007-11-15
The Swedish Nuclear Fuel and Waste Management Company (SKB) is performing site characterization at two different locations, Forsmark and Laxemar, in order to locate a site for a final geologic repository for spent nuclear fuel. The program is built upon the development of Site Descriptive Models (SDMs) at specific timed data freezes. Each SDM is formed from discipline-specific reports from across the scientific spectrum. This report describes the methods, analyses, and conclusions of the geological modeling team with respect to a geological and statistical model of fractures and minor deformation zones (henceforth referred to as the geological DFN), version 2.2, at the Forsmark site. The geological DFN builds upon the work of other geological modelers, including the deformation zone (DZ), rock domain (RD), and fracture domain (FD) models. The geological DFN is a statistical model for stochastically simulating rock fractures and minor deformation zones at a scale of less than 1,000 m (the lower cut-off of the DZ models). The geological DFN is valid within four specific fracture domains inside the local model region, and encompassing the candidate volume at Forsmark: FFM01, FFM02, FFM03, and FFM06. The models are built using data from detailed surface outcrop maps and the cored borehole record at Forsmark. The conceptual model for the Forsmark 2.2 geological DFN revolves around the concept of orientation sets; for each fracture domain, other model parameters such as size and intensity are tied to the orientation sets. Two classes of orientation sets were described: Global sets, which are encountered everywhere in the model region, and Local sets, which represent highly localized stress environments. Orientation sets were described in terms of their general cardinal direction (NE, NW, etc.). Two alternatives are presented for fracture size modeling: - the tectonic continuum approach (TCM, TCMF) described by coupled size-intensity scaling following power law distributions
Modelling West African Total Precipitation Depth: A Statistical Approach
Directory of Open Access Journals (Sweden)
S. Sovoe
2015-09-01
Even though several reports over the past few decades indicate increasing aridity over West Africa, attempts to establish the controlling factor(s) have not been successful. The traditional belief in the position of the Inter-tropical Convergence Zone (ITCZ) as the predominant factor over the region has been refuted by recent findings. Changes in major atmospheric circulations such as the African Easterly Jet (AEJ) and Tropical Easterly Jet (TEJ) are being cited as major precipitation driving forces over the region. Thus, any attempt to predict long-term precipitation events over the region using Global Circulation or Local Circulation Models could be flawed, as the controlling factors are not yet fully elucidated. Successful prediction may require models which depend on past events as their inputs, as in the case of time series models such as the Autoregressive Integrated Moving Average (ARIMA) model. In this study, historical precipitation data were imported as a time series data structure into the R programming language and used to build an appropriate seasonal multiplicative ARIMA (p, d, q) × (P, D, Q) model. The model was then used to predict long-term precipitation events over the Ghanaian segment of the Volta Basin, which could be used in the planning and implementation of development policies.
Directory of Open Access Journals (Sweden)
Eric M. Laflamme
2016-06-01
In this work we perform a statistical downscaling by applying a CDF transformation function to local-level daily precipitation extremes (from NCDC station data) and corresponding NARCCAP regional climate model (RCM) output to derive local-scale projections. These high-resolution projections are essential in assessing the impacts of projected climate change. The downscaling method is performed on 58 locations throughout New England, and from the projected distribution of extreme precipitation local-level 25-year return levels are calculated. To obtain uncertainty estimates for return levels, three procedures are employed: a parametric bootstrap with mean-corrected confidence intervals, a non-parametric bootstrap with BCa (bias-corrected and accelerated) intervals, and a Bayesian model. In all cases, results are presented via distributions of differences in return levels between projected and historical periods. Results from the three procedures show very few New England locations with significant increases in 25-year return levels from the historical to projected periods. This may indicate that projected trends in New England precipitation tend to be statistically less significant than suggested by many studies. For all three procedures, downscaled results are highly dependent on RCM and GCM model choice.
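A hedged sketch of the first of the three uncertainty procedures, a parametric bootstrap for a 25-year return level of annual precipitation maxima, fit with a generalized extreme value (GEV) distribution. The synthetic maxima and GEV parameters below are hypothetical stand-ins for the NCDC station data.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(9)
annual_max = genextreme.rvs(-0.1, loc=50, scale=15, size=40, random_state=rng)

shape, loc, scale = genextreme.fit(annual_max)
rl25 = genextreme.ppf(1 - 1 / 25, shape, loc, scale)   # 25-year return level

boot = []
for _ in range(500):                                   # parametric bootstrap
    sample = genextreme.rvs(shape, loc=loc, scale=scale,
                            size=annual_max.size, random_state=rng)
    s, l, sc = genextreme.fit(sample)
    boot.append(genextreme.ppf(1 - 1 / 25, s, l, sc))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"25-yr return level = {rl25:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```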
Peters, J. M.; Kravtsov, S.
2011-12-01
This study quantifies the dependence of nonlinear regimes (manifested in non-Gaussian probability distributions) and spreads of ensemble trajectories in a reduced phase space of a realistic three-layer quasi-geostrophic (QG3) atmospheric model on this model's climate state. To elucidate probabilistic properties of the QG3 trajectories, we compute, in phase planes of leading EOFs of the model, the coefficients of the corresponding Fokker-Planck (FP) equations. These coefficients represent drift vectors (computed from one-day phase space tendencies) and diffusion tensors (computed from one-day lagged covariance matrices of model trajectory displacements), and are based on a long QG3 simulation. We also fit two statistical trajectory models to the reduced phase-space time series spanned by the full QG3 model states. One reduced model is a standard Linear Inverse Model (LIM) fitted to a long QG3 time series. The LIM is forced by state-independent (additive) noise and has a deterministic operator which represents a non-divergent velocity field in the reduced phase space considered. The other, more advanced model (NSM) is nonlinear, divergent, and driven by state-dependent noise. The NSM mimics well the full QG3 model trajectory behavior in the reduced phase space; its corresponding FP model is nearly identical to that based on the full QG3 simulations. By systematic analysis of the differences between the drift vectors and diffusion tensors of the QG3-based, NSM-based, and LIM-based FP models, as well as the PDF evolution simulated by these FP models, we disentangle the contributions of multiplicative noise and deterministic dynamics to the nonlinear behavior and predictability of the atmospheric states produced by the dynamical QG3 model.
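A hedged 1-D analogue of the estimation procedure described above: recover the Fokker-Planck drift (mean one-step tendency) and diffusion (variance of one-step displacements) as functions of the state, here from a simulated Ornstein-Uhlenbeck series rather than QG3 output.

```python
import numpy as np

rng = np.random.default_rng(10)
dt, n = 0.1, 200_000
x = np.zeros(n)
for k in range(n - 1):                   # OU process: dx = -x dt + 0.5 dW
    x[k + 1] = x[k] - x[k] * dt + 0.5 * np.sqrt(dt) * rng.normal()

bins = np.linspace(-1.5, 1.5, 13)
idx = np.digitize(x[:-1], bins)
dx = np.diff(x)
for b in range(4, 9):                    # a few central bins of the state space
    sel = idx == b
    drift = dx[sel].mean() / dt          # should track -x (restoring drift)
    diff = dx[sel].var() / dt            # should be ~0.25 (= sigma^2)
    print(f"x~{bins[b - 1]:+.2f}: drift={drift:+.2f}, diffusion={diff:.2f}")
```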
Benchmark validation of statistical models: Application to mediation analysis of imagery and memory.
MacKinnon, David P; Valente, Matthew J; Wurpts, Ingrid C
2018-03-29
This article describes benchmark validation, an approach to validating a statistical model. According to benchmark validation, a valid model generates estimates and research conclusions consistent with a known substantive effect. Three types of benchmark validation are described and illustrated with examples: (a) benchmark value, (b) benchmark estimate, and (c) benchmark effect. Benchmark validation methods are especially useful for statistical models with assumptions that are untestable or very difficult to test. Benchmark effect validation methods were applied to evaluate statistical mediation analysis in eight studies using the established effect that increasing mental imagery improves recall of words. Statistical mediation analysis led to conclusions about mediation that were consistent with established theory that increased imagery leads to increased word recall. Benchmark validation based on established substantive theory is discussed as a general way to investigate characteristics of statistical models and a complement to mathematical proof and statistical simulation. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Physics-based statistical model and simulation method of RF propagation in urban environments
Pao, Hsueh-Yuan; Dvorak, Steven L.
2010-09-14
A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient closed-form parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.
Sharing brain mapping statistical results with the neuroimaging data model
Maumet, Camille; Auer, Tibor; Bowring, Alexander; Chen, Gang; Das, Samir; Flandin, Guillaume; Ghosh, Satrajit; Glatard, Tristan; Gorgolewski, Krzysztof J.; Helmer, Karl G.; Jenkinson, Mark; Keator, David B.; Nichols, B. Nolan; Poline, Jean-Baptiste; Reynolds, Richard; Sochat, Vanessa; Turner, Jessica; Nichols, Thomas E.
2016-01-01
Only a tiny fraction of the data and metadata produced by an fMRI study is ultimately conveyed to the community. This lack of transparency not only hinders the reproducibility of neuroimaging results but also impairs future meta-analyses. In this work we introduce NIDM-Results, a format specification providing a machine-readable description of neuroimaging statistical results along with key image data summarising the experiment. NIDM-Results provides a unified representation of mass univariate analyses including a level of detail consistent with available best practices. This standardized representation allows authors to relay methods and results in a platform-independent regularized format that is not tied to a particular neuroimaging software package. Tools are available to export NIDM-Results graphs and associated files from the widely used SPM and FSL software packages, and the NeuroVault repository can import NIDM-Results archives. The specification is publicly available at: http://nidm.nidash.org/specs/nidm-results.html. PMID:27922621
Directory of Open Access Journals (Sweden)
Tao Gao
2014-01-01
Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies on the physical factors affecting precipitation extremes, and corresponding prediction models, remain scarce. From a new point of view, the sensible heat flux (SHF) and latent heat flux (LHF), which have significant impacts on summer extreme rainfall in the Yangtze River basin (YRB), have been quantified, and the impact factors were then selected. Firstly, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC) by analyzing the spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST) on the global ocean scale; the time series of SHF, LHF, and SST in the RSCs during 1967-2010 were then selected. Furthermore, other factors that significantly affect variations in precipitation extremes over the YRB were also selected. Multiple stepwise regression and leave-one-out cross-validation (LOOCV) were utilized to analyze the influencing factors and to test the statistical prediction model. The correlation coefficient between the observed regional extreme index and the model simulation is 0.85, significant at the 99% level. This suggests that the forecast skill is acceptable, although many aspects of the prediction model should be improved.
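The LOOCV testing step generalizes beyond this paper. Below is a minimal scikit-learn sketch with synthetic stand-ins for the selected factor series (the real predictors would be the SHF/LHF/SST series in the RSCs); the skill score is the same observed-versus-predicted correlation used above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(44, 3))  # stand-in for 1967-2010 annual factor series
y = X @ np.array([0.6, -0.3, 0.2]) + rng.normal(scale=0.5, size=44)

# Leave-one-out: each year is predicted by a model fitted to all other years.
pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
print(f"LOOCV correlation: {np.corrcoef(y, pred)[0, 1]:.2f}")
```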
From intuition to statistics in building subsurface structural models
Brandenburg, J.P.; Alpak, F.O.; Naruk, S.; Solum, J.
2011-01-01
Experts associated with the oil and gas exploration industry suggest that combining forward trishear models with stochastic global optimization algorithms allows a quantitative assessment of the uncertainty associated with a given structural model. The methodology is applied to incompletely imaged structures related to deepwater hydrocarbon reservoirs, and the results are compared to prior manual palinspastic restorations and borehole data. This methodology is also useful for extending structural interpretations into other areas of limited resolution, such as subsalt, in addition to extrapolating existing data into seismic data gaps. The technique can be used for rapid reservoir appraisal and potentially has other applications in seismic processing, well planning, and borehole stability analysis.
Monitoring and statistical modelling of sedimentation in gully pots
Post, J.A.B.; Pothof, I.W.M.; Dirksen, J.; Baars, E. J.; Langeveld, J.G.; Clemens, F.H.L.R.
2016-01-01
Gully pots are essential assets designed to relieve the downstream system by trapping solids and attached pollutants suspended in runoff. This study applied a methodology to develop a quantitative gully pot sedimentation and blockage model. To this end, sediment bed level time series from 300
Statistical Modelling of Fishing Activities in the North Atlantic
Fernández, C.; Ley, E.; Steel, M.F.J.
1997-01-01
This paper deals with the issue of modeling daily catches of fishing boats in the Grand Bank fishing grounds. We have data on catches per species for a number of vessels collected by the European Union in the context of the North Atlantic Fisheries Organization. Many variables can be thought to
A simple statistical signal loss model for deep underground garage
DEFF Research Database (Denmark)
Nguyen, Huan Cong; Gimenez, Lucas Chavarria; Kovacs, Istvan
2016-01-01
In this paper we address the channel modeling aspects for a deep-indoor scenario with extreme coverage conditions in terms of signal losses, namely underground garage areas. We provide an in-depth analysis in terms of path loss (gain) and large-scale signal shadowing, and propose a simple...
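Channel studies of this kind typically build on the log-distance path loss model with log-normal shadowing. The sketch below shows that generic form; the reference loss, decay exponent, and shadowing standard deviation are placeholder values, not the paper's fitted parameters.

```python
import numpy as np

def path_loss_db(d, d0=1.0, pl0=35.0, n=3.5, sigma=6.0, rng=None):
    """Log-distance path loss with log-normal shadowing, in dB.
    pl0: loss at reference distance d0 (m); n: decay exponent;
    sigma: shadowing standard deviation. All values are placeholders,
    not fitted parameters from the paper."""
    rng = rng or np.random.default_rng(0)
    return (pl0 + 10.0 * n * np.log10(np.asarray(d) / d0)
            + rng.normal(0.0, sigma, np.shape(d)))

print(path_loss_db([5.0, 20.0, 50.0]))  # losses at three in-garage distances
```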
Liu, Wei; Ding, Jinhui
2018-04-01
The application of the principle of the intention-to-treat (ITT) to the analysis of clinical trials is challenged in the presence of missing outcome data. The consequences of stopping an assigned treatment in a withdrawn subject are unknown. It is difficult to make a single assumption about missing mechanisms for all clinical trials because there are complicated reactions in the human body to drugs due to the presence of complex biological networks, leading to data missing randomly or non-randomly. Currently there is no statistical method that can tell whether a difference between two treatments in the ITT population of a randomized clinical trial with missing data is significant at a pre-specified level. Making no assumptions about the missing mechanisms, we propose a generalized complete-case (GCC) analysis based on the data of completers. An evaluation of the impact of missing data on the ITT analysis reveals that a statistically significant GCC result implies a significant treatment effect in the ITT population at a pre-specified significance level unless, relative to the comparator, the test drug is poisonous to the non-completers as documented in their medical records. Applications of the GCC analysis are illustrated using literature data, and its properties and limits are discussed.
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
Sound statistical model checking for MDP using partial order and confluence reduction
Hartmanns, Arnd; Timmer, Mark
Statistical model checking (SMC) is an analysis method that circumvents the state space explosion problem in model-based verification by combining probabilistic simulation with statistical methods that provide clear error bounds. As a simulation-based technique, it can in general only provide sound
On-the-fly confluence detection for statistical model checking (extended version)
Hartmanns, Arnd; Timmer, Mark
Statistical model checking is an analysis method that circumvents the state space explosion problem in model-based verification by combining probabilistic simulation with statistical methods that provide clear error bounds. As a simulation-based technique, it can only provide sound results if the
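Both abstracts above rest on the same core idea; here is a minimal sketch of it, assuming the property checked per simulation run is a Bernoulli outcome: draw enough runs that the Chernoff-Hoeffding bound guarantees the error bound (epsilon, delta). The toy model and all names are illustrative, not the authors' tool.

```python
import math
import random

def smc_estimate(simulate_once, eps=0.01, delta=0.01, seed=1):
    """Estimate p = P(property holds) by simulation, with the
    Chernoff-Hoeffding sample bound guaranteeing |p_hat - p| <= eps
    with probability at least 1 - delta."""
    n = math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))
    rng = random.Random(seed)
    hits = sum(simulate_once(rng) for _ in range(n))
    return hits / n, n

# Toy property: a 10-step biased random walk ends above zero.
def walk_ends_positive(rng):
    return sum(rng.choice((1, 1, -1)) for _ in range(10)) > 0

p_hat, n = smc_estimate(walk_ends_positive, eps=0.02, delta=0.05)
print(f"p ~ {p_hat:.3f} from {n} runs")
```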
Limited Area Forecasting and Statistical Modelling for Wind Energy Scheduling
DEFF Research Database (Denmark)
Rosgaard, Martin Haubjerg
forecast accuracy for operational wind power scheduling. Numerical weather prediction history and scales of atmospheric motion are summarised, followed by a literature review of limited area wind speed forecasting. Hereafter, the original contribution to research on the topic is outlined. The quality...... control of wind farm data used as forecast reference is described in detail, and a preliminary limited area forecasting study illustrates the aggravation of issues related to numerical orography representation and accurate reference coordinates at fine weather model resolutions. For the offshore and coastal...... sites studied limited area forecasting is found to deteriorate wind speed prediction accuracy, while inland results exhibit a steady forecast performance increase with weather model resolution. Temporal smoothing of wind speed forecasts is shown to improve wind power forecast performance by up to almost...
Statistical modelling and deconvolution of yield meter data
DEFF Research Database (Denmark)
Tøgersen, Frede Aakmann; Waagepetersen, Rasmus Plenge
2004-01-01
This paper considers the problem of mapping spatial variation of yield in a field using data from a yield monitoring system on a combine harvester. The unobserved yield is assumed to be a Gaussian random field and the yield monitoring system data is modelled as a convolution of the yield and an impulse response function. This results in an unusual spatial covariance structure (depending on the driving pattern of the combine harvester) for the yield monitoring system data. Parameters of the impulse response function and the spatial covariance function of the yield are estimated using maximum...
Some aspects of statistical modeling of human-error probability
International Nuclear Information System (INIS)
Prairie, R.R.
1982-01-01
Human reliability analyses (HRA) are often performed as part of risk assessment and reliability projects. Recent events in nuclear power have shown the potential importance of the human element. There are several ongoing efforts in the US and elsewhere with the purpose of modeling human error such that the human contribution can be incorporated into an overall risk assessment associated with one or more aspects of nuclear power. The effort described here uses the HRA event tree to quantify and model the human contribution to risk. As an example, risk analyses are being prepared on several nuclear power plants as part of the Interim Reliability Assessment Program (IREP). In this process the risk analyst selects the elements of his fault tree to which human error could contribute. He then solicits the HF analyst to do an HRA on this element
A Two Step Face Alignment Approach Using Statistical Models
Directory of Open Access Journals (Sweden)
Ying Cui
2012-10-01
Although face alignment using the Active Appearance Model (AAM) is relatively stable, it is known to be sensitive to initial values and not robust under changing conditions. In order to strengthen the performance of AAM for face alignment, a two-step approach combining AAM and the Active Shape Model (ASM) is proposed. In the first step, AAM is used to locate the inner landmarks of the face. In the second step, the extended ASM is used to locate the outer landmarks of the face under the constraint of the inner landmarks estimated by AAM. The two kinds of landmarks are then combined to form the whole set of facial landmarks. The proposed approach is compared with the basic AAM and the progressive AAM methods. Experimental results show that the proposed approach gives a much more effective performance.
Statistical gravitational waveform models: What to simulate next?
Doctor, Zoheyr; Farr, Ben; Holz, Daniel E.; Pürrer, Michael
2017-12-01
Models of gravitational waveforms play a critical role in detecting and characterizing the gravitational waves (GWs) from compact binary coalescences. Waveforms from numerical relativity (NR), while highly accurate, are too computationally expensive to produce to be directly used with Bayesian parameter estimation tools like Markov chain Monte Carlo and nested sampling. We propose a Gaussian process regression (GPR) method to generate reduced-order-model waveforms based only on existing accurate (e.g. NR) simulations. Using a training set of simulated waveforms, our GPR approach produces interpolated waveforms along with uncertainties across the parameter space. As a proof of concept, we use a training set of IMRPhenomD waveforms to build a GPR model in the 2-d parameter space of mass ratio q and equal-and-aligned spin χ1=χ2. Using a regular, equally-spaced grid of 120 IMRPhenomD training waveforms in q ∈[1 ,3 ] and χ1∈[-0.5 ,0.5 ], the GPR mean approximates IMRPhenomD in this space to mismatches below 4.3 ×10-5. Our approach could in principle use training waveforms directly from numerical relativity. Beyond interpolation of waveforms, we also present a greedy algorithm that utilizes the errors provided by our GPR model to optimize the placement of future simulations. In a fiducial test case we find that using the greedy algorithm to iteratively add simulations achieves GPR errors that are ˜1 order of magnitude lower than the errors from using Latin-hypercube or square training grids.
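A minimal scikit-learn rendering of the idea follows, with a cheap analytic stand-in for the IMRPhenomD training waveforms; the real training set, mismatch metric, and kernel choices would of course differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def fake_waveform(q, chi, t):  # stand-in for an IMRPhenomD evaluation
    return np.cos(2.0 * np.pi * (1.0 + 0.3 * q + 0.2 * chi) * t)

rng = np.random.default_rng(0)
theta = rng.uniform((1.0, -0.5), (3.0, 0.5), size=(120, 2))  # (q, chi) training points
t = np.linspace(0.0, 1.0, 64)
Y = np.array([fake_waveform(q, c, t) for q, c in theta])     # one waveform per row

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF((0.5, 0.2)),
                               normalize_y=True).fit(theta, Y)
mean, std = gpr.predict([[2.1, 0.1]], return_std=True)  # interpolant + uncertainty
print(std.max())  # this per-point uncertainty is what a greedy placement would target
```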
Error Estimation of An Ensemble Statistical Seasonal Precipitation Prediction Model
Shen, Samuel S. P.; Lau, William K. M.; Kim, Kyu-Myong; Li, Gui-Long
2001-01-01
This NASA Technical Memorandum describes an optimal ensemble canonical correlation forecasting model for seasonal precipitation. Each individual forecast is based on canonical correlation analysis (CCA) in spectral spaces whose bases are empirical orthogonal functions (EOFs). The optimal weights in the ensemble forecast depend crucially on the mean square error of each individual forecast. An estimate of the mean square error of a CCA prediction is also made using the spectral method. The error is decomposed onto EOFs of the predictand and decreases linearly with the correlation between the predictor and predictand. Since the new CCA scheme is derived for continuous fields of predictor and predictand, an area factor is automatically included. Thus our model is an improvement of the spectral CCA scheme of Barnett and Preisendorfer. The improvements include (1) the use of the area factor, (2) the estimation of prediction error, and (3) the optimal ensemble of multiple forecasts. The new CCA model is applied to seasonal forecasting of the United States (US) precipitation field. The predictor is the sea surface temperature (SST). The US Climate Prediction Center's reconstructed SST is used as the predictor's historical data. The US National Center for Environmental Prediction's optimally interpolated precipitation (1951-2000) is used as the predictand's historical data. Our forecast experiments show that the new ensemble canonical correlation scheme yields reasonable forecasting skill. For example, when using September-October-November SST to predict the next season's December-January-February precipitation, the spatial pattern correlations between the observed and predicted fields are positive in 46 of the 50 years of experiments. The positive correlations are close to or greater than 0.4 in 29 years, which indicates excellent performance of the forecasting model. The forecasting skill can be further enhanced when several predictors are used.
Labots, M; Laarakker, M C; Ohl, F; van Lith, H A
2016-06-29
Selecting chromosome substitution strains (CSSs, also called consomic strains/lines) used in the search for quantitative trait loci (QTLs) consistently requires the identification of the respective phenotypic trait of interest and is usually based simply on a significant difference between a consomic and host strain. However, statistical significance as represented by P values does not necessarily imply practical importance. We therefore propose a method that pays attention to both the statistical significance and the actual size of the observed effect. The present paper extends this approach and describes in more detail the use of effect size measures (Cohen's d and partial eta squared, η_p²) together with the P value as statistical selection parameters for the chromosomal assignment of QTLs influencing anxiety-related behavior and locomotion in laboratory mice. The effect size measures were based on integrated behavioral z-scoring and were calculated in three experiments: (A) a complete consomic male mouse panel with A/J as the donor strain and C57BL/6J as the host strain. This panel, including host and donor strains, was analyzed in the modified Hole Board (mHB). The consomic line with chromosome 19 from A/J (CSS-19A) was selected since it showed increased anxiety-related behavior, but similar locomotion, compared to its host. (B) Following experiment A, female CSS-19A mice were compared with their C57BL/6J counterparts; however, no significant differences were found and effect sizes were close to zero. (C) A different consomic mouse strain (CSS-19PWD), with chromosome 19 from PWD/PhJ transferred onto the genetic background of C57BL/6J, was compared with its host strain. Here, in contrast with CSS-19A, there was decreased overall anxiety in CSS-19PWD compared to C57BL/6J males, but no difference in locomotion. This new method shows an improved way to identify CSSs for QTL analysis for anxiety-related behavior using a combination of statistical significance testing and effect
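The two effect size measures named above are standard and easy to compute. A short sketch with synthetic strain z-scores (the group means, sizes, and names are illustrative, not the study's data):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d with pooled standard deviation (two independent groups)."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                        / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled_sd

def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from ANOVA sums of squares."""
    return ss_effect / (ss_effect + ss_error)

rng = np.random.default_rng(0)
host = rng.normal(0.0, 1.0, 24)  # host-strain behavioral z-scores (synthetic)
css = rng.normal(0.7, 1.0, 24)   # consomic-strain z-scores (synthetic)
print(f"d = {cohens_d(css, host):.2f}")
```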
Physical and statistical models for steam generator clogging diagnosis
Girard, Sylvain
2014-01-01
Clogging of steam generators in nuclear power plants is a highly sensitive issue in terms of performance and safety and this book proposes a completely novel methodology for diagnosing this phenomenon. It demonstrates real-life industrial applications of this approach to French steam generators and applies the approach to operational data gathered from French nuclear power plants. The book presents a detailed review of in situ diagnosis techniques and assesses existing methodologies for clogging diagnosis, whilst examining their limitations. It also addresses numerical modelling of the dynamic
Optimal statistical decisions about some alternative financial models
Czech Academy of Sciences Publication Activity Database
Vajda, Igor; Stummer, W.
2007-01-01
Roč. 137, č. 2 (2007), s. 441-471 ISSN 0304-4076 R&D Projects: GA MŠk(CZ) 1M0572; GA ČR GA201/02/1391; GA AV ČR IAA1075403 Institutional research plan: CEZ:AV0Z10750506 Keywords: Black-Scholes-Merton models * Relative entropies * Power divergences * Hellinger integrals * Total variation distance * Bayesian decisions * Neyman-Pearson testing Subject RIV: BD - Theory of Information Impact factor: 1.990, year: 2007
Di Florio, Adriano
2017-10-01
In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open-source data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performance with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks theorem may or may not apply because its regularity conditions are not satisfied.
Qi, D.; Majda, A.
2017-12-01
A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent systems and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve optimal model performance. The idea of the reduced-order method comes from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections that replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to demonstrate the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. In addition, the reduced-order models are used to capture the crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities, like the tracer spectrum and the fat tails in the tracer probability density functions at the most important large scales, can be captured efficiently and accurately using the reduced-order tracer model in various dynamical regimes of the flow field with
Examining rainfall and cholera dynamics in Haiti using statistical and dynamic modeling approaches.
Eisenberg, Marisa C; Kujbida, Gregory; Tuite, Ashleigh R; Fisman, David N; Tien, Joseph H
2013-12-01
Haiti has been in the midst of a cholera epidemic since October 2010. Rainfall is thought to be associated with cholera here, but this relationship has only begun to be quantitatively examined. In this paper, we quantitatively examine the link between rainfall and cholera in Haiti for several different settings (including urban, rural, and displaced person camps) and spatial scales, using a combination of statistical and dynamic models. Statistical analysis of the lagged relationship between rainfall and cholera incidence was conducted using case crossover analysis and distributed lag nonlinear models. Dynamic models consisted of compartmental differential equation models including direct (fast) and indirect (delayed) disease transmission, where indirect transmission was forced by empirical rainfall data. Data sources include cholera case and hospitalization time series from the Haitian Ministry of Public Health, the United Nations Water, Sanitation and Health Cluster, International Organization for Migration, and Hôpital Albert Schweitzer. Rainfall data was obtained from rain gauges from the U.S. Geological Survey and Haiti Regeneration Initiative, and remote sensing rainfall data from the National Aeronautics and Space Administration Tropical Rainfall Measuring Mission. A strong relationship between rainfall and cholera was found for all spatial scales and locations examined. Increased rainfall was significantly correlated with increased cholera incidence 4-7 days later. Forcing the dynamic models with rainfall data resulted in good fits to the cholera case data, and rainfall-based predictions from the dynamic models closely matched observed cholera cases. These models provide a tool for planning and managing the epidemic as it continues. Copyright © 2013 Elsevier B.V. All rights reserved.
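As a rough illustration of the dynamic-model ingredient, below is a minimal sketch assuming an SIWR-type structure (direct plus water-mediated transmission) with the water route forced by a rainfall function; the paper's exact forcing form and parameter values are not reproduced here, and all numbers are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

def siwr(t, state, beta_i, beta_w, xi, gamma, rain):
    """Cholera model with direct (fast) transmission beta_i*S*I and
    water-mediated (delayed) transmission beta_w*S*W; rainfall amplifies
    the input to the water compartment."""
    s, i, w, r = state
    infection = (beta_i * i + beta_w * w) * s
    return [-infection,
            infection - gamma * i,
            xi * (1.0 + rain(t)) * i - xi * w,
            gamma * i]

rain = lambda t: 0.5 * (1.0 + np.sin(2.0 * np.pi * t / 30.0))  # stand-in for gauge data
sol = solve_ivp(siwr, (0.0, 180.0), [0.99, 0.01, 0.0, 0.0],
                args=(0.25, 0.3, 0.1, 0.2, rain), max_step=1.0)
print(sol.y[1].max())  # peak infected fraction under this rainfall forcing
```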
Off-critical statistical models: factorized scattering theories and bootstrap program
International Nuclear Information System (INIS)
Mussardo, G.
1992-01-01
We analyze those integrable statistical systems which originate from some relevant perturbations of the minimal models of conformal field theories. When only massive excitations are present, the systems can be efficiently characterized in terms of the relativistic scattering data. We review the general properties of the factorizable S-matrix in two dimensions with particular emphasis on the bootstrap principle. The classification program of the allowed spins of conserved currents and of the non-degenerate S-matrices is discussed and illustrated by means of some significant examples. The scattering theories of several massive perturbations of the minimal models are fully discussed. Among them are the Ising model, the tricritical Ising model, the Potts models, the series of the non-unitary minimal models M_{2,2n+3}, the non-unitary model M_{3,5} and the scaling limit of the polymer system. The ultraviolet limit of these massive integrable theories can be explored by the thermodynamic Bethe ansatz; in particular, the central charge of the original conformal theories can be recovered from the scattering data. We also consider the numerical method based on the truncated conformal space approach, which confirms the theoretical results and allows a direct measurement of the scattering data, i.e. the masses and the S-matrix of the particles in bootstrap interaction. The problem of computing the off-critical correlation functions is discussed in terms of the form-factor approach.
Statistical modelling of neural networks in γ-spectrometry applications
Energy Technology Data Exchange (ETDEWEB)
Vigneron, V.; Martinez, J.M. [CEA Centre d'Etudes de Saclay, 91 - Gif-sur-Yvette (France). Dept. de Mecanique et de Technologie]; Morel, J.; Lepy, M.C. [CEA Centre d'Etudes de Saclay, 91 - Gif-sur-Yvette (France). Dept. des Applications et de la Metrologie des Rayonnements Ionisants]
1995-12-31
Layered Neural Networks, which are a class of models based on neural computation, are applied to the measurement of uranium enrichment, i.e. the isotope ratio 235U/(235U + 236U + 238U). The usual methods consider a limited number of γ-ray and X-ray peaks, and require previously calibrated instrumentation for each sample. In practice, however, the source-detector ensemble geometry conditions differ critically, so one means of improving the conventional methods is to reduce the region of interest: this is possible by focusing on the Kα X-ray region, where the three elementary components are present. Real data are used to study the performance of the neural networks. Training is done with a maximum likelihood method to measure the 235U and 238U quantities in infinitely thick samples. (authors). 18 refs., 6 figs., 3 tabs.
Break-up fragment topology in statistical multifragmentation models
International Nuclear Information System (INIS)
Raduta, Ad. R.
2009-01-01
Break-up fragmentation patterns together with kinetic and configurational energy fluctuations are investigated in the framework of a microcanonical model with fragment degrees of freedom over a broad excitation energy range. As long as fragment partitioning is approximately preserved, energy fluctuations are found to be rather insensitive to both the way in which the freeze-out volume is constrained and the trajectory followed by the system in the excitation-energy-freeze-out volume space. Due to hard-core repulsion, the freeze-out volume is found to be populated nonuniformly, its highly depleted core giving the source a bubble-like structure. The most probable localization of the largest fragments in the freeze-out volume may be inferred experimentally from their kinematic properties, largely dictated by Coulomb repulsion.
Geo-Based Statistical Models for Vulnerability Prediction of Highway Network Segments
Directory of Open Access Journals (Sweden)
Keren Pollak
2014-04-01
This study describes four statistical models (Poisson, Negative Binomial, Zero-Inflated Poisson, and Zero-Inflated Negative Binomial) which were devised in order to examine traffic accidents and to identify the model that best estimates the probability of future accidents on interurban road sections. The study was conducted on four sets of fixed-length sections of the road network: 500, 750, 1000, and 1500 m. The contribution of transportation and spatial parameters as predictors of road accident rates was evaluated for all four data sets separately. In addition, the Empirical Bayes method, which uses historical accident information and allows for the regression-to-the-mean phenomenon, was applied to improve the model results. The study was performed using Geographic Information System (GIS) software. Other analyses, such as statistical analyses combined with spatial parameters, interactions, and examination of other geographical areas, were also performed. The results showed that the short road section data sets of 500 and 750 m yielded the most stable models. This allows focused treatment of short sections of the road network as a way to save resources (enforcement, education and information, finance) and potentially gain maximum benefit at minimum investment. The significant parameters affecting accident rates were found to be the curvature of the road section, the region, and traffic volume. An interaction between the region and traffic volume was also found.
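The first two of the four count models are easy to compare on simulated accident data. A minimal statsmodels sketch (predictor names are stand-ins for the study's transportation and spatial covariates):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = sm.add_constant(np.column_stack((rng.normal(size=n),    # curvature stand-in
                                     rng.normal(size=n))))  # log traffic volume stand-in
mu = np.exp(X @ np.array([-0.5, 0.4, 0.7]))
y = rng.negative_binomial(2.0, 2.0 / (2.0 + mu))            # overdispersed accident counts

poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit()
negbin = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(poisson.aic, negbin.aic)  # lower AIC favours the overdispersion-aware model
```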
A statistical framework for modeling HLA-dependent T cell response data.
Directory of Open Access Journals (Sweden)
Jennifer Listgarten
2007-10-01
The identification of T cell epitopes and their HLA (human leukocyte antigen) restrictions is important for applications such as the design of cellular vaccines for HIV. Traditional methods for such identification are costly and time-consuming. Recently, a more expeditious laboratory technique using ELISpot assays has been developed that allows for rapid screening of specific responses. However, this assay does not directly provide information concerning the HLA restriction of a response, a critical piece of information for vaccine design. Thus, we introduce, apply, and validate a statistical model for identifying HLA-restricted epitopes from ELISpot data. By looking at patterns across a broad range of donors, in conjunction with our statistical model, we can determine (probabilistically) which of the HLA alleles are likely to be responsible for the observed reactivities. Additionally, we can provide a good estimate of the number of false positives generated by our analysis (i.e., the false discovery rate). This model allows us to learn about new HLA-restricted epitopes from ELISpot data in an efficient, cost-effective, and high-throughput manner. We applied our approach to data from donors infected with HIV and identified many potential new HLA restrictions. Among 134 such predictions, six were confirmed in the lab and the remainder could not be ruled invalid. These results shed light on the extent of HLA class I promiscuity, which has significant implications for the understanding of HLA class I antigen presentation and vaccine development.
Conditioning model output statistics of regional climate model precipitation on circulation patterns
Directory of Open Access Journals (Sweden)
F. Wetterhall
2012-11-01
Dynamical downscaling of Global Climate Models (GCMs) through regional climate models (RCMs) potentially improves the usability of the output for hydrological impact studies. However, further downscaling or interpolation of precipitation from RCMs is often needed to match the precipitation characteristics at the local scale. This study analysed three Model Output Statistics (MOS) techniques to adjust RCM precipitation: (1) a simple direct method (DM), (2) quantile-quantile mapping (QM), and (3) a distribution-based scaling (DBS) approach. The modelled precipitation consisted of daily means from 16 RCMs driven by ERA40 reanalysis data over the period 1961-2000, provided by the ENSEMBLES (ENSEMBLE-based Predictions of Climate Changes and their Impacts) project, over a small catchment located in the Midlands, UK. All methods were conditioned on the entire time series, on separate months, and on an objective classification of Lamb's weather types. The performance of the MOS techniques was assessed with regard to temporal and spatial characteristics of the precipitation fields, as well as modelled runoff using the HBV rainfall-runoff model. The results indicate that DBS conditioned on classification patterns performed better than the other methods; however, an ensemble approach, in terms of both climate models and downscaling methods, is recommended to account for uncertainties in the MOS methods.
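The second of the three MOS techniques, quantile-quantile mapping, fits in a few lines: map each new model value through the model's empirical CDF onto the corresponding observed quantile. A minimal numpy sketch with synthetic stand-ins for the observed and RCM series:

```python
import numpy as np

def quantile_map(model_train, obs_train, model_new, n_q=101):
    """Empirical quantile-quantile mapping: send each new model value through
    the model CDF onto the corresponding observed quantile."""
    q = np.linspace(0.0, 1.0, n_q)
    return np.interp(model_new,
                     np.quantile(model_train, q),  # model quantiles
                     np.quantile(obs_train, q))    # observed quantiles

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 2.0, 5000)  # 'observed' daily precipitation stand-in
mod = rng.gamma(2.5, 1.2, 5000)  # biased RCM output stand-in
print(quantile_map(mod, obs, mod[:5]).round(2))
```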
CANFIS: A non-linear regression procedure to produce statistical air-quality forecast models
Energy Technology Data Exchange (ETDEWEB)
Burrows, W.R.; Montpetit, J. [Environment Canada, Downsview, Ontario (Canada). Meteorological Research Branch; Pudykiewicz, J. [Environment Canada, Dorval, Quebec (Canada)
1997-12-31
Statistical models for forecasts of environmental variables can provide a good trade-off between significance and precision in return for substantial savings of computer execution time. Recent non-linear regression techniques give significantly increased accuracy compared to traditional linear regression methods. Two are Classification and Regression Trees (CART) and the Neuro-Fuzzy Inference System (NFIS). Both can model predictand distributions, including the tails, with much better accuracy than linear regression. Given a learning data set of matched predictands and predictors, CART regression produces a non-linear, tree-based, piecewise-continuous model of the predictand data. Its variance-minimizing procedure optimizes the task of predictor selection, often greatly reducing the initial data dimensionality. NFIS reduces dimensionality by a procedure known as subtractive clustering, but it does not of itself eliminate predictors. Overlapping coverage in predictor space is enhanced by NFIS with a Gaussian membership function for each cluster component. Coefficients for a continuous response model based on the fuzzified cluster centers are obtained by a least-squares estimation procedure. CANFIS is a two-stage data-modeling technique that combines the strength of CART in optimizing the selection of predictors from a large pool of potential predictors with the modeling strength of NFIS. A CANFIS model requires negligible computer time to run. CANFIS models for ground-level O3, particulates, and other pollutants will be produced for each of about 100 Canadian sites. The air-quality models will run twice daily using a small number of predictors isolated from a large pool of upstream and local Lagrangian potential predictors.
Using the open-source statistical language R to analyze the dichotomous Rasch model.
Li, Yuelin
2006-08-01
R, an open-source statistical language and data analysis tool, is gaining popularity among psychologists currently teaching statistics. R is especially suitable for teaching advanced topics, such as fitting the dichotomous Rasch model, a topic that involves transforming complicated mathematical formulas into statistical computations. This article describes R's use as a teaching tool and a data analysis software program in the analysis of the Rasch model in item response theory. It also explains the theory behind, as well as an educator's goals for, fitting the Rasch model with joint maximum likelihood estimation. This article also summarizes the R syntax for parameter estimation and the calculation of fit statistics. The results produced by R are compared with the results obtained from MINISTEP and the output of a conditional logit model. The use of R is encouraged because it is free, supported by a network of peer researchers, and covers both basic and advanced topics in statistics frequently used by psychologists.
Macroeconomic determinants of savings in Egypt "Statistical Model"
Directory of Open Access Journals (Sweden)
Hanaa Abdelaty Hasan Esmail
2014-07-01
As in many countries, aggregate consumption constitutes a major portion of Gross Domestic Product (GDP) in Egypt. Consumption decisions determine savings decisions. In the long-term growth literature, differences in long-term growth have been explained to a large extent by differences in rates of saving, which also determine a country's investment in productive capacity, human capital and socio-economic infrastructure. In this study, we analyse the macroeconomic determinants of savings in Egypt using ordinary multiple regression. Our results indicate that the national savings rate is positively related to the real GDP growth rate. This suggests that saving is a positive function of income. The evidence also suggests that the national savings rate is negatively related to federal debt growth and inflation. This hints at crowding out of private sector investment through a decline in the savings rate as a result of the government's indebtedness. Finally, the negative association between the savings rate and inflation implies that the consumer is rational and makes decisions based on his perceptions when allocating lifetime resources over the period of his life. An increase in inflation dampens the incentive to save, and people respond rationally, as evidenced by the negative sign on the inflation coefficient in our model.
A statistical model for spatial patterns of Buruli ulcer in the Amansie West district, Ghana
Duker, Alfred A.; Stein, Alfred; Hale, Martin
2006-06-01
Buruli ulcer (BU), a skin ulceration caused by Mycobacterium ulcerans (MU), is the second most widespread mycobacterium infection in Ghana. Its infection pathway is possibly related to the potable and agricultural water supply. This study aims to identify environmental factors that influence infection in a part of Ghana. It examines the significance of contaminated surface drainage channels and groundwater using conditional autoregressive (CAR) statistical modelling. This type of modelling implies that the spatial pattern of BU incidence in one community depends on the influence of the environment in neighbouring communities. Covariates were included to assess the spatial relationship between environmental risk factors and BU incidence in the study area. The study reveals an association between (a) the mean As content of soil and spatial distribution of BU and (b) the distance to sites of gold mining and spatial distribution of BU. We conclude that both arsenic in the natural environment and gold mining influence BU infection.
Statistical properties of Olami-Feder-Christensen model on Barabasi-Albert scale-free network
Tanaka, Hiroki; Hatano, Takahiro
2017-12-01
The Olami-Feder-Christensen model on the Barabasi-Albert type scale-free network is investigated in the context of statistical seismology. This simple model may be regarded as the interacting faults obeying power-law size distribution under two assumptions: (i) each node represents a distinct fault; (ii) the degree of a node is proportional to the fault size and the energy accumulated around it. Depending on the strength of an interaction, the toppling events exhibit temporal clustering as is ubiquitously observed for natural earthquakes. Defining a geometrical parameter that characterizes the heterogeneity of the energy stored in the nodes, we show that aftershocks are characterized as a process of regaining the heterogeneity that is lost by the main shock. The heterogeneity is not significantly altered during the loading process and foreshocks.
GW-SEM: A Statistical Package to Conduct Genome-Wide Structural Equation Modeling.
Verhulst, Brad; Maes, Hermine H; Neale, Michael C
2017-05-01
Improving the accuracy of phenotyping through the use of advanced psychometric tools will increase the power to find significant associations with genetic variants and expand the range of possible hypotheses that can be tested on a genome-wide scale. Multivariate methods, such as structural equation modeling (SEM), are valuable in the phenotypic analysis of psychiatric and substance use phenotypes, but these methods have not been integrated into standard genome-wide association analyses because fitting a SEM at each single nucleotide polymorphism (SNP) along the genome was hitherto considered to be too computationally demanding. By developing a method that can efficiently fit SEMs, it is possible to expand the set of models that can be tested. This is particularly necessary in psychiatric and behavioral genetics, where the statistical methods are often handicapped by phenotypes with large components of stochastic variance. Due to the enormous amount of data that genome-wide scans produce, the statistical methods used to analyze the data are relatively elementary, do not directly correspond with the rich theoretical development, and lack the potential to test more complex hypotheses about the measurement of, and interaction between, comorbid traits. In this paper, we present a method to test the association of a SNP with multiple phenotypes or a latent construct on a genome-wide basis using a diagonally weighted least squares (DWLS) estimator for four common SEMs: a one-factor model, a one-factor residuals model, a two-factor model, and a latent growth model. We demonstrate that the DWLS parameters and p-values strongly correspond with the more traditional full information maximum likelihood parameters and p-values. We also present the timing of simulations and power analyses and a comparison with an existing multivariate GWAS software package.
Development of a statistical shape model of multi-organ and its performance evaluation
International Nuclear Information System (INIS)
Nakada, Misaki; Shimizu, Akinobu; Kobatake, Hidefumi; Nawano, Shigeru
2010-01-01
Existing statistical shape modeling methods for an organ cannot take into account the correlation between neighboring organs. This study focuses on a level set distribution model and proposes two modeling methods for multiple organs that can take into account the correlation between neighboring organs. The first method combines the level set functions of multiple organs into a vector; it then analyses the distribution of the vectors of a training dataset by principal component analysis and builds a statistical shape model of multiple organs. The second method constructs a statistical shape model for each organ independently and assembles the component scores of the different organs in a training dataset into a vector; it then analyses the distribution of these vectors to build a statistical shape model of multiple organs. This paper shows results of applying the proposed methods, trained on 15 abdominal CT volumes, to 8 unknown CT volumes. (author)
Modeling gallic acid production rate by empirical and statistical analysis
Directory of Open Access Journals (Sweden)
Bratati Kar
2000-01-01
For predicting the rate of an enzymatic reaction, empirical correlations based on experimental results obtained under various operating conditions have been developed. The models represent both the activation and the deactivation conditions of enzymatic hydrolysis, and the results have been analyzed by analysis of variance (ANOVA). The tannase activity was found to be maximum at an incubation time of 5 min, reaction temperature of 40°C, pH 4.0, initial enzyme concentration of 0.12 v/v, initial substrate concentration of 0.42 mg/ml, and ionic strength of 0.2 M; under these optimal conditions, the maximum rate of gallic acid production was 33.49 μmoles/ml/min.
Operational benefits and challenges of the use of fingerprint statistical models: a field study.
Neumann, Cedric; Mateos-Garcia, Ismael; Langenburg, Glenn; Kostroski, Jennifer; Skerrett, James E; Koolen, Martin
2011-10-10
Research projects aimed at proposing fingerprint statistical models based on the likelihood ratio framework have shown that low quality finger impressions left on crime scenes may have significant evidential value. These impressions are currently either not recovered, considered to be of no value when first analyzed by fingerprint examiners, or lead to inconclusive results when compared to control prints. There are growing concerns within the fingerprint community that recovering and examining these low quality impressions will result in a significant increase of the workload of fingerprint units and ultimately of the number of backlogged cases. This study was designed to measure the number of impressions currently not recovered or not considered for examination, and to assess the usefulness of these impressions in terms of the number of additional detections that would result from their examination. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Statistical Models to Assess the Health Effects and to Forecast Ground Level Ozone
Czech Academy of Sciences Publication Activity Database
Schlink, U.; Herbath, O.; Richter, M.; Dorling, S.; Nunnari, G.; Cawley, G.; Pelikán, Emil
2006-01-01
Roč. 21, č. 4 (2006), s. 547-558 ISSN 1364-8152 R&D Projects: GA AV ČR 1ET400300414 Institutional research plan: CEZ:AV0Z10300504 Keywords : statistical models * ground level ozone * health effects * logistic model * forecasting * prediction performance * neural network * generalised additive model * integrated assessment Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.992, year: 2006
A statistical model of operational impacts on the framework of the bridge crane
Antsev, V. Yu; Tolokonnikov, A. S.; Gorynin, A. D.; Reutov, A. A.
2017-02-01
The technical regulations of the Customs Union demand implementation of risk analysis of bridge crane operation at the design stage. A statistical model has been developed for performing randomized risk calculations, allowing possible operational influences on the bridge crane metal structure to be modelled in their various combinations. The statistical model is implemented in a software product for the automated calculation of the risk of failure of bridge cranes.
Directory of Open Access Journals (Sweden)
Waldir de Carvalho Junior
2014-06-01
Soil properties have an enormous impact on economic and environmental aspects of agricultural production. Quantitative relationships between soil properties and the factors that influence their variability are the basis of digital soil mapping. The predictive models of soil properties evaluated in this work are statistical (multiple linear regression, MLR) and geostatistical (ordinary kriging and co-kriging). The study was conducted in the municipality of Bom Jardim, RJ, using a soil database with 208 sampling points. Predictive models were evaluated for the sand, silt and clay fractions, pH in water, and organic carbon at six depths according to the specifications of the consortium for digital soil mapping at the global level (GlobalSoilMap). Continuous covariates and categorical predictors were used and their contributions to the model assessed. Only the environmental covariates elevation, aspect, stream power index (SPI), soil wetness index (SWI), normalized difference vegetation index (NDVI), and the b3/b2 band ratio were significantly correlated with soil properties. The predictive models had a mean coefficient of determination of 0.21. The best results were obtained with the geostatistical predictive models, where the highest coefficient of determination, 0.43, was associated with the sand fraction at 60 to 100 cm depth. The use of a sparse data set of soil properties for digital mapping can explain only part of the spatial variation of these properties. The results may be related to the sampling density and to the quantity and quality of the environmental covariates and predictive models used.
Statistical modeling of the power grid from a wind farm standpoint
DEFF Research Database (Denmark)
Farajzadehbibalan, Saber; Ramezani, Mohammad H.; Nielsen, Peter
2017-01-01
In this study, we derive a statistical model of a power grid from the wind farm's standpoint based on dynamic principal component analysis. The main advantages of our model compared to the previously developed models are twofold. Firstly, our proposed model benefits from logged data of an offshor...
Asking Sensitive Questions: A Statistical Power Analysis of Randomized Response Models
Ulrich, Rolf; Schroter, Hannes; Striegel, Heiko; Simon, Perikles
2012-01-01
This article derives the power curves for a Wald test that can be applied to randomized response models when small prevalence rates must be assessed (e.g., detecting doping behavior among elite athletes). These curves enable the assessment of the statistical power that is associated with each model (e.g., Warner's model, crosswise model, unrelated…
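As a rough illustration of the kind of computation the article derives, below is a textbook normal-approximation power sketch for Warner's design, in which a respondent answers the sensitive question truthfully with design probability p and its complement otherwise, so P(yes) = p·π + (1−p)·(1−π). This is not the article's exact derivation, and all numeric values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def warner_power(pi, n, p=0.7, alpha=0.05):
    """Approximate one-sided Wald power to detect prevalence pi > 0 under
    Warner's randomized response design with design probability p."""
    lam1 = p * pi + (1.0 - p) * (1.0 - pi)  # P(yes) under the alternative
    lam0 = 1.0 - p                          # P(yes) under H0: pi = 0
    se1 = np.sqrt(lam1 * (1.0 - lam1) / n) / abs(2.0 * p - 1.0)
    se0 = np.sqrt(lam0 * (1.0 - lam0) / n) / abs(2.0 * p - 1.0)
    return norm.cdf((pi - norm.ppf(1.0 - alpha) * se0) / se1)

for n in (500, 2000, 10000):  # power curve over sample size at pi = 5%
    print(n, round(float(warner_power(0.05, n)), 3))
```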
A Statistical Cyclone Intensity Prediction (SCIP) model for the Bay of ...
Indian Academy of Sciences (India)
A Statistical Cyclone Intensity Prediction (SCIP) model has been proposed. The model is developed applying a multiple linear regression technique. Keywords: tropical cyclone; intensity prediction; multiple linear regression; regression coefficient; statistical model. (J. Earth Syst. Sci. 117)
Analysis of TCE Fate and Transport in Karst Groundwater Systems Using Statistical Mixed Models
Anaya, A. A.; Padilla, I. Y.
2012-12-01
Karst groundwater systems are highly productive and provide an important fresh water resource for human development and ecological integrity. Their high productivity is often associated with conduit flow and high matrix permeability. The same characteristics that make these aquifers productive also make them highly vulnerable to contamination and a likely route for contaminant exposure. Of particular interest are trichloroethylene (TCE) and di-(2-ethylhexyl) phthalate (DEHP). These chemicals have been identified as potential precursors of pre-term birth, a leading cause of neonatal complications with significant health and societal costs. Both of these contaminants have been found in the karst groundwater formations in this area of the island. The general objectives of this work are to: (1) develop fundamental knowledge and determine the processes controlling the release, mobility, persistence, and possible pathways of contaminants in karst groundwater systems, and (2) characterize transport processes in conduit- and diffusion-dominated flow under base flow and storm flow conditions. The work presented herein focuses on the use of geo-hydro statistical tools to characterize flow and transport processes under different flow regimes, and their application in the analysis of fate and transport of TCE. Multidimensional, laboratory-scale Geo-Hydrobed models (GHMs) were used for this purpose. The models consist of stainless-steel tanks containing karstified limestone blocks collected from the karst aquifer formation of northern Puerto Rico, and integrate a network of sampling wells to monitor flow, pressure, and solute concentrations temporally and spatially. Experimental work entails injecting dissolved CaCl2 tracers and TCE at the upstream boundary of the GHM while monitoring TCE and tracer concentrations spatially and temporally in the limestone under different groundwater flow regimes. Analysis of the temporal and spatial concentration distributions of solutes
Statistical mechanics of directed models of polymers in the square lattice
Rensburg, J V
2003-01-01
Directed square lattice models of polymers and vesicles have received considerable attention in the recent mathematical and physical sciences literature. These are idealized geometric directed lattice models introduced to study phase behaviour in polymers, and include Dyck paths, partially directed paths, directed trees and directed vesicle models. Directed models are closely related to models studied in the combinatorics literature (and are often exactly solvable). They are also simplified versions of a number of statistical mechanics models, including the self-avoiding walk, lattice animals and lattice vesicles. The exchange of approaches and ideas between statistical mechanics and combinatorics has considerably advanced the description and understanding of directed lattice models, and this will be explored in this review. The combinatorial nature of directed lattice path models makes a study using generating function approaches most natural. In contrast, the statistical mechanics approach would introduce...
The polarized structure function of the nucleons with a non-extensive statistical quark model
Energy Technology Data Exchange (ETDEWEB)
Trevisan, Luis A. [Departamento de Matematica e Estatistica, Universidade Estadual de Ponta Grossa, 84010-790, Ponta Grossa, PR (Brazil); Mirez, Carlos [Instituto de Ciencia, Engenharia e Tecnologia - ICET, Universidade Federal dos Vales do Jequitinhonha e Mucuri - UFVJM, Campus do Mucuri, Rua do Cruzeiro 01, Jardim Sao Paulo, 39803-371, Teofilo Otoni, Minas Gerais (Brazil)
2013-05-06
We studied an application of nonextensive thermodynamics to describe the polarized structure function of the nucleon, in a model where the usual Fermi-Dirac and Bose-Einstein energy distributions, often used in statistical models, were replaced by the equivalent functions of the q-statistics. The parameters of the model are an effective temperature T, the q parameter (from Tsallis statistics), and the chemical potentials given by the corresponding up (u) and down (d) quark normalization in the nucleon and by Δu and Δd of the polarized functions.
International Nuclear Information System (INIS)
EI-Shanshoury, G.I.
2011-01-01
Several statistical distributions are used to model various reliability and maintainability parameters. The applied distribution depends on the nature of the data being analyzed. The present paper deals with the analysis of some statistical distributions used in reliability in order to identify the best-fitting distribution. The calculations rely on circuit quantity parameters obtained by using the Relex 2009 computer program. The statistical analysis of ten different distributions indicated that the Weibull distribution gives the best fit for modeling the reliability of the data set of the Temperature Alarm Circuit (TAC), whereas the Exponential distribution is found to be the best fit for modeling the failure rate.
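Such a distribution-fitting comparison is straightforward to reproduce. A minimal scipy sketch on synthetic life data, ranking candidate distributions by log-likelihood (the data, candidate set, and the choice to fix the location at zero are illustrative, not the paper's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
ttf = 1000.0 * rng.weibull(1.8, 200)  # synthetic times-to-failure (hours)

candidates = {"weibull": stats.weibull_min,
              "exponential": stats.expon,
              "lognormal": stats.lognorm}
for name, dist in candidates.items():
    params = dist.fit(ttf, floc=0.0)                 # location fixed at zero for life data
    loglik = np.sum(dist.logpdf(ttf, *params))
    print(f"{name:12s} log-likelihood = {loglik:8.1f}")  # higher = better fit
```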
National Research Council Canada - National Science Library
Hart, Kenneth
2003-01-01
The skill of a mesoscale model based Model Output Statistics (MOS) system that provided hourly forecasts for 18 sites over northern Utah during the 2002 Winter Olympic and Paralympic Games is evaluated...
texreg: Conversion of Statistical Model Output in R to LATEX and HTML Tables
Directory of Open Access Journals (Sweden)
Philip Leifeld
2013-11-01
A recurrent task in applied statistics is the (mostly manual) preparation of model output for inclusion in LATEX, Microsoft Word, or HTML documents, usually with more than one model presented in a single table along with several goodness-of-fit statistics. However, statistical models in R have diverse object structures and summary methods, which makes this process cumbersome. This article first develops a set of guidelines for converting statistical model output to LATEX and HTML tables, then assesses to what extent existing packages meet these requirements, and finally presents the texreg package as a solution that meets all of the criteria set out in the beginning. After providing various usage examples, a blueprint for writing custom model extensions is proposed.
PVeStA: A Parallel Statistical Model Checking and Quantitative Analysis Tool
AlTurki, Musab
2011-01-01
Statistical model checking is an attractive formal analysis method for probabilistic systems such as, for example, cyber-physical systems which are often probabilistic in nature. This paper is about drastically increasing the scalability of statistical model checking, and making such scalability of analysis available to tools like Maude, where probabilistic systems can be specified at a high level as probabilistic rewrite theories. It presents PVeStA, an extension and parallelization of the VeStA statistical model checking tool [10]. PVeStA supports statistical model checking of probabilistic real-time systems specified as either: (i) discrete or continuous Markov Chains; or (ii) probabilistic rewrite theories in Maude. Furthermore, the properties that it can model check can be expressed in either: (i) PCTL/CSL, or (ii) the QuaTEx quantitative temporal logic. As our experiments show, the performance gains obtained from parallelization can be very high. © 2011 Springer-Verlag.
A new method to determine the number of experimental data using statistical modeling methods
Energy Technology Data Exchange (ETDEWEB)
Jung, Jung-Ho; Kang, Young-Jin; Lim, O-Kaung; Noh, Yoojeong [Pusan National University, Busan (Korea, Republic of)
2017-06-15
For analyzing the statistical performance of physical systems, statistical characteristics of physical parameters such as material properties need to be estimated by collecting experimental data. For accurate statistical modeling, many such experiments may be required, but data are usually quite limited owing to the cost and time constraints of experiments. In this study, a new method for determining a reasonable number of experimental data is proposed using an area metric, after obtaining statistical models using information on the underlying distribution, the sequential statistical modeling (SSM) approach, and the kernel density estimation (KDE) approach. The area metric is used as a convergence criterion to determine the necessary and sufficient number of experimental data to be acquired. The proposed method is validated in simulations using different statistical modeling methods, different true models, and different convergence criteria. An example data set with 29 data points describing the fatigue strength coefficient of SAE 950X is used to demonstrate the performance of the obtained statistical models, which use a pre-determined number of experimental data, in predicting the probability of failure for a target fatigue life.
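As a hedged illustration of using an area metric as a convergence criterion (the data, the normal model, and the stopping threshold below are all assumptions, not the paper's SSM/KDE procedure):

```python
import numpy as np
from scipy import stats

def area_metric(sample, dist, params):
    """Area between the empirical CDF of `sample` and the fitted model CDF,
    integrated over the sorted sample values (trapezoidal rule)."""
    x = np.sort(sample)
    ecdf = np.arange(1, len(x) + 1) / len(x)
    gap = np.abs(ecdf - dist.cdf(x, *params))
    return np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(x))

rng = np.random.default_rng(7)
population = rng.normal(500.0, 50.0, 10_000)   # hypothetical "true" property

# Grow the sample until the area metric of successive fits stabilizes
prev = None
for n in range(10, 201, 10):
    sample = population[:n]
    params = stats.norm.fit(sample)
    a = area_metric(sample, stats.norm, params)
    print(n, round(a, 3))
    if prev is not None and abs(a - prev) < 0.5:   # illustrative threshold
        print(f"area metric settled at n = {n}; stop collecting data")
        break
    prev = a
```

The paper's contribution is choosing that stopping point in a principled way; here the threshold is simply asserted for demonstration.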
Application of Statistical Model in Wastewater Treatment Process Modeling Using Data Analysis
Directory of Open Access Journals (Sweden)
Alireza Raygan Shirazinezhad
2015-06-01
Full Text Available Background: Wastewater treatment involves very complex and interrelated physical, chemical and biological processes which, using data analysis techniques, can be rigorously modeled by non-complex mathematical calculation models. Materials and Methods: In this study, data on wastewater treatment processes from the water and wastewater company of Kohgiluyeh and Boyer Ahmad were used. A total of 3306 data points for COD, TSS, pH and turbidity were collected, then analyzed with SPSS-16 software (descriptive statistics) and IBM SPSS Modeler 14.2 (data analysis), through 9 algorithms. Results: The logistic regression, neural network, Bayesian network, discriminant analysis, C5 decision tree, C&R tree, CHAID, QUEST and SVM algorithms had accuracies of 90.16, 94.17, 81.37, 70.48, 97.89, 96.56, 96.46, 96.84 and 88.92 percent, respectively. Discussion and conclusion: The C5 algorithm was chosen as the best and most applicable algorithm for modeling the wastewater treatment processes, with an accuracy of 97.89 percent; the most influential variables in this model were pH, COD, TSS and turbidity.
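For readers who want to reproduce the flavor of this comparison outside SPSS Modeler, a minimal Python/scikit-learn sketch follows, in which a CART decision tree stands in for C5 and the data and labels are entirely hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
n = 1000
# Hypothetical stand-in for the plant data: pH, COD, TSS, turbidity
X = pd.DataFrame({
    "pH": rng.normal(7.2, 0.5, n),
    "COD": rng.normal(250, 60, n),
    "TSS": rng.normal(120, 30, n),
    "turbidity": rng.normal(40, 10, n),
})
# Hypothetical label: effluent within discharge limits or not
y = ((X.COD < 260) & (X.TSS < 130)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for name, clf in [("decision tree", DecisionTreeClassifier(max_depth=5)),
                  ("logistic regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"{name:20s} accuracy = {acc:.3f}")
```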
International Nuclear Information System (INIS)
Asim, F.; Mahmood, M.
2017-01-01
Statistical modeling plays a significant role in predicting the impact of potential factors affecting the one-step fixation process of reactive printing and easy-care finishing. In this research work, the significant factors affecting the tear strength of cotton fabric under single-step fixation of reactive printing and easy-care finishing have been investigated using experimental design techniques. The potential design factors were: concentration of reactive dye, concentration of crease resistant, fixation method and fixation temperature. The experiments were designed using DoE (Design of Experiments) and analyzed with the Design Expert software. A detailed analysis of the significant factors and interactions, including ANOVA (analysis of variance), residuals, model accuracy and the statistical model for tear strength, is presented. The interaction and contour plots of the vital factors have been examined. The statistical analysis shows that each factor interacts with the others, and most of the investigated factors exhibit a curvature effect with respect to the other factors. After critical examination of the significant plots, a quadratic model of tear strength with the significant terms and their interactions at alpha = 0.05 has been developed. The calculated correlation coefficient R2 of the developed model is 0.9056. This high value of the correlation coefficient implies that the developed equation will precisely predict tear strength over the range of values studied. (author)
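A quadratic response-surface fit of this kind is easy to sketch outside Design Expert. The following Python/statsmodels example uses made-up coded factors and a made-up response, purely to show the model form (main effects, a two-factor interaction, and quadratic terms):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 30
# Hypothetical coded factor levels; the paper's actual design is not reproduced
df = pd.DataFrame({
    "dye": rng.uniform(-1, 1, n),      # reactive dye concentration (coded)
    "resin": rng.uniform(-1, 1, n),    # crease-resistant concentration (coded)
    "temp": rng.uniform(-1, 1, n),     # fixation temperature (coded)
})
df["tear"] = (20 - 2 * df.dye - 3 * df.resin - 1.5 * df.dye * df.resin
              - 1.0 * df.temp ** 2 + rng.normal(scale=0.5, size=n))

# Quadratic model with a two-factor interaction, as in a response-surface fit
model = smf.ols("tear ~ dye*resin + temp + I(dye**2) + I(resin**2) + I(temp**2)",
                data=df).fit()
print(model.summary())                 # keep terms with p < 0.05, as the paper does
print("R-squared:", round(model.rsquared, 4))
```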
Analysis and Modeling of Subthreshold Neural Multi-Electrode Array Data by Statistical Field Theory.
Henningson, Måns; Illes, Sebastian
2017-01-01
Multi-electrode arrays (MEA) are increasingly used to investigate spontaneous neuronal network activity. The recorded signals comprise several distinct components: apart from artifacts without biological significance, one can distinguish between spikes (action potentials) and subthreshold fluctuations (local field potentials). Here we aim to develop a theoretical model that allows for a compact and robust characterization of subthreshold fluctuations in terms of a Gaussian statistical field theory in two spatial and one temporal dimension. What is usually referred to as the driving noise in the context of statistical physics is here interpreted as a representation of the neural activity. Spatial and temporal correlations of this activity give valuable information about the connectivity in the neural tissue. We apply our methods to a dataset obtained from MEA measurements in an acute hippocampal brain slice from a rat. Our main finding is that the empirical correlation functions indeed obey the logarithmic behavior that is a general feature of theoretical models of this kind. We also find a clear correlation between the activity and the occurrence of spikes. Another important insight is the need to correctly separate out certain artifacts from the data before proceeding with the analysis.
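A minimal sketch of the empirical side of such an analysis, estimating the equal-time correlation as a function of electrode separation; the synthetic signals below are an assumption standing in for real MEA traces, not the paper's dataset or field-theoretic fit:

```python
import numpy as np

def correlation_vs_distance(signals, coords, n_bins=8):
    """Empirical equal-time correlation of subthreshold traces, binned by
    electrode separation. signals: (n_elec, n_t); coords: (n_elec, 2)."""
    corr = np.corrcoef(signals)                      # pairwise temporal correlation
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    iu = np.triu_indices(len(coords), k=1)           # each electrode pair once
    edges = np.linspace(dist[iu].min(), dist[iu].max(), n_bins + 1)
    idx = np.clip(np.digitize(dist[iu], edges) - 1, 0, n_bins - 1)
    return np.array([corr[iu][idx == b].mean() for b in range(n_bins)]), edges

# Hypothetical 8x8 array: a few smooth spatial modes plus electrode noise
rng = np.random.default_rng(9)
coords = np.array([(i, j) for i in range(8) for j in range(8)], dtype=float)
modes = np.cos(coords @ np.array([[0.3, 0.1], [0.1, 0.4], [0.2, 0.2]]).T)
signals = modes @ rng.normal(size=(3, 2000)) + 0.5 * rng.normal(size=(64, 2000))
c, edges = correlation_vs_distance(signals, coords)
print(np.round(c, 3))   # correlation should roughly decay with separation
```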
Streamwise Evolution of Statistical Events in a Model Wind-Turbine Array
Viestenz, Kyle; Cal, Raúl Bayoán
2016-02-01
Hot-wire anemometry data, obtained from a wind-tunnel experiment containing a 3 × 3 model wind-turbine array, are used to conditionally average the Reynolds stresses. Nine profiles at the centreline behind the array are analyzed to characterize the turbulent velocity statistics of the wake flow. Quadrant analysis yields the statistical events occurring in the wake of the wind farm, where quadrants 2 and 4 correspond to ejections and sweeps, respectively. The scaled difference between these two events is expressed via the ΔR0 parameter and is based on the ΔS0 quantity introduced by M. R. Raupach (J Fluid Mech 108:363-382, 1981). ΔR0 attains a maximum value at hub height and changes sign near the top of the rotor. The ratio of quadrant events of upward momentum flux to those of downward flux, known as the exuberance, is examined and reveals the effect of root vortices persisting to eight rotor diameters downstream. These events are then associated with the triple correlation term present in the turbulent kinetic energy equation of the fluctuations, where it is found that ejections play the dual role of entraining mean kinetic energy while convecting turbulent kinetic energy out of the turbine canopy. The development of these various quantities possesses significance for closure models, and is assessed in light of wake remediation, energy transport and power fluctuations, where it is found that the maximum fluctuation is about 30% of the mean power produced.
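A hedged sketch of the quadrant decomposition underlying these statistics; the synthetic u', w' record and the simple scaled Q2-Q4 difference below are illustrative assumptions, not the experimental data or Raupach's exact ΔS0 definition:

```python
import numpy as np

def quadrant_fractions(u, w):
    """Fraction of the Reynolds shear stress u'w' contributed by each
    quadrant: Q2 (u'<0, w'>0) ejections, Q4 (u'>0, w'<0) sweeps."""
    up, wp = u - u.mean(), w - w.mean()
    uw = up * wp
    masks = {"Q1": (up > 0) & (wp > 0), "Q2": (up < 0) & (wp > 0),
             "Q3": (up < 0) & (wp < 0), "Q4": (up > 0) & (wp < 0)}
    return {q: uw[m].sum() / uw.sum() for q, m in masks.items()}

# Hypothetical hot-wire record: correlated fluctuations giving u'w' < 0
rng = np.random.default_rng(11)
up = rng.normal(size=50_000)
wp = -0.4 * up + 0.9 * rng.normal(size=50_000)
frac = quadrant_fractions(up + 10.0, wp)      # add a mean to mimic raw data
print({q: round(v, 3) for q, v in frac.items()})
print("Q2 - Q4 (scaled ejection-sweep difference):",
      round(frac["Q2"] - frac["Q4"], 3))
```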