Statistical Mechanics of Optimal Convex Inference in High Dimensions
Advani, Madhu; Ganguli, Surya
2016-07-01
A fundamental problem in modern high-dimensional data analysis involves efficiently inferring a set of P unknown model parameters governing the relationship between the inputs and outputs of N noisy measurements. Various methods have been proposed to regress the outputs against the inputs to recover the P parameters. What are fundamental limits on the accuracy of regression, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we optimally combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite α. We employ replica theory to answer these questions for a class of inference algorithms, known in the statistics literature as M-estimators. These algorithms attempt to recover the P model parameters by solving an optimization problem involving minimizing the sum of a loss function that penalizes deviations between the data and model predictions, and a regularizer that leverages prior information about model parameters. Widely cherished algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference arise as special cases of M-estimators. Our analysis uncovers fundamental limits on the inference accuracy of a subclass of M-estimators corresponding to computationally tractable convex optimization problems. These limits generalize classical statistical theorems like the Cramér-Rao bound to the high-dimensional setting with prior information. We further discover the optimal M-estimator for log-concave signal and noise distributions; we demonstrate that it can achieve our high-dimensional limits on inference accuracy, while ML and MAP cannot. Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than
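The M-estimator structure described in this abstract (a sum of per-measurement losses plus a regularizer, minimized over the parameters) can be illustrated with a minimal sketch. The choices below are assumptions for illustration only and are not taken from the paper: a squared-error loss, a quadratic regularizer with weight `lam`, plain gradient descent, and a hypothetical toy problem with N = 40 and P = 5. It is not the optimal M-estimator the authors derive.

```python
import random

def objective(X, y, s, lam):
    """Sum of squared-error losses plus a quadratic regularizer."""
    loss = sum((y_m - sum(x * w for x, w in zip(x_m, s))) ** 2 / 2
               for x_m, y_m in zip(X, y))
    return loss + lam * sum(w * w for w in s) / 2

def m_estimate(X, y, lam, steps=2000):
    """Minimize objective(X, y, s, lam) over s by plain gradient descent."""
    n, p = len(X), len(X[0])
    # Step size 1/L, where L = (sum of squared entries + lam) bounds the curvature.
    lr = 1.0 / (sum(x * x for row in X for x in row) + lam)
    s = [0.0] * p
    for _ in range(steps):
        # Residuals between the data and the current model predictions.
        resid = [y[m] - sum(X[m][i] * s[i] for i in range(p)) for m in range(n)]
        # Gradient of the loss term plus gradient of the regularizer.
        grad = [lam * s[i] - sum(resid[m] * X[m][i] for m in range(n))
                for i in range(p)]
        s = [s[i] - lr * grad[i] for i in range(p)]
    return s

# Hypothetical toy problem: N = 40 noisy measurements of P = 5 parameters.
rng = random.Random(0)
s_true = [rng.gauss(0, 1) for _ in range(5)]
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [sum(x * w for x, w in zip(row, s_true)) + rng.gauss(0, 0.1) for row in X]
s_hat = m_estimate(X, y, lam=0.1)
```

Because both terms are convex, gradient descent with this conservative step size converges to the unique minimizer; swapping in a different convex loss or regularizer changes only the two gradient expressions.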
Rohatgi, Vijay K
2003-01-01
Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth
Zhu, Hongjian
2016-12-12
Seamless phase II/III clinical trials have attracted increasing attention recently. They mainly use Bayesian response adaptive randomization (RAR) designs. There has been little research into seamless clinical trials using frequentist RAR designs because of the difficulty in performing valid statistical inference following this procedure. Well-designed frequentist RAR designs can target theoretically optimal allocation proportions, and they have explicit asymptotic results. In this paper, we study the asymptotic properties of frequentist RAR designs with adjusted target allocation proportions, and investigate statistical inference for this procedure. The properties of the proposed design provide an important theoretical foundation for advanced seamless clinical trials. Our numerical studies demonstrate that the design is ethical and efficient.
Probability and Statistical Inference
Prosper, Harrison B.
2006-01-01
These lectures introduce key concepts in probability and statistical inference at a level suitable for graduate students in particle physics. Our goal is to paint as vivid a picture as possible of the concepts covered.
Introductory statistical inference
Mukhopadhyay, Nitis
2014-01-01
This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques. Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist
Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference. - Eugenia Stoimenova, Journal of Applied Statistics, June 2012. … one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students. - Biometrics, 67, September 2011. This excellently presente
Statistical inferences in phylogeography
DEFF Research Database (Denmark)
Nielsen, Rasmus; Beaumont, Mark A
2009-01-01
In conventional phylogeographic studies, historical demographic processes are elucidated from the geographical distribution of individuals represented on an inferred gene tree. However, the interpretation of gene trees in this context can be difficult as the same demographic/geographical process can randomly lead to multiple different genealogies. Likewise, the same gene trees can arise under different demographic models. This problem has led to the emergence of many statistical methods for making phylogeographic inferences. A popular phylogeographic approach based on nested clade analysis is challenged by the fact that a certain amount of the interpretation of the data is left to the subjective choices of the user, and it has been argued that the method performs poorly in simulation studies. More rigorous statistical methods based on coalescence theory have been developed. However, these methods may also be challenged by computational problems or poor model choice. In this review, we will describe the development of statistical methods in phylogeographic analysis, and discuss some of the challenges facing these methods.
Nanotechnology and statistical inference
Vesely, Sara; Vesely, Leonardo; Vesely, Alessandro
2017-08-01
We discuss some problems that arise when applying statistical inference to data with the aim of disclosing new functionalities. A predictive model analyzes the data taken from experiments on a specific material to assess the likelihood that another product, with similar structure and properties, will exhibit the same functionality. It doesn't have much predictive power if variability occurs as a consequence of a specific, non-linear behavior. We exemplify our discussion on some experiments with biased dice.
Statistical inference for financial engineering
Taniguchi, Masanobu; Ogata, Hiroaki; Taniai, Hiroyuki
2014-01-01
This monograph provides the fundamentals of statistical inference for financial engineering and covers some selected methods suitable for analyzing financial time series data. In order to describe the actual financial data, various stochastic processes, e.g. non-Gaussian linear processes, non-linear processes, long-memory processes, locally stationary processes etc. are introduced and their optimal estimation is considered as well. This book also includes several statistical approaches, e.g., discriminant analysis, the empirical likelihood method, control variate method, quantile regression, realized volatility etc., which have been recently developed and are considered to be powerful tools for analyzing the financial data, establishing a new bridge between time series and financial engineering. This book is well suited as a professional reference book on finance, statistics and statistical financial engineering. Readers are expected to have an undergraduate-level knowledge of statistics.
Statistical Inference: The Big Picture.
Kass, Robert E
2011-02-01
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labelled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative "big picture" depiction.
Statistical Inference and String Theory
Heckman, Jonathan J
2013-01-01
In this note we expose some surprising connections between string theory and statistical inference. We consider a large collective of agents sweeping out a family of nearby statistical models for an M-dimensional manifold of statistical fitting parameters. When the agents making nearby inferences align along a d-dimensional grid, we find that the pooled probability that the collective reaches a correct inference is the partition function of a non-linear sigma model in d dimensions. Stability under perturbations to the original inference scheme requires the agents of the collective to distribute along two dimensions. Conformal invariance of the sigma model corresponds to the condition of a stable inference scheme, directly leading to the Einstein field equations for classical gravity. By summing over all possible arrangements of the agents in the collective, we reach a string theory. We also use this perspective to quantify how much an observer can hope to learn about the internal geometry of a superstring com...
LBVs and Statistical Inference
Davidson, Kris; Weis, Kerstin
2016-01-01
Smith and Tombleson (2015) asserted that statistical tests disprove the standard view of LBVs, and proposed a far more complex scenario to replace it. But Humphreys et al. (2016) showed that Smith and Tombleson's Magellanic "LBV" sample was a mixture of physically different classes of stars, and genuine LBVs are in fact statistically consistent with the standard view. Smith (2016) recently objected at great length to this result. Here we note that he misrepresented some of the arguments, altered the test criteria, ignored some long-recognized observational facts, and employed inadequate statistical procedures. This case illustrates the dangers of uncareful statistical sampling, as well as the need to be wary of unstated assumptions.
Statistical inference via fiducial methods
Salomé, Diemer
1998-01-01
In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ... See: Summary
On quantum statistical inference
Barndorff-Nielsen, O.E.; Gill, R.D.; Jupp, P.E.
2001-01-01
Recent developments in the mathematical foundations of quantum mechanics have brought the theory closer to that of classical probability and statistics. On the other hand, the unique character of quantum physics sets many of the questions addressed apart from those met classically in stochastics.
Statistical theory and inference
Olive, David J
2014-01-01
This text is for a one-semester graduate course in statistical theory and covers minimal and complete sufficient statistics, maximum likelihood estimators, method of moments, bias and mean square error, uniform minimum variance estimators and the Cramér-Rao lower bound, an introduction to large sample theory, likelihood ratio tests and uniformly most powerful tests and the Neyman-Pearson Lemma. A major goal of this text is to make these topics much more accessible to students by using the theory of exponential families. Exponential families, indicator functions and the support of the distribution are used throughout the text to simplify the theory. More than 50 "brand name" distributions are used to illustrate the theory with many examples of exponential families, maximum likelihood estimators and uniformly minimum variance unbiased estimators. There are many homework problems with over 30 pages of solutions.
Applied statistical inference with MINITAB
Lesik, Sally
2009-01-01
Through clear, step-by-step mathematical calculations, Applied Statistical Inference with MINITAB enables students to gain a solid understanding of how to apply statistical techniques using a statistical software program. It focuses on the concepts of confidence intervals, hypothesis testing, validating model assumptions, and power analysis. Illustrates the techniques and methods using MINITAB. After introducing some common terminology, the author explains how to create simple graphs using MINITAB and how to calculate descriptive statistics using both traditional hand computations and MINITAB. Sh
On quantum statistical inference
DEFF Research Database (Denmark)
Barndorff-Nielsen, Ole Eiler; Gill, Richard D.; Jupp, Peter E.
Recent developments in the mathematical foundations of quantum mechanics have brought the theory closer to that of classical probability and statistics. On the other hand, the unique character of quantum physics sets many of the questions addressed apart from those met classically in stochastics. Furthermore, concurrent advances in experimental techniques and in the theory of quantum computation have led to a strong interest in questions of quantum information, in particular in the sense of the amount of information about unknown parameters in given observational data or accessible through various...
Nonparametric statistical inference
Gibbons, Jean Dickinson
2014-01-01
Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.
Statistical Inference on Optimal Points to Evaluate Multi-State Classification Systems
2014-09-18
The Meaning and Use of the Volume Under a Three-Class ROC Surface (VUS)”. IEEE Transactions on Medical Imaging, 27(5):577–588, 2008. [29] He, X., C. E...Transactions on Medical Imaging, 25(5):571–581, 2006. [30] Jund, J., M. Rabilloud, M. Wallon, and R. Ecochard. “Methods to Estimate the Optimal...Analysis: Theory, Methods, and Applications. Springer-Verlag New York Inc., New York, NY, 1990. [61] da Silva, J. Estrela, J. P Marques de Sa, and J
Ricci-Tersenghi, Federico; Zdeborova, Lenka; Zecchina, Riccardo; Tramel, Eric W; Cugliandolo, Leticia F
2015-01-01
This book contains a collection of the presentations that were given in October 2013 at the Les Houches Autumn School on statistical physics, optimization, inference, and message-passing algorithms. In the last decade, there has been increasing convergence of interest and methods between theoretical physics and fields as diverse as probability, machine learning, optimization, and inference problems. In particular, much theoretical and applied work in statistical physics and computer science has relied on the use of message-passing algorithms and their connection to the statistical physics of glasses and spin glasses. For example, both the replica and cavity methods have led to recent advances in compressed sensing, sparse estimation, and random constraint satisfaction, to name a few. This book’s detailed pedagogical lectures on statistical inference, computational complexity, the replica and cavity methods, and belief propagation are aimed particularly at PhD students, post-docs, and young researchers desir...
Bickel, David R
2010-01-01
The normalized maximum likelihood (NML) is a recent penalized likelihood that has properties that justify defining the amount of discrimination information (DI) in the data supporting an alternative hypothesis over a null hypothesis as the logarithm of an NML ratio, namely, the alternative hypothesis NML divided by the null hypothesis NML. The resulting DI, like the Bayes factor but unlike the p-value, measures the strength of evidence for an alternative hypothesis over a null hypothesis such that the probability of misleading evidence vanishes asymptotically under weak regularity conditions and such that evidence can support a simple null hypothesis. Unlike the Bayes factor, the DI does not require a prior distribution and is minimax optimal in a sense that does not involve averaging over outcomes that did not occur. Replacing a (possibly pseudo-) likelihood function with its weighted counterpart extends the scope of the DI to models for which the unweighted NML is undefined. The likelihood weights leverage ...
Statistical inference on variance components
Verdooren, L.R.
1988-01-01
In several sciences but especially in animal and plant breeding, the general mixed model with fixed and random effects plays a great role. Statistical inference on variance components means tests of hypotheses about variance components, constructing confidence intervals for them, estimating them,
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
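The "higher bar" after cherry-picking can be illustrated with a small simulation. This is a sketch under stated assumptions, not one of the selective-inference procedures the authors develop: it assumes m independent standard-normal test statistics under a global null, selects the largest in absolute value, and compares the naive p-value with a Šidák-style adjustment for the selection. The function names are hypothetical.

```python
import math
import random

def two_sided_p(z):
    """Naive two-sided p-value for a standard normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def selection_adjusted_p(p_min, m):
    """P-value of the smallest of m independent null p-values (Sidak adjustment)."""
    return 1.0 - (1.0 - p_min) ** m

def false_positive_rates(m=50, trials=2000, alpha=0.05, seed=0):
    """Fraction of pure-noise datasets declared significant, naive vs adjusted."""
    rng = random.Random(seed)
    naive = adjusted = 0
    for _ in range(trials):
        # m pure-noise statistics; "cherry-pick" the strongest association.
        best = max(abs(rng.gauss(0.0, 1.0)) for _ in range(m))
        p = two_sided_p(best)
        naive += p < alpha
        adjusted += selection_adjusted_p(p, m) < alpha
    return naive / trials, adjusted / trials

naive_rate, adjusted_rate = false_positive_rates()
```

With m = 50 the naive test rejects far more often than the nominal 5% level even though no real association exists, while the adjusted test stays close to it.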
On Quantum Statistical Inference, II
Barndorff-Nielsen, O. E.; Gill, R. D.; Jupp, P.E.
2003-01-01
Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, theoretical developments in the theory of quantum measurements have brought the basic mathematical framework for the probability calculations much closer to that of classical probability theory. The present paper reviews this field and proposes and inte...
Statistical inference on residual life
Jeong, Jong-Hyeon
2014-01-01
This is a monograph on the concept of residual life, which is an alternative summary measure of time-to-event data, or survival data. The mean residual life has been used for many years under the name of life expectancy, so it is a natural concept for summarizing survival or reliability data. It is also more interpretable than the popular hazard function, especially for communications between patients and physicians regarding the efficacy of a new drug in the medical field. This book reviews existing statistical methods to infer the residual life distribution. The review and comparison include existing inference methods for mean and median, or quantile, residual life analysis through medical data examples. The concept of the residual life is also extended to competing risks analysis. The targeted audience includes biostatisticians, graduate students, and PhD (bio)statisticians. Knowledge in survival analysis at an introductory graduate level is advisable prior to reading this book.
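As a rough illustration of the quantities named in this abstract, the mean residual life at time t is the expected remaining lifetime E[T - t | T > t], and the median residual life is the median of T - t among subjects still at risk at t. The sketch below computes naive empirical versions; it assumes fully observed lifetimes with no censoring, which real survival data rarely satisfy, and the function names are hypothetical rather than taken from the book.

```python
import statistics

def residuals_beyond(times, t):
    """Remaining lifetimes T - t for subjects still alive at time t."""
    remaining = [x - t for x in times if x > t]
    if not remaining:
        raise ValueError("no observations beyond t")
    return remaining

def mean_residual_life(times, t):
    """Empirical estimate of E[T - t | T > t], ignoring censoring."""
    return statistics.mean(residuals_beyond(times, t))

def median_residual_life(times, t):
    """Empirical median of T - t among subjects with T > t, ignoring censoring."""
    return statistics.median(residuals_beyond(times, t))

lifetimes = [2, 4, 6, 8]  # hypothetical complete lifetimes
mrl_at_3 = mean_residual_life(lifetimes, 3)
```

At t = 0 the mean residual life reduces to the ordinary mean lifetime, which matches its long-standing use as life expectancy.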
Statistical Inference at Work: Statistical Process Control as an Example
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Redshift data and statistical inference
Newman, William I.; Haynes, Martha P.; Terzian, Yervant
1994-01-01
Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory, sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random process is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used, giving rise to widespread acceptance of statistically unconfirmed conclusions.
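The abstract does not say which rule the authors use to judge an advisable number of bins. As one common textbook guideline, and purely as an assumption here, Sturges' rule suggests about ceil(log2 n) + 1 bins for n observations. The sketch below computes that count and builds an equal-width histogram with it; the function names are hypothetical.

```python
import math

def sturges_bins(n):
    """Sturges' rule: suggested number of histogram bins for n observations."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1

def histogram(data, k):
    """Counts of data in k equal-width bins spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0  # fall back to unit width if all values coincide
    counts = [0] * k
    for x in data:
        # The maximum value is placed in the last bin rather than a (k+1)-th one.
        counts[min(int((x - lo) / width), k - 1)] += 1
    return counts

values = [i * 0.37 % 1.0 for i in range(100)]  # hypothetical sample of size 100
counts = histogram(values, sturges_bins(len(values)))
```

For a sample of 100 values the rule gives 8 bins; using many more bins leaves only a few counts per bin, so chance fluctuations can look like structure.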
Optimization methods for logical inference
Chandru, Vijay
2011-01-01
Merging logic and mathematics in deductive inference: an innovative, cutting-edge approach. Optimization methods for logical inference? Absolutely, say Vijay Chandru and John Hooker, two major contributors to this rapidly expanding field. And even though "solving logical inference problems with optimization methods may seem a bit like eating sauerkraut with chopsticks. . . it is the mathematical structure of a problem that determines whether an optimization model can help solve it, not the context in which the problem occurs." Presenting powerful, proven optimization techniques for logic in
The Reasoning behind Informal Statistical Inference
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Statistical Inference in Graphical Models
2008-06-17
Probabilistic Network Library (PNL). While not fully mature, PNL does provide the most commonly-used algorithms for inference and learning with the efficiency...of C++, and also offers interfaces for calling the library from MATLAB and R 1361. Notably, both BNT and PNL provide learning and inference algorithms...mature and has been used for research purposes for several years, it is written in MATLAB and thus is not suitable to be used in real-time settings. PNL
Predict! Teaching Statistics Using Informal Statistical Inference
Makar, Katie
2013-01-01
Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…
Statistical inference based on divergence measures
Pardo, Leandro
2005-01-01
The idea of using functionals of Information Theory, such as entropies or divergences, in statistical inference is not new. However, in spite of the fact that divergence statistics have become a very good alternative to the classical likelihood ratio test and the Pearson-type statistic in discrete models, many statisticians remain unaware of this powerful approach. Statistical Inference Based on Divergence Measures explores classical problems of statistical inference, such as estimation and hypothesis testing, on the basis of measures of entropy and divergence. The first two chapters form an overview, from a statistical perspective, of the most important measures of entropy and divergence and study their properties. The author then examines the statistical analysis of discrete multivariate data with emphasis on problems in contingency tables and loglinear models using phi-divergence test statistics as well as minimum phi-divergence estimators. The final chapter looks at testing in general populations, prese...
Optimization techniques in statistics
Rustagi, Jagdish S
1994-01-01
Statistics help guide us to optimal decisions under uncertainty. A large variety of statistical problems are essentially solutions to optimization problems. The mathematical techniques of optimization are fundamental to statistical theory and practice. In this book, Jagdish Rustagi provides full-spectrum coverage of these methods, ranging from classical optimization and Lagrange multipliers, to numerical techniques using gradients or direct search, to linear, nonlinear, and dynamic programming using the Kuhn-Tucker conditions or the Pontryagin maximal principle. Variational methods and optimiza
Local and Global Thinking in Statistical Inference
Pratt, Dave; Johnston-Wilder, Peter; Ainley, Janet; Mason, John
2008-01-01
In this reflective paper, we explore students' local and global thinking about informal statistical inference through our observations of 10- to 11-year-olds, challenged to infer the unknown configuration of a virtual die, but able to use the die to generate as much data as they felt necessary. We report how they tended to focus on local changes…
Bayesian Inference in Statistical Analysis
Box, George E P
2011-01-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Rob
Ranald Macdonald and statistical inference.
Smith, Philip T
2009-05-01
Ranald Roderick Macdonald (1945-2007) was an important contributor to mathematical psychology in the UK, as a referee and action editor for British Journal of Mathematical and Statistical Psychology and as a participant and organizer at the British Psychological Society's Mathematics, statistics and computing section meetings. This appreciation argues that his most important contribution was to the foundations of significance testing, where his concern about what information was relevant in interpreting the results of significance tests led him to be a persuasive advocate for the 'Weak Fisherian' form of hypothesis testing.
Order statistics & inference estimation methods
Balakrishnan, N
1991-01-01
The literature on order statistics and inference is quite extensive and covers a large number of fields, but most of it is dispersed throughout numerous publications. This volume is the consolidation of the most important results and places an emphasis on estimation. Both theoretical and computational procedures are presented to meet the needs of researchers, professionals, and students. The methods of estimation discussed are well-illustrated with numerous practical examples from both the physical and life sciences, including sociology, psychology, and electrical and chemical engineering. A co
Statistical inference on mixing proportion
Institute of Scientific and Technical Information of China (English)
XU XingZhong; LIU Fang
2008-01-01
In this paper, the interval estimation and hypothesis testing of the mixing proportion in mixture distributions are considered. A statistical inferential method is proposed which is inspired by the generalized p-values and generalized pivotal quantity. In some situations, the true levels of the tests given in the paper are equal to nominal levels, and the true coverage of the interval estimation or confidence bounds is also equal to nominal one. In other situations, under mild conditions, the tests are consistent and the coverage of the interval estimations or the confidence bounds is asymptotically equal to nominal coverage. Meanwhile, some simulations are performed which show that our method is satisfactory.
Making statistical inferences about software reliability
Miller, Douglas R.
1988-01-01
Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
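The model in this abstract can be sketched as a short simulation. The assumptions here are illustrative and not taken from the paper: each latent fault i first triggers a failure at an independent exponential time with its own rate, a fault is removed as soon as it causes a failure, and the rates listed are hypothetical. The observed failure times are then the order statistics (sorted values) of these draws, and the program's current failure rate is the sum of the rates of the faults not yet found.

```python
import random

def first_failure_times(rates, seed=0):
    """Draw, for each fault, the time at which it first causes a failure."""
    rng = random.Random(seed)
    return [rng.expovariate(r) for r in rates]

def residual_rate(rates, first_failures, t):
    """Failure rate at time t: total rate of faults not yet triggered and removed."""
    return sum(r for r, x in zip(rates, first_failures) if x > t)

rates = [2.0, 1.0, 0.5, 0.1]          # hypothetical per-fault failure rates
first = first_failure_times(rates, seed=1)
observed = sorted(first)              # order statistics = observed failure times
rate_after_each_fix = [residual_rate(rates, first, t) for t in observed]
```

Faults with small rates tend to surface last, so a long failure-free run still leaves open the possibility of rare faults remaining, which is the verification difficulty the abstract points to.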
Investigating Mathematics Teachers' Thoughts of Statistical Inference
Yang, Kai-Lin
2012-01-01
Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…
Inference and the introductory statistics course
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-10-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its hypothetical probabilistic reasoning process is examined in some depth. We argue that the revolution in the teaching of inference must begin. We also discuss some perplexing issues, problematic areas and some new insights into language conundrums associated with introducing the logic of inference through randomization methods.
Bayesian Inference with Optimal Maps
Moselhy, Tarek A El
2011-01-01
We present a new approach to Bayesian inference that entirely avoids Markov chain simulation, by constructing a map that pushes forward the prior measure to the posterior measure. Existence and uniqueness of a suitable measure-preserving map is established by formulating the problem in the context of optimal transport theory. We discuss various means of explicitly parameterizing the map and computing it efficiently through solution of an optimization problem, exploiting gradient information from the forward model when possible. The resulting algorithm overcomes many of the computational bottlenecks associated with Markov chain Monte Carlo. Advantages of a map-based representation of the posterior include analytical expressions for posterior moments and the ability to generate arbitrary numbers of independent posterior samples without additional likelihood evaluations or forward solves. The optimization approach also provides clear convergence criteria for posterior approximation and facilitates model selectio...
Bayesian Cosmological inference beyond statistical isotropy
Souradeep, Tarun; Das, Santanu; Wandelt, Benjamin
2016-10-01
With the advent of rich data sets, the computationally challenging task of inference in cosmology has relied on stochastic sampling methods. First, I review the widely used MCMC approach used to infer cosmological parameters and present an adaptive, improved implementation, SCoPE, developed by our group. Next, I present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method with a principled approach to assess violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. The general, principled approach to Bayesian inference of the covariance structure in a random field on a sphere presented here has huge potential for application to many other aspects of cosmology and astronomy, as well as more distant areas of research like geosciences and climate modelling.
Introductory statistical inference with the likelihood function
Rohde, Charles A
2014-01-01
This textbook covers the fundamentals of statistical inference and statistical theory including Bayesian and frequentist approaches and methodology possible without excessive emphasis on the underlying mathematics. This book is about some of the basic principles of statistics that are necessary to understand and evaluate methods for analyzing complex data sets. The likelihood function is used for pure likelihood inference throughout the book. There is also coverage of severity and finite population sampling. The material was developed from an introductory statistical theory course taught by the author at the Johns Hopkins University’s Department of Biostatistics. Students and instructors in public health programs will benefit from the likelihood modeling approach that is used throughout the text. This will also appeal to epidemiologists and psychometricians. After a brief introduction, there are chapters on estimation, hypothesis testing, and maximum likelihood modeling. The book concludes with secti...
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-03
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
Ignorability in Statistical and Probabilistic Inference
DEFF Research Database (Denmark)
Jaeger, Manfred
2005-01-01
When dealing with incomplete data in statistical learning, or incomplete observations in probabilistic inference, one needs to distinguish the fact that a certain event is observed from the fact that the observed event has happened. Since the modeling and computational complexities entailed...
Inference and the Introductory Statistics Course
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-01-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation.
All of statistics a concise course in statistical inference
Wasserman, Larry
2004-01-01
This book is for people who want to learn probability and statistics quickly. It brings together many of the main ideas in modern statistics in one place. The book is suitable for students and researchers in statistics, computer science, data mining and machine learning. This book covers a much wider range of topics than a typical introductory text on mathematical statistics. It includes modern topics like nonparametric curve estimation, bootstrapping and classification, topics that are usually relegated to follow-up courses. The reader is assumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. The text can be used at the advanced undergraduate and graduate level. Larry Wasserman is Professor of Statistics at Carnegie Mellon University. He is also a member of the Center for Automated Learning and Discovery in the School of Computer Science. His research areas include nonparametric inference, asymptotic theory, causality, and applications to astrophysics, bi...
Statistical Methods in Phylogenetic and Evolutionary Inferences
Directory of Open Access Journals (Sweden)
Luigi Bertolotti
2013-05-01
Molecular instruments are the most accurate methods in organisms’ identification and characterization. Biologists are often involved in studies where the main goal is to identify relationships among individuals. In this framework, it is very important to know and apply the most robust approaches to infer correctly these relationships, allowing the right conclusions about phylogeny. In this review, we will introduce the reader to the most used statistical methods in phylogenetic analyses, the Maximum Likelihood and the Bayesian approaches, considering for simplicity only analyses regarding DNA sequences. Several studies will be shown as examples in order to demonstrate how the correct phylogenetic inference can lead the scientists to highlight very peculiar features in pathogens’ biology and evolution.
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
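The sample-splitting construction described in this abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' estimator: the selection rule (pick the column with the largest mean), the function names, and the use of a plain sample mean on the estimation sample are all hypothetical, and the central limit theorem and variance estimation mentioned in the abstract are not reproduced.

```python
import random
from statistics import mean


def split_folds(n, V, seed=0):
    """Partition indices 0..n-1 into V folds (equal size when V divides n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[v::V] for v in range(V)]


def select_column(rows):
    """Hypothetical data-adaptive rule: index of the column with largest mean."""
    n_cols = len(rows[0])
    return max(range(n_cols), key=lambda j: mean(r[j] for r in rows))


def sample_split_estimate(data, V=5, seed=0):
    """Average, over V splits, of fold-specific target estimates.

    For each split, the parameter-generating sample (all folds but one)
    chooses the target (here: which column's mean to estimate), and the
    held-out estimation sample is used to estimate that target.
    """
    folds = split_folds(len(data), V, seed)
    estimates = []
    for fold in folds:
        held_out = set(fold)
        generating = [row for i, row in enumerate(data) if i not in held_out]
        estimation = [data[i] for i in fold]
        j = select_column(generating)  # target chosen without the held-out fold
        estimates.append(mean(r[j] for r in estimation))
    return mean(estimates)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0.0, 1.0), rng.gauss(0.3, 1.0), rng.gauss(0.1, 1.0))
            for _ in range(200)]
    print("sample-split estimate:", sample_split_estimate(data, V=5))
```

The design point the sketch tries to convey is that the target is always chosen on data disjoint from the data used to estimate it, so the selection step does not bias the fold-specific estimate.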
Ignorability in Statistical and Probabilistic Inference
Jaeger, M
2011-01-01
When dealing with incomplete data in statistical learning, or incomplete observations in probabilistic inference, one needs to distinguish the fact that a certain event is observed from the fact that the observed event has happened. Since the modeling and computational complexities entailed by maintaining this proper distinction are often prohibitive, one asks for conditions under which it can be safely ignored. Such conditions are given by the missing at random (mar) and coarsened at random (car) assumptions. In this paper we provide an in-depth analysis of several questions relating to mar/car assumptions. The main purpose of our study is to provide criteria by which one may evaluate whether a car assumption is reasonable for a particular data collecting or observational process. This question is complicated by the fact that several distinct versions of mar/car assumptions exist. We therefore first provide an overview of these different versions, in which we highlight the distinction between distributional an...
Parametric statistical inference basic theory and modern approaches
Zacks, Shelemyahu; Tsokos, C P
1981-01-01
Parametric Statistical Inference: Basic Theory and Modern Approaches presents the developments and modern trends in statistical inference to students who do not have advanced mathematical and statistical preparation. The topics discussed in the book are basic and common to many fields of statistical inference and thus serve as a jumping board for in-depth study. The book is organized into eight chapters. Chapter 1 provides an overview of how the theory of statistical inference is presented in subsequent chapters. Chapter 2 briefly discusses statistical distributions and their properties. Chapt
Optimal Inference in Cointegrated Systems
1988-01-01
This paper studies the properties of maximum likelihood estimates of co-integrated systems. Alternative formulations of such models are considered including a new triangular system error correction mechanism. It is shown that full system maximum likelihood brings the problem of inference within the family that is covered by the locally asymptotically mixed normal asymptotic theory provided that all unit roots in the system have been eliminated by specification and data transformation. This re...
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
Reasoning about Informal Statistical Inference: One Statistician's View
Rossman, Allan J.
2008-01-01
This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…
The Importance of Statistical Modeling in Data Analysis and Inference
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
Statistical Inference for Partially Observed Diffusion Processes
DEFF Research Database (Denmark)
Jensen, Anders Christian
-dimensional Ornstein-Uhlenbeck where one coordinate is completely unobserved. This model does not have the Markov property and it makes parameter inference more complicated. Next we take a Bayesian approach and introduce some basic Markov chain Monte Carlo methods. In chapters five and six we describe a Bayesian method...... to perform parameter inference in multivariate diffusion models that may be only partially observed. The methodology is applied to the stochastic FitzHugh-Nagumo model and the two-dimensional Ornstein-Uhlenbeck process. Chapter seven focuses on parameter identifiability in the partially observed Ornstein...
Combining statistical inference and decisions in ecology
Williams, Perry J.; Hooten, Mevin B.
2016-01-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
Combining statistical inference and decisions in ecology.
Williams, Perry J; Hooten, Mevin B
2016-09-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods, including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
Statistical Inference and Patterns of Inequality in the Global North
Moran, Timothy Patrick
2006-01-01
Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…
Bayesian Information Criterion as an Alternative way of Statistical Inference
Directory of Open Access Journals (Sweden)
Nadejda Yu. Gubanova
2012-05-01
The article treats Bayesian information criterion as an alternative to traditional methods of statistical inference, based on NHST. The comparison of ANOVA and BIC results for psychological experiment is discussed.
Bayesian Information Criterion as an Alternative way of Statistical Inference
Nadejda Yu. Gubanova; Simon Zh. Simavoryan
2012-01-01
The article treats Bayesian information criterion as an alternative to traditional methods of statistical inference, based on NHST. The comparison of ANOVA and BIC results for psychological experiment is discussed.
The renormalisation group via statistical inference
Bény, Cédric
2014-01-01
In physics one attempts to infer the rules governing a system given only the results of imperfect measurements. Hence, microscopic theories may be effectively indistinguishable experimentally. We develop an operationally motivated procedure to identify the corresponding equivalence classes of theories. Here it is argued that the renormalisation group arises from the inherent ambiguities in constructing the classes: one encounters flow parameters as, e.g., a regulator, a scale, or a measure of precision, which specify representatives of the equivalence classes. This provides a unifying framework and identifies the role played by information in renormalisation. We validate this idea by showing that it justifies the use of low-momenta n-point functions as relevant observables around a gaussian hypothesis. Our methods also provide a way to extend renormalisation techniques to effective models which are not based on the usual quantum-field formalism, and elucidates the distinctions between various type of RG.
The statistical inference in the tourist investigation
Ascanio, Alfredo; Dpto. de Ciencias Económicas y Administrativas de la Universidad Simón Bolívar, Venezuela
2009-01-01
The objective of this article is to demonstrate that findings based on descriptive statistics cannot be transferred intuitively to the parameters of the population, or universe, unless tests of statistical significance are carried out first...
Statistical inference of Minimum Rank Factor Analysis
Shapiro, A; Ten Berge, JMF
2002-01-01
For any given number of factors, Minimum Rank Factor Analysis yields optimal communalities for an observed covariance matrix in the sense that the unexplained common variance with that number of factors is minimized, subject to the constraint that both the diagonal matrix of unique variances and the
Statistical inference of Minimum Rank Factor Analysis
Shapiro, A; Ten Berge, JMF
For any given number of factors, Minimum Rank Factor Analysis yields optimal communalities for an observed covariance matrix in the sense that the unexplained common variance with that number of factors is minimized, subject to the constraint that both the diagonal matrix of unique variances and the
Statistical causal inferences and their applications in public health research
Wu, Pan; Chen, Ding-Geng
2016-01-01
This book compiles and presents new developments in statistical causal inference. The accompanying data and computer programs are publicly available so readers may replicate the model development and data analysis presented in each chapter. In this way, methodology is taught so that readers may implement it directly. The book brings together experts engaged in causal inference research to present and discuss recent issues in causal inference methodological development. This is also a timely look at causal inference applied to scenarios that range from clinical trials to mediation and public health research more broadly. In an academic setting, this book will serve as a reference and guide to a course in causal inference at the graduate level (Master's or Doctorate). It is particularly relevant for students pursuing degrees in Statistics, Biostatistics and Computational Biology. Researchers and data analysts in public health and biomedical research will also find this book to be an important reference.
Nuclear Forensic Inferences Using Iterative Multidimensional Statistics
Energy Technology Data Exchange (ETDEWEB)
Robel, M; Kristo, M J; Heller, M A
2009-06-09
Nuclear forensics involves the analysis of interdicted nuclear material for specific material characteristics (referred to as 'signatures') that imply specific geographical locations, production processes, culprit intentions, etc. Predictive signatures rely on expert knowledge of physics, chemistry, and engineering to develop inferences from these material characteristics. Comparative signatures, on the other hand, rely on comparison of the material characteristics of the interdicted sample (the 'questioned sample' in FBI parlance) with those of a set of known samples. In the ideal case, the set of known samples would be a comprehensive nuclear forensics database, a database which does not currently exist. In fact, our ability to analyze interdicted samples and produce an extensive list of precise materials characteristics far exceeds our ability to interpret the results. Therefore, as we seek to develop the extensive databases necessary for nuclear forensics, we must also develop the methods necessary to produce the necessary inferences from comparison of our analytical results with these large, multidimensional sets of data. In the work reported here, we used a large, multidimensional dataset of results from quality control analyses of uranium ore concentrate (UOC, sometimes called 'yellowcake'). We have found that traditional multidimensional techniques, such as principal components analysis (PCA), are especially useful for understanding such datasets and drawing relevant conclusions. In particular, we have developed an iterative partial least squares-discriminant analysis (PLS-DA) procedure that has proven especially adept at identifying the production location of unknown UOC samples. By removing classes which fell far outside the initial decision boundary, and then rebuilding the PLS-DA model, we have consistently produced better and more definitive attributions than with a single pass classification approach. Performance of the
Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar
2015-04-01
In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered to be satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contains several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data-sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
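As a hedged illustration of the connection stated in this abstract, the following worked relation assumes that X is exponential with rate \lambda and that the lifetime performance index is defined as C_L = (\mu - L)/\sigma, with \mu and \sigma the mean and standard deviation of X. Both the exponential model and this form of the index are assumptions made here; the abstract only says "a parametric model".

```latex
\begin{aligned}
\mu &= \sigma = \frac{1}{\lambda}, \\
C_L &= \frac{\mu - L}{\sigma} = 1 - \lambda L, \\
\eta_L &= P(X \ge L) = e^{-\lambda L} = e^{\,C_L - 1}, \qquad C_L \le 1 .
\end{aligned}
```

Under these assumptions \eta_L is a strictly increasing function of C_L, so a point estimate, confidence bound, or test for one translates directly into the corresponding statement about the other.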
QInfer: Statistical inference software for quantum applications
Granade, Christopher; Hincks, Ian; Casagrande, Steven; Alexander, Thomas; Gross, Jonathan; Kononenko, Michal; Sanders, Yuval
2016-01-01
Characterizing quantum systems through experimental data is critical to applications as diverse as metrology and quantum computing. Analyzing this experimental data in a robust and reproducible manner is made challenging, however, by the lack of readily-available software for performing principled statistical analysis. We improve the robustness and reproducibility of characterization by introducing an open-source library, QInfer, to address this need. Our library makes it easy to analyze data from tomography, randomized benchmarking, and Hamiltonian learning experiments either in post-processing, or online as data is acquired. QInfer also provides functionality for predicting the performance of proposed experimental protocols from simulated runs. By delivering easy-to-use characterization tools based on principled statistical analysis, QInfer helps address many outstanding challenges facing quantum technology.
Simultaneous statistical inference for epigenetic data.
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
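The two simulation-based procedures named in this abstract can be sketched briefly. This is a minimal illustration, assuming a two-group comparison of means and a percentile bootstrap interval for a single mean; the function names, the example measurements, and the number of resamples are hypothetical and do not come from the article.

```python
import random
from statistics import mean


def randomization_test(a, b, n_perm=5000, seed=0):
    """Two-sided randomization test for a difference in group means.

    Group labels are reshuffled n_perm times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one
    (with the usual +1 correction so the p-value is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bootstrap_percentile_ci(x, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of x."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(x, k=len(x))) for _ in range(n_boot))
    lo = means[int((1 - level) / 2 * n_boot)]
    hi = means[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    control = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1]   # made-up measurements
    treated = [5.6, 6.1, 5.9, 6.4, 5.8, 6.0]
    print("randomization p-value:", randomization_test(control, treated))
    print("bootstrap 95% CI for treated mean:", bootstrap_percentile_ci(treated))
```

Both functions rely only on resampling the observed data, which is why such procedures are often presented before formula-based tests in introductory courses.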
Models for probability and statistical inference theory and applications
Stapleton, James H
2007-01-01
This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readers. Models for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping. Ideal as a textbook for a two-semester sequence on probability and statistical inference, early chapters provide coverage on probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...
CALUX measurements: statistical inferences for the dose-response curve.
Elskens, M; Baston, D S; Stumpf, C; Haedrich, J; Keupers, I; Croes, K; Denison, M S; Baeyens, W; Goeyens, L
2011-09-30
Chemical Activated LUciferase gene eXpression [CALUX] is a reporter gene mammalian cell bioassay used for detection and semi-quantitative analyses of dioxin-like compounds. CALUX dose-response curves for 2,3,7,8-tetrachlorodibenzo-p-dioxin [TCDD] are typically smooth and sigmoidal when the dose is portrayed on a logarithmic scale. Non-linear regression models are used to calibrate the CALUX response versus TCDD standards and to convert the sample response into Bioanalytical EQuivalents (BEQs). Several complications may arise in terms of statistical inference, the most important being the uncertainty assessment of the predicted BEQ. This paper presents the use of linear calibration functions based on Box-Cox transformations to overcome the issue of uncertainty assessment. The main issues addressed are (i) confidence and prediction intervals for the CALUX response, (ii) confidence and prediction intervals for the predicted BEQ-value, and (iii) detection/estimation capabilities for the sigmoid and linearized models. Statistical comparisons between different calculation methods involving inverse prediction, effective concentration ratios (ECR(20-50-80)) and slope ratio were achieved with example datasets in order to provide guidance for optimizing BEQ determinations and expand assay performance with the recombinant mouse hepatoma CALUX cell line H1L6.1c3.
Data-driven inference for the spatial scan statistic
Directory of Open Access Journals (Sweden)
Duczmal Luiz H
2011-08-01
Background: Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. Results: A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. Conclusions: A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
Inferring Demographic History Using Two-Locus Statistics.
Ragsdale, Aaron P; Gutenkunst, Ryan N
2017-06-01
Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
Statistical Inference with Data Augmentation and Parameter Expansion
Yatracos, Yannis G.
2015-01-01
Statistical pragmatism embraces all efficient methods in statistical inference. Augmentation of the collected data is used herein to obtain representative population information from a large class of non-representative population's units. Parameter expansion of a probability model is shown to reduce the upper bound on the sum of error probabilities for a test of simple hypotheses, and a measure, R, is proposed for the effect of activating additional component(s) in the sufficient statistic.
Statistical Inference Methods for Sparse Biological Time Series Data
Directory of Open Access Journals (Sweden)
Voit Eberhard O
2011-04-01
Background: Comparing metabolic profiles under different biological perturbations has become a powerful approach to investigating the functioning of cells. The profiles can be taken as single snapshots of a system, but more information is gained if they are measured longitudinally over time. The results are short time series consisting of relatively sparse data that cannot be analyzed effectively with standard time series techniques, such as autocorrelation and frequency domain methods. In this work, we study longitudinal time series profiles of glucose consumption in the yeast Saccharomyces cerevisiae under different temperatures and preconditioning regimens, which we obtained with methods of in vivo nuclear magnetic resonance (NMR) spectroscopy. For the statistical analysis we first fit several nonlinear mixed effect regression models to the longitudinal profiles and then used an ANOVA likelihood ratio method in order to test for significant differences between the profiles. Results: The proposed methods are capable of distinguishing metabolic time trends resulting from different treatments and associate significance levels to these differences. Among several nonlinear mixed-effects regression models tested, a three-parameter logistic function represents the data with highest accuracy. ANOVA and likelihood ratio tests suggest that there are significant differences between the glucose consumption rate profiles for cells that had been--or had not been--preconditioned by heat during growth. Furthermore, pair-wise t-tests reveal significant differences in the longitudinal profiles for glucose consumption rates between optimal conditions and heat stress, optimal and recovery conditions, and heat stress and recovery conditions (p-values ...). Conclusion: We have developed a nonlinear mixed effects model that is appropriate for the analysis of sparse metabolic and physiological time profiles. The model permits sound statistical inference procedures
LOWER LEVEL INFERENCE CONTROL IN STATISTICAL DATABASE SYSTEMS
Energy Technology Data Exchange (ETDEWEB)
Lipton, D.L.; Wong, H.K.T.
1984-02-01
An inference is the process of transforming unclassified data values into confidential data values. Most previous research in inference control has studied the use of statistical aggregates to deduce individual records. However, several other types of inference are also possible. Unknown functional dependencies may be apparent to users who have 'expert' knowledge about the characteristics of a population. Some correlations between attributes may be concluded from 'commonly-known' facts about the world. To counter these threats, security managers should use random sampling of databases of similar populations, as well as expert systems. 'Expert' users of the DATABASE SYSTEM may form inferences from the variable performance of the user interface. Users may observe on-line turn-around time, accounting statistics, the error message received, and the point at which an interactive protocol sequence fails. One may obtain information about the frequency distributions of attribute values, and the validity of data object names from this information. At the back-end of a database system, improved software engineering practices will reduce opportunities to bypass functional units of the database system. The term 'DATA OBJECT' should be expanded to incorporate these data object types which generate new classes of threats. The security of DATABASES and DATABASE SYSTEMS must be recognized as separate but related problems. Thus, by increased awareness of lower level inferences, system security managers may effectively nullify the threat posed by lower level inferences.
Breakdown of statistical inference from some random experiments
Kupczynski, Marian
2014-01-01
Many experiments can be interpreted in terms of random processes operating according to some internal protocols. When experiments are costly or cannot be repeated as in some clinical trials one has data gathered in only one or in a few long runs of the experiment. In this paper we study data generated by computer experiments operating according to particular internal protocols. We show that the standard statistical analysis of a sample, containing 100 000 data points or more, may sometimes be highly misleading and statistical errors largely underestimated. Our results confirm in a dramatic way the dangers of standard asymptotic statistical inference based on data gathered in one, possibly long run of the experiment. We demonstrate that analyzing various subdivisions of samples by multiple chi-square tests and chi-square frequency graphs is very effective in detecting the anomalies. Therefore to assure correctness of the statistical inference the above mentioned chi-square tests and other non-parametric sample...
Fisher information and statistical inference for phase-type distributions
DEFF Research Database (Denmark)
Bladt, Mogens; Esparza, Luz Judith R; Nielsen, Bo Friis
2011-01-01
This paper is concerned with statistical inference for both continuous and discrete phase-type distributions. We consider maximum likelihood estimation, where traditionally the expectation-maximization (EM) algorithm has been employed. Certain numerical aspects of this method are revised and we p...
The Philosophical Foundations of Prescriptive Statements and Statistical Inference
Sun, Shuyan; Pan, Wei
2011-01-01
From the perspectives of the philosophy of science and statistical inference, we discuss the challenges of making prescriptive statements in quantitative research articles. We first consider the prescriptive nature of educational research and argue that prescriptive statements are a necessity in educational research. The logic of deduction,…
A Framework for Thinking about Informal Statistical Inference
Makar, Katie; Rubin, Andee
2009-01-01
Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom…
Targeted estimation of nuisance parameters to obtain valid statistical inference.
van der Laan, Mark J
2014-01-01
In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a particular special
Statistical detection of EEG synchrony using empirical bayesian inference.
Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven
2015-01-01
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Applied to experimental data, locFDR detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries.
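For orientation, the posterior-probability reading of the local FDR mentioned in this abstract is usually written in terms of a two-groups mixture. The display below is the standard textbook form, not a formula quoted from the cited article; the symbols π0 (prior null proportion), f0 (null density), f1 (non-null density), and z (the observed statistic) are notation assumed here.

```latex
f(z) = \pi_0 f_0(z) + (1 - \pi_0)\, f_1(z),
\qquad
\operatorname{fdr}(z) = \Pr(\text{null} \mid z) = \frac{\pi_0 f_0(z)}{f(z)} .
```

A statistic z is then reported as a discovery when fdr(z) falls below a chosen threshold.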
In all likelihood statistical modelling and inference using likelihood
Pawitan, Yudi
2001-01-01
Based on a course in the theory of statistics this text concentrates on what can be achieved using the likelihood/Fisherian method of taking account of uncertainty when studying a statistical problem. It takes the concept of the likelihood as providing the best methods for unifying the demands of statistical modelling and the theory of inference. Every likelihood concept is illustrated by realistic examples, which are not compromised by computational problems. Examples range from a simple comparison of two accident rates, to complex studies that require generalised linear or semiparametric mode
Quantitative evaluation of statistical inference in resting state functional MRI.
Yang, Xue; Kang, Hakmook; Newton, Allen; Landman, Bennett A
2012-01-01
Modern statistical inference techniques may be able to improve the sensitivity and specificity of resting state functional MRI (rs-fMRI) connectivity analysis through more realistic characterization of distributional assumptions. In simulation, the advantages of such modern methods are readily demonstrable. However quantitative empirical validation remains elusive in vivo as the true connectivity patterns are unknown and noise/artifact distributions are challenging to characterize with high fidelity. Recent innovations in capturing finite sample behavior of asymptotically consistent estimators (i.e., SIMulation and EXtrapolation - SIMEX) have enabled direct estimation of bias given single datasets. Herein, we leverage the theoretical core of SIMEX to study the properties of inference methods in the face of diminishing data (in contrast to increasing noise). The stability of inference methods with respect to synthetic loss of empirical data (defined as resilience) is used to quantify the empirical performance of one inference method relative to another. We illustrate this new approach in a comparison of ordinary and robust inference methods with rs-fMRI.
Network inference via adaptive optimal design
Directory of Open Access Journals (Sweden)
Stigter Johannes D
2012-09-01
Background: Current research in network reverse engineering for genetic or metabolic networks very often does not include a proper experimental and/or input design. In this paper we address this issue in more detail and suggest a method that includes an iterative design of experiments based on the most recent data that become available. The presented approach allows a reliable reconstruction of the network and addresses an important issue, i.e., the analysis and the propagation of uncertainties as they exist in both the data and in our own knowledge. These two types of uncertainties have their immediate ramifications for the uncertainties in the parameter estimates and, hence, are taken into account from the very beginning of our experimental design. Findings: The method is demonstrated for two small networks that include a genetic network for mRNA synthesis and degradation and an oscillatory network describing a molecular network underlying adenosine 3’-5’ cyclic monophosphate (cAMP) as observed in populations of Dictyostelium cells. In both cases a substantial reduction in parameter uncertainty was observed. Extension to larger scale networks is possible but needs a more rigorous parameter estimation algorithm that includes sparsity as a constraint in the optimization procedure. Conclusion: We conclude that a careful experiment design very often (but not always) pays off in terms of reliability in the inferred network topology. For large scale networks a better parameter estimation algorithm is required that includes sparsity as an additional constraint. These algorithms are available in the literature and can also be used in an adaptive optimal design setting as demonstrated in this paper.
The NIRS Analysis Package: noise reduction and statistical inference.
Fekete, Tomer; Rubin, Denis; Carlson, Joshua M; Mujica-Parodi, Lilianne R
2011-01-01
Near infrared spectroscopy (NIRS) is a non-invasive optical imaging technique that can be used to measure cortical hemodynamic responses to specific stimuli or tasks. While analyses of NIRS data are normally adapted from established fMRI techniques, there are nevertheless substantial differences between the two modalities. Here, we investigate the impact of NIRS-specific noise; e.g., systemic (physiological), motion-related artifacts, and serial autocorrelations, upon the validity of statistical inference within the framework of the general linear model. We present a comprehensive framework for noise reduction and statistical inference, which is custom-tailored to the noise characteristics of NIRS. These methods have been implemented in a public domain Matlab toolbox, the NIRS Analysis Package (NAP). Finally, we validate NAP using both simulated and actual data, showing marked improvement in the detection power and reliability of NIRS.
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
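The recipe in this abstract (reduce the data to summary statistics, simulate to estimate their mean and covariance for given parameters, then evaluate a Gaussian "synthetic likelihood") can be sketched in a few lines. This is a toy illustration under stated assumptions, not the author's implementation: the simulator, the choice of two summary statistics (series mean and standard deviation), and all function names are hypothetical stand-ins.

```python
import math
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical toy simulator: a noisy autoregressive series driven by theta."""
    x, series = 0.0, []
    for _ in range(n):
        x = 0.5 * x + theta + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def summaries(series):
    """Reduce a raw series to two summary statistics (an illustrative choice)."""
    return [statistics.fmean(series), statistics.stdev(series)]


def gaussian_logpdf_2d(s, mu, cov):
    """Log density of a bivariate normal, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = s[0] - mu[0], s[1] - mu[1]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (quad + math.log(det)) - math.log(2.0 * math.pi)


def synthetic_loglik(theta, observed, n_sim=200, seed=0):
    """Synthetic log-likelihood of theta given an observed series."""
    rng = random.Random(seed)
    s_obs = summaries(observed)
    # Simulate replicate data sets and collect their summary statistics.
    sims = [summaries(simulate(theta, len(observed), rng)) for _ in range(n_sim)]
    mu = [statistics.fmean(col) for col in zip(*sims)]
    # Sample covariance matrix of the simulated summaries.
    cov = [
        [
            sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in sims) / (n_sim - 1)
            for j in range(2)
        ]
        for i in range(2)
    ]
    return gaussian_logpdf_2d(s_obs, mu, cov)


if __name__ == "__main__":
    data = simulate(1.0, 200, random.Random(42))  # stand-in for real observations
    for theta in (0.5, 1.0, 1.5, 3.0):
        print(theta, synthetic_loglik(theta, data))
```

In the abstract's workflow a function like `synthetic_loglik` would be the target explored by a Markov chain Monte Carlo sampler; the sampler itself is omitted here.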
Statistical inference for a class of multivariate negative binomial distributions
DEFF Research Database (Denmark)
Rubak, Ege Holger; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, and which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...... studied in the literature, while this is the first statistical paper on α-permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results for α...
High-dimensional statistical inference: From vector to matrix
Zhang, Anru
Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, delta kA nuclear norm minimization. Moreover, for any epsilon > 0, delta kA nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the chapter also have implications for other related statistical problems. An application to estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections. For the third part of the thesis, we consider another setting of low-rank matrix completion. Current literature
Gene regulatory network inference using out of equilibrium statistical mechanics.
Benecke, Arndt
2008-08-01
Spatiotemporal control of gene expression is fundamental to multicellular life. Despite prodigious efforts, the encoding of gene expression regulation in eukaryotes is not understood. Gene expression analyses nourish the hope to reverse engineer effector-target gene networks using inference techniques. Inference from noisy and circumstantial data relies on using robust models with few parameters for the underlying mechanisms. However, a systematic path to gene regulatory network reverse engineering from functional genomics data is still impeded by fundamental problems. Recently, Johannes Berg from the Theoretical Physics Institute of Cologne University has made two remarkable contributions that significantly advance the gene regulatory network inference problem. Berg, who uses gene expression data from yeast, has demonstrated a nonequilibrium regime for mRNA concentration dynamics and was able to map the gene regulatory process upon simple stochastic systems driven out of equilibrium. The impact of his demonstration is twofold, affecting both the understanding of the operational constraints under which transcription occurs and the capacity to extract relevant information from highly time-resolved expression data. Berg has used his observation to predict target genes of selected transcription factors, and thereby, in principle, demonstrated applicability of his out of equilibrium statistical mechanics approach to the gene network inference problem.
Indirect Fourier transform in the context of statistical inference.
Muthig, Michael; Prévost, Sylvain; Orglmeister, Reinhold; Gradzielski, Michael
2016-09-01
Inferring structural information from the intensity of a small-angle scattering (SAS) experiment is an ill-posed inverse problem. Thus, the determination of a solution is in general non-trivial. In this work, the indirect Fourier transform (IFT), which determines the pair distance distribution function from the intensity and hence yields structural information, is discussed within two different statistical inference approaches, namely a frequentist one and a Bayesian one, in order to determine a solution objectively. From the frequentist approach the cross-validation method is obtained as a good practical objective function for selecting an IFT solution. Moreover, modern machine learning methods are employed to suppress oscillatory behaviour of the solution, hence extracting only meaningful features of the solution. By comparing the results yielded by the different methods presented here, the reliability of the outcome can be improved and thus the approach should enable more reliable information to be deduced from SAS experiments.
DEFF Research Database (Denmark)
Møller, Jesper
.1 with the title ‘Inference'.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...
Statistical Inference for Big Data Problems in Molecular Biophysics
Energy Technology Data Exchange (ETDEWEB)
Ramanathan, Arvind [ORNL; Savol, Andrej [University of Pittsburgh School of Medicine, Pittsburgh PA; Burger, Virginia [University of Pittsburgh School of Medicine, Pittsburgh PA; Quinn, Shannon [University of Pittsburgh School of Medicine, Pittsburgh PA; Agarwal, Pratul K [ORNL; Chennubhotla, Chakra [University of Pittsburgh School of Medicine, Pittsburgh PA
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex and presently reaching petabyte scales, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Statistical Inference for Partially Linear Regression Models with Measurement Errors
Institute of Scientific and Technical Information of China (English)
Jinhong YOU; Qinfeng XU; Bin ZHOU
2008-01-01
In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly, a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and GCV method. Secondly, a goodness-of-fit test procedure is proposed, which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. As in "Variable selection via nonconcave penalized likelihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regularization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.
An Alternating Iterative Method and Its Application in Statistical Inference
Institute of Scientific and Technical Information of China (English)
Ning Zhong SHI; Guo Rong HU; Qing CUI
2008-01-01
This paper studies non-convex programming problems. It is known that, in statistical inference, many constrained estimation problems may be expressed as convex programming problems. However, in many practical problems, the objective functions are not convex. In this paper, we give a definition of a semi-convex objective function and discuss the corresponding non-convex programming problems. A two-step iterative algorithm called the alternating iterative method is proposed for finding solutions for such problems. The method is illustrated by three examples in constrained estimation problems given in Sasabuchi et al. (Biometrika, 72, 465–472 (1983)), Shi N. Z. (J. Multivariate Anal., 50, 282–293 (1994)) and El Barmi H. and Dykstra R. (Ann. Statist., 26, 1878–1893 (1998)).
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
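The idea of updating a linear-model fit as blocks arrive, without revisiting historical data, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single covariate with an intercept and full-rank data, and the class name `OnlineSimpleOLS` and its methods are hypothetical. Only running sums are stored, so each block can be discarded after it is folded in.

```python
from dataclasses import dataclass


@dataclass
class OnlineSimpleOLS:
    """Running sufficient statistics for least-squares fit of y = a + b*x.

    Hypothetical illustration: one covariate, no rank-deficiency handling.
    """
    n: int = 0
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x*x
    sxy: float = 0.0  # sum of x*y

    def update(self, xs, ys):
        """Fold one incoming block of (x, y) pairs into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (intercept, slope) computed from the accumulated sums."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope
```

Calling `update` once per arriving block and `coefficients` at any time gives the same estimate as a batch fit on all data seen so far, because the normal equations depend on the data only through these sums.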
Bayesian inference on the sphere beyond statistical isotropy
Das, Santanu; Souradeep, Tarun
2015-01-01
We present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method as a principled approach to assess violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. SI violation in observed CMB maps arises due to known physical effects such as Doppler boost and weak lensing; yet unknown theoretical possibilities like cosmic topology and subtle violations of the cosmological principle; as well as expected observational artefacts of scanning the sky with a non-circular beam, masking, foreground residuals, anisotropic noise, etc. We explicitly demonstrate the recovery of the input SI violation signals with their full statistics in simulated CMB maps. Our formalism easily adapts to exploring parametric physical models with non-SI covariance, as we illustrate for the in...
Statistical inference to advance network models in epidemiology.
Welch, David; Bansal, Shweta; Hunter, David R
2011-03-01
Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight on the role that heterogeneity in host contact patterns plays on infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data.
Terminal-Dependent Statistical Inference for the FBSDEs Models
Directory of Open Access Journals (Sweden)
Yunquan Song
2014-01-01
The original stochastic differential equations (OSDEs) and forward-backward stochastic differential equations (FBSDEs) are often used to model complex dynamic processes that arise in financial, ecological, and many other areas. The main difference between OSDEs and FBSDEs is that the latter are designed to depend on a terminal condition, which is a key factor in some financial and ecological circumstances. It is interesting but challenging to estimate FBSDE parameters from noisy data and the terminal condition. However, to the best of our knowledge, the terminal-dependent statistical inference for such a model has not been explored in the existing literature. We propose a nonparametric terminal control variables estimation method to address this problem. The reason why we use the terminal control variables is that the newly proposed inference procedures inherit the terminal-dependent characteristic. Through this newly proposed method, the estimators of the functional coefficients of the FBSDEs model are obtained. The asymptotic properties of the estimators are also discussed. Simulation studies show that the proposed method gives satisfying estimates for the FBSDE parameters from noisy data and the terminal condition. A simulation is performed to test the feasibility of our method.
Algebraic Statistical Model for Biochemical Network Dynamics Inference.
Linder, Daniel F; Rempala, Grzegorz A
2013-12-01
With modern molecular quantification methods, such as high-throughput sequencing, biologists may perform multiple complex experiments and collect longitudinal data on RNA and DNA concentrations. Such data may then be used to infer cellular-level interactions between the molecular entities of interest. One method which formalizes such inference is the stoichiometric algebraic statistical model (SASM) of [2], which allows one to analyze the so-called conic (or single source) networks. Despite its intuitive appeal, up until now the SASM has been only heuristically studied on a few simple examples. The current paper provides a more formal mathematical treatment of the SASM, expanding the original model to a wider class of reaction systems decomposable into multiple conic subnetworks. In particular, it is proved here that on such networks the SASM enjoys the so-called sparsistency property, that is, it asymptotically (with the number of observed network trajectories) discards the false interactions by setting their reaction rates to zero. For illustration, we apply the extended SASM to in silico data from a generic decomposable network as well as to biological data from an experimental search for a possible transcription factor for the heat shock protein 70 (Hsp70) in the zebrafish retina.
An argument for mechanism-based statistical inference in cancer.
Geman, Donald; Ochs, Michael; Price, Nathan D; Tomasetti, Cristian; Younes, Laurent
2015-05-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning biomarkers, metabolism, cell signaling, network inference and tumorigenesis.
Multiple Illuminant Colour Estimation via Statistical Inference on Factor Graphs.
Mutimbu, Lawrence; Robles-Kelly, Antonio
2016-08-31
This paper presents a method to recover a spatially varying illuminant colour estimate from scenes lit by multiple light sources. Starting with the image formation process, we formulate the illuminant recovery problem in a statistically data-driven setting. To do this, we use a factor graph defined across the scale space of the input image. In the graph, we utilise a set of illuminant prototypes computed using a data-driven approach. As a result, our method delivers a pixelwise illuminant colour estimate without relying on libraries or user input. The use of a factor graph also allows the illuminant estimates to be recovered using a maximum a posteriori (MAP) inference process. Moreover, we compute the probability marginals by performing a Delaunay triangulation on our factor graph. We illustrate the utility of our method for pixelwise illuminant colour recovery on widely available datasets and compare against a number of alternatives. We also show sample colour correction results on real-world images.
Recent Advances in System Reliability Signatures, Multi-state Systems and Statistical Inference
Frenkel, Ilia
2012-01-01
Recent Advances in System Reliability discusses developments in modern reliability theory such as signatures, multi-state systems and statistical inference. It describes the latest achievements in these fields, and covers the application of these achievements to reliability engineering practice. The chapters cover a wide range of new theoretical subjects and have been written by leading experts in reliability theory and its applications. The topics include: concepts and different definitions of signatures (D-spectra), their properties and applications to reliability of coherent systems and network-type structures; Lz-transform of Markov stochastic process and its application to multi-state system reliability analysis; methods for cost-reliability and cost-availability analysis of multi-state systems; optimal replacement and protection strategy; and statistical inference. Recent Advances in System Reliability presents many examples to illustrate the theoretical results. Real world multi-state systems...
Algorithm Optimally Orders Forward-Chaining Inference Rules
James, Mark
2008-01-01
People typically develop knowledge bases in a somewhat ad hoc manner by incrementally adding rules with no specific organization. This often results in a very inefficient execution of those rules since they are so often order sensitive. This is relevant to tasks like Deep Space Network in that it allows the knowledge base to be incrementally developed and have it automatically ordered for efficiency. Although data flow analysis was first developed for use in compilers for producing optimal code sequences, its usefulness is now recognized in many software systems including knowledge-based systems. However, this approach for exhaustively computing data-flow information cannot directly be applied to inference systems because of the ubiquitous execution of the rules. An algorithm is presented that efficiently performs a complete producer/consumer analysis for each antecedent and consequence clause in a knowledge base to optimally order the rules to minimize inference cycles. An algorithm was developed that optimally orders a knowledge base composed of forward-chaining inference rules such that independent inference cycle executions are minimized, thus resulting in significantly faster execution. This algorithm was integrated into the JPL tool Spacecraft Health Inference Engine (SHINE) for verification and it resulted in a significant reduction in inference cycles for what was previously considered an ordered knowledge base. For a knowledge base that is completely unordered, the improvement is much greater.
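A producer/consumer ordering of rules can be sketched as a topological sort. This is an illustrative assumption about how such an ordering might work, not the SHINE algorithm: the function `order_rules` and the rule representation (a name mapped to the facts it consumes and the facts it produces) are hypothetical. A rule is scheduled after every rule that produces a fact it consumes, so one forward pass can fire a whole chain.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_rules(rules):
    """Order rules so producers of a fact come before its consumers.

    rules: dict mapping rule name -> (consumed facts, produced facts).
    Raises graphlib.CycleError if the rules depend on each other cyclically.
    """
    # Map each fact to the set of rules that produce it.
    producers = {}
    for name, (_, produced) in rules.items():
        for fact in produced:
            producers.setdefault(fact, set()).add(name)

    # A rule depends on every other rule that produces one of its inputs.
    graph = {}
    for name, (consumed, _) in rules.items():
        deps = set()
        for fact in consumed:
            deps |= producers.get(fact, set())
        deps.discard(name)  # ignore self-dependencies
        graph[name] = deps

    return list(TopologicalSorter(graph).static_order())
```

For a knowledge base entered as `r3` (consumes `c`), `r1` (consumes `a`, produces `b`), `r2` (consumes `b`, produces `c`), the sketch returns the rules in the order `r1`, `r2`, `r3`, whereas the entry order would need several passes to propagate `a` through to `d`.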
Inferring Master Painters' Esthetic Biases from the Statistics of Portraits
Aleem, Hassan; Correa-Herran, Ivan; Grzywacz, Norberto M.
2017-01-01
The Processing Fluency Theory posits that the ease of sensory information processing in the brain facilitates esthetic pleasure. Accordingly, the theory would predict that master painters should display biases toward visual properties such as symmetry, balance, and moderate complexity. Have these biases been occurring and if so, have painters been optimizing these properties (fluency variables)? Here, we address these questions with statistics of portrait paintings from the Early Renaissance period. To do this, we first developed different computational measures for each of the aforementioned fluency variables. Then, we measured their statistics in 153 portraits from 26 master painters, in 27 photographs of people in three controlled poses, and in 38 quickly snapped photographs of individual persons. A statistical comparison between Early Renaissance portraits and quickly snapped photographs revealed that painters showed a bias toward balance, symmetry, and moderate complexity. However, a comparison between portraits and controlled-pose photographs showed that painters did not optimize each of these properties. Instead, different painters presented biases toward different, narrow ranges of fluency variables. Further analysis suggested that the painters' individuality stemmed in part from having to resolve the tension between complexity vs. symmetry and balance. We additionally found that constraints on the use of different painting materials by distinct painters modulated these fluency variables systematically. In conclusion, the Processing Fluency Theory of Esthetic Pleasure would need expansion if we were to apply it to the history of visual art since it cannot explain the lack of optimization of each fluency variable. To expand the theory, we propose the existence of a Neuroesthetic Space, which encompasses the possible values that each of the fluency variables can reach in any given art period. We discuss the neural mechanisms of this Space and propose that it
Statistical Inference for Point Process Models of Rainfall
Smith, James A.; Karr, Alan F.
1985-01-01
In this paper we develop maximum likelihood procedures for parameter estimation and model selection that apply to a large class of point process models that have been used to model rainfall occurrences, including Cox processes, Neyman-Scott processes, and renewal processes. The statistical inference procedures are based on the stochastic intensity $\lambda(t) = \lim_{s \to 0,\, s > 0} (1/s)\, E[N(t+s) - N(t) \mid N(u),\, u < t]$. The likelihood function of a point process is shown to have a simple expression in terms of the stochastic intensity. The main result of this paper is a recursive procedure for computing stochastic intensities; the procedure is applicable to a broad class of point process models, including renewal Cox processes with Markovian intensity processes and an important class of Neyman-Scott processes. The model selection procedure we propose, which is based on likelihood ratios, allows direct comparison of two classes of point processes to determine which provides a better model for a given data set. The estimation and model selection procedures are applied to two data sets of simulated Cox process arrivals and a data set of daily rainfall occurrences in the Potomac River basin.
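For orientation, the usual textbook form of a point process log-likelihood in terms of a stochastic intensity is given below; the paper's exact expression may differ in notation or conditioning. The observation window $[0, T]$ and the event times $t_1, \dots, t_n$ are notation assumed here, not taken from the abstract.

```latex
% Log-likelihood of a point process N observed on [0, T]
% with stochastic intensity \lambda(t) and event times t_1 < ... < t_n.
\log L
  = \int_0^T \log \lambda(t)\, dN(t) - \int_0^T \lambda(t)\, dt
  = \sum_{i=1}^{n} \log \lambda(t_i) - \int_0^T \lambda(t)\, dt
```

A likelihood ratio between two candidate models is then the difference of two such expressions evaluated on the same event times, which is what makes a direct comparison of model classes possible.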
Multivariate Statistical Inference of Lightning Occurrence, and Using Lightning Observations
Boccippio, Dennis
2004-01-01
Two classes of multivariate statistical inference using TRMM Lightning Imaging Sensor, Precipitation Radar, and Microwave Imager observations are studied, using nonlinear classification neural networks as inferential tools. The very large and globally representative data sample provided by TRMM allows both training and validation (without overfitting) of neural networks with many degrees of freedom. In the first study, the flashing/non-flashing condition of storm complexes is diagnosed using radar, passive microwave and/or environmental observations as neural network inputs. The diagnostic skill of these simple lightning/no-lightning classifiers can be quite high over land (above 80% Probability of Detection; below 20% False Alarm Rate). In the second, passive microwave and lightning observations are used to diagnose radar reflectivity vertical structure. A priori diagnosis of hydrometeor vertical structure is highly important for improved rainfall retrieval from either orbital radars (e.g., the future Global Precipitation Mission "mothership") or radiometers (e.g., operational SSM/I and future Global Precipitation Mission passive microwave constellation platforms); we explore the incremental benefit to such diagnosis provided by lightning observations.
DEFF Research Database (Denmark)
Møller, Jesper
2010-01-01
Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with markov chain Monte Carlo...
Physics of epigenetic landscapes and statistical inference by cells
Lang, Alex H.
Biology is currently in the midst of a revolution. Great technological advances have led to unprecedented quantitative data at the whole genome level. However, new techniques are needed to deal with this deluge of high-dimensional data. Therefore, statistical physics has the potential to help develop systems biology level models that can incorporate complex data. Additionally, physicists have made great strides in understanding non-equilibrium thermodynamics. However, the consequences of these advances have yet to be fully incorporated into biology. There are three specific problems that I address in my dissertation. First, a common metaphor for describing development is a rugged "epigenetic landscape" where cell fates are represented as attracting valleys resulting from a complex regulatory network. I introduce a framework for explicitly constructing epigenetic landscapes that combines genomic data with techniques from spin-glass physics. The model reproduces known reprogramming protocols and identifies candidate transcription factors for reprogramming to novel cell fates, suggesting epigenetic landscapes are a powerful paradigm for understanding cellular identity. Second, I examine the dynamics of cellular reprogramming. By reanalyzing all available time-series data, I show that gene expression dynamics during reprogramming follow a simple one-dimensional reaction coordinate that is independent of both the time and details of experimental protocol used. I show that such a reaction coordinate emerges naturally from epigenetic landscape models of cell identity where cellular reprogramming is viewed as a "barrier-crossing" between the starting and ending cell fates. Overall, the analysis and model suggest that gene expression dynamics during reprogramming follow a canonical trajectory consistent with the idea of an "optimal path" in gene expression space for reprogramming. Third, an important task of cells is to perform complex computations in response to
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
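The a priori sample size calculation mentioned above can be sketched for one common case. This is a minimal illustration under stated assumptions, not a procedure from the article: a two-sided comparison of two independent group means with equal group sizes, a standardized effect size (Cohen's d), and the normal approximation to the test statistic. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (normal approximation).

    effect_size: standardized mean difference (Cohen's d), must be > 0.
    alpha: two-sided type I error rate fixed in advance.
    power: 1 - type II error rate fixed in advance.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile matching desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)
```

With the conventional choices of a 5% type I error rate and 80% power, `n_per_group(0.5)` gives 63 per group; an exact calculation based on the t distribution gives a slightly larger number, so the sketch is best read as a lower bound.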
Building Intuitions about Statistical Inference Based on Resampling
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…
Parametric statistical inference for discretely observed diffusion processes
DEFF Research Database (Denmark)
Pedersen, Asger Roer
Part 1: Theoretical results. Part 2: Statistical applications of Gaussian diffusion processes in freshwater ecology...
Some challenges with statistical inference in adaptive designs.
Hung, H M James; Wang, Sue-Jane; Yang, Peiling
2014-01-01
Adaptive designs have generated a great deal of attention in clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.
Approximation of epidemic models by diffusion processes and their statistical inference.
Guy, Romain; Larédo, Catherine; Vergu, Elisabeta
2015-02-01
Multidimensional continuous-time Markov jump processes [Formula: see text] on [Formula: see text] form a usual set-up for modeling [Formula: see text]-like epidemics. However, when facing incomplete epidemic data, inference based on [Formula: see text] is not easy to achieve. Here, we start building a new framework for the estimation of key parameters of epidemic models based on statistics of diffusion processes approximating [Formula: see text]. First, previous results on the approximation of density-dependent [Formula: see text]-like models by diffusion processes with small diffusion coefficient [Formula: see text], where [Formula: see text] is the population size, are generalized to non-autonomous systems. Second, our previous inference results on discretely observed diffusion processes with small diffusion coefficient are extended to time-dependent diffusions. Consistent and asymptotically Gaussian estimates are obtained for a fixed number [Formula: see text] of observations, which corresponds to the epidemic context, and for [Formula: see text]. A correction term, which yields better estimates non-asymptotically, is also included. Finally, the performance and robustness of our estimators with respect to various parameters such as [Formula: see text] (the basic reproduction number), [Formula: see text], [Formula: see text] are investigated in simulations. Two models, [Formula: see text] and [Formula: see text], corresponding to single and recurrent outbreaks, respectively, are used to simulate data. The findings indicate that our estimators have good asymptotic properties and behave noticeably well for realistic numbers of observations and population sizes. This study lays the foundations of a generic inference method currently under extension to incompletely observed epidemic data. Indeed, contrary to the majority of current inference techniques for partially observed processes, which necessitate computer-intensive simulations, our method being mostly an
Receptor arrays optimized for natural odor statistics.
Zwicker, David; Murugan, Arvind; Brenner, Michael P
2016-05-17
Natural odors typically consist of many molecules at different concentrations. It is unclear how the numerous odorant molecules and their possible mixtures are discriminated by relatively few olfactory receptors. Using an information theoretic model, we show that a receptor array is optimal for this task if it achieves two possibly conflicting goals: (i) Each receptor should respond to half of all odors and (ii) the response of different receptors should be uncorrelated when averaged over odors presented with natural statistics. We use these design principles to predict statistics of the affinities between receptors and odorant molecules for a broad class of odor statistics. We also show that optimal receptor arrays can be tuned to either resolve concentrations well or distinguish mixtures reliably. Finally, we use our results to predict properties of experimentally measured receptor arrays. Our work can thus be used to better understand natural olfaction, and it also suggests ways to improve artificial sensor arrays.
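One way to see why responding to half of all odors is a natural target is through the entropy of a single binary receptor. The sketch below is a simplification assumed here, not the authors' model: it treats a receptor as an on/off unit that is active for a fraction `p` of odors, and the function name `binary_entropy` is hypothetical. The information such a unit can carry is capped by its entropy, which peaks at one bit when `p` is one half.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of an on/off receptor active with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # a receptor that never or always fires carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)
```

If, in addition, different receptors respond independently, the entropies of the individual receptors add, which is the intuition behind the second design goal of uncorrelated responses.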
Receptor arrays optimized for natural odor statistics
Zwicker, David; Brenner, Michael P
2016-01-01
Natural odors typically consist of many molecules at different concentrations. It is unclear how the numerous odorant molecules and their possible mixtures are discriminated by relatively few olfactory receptors. Using an information-theoretic model, we show that a receptor array is optimal for this task if it achieves two possibly conflicting goals: (i) each receptor should respond to half of all odors and (ii) the response of different receptors should be uncorrelated when averaged over odors presented with natural statistics. We use these design principles to predict statistics of the affinities between receptors and odorant molecules for a broad class of odor statistics. We also show that optimal receptor arrays can be tuned to either resolve concentrations well or distinguish mixtures reliably. Finally, we use our results to predict properties of experimentally measured receptor arrays. Our work can thus be used to better understand natural olfaction and it also suggests ways to improve artificial sensor...
For and Against Methodologies: Some Perspectives on Recent Causal and Statistical Inference Debates.
Greenland, Sander
2017-01-01
I present an overview of two methods controversies that are central to analysis and inference: That surrounding causal modeling as reflected in the "causal inference" movement, and that surrounding null bias in statistical methods as applied to causal questions. Human factors have expanded what might otherwise have been narrow technical discussions into broad philosophical debates. There seem to be misconceptions about the requirements and capabilities of formal methods, especially in notions that certain assumptions or models (such as potential-outcome models) are necessary or sufficient for valid inference. I argue that, once these misconceptions are removed, most elements of the opposing views can be reconciled. The chief problem of causal inference then becomes one of how to teach sound use of formal methods (such as causal modeling, statistical inference, and sensitivity analysis), and how to apply them without generating the overconfidence and misinterpretations that have ruined so many statistical practices.
Expert Oracle SQL optimization, deployment, and statistics
Hasler, Tony
2014-01-01
Expert Oracle SQL: Optimization, Deployment, and Statistics is about optimizing individual SQL statements, especially on production database systems. This Oracle-specific book begins by assuming you have already identified a particular SQL statement and are considering taking steps to improve its performance. The book describes a systematic process by which to diagnose a problem statement, identify a fix, and implement that fix safely in a production system. You'll learn not only to improve performance when it is too slow, but also to stabilize performance when it is too variable. You'll learn about system statistics and how the Cost-Based Optimizer uses them to determine a suitable execution plan for a given statement. That knowledge provides the foundation from which to identify the root cause, and to stabilize and improve performance. The next step after identifying a problem and its underlying root cause is to put a solution in place. Expert Oracle SQL: Optimization, Deployment, and Statistics explains how to ...
Moraes, Alvaro
2015-01-01
Epidemics have shaped, sometimes more than wars and natural disasters, demographic aspects of human populations around the world, their health habits and their economies. Ebola and the Middle East Respiratory Syndrome (MERS) are clear and current examples of potential hazards at planetary scale. During the spread of an epidemic disease, there are phenomena, like the sudden extinction of the epidemic, that cannot be captured by deterministic models. As a consequence, stochastic models have been proposed during the last decades. A typical forward problem in the stochastic setting could be the approximation of the expected number of infected individuals one month from now. On the other hand, a typical inverse problem could be, given a discretely observed set of epidemiological data, to infer the transmission rate of the epidemic or its basic reproduction number. Markovian epidemic models are stochastic models belonging to a wide class of pure jump processes known as Stochastic Reaction Networks (SRNs), which are intended to describe the time evolution of interacting particle systems where one particle interacts with the others through a finite set of reaction channels. SRNs have been mainly developed to model biochemical reactions but they also have applications in neural networks, virus kinetics, and dynamics of social networks, among others. This PhD thesis is focused on novel fast simulation algorithms and statistical inference methods for SRNs. Our novel Multi-level Monte Carlo (MLMC) hybrid simulation algorithms provide accurate estimates of expected values of a given observable of SRNs at a prescribed final time. They are designed to control the global approximation error up to a user-selected accuracy and up to a certain confidence level, and with near optimal computational work. We also present novel dual-weighted residual expansions for fast estimation of weak and strong errors arising from the MLMC methodology. Regarding the statistical inference
Inference Based on Simple Step Statistics for the Location Model.
1981-07-01
function. Let T_{N,k}(θ) = Σ_i a_k(i) V_i(θ). Then T_{N,k} is called the k-step statistic. Noether (1973) studied the 1-step statistic with particular emphasis on...opposed to the sign statistic. These latter two comparisons were first discussed by Noether (1973) in a somewhat different setting. Notice that the...obtained by Noether (1973). If k = 3, we seek the (C + 1)'st and (2N - b_1 - b_2 - C)'th ordered Walsh averages in D The algorithm of Section 3 modified to
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Statistical Inferences from Formaldehyde Dna-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Statistical Inference Models for Image Datasets with Systematic Variations.
Kim, Won Hwa; Bendlin, Barbara B; Chung, Moo K; Johnson, Sterling C; Singh, Vikas
2015-06-01
Statistical analysis of longitudinal or cross-sectional brain imaging data to identify effects of neurodegenerative diseases is a fundamental task in various studies in neuroscience. However, when there are systematic variations in the images due to parameter changes such as changes in the scanner protocol, hardware changes, or when combining data from multi-site studies, the statistical analysis becomes problematic. Motivated by this scenario, the goal of this paper is to develop a unified statistical solution to the problem of systematic variations in statistical image analysis. Based in part on recent literature in harmonic analysis on diffusion maps, we propose an algorithm which compares operators that are resilient to the systematic variations. These operators are derived from the empirical measurements of the image data and provide an efficient surrogate to capturing the actual changes across images. We also establish a connection between our method and the design of wavelets in non-Euclidean space. To evaluate the proposed ideas, we present various experimental results on detecting changes in simulations as well as show how the method offers improved statistical power in the analysis of real longitudinal PIB-PET imaging data acquired from participants at risk for Alzheimer's disease (AD).
Statistical Inference Models for Image Datasets with Systematic Variations
Kim, Won Hwa; Bendlin, Barbara B.; Chung, Moo K.; Johnson, Sterling C.; Singh, Vikas
2016-01-01
Statistical analysis of longitudinal or cross-sectional brain imaging data to identify effects of neurodegenerative diseases is a fundamental task in various studies in neuroscience. However, when there are systematic variations in the images due to parameter changes such as changes in the scanner protocol, hardware changes, or when combining data from multi-site studies, the statistical analysis becomes problematic. Motivated by this scenario, the goal of this paper is to develop a unified statistical solution to the problem of systematic variations in statistical image analysis. Based in part on recent literature in harmonic analysis on diffusion maps, we propose an algorithm which compares operators that are resilient to the systematic variations. These operators are derived from the empirical measurements of the image data and provide an efficient surrogate to capturing the actual changes across images. We also establish a connection between our method and the design of wavelets in non-Euclidean space. To evaluate the proposed ideas, we present various experimental results on detecting changes in simulations as well as show how the method offers improved statistical power in the analysis of real longitudinal PIB-PET imaging data acquired from participants at risk for Alzheimer’s disease (AD). PMID:26989336
Contrasting Diversity Values: Statistical Inferences Based on Overlapping Confidence Intervals
MacGregor-Fors, Ian; Payton, Mark E.
2013-01-01
Ecologists often contrast diversity (species richness and abundances) using tests for comparing means or indices. However, many popular software applications do not support performing standard inferential statistics for estimates of species richness and/or density. In this study we simulated the behavior of asymmetric log-normal confidence intervals and determined an interval level that mimics statistical tests with P(α) = 0.05 when confidence intervals from two distributions do not overlap. Our results show that 84% confidence intervals robustly mimic 0.05 statistical tests for asymmetric confidence intervals, as has been demonstrated for symmetric ones in the past. Finally, we provide detailed user-guides for calculating 84% confidence intervals in two of the most robust and highly-used freeware related to diversity measurements for wildlife (i.e., EstimateS, Distance). PMID:23437239
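The 84% figure can be motivated with a short calculation for the simpler symmetric, normal-theory case, which the abstract cites as the previously demonstrated result. The sketch below is not the authors' simulation of asymmetric log-normal intervals. The function name `matching_ci_level` and the parameter `se_ratio` are hypothetical, and it assumes two independent estimates with known standard errors.

```python
from math import sqrt
from statistics import NormalDist


def matching_ci_level(alpha=0.05, se_ratio=1.0):
    """Confidence level at which 'the two intervals just fail to overlap'
    coincides with a two-sided z-test of size alpha.

    se_ratio is SE2 / SE1 for the two independent estimates (assumed known).
    """
    z_test = NormalDist().inv_cdf(1 - alpha / 2)
    # The z-test rejects when |d| > z_test * sqrt(SE1^2 + SE2^2).
    # The intervals separate when |d| > z_ci * (SE1 + SE2).
    # Equating the two thresholds gives z_ci.
    z_ci = z_test * sqrt(1 + se_ratio ** 2) / (1 + se_ratio)
    return 2 * NormalDist().cdf(z_ci) - 1


level = matching_ci_level()
print(f"matching CI level for equal standard errors: {level:.1%}")
```

With equal standard errors the matching level comes out between 83% and 84%. As the standard errors become very unequal, it rises toward 95%, so the 84% rule is best read as a rule of thumb for comparably precise estimates.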
Bayer, Christian
2016-02-20
In this work, we present an extension of the forward–reverse representation introduced by Bayer and Schoenmakers (Annals of Applied Probability, 24(5):1994–2032, 2014) to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, that is, SRNs conditional on their values in the extremes of given time intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the expectation-maximization algorithm to the phase I output. By selecting a set of overdispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.
Vilanova, Pedro
2016-01-07
In this work, we present an extension of the forward-reverse representation introduced in "Simulation of forward-reverse stochastic representations for conditional diffusions", a 2014 paper by Bayer and Schoenmakers, to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, i.e., SRNs conditional on their values in the extremes of given time-intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the Expectation-Maximization algorithm to the phase I output. By selecting a set of over-dispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
Statistical Inference and Simulation with StatKey
Quinn, Anne
2016-01-01
While looking for an inexpensive technology package to help students in statistics classes, the author found StatKey, a free Web-based app. Not only is StatKey useful for students' year-end projects, but it is also valuable for helping students learn fundamental content such as the central limit theorem. Using StatKey, students can engage in…
Technology Focus: Using Technology to Explore Statistical Inference
Garofalo, Joe; Juersivich, Nicole
2007-01-01
There is much research that documents what many teachers know, that students struggle with many concepts in probability and statistics. This article presents two sample activities the authors use to help preservice teachers develop ideas about how they can use technology to promote their students' ability to understand mathematics and connect…
Specificity and timescales of cortical adaptation as inferences about natural movie statistics
Snow, Michoel; Coen-Cagli, Ruben; Schwartz, Odelia
2016-01-01
Adaptation is a phenomenological umbrella term under which a variety of temporal contextual effects are grouped. Previous models have shown that some aspects of visual adaptation reflect optimal processing of dynamic visual inputs, suggesting that adaptation should be tuned to the properties of natural visual inputs. However, the link between natural dynamic inputs and adaptation is poorly understood. Here, we extend a previously developed Bayesian modeling framework for spatial contextual effects to the temporal domain. The model learns temporal statistical regularities of natural movies and links these statistics to adaptation in primary visual cortex via divisive normalization, a ubiquitous neural computation. In particular, the model divisively normalizes the present visual input by the past visual inputs only to the degree that these are inferred to be statistically dependent. We show that this flexible form of normalization reproduces classical findings on how brief adaptation affects neuronal selectivity. Furthermore, prior knowledge acquired by the Bayesian model from natural movies can be modified by prolonged exposure to novel visual stimuli. We show that this updating can explain classical results on contrast adaptation. We also simulate the recent finding that adaptation maintains population homeostasis, namely, a balanced level of activity across a population of neurons with different orientation preferences. Consistent with previous disparate observations, our work further clarifies the influence of stimulus-specific and neuronal-specific normalization signals in adaptation. PMID:27699416
Specificity and timescales of cortical adaptation as inferences about natural movie statistics.
Snow, Michoel; Coen-Cagli, Ruben; Schwartz, Odelia
2016-10-01
Adaptation is a phenomenological umbrella term under which a variety of temporal contextual effects are grouped. Previous models have shown that some aspects of visual adaptation reflect optimal processing of dynamic visual inputs, suggesting that adaptation should be tuned to the properties of natural visual inputs. However, the link between natural dynamic inputs and adaptation is poorly understood. Here, we extend a previously developed Bayesian modeling framework for spatial contextual effects to the temporal domain. The model learns temporal statistical regularities of natural movies and links these statistics to adaptation in primary visual cortex via divisive normalization, a ubiquitous neural computation. In particular, the model divisively normalizes the present visual input by the past visual inputs only to the degree that these are inferred to be statistically dependent. We show that this flexible form of normalization reproduces classical findings on how brief adaptation affects neuronal selectivity. Furthermore, prior knowledge acquired by the Bayesian model from natural movies can be modified by prolonged exposure to novel visual stimuli. We show that this updating can explain classical results on contrast adaptation. We also simulate the recent finding that adaptation maintains population homeostasis, namely, a balanced level of activity across a population of neurons with different orientation preferences. Consistent with previous disparate observations, our work further clarifies the influence of stimulus-specific and neuronal-specific normalization signals in adaptation.
Probably not future prediction using probability and statistical inference
Dworsky, Lawrence N
2008-01-01
An engaging, entertaining, and informative introduction to probability and prediction in our everyday lives. Although Probably Not deals with probability and statistics, it is not heavily mathematical and is not filled with complex derivations, proofs, and theoretical problem sets. This book unveils the world of statistics through questions such as what is known based upon the information at hand and what can be expected to happen. While learning essential concepts including "the confidence factor" and "random walks," readers will be entertained and intrigued as they move from chapter to chapter. Moreover, the author provides a foundation of basic principles to guide decision making in almost all facets of life including playing games, developing winning business strategies, and managing personal finances. Much of the book is organized around easy-to-follow examples that address common, everyday issues such as: How travel time is affected by congestion, driving speed, and traffic lights; Why different gambling ...
Statistical Inference of Biometrical Genetic Model With Cultural Transmission.
Guo, Xiaobo; Ji, Tian; Wang, Xueqin; Zhang, Heping; Zhong, Shouqiang
2013-01-01
Twin and family studies establish the foundation for studying the genetic, environmental and cultural transmission effects for phenotypes. In this work, we make use of the well established statistical methods and theory for mixed models to assess cultural transmission in twin and family studies. Specifically, we address two critical yet poorly understood issues: the model identifiability in assessing cultural transmission for twin and family data and the biases in the estimates when sub-models are used. We apply our models and theory to two real data sets. A simulation is conducted to verify the bias in the estimates of genetic effects when the working model is a sub-model.
Modeling urban air pollution with optimized hierarchical fuzzy inference system.
Tashayo, Behnam; Alimohammadi, Abbas
2016-10-01
Environmental exposure assessments (EEA) and epidemiological studies require urban air pollution models with appropriate spatial and temporal resolutions. Uncertain available data and inflexible models can limit air pollution modeling techniques, particularly in developing countries. This paper develops a hierarchical fuzzy inference system (HFIS) to model air pollution under different land use, transportation, and meteorological conditions. To improve performance, the system treats the issue as a large-scale and high-dimensional problem and develops the proposed model using a three-step approach. In the first step, a geospatial information system (GIS) and probabilistic methods are used to preprocess the data. In the second step, a hierarchical structure is generated based on the problem. In the third step, the accuracy and complexity of the model are simultaneously optimized with a multiple objective particle swarm optimization (MOPSO) algorithm. We examine the capabilities of the proposed model for predicting daily and annual mean PM2.5 and NO2 and compare the accuracy of the results with representative models from existing literature. The benefits provided by the model features, including probabilistic preprocessing, multi-objective optimization, and hierarchical structure, are precisely evaluated by comparing five different consecutive models in terms of accuracy and complexity criteria. Fivefold cross validation is used to assess the performance of the generated models. The respective average RMSEs and coefficients of determination (R²) for the test datasets using the proposed model are as follows: daily PM2.5 = (8.13, 0.78), annual mean PM2.5 = (4.96, 0.80), daily NO2 = (5.63, 0.79), and annual mean NO2 = (2.89, 0.83). The obtained results demonstrate that the developed hierarchical fuzzy inference system can be utilized for modeling air pollution in EEA and epidemiological studies.
Inferences about time course of Weber's Law violate statistical principles.
Foster, Rachel M; Franz, Volker H
2013-01-15
Recently, Holmes et al. (2011b) suggested that grasping is only subject to Weber's Law at early but not late points of a grasping movement. They therefore conclude that distinct visual computations and information may guide early and late portions of grasping. Here, we argue that their results can be explained by an interesting statistical artifact, and cannot be considered indicative of the presence or absence of Weber's Law during early portions of grasping. Our argument has implications for other studies using similar methodology (e.g., Heath et al., 2011, Holmes et al., 2011a, 2012), and also for the analysis of temporal data (often called time series) in general.
Statistical Inference and Sensitivity to Sampling in 11-Month-Old Infants
Xu, Fei; Denison, Stephanie
2009-01-01
Research on initial conceptual knowledge and research on early statistical learning mechanisms have been, for the most part, two separate enterprises. We report a study with 11-month-old infants investigating whether they are sensitive to sampling conditions and whether they can integrate intentional information in a statistical inference task.…
Statistical inference from capture data on closed animal populations
Otis, David L.; Burnham, Kenneth P.; White, Gary C.; Anderson, David R.
1978-01-01
The estimation of animal abundance is an important problem in both the theoretical and applied biological sciences. Serious work to develop estimation methods began during the 1950s, with a few attempts before that time. The literature on estimation methods has increased tremendously during the past 25 years (Cormack 1968, Seber 1973). However, in large part, the problem remains unsolved. Past efforts toward comprehensive and systematic estimation of density (D) or population size (N) have been inadequate, in general. While more than 200 papers have been published on the subject, one is generally left without a unified approach to the estimation of abundance of an animal population. This situation is unfortunate because a number of pressing research problems require such information. In addition, a wide array of environmental assessment studies and biological inventory programs require the estimation of animal abundance. These needs have been further emphasized by the requirement for the preparation of Environmental Impact Statements imposed by the National Environmental Policy Act in 1970. This publication treats inference procedures for certain types of capture data on closed animal populations. This includes multiple capture-recapture studies (variously called capture-mark-recapture, mark-recapture, or tag-recapture studies) involving livetrapping techniques and removal studies involving kill traps or at least temporary removal of captured individuals during the study. Animals do not necessarily need to be physically trapped; visual sightings of marked animals and electrofishing studies also produce data suitable for the methods described in this monograph. To provide a frame of reference for what follows, we give an example of a capture-recapture experiment to estimate population size of small animals using live traps. The general field experiment is similar for all capture-recapture studies (a removal study is, of course, slightly different). A typical
Difference to Inference: teaching logical and statistical reasoning through on-line interactivity.
Malloy, T E
2001-05-01
Difference to Inference is an on-line JAVA program that simulates theory testing and falsification through research design and data collection in a game format. The program, based on cognitive and epistemological principles, is designed to support learning of the thinking skills underlying deductive and inductive logic and statistical reasoning. Difference to Inference has database connectivity so that game scores can be counted as part of course grades.
Statistical inference for nanopore sequencing with a biased random walk model.
Emmett, Kevin J; Rosenstein, Jacob K; van de Meent, Jan-Willem; Shepard, Ken L; Wiggins, Chris H
2015-04-21
Nanopore sequencing promises long read-lengths and single-molecule resolution, but the stochastic motion of the DNA molecule inside the pore is, as of this writing, a barrier to high accuracy reads. We develop a method of statistical inference that explicitly accounts for this error, and demonstrate that high accuracy (>99%) sequence inference is feasible even under highly diffusive motion by using a hidden Markov model to jointly analyze multiple stochastic reads. Using this model, we place bounds on achievable inference accuracy under a range of experimental parameters.
Assessing Colour-dependent Occupation Statistics Inferred from Galaxy Group Catalogues
Campbell, Duncan; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H J; Tinker, Jeremy; Yang, Xiaohu
2015-01-01
We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red vs. blue galaxies or centrals vs. satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than f...
Statistical physics, optimization and source coding
Indian Academy of Sciences (India)
Riccardo Zecchina
2005-06-01
The combinatorial problem of satisfying a given set of constraints that depend on N discrete variables is a fundamental one in optimization and coding theory. Even for instances of randomly generated problems, the question "does there exist an assignment to the variables that satisfies all constraints?" may become extraordinarily difficult to solve in some range of parameters where a glass phase sets in. We shall provide a brief review of the recent advances in the statistical mechanics approach to these satisfiability problems and show how the analytic results have helped to design a new class of message-passing algorithms – the survey propagation (SP) algorithms – that can efficiently solve some combinatorial problems considered intractable. As an application, we discuss how the packing properties of clusters of solutions in randomly generated satisfiability problems can be exploited in the design of simple lossy data compression algorithms.
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials.
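The minimum p-value procedure described above can be sketched as a permutation test. This is only an illustration under stated assumptions: the two candidate statistics (absolute difference in means and in medians), the function names, and the default of 999 relabellings are hypothetical choices, not taken from the paper, which considers statistics suited to clinical trial endpoints.

```python
import random
import statistics
from bisect import bisect_left


def mean_diff(a, b):
    # Illustrative statistic 1: absolute difference in group means.
    return abs(statistics.fmean(a) - statistics.fmean(b))


def median_diff(a, b):
    # Illustrative statistic 2: absolute difference in group medians.
    return abs(statistics.median(a) - statistics.median(b))


def min_p_permutation_test(x, y, stats=(mean_diff, median_diff),
                           n_perm=999, seed=0):
    """Permutation p-value for the minimum p-value over several statistics."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    # Row 0 holds the observed statistics; later rows hold random relabellings.
    table = [[s(x, y) for s in stats]]
    for _ in range(n_perm):
        rng.shuffle(pooled)
        table.append([s(pooled[:n_x], pooled[n_x:]) for s in stats])
    total = len(table)

    # For every row, compute each statistic's permutation p-value (share of
    # rows at least as extreme) and keep the smallest one.
    min_p = [1.0] * total
    for j in range(len(stats)):
        column = sorted(row[j] for row in table)
        for i, row in enumerate(table):
            p = (total - bisect_left(column, row[j])) / total
            min_p[i] = min(min_p[i], p)

    # Calibrate the observed minimum p-value against the permutation
    # distribution of minimum p-values, which controls the type I error.
    return sum(m <= min_p[0] for m in min_p) / total


treated = [float(v) + 10 for v in range(10)]
control = [float(v) for v in range(10)]
print(min_p_permutation_test(treated, control))
```

Referring the observed minimum to the permutation distribution of minima, and not to the nominal level, is what accounts for having looked at several statistics.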
Lawrence, C.; Lin, L.; Lisiecki, L. E.; Khider, D.
2014-12-01
The broad goal of this presentation is to demonstrate the utility of probabilistic generative models to capture investigators' knowledge of geological processes and proxy data to draw statistical inferences about unobserved paleoclimatological events. We illustrate how this approach forces investigators to be explicit about their assumptions, and about how probability theory yields results that are a mathematical consequence of these assumptions and the data. We illustrate these ideas with the HMM-Match model that infers common times of sediment deposition in two records and the uncertainty in these inferences in the form of confidence bands. HMM-Match models the sedimentation processes that led to proxy data measured in marine sediment cores. This Bayesian model has three components: 1) a generative probabilistic model that proceeds from the underlying geophysical and geochemical events, specifically the sedimentation events, to the generation of the proxy data (Sedimentation ---> Proxy Data); 2) a recursive algorithm that reverses the logic of the model to yield inference about the unobserved sedimentation events and the associated alignment of the records based on proxy data (Proxy Data ---> Sedimentation (Alignment)); 3) an expectation maximization algorithm for estimating two unknown parameters. We applied HMM-Match to align 35 Late Pleistocene records to a global benthic δ18O stack and found that the mean width of 95% confidence intervals varies between 3-23 kyr depending on the resolution and noisiness of the core's δ18O signal. Confidence bands within individual cores also vary greatly, ranging from ~0 to >40 kyr. Results from this algorithm will allow researchers to examine the robustness of their conclusions with respect to alignment uncertainty. Figure 1 shows the confidence bands for one low-resolution record.
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Assessing colour-dependent occupation statistics inferred from galaxy group catalogues
Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu
2015-09-01
We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.
Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory
2016-05-12
Three areas were investigated. First, new memory models of discrete-time and finitely-valued information sources are...computational and storage complexities are proved. Second, a statistical method is developed to estimate the memory depth of discrete-time and continuously... Final Report (15-May-2014 to 14-Feb-2015). Distribution Unlimited.
A Linear Immigration-Birth-Death Model and Its Statistical Inference
Institute of Scientific and Technical Information of China (English)
ZHANG Shu-lin; WEI Zheng-hong; BI Qiu-xiang
2014-01-01
In this paper, we employ the moment generating function to obtain exact formulas for the transition probability of the immigration-birth-death (IBD) model, and we use the exact transition density formula to discuss the simulation of sample paths and statistical inference with complete observations of the IBD process.
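For orientation, a sample path of a linear immigration-birth-death process can be simulated with the standard Gillespie (event-by-event) scheme. The sketch below is an assumption-laden illustration rather than the paper's method, which relies on the exact transition density: the parameter names nu (immigration rate), lam (per-capita birth rate), and mu (per-capita death rate) are hypothetical, and the process is assumed to jump from n to n + 1 at rate nu + lam * n and from n to n - 1 at rate mu * n.

```python
import random

def simulate_ibd(n0, nu, lam, mu, t_max, seed=0):
    """Simulate one sample path of a linear IBD process up to time t_max.

    Assumed rates (not taken from the paper):
      n -> n + 1 at rate nu + lam * n   (immigration or birth)
      n -> n - 1 at rate mu * n         (death)
    Returns the jump times and the population size after each jump.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, sizes = [0.0], [n0]
    while True:
        up = nu + lam * n
        down = mu * n
        total = up + down
        if total == 0:
            break  # absorbing state: no event can occur
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose the event type in proportion to its rate.
        n += 1 if rng.random() < up / total else -1
        times.append(t)
        sizes.append(n)
    return times, sizes
```

With complete observation of such a path, the jump times and jump directions are exactly the data that a likelihood-based analysis of the process would use.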
Statistical inference for discrete-time samples from affine stochastic delay differential equations
DEFF Research Database (Denmark)
Küchler, Uwe; Sørensen, Michael
2013-01-01
Statistical inference for discrete time observations of an affine stochastic delay differential equation is considered. The main focus is on maximum pseudo-likelihood estimators, which are easy to calculate in practice. A more general class of prediction-based estimating functions is investigated...
Optimized Flood Forecasts Using a Statistical Ensemble
Silver, Micha; Fredj, Erick
2016-04-01
The method presented here assembles an optimized flood forecast from a set of consecutive WRF-Hydro simulations by applying coefficients which we derive from straightforward statistical procedures. Several government and research institutions that produce climate data offer ensemble forecasts, which merge predictions from different models to gain a more accurate fit to observed data. Existing ensemble forecasts present climate and weather predictions only. In this research we propose a novel approach to constructing hydrological ensembles for flood forecasting. The ensemble flood forecast is created by combining predictions from the same model, but initiated at different times. An operative flood forecasting system, run by the Israeli Hydrological Service, produces flood forecasts twice daily with a 72 hour forecast period. By collating the output from consecutive simulation runs we have access to multiple overlapping forecasts. We then apply two statistical procedures to blend these consecutive forecasts, resulting in a very close fit to observed flood runoff. We first employ cross-correlation with a time lag to determine a time shift for each of the original, consecutive forecasts. This shift corrects for two possible sources of error: slow or fast moving weather fronts in the base climate data; and mis-calibrations of the WRF-Hydro model in determining the rate of flow of surface runoff and in channels. We apply this time shift to all consecutive forecasts, then run a linear regression with the observed runoff data as the dependent variable and all shifted forecasts as the predictor variables. The solution to the linear regression equation is a set of coefficients that corrects the amplitude errors in the forecasts. These resulting regression coefficients are then applied to the consecutive forecasts producing a statistical ensemble which, by design, closely matches the observed runoff. After performing this procedure over many storm events in the Negev region
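The two statistical steps described in this abstract, a lagged cross-correlation to find a time shift for each forecast followed by a linear regression of observed runoff on the shifted forecasts, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the maximum lag, the use of an intercept, and the representation of each forecast as an equally spaced NumPy array aligned with the observations are all hypothetical choices.

```python
import numpy as np

def best_lag(forecast, observed, max_lag):
    """Time shift (in steps) that maximizes correlation with the observations."""
    best, best_corr = 0, -np.inf
    n = len(forecast)
    for lag in range(-max_lag, max_lag + 1):
        shifted = np.roll(forecast, lag)
        # Discard the samples that np.roll wrapped around the array ends.
        lo, hi = max(lag, 0), n + min(lag, 0)
        corr = np.corrcoef(shifted[lo:hi], observed[lo:hi])[0, 1]
        if corr > best_corr:
            best, best_corr = lag, corr
    return best

def blend_forecasts(forecasts, observed, max_lag=6):
    """Shift each forecast by its best lag, then regress observed runoff on them."""
    lags = [best_lag(f, observed, max_lag) for f in forecasts]
    # Keep only the time steps that are valid for every shifted forecast.
    lo = max(max(lags), 0)
    hi = len(observed) + min(min(lags), 0)
    columns = [np.roll(f, k)[lo:hi] for f, k in zip(forecasts, lags)]
    X = np.column_stack([np.ones(hi - lo)] + columns)  # intercept + forecasts
    coef, *_ = np.linalg.lstsq(X, observed[lo:hi], rcond=None)
    return lags, coef, X @ coef
```

The returned coefficients play the role of the amplitude corrections, and the fitted values form the blended ensemble over the time window common to all shifted forecasts.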
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-01-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
Statistical inference and visualization in scale-space for spatially dependent images
Vaughan, Amy
2012-03-01
SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical scale-space visualization tool that allows for statistical inferences. In this paper we develop a spatial SiZer for finding significant features and conducting goodness-of-fit tests for spatially dependent images. The spatial SiZer utilizes a family of kernel estimates of the image and provides not only exploratory data analysis but also statistical inference with spatial correlation taken into account. It is also capable of comparing the observed image with a specific null model being tested by adjusting the statistical inference using an assumed covariance structure. Pixel locations having statistically significant differences between the image and a given null model are highlighted by arrows. The spatial SiZer is compared with the existing independent SiZer via the analysis of simulated data with and without signal on both planar and spherical domains. We apply the spatial SiZer method to the decadal temperature change over some regions of the Earth. © 2011 The Korean Statistical Society.
Elucidating the foundations of statistical inference with 2 x 2 tables.
Choi, Leena; Blume, Jeffrey D; Dupont, William D
2015-01-01
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences.
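The point that several p-values can be attached to one 2 x 2 table can be made concrete with a short calculation. The sketch below conditions on both margins, so that the top-left cell follows a hypergeometric distribution, and returns four commonly used quantities: the one-sided Fisher p-value, the mid-p-value, and two different two-sided p-values. The function names and the small tolerance used when comparing probabilities are assumptions made for this illustration; the paper's likelihood-based analysis is not reproduced here.

```python
from math import comb

def hypergeom_pmf(k, row1, col1, n):
    """P(top-left cell = k) given row total row1, column total col1, grand total n."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

def fisher_p_values(a, b, c, d):
    """Several conditional p-values for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    k_min, k_max = max(0, row1 + col1 - n), min(row1, col1)
    pmf = {k: hypergeom_pmf(k, row1, col1, n) for k in range(k_min, k_max + 1)}
    p_obs = pmf[a]
    upper = sum(p for k, p in pmf.items() if k >= a)  # P(K >= a)
    lower = sum(p for k, p in pmf.items() if k <= a)  # P(K <= a)
    return {
        # One-sided Fisher exact p-value (alternative: larger top-left cell).
        "one_sided": upper,
        # Mid-p: count only half the probability of the observed table.
        "mid_p": upper - 0.5 * p_obs,
        # Two-sided, version 1: double the smaller tail, capped at 1.
        "two_sided_doubled": min(1.0, 2 * min(upper, lower)),
        # Two-sided, version 2: sum all tables no more probable than observed.
        "two_sided_prob": sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7)),
    }
```

For a symmetric table such as [[3, 1], [1, 3]] the two two-sided versions coincide, but for tables with unequal margins they generally differ, which is one source of the inconsistency the abstract refers to.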
Emmert-Streib, Frank; Dehmer, Matthias; Haibe-Kains, Benjamin
2014-01-01
In this paper, we shed light on approaches that are currently used to infer networks from gene expression data with respect to their biological meaning. As we will show, the biological interpretation of these networks depends on the chosen theoretical perspective. For this reason, we distinguish a statistical perspective from a mathematical modeling perspective and elaborate their differences and implications. Our results indicate the imperative need for a genomic network ontology in order to avoid increasing confusion about the biological interpretation of inferred networks, a confusion that can be further increased by approaches that integrate multiple data sets or data types.
Inferring the statistical interpretation of quantum mechanics from the classical limit
Gottfried
2000-06-01
It is widely believed that the statistical interpretation of quantum mechanics cannot be inferred from the Schrodinger equation itself, and must be stated as an additional independent axiom. Here I propose that the situation is not so stark. For systems that have both continuous and discrete degrees of freedom (such as coordinates and spin respectively), the statistical interpretation for the discrete variables is implied by requiring that the system's gross motion can be classically described under circumstances specified by the Schrodinger equation. However, this is not a full-fledged derivation of the statistical interpretation because it does not apply to the continuous variables of classical mechanics.
STATISTICAL OPTIMIZATION OF PROCESS VARIABLES FOR ...
African Journals Online (AJOL)
2012-11-03
Nov 3, 2012 ... predictive results. The osmotic dehydration process was optimized for water loss and solutes gain. ... Process Variables Optimization for Osmotic Dehydration of Okra in Sucrose Solution. 371 ...... Science des Aliments, Vol. 10,.
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Van den Noortgate, Wim; Onghena, Patrick
2007-01-01
A solid understanding of "inferential statistics" is of major importance for designing and interpreting empirical results in any scientific discipline. However, students are prone to many misconceptions regarding this topic. This article structurally summarizes and describes these misconceptions by presenting a systematic review of publications…
A Trio of Inference Problems That Could Win You a Nobel Prize in Statistics (If You Help Fund It)
Meng, Xiao-Li
2014-01-01
Statistical inference is a field full of problems whose solutions require the same intellectual force needed to win a Nobel Prize in other scientific fields. Multi-resolution inference is the oldest of the trio. But emerging applications such as individualized medicine have challenged us to the limit: Infer estimands with resolution levels that far exceed those of any feasible estimator. Multi-phase inference is another reality because (big) data are almost never collected, processed, and ana...
Large-Scale Optimization for Bayesian Inference in Complex Systems
Energy Technology Data Exchange (ETDEWEB)
Willcox, Karen [MIT; Marzouk, Youssef [MIT
2013-11-12
The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focused on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. The project was a collaborative effort among MIT, the University of Texas at Austin, Georgia Institute of Technology, and Sandia National Laboratories. The research was directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. The MIT--Sandia component of the SAGUARO Project addressed the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas--Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as ``reduce then sample'' and ``sample then reduce.'' In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to
Statistical inference of the generation probability of T-cell receptors from sequence repertoires.
Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G
2012-10-02
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
Research participant compensation: A matter of statistical inference as well as ethics.
Swanson, David M; Betensky, Rebecca A
2015-11-01
The ethics of compensation of research subjects for participation in clinical trials has been debated for years. One ethical issue of concern is variation among subjects in the level of compensation for identical treatments. Surprisingly, the impact of variation on the statistical inferences made from trial results has not been examined. We seek to identify how variation in compensation may influence any existing dependent censoring in clinical trials, thereby also influencing inference about the survival curve, hazard ratio, or other measures of treatment efficacy. In simulation studies, we consider a model for how compensation structure may influence the censoring model. Under existing dependent censoring, we estimate survival curves under different compensation structures and observe how these structures induce variability in the estimates. We show through this model that if the compensation structure affects the censoring model and dependent censoring is present, then variation in that structure induces variation in the estimates and affects the accuracy of estimation and inference on treatment efficacy. From the perspectives of both ethics and statistical inference, standardization and transparency in the compensation of participants in clinical trials are warranted.
Evaluation of statistical inference on empirical resting state fMRI.
Yang, Xue; Kang, Hakmook; Newton, Allen T; Landman, Bennett A
2014-04-01
Modern statistical inference techniques may be able to improve the sensitivity and specificity of resting state functional magnetic resonance imaging (rs-fMRI) connectivity analysis through more realistic assumptions. In simulation, the advantages of such methods are readily demonstrable. However, quantitative empirical validation remains elusive in vivo as the true connectivity patterns are unknown and noise distributions are challenging to characterize, especially in ultra-high field (e.g., 7T fMRI). Though the physiological characteristics of the fMRI signal are difficult to replicate in controlled phantom studies, it is critical that the performance of statistical techniques be evaluated. The SIMulation EXtrapolation (SIMEX) method has enabled estimation of bias with asymptotically consistent estimators on empirical finite sample data by adding simulated noise. To avoid the requirement of accurate estimation of noise structure, the proposed quantitative evaluation approach leverages the theoretical core of SIMEX to study the properties of inference methods in the face of diminishing data (in contrast to increasing noise). The performance of ordinary and robust inference methods in simulation and empirical rs-fMRI is compared using the proposed quantitative evaluation approach. This study provides a simple, but powerful method for comparing a proxy for inference accuracy using empirical data.
Goyal, Ravi; De Gruttola, Victor
2017-07-25
Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics, and knowledge of these relationships can aid in addressing the concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.
DEFF Research Database (Denmark)
Fournier, David A.; Skaug, Hans J.; Ancheta, Johnoel
2011-01-01
Many criteria for statistical parameter estimation, such as maximum likelihood, are formulated as a nonlinear optimization problem. Automatic Differentiation Model Builder (ADMB) is a programming framework based on automatic differentiation, aimed at highly nonlinear models with a large number...
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
Assay optimization: a statistical design of experiments approach.
Altekar, Maneesha; Homon, Carol A; Kashem, Mohammed A; Mason, Steven W; Nelson, Richard M; Patnaude, Lori A; Yingling, Jeffrey; Taylor, Paul B
2007-03-01
With the transition from manual to robotic HTS in the last several years, assay optimization has become a significant bottleneck. Recent advances in robotic liquid handling have made it feasible to reduce assay optimization timelines with the application of statistically designed experiments. When implemented, they can efficiently optimize assays by rapidly identifying significant factors, complex interactions, and nonlinear responses. This article focuses on the use of statistically designed experiments in assay optimization.
Cocco, Simona; Monasson, Rémi; Weigt, Martin
2013-12-01
We consider the Hopfield-Potts model for the covariation between residues in protein families recently introduced in Cocco, Monasson, Weigt (2013). The patterns of the model are inferred from the data within a new gauge, more symmetric in the residues. We compute the statistical error bars on the pattern components. Results are illustrated on real data for a response regulator receiver domain (Pfam ID PF00072) family.
Advances and challenges in the attribution of climate impacts using statistical inference
Hsiang, S. M.
2015-12-01
We discuss recent advances, challenges, and debates in the use of statistical models to infer and attribute climate impacts, such as distinguishing effects of "climate" vs. "weather," accounting for simultaneous environmental changes along multiple dimensions, evaluating multiple sources of uncertainty, accounting for adaptation, and simulating counterfactual economic or social trajectories. We relate these ideas to recent findings linking temperature to economic productivity/violence and tropical cyclones to economic growth.
PyClone: statistical inference of clonal population structure in cancer.
Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2014-04-01
We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
Young children use statistical sampling to infer the preferences of other people.
Kushnir, Tamar; Xu, Fei; Wellman, Henry M
2010-08-01
Psychological scientists use statistical information to determine the workings of human behavior. We argue that young children do so as well. Over the course of a few years, children progress from viewing human actions as intentional and goal directed to reasoning about the psychological causes underlying such actions. Here, we show that preschoolers and 20-month-old infants can use statistical information-namely, a violation of random sampling-to infer that an agent is expressing a preference for one type of toy instead of another type of toy. Children saw a person remove five toys of one type from a container of toys. Preschoolers and infants inferred that the person had a preference for that type of toy when there was a mismatch between the sampled toys and the population of toys in the box. Mere outcome consistency, time spent with the toys, and positive attention toward the toys did not lead children to infer a preference. These findings provide an important demonstration of how statistical learning could underpin the rapid acquisition of early psychological knowledge.
A variance components model for statistical inference on functional connectivity networks.
Fiecas, Mark; Cribben, Ivor; Bahktiari, Reyhaneh; Cummine, Jacqueline
2017-01-24
We propose a variance components linear modeling framework to conduct statistical inference on functional connectivity networks that directly accounts for the temporal autocorrelation inherent in functional magnetic resonance imaging (fMRI) time series data and for the heterogeneity across subjects in the study. The novel method estimates the autocorrelation structure in a nonparametric and subject-specific manner, and estimates the variance due to the heterogeneity using iterative least squares. We apply the new model to a resting-state fMRI study to compare the functional connectivity networks in both typical and reading impaired young adults in order to characterize the resting state networks that are related to reading processes. Using simulated data, we also compare the performance of our model to other methods of statistical inference on functional connectivity networks that do not account for the temporal autocorrelation or heterogeneity across the subjects, and show that accounting for these sources of variation and covariation results in more powerful tests for statistical inference.
STATISTICAL INFERENCES FOR VARYING-COEFFICIENT MODELS BASED ON LOCALLY WEIGHTED REGRESSION TECHNIQUE
Institute of Scientific and Technical Information of China (English)
梅长林; 张文修; 梁怡
2001-01-01
Some fundamental issues on statistical inferences relating to varying-coefficient regression models are addressed and studied. An exact testing procedure is proposed for checking the goodness of fit of a varying-coefficient model fitted by the locally weighted regression technique versus an ordinary linear regression model. Also, an appropriate statistic for testing variation of model parameters over the locations where the observations are collected is constructed, and a formal testing approach, which is essential to exploring spatial non-stationarity in geographical science, is suggested.
A normative inference approach for optimal sample sizes in decisions from experience.
Ostwald, Dirk; Starke, Ludger; Hertwig, Ralph
2015-01-01
"Decisions from experience" (DFE) refers to a body of work that emerged in research on behavioral decision making over the last decade. One of the major experimental paradigms employed to study experience-based choice is the "sampling paradigm," which serves as a model of decision making under limited knowledge about the statistical structure of the world. In this paradigm respondents are presented with two payoff distributions, which, in contrast to standard approaches in behavioral economics, are specified not in terms of explicit outcome-probability information, but by the opportunity to sample outcomes from each distribution without economic consequences. Participants are encouraged to explore the distributions until they feel confident enough to decide from which they would prefer to draw in a final trial involving real monetary payoffs. One commonly employed measure to characterize the behavior of participants in the sampling paradigm is the sample size, that is, the number of outcome draws which participants choose to obtain from each distribution prior to terminating sampling. A natural question that arises in this context concerns the "optimal" sample size, which could be used as a normative benchmark to evaluate human sampling behavior in DFE. In this theoretical study, we relate the DFE sampling paradigm to the classical statistical decision theoretic literature and, under a probabilistic inference assumption, evaluate optimal sample sizes for DFE. In our treatment we go beyond analytically established results by showing how the classical statistical decision theoretic framework can be used to derive optimal sample sizes under arbitrary, but numerically evaluable, constraints. Finally, we critically evaluate the value of deriving optimal sample sizes under this framework as testable predictions for the experimental study of sampling behavior in DFE.
A normative inference approach for optimal sample sizes in decisions from experience
Directory of Open Access Journals (Sweden)
Dirk Ostwald
2015-09-01
Decisions from experience (DFE) refers to a body of work that emerged in research on behavioral decision making over the last decade. One of the major experimental paradigms employed to study experience-based choice is the sampling paradigm, which serves as a model of decision making under limited knowledge about the statistical structure of the world. In this paradigm respondents are presented with two payoff distributions, which, in contrast to standard approaches in behavioral economics, are specified not in terms of explicit outcome-probability information, but by the opportunity to sample outcomes from each distribution without economic consequences. Participants are encouraged to explore the distributions until they feel confident enough to decide from which they would prefer to draw in a final trial involving real monetary payoffs. One commonly employed measure to characterize the behavior of participants in the sampling paradigm is the sample size, that is, the number of outcome draws which participants choose to obtain from each distribution prior to terminating sampling. A natural question that arises in this context concerns the optimal sample size, which could be used as a normative benchmark to evaluate human sampling behavior in DFE. In this theoretical manuscript, we relate the DFE sampling paradigm to the classical statistical decision theoretic literature and, under a probabilistic inference assumption, evaluate optimal sample sizes for decisions from experience. In our treatment we go beyond analytically established results by showing how the classical statistical decision theoretic framework can be used to derive optimal sample sizes under arbitrary, but numerically evaluable, constraints. Finally, we critically evaluate the value of deriving optimal sample sizes under this framework as testable predictions for the experimental study of sampling behavior in DFE.
Directory of Open Access Journals (Sweden)
Jean-Michel Hupé
2015-02-01
Published studies using functional and structural MRI include many errors in the way data are analyzed and conclusions reported. This was observed when working on a comprehensive review of the neural bases of synesthesia, but these errors are probably endemic to neuroimaging studies. All studies reviewed had based their conclusions on Null Hypothesis Significance Tests (NHST). NHST have been criticized since their inception because they are more appropriate for taking decisions related to a Null hypothesis (as in manufacturing) than for making inferences about behavioral and neuronal processes. Here I focus on a few key problems of NHST related to brain imaging techniques, and explain why or when we should not rely on significance tests. I also observed that, often, the ill-posed logic of NHST was not even correctly applied, and describe what I identified as common mistakes or at least problematic practices in published papers, in light of what could be considered as the very basics of statistical inference. MRI statistics also involve much more complex issues than standard statistical inference. Analysis pipelines vary a lot between studies, even for those using the same software, and there is no consensus on which pipeline is the best. I propose a synthetic view of the logic behind the possible methodological choices, and warn against the usage and interpretation of two statistical methods popular in brain imaging studies, the false discovery rate (FDR) procedure and permutation tests. I suggest that current models for the analysis of brain imaging data suffer from serious limitations and call for a revision taking into account the new statistics (confidence intervals) logic.
Scientific Computation of Optimal Statistical Estimators
2015-07-13
currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. With the purpose of...the numerical evaluation of sophisticated statistical models, these models are still designed by humans because there is currently no known recipe or...editing a chapter in that book (on software aspects). H. Owhadi has been invited (by Dr. Bruce Suter DR-04 USAF AFMC AFRL/RITB) to AFRL, Rome NY to
Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor
2013-01-01
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
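The test for Hardy-Weinberg proportions and the inbreeding coefficient mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single biallelic marker with complete genotype counts and uses the textbook one-degree-of-freedom chi-square test; it does not reproduce the authors' multiple-imputation procedure, and the function name and example counts are hypothetical.

```python
import math


def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg proportions at one biallelic marker.

    Returns (chi2, p_value, f), where f is the inbreeding coefficient
    estimated as 1 - observed heterozygotes / expected heterozygotes.
    Assumes both alleles are present, so every expected count is positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    f = 1.0 - n_ab / expected[1]
    return chi2, p_value, f
```

With hypothetical counts such as `hardy_weinberg_test(40, 20, 40)`, a heterozygote deficit shows up as a positive `f` and a small p-value, which is the pattern the abstract describes as the main source of deviation.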
Directory of Open Access Journals (Sweden)
Zhu Qihui
2006-10-01
Background: The identification of chromosomal homology will shed light on such mysteries of genome evolution as DNA duplication, rearrangement and loss. Several approaches have been developed to detect chromosomal homology based on gene synteny or colinearity. However, the previously reported implementations lack statistical inferences which are essential to reveal actual homologies. Results: In this study, we present a statistical approach to detect homologous chromosomal segments based on gene colinearity. We implement this approach in a software package ColinearScan to detect putative colinear regions using a dynamic programming algorithm. Statistical models are proposed to estimate proper parameter values and evaluate the significance of putative homologous regions. Statistical inference, high computational efficiency and flexibility of input data type are three key features of our approach. Conclusion: We apply ColinearScan to the Arabidopsis and rice genomes to detect duplicated regions within each species and homologous fragments between these two species. We find many more homologous chromosomal segments in the rice genome than previously reported. We also find many small colinear segments between rice and Arabidopsis genomes.
Local dependence in random graph models: characterization, properties and statistical inference.
Schweinberger, Michael; Handcock, Mark S
2015-06-01
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with 'ground truth'.
Optimal inference in dynamic models with conditional moment restrictions
DEFF Research Database (Denmark)
Christensen, Bent Jesper; Sørensen, Michael
By an application of the theory of optimal estimating function, optimal instruments for dynamic models with conditional moment restrictions are derived. The general efficiency bound is provided, along with estimators attaining the bound. It is demonstrated that the optimal estimators are always...... optimal estimator reduces to Newey's. Specification and hypothesis testing in our framework are introduced. We derive the theory of optimal instruments and the associated asymptotic distribution theory for general cases including non-martingale estimating functions and general history dependence...
Robust optimization based upon statistical theory.
Sobotta, B; Söhn, M; Alber, M
2010-08-01
Organ movement is still the biggest challenge in cancer treatment despite advances in online imaging. Due to the resulting geometric uncertainties, the delivered dose cannot be predicted precisely at treatment planning time. Consequently, all associated dose metrics (e.g., EUD and maxDose) are random variables with a patient-specific probability distribution. The method that the authors propose makes these distributions the basis of the optimization and evaluation process. The authors start from a model of motion derived from patient-specific imaging. On a multitude of geometry instances sampled from this model, a dose metric is evaluated. The resulting pdf of this dose metric is termed outcome distribution. The approach optimizes the shape of the outcome distribution based on its mean and variance. This is in contrast to the conventional optimization of a nominal value (e.g., PTV EUD) computed on a single geometry instance. The mean and variance allow for an estimate of the expected treatment outcome along with the residual uncertainty. Besides being applicable to the target, the proposed method also seamlessly includes the organs at risk (OARs). The likelihood that a given value of a metric is reached in the treatment is predicted quantitatively. This information reveals potential hazards that may occur during the course of the treatment, thus helping the expert to find the right balance between the risk of insufficient normal tissue sparing and the risk of insufficient tumor control. By feeding this information to the optimizer, outcome distributions can be obtained where the probability of exceeding a given OAR maximum and that of falling short of a given target goal can be minimized simultaneously. The method is applicable to any source of residual motion uncertainty in treatment delivery. Any model that quantifies organ movement and deformation in terms of probability distributions can be used as basis for the algorithm. Thus, it can generate dose
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Three enhancements to the inference of statistical protein-DNA potentials.
AlQuraishi, Mohammed; McAdams, Harley H
2013-03-01
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%.
An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks
Bayer, Christian
2016-01-06
In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem of approximating the reaction coefficients based on discretely observed data. To this end, we introduce an efficient two-phase algorithm in which the first phase is deterministic and it is intended to provide a starting point for the second phase which is the Monte Carlo EM Algorithm.
Statistical inference on parametric part for partially linear single-index model
Institute of Scientific and Technical Information of China (English)
ZHANG RiQuan; HUANG ZhenSheng
2009-01-01
Statistical inference on the parametric part of the partially linear single-index model (PLSIM) is considered in this paper. A profile least-squares technique for estimating the parametric part is proposed, and the asymptotic normality of the profile least-squares estimator is given. Based on the estimator, a generalized likelihood ratio (GLR) test is proposed to test whether the parameters of the linear part of the model satisfy a certain linear restriction. Under the null model, the proposed GLR statistic asymptotically follows the χ²-distribution, with the scale constant and degrees of freedom independent of the nuisance parameters, known as the Wilks phenomenon. Both simulated and real data examples are used to illustrate the proposed methods.
Exploiting Non-Linear Structure in Astronomical Data for Improved Statistical Inference
Lee, Ann B
2011-01-01
Many estimation problems in astrophysics are highly complex, with high-dimensional, non-standard data objects (e.g., images, spectra, entire distributions, etc.) that are not amenable to formal statistical analysis. To utilize such data and make accurate inferences, it is crucial to transform the data into a simpler, reduced form. Spectral kernel methods are non-linear data transformation methods that efficiently reveal the underlying geometry of observable data. Here we focus on one particular technique: diffusion maps or more generally spectral connectivity analysis (SCA). We give examples of applications in astronomy; e.g., photometric redshift estimation, prototype selection for estimation of star formation history, and supernova light curve classification. We outline some computational and statistical challenges that remain, and we discuss some promising future directions for astronomy and data mining.
Truth, possibility and probability new logical foundations of probability and statistical inference
Chuaqui, R
1991-01-01
Anyone involved in the philosophy of science is naturally drawn into the study of the foundations of probability. Different interpretations of probability, based on competing philosophical ideas, lead to different statistical techniques, and frequently to mutually contradictory consequences. This unique book presents a new interpretation of probability, rooted in the traditional interpretation that was current in the 17th and 18th centuries. Mathematical models are constructed based on this interpretation, and statistical inference and decision theory are applied, including some examples in artificial intelligence, solving the main foundational problems. Nonstandard analysis is extensively developed for the construction of the models and in some of the proofs. Many nonstandard theorems are proved, some of them new, in particular, a representation theorem that asserts that any stochastic process can be approximated by a process defined over a space with equiprobable outcomes.
Large scale statistical inference of signaling pathways from RNAi and microarray data
Directory of Open Access Journals (Sweden)
Poustka Annemarie
2007-10-01
Background: The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway. Results: In this paper we address this challenging problem by extending previous work by Markowetz et al., who proposed a statistical framework to score network hypotheses in a Bayesian manner. Our extensions go in three directions: First, we introduce a way to omit the data discretization step needed in the original framework via a calculation based on p-values instead. Second, we show how prior assumptions on the network structure can be incorporated into the scoring scheme using regularization techniques. Third and most important, we propose methods to scale up the original approach, which is limited to around 5 genes, to large scale networks. Conclusion: Comparisons of these methods on artificial data are conducted. Our proposed module network is employed to infer the signaling network between 13 genes in the ER-α pathway in human MCF-7 breast cancer cells. Using a bootstrapping approach this reconstruction can be found with good statistical stability. The code for the module network inference method is available in the latest version of the R-package nem, which can be obtained from the Bioconductor homepage.
Jagiella, Nick; Rickert, Dennis; Theis, Fabian J; Hasenauer, Jan
2017-02-22
Mechanistic understanding of multi-scale biological processes, such as cell proliferation in a changing biological tissue, is readily facilitated by computational models. While tools exist to construct and simulate multi-scale models, the statistical inference of the unknown model parameters remains an open problem. Here, we present and benchmark a parallel approximate Bayesian computation sequential Monte Carlo (pABC SMC) algorithm, tailored for high-performance computing clusters. pABC SMC is fully automated and returns reliable parameter estimates and confidence intervals. By running the pABC SMC algorithm for ∼10^6 hr, we parameterize multi-scale models that accurately describe quantitative growth curves and histological data obtained in vivo from individual tumor spheroid growth in media droplets. The models capture the hybrid deterministic-stochastic behaviors of 10^5 to 10^6 cells growing in a 3D dynamically changing nutrient environment. The pABC SMC algorithm reliably converges to a consistent set of parameters. Our study demonstrates a proof of principle for robust, data-driven modeling of multi-scale biological systems and the feasibility of multi-scale model parameterization through statistical inference.
Statistical inference of seabed sound-speed structure in the Gulf of Oman Basin.
Sagers, Jason D; Knobles, David P
2014-06-01
Addressed is the statistical inference of the sound-speed depth profile of a thick soft seabed from broadband sound propagation data recorded in the Gulf of Oman Basin in 1977. The acoustic data are in the form of time series signals recorded on a sparse vertical line array and generated by explosive sources deployed along a 280 km track. The acoustic data offer a unique opportunity to study a deep-water bottom-limited thickly sedimented environment because of the large number of time series measurements, very low seabed attenuation, and auxiliary measurements. A maximum entropy method is employed to obtain a conditional posterior probability distribution (PPD) for the sound-speed ratio and the near-surface sound-speed gradient. The multiple data samples allow for a determination of the average error constraint value required to uniquely specify the PPD for each data sample. Two complicating features of the statistical inference study are addressed: (1) the need to develop an error function that can both utilize the measured multipath arrival structure and mitigate the effects of data errors and (2) the effect of small bathymetric slopes on the structure of the bottom interacting arrivals.
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating the tumor boundary from PET have yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed from the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
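The Dice similarity coefficient used for validation in this abstract has a simple definition, DSC = 2|A ∩ B| / (|A| + |B|), which the following minimal sketch illustrates. It assumes each segmentation is given as a collection of voxel coordinates; the function name and the convention of returning 1.0 for two empty segmentations are assumptions made here, not details from the paper.

```python
def dice_coefficient(seg_a, seg_b):
    """Dice similarity coefficient between two segmentations.

    Each segmentation is an iterable of hashable voxel identifiers,
    for example (x, y, z) tuples of the voxels labeled as tumor.
    """
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        # Assumed convention: two empty segmentations agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))
```

A value of 1.0 means the two delineations coincide exactly and 0.0 means they share no voxel, so the reported averages of 0.878 and 0.835 indicate substantial but imperfect overlap.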
Process Model Construction and Optimization Using Statistical Experimental Design
1988-04-01
Memo No. 88-442, March 1988. PROCESS MODEL CONSTRUCTION AND OPTIMIZATION USING STATISTICAL EXPERIMENTAL DESIGN. Emanuel...Sachs and George Prueger. Abstract: A methodology is presented for the construction of process models by the combination of physically based mechanistic...253-8138. "Process Model Construction and Optimization Using Statistical Experimental Design" by Emanuel Sachs, Assistant Professor, and George
Using Fuzzy Inference Systems to Optimize Highway Alignments
Directory of Open Access Journals (Sweden)
Gianluca Dell’Acqua
2012-03-01
The general objective of the research project is to explore innovations in integrating infrastructure and land use planning for transportation corridors. In contexts with environmental impact, the choice of transportation routes must address the sensitivity of current and preexisting conditions. Multi-criteria analyses are used to solve problems of this nature, but they do not define an objective approach on a quantitative basis taking into account some important, but often intrinsically unmeasurable parameters. Fuzzy logic becomes a more effective model as systems become more complex. During the preliminary design phase, fuzzy inference systems offer a contribution to decision-making which is much more complete than a cost-benefit analysis. In this study, alternative alignment options are considered, combining engineering, social, environmental, and economic factors in the decision-making. The research formalizes a general method useful for analyzing different case studies. The method can be used to justify highway alignment choices in environmental impact study analysis.
Directory of Open Access Journals (Sweden)
Jay Krishna Thakur
2015-08-01
The aim of this work is to investigate new approaches using methods based on statistics and geo-statistics for spatio-temporal optimization of groundwater monitoring networks. The formulated and integrated methods were tested with the groundwater quality data set of Bitterfeld/Wolfen, Germany. Spatially, the monitoring network was optimized using geo-statistical methods. Temporal optimization of the monitoring network was carried out using Sen’s method (1968). For geostatistical network optimization, a geostatistical spatio-temporal algorithm was used to identify redundant wells in 2- and 2.5-D Quaternary and Tertiary aquifers. Influences of interpolation block width, dimension, contaminant association, groundwater flow direction and aquifer homogeneity on statistical and geostatistical methods for monitoring network optimization were analysed. The integrated approach shows 37% and 28% redundancies in the monitoring network in the Quaternary aquifer and the Tertiary aquifer respectively. The geostatistical method also recommends 41 and 22 new monitoring wells in the Quaternary and Tertiary aquifers respectively. In temporal optimization, an overall optimized sampling interval was recommended in terms of lower quartile (238 days), median (317 days) and upper quartile (401 days) in the research area of Bitterfeld/Wolfen. Demonstrated methods for improving groundwater monitoring networks can be used in real monitoring network optimization with due consideration given to influencing factors.
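"Sen's method (1968)" is usually understood to mean Sen's slope estimator, a robust trend estimate taken as the median of all pairwise slopes. The sketch below illustrates only that estimator, under this assumption. The abstract does not say how the slope is turned into a recommended sampling interval, so that step is not shown, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import median


def sen_slope(times, values):
    """Median of pairwise slopes (y_j - y_i) / (t_j - t_i) over all pairs i < j.

    Pairs with identical time stamps are skipped because their slope
    is undefined.
    """
    slopes = [
        (y2 - y1) / (t2 - t1)
        for (t1, y1), (t2, y2) in combinations(zip(times, values), 2)
        if t2 != t1
    ]
    if not slopes:
        raise ValueError("need at least two observations at distinct times")
    return median(slopes)
```

Because the estimate is a median, a single aberrant concentration reading at one sampling date barely moves it, which is one reason this kind of estimator is popular for groundwater quality series.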
Coulson, Melissa; Healey, Michelle; Fidler, Fiona; Cumming, Geoff
2010-01-01
A statistically significant result, and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST), or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST.
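The central point of this abstract, that one significant and one non-significant result can still be mutually consistent, can be made concrete with a small sketch. It assumes each study reports an effect estimate with a standard error and uses a normal approximation; the numbers in the usage note are hypothetical and are not taken from the fictitious studies used in the survey.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def summarize(estimate, std_error, level=0.95):
    """Return ((ci_low, ci_high), two_sided_p) under a normal approximation."""
    z_crit = _STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    p_value = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(estimate / std_error)))
    return ci, p_value


def compare_studies(est_a, se_a, est_b, se_b, level=0.95):
    """CI and p-value for the difference between two independent estimates."""
    return summarize(est_a - est_b, math.hypot(se_a, se_b), level)
```

For instance, with hypothetical estimates 3.0 (SE 1.4) and 2.5 (SE 1.6), `summarize` labels the first study significant at the 5% level and the second not, while `compare_studies(3.0, 1.4, 2.5, 1.6)` returns an interval for the difference that comfortably includes zero, so the two results do not conflict.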
Directory of Open Access Journals (Sweden)
Melissa Coulson
2010-07-01
A statistically significant result, and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST), or confidence intervals (CIs). Authors of articles published in psychology, behavioural neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST.
Inferences on weather extremes and weather-related disasters: a review of statistical methods
Directory of Open Access Journals (Sweden)
H. Visser
2012-02-01
The study of weather extremes and their impacts, such as weather-related disasters, plays an important role in climate change research. Due to the great societal consequences of extremes – historically, now and in the future – the peer-reviewed literature on this theme has been growing enormously since the 1980s. Data sources have a wide origin, from century-long climate reconstructions from tree rings to relatively short (30 to 60 yr) databases with disaster statistics and human impacts.
When scanning peer-reviewed literature on weather extremes and its impacts, it is noticeable that many different methods are used to make inferences. However, discussions on these methods are rare. Such discussions are important since a particular methodological choice might substantially influence the inferences made. A calculation of a return period of once in 500 yr, based on a normal distribution will deviate from that based on a Gumbel distribution. And the particular choice between a linear or a flexible trend model might influence inferences as well.
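The sensitivity of a return level to the distributional choice can be sketched as follows. This is a minimal illustration, not the review's method: it assumes annual maxima with a given mean and standard deviation, matches both distributions to those two moments (method of moments), and reads off the level exceeded once in T years as the 1 - 1/T quantile. The function names are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def normal_return_level(mean, std, return_period):
    """Level exceeded on average once per `return_period` years (normal)."""
    p = 1.0 - 1.0 / return_period
    return NormalDist(mean, std).inv_cdf(p)


def gumbel_return_level(mean, std, return_period):
    """Same level for a Gumbel distribution with the same mean and std."""
    p = 1.0 - 1.0 / return_period
    scale = std * math.sqrt(6.0) / math.pi   # Gumbel std = pi * scale / sqrt(6)
    loc = mean - EULER_GAMMA * scale         # Gumbel mean = loc + gamma * scale
    return loc - scale * math.log(-math.log(p))


# Standardised example: mean 0, standard deviation 1, 500-yr return period.
print(normal_return_level(0.0, 1.0, 500))
print(gumbel_return_level(0.0, 1.0, 500))
```

Under these assumptions the Gumbel level lies well above the normal one, because the Gumbel upper tail is heavier for the same mean and variance.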
In this article, a concise overview of statistical methods applied in the field of weather extremes and weather-related disasters is given. Methods have been evaluated as to stationarity assumptions, the choice for specific probability density functions (PDFs), and the availability of uncertainty information. As for stationarity assumptions, the outcome was that good testing is essential. Inferences on extremes may be wrong if data are assumed stationary while they are not. The same holds for the block-stationarity assumption. As for PDF choices it was found that often more than one PDF shape fits to the same data. From a simulation study the conclusion can be drawn that both the generalized extreme value (GEV) distribution and the log-normal PDF fit very well to a variety of indicators. The application of the normal and Gumbel distributions is more limited. As for uncertainty, it is
Inferences on weather extremes and weather-related disasters: a review of statistical methods
Directory of Open Access Journals (Sweden)
H. Visser
2011-09-01
The study of weather extremes and their impacts, such as weather-related disasters, plays an important role in climate-change research. Due to the great societal consequences of extremes – historically, now and in the future – the peer-reviewed literature on this theme has been growing enormously since the 1980s. Data sources have a wide origin, from century-long climate reconstructions from tree rings to short databases with disaster statistics and human impacts (30 to 60 yr).
In scanning the peer-reviewed literature on weather extremes and impacts thereof we noticed that many different methods are used to make inferences. However, discussions on methods are rare. Such discussions are important since a particular methodological choice might substantially influence the inferences made. A calculation of a return period of once in 500 yr, based on a normal distribution will deviate from that based on a Gumbel distribution. And the particular choice between a linear or a flexible trend model might influence inferences as well.
In this article we give a concise overview of statistical methods applied in the field of weather extremes and weather-related disasters. Methods have been evaluated as to stationarity assumptions, the choice for specific probability density functions (PDFs), and the availability of uncertainty information. As for stationarity we found that good testing is essential. Inferences on extremes may be wrong if data are assumed stationary while they are not. The same holds for the block-stationarity assumption. As for PDF choices we found that often more than one PDF shape fits to the same data. From a simulation study we conclude that both the generalized extreme value (GEV) distribution and the log-normal PDF fit very well to a variety of indicators. The application of the normal and Gumbel distributions is more limited. As for uncertainty it is advised to test conclusions on extremes for assumptions underlying
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations on the applicability of standard methods of statistical inference, which leads to an under- or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches, are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
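How serial dependency distorts a standard error can be sketched for the simplest case. The example assumes, purely for illustration, that the disturbances follow a stationary AR(1) process with lag-one autocorrelation phi and uses the large-sample result that the variance of the sample mean is inflated by (1 + phi) / (1 - phi) relative to the independent case. The function names are hypothetical and the paper's own models are considerably richer.

```python
import math


def ar1_se_inflation(phi):
    """Large-sample factor by which AR(1) dependence scales the naive
    standard error of a sample mean (1.0 means no distortion)."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    return math.sqrt((1.0 + phi) / (1.0 - phi))


def adjusted_se_of_mean(sample_std, n, phi):
    """Naive standard error of the mean, corrected for AR(1) dependence."""
    naive_se = sample_std / math.sqrt(n)
    return naive_se * ar1_se_inflation(phi)


# Positive autocorrelation: the naive formula underestimates the error.
print(ar1_se_inflation(0.5))
# Negative autocorrelation: the naive formula overestimates it.
print(ar1_se_inflation(-0.5))
```

Positive phi gives a factor above one (naive standard errors too small), negative phi a factor below one, which matches the under- or overestimation described in the abstract.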
Partial Optimality by Pruning for MAP-Inference with General Graphical Models.
Swoboda, Paul; Shekhovtsov, Alexander; Kappes, Jorg Hendrik; Schnorr, Christoph; Savchynskyy, Bogdan
2016-07-01
We consider the energy minimization problem for undirected graphical models, also known as MAP-inference problem for Markov random fields which is NP-hard in general. We propose a novel polynomial time algorithm to obtain a part of its optimal non-relaxed integral solution. Our algorithm is initialized with variables taking integral values in the solution of a convex relaxation of the MAP-inference problem and iteratively prunes those, which do not satisfy our criterion for partial optimality. We show that our pruning strategy is in a certain sense theoretically optimal. Also empirically our method outperforms previous approaches in terms of the number of persistently labelled variables. The method is very general, as it is applicable to models with arbitrary factors of an arbitrary order and can employ any solver for the considered relaxed problem. Our method's runtime is determined by the runtime of the convex relaxation solver for the MAP-inference problem.
Optimal Design of Shock Tube Experiments for Parameter Inference
Bisetti, Fabrizio
2014-01-06
We develop a Bayesian framework for the optimal experimental design of the shock tube experiments which are being carried out at the KAUST Clean Combustion Research Center. The unknown parameters are the pre-exponential parameters and the activation energies in the reaction rate expressions. The control parameters are the initial mixture composition and the temperature. The approach is based on first building a polynomial based surrogate model for the observables relevant to the shock tube experiments. Based on these surrogates, a novel MAP based approach is used to estimate the expected information gain in the proposed experiments, and to select the best experimental set-ups yielding the optimal expected information gains. The validity of the approach is tested using synthetic data generated by sampling the PC surrogate. We finally outline a methodology for validation using actual laboratory experiments, and extending experimental design methodology to the cases where the control parameters are noisy.
Johnson, Eric D; Tubau, Elisabet
2016-09-27
Presenting natural frequencies facilitates Bayesian inferences relative to using percentages. Nevertheless, many people, including highly educated and skilled reasoners, still fail to provide Bayesian responses to these computationally simple problems. We show that the complexity of relational reasoning (e.g., the structural mapping between the presented and requested relations) can help explain the remaining difficulties. With a non-Bayesian inference that required identical arithmetic but afforded a more direct structural mapping, performance was universally high. Furthermore, reducing the relational demands of the task through questions that directed reasoners to use the presented statistics, as compared with questions that prompted the representation of a second, similar sample, also significantly improved reasoning. Distinct error patterns were also observed between these presented- and similar-sample scenarios, which suggested differences in relational-reasoning strategies. On the other hand, while higher numeracy was associated with better Bayesian reasoning, higher-numerate reasoners were not immune to the relational complexity of the task. Together, these findings validate the relational-reasoning view of Bayesian problem solving and highlight the importance of considering not only the presented task structure, but also the complexity of the structural alignment between the presented and requested relations.
Palstra, Friso P; Heyer, Evelyne; Austerlitz, Frédéric
2015-06-01
The demographic history of modern humans constitutes a combination of expansions, colonizations, contractions, and remigrations. The advent of large scale genetic data combined with statistically refined methods facilitates inference of this complex history. Here we study the demographic history of two genetically admixed ethnic groups in Central Asia, an area characterized by high levels of genetic diversity and a history of recurrent immigration. Using Approximate Bayesian Computation, we infer that the timing of admixture markedly differs between the two groups. Admixture in the traditionally agricultural Tajiks could be dated back to the onset of the Neolithic transition in the region, whereas admixture in Kyrgyz is more recent, and may have involved the westward movement of Turkic peoples. These results are confirmed by a coalescent method that fits an isolation-with-migration model to the genetic data, with both Central Asian groups having received gene flow from the extremities of Eurasia. Interestingly, our analyses also uncover signatures of gene flow from Eastern to Western Eurasia during Paleolithic times. In conclusion, the high genetic diversity currently observed in these two Central Asian peoples most likely reflects the effects of recurrent immigration that likely started before historical times. Conversely, conquests during historical times may have had a relatively limited genetic impact. These results emphasize the need for a better understanding of the genetic consequences of transmission of culture and technological innovations, as well as those of invasions and conquests.
Challenges and approaches to statistical design and inference in high-dimensional investigations.
Gadbury, Gary L; Garrett, Karen A; Allison, David B
2009-01-01
Advances in modern technologies have facilitated high-dimensional experiments (HDEs) that generate tremendous amounts of genomic, proteomic, and other "omic" data. HDEs involving whole-genome sequences and polymorphisms, expression levels of genes, protein abundance measurements, and combinations thereof have become a vanguard for new analytic approaches to the analysis of HDE data. Such situations demand creative approaches to the processes of statistical inference, estimation, prediction, classification, and study design. The novel and challenging biological questions asked from HDE data have resulted in many specialized analytic techniques being developed. This chapter discusses some of the unique statistical challenges facing investigators studying high-dimensional biology and describes some approaches being developed by statistical scientists. We have included some focus on the increasing interest in questions involving testing multiple propositions simultaneously, appropriate inferential indicators for the types of questions biologists are interested in, and the need for replication of results across independent studies, investigators, and settings. A key consideration inherent throughout is the challenge in providing methods that a statistician judges to be sound and a biologist finds informative.
Demidenko, Eugene; Williams, Benjamin B; Flood, Ann Barry; Swartz, Harold M
2013-05-30
This paper develops a new metric, the standard error of inverse prediction (SEIP), for a dose-response relationship (calibration curve) when dose is estimated from response via inverse regression. SEIP can be viewed as a generalization of the coefficient of variation to the regression problem in which x is predicted from a y-value. We employ nonstandard statistical methods to treat the inverse prediction, which has an infinite mean and variance due to the presence of a normally distributed variable in the denominator. We develop confidence intervals and hypothesis testing for SEIP on the basis of the normal approximation and using the exact statistical inference based on the noncentral t-distribution. We derive the power functions for both approaches and test them via statistical simulations. The theoretical SEIP, as the ratio of the regression standard error to the slope, is viewed as the reciprocal of the signal-to-noise ratio, a popular measure in signal processing. The SEIP, as a figure of merit for inverse prediction, can be used for comparison of calibration curves with different dependent variables and slopes. We illustrate our theory with electron paramagnetic resonance tooth dosimetry for a rapid estimation of the radiation dose received in the event of nuclear terrorism.
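The ratio described above can be sketched as a plug-in calculation. This is only an illustration under stated assumptions: a straight-line calibration curve fitted by ordinary least squares, with the residual standard error divided by the absolute slope. It does not reproduce the paper's confidence intervals or tests, and the function names and data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit; returns (intercept, slope, residual_se)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_se = math.sqrt(rss / (n - 2))  # n - 2 degrees of freedom
    return intercept, slope, residual_se


def seip_plugin(x, y):
    """Plug-in estimate: regression standard error over absolute slope."""
    _, slope, residual_se = fit_line(x, y)
    return residual_se / abs(slope)


# Hypothetical calibration data: dose (x) and measured response (y).
dose = [0, 1, 2, 3, 4, 5]
response = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
print(seip_plugin(dose, response))
# Rescaling the response leaves the ratio unchanged.
print(seip_plugin(dose, [10 * r for r in response]))
```

Because both the residual standard error and the slope scale with the response units, the ratio is expressed in dose units, which is what makes it usable for comparing curves with different dependent variables.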
Henriques, Ana; Oliveira, Hélia
2016-01-01
This paper reports on the results of a study investigating the potential to embed Informal Statistical Inference in statistical investigations, using TinkerPlots, for assisting 8th grade students' informal inferential reasoning to emerge, particularly their articulations of uncertainty. Data collection included students' written work on a…
Designs and Methods for Association Studies and Population Size Inference in Statistical Genetics
DEFF Research Database (Denmark)
Waltoft, Berit Lindum
diseases. In the second part statistical methods for inferring population history are discussed. Knowledge on e.g. the common ancestor of the human species, possible bottlenecks back in time, and the expected number of rare variants in each genome, may be factors in the full picture of any disease aetiology....... Epidemiology In epidemiology the wording "odds ratio" is used for the estimator of any case-control study independent of the sampling of the controls. This phrase is ambiguous without specifications of the sampling schemes of the controls. When controls are sampled among the non-diseased individuals at the end......). The OR is interpreted as the effect of an exposure on the probability of being diseased at the end of follow-up, while the interpretation of the IRR is the effect of an exposure on the probability of becoming diseased. Through a simulation study, the OR from a classical case-control study is shown to be an inconsistent...
Oracle Inequalities and Optimal Inference under Group Sparsity
Lounici, Karim; Tsybakov, Alexandre B; van de Geer, Sara
2010-01-01
We consider the problem of estimating a sparse linear regression vector $\beta^*$ under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely the set of variables is partitioned into prescribed groups, only few of which are relevant in the estimation process. This group sparsity assumption suggests considering the Group Lasso method as a means to estimate $\beta^*$. We establish oracle inequalities for the prediction and $\ell_2$ estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger coherence condition, we derive bounds for the estimation error for mixed $(2,p)$-norms with $1\le p\leq \infty$. When $p=\infty$, this result implies that a threshold version of the Group Lasso estimator selects the sparsity pattern of $\beta^*$ with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a mi...
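The way a group penalty switches whole groups on or off can be sketched with the standard proximal operator of the sum of group Euclidean norms (block soft-thresholding). This is a generic building block of Group Lasso solvers, shown here as an assumption-laden illustration; it is not the estimator analysis or the thresholding rule from the paper, and the function names are hypothetical.

```python
import math


def group_soft_threshold(block, lam):
    """Shrink one group of coefficients toward zero; the whole group
    becomes exactly zero when its Euclidean norm is at most lam."""
    norm = math.sqrt(sum(c * c for c in block))
    if norm <= lam:
        return [0.0] * len(block)
    shrink = 1.0 - lam / norm
    return [shrink * c for c in block]


def prox_group_lasso(beta, groups, lam):
    """Apply block soft-thresholding to each prescribed index group."""
    out = list(beta)
    for indices in groups:
        shrunk = group_soft_threshold([beta[i] for i in indices], lam)
        for i, value in zip(indices, shrunk):
            out[i] = value
    return out


# Two groups of two coefficients: one strong, one weak.
beta = [3.0, 4.0, 0.3, 0.4]
groups = [[0, 1], [2, 3]]
print(prox_group_lasso(beta, groups, lam=1.0))
```

The strong group (norm 5) is shrunk by a common factor while the weak group (norm 0.5) is set to zero as a block, which is the mechanism that produces group-level sparsity patterns.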
Sex, lies, and statistics: inferences from the child sexual abuse accommodation syndrome.
Weiss, Kenneth J; Curcio Alexander, Julia
2013-01-01
Victims of child sexual abuse often recant their complaints or do not report incidents, making prosecution of offenders difficult. The child sexual abuse accommodation syndrome (CSAAS) has been used to explain this phenomenon by identifying common behavioral responses. Unlike PTSD but like rape trauma syndrome, CSAAS is not an official diagnostic term and should not be used as evidence of a defendant's guilt or to imply probative value in prosecutions. Courts have grappled with the ideal use of CSAAS in the evaluation of child witness testimony. Expert testimony should be helpful to the jurors without prejudicing them. The New Jersey Supreme Court ruled recently that statistical evidence about CSAAS implying the probability that a child is truthful runs the risk of confusing jury members and biasing them against the defendant. We review the parameters of expert testimony and its admissibility in this area, concluding that statistics about CSAAS should not be used to draw inferences about the victim's credibility or the defendant's guilt.
Sojoudi, Alireza; Goodyear, Bradley G
2016-12-01
Spontaneous fluctuations of blood-oxygenation level-dependent functional magnetic resonance imaging (BOLD fMRI) signals are highly synchronous between brain regions that serve similar functions. This provides a means to investigate functional networks; however, most analysis techniques assume functional connections are constant over time. This may be problematic in the case of neurological disease, where functional connections may be highly variable. Recently, several methods have been proposed to determine moment-to-moment changes in the strength of functional connections over an imaging session (so-called dynamic connectivity). Here a novel analysis framework based on a hierarchical observation modeling approach was proposed, to permit statistical inference of the presence of dynamic connectivity. A two-level linear model composed of overlapping sliding windows of fMRI signals, incorporating the fact that overlapping windows are not independent, was described. To test this approach, datasets were synthesized whereby functional connectivity was either constant (significant or insignificant) or modulated by an external input. The method successfully determines the statistical significance of a functional connection in phase with the modulation, and it exhibits greater sensitivity and specificity in detecting regions with variable connectivity, when compared with sliding-window correlation analysis. For real data, this technique possesses greater reproducibility and provides a more discriminative estimate of dynamic connectivity than sliding-window correlation analysis. Hum Brain Mapp 37:4566-4580, 2016. © 2016 Wiley Periodicals, Inc.
Univariate description and bivariate statistical inference: the first step delving into data.
Zhang, Zhongheng
2016-03-01
In observational studies, the first step is usually to explore data distribution and the baseline differences between groups. Data description includes central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, range, interquartile range). There are a variety of bivariate statistical inference methods, such as Student's t-test, the Mann-Whitney U test and the Chi-square test, for normal, skewed and categorical data, respectively. The article shows how to perform these analyses with R code. Furthermore, I believe that the automation of the whole workflow is of paramount importance in that (I) it allows for others to repeat your results; (II) you can easily find out how you performed analysis during revision; (III) it spares data input by hand and is less error-prone; and (IV) when you correct your original dataset, the final result can be automatically corrected by executing the code. Therefore, the process of making a publication-quality table incorporating all the abovementioned statistics and P values is provided, allowing readers to customize the code to their own needs.
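One of the bivariate tests named above, the Chi-square test for categorical data, can be sketched in a few lines. The article itself works in R; the Python version below is only an illustration with hypothetical function names and made-up counts. It computes the Pearson statistic without a continuity correction (R's `chisq.test` applies Yates' correction to 2x2 tables by default, so its output would differ), and the p-value helper is valid only for one degree of freedom.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value of a chi-square variable with exactly 1 dof."""
    return math.erfc(math.sqrt(stat / 2.0))


# Made-up 2x2 table: rows are groups, columns are outcome categories.
counts = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(counts)
print(stat, dof, p_value_one_dof(stat))
```

Wrapping each test in a function like this is one way to get the automation the abstract argues for: the table of statistics can be regenerated whenever the dataset is corrected.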
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2016-05-19
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
Emmert-Streib, Frank; Glazko, Galina V; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms.
Morozov, Alexandre
2009-03-01
Formation of nucleosome core particles is a first step towards packaging genomic DNA into chromosomes in living cells. Nucleosomes are formed by wrapping 147 base pairs of DNA around a spool of eight histone proteins. It is reasonable to assume that formation of single nucleosomes in vitro is determined by DNA sequence alone: it costs less elastic energy to wrap a flexible DNA polymer around the histone octamer, and more if the polymer is rigid. However, it is unclear to which extent this effect is important in living cells. Cells have evolved chromatin remodeling enzymes that expend ATP to actively reposition nucleosomes. In addition, nucleosome positioning on long DNA sequences is affected by steric exclusion - many nucleosomes have to form simultaneously without overlap. Currently available bioinformatics methods for predicting nucleosome positions are trained on in vivo data sets and are thus unable to distinguish between extrinsic and intrinsic nucleosome positioning signals. In order to see the relative importance of such signals for nucleosome positioning in vivo, we have developed a model based on a large collection of DNA sequences from nucleosomes reconstituted in vitro by salt dialysis. We have used these data to infer the free energy of nucleosome formation at each position along the genome. The method uses an exact result from the statistical mechanics of classical 1D fluids to infer the free energy landscape from nucleosome occupancy. We will discuss the degree to which in vitro nucleosome occupancy profiles are predictive of in vivo nucleosome positions, and will estimate how many nucleosomes are sequence-specific and how many are positioned purely by steric exclusion. Our approach to nucleosome energetics should be applicable across multiple organisms and genomic regions.
Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul
2015-01-01
Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis.
Timing optimization utilizing order statistics and multichannel digital silicon photomultipliers
Mandai, S.; Venialgo, E.; Charbon, E.
2014-01-01
We present an optimization technique utilizing order statistics with a multichannel digital silicon photomultiplier (MD-SiPM) for timing measurements. Accurate timing measurements are required by 3D rangefinding and time-of-flight positron emission tomography, to name a few applications. We have
Statistical Optimality in Multipartite Ranking and Ordinal Regression.
Uematsu, Kazuki; Lee, Yoonkyung
2015-05-01
Statistical optimality in multipartite ranking is investigated as an extension of bipartite ranking. We consider the optimality of ranking algorithms through minimization of the theoretical risk which combines pairwise ranking errors of ordinal categories with differential ranking costs. The extension shows that for a certain class of convex loss functions including exponential loss, the optimal ranking function can be represented as a ratio of weighted conditional probability of upper categories to lower categories, where the weights are given by the misranking costs. This result also bridges traditional ranking methods such as proportional odds model in statistics with various ranking algorithms in machine learning. Further, the analysis of multipartite ranking with different costs provides a new perspective on non-smooth list-wise ranking measures such as the discounted cumulative gain and preference learning. We illustrate our findings with simulation study and real data analysis.
Statistical inference for the additive hazards model under outcome-dependent sampling.
Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo
2015-09-01
Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.
Conn, Paul B.; Johnson, Devin S.; Ver Hoef, Jay M.; Hooten, Mevin B.; London, Joshua M.; Boveng, Peter L.
2015-01-01
Ecologists often fit models to survey data to estimate and explain variation in animal abundance. Such models typically require that animal density remains constant across the landscape where sampling is being conducted, a potentially problematic assumption for animals inhabiting dynamic landscapes or otherwise exhibiting considerable spatiotemporal variation in density. We review several concepts from the burgeoning literature on spatiotemporal statistical models, including the nature of the temporal structure (i.e., descriptive or dynamical) and strategies for dimension reduction to promote computational tractability. We also review several features as they specifically relate to abundance estimation, including boundary conditions, population closure, choice of link function, and extrapolation of predicted relationships to unsampled areas. We then compare a suite of novel and existing spatiotemporal hierarchical models for animal count data that permit animal density to vary over space and time, including formulations motivated by resource selection and allowing for closed populations. We gauge the relative performance (bias, precision, computational demands) of alternative spatiotemporal models when confronted with simulated and real data sets from dynamic animal populations. For the latter, we analyze spotted seal (Phoca largha) counts from an aerial survey of the Bering Sea where the quantity and quality of suitable habitat (sea ice) changed dramatically while surveys were being conducted. Simulation analyses suggested that multiple types of spatiotemporal models provide reasonable inference (low positive bias, high precision) about animal abundance, but have potential for overestimating precision. Analysis of spotted seal data indicated that several model formulations, including those based on a log-Gaussian Cox process, had a tendency to overestimate abundance. By contrast, a model that included a population closure assumption and a scale prior on total
Sassenhagen, Jona; Alday, Phillip M
2016-11-01
Experimental research on behavior and cognition frequently rests on stimulus or subject selection where not all characteristics can be fully controlled, even when attempting strict matching. For example, when contrasting patients to controls, variables such as intelligence or socioeconomic status are often correlated with patient status. Similarly, when presenting word stimuli, variables such as word frequency are often correlated with primary variables of interest. One procedure very commonly employed to control for such nuisance effects is conducting inferential tests on confounding stimulus or subject characteristics. For example, if word length is not significantly different for two stimulus sets, they are considered as matched for word length. Such a test has high error rates and is conceptually misguided. It reflects a common misunderstanding of statistical tests: interpreting significance not to refer to inference about a particular population parameter, but about 1. the sample in question, 2. the practical relevance of a sample difference (so that a nonsignificant test is taken to indicate evidence for the absence of relevant differences). We show inferential testing for assessing nuisance effects to be inappropriate both pragmatically and philosophically, present a survey showing its high prevalence, and briefly discuss an alternative in the form of regression including nuisance variables.
Schlichting, Margaret L; Guarino, Katharine F; Schapiro, Anna C; Turk-Browne, Nicholas B; Preston, Alison R
2017-01-01
Despite the importance of learning and remembering across the lifespan, little is known about how the episodic memory system develops to support the extraction of associative structure from the environment. Here, we relate individual differences in volumes along the hippocampal long axis to performance on statistical learning and associative inference tasks-both of which require encoding associations that span multiple episodes-in a developmental sample ranging from ages 6 to 30 years. Relating age to volume, we found dissociable patterns across the hippocampal long axis, with opposite nonlinear volume changes in the head and body. These structural differences were paralleled by performance gains across the age range on both tasks, suggesting improvements in the cross-episode binding ability from childhood to adulthood. Controlling for age, we also found that smaller hippocampal heads were associated with superior behavioral performance on both tasks, consistent with this region's hypothesized role in forming generalized codes spanning events. Collectively, these results highlight the importance of examining hippocampal development as a function of position along the hippocampal axis and suggest that the hippocampal head is particularly important in encoding associative structure across development.
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively "hiding" its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research.
Maximum entropy approach to statistical inference for an ocean acoustic waveguide.
Knobles, D P; Sagers, J D; Koch, R A
2012-02-01
A conditional probability distribution suitable for estimating the statistical properties of ocean seabed parameter values inferred from acoustic measurements is derived from a maximum entropy principle. The specification of the expectation value for an error function constrains the maximization of an entropy functional. This constraint determines the sensitivity factor (β) to the error function of the resulting probability distribution, which is a canonical form that provides a conservative estimate of the uncertainty of the parameter values. From the conditional distribution, marginal distributions for individual parameters can be determined from integration over the other parameters. The approach is an alternative to obtaining the posterior probability distribution without an intermediary determination of the likelihood function followed by an application of Bayes' rule. In this paper the expectation value that specifies the constraint is determined from the values of the error function for the model solutions obtained from a sparse number of data samples. The method is applied to ocean acoustic measurements taken on the New Jersey continental shelf. The marginal probability distribution for the values of the sound speed ratio at the surface of the seabed and the source levels of a towed source are examined for different geoacoustic model representations.
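The canonical form mentioned in this abstract can be illustrated with a small sketch. It assumes a finite set of candidate model solutions with known error-function values and a specified expectation value for the error; the function names, the sample error values, and the use of bisection are illustrative assumptions, not the authors' procedure.

```python
import math

def canonical_weights(errors, beta):
    """Weights p_i proportional to exp(-beta * E_i), normalized to sum to 1."""
    e_min = min(errors)  # shift by the minimum for numerical stability
    w = [math.exp(-beta * (e - e_min)) for e in errors]
    total = sum(w)
    return [x / total for x in w]

def mean_error(errors, beta):
    """Expectation of the error function under the canonical weights."""
    p = canonical_weights(errors, beta)
    return sum(pi * e for pi, e in zip(p, errors))

def solve_beta(errors, target, lo=0.0, hi=1e3, tol=1e-10):
    """Bisection for beta such that the expected error equals `target`.

    The expected error decreases monotonically in beta (its derivative is
    minus the variance of the error), from the plain average at beta = 0
    toward min(errors) as beta grows. The upper bracket `hi` is an assumption.
    """
    if not (min(errors) < target <= mean_error(errors, 0.0)):
        raise ValueError("target must lie between min(errors) and the plain mean")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_error(errors, mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical error values for four candidate model solutions.
errors = [0.5, 1.0, 2.0, 4.0]
beta = solve_beta(errors, target=1.0)
weights = canonical_weights(errors, beta)
```

Marginal distributions for individual parameters would then follow by summing these weights over the other parameters, as the abstract describes.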
Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S
2010-05-21
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble, and developing theoretical distributions for these ensembles, provides a framework that plays instead to the strength of microarrays. We present theoretical results showing that, under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays.
Energy Technology Data Exchange (ETDEWEB)
Brannigan, V.M. [Univ. of Maryland, College Park, MD (United States); Bier, V.M. [Univ. of Wisconsin, Madison, WI (United States); Berg, C. [Georgetown Univ. School of Medicine, Washington, DC (United States)
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Bendectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision, and also the relationships among statistical inference, scientific evidence, and the rules of product liability in general. 23 refs.
Post, Thierry
2003-01-01
This paper discusses statistical inference on the second-order stochastic dominance (SSD) efficiency of a given portfolio relative to all portfolios formed from a set of assets. We derive the asymptotic sampling distribution of the Post test statistic for SSD efficiency. Unfortunately, a test procedure based on this distribution involves low power in small samples. Bootstrapping is a more powerful approach to sampling error. We use the bootstrap to test if the Fama and French valu...
Frank, Laurence Emmanuelle
2006-01-01
Feature Network Models (FNM) are graphical structures that represent proximity data in a discrete space with the use of features. A statistical inference theory is introduced, based on the additivity properties of networks and the linear regression framework. Considering features as predictor variab
Institute of Scientific and Technical Information of China (English)
LIU Fei; JIANG Pingkai; LEI Qingquan; ZHANG Li; SU Wenqun
2013-01-01
Cables that have been in service for over 20 years in Shanghai, a city with abundant surface water, failed more frequently and induced different cable accidents. This necessitates research on the insulation aging state of cables working in special circumstances. We performed multi-parameter tests with samples from about 300 cable lines in Shanghai. The tests included water tree investigation, tensile test, dielectric spectroscopy test, thermogravimetric analysis (TGA), Fourier transform infrared spectroscopy (FTIR), and electrical aging test. Then, we carried out regression analysis between every two test parameters. Moreover, through two-sample t-test and analysis of variance (ANOVA) of each test parameter, we analyzed the influences of cable-laying method and sampling section on the degradation of cable insulation, respectively. Furthermore, the test parameters which have strong correlation in the regression analysis or significant differences in the t-test or ANOVA analysis were determined to be the ones identifying the XLPE cable insulation aging state. The thresholds for distinguishing insulation aging states were also obtained with the aid of statistical analysis and fuzzy clustering. Based on the fuzzy inference, we established a cable insulation aging diagnosis model using the intensity transfer method. The results of regression analysis indicate that the degradation of cable insulation accelerates as the degree of in-service aging increases. This validates the rule that the increase of microscopic imperfections in solid material enhances the dielectric breakdown strength. The results of the two-sample t-test and the ANOVA indicate that the direct-buried cables are more sensitive to insulation degradation than duct cables. This confirms that the tensile strength and breakdown strength are reliable functional parameters in cable insulation evaluations. A case study further indicates that the proposed diagnosis model based on the fuzzy inference can reflect the comprehensive
Stang, Andreas; Deckert, Markus; Poole, Charles; Rothman, Kenneth J
2017-01-01
Since its introduction in the twentieth century, null hypothesis significance testing (NHST), a hybrid of significance testing (ST) advocated by Fisher and null hypothesis testing (NHT) developed by Neyman and Pearson, has become widely adopted but has also been a source of debate. The principal alternative to such testing is estimation with point estimates and confidence intervals (CI). Our aim was to estimate time trends in NHST, ST, NHT and CI reporting in abstracts of major medical and epidemiological journals. We reviewed 89,533 abstracts in five major medical journals and seven major epidemiological journals, 1975-2014, and estimated time trends in the proportions of abstracts containing statistical inference. In those abstracts, we estimated time trends in the proportions relying on NHST and its major variants, ST and NHT, and in the proportions reporting CIs without explicit use of NHST (CI-only approach). The CI-only approach rose monotonically during the study period in the abstracts of all journals. In Epidemiology abstracts, as a result of the journal's editorial policy, the CI-only approach has always been the most common approach. In the other 11 journals, the NHST approach started out more common, but by 2014, this disparity had narrowed, disappeared or reversed in 9 of them. The exceptions were JAMA, New England Journal of Medicine, and Lancet abstracts, where the predominance of the NHST approach prevailed over time. In 2014, the CI-only approach is as popular as the NHST approach in the abstracts of 4 of the epidemiology journals: the American Journal of Epidemiology (48%), the Annals of Epidemiology (55%), Epidemiology (79%) and the International Journal of Epidemiology (52%). The reporting of CIs without explicitly interpreting them as statistical tests is becoming more common in abstracts, particularly in epidemiology journals. Although NHST is becoming less popular in abstracts of most epidemiology journals studied and some widely read medical
Energy Technology Data Exchange (ETDEWEB)
Marzouk, Youssef [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
2016-08-31
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling feasible in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
Gross, Kevin; Rosenheim, Jay A
2011-10-01
Secondary pest outbreaks occur when the use of a pesticide to reduce densities of an unwanted target pest species triggers subsequent outbreaks of other pest species. Although secondary pest outbreaks are thought to be familiar in agriculture, their rigorous documentation is made difficult by the challenges of performing randomized experiments at suitable scales. Here, we quantify the frequency and monetary cost of secondary pest outbreaks elicited by early-season applications of broad-spectrum insecticides to control the plant bug Lygus spp. (primarily L. hesperus) in cotton grown in the San Joaquin Valley, California, USA. We do so by analyzing pest-control management practices for 969 cotton fields spanning nine years and 11 private ranches. Our analysis uses statistical methods to draw formal causal inferences from nonexperimental data that have become popular in public health and economics, but that are not yet widely known in ecology or agriculture. We find that, in fields that received an early-season broad-spectrum insecticide treatment for Lygus, 20.2% +/- 4.4% (mean +/- SE) of late-season pesticide costs were attributable to secondary pest outbreaks elicited by the early-season insecticide application for Lygus. In 2010 U.S. dollars, this equates to an additional $6.00 +/- $1.30 (mean +/- SE) per acre in management costs. To the extent that secondary pest outbreaks may be driven by eliminating pests' natural enemies, these figures place a lower bound on the monetary value of ecosystem services provided by native communities of arthropod predators and parasitoids in this agricultural system.
Vincent, Martin; Mundbjerg, Kamilla; Skou Pedersen, Jakob; Liang, Gangning; Jones, Peter A; Ørntoft, Torben Falck; Dalsgaard Sørensen, Karina; Wiuf, Carsten
2017-02-21
The study of epigenetic heterogeneity at the level of individual cells and in whole populations is the key to understanding cellular differentiation, organismal development, and the evolution of cancer. We develop a statistical method, epiG, to infer and differentiate between different epi-allelic haplotypes, annotated with CpG methylation status and DNA polymorphisms, from whole-genome bisulfite sequencing data, and nucleosome occupancy from NOMe-seq data. We demonstrate the capabilities of the method by inferring allele-specific methylation and nucleosome occupancy in cell lines, and colon and tumor samples, and by benchmarking the method against independent experimental data.
ROTS: An R package for reproducibility-optimized statistical testing.
Suomi, Tomi; Seyednasrollah, Fatemeh; Jaakkola, Maria K; Faux, Thomas; Elo, Laura L
2017-05-01
Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies, involving proteomics and RNA-seq data from public repositories, including both bulk and single cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).
Energy Technology Data Exchange (ETDEWEB)
Perlman, M D
1977-03-01
Research activities of the Department of Statistics, University of Chicago, during the period 15 June 1976 to 14 June 1977 are reviewed. Individual projects were carried out in the following eight areas: statistical computing--approximations to statistical tables and functions; numerical computation of boundary-crossing probabilities for Brownian motion and related stochastic processes; probabilistic methods in statistical mechanics; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; and comparison of several populations. Brief summaries of these projects are given, along with other administrative information. (RWR)
Sweeney, Elizabeth M; Shinohara, Russell T; Shiee, Navid; Mateen, Farrah J; Chudgar, Avni A; Cuzzocreo, Jennifer L; Calabresi, Peter A; Pham, Dzung L; Reich, Daniel S; Crainiceanu, Ciprian M
2013-01-01
Magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS) patients and is essential for diagnosing the disease and monitoring its progression. In practice, lesion load is often quantified by either manual or semi-automated segmentation of MRI, which is time-consuming, costly, and associated with large inter- and intra-observer variability. We propose OASIS is Automated Statistical Inference for Segmentation (OASIS), an automated statistical method for segmenting MS lesions in MRI studies. We use logistic regression models incorporating multiple MRI modalities to estimate voxel-level probabilities of lesion presence. Intensity-normalized T1-weighted, T2-weighted, fluid-attenuated inversion recovery and proton density volumes from 131 MRI studies (98 MS subjects, 33 healthy subjects) with manual lesion segmentations were used to train and validate our model. Within this set, OASIS detected lesions with a partial area under the receiver operating characteristic curve for clinically relevant false positive rates of 1% and below of 0.59% (95% CI; [0.50%, 0.67%]) at the voxel level. An experienced MS neuroradiologist compared these segmentations to those produced by LesionTOADS, an image segmentation software that provides segmentation of both lesions and normal brain structures. For lesions, OASIS out-performed LesionTOADS in 74% (95% CI: [65%, 82%]) of cases for the 98 MS subjects. To further validate the method, we applied OASIS to 169 MRI studies acquired at a separate center. The neuroradiologist again compared the OASIS segmentations to those from LesionTOADS. For lesions, OASIS ranked higher than LesionTOADS in 77% (95% CI: [71%, 83%]) of cases. For a randomly selected subset of 50 of these studies, one additional radiologist and one neurologist also scored the images. Within this set, the neuroradiologist ranked OASIS higher than LesionTOADS in 76% (95% CI: [64%, 88%]) of cases, the neurologist 66% (95% CI: [52%, 78
Convertino, Matteo; Mangoubi, Rami S.; Linkov, Igor; Lowry, Nathan C.; Desai, Mukund
2012-01-01
Shannon entropy of pixel intensity. To test our approach, we specifically use the green band of Landsat images for a water conservation area in the Florida Everglades. We validate our predictions against data of species occurrences for a twenty-eight-year period for both wet and dry seasons. Our method correctly predicts 73% of species richness. For species turnover, the newly proposed KL divergence prediction performance is near 100% accurate. This represents a significant improvement over the more conventional Shannon entropy difference, which provides 85% accuracy. Furthermore, we find that changes in soil and water patterns, as measured by fluctuations of the Shannon entropy for the red and blue bands respectively, are positively correlated with changes in vegetation. The fluctuations are smaller in the wet season when compared to the dry season. Conclusions/Significance: Texture-based statistical multiresolution image analysis is a promising method for quantifying interseasonal differences and, consequently, the degree to which vegetation, soil, and water patterns vary. The proposed automated method for quantifying species richness and turnover can also provide analysis at higher spatial and temporal resolution than is currently obtainable from expensive monitoring campaigns, thus enabling more prompt, more cost effective inference and decision making support regarding anomalous variations in biodiversity. Additionally, a matrix-based visualization of the statistical multiresolution analysis is presented to facilitate both insight and quick recognition of anomalous data. PMID:23115629
Directory of Open Access Journals (Sweden)
Matteo Convertino
richness, or [Formula: see text] diversity, based on the Shannon entropy of pixel intensity. To test our approach, we specifically use the green band of Landsat images for a water conservation area in the Florida Everglades. We validate our predictions against data of species occurrences for a twenty-eight-year period for both wet and dry seasons. Our method correctly predicts 73% of species richness. For species turnover, the newly proposed KL divergence prediction performance is near 100% accurate. This represents a significant improvement over the more conventional Shannon entropy difference, which provides 85% accuracy. Furthermore, we find that changes in soil and water patterns, as measured by fluctuations of the Shannon entropy for the red and blue bands respectively, are positively correlated with changes in vegetation. The fluctuations are smaller in the wet season when compared to the dry season. CONCLUSIONS/SIGNIFICANCE: Texture-based statistical multiresolution image analysis is a promising method for quantifying interseasonal differences and, consequently, the degree to which vegetation, soil, and water patterns vary. The proposed automated method for quantifying species richness and turnover can also provide analysis at higher spatial and temporal resolution than is currently obtainable from expensive monitoring campaigns, thus enabling more prompt, more cost effective inference and decision making support regarding anomalous variations in biodiversity. Additionally, a matrix-based visualization of the statistical multiresolution analysis is presented to facilitate both insight and quick recognition of anomalous data.
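The two quantities named in this abstract, the Shannon entropy of pixel intensity and the KL divergence between intensity distributions, can be sketched as follows. This is a minimal illustration on made-up pixel lists with four intensity levels; the function names, the toy data, and the `eps` floor are assumptions, and the multiresolution texture analysis of the paper is not reproduced.

```python
import math
from collections import Counter

def histogram(pixels, levels):
    """Normalized intensity histogram over the integer levels 0..levels-1."""
    counts = Counter(pixels)
    n = len(pixels)
    return [counts.get(v, 0) / n for v in range(levels)]

def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in bits; q is floored at eps to avoid log(0)."""
    return sum(x * math.log2(x / max(y, eps)) for x, y in zip(p, q) if x > 0)

# Hypothetical green-band pixel values for two seasons (4 intensity levels).
wet = histogram([0, 0, 1, 1, 2, 2, 3, 3], levels=4)
dry = histogram([0, 0, 0, 0, 1, 1, 2, 3], levels=4)

h_wet = shannon_entropy(wet)          # uniform over 4 levels
h_dry = shannon_entropy(dry)
turnover_proxy = kl_divergence(dry, wet)
```

The entropy difference compares only how spread out each histogram is, whereas the KL divergence also depends on which intensity levels carry the probability mass, so two images with equal entropy can still have a nonzero divergence.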
Institute of Scientific and Technical Information of China (English)
Jongbin Im; Jungsun Park
2013-01-01
This paper focuses on a method to solve structural optimization problems using particle swarm optimization (PSO), surrogate models and Bayesian statistics. PSO is a random/stochastic search algorithm designed to find the global optimum. However, PSO needs many evaluations compared to gradient-based optimization. This means PSO increases the analysis costs of structural optimization. One of the methods to reduce computing costs in stochastic optimization is to use approximation techniques. In this work, surrogate models are used, including the response surface method (RSM) and Kriging. When surrogate models are used, there are some errors between exact values and approximated values. These errors decrease the reliability of the optimum values and discard the realistic approximation of using surrogate models. In this paper, Bayesian statistics is used to obtain more reliable results. To verify and confirm the efficiency of the proposed method using surrogate models and Bayesian statistics for stochastic structural optimization, two numerical examples are optimized, and the optimization of a hub sleeve is demonstrated as a practical problem.
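For readers unfamiliar with PSO, a minimal sketch of the basic algorithm follows. It covers only a textbook global-best PSO on a toy objective; the parameter values (`w`, `c1`, `c2`), the function names, and the sphere test function are illustrative assumptions, and the surrogate models and Bayesian correction described in the abstract are not shown.

```python
import random

def pso(f, dim, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm minimization of f over a box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move and clamp to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])  # one objective evaluation per particle per iteration
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

With the defaults above, the loop calls the objective 20 × 100 times after initialization, which is the evaluation cost that motivates replacing an expensive structural analysis with a cheaper surrogate.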
Energy Technology Data Exchange (ETDEWEB)
Wallace, D L; Perlman, M D
1980-06-01
This report describes the research activities of the Department of Statistics, University of Chicago, during the period June 15, 1975 to July 30, 1979. Nine research projects are briefly described on the following subjects: statistical computing and approximation techniques in statistics; numerical computation of first passage distributions; probabilities of large deviations; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; comparison of several populations; and unbiasedness in multivariate statistics. A description of the statistical consultation activities of the Department that are of interest to DOE, in particular, the scientific interactions between the Department and the scientists at Argonne National Laboratories, is given. A list of publications issued during the term of the contract is included.
Bickel, David R
2011-01-01
In statistical practice, whether a Bayesian or frequentist approach is used in inference depends not only on the availability of prior information but also on the attitude taken toward partial prior information, with frequentists tending to be more cautious than Bayesians. The proposed framework defines that attitude in terms of a specified amount of caution, thereby enabling data analysis at the level of caution desired and on the basis of any prior information. The caution parameter represents the attitude toward partial prior information in much the same way as a loss function represents the attitude toward risk. When there is very little prior information and nonzero caution, the resulting inferences correspond to those of the candidate confidence intervals and p-values that are most similar to the credible intervals and hypothesis probabilities of the specified Bayesian posterior. On the other hand, in the presence of a known physical distribution of the parameter, inferences are based only on the corres...
Bagging statistical network inference from large-scale gene expression data.
Ricardo de Matos Simoes; Frank Emmert-Streib
2012-01-01
Modern biology and medicine aim at hunting molecular and cellular causes of biological functions and diseases. Gene regulatory networks (GRN) inferred from gene expression data are considered an important aid for this research by providing a map of molecular interactions. Hence, GRNs have the potential to enable and enhance basic as well as applied research in the life sciences. In this paper, we introduce a new method called BC3NET for inferring causal gene regulatory networks from large-sc...
Statistical Mechanics Approximation of Biogeography-Based Optimization.
Ma, Haiping; Simon, Dan; Fei, Minrui
2016-01-01
Biogeography-based optimization (BBO) is an evolutionary algorithm inspired by biogeography, which is the study of the migration of species between habitats. This paper derives a mathematical description of the dynamics of BBO based on ideas from statistical mechanics. Rather than trying to exactly predict the evolution of the population, statistical mechanics methods describe the evolution of statistical properties of the population fitness. This paper uses the one-max problem, which has only one optimum and whose fitness function is the number of 1s in a binary string, to derive equations that predict the statistical properties of BBO each generation in terms of those of the previous generation. These equations reveal the effect of migration and mutation on the population fitness dynamics of BBO. The results obtained in this paper are similar to those for the simple genetic algorithm with selection and mutation. The paper also derives equations for the population fitness dynamics of general separable functions, and we find that the results obtained for separable functions are the same as those for the one-max problem. The statistical mechanics theory of BBO is shown to be in good agreement with simulation.
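The one-max fitness function and the kind of population statistic tracked from one generation to the next can be sketched as below. The mutation formula assumes independent bit flips with probability `pm` per bit; it illustrates only the mutation step and is not the paper's full BBO equations, which also account for migration. All names are illustrative.

```python
def onemax(bits):
    """One-max fitness: the number of 1s in a binary string."""
    return sum(bits)

def fitness_stats(population):
    """Mean and (population) variance of fitness across the population."""
    values = [onemax(ind) for ind in population]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def expected_mean_after_mutation(mean_f, n, pm):
    """Expected mean fitness after independent bit-flip mutation.

    Each of the mean_f ones survives with probability 1 - pm, and each of
    the n - mean_f zeros becomes a one with probability pm.
    """
    return mean_f * (1 - pm) + (n - mean_f) * pm

# Hypothetical population of 4-bit strings.
population = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0]]
mean_f, var_f = fitness_stats(population)
next_mean = expected_mean_after_mutation(mean_f, n=4, pm=0.1)
```

Because the update depends on the population only through its mean fitness, it predicts the next generation's statistic from the current one without following individual strings.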
Directory of Open Access Journals (Sweden)
Haines Albert
2010-07-01
Background: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. Results: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. Conclusion: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
Automated assay optimization with integrated statistics and smart robotics.
Taylor, P B; Stewart, F P; Dunnington, D J; Quinn, S T; Schulz, C K; Vaidya, K S; Kurali, E; Lane, T R; Xiong, W C; Sherrill, T P; Snider, J S; Terpstra, N D; Hertzberg, R P
2000-08-01
The transition from manual to robotic high throughput screening (HTS) in the last few years has made it feasible to screen hundreds of thousands of chemical entities against a biological target in less than a month. This rate of HTS has increased the visibility of bottlenecks, one of which is assay optimization. In many organizations, experimental methods are generated by therapeutic teams associated with specific targets and passed on to the HTS group. The resulting assays frequently need to be further optimized to withstand the rigors and time frames inherent in robotic handling. Issues such as protein aggregation, ligand instability, and cellular viability are common variables in the optimization process. The availability of robotics capable of performing rapid random access tasks has made it possible to design optimization experiments that would be either very difficult or impossible for a person to carry out. Our approach to reducing the assay optimization bottleneck has been to unify the highly specific fields of statistics, biochemistry, and robotics. The product of these endeavors is a process we have named automated assay optimization (AAO). This has enabled us to determine final optimized assay conditions, which are often a composite of variables that we would not have arrived at by examining each variable independently. We have applied this approach to both radioligand binding and enzymatic assays and have realized benefits in both time and performance that we would not have predicted a priori. The fully developed AAO process encompasses the ability to download information to a robot and have liquid handling methods automatically created. This evolution in smart robotics has proven to be an invaluable tool for maintaining high-quality data in the context of increasing HTS demands.
Saatchi, R.
2004-03-01
The aim of the study was to automate the identification of a saccade-related visual evoked potential (EP) called the lambda wave. The lambda waves were extracted from single trials of electroencephalogram (EEG) waveforms using independent component analysis (ICA). A trial was a set of EEG waveforms recorded from 64 scalp electrode locations while a saccade was performed. Forty saccade-related EEG trials (recorded from four normal subjects) were used in the study. The number of waveforms per trial was reduced from 64 to 22 by pre-processing. The application of ICA to the resulting waveforms produced 880 components (i.e. 4 subjects × 10 trials per subject × 22 components per trial). The components were divided into 373 lambda and 507 nonlambda waves by visual inspection and then they were represented by one spatial and two temporal features. The classification performance of a Bayesian approach called predictive statistical diagnosis (PSD) was compared with that of a fuzzy logic approach called a fuzzy inference system (FIS). The outputs from the two classification approaches were then combined and the resulting discrimination accuracy was evaluated. For each approach, half the data from the lambda and nonlambda wave categories were used to determine the operating parameters of the classification schemes while the rest (i.e. the validation set) were used to evaluate their classification accuracies. The sensitivity and specificity values when the classification approaches were applied to the lambda wave validation data set were as follows: for the PSD 92.51% and 91.73% respectively, for the FIS 95.72% and 89.76% respectively, and for the combined FIS and PSD approach 97.33% and 97.24% respectively (classification threshold was 0.5). The devised signal processing techniques together with the classification approaches provided for an effective extraction and classification of the single-trial lambda waves. However, as only four subjects were included, it will be
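The component counts and the accuracy figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the confusion-matrix counts are hypothetical, chosen because they round to the percentages reported for PSD under the assumption that the validation set holds 187 lambda and 254 nonlambda components (roughly half of 373 and 507). The abstract itself does not report these counts.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual lambda waves that were classified as lambda."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of actual nonlambda waves that were classified as nonlambda."""
    return true_neg / (true_neg + false_pos)

# Component bookkeeping stated in the abstract.
total_components = 4 * 10 * 22      # subjects x trials per subject x components per trial
assert total_components == 373 + 507

# Hypothetical validation-set counts (assumption, not from the abstract).
psd_sensitivity = sensitivity(true_pos=173, false_neg=14)   # 187 lambda components
psd_specificity = specificity(true_neg=233, false_pos=21)   # 254 nonlambda components

if __name__ == "__main__":
    print(f"{100 * psd_sensitivity:.2f}% sensitivity, {100 * psd_specificity:.2f}% specificity")
```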
DEFF Research Database (Denmark)
Korneliussen, Thorfinn Sand
that is required before being able to make inferences from the data introduces multiple levels of uncertainty, especially for low-depth data. Therefore methods that take into account the inherent uncertainty are needed for being able to make robust inferences in the downstream analysis of such data. This poses...... a problem for a range of key summary statistics within population genetics where existing methods are based on the assumption that the true genotypes are known. Motivated by this I present: 1) a new method for the estimation of relatedness between pairs of individuals, 2) a new method for estimating...... neutrality test statistics, which are commonly used for finding genomic regions that have been under natural selection, 3) a new method for estimating individual admixture proportions, which can be used for finding population structure and 4) a general framework for analysis of high-throughput sequencing...
Simpson, Helen Blair; Petkova, Eva; Cheng, Jianfeng; Huppert, Jonathan; Foa, Edna; Liebowitz, Michael R.
2007-01-01
Longitudinal clinical trials in psychiatry have used various statistical methods to examine treatment effects. The validity of the inferences depends upon each method's assumptions and whether a given study violates those assumptions. The objective of this paper was to elucidate these complex issues by comparing various methods for handling missing data (e.g., last observation carried forward [LOCF], completer analysis, propensity-adjusted multiple imputation) and for analyzing outco...
Optimization of Statistical Methods Impact on Quantitative Proteomics Data.
Pursiheimo, Anna; Vehmas, Anni P; Afzal, Saira; Suomi, Tomi; Chand, Thaman; Strauss, Leena; Poutanen, Matti; Rokka, Anne; Corthals, Garry L; Elo, Laura L
2015-10-02
As tools for quantitative label-free mass spectrometry (MS) rapidly develop, a consensus about best practices is not apparent. In the work described here we compared popular statistical methods for detecting differential protein expression from quantitative MS data, using both controlled experiments with known quantitative differences for specific proteins used as standards and "real" experiments where differences in protein abundance are not known a priori. Our results suggest that data-driven reproducibility-optimization can consistently produce reliable differential expression rankings for label-free proteome tools and is straightforward in its application.
Energy Technology Data Exchange (ETDEWEB)
Perlman, M D
1976-03-01
Efficient methods for approximating percentage points of the largest characteristic root of a Wishart matrix, and other statistical quantities of interest, were developed. Fitting of non-additive models to two-way and higher-way tables and the further development of the SNAP statistical computing system were reported. Numerical procedures for computing boundary-crossing probabilities for Brownian motion and other stochastic processes, such as Bessel diffusions, were implemented. Mathematical techniques from statistical mechanics were applied to obtain a unified treatment of probabilities of large deviations of the sample in the setting of general topological vector spaces. The application of the Martin boundary to questions about infinite particle systems was studied. A comparative study of classical "omnibus" and Bayes procedures for combining several independent noncentral chi-square test statistics was completed. Work proceeds on the related problem of combining noncentral F-tests. A numerical study of the small-sample powers of the Pearson chi-square and likelihood ratio tests for multinomial goodness-of-fit was made. The relationship between asymptotic (large sample) efficiency of test statistics, as measured by Bahadur's concept of exact slope, and actual small-sample efficiency was studied. A promising new technique for the simultaneous estimation of all correlation coefficients in a multivariate population was developed. The method adapts the James-Stein "shrinking" estimator (for location parameters) to the estimation of correlations.
Statistical analysis and optimization of IGBT manufacturing flow
Directory of Open Access Journals (Sweden)
Baranov V. V.
2015-02-01
The use of computer simulation in the design and optimization of the technological processes used to form power electronic devices can significantly reduce development time, improve the accuracy of calculations, and help choose the best implementation options on the basis of strict mathematical analysis. One of the most common power electronic devices is the insulated-gate bipolar transistor (IGBT), which combines the advantages of the MOSFET and the bipolar transistor. The high requirements placed on these devices can be met only by optimizing device design and manufacturing process parameters. Statistical analysis is therefore an important and necessary step in the modern cycle of IC design and manufacturing. A procedure for optimizing the IGBT threshold voltage was implemented. Screening experiments following a Plackett-Burman design identified the most important input parameters (factors), those with the greatest impact on the output characteristic. The coefficients of an approximation polynomial that adequately describes the relationship between the input parameters and the investigated output characteristics were determined. Using the calculated approximation polynomial, a series of repeated Monte Carlo calculations was carried out to determine the spread of threshold voltage values over selected ranges of input parameter deviation. Combinations of input process parameter values were drawn randomly from a normal distribution within a given range of variation. The optimization of IGBT process parameters is posed as the mathematical problem of determining the range of values of the significant structural and technological input parameters that keeps the IGBT threshold voltage within a given interval. The presented results demonstrate the effectiveness of the proposed optimization techniques.
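The Monte Carlo step described in this abstract can be sketched as follows. Everything numerical here is hypothetical: the first-order polynomial, its coefficients, the coded factor ranges, and the choice of sigma as one third of the half-width are illustrative assumptions, not values from the paper.

```python
import random
import statistics

def sample_in_range(rng, nominal, half_width):
    """Draw from a normal distribution, re-drawing until the value lies in the allowed range.

    Assumption: sigma is set to one third of the half-width.
    """
    while True:
        x = rng.gauss(nominal, half_width / 3.0)
        if abs(x - nominal) <= half_width:
            return x

def threshold_voltage(factors, intercept, coeffs):
    """Hypothetical first-order approximation polynomial for the threshold voltage (volts)."""
    return intercept + sum(c * x for c, x in zip(coeffs, factors))

def monte_carlo_spread(rng, n_runs, intercept, coeffs, half_widths):
    """Evaluate the polynomial on random factor combinations and summarise the spread."""
    values = []
    for _ in range(n_runs):
        factors = [sample_in_range(rng, 0.0, hw) for hw in half_widths]  # coded units
        values.append(threshold_voltage(factors, intercept, coeffs))
    return min(values), max(values), statistics.fmean(values), statistics.stdev(values)

if __name__ == "__main__":
    rng = random.Random(42)
    lo, hi, mean, sd = monte_carlo_spread(
        rng, n_runs=5000, intercept=5.0,
        coeffs=[0.30, -0.20, 0.10],        # illustrative sensitivities per coded unit
        half_widths=[1.0, 1.0, 1.0],
    )
    print(f"Vth spread: [{lo:.3f}, {hi:.3f}] V, mean {mean:.3f} V, sd {sd:.3f} V")
```

Narrowing the `half_widths` until the reported spread fits inside a target interval is one simple way to read the optimization problem the abstract poses.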
Designs and Methods for Association Studies and Population Size Inference in Statistical Genetics
DEFF Research Database (Denmark)
Waltoft, Berit Lindum
2016-01-01
Population genetics: In population genetics, two methods concerning the inference of the population size back in time are described. Both methods are based on the site frequency spectrum (SFS), and the fact that the expected SFS only depends on the time between coalescent events back in time. The first...
Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data.
Tataru, Paula; Simonsen, Maria; Bataillon, Thomas; Hobolth, Asger
2016-08-02
The Wright-Fisher model provides an elegant mathematical framework for understanding allele frequency data. In particular, the model can be used to infer the demographic history of species and identify loci under selection. A crucial quantity for inference under the Wright-Fisher model is the distribution of allele frequencies (DAF). Despite the apparent simplicity of the model, the calculation of the DAF is challenging. We review and discuss strategies for approximating the DAF, and how these are used in methods that perform inference from allele frequency data. Various evolutionary forces can be incorporated in the Wright-Fisher model, and we consider these in turn. We begin our review with the basic bi-allelic Wright-Fisher model where random genetic drift is the only evolutionary force. We then consider mutation, migration, and selection. In particular, we compare diffusion-based and moment-based methods in terms of accuracy, computational efficiency, and analytical tractability. We conclude with a brief overview of the multi-allelic process with a general mutation model. [Allele frequency, diffusion, inference, moments, selection, Wright-Fisher.].
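For readers unfamiliar with the basic bi-allelic model mentioned here, a minimal simulation sketch follows, with random genetic drift as the only force. The function names and parameter values are illustrative assumptions; the empirical distribution of final frequencies over many replicates is a brute-force stand-in for the DAF that the reviewed methods approximate analytically.

```python
import random

def wright_fisher_trajectory(p0, n_copies, generations, rng):
    """Simulate allele frequency under pure drift.

    Each generation, all n_copies gene copies are resampled binomially
    from the current allele frequency.
    """
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory

def empirical_daf(p0, n_copies, generations, replicates, rng):
    """Final allele frequencies over independent replicate populations."""
    return [
        wright_fisher_trajectory(p0, n_copies, generations, rng)[-1]
        for _ in range(replicates)
    ]

if __name__ == "__main__":
    rng = random.Random(0)
    finals = empirical_daf(p0=0.5, n_copies=100, generations=50, replicates=200, rng=rng)
    fixed_or_lost = sum(1 for f in finals if f in (0.0, 1.0))
    print(f"{fixed_or_lost} of {len(finals)} replicates reached fixation or loss")
```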
Using Alien Coins to Test Whether Simple Inference Is Bayesian
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
Directory of Open Access Journals (Sweden)
Jing Li
2017-01-01
The goal of this study is to improve thermal comfort and indoor air quality with the adaptive network-based fuzzy inference system (ANFIS) model and an improved particle swarm optimization (PSO) algorithm. A method to optimize air conditioning parameters and installation distance is proposed. The methodology is demonstrated through a prototype case, which corresponds to a typical laboratory in colleges and universities. A laboratory model is established, and simulated flow field information is obtained with the CFD software. Subsequently, the ANFIS model is employed instead of the CFD model to predict indoor flow parameters, and the CFD database is utilized to train ANN input-output "metamodels" for the subsequent optimization. With the improved PSO algorithm and the stratified sequence method, the objective functions are optimized. The functions comprise PMV, PPD, and mean age of air. The optimal installation distance is determined with the hemisphere model. Results show that most of the staff obtain a satisfactory degree of thermal comfort and that the proposed method can significantly reduce the cost of building an experimental device. The proposed methodology can be used to determine appropriate air supply parameters and air conditioner installation position for a pleasant and healthy indoor environment.
Bettonvil, B.W.M.; Del Castillo, E.; Kleijnen, J.P.C.
2007-01-01
This paper studies simulation-based optimization with multiple outputs. It assumes that the simulation model has one random objective function and must satisfy given constraints on the other random outputs. It presents a statistical procedure for testing whether a specific input combination (propo
Statistical and optimal learning with applications in business analytics
Han, Bin
Statistical learning is widely used in business analytics to discover structure or exploit patterns in historical data, and to build models that capture relationships between an outcome of interest and a set of variables. Optimal learning, on the other hand, solves the operational side of the problem by iterating between decision making and data acquisition/learning. The two problems often go hand in hand, exhibiting a feedback loop between statistics and optimization. We apply this statistical/optimal learning concept to a fundraising marketing campaign problem that arises in many non-profit organizations. Many such organizations use direct-mail marketing to cultivate one-time donors and convert them into recurring contributors. Cultivated donors generate much more revenue than new donors, but also lapse with time, making it important to steadily draw in new cultivations. The direct-mail budget is limited, but better-designed mailings can improve success rates without increasing costs. We first apply statistical learning to analyze the effectiveness of several design approaches used in practice, based on a massive dataset covering 8.6 million direct-mail communications with donors to the American Red Cross during 2009-2011. We find evidence that mailed appeals are more effective when they emphasize disaster preparedness and training efforts over post-disaster cleanup. Including small cards that affirm donors' identity as Red Cross supporters is an effective strategy, while including gift items such as address labels is not. Finally, very recent acquisitions are more likely to respond to appeals that ask them to contribute an amount similar to their most recent donation, but this approach has an adverse effect on donors with a longer history. We show via simulation that a simple design strategy based on these insights has the potential to improve success rates from 5.4% to 8.1%. Given these findings, when a new scenario arises, however, new data need to
Moura, Lidia Mvr; Westover, M Brandon; Kwasnik, David; Cole, Andrew J; Hsu, John
2017-01-01
The elderly population faces an increasing number of cases of chronic neurological conditions, such as epilepsy and Alzheimer's disease. Because the elderly with epilepsy are commonly excluded from randomized controlled clinical trials, there are few rigorous studies to guide clinical practice. When the elderly are eligible for trials, they either rarely participate or frequently have poor adherence to therapy, thus limiting both generalizability and validity. In contrast, large observational data sets are increasingly available, but are susceptible to bias when using common analytic approaches. Recent developments in causal inference-analytic approaches also introduce the possibility of emulating randomized controlled trials to yield valid estimates. We provide a practical example of the application of the principles of causal inference to a large observational data set of patients with epilepsy. This review also provides a framework for comparative-effectiveness research in chronic neurological conditions.
Bayesian Statistical Inference in Ion-Channel Models with Exact Missed Event Correction.
Epstein, Michael; Calderhead, Ben; Girolami, Mark A; Sivilotti, Lucia G
2016-07-26
The stochastic behavior of single ion channels is most often described as an aggregated continuous-time Markov process with discrete states. For ligand-gated channels each state can represent a different conformation of the channel protein or a different number of bound ligands. Single-channel recordings show only whether the channel is open or shut: states of equal conductance are aggregated, so transitions between them have to be inferred indirectly. The requirement to filter noise from the raw signal further complicates the modeling process, as it limits the time resolution of the data. The consequence of the reduced bandwidth is that openings or shuttings that are shorter than the resolution cannot be observed; these are known as missed events. Postulated models fitted using filtered data must therefore explicitly account for missed events to avoid bias in the estimation of rate parameters and therefore assess parameter identifiability accurately. In this article, we present the first, to our knowledge, Bayesian modeling of ion-channels with exact missed events correction. Bayesian analysis represents uncertain knowledge of the true value of model parameters by considering these parameters as random variables. This allows us to gain a full appreciation of parameter identifiability and uncertainty when estimating values for model parameters. However, Bayesian inference is particularly challenging in this context as the correction for missed events increases the computational complexity of the model likelihood. Nonetheless, we successfully implemented a two-step Markov chain Monte Carlo method that we called "BICME", which performs Bayesian inference in models of realistic complexity. The method is demonstrated on synthetic and real single-channel data from muscle nicotinic acetylcholine channels. We show that parameter uncertainty can be characterized more accurately than with maximum-likelihood methods. Our code for performing inference in these ion channel
Directory of Open Access Journals (Sweden)
Moura LMVR
2016-12-01
Lidia MVR Moura (1,2), M Brandon Westover (1,2), David Kwasnik (1), Andrew J Cole (1,2), John Hsu (3–5)
Affiliations: (1) Massachusetts General Hospital, Department of Neurology, Epilepsy Service, Boston, MA, USA; (2) Harvard Medical School, Boston, MA, USA; (3) Massachusetts General Hospital, Mongan Institute, Boston, MA, USA; (4) Harvard Medical School, Department of Medicine, Boston, MA, USA; (5) Harvard Medical School, Department of Health Care Policy, Boston, MA, USA
Abstract: The elderly population faces an increasing number of cases of chronic neurological conditions, such as epilepsy and Alzheimer’s disease. Because the elderly with epilepsy are commonly excluded from randomized controlled clinical trials, there are few rigorous studies to guide clinical practice. When the elderly are eligible for trials, they either rarely participate or frequently have poor adherence to therapy, thus limiting both generalizability and validity. In contrast, large observational data sets are increasingly available, but are susceptible to bias when using common analytic approaches. Recent developments in causal inference-analytic approaches also introduce the possibility of emulating randomized controlled trials to yield valid estimates. We provide a practical example of the application of the principles of causal inference to a large observational data set of patients with epilepsy. This review also provides a framework for comparative-effectiveness research in chronic neurological conditions.
Keywords: epilepsy, epidemiology, neurostatistics, causal inference
Timing optimization utilizing order statistics and multichannel digital silicon photomultipliers.
Mandai, Shingo; Venialgo, Esteban; Charbon, Edoardo
2014-02-01
We present an optimization technique utilizing order statistics with a multichannel digital silicon photomultiplier (MD-SiPM) for timing measurements. Accurate timing measurements are required by 3D rangefinding and time-of-flight positron emission tomography, to name a few applications. We have demonstrated the ability of the MD-SiPM to detect multiple photons, and we verified the advantage of detecting multiple photons assuming incoming photons follow a Gaussian distribution. We have also shown the advantage of utilizing multiple timestamps for estimating times of arrival more accurately. This estimation technique can be applied in a wide range of settings in which the incoming photons follow a known probability density function, such as those from a scintillator or a laser source.
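The idea of using several timestamps (order statistics) rather than only the first one can be illustrated with a small simulation. This is a sketch under stated assumptions: photon arrival times are drawn from a Gaussian around the true arrival time, and the estimator is a plain average of the first k sorted timestamps, which is a deliberately simple stand-in and not the estimator used by the authors. All names and parameter values are hypothetical.

```python
import random
import statistics

def simulate_timestamps(rng, t0, jitter, n_photons):
    """Sorted photon timestamps, i.e. the order statistics of the arrival times."""
    return sorted(rng.gauss(t0, jitter) for _ in range(n_photons))

def estimate_arrival(timestamps, k):
    """Average the k earliest timestamps (k = 1 uses the first photon only)."""
    return statistics.fmean(timestamps[:k])

def estimator_spread(rng, k, trials=2000, t0=0.0, jitter=1.0, n_photons=16):
    """Standard deviation of the estimate over repeated simulated events."""
    estimates = [
        estimate_arrival(simulate_timestamps(rng, t0, jitter, n_photons), k)
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

if __name__ == "__main__":
    rng = random.Random(0)
    for k in (1, 2, 4, 8):
        print(k, round(estimator_spread(rng, k), 4))
```

Averaging early order statistics gives a biased estimate of the arrival time; the quantity of interest for timing resolution is the spread printed above, and the bias would be calibrated out in practice.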
View discovery in OLAP databases through statistical combinatorial optimization
Energy Technology Data Exchange (ETDEWEB)
Hengartner, Nick W [Los Alamos National Laboratory; Burke, John [PNNL; Critchlow, Terence [PNNL; Joslyn, Cliff [PNNL; Hogan, Emilie [PNNL
2009-01-01
OnLine Analytical Processing (OLAP) is a relational database technology providing users with rapid access to summary, aggregated views of a single large database, and is widely recognized for knowledge representation and discovery in high-dimensional relational databases. OLAP technologies provide intuitive and graphical access to the massively complex set of possible summary views available in large relational (SQL) structured data repositories. The capability of OLAP database software systems to handle data complexity comes at a high price for analysts, presenting them with a combinatorially vast space of views of a relational database. We respond to the need to deploy technologies sufficient to allow users to guide themselves to areas of local structure by casting the space of 'views' of an OLAP database as a combinatorial object of all projections and subsets, and 'view discovery' as a search process over that lattice. We equip the view lattice with statistical information theoretical measures sufficient to support a combinatorial optimization process. We outline 'hop-chaining' as a particular view discovery algorithm over this object, wherein users are guided across a permutation of the dimensions by searching for successive two-dimensional views, pushing seen dimensions into an increasingly large background filter in a 'spiraling' search process. We illustrate this work in the context of data cubes recording summary statistics for radiation portal monitors at US ports.
From a Logical Point of View: An Illuminating Perspective in Teaching Statistical Inference
Sowey, Eric R
2005-01-01
Offering perspectives in the teaching of statistics assists students, immersed in the study of detail, to see the leading principles of the subject more clearly. Especially helpful can be a perspective on the logic of statistical inductive reasoning. Such a perspective can bring to prominence a broad principle on which both interval estimation and…
Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults
Forbes, Sharleen
2014-01-01
Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…
DEFF Research Database (Denmark)
Møller, Jesper; Jacobsen, Robert Dahl
We introduce a promising alternative to the usual hidden Markov tree model for Gaussian wavelet coefficients, where their variances are specified by the hidden states and take values in a finite set. In our new model, the hidden states have a similar dependence structure but they are jointly...... Gaussian, and the wavelet coefficients have log-variances equal to the hidden states. We argue why this provides a flexible model where frequentist and Bayesian inference procedures become tractable for estimation of parameters and hidden states. Our methodology is illustrated for denoising and edge...
How to construct the optimal Bayesian measurement in quantum statistical decision theory
Tanaka, Fuyuhiko
Recently, much more attention has been paid to studies that apply fundamental properties of quantum theory to information processing and technology. In particular, modern statistical methods have been recognized in quantum state tomography (QST), where we have to estimate a density matrix (a positive semidefinite matrix of trace one) representing a quantum system from finite data collected in a certain experiment. When the dimension of the density matrix gets large (from a few hundred to millions), this becomes a nontrivial problem. While a specific measurement is often given and fixed in QST, we are also able to choose the measurement itself according to the purpose of QST by using quantum statistical decision theory. Here we propose a practical method to find the best projective measurement in the Bayesian sense. We assume that a prior distribution (e.g., the uniform distribution) and a convex loss function (e.g., the squared error) are given. In many quantum experiments, these assumptions are not so restrictive. We show that the best projective measurement and the best statistical inference based on the measurement outcome exist and that they are obtained explicitly by using the Monte Carlo optimization. The Grant-in-Aid for Scientific Research (B) (No. 26280005).
Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2017-03-01
Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.
Directory of Open Access Journals (Sweden)
DeLisi Charles
2003-09-01
Abstract: The problem of functional annotation based on homology modeling is primary to current bioinformatics research. Researchers have noted regularities in sequence, structure and even chromosome organization that allow valid functional cross-annotation. However, these methods provide a lot of false negatives due to limited specificity inherent in the system. We want to create an evolutionarily inspired organization of data that would approach the issue of structure-function correlation from a new, probabilistic perspective. Such organization has possible applications in phylogeny, modeling of functional evolution and structural determination. ELISA (Evolutionary Lineage Inferred from Structural Analysis, http://romi.bu.edu/elisa) is an online database that combines functional annotation with structure and sequence homology modeling to place proteins into sequence-structure-function "neighborhoods". The atomic unit of the database is a set of sequences and structural templates that those sequences encode. A graph that is built from the structural comparison of these templates is called PDUG (protein domain universe graph). We introduce a method of functional inference through a probabilistic calculation done on an arbitrary set of PDUG nodes. Further, all PDUG structures are mapped onto all fully sequenced proteomes, allowing an easy interface for evolutionary analysis and research into comparative proteomics. ELISA is the first database with applicability to evolutionary structural genomics explicitly in mind. Availability: The database is available at http://romi.bu.edu/elisa.
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
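The core of ABC is a rejection loop: draw parameters from the prior, simulate data, and keep the draws whose simulated summary lies close to the observed one. The sketch below shows that loop on a toy model. The binomial simulator is a deliberately simple stand-in for a coalescent simulation, and the uniform prior, the tolerance, and all names are illustrative assumptions that say nothing about how BaySICS is implemented.

```python
import random

def simulate(theta, n_trials, rng):
    """Toy simulator: number of successes in n_trials with success probability theta."""
    return sum(1 for _ in range(n_trials) if rng.random() < theta)

def abc_rejection(observed, n_trials, n_draws, tolerance, rng):
    """Basic ABC rejection sampler with a uniform(0, 1) prior on theta."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()                          # draw from the prior
        summary = simulate(theta, n_trials, rng)      # simulate a data set
        if abs(summary - observed) <= tolerance:      # compare summary statistics
            accepted.append(theta)
    return accepted

if __name__ == "__main__":
    rng = random.Random(0)
    posterior = abc_rejection(observed=30, n_trials=100, n_draws=5000, tolerance=2, rng=rng)
    if posterior:
        print(len(posterior), "accepted; posterior mean ~", sum(posterior) / len(posterior))
```

The accepted draws approximate the posterior without ever evaluating a likelihood; comparing acceptance rates between competing simulators is the basic route to the Bayes factors mentioned in the abstract.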
Hirose, Osamu; Yoshida, Ryo; Imoto, Seiya; Yamaguchi, Rui; Higuchi, Tomoyuki; Charnock-Jones, D Stephen; Print, Cristin; Miyano, Satoru
2008-04-01
Statistical inference of gene networks by using time-course microarray gene expression profiles is an essential step towards understanding the temporal structure of gene regulatory mechanisms. Unfortunately, most of the current studies have been limited to analysing a small number of genes because the length of time-course gene expression profiles is fairly short. One promising approach to overcome such a limitation is to infer gene networks by exploring the potential transcriptional modules which are sets of genes sharing a common function or involved in the same pathway. In this article, we present a novel approach based on the state space model to identify the transcriptional modules and module-based gene networks simultaneously. The state space model has the potential to infer large-scale gene networks, e.g. of order 10^3, from time-course gene expression profiles. Particularly, we succeeded in the identification of a cell cycle system by using the gene expression profiles of Saccharomyces cerevisiae in which the length of the time-course and number of genes were 24 and 4382, respectively. However, when analysing shorter time-course data, e.g. of length 10 or less, the parameter estimations of the state space model often fail due to overfitting. To extend the applicability of the state space model, we provide an approach to use the technical replicates of gene expression profiles, which are often measured in duplicate or triplicate. The use of technical replicates is important for achieving highly-efficient inferences of gene networks with short time-course data. The potential of the proposed method has been demonstrated through the time-course analysis of the gene expression profiles of human umbilical vein endothelial cells (HUVECs) undergoing growth factor deprivation-induced apoptosis. Supplementary Information and the software (TRANS-MNET) are available at http://daweb.ism.ac.jp/~yoshidar/software/ssm/.
DEFF Research Database (Denmark)
Sjöstrand, Karl; Cardenas, Valerie A.; Larsen, Rasmus;
2008-01-01
regression to address this issue, allowing for a gradual introduction of correlation information into the model. We make the connections between ridge regression and voxel-wise procedures explicit and discuss relations to other statistical methods. Results are given on an in-vivo data set of deformation...
DEFF Research Database (Denmark)
Malzahn, Dorthe; Opper, Manfred
2003-01-01
We employ the replica method of statistical physics to study the average case performance of learning systems. The new feature of our theory is that general distributions of data can be treated, which enables applications to real data. For a class of Bayesian prediction models which are based on Gaussian processes, we discuss bootstrap estimates for learning curves.
Reflections on fourteen cryptic issues concerning the nature of statistical inference
Kardaun, O.J.W.F.; Salomé, D.; Schaafsma, W; Steerneman, A.G.M.; Willems, J.C; Cox, D.R.
2003-01-01
The present paper provides the original formulation and a joint response of a group of statistically trained scientists to fourteen cryptic issues for discussion, which were handed out to the public by Professor Dr. D.R. Cox after his Bernoulli Lecture 1997 at Groningen University.
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.
Neuwald, Andrew F; Altschul, Stephen F
2016-12-01
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Abbey, Craig K.; Eckstein, Miguel P.
2002-01-01
We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
RaptorX: exploiting structure information for protein alignment by statistical inference
Peng, Jian; Xu, Jinbo
2011-01-01
This paper presents RaptorX, a statistical method for template-based protein modeling that improves alignment accuracy by exploiting structural information in a single or multiple templates. RaptorX consists of three major components: single-template threading, alignment quality prediction and multiple-template threading. This paper summarizes the methods employed by RaptorX and presents its CASP9 result analysis, aiming to identify major bottlenecks with RaptorX and template-based modeling a...
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
Inferring earthquake statistics from soft-glass dynamics below yield stress
Kumar, Pinaki; Toschi, Federico; Benzi, Roberto; Trampert, Jeannot
2016-11-01
The current practice to generate synthetic earthquake catalogs employs purely statistical models, mechanical methods based on ad-hoc constitutive friction laws or a combination of the above. We adopt a new numerical approach based on the multi-component Lattice Boltzmann method to simulate yield stress materials. Below yield stress, under shear forcing, we find that the highly intermittent in time, irreversible T1 topological changes in the soft-glass (termed plastic events) bear a statistical resemblance to seismic events, radiating elastic perturbations through the system. Statistical analysis reveals scaling laws for magnitude similar to the Gutenberg-Richter law for quakes, a recurrence time scale with similar slope, a well-defined clustering of events into causal-aftershock sequences and Poisson events leading to the Omori law. Additionally, space intermittency reveals a complex multi-fractal structure, like real quakes, and a characterization of the stick-slip behavior in terms of avalanche size and time distribution agrees with the de-pinning transition. The model system, once properly tuned using real earthquake data, may help highlight the origin of scaling in phenomenological seismic power laws. This research was partly funded by the Shell-NWO/FOM programme "Computational sciences for energy research" under Project Number 14CSER022.
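For reference, the two empirical seismic laws named in this abstract are usually written as follows. This is a standard textbook statement, not taken from the paper itself; the symbols a, b, K, c and p are the conventional fitted constants, and the second line is the modified (Omori-Utsu) form of the aftershock law.

```latex
% Gutenberg-Richter law: number of events with magnitude at least M
\log_{10} N(\geq M) = a - b\,M
% Modified Omori (Omori-Utsu) law: aftershock rate at time t after the main event
n(t) = \frac{K}{(t + c)^{p}}
```

A claim that simulated plastic events "resemble" earthquakes is then, in practice, a claim that fitted values of b and p fall close to those reported for real catalogues.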
Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David
2015-01-01
Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume piecewise constant or piecewise smooth for segments, which are implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of contextual graphs energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.
Statistical inference on censored data for targeted clinical trials under enrichment design.
Chen, Chen-Fang; Lin, Jr-Rung; Liu, Jen-Pei
2013-01-01
For traditional clinical trials, inclusion and exclusion criteria are usually based on some clinical endpoints; the genetic or genomic variability of the trial participants is not fully utilized in the criteria. After completion of the human genome project, the disease targets at the molecular level can be identified and can be utilized for the treatment of diseases. However, the accuracy of diagnostic devices for identification of such molecular targets is usually not perfect. Some of the patients enrolled in targeted clinical trials with a positive result for the molecular target might not have the specific molecular targets. As a result, the treatment effect may be underestimated in the patient population truly with the molecular target. To resolve this issue, under the exponential distribution, we develop inferential procedures for the treatment effects of the targeted drug based on the censored endpoints in the patients truly with the molecular targets. Under an enrichment design, we propose using the expectation-maximization algorithm in conjunction with the bootstrap technique to incorporate the inaccuracy of the diagnostic device for detection of the molecular targets on the inference of the treatment effects. A simulation study was conducted to empirically investigate the performance of the proposed methods. Simulation results demonstrate that under the exponential distribution, the proposed estimator is nearly unbiased with adequate precision, and the confidence interval can provide adequate coverage probability. In addition, the proposed testing procedure can adequately control the size with sufficient power. On the other hand, when the proportional hazard assumption is violated, additional simulation studies show that the type I error rate is not controlled at the nominal level and is an increasing function of the positive predictive value. A numerical example illustrates the proposed procedures.
Directory of Open Access Journals (Sweden)
S. J. Kollet
2015-05-01
In this study, entropy production optimization and inference principles are applied to a synthetic semi-arid hillslope in high-resolution, physics-based simulations. The results suggest that entropy or power is indeed maximized, because of the strong nonlinearity of variably saturated flow and competing processes related to soil moisture fluxes, the depletion of gradients, and the movement of a free water table. Thus, it appears that the maximum entropy production (MEP) principle may indeed be applicable to hydrologic systems. In the application to hydrologic systems, the free water table constitutes an important degree of freedom in the optimization of entropy production and may also relate the theory to actual observations. In an ensuing analysis, an attempt is made to transfer the complex, "microscopic" hillslope model into a macroscopic model of reduced complexity using the MEP principle as an inference tool to obtain effective conductance coefficients and forces/gradients. The results demonstrate a new approach for the application of MEP to hydrologic systems and may form the basis for fruitful discussions and research in future.
Directory of Open Access Journals (Sweden)
Nelson Hauck Filho
2014-12-01
Researchers dealing with the task of estimating locations of individuals on continuous latent variables may rely on several statistical models described in the literature. However, weighing the costs and benefits of using one specific model over alternative models depends on empirical information that is not always clearly available. Therefore, the aim of this simulation study was to compare the performance of seven popular statistical models in providing adequate latent trait estimates in conditions of item difficulties targeted at the sample mean or at the tails of the latent trait distribution. Results suggested an overall tendency of models to provide more accurate estimates of true latent scores when using items targeted at the sample mean of the latent trait distribution. Rating Scale Model, Graded Response Model, and Weighted Least Squares Mean- and Variance-adjusted Confirmatory Factor Analysis yielded the most reliable latent trait estimates, even when applied to inadequate items for the sample distribution of the latent variable. These findings have important implications concerning some popular methodological practices in Psychology and related areas.
Anderson, Eric C
2012-11-08
Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs.
Earthquake statistics inferred from plastic events in soft-glassy materials
Benzi, Roberto; Trampert, Jeannot
2016-01-01
We propose a new approach for generating synthetic earthquake catalogues based on the physics of soft glasses. The continuum approach produces yield-stress materials based on Lattice-Boltzmann simulations. We show that, if the material is stimulated below yield stress, plastic events occur, which have strong similarities with seismic events. Based on a suitable definition of displacement in the continuum, we show that the plastic events obey a Gutenberg-Richter law with exponents similar to those for real earthquakes. We further find that average acceleration, energy release, stress drop and recurrence times scale with the same exponent. The approach is fully self-consistent and all quantities can be calculated at all scales without the need of ad hoc friction or statistical laws. We therefore suggest that our approach may lead to new insight into understanding of the physics connecting the micro and macro scale of earthquakes.
Cafaro, C
2008-01-01
In this paper, we review our novel information geometrodynamical approach to chaos (IGAC) on curved statistical manifolds and we emphasize the usefulness of our information-geometrodynamical entropy (IGE) as an indicator of chaoticity in a simple application. Furthermore, knowing that integrable and chaotic quantum antiferromagnetic Ising chains are characterized by asymptotic logarithmic and linear growths of their operator space entanglement entropies, respectively, we apply our IGAC to present an alternative characterization of such systems. Remarkably, we show that in the former case the IGE exhibits asymptotic logarithmic growth while in the latter case the IGE exhibits asymptotic linear growth. At this stage of its development, IGAC remains an ambitious unifying information-geometric theoretical construct for the study of chaotic dynamics with several unsolved problems. However, based on our recent findings, we believe it could provide an interesting, innovative and potentially powerful way to study and...
Cafaro, C.; Ali, S. A.
2008-12-01
In this paper, we review our novel information-geometrodynamical approach to chaos (IGAC) on curved statistical manifolds and we emphasize the usefulness of our information-geometrodynamical entropy (IGE) as an indicator of chaoticity in a simple application. Furthermore, knowing that integrable and chaotic quantum antiferromagnetic Ising chains are characterized by asymptotic logarithmic and linear growths of their operator space entanglement entropies, respectively, we apply our IGAC to present an alternative characterization of such systems. Remarkably, we show that in the former case the IGE exhibits asymptotic logarithmic growth while in the latter case the IGE exhibits asymptotic linear growth. At this stage of its development, IGAC remains an ambitious unifying information-geometric theoretical construct for the study of chaotic dynamics with several unsolved problems. However, based on our recent findings, we believe that it could provide an interesting, innovative and potentially powerful way to study and understand the very important and challenging problems of classical and quantum chaos.
Statistical Inference in Hidden Markov Models Using k-Segment Constraints.
Titsias, Michalis K; Holmes, Christopher C; Yau, Christopher
2016-01-02
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward-backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online.
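To make the idea of a segment-constrained MAP path concrete, here is a minimal sketch of a Viterbi-style recursion that tracks the number of segments used so far. It is an illustration written for this listing under stated assumptions, not the authors' k-segment recursions: a "segment" is taken to be a maximal run of one hidden state, the constraint is "exactly k segments", and the function name `k_segment_viterbi` and its argument layout are hypothetical. The cost is O(T k S^2) for T observations and S states, so it stays linear in sequence length.

```python
import math


def k_segment_viterbi(log_pi, log_A, log_B, k):
    """Most probable state path with exactly k segments (illustrative sketch).

    log_pi[s]   : log initial probability of state s
    log_A[r][s] : log transition probability from state r to state s
    log_B[t][s] : log likelihood of the observation at time t under state s
    Returns (best_log_probability, path).
    """
    T, S = len(log_B), len(log_pi)
    NEG = -math.inf
    # score[j][s]: best log-probability of a prefix ending in state s
    # that has used j + 1 segments so far.
    score = [[NEG] * S for _ in range(k)]
    for s in range(S):
        score[0][s] = log_pi[s] + log_B[0][s]
    back = []  # back[t - 1][j][s] = (previous j, previous state)
    for t in range(1, T):
        new = [[NEG] * S for _ in range(k)]
        ptr = [[None] * S for _ in range(k)]
        for j in range(k):
            for s in range(S):
                # Option 1: stay in the same state, same segment count.
                best, arg = score[j][s] + log_A[s][s], (j, s)
                # Option 2: switch from another state, opening a new segment.
                if j > 0:
                    for r in range(S):
                        if r != s:
                            cand = score[j - 1][r] + log_A[r][s]
                            if cand > best:
                                best, arg = cand, (j - 1, r)
                if best > NEG:
                    new[j][s] = best + log_B[t][s]
                    ptr[j][s] = arg
        score = new
        back.append(ptr)
    s_end = max(range(S), key=lambda s: score[k - 1][s])
    best_logp = score[k - 1][s_end]
    if best_logp == NEG:
        raise ValueError("no path with exactly k segments has positive probability")
    # Trace the back-pointers from the final time step to the first.
    path, j, s = [s_end], k - 1, s_end
    for ptr in reversed(back):
        j, s = ptr[j][s]
        path.append(s)
    path.reverse()
    return best_logp, path
```

Calling the function for several values of k on the same data gives a family of progressively more detailed segmentations, which is one way to explore a model fit beyond the single unconstrained Viterbi path.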
Statistical inference methods for two crossing survival curves: a comparison of methods.
Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng
2015-01-01
A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman's smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman's smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests.
Cui, Ying; Roberts, Mary Roduta
2013-01-01
The goal of this study was to investigate the usefulness of person-fit analysis in validating student score inferences in a cognitive diagnostic assessment. In this study, a two-stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, the person-fit statistic, the…
Ma, Yan; Zhang, Wei; Lyman, Stephen; Huang, Yihe
2017-05-04
To identify the most appropriate imputation method for missing data in the HCUP State Inpatient Databases (SID) and assess the impact of different missing data methods on racial disparities research. HCUP SID. A novel simulation study compared four imputation methods (random draw, hot deck, joint multiple imputation [MI], conditional MI) for missing values for multiple variables, including race, gender, admission source, median household income, and total charges. The simulation was built on real data from the SID to retain their hierarchical data structures and missing data patterns. Additional predictive information from the U.S. Census and American Hospital Association (AHA) database was incorporated into the imputation. Conditional MI prediction was equivalent or superior to the best performing alternatives for all missing data structures and substantially outperformed each of the alternatives in various scenarios. Conditional MI substantially improved statistical inferences for racial health disparities research with the SID. © Health Research and Educational Trust.
Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models
2015-09-12
Report AFRL-AFOSR-VA-TR-2015-0278, Katya Scheinberg. Grant number FA9550-11-1-0239. ...developed, which has been the focus of our research. Subject terms: optimization, derivative-free optimization, statistical machine learning.
Statistically Based Inference of Physical Rock Properties of Main Rock Types in Germany
Koch, A.; Jorand, R.; Clauser, C.
2009-12-01
A major obstacle for an increased use of geothermal energy often lies in the high success risk for the development of geothermal reservoirs due to the unknown rock properties. In general, the ranges of thermal and hydraulic properties (thermal conductivity, volumetric heat capacity, porosity, permeability) in existing compilations of rock properties are too large to be useful to constrain properties for specific sites. Usually, conservative assumptions are made about these properties, resulting in greater drilling depth and increased exploration cost. In this study, data from direct measurements on more than 600 core samples from different borehole locations and depths make it possible to derive statistical moments of the desired properties for selected main rock types in the German subsurface. Modern core scanning technology allowed rapid, high-resolution measurement of thermal conductivity, sonic velocity, and gamma density on a large number of samples. In addition, we measured porosity, bulk density, and matrix density based on Archimedes' principle and pycnometer analysis. Tests on a smaller collection of samples also include specific heat capacity, hydraulic permeability, and radiogenic heat production rate. We also complemented the petrophysical measurements by quantitative mineralogical analysis. The results reveal that even for the same main rock type the results differ significantly depending on geologic age, origin, compaction, and mineralogical composition. For example, water saturated thermal conductivity of tight Palaeozoic sandstones from the Lower Rhine Embayment and the Ruhr Area is 4.0±0.7 W m^-1 K^-1 and 4.6±0.6 W m^-1 K^-1, respectively, which is nearly identical to values for the Lower Triassic Bunter sandstone in Southwest-Germany (high in quartz showing an average value of 4.3±0.4 W m^-1 K^-1). In contrast, saturated thermal conductivity of Upper Triassic sandstone in the same area is considerably lower at 2.5±0.1 W m^-1 K^-1 (Schilf
Menon, Ravishankar; Gerstoft, Peter; Hodgkiss, William S
2012-11-01
Cross-correlations of diffuse noise fields can be used to extract environmental information. The influence of directional sources (usually ships) often results in a bias of the travel time estimates obtained from the cross-correlations. Using an array of sensors, insights from random matrix theory on the behavior of the eigenvalues of the sample covariance matrix (SCM) in an isotropic noise field are used to isolate the diffuse noise component from the directional sources. A sequential hypothesis testing of the eigenvalues of the SCM reveals eigenvalues dominated by loud sources that are statistical outliers for the assumed diffuse noise model. Travel times obtained from cross-correlations using only the diffuse noise component (i.e., by discarding or attenuating the outliers) converge to the predicted travel times based on the known array sensor spacing and measured sound speed at the site and are stable temporally (i.e., unbiased estimates). Data from the Shallow Water 2006 experiment demonstrates the effectiveness of this approach and that the signal-to-noise ratio builds up as the square root of time, as predicted by theory.
Hasan, A; Maloney, C E
2014-12-01
We compute the effective dispersion and vibrational density of states (DOS) of two-dimensional subregions of three-dimensional face-centered-cubic crystals using both a direct projection-inversion technique and a Monte Carlo simulation based on a common underlying Hamiltonian. We study both a (111) and (100) plane. We show that for any given direction of wave vector, both (111) and (100) show an anomalous ω^2 ∼ q regime at low q, where ω^2 is the energy associated with the given mode and q is its wave number. The ω^2 ∼ q scaling should be expected to give rise to an anomalous DOS, D(ω), at low ω: D(ω) ∼ ω^3 rather than the conventional Debye result, D(ω) ∼ ω^2. The DOS for (100) looks to be consistent with D(ω) ∼ ω^3, while (111) shows something closer to the conventional Debye result at the smallest frequencies. In addition to the direct projection-inversion calculation, we perform Monte Carlo simulations to study the effects of finite sampling statistics. We show that finite sampling artifacts act as an effective disorder and bias D(ω), giving a behavior closer to D(ω) ∼ ω^2 than D(ω) ∼ ω^3. These results should have an important impact on the interpretation of recent studies of colloidal solids where the two-point displacement correlations can be obtained directly in real-space via microscopy.
Inferring statistics of planet populations by means of automated microlensing searches
Dominik, M; Horne, K; Tsapras, Y; Street, R A; Wyrzykowski, L; Hessman, F V; Hundertmark, M; Rahvar, S; Wambsganss, J; Scarpetta, G; Bozza, V; Novati, S Calchi; Mancini, L; Masi, G; Teuber, J; Hinse, T C; Steele, I A; Burgdorf, M J; Kane, S
2008-01-01
(abridged) The study of other worlds is key to understanding our own, and not only provides clues to the origin of our civilization, but also looks into its future. Rather than in identifying nearby systems and learning about their individual properties, the main value of the technique of gravitational microlensing is in obtaining the statistics of planetary populations within the Milky Way and beyond. Only the complementarity of different techniques currently employed promises to yield a complete picture of planet formation that has sufficient predictive power to let us understand how habitable worlds like ours evolve, and how abundant such systems are in the Universe. A cooperative three-step strategy of survey, follow-up, and anomaly monitoring of microlensing targets, realized by means of an automated expert system and a network of ground-based telescopes is ready right now to be used to obtain a first census of cool planets with masses reaching even below that of Earth orbiting K and M dwarfs in two dist...
Bayesian inference – a way to combine statistical data and semantic analysis meaningfully
Directory of Open Access Journals (Sweden)
Eila Lindfors
2011-11-01
This article focuses on presenting the possibilities of Bayesian modelling (Finite Mixture Modelling) in the semantic analysis of statistically modelled data. The probability of a hypothesis in relation to the data available is an important question in inductive reasoning. Bayesian modelling allows the researcher to use many models at a time and provides tools to evaluate the goodness of different models. The researcher should always be aware that there is no such thing as the exact probability of an exact event. This is the reason for using probabilistic models. Each model presents a different perspective on the phenomenon in focus, and the researcher has to choose the most probable model with a view to previous research and the knowledge available. The idea of Bayesian modelling is illustrated here by presenting two different sets of data, one from craft science research (n=167) and the other (n=63) from educational research (Lindfors, 2007, 2002). The principles of how to build models and how to combine different profiles are described in the light of the research mentioned. Bayesian modelling is an analysis based on calculating probabilities in relation to a specific set of quantitative data. It is a tool for handling data and interpreting it semantically. The reliability of the analysis arises from an argumentation of which model can be selected from the model space as the basis for an interpretation, and on which arguments. Keywords: method, sloyd, Bayesian modelling, student teachers. URN:NBN:no-29959
Can we infer the effect of river works on streamflow statistics?
Ganora, Daniele
2016-04-01
Most of our river network system is affected by anthropic pressure of different types. While climate and land use change are widely recognized as important factors, the effects of "in-line" water infrastructures on the global behavior of the river system are often overlooked. This is due to the difficulty in including local "physical" knowledge (e.g., the hydraulic behavior of a river reach with levees during a flood) into large-scale models that provide a statistical description of the streamflow, and which are the basis for the implementation of resources/risk management plans (e.g., regional models for prediction of the flood frequency curve). This work presents some preliminary applications regarding two widely used hydrological signatures, the flow duration curve and the flood frequency curve. We adopt a pragmatic (i.e., reliable and implementable at large scales) and parsimonious (i.e., requiring only a few data) framework of analysis, considering that we operate in a complex system (many river works already exist, and many others could be built in the future). In the first case, a method is proposed to correct observations of streamflow affected by the presence of upstream run-of-the-river power plants in order to provide the "natural" flow duration curve, using only simple information about the plant (i.e., the maximum intake flow). The second case regards the effects of flood-protection works on the downstream sections, to support the application of along-stream cost-benefit analysis in the flood risk management context. Current applications and possible future developments are discussed.
Constrained statistical inference: sample-size tables for ANOVA and regression.
Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves
2014-01-01
Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and this is known as an (order) constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and, consequently, a smaller sample size is needed. This article discusses this reduction in sample size as an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a pre-specified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30-50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., β1 > β2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., β1 > 0).
Statistical optimization of activity and stability of β-xylanase ...
African Journals Online (AJOL)
2008-10-20
Oct 20, 2008 ... analysis of treatment combinations showed that a regression models of optimization of xylanase activity and ..... Compendium of soil fungi. Regenburg;. ... Integration of Science & Technology for Sustainable Development,.
Helou, E. S.; Zibetti, M. V. W.; Miqueles, E. X.
2017-04-01
We propose the superiorization of incremental algorithms for tomographic image reconstruction. The resulting methods follow a better path on their way to finding the optimal solution for the maximum likelihood problem, in the sense that they are closer to the Pareto optimal curve than the non-superiorized techniques. A new scaled gradient iteration is proposed and three superiorization schemes are evaluated. Theoretical analysis of the methods as well as computational experiments with both synthetic and real data are provided.
Hawe, David; Hernández Fernández, Francisco R; O'Suilleabháin, Liam; Huang, Jian; Wolsztynski, Eric; O'Sullivan, Finbarr
2012-05-01
In dynamic mode, positron emission tomography (PET) can be used to track the evolution of injected radio-labelled molecules in living tissue. This is a powerful diagnostic imaging technique that provides a unique opportunity to probe the status of healthy and pathological tissue by examining how it processes substrates. The spatial aspect of PET is well established in the computational statistics literature. This article focuses on its temporal aspect. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue. In statistical terms, the residue function is essentially a survival function - a familiar life-time data construct. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as flow, flux, volume of distribution and transit time summaries. This review emphasises a nonparametric approach to the estimation of the residue based on a piecewise linear form. Rapid implementation of this by quadratic programming is described. The approach provides a reference for statistical assessment of widely used one- and two-compartmental model forms. We illustrate the method with data from two of the most well-established PET radiotracers, (15)O-H(2)O and (18)F-fluorodeoxyglucose, used for assessment of blood perfusion and glucose metabolism respectively. The presentation illustrates the use of two open-source tools, AMIDE and R, for PET scan manipulation and model inference.
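The convolution relationship described in this abstract can be sketched numerically. The following is a minimal illustration only, assuming a uniform time grid and a simple rectangle-rule discretisation; the function name `tissue_curve`, the exponential residue and the rate constant are hypothetical choices for the example and are not taken from the reviewed work, which estimates a piecewise linear residue by quadratic programming.

```python
import numpy as np


def tissue_curve(arterial, residue, dt):
    """Approximate the tissue time-course as the linear convolution of the
    arterial time-course with the tissue residue on a uniform grid.

    arterial, residue: 1-D arrays sampled every `dt` time units.
    Returns an array of the same length as `arterial`.
    """
    arterial = np.asarray(arterial, dtype=float)
    residue = np.asarray(residue, dtype=float)
    n = arterial.size
    # Full discrete convolution, truncated to the observation window and
    # scaled by dt so that the sum approximates the convolution integral.
    return np.convolve(arterial, residue)[:n] * dt


# Illustrative (made-up) inputs: a constant arterial level and a
# non-increasing, survival-like residue with a hypothetical rate of 0.8.
dt = 0.01
t = np.arange(0.0, 5.0, dt)
residue = np.exp(-0.8 * t)
tissue = tissue_curve(np.ones_like(t), residue, dt)
```

With a unit-area impulse as arterial input the function returns the residue itself, which is a quick sanity check on the discretisation.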
Grodwohl, Jean-Baptiste
2016-08-01
This paper gives a detailed narrative of a controversial empirical research in postwar population genetics, the analysis of the cytological polymorphisms of an Australian grasshopper, Moraba scurra. This research intertwined key technical developments in three research areas during the 1950s and 1960s: it involved Dobzhansky's empirical research program on cytological polymorphisms, the mathematical theory of natural selection in two-locus systems, and the building of reliable estimates of natural selection in the wild. In the mid-1950s the cytologist Michael White discovered an interesting case of epistasis in populations of Moraba scurra. These observations received a wide diffusion when theoretical population geneticist Richard Lewontin represented White's data on adaptive topographies. These topographies connected the information on the genetic structure of these grasshopper populations with the formal framework of theoretical population genetics. As such, they appeared at the time as the most successful application of two-locus models of natural selection to an empirical study system. However, this connection generated paradoxical results: in the landscapes, all grasshopper populations were located on a ridge (an unstable equilibrium) while they were expected to reach a peak. This puzzling result fueled years of research and triggered a controversy attracting contributors from Australia, the United States and the United Kingdom. While the original problem seemed, at first, purely empirical, the subsequent controversy affected the main mathematical tools used in the study of two-gene systems under natural selection. Adaptive topographies and their underlying mathematical structure, Wright's mean fitness equations, were submitted to close scrutiny. Suspicion eventually shifted to the statistical machinery used in data analysis, reflecting the crucial role of statistical inference in applied population genetics. In the 1950s and 1960s, population geneticists were
Statistical optimization for decolorization of textile dyes using Trametes versicolor.
Srinivasan, S V; Murthy, D V S
2009-06-15
The conventional treatment of dark coloured textile wastewater using chemical coagulation generates a large volume of sludge, which requires further treatment and disposal. In the present investigation, a systematic optimization study of the important variables influencing the decolorization of Reactive Orange-16 (RO-16) and Reactive Red-35 (RR-35) dyes by the white-rot fungus (Trametes versicolor) was carried out. A full factorial central composite design was employed for experimental design and optimization of results. The effect of concentrations of dye, glucose and ammonium chloride on decolorization was studied and optimized using Response Surface Methodology (RSM). Maximum decolorization of 94.5% and 90.7% for RO-16 and RR-35 was obtained at optimum concentrations of dye, glucose and ammonium chloride, i.e., 0.66, 17.50 and 2.69 g/L for RO-16 and 0.68, 16.67 and 2.13 g/L for RR-35, respectively.
A Simulation Approach to Statistical Estimation of Multiperiod Optimal Portfolios
Directory of Open Access Journals (Sweden)
Hiroshi Shiraishi
2012-01-01
This paper discusses a simulation-based method for solving discrete-time multiperiod portfolio choice problems under an AR(1) process. The method is applicable even if the distributions of return processes are unknown. We first generate simulation sample paths of the random returns by using AR bootstrap. Then, for each sample path and each investment time, we obtain an optimal portfolio estimator, which optimizes a constant relative risk aversion (CRRA) utility function. When an investor considers an optimal investment strategy with portfolio rebalancing, it is convenient to introduce a value function. The most important difference between single-period portfolio choice problems and multiperiod ones is that the value function is time dependent. Our method takes care of the time dependency by using bootstrapped sample paths. Numerical studies are provided to examine the validity of our method. The result shows the necessity to take care of the time dependency of the value function.
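The path-generation step mentioned in this abstract can be illustrated with a generic residual bootstrap for an AR(1) model. This is a sketch under stated assumptions, not the paper's procedure: the function name `ar1_bootstrap_paths`, the least-squares estimate of the coefficient and the choice to start every path from the last observation are all assumptions made for the example, and the portfolio optimization step is not shown.

```python
import numpy as np


def ar1_bootstrap_paths(returns, n_paths, horizon, seed=0):
    """Generate future sample paths by resampling AR(1) residuals.

    returns: 1-D array of observed returns.
    Yields the estimated AR(1) coefficient and an array of shape
    (n_paths, horizon) holding the simulated returns.
    """
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    x = r - mu
    # Least-squares estimate of the AR(1) coefficient on demeaned data.
    phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    # Fitted residuals, centered so that resampled shocks have zero mean.
    resid = x[1:] - phi * x[:-1]
    resid = resid - resid.mean()

    rng = np.random.default_rng(seed)
    paths = np.empty((n_paths, horizon))
    prev = np.full(n_paths, x[-1])  # every path starts at the last observation
    for step in range(horizon):
        shocks = rng.choice(resid, size=n_paths, replace=True)
        prev = phi * prev + shocks
        paths[:, step] = mu + prev
    return phi, paths
```

Because the shocks are drawn from the empirical residuals, no distributional form for the returns has to be specified, which matches the abstract's remark that the method applies when the return distributions are unknown.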
Directory of Open Access Journals (Sweden)
Mitsuhiro Nakamura
2016-07-01
In strategic situations, humans infer the state of mind of others, e.g., emotions or intentions, adapting their behavior appropriately. Nonetheless, evolutionary studies of cooperation typically focus only on reaction norms, e.g., tit for tat, whereby individuals make their next decisions by only considering the observed outcome rather than focusing on their opponent’s state of mind. In this paper, we analyze repeated two-player games in which players explicitly infer their opponent’s unobservable state of mind. Using Markov decision processes, we investigate optimal decision rules and their performance in cooperation. The state-of-mind inference requires Bayesian belief calculations, which are computationally intensive. We therefore study two models in which players simplify these belief calculations. In Model 1, players adopt a heuristic to approximately infer their opponent’s state of mind, whereas in Model 2, players use information regarding their opponent’s previous state of mind, obtained from external evidence, e.g., emotional signals. We show that players in both models reach almost optimal behavior through commitment-like decision rules by which players are committed to selecting the same action regardless of their opponent’s behavior. These commitment-like decision rules can enhance or reduce cooperation depending on the opponent’s strategy.
Optimization of statistical methods impact on quantitative proteomics data
Pursiheimo, A.; Vehmas, A.P.; Afzal, S.; Suomi, T.; Chand, T.; Strauss, L.; Poutanen, M.; Rokka, A.; Corthals, G.L.; Elo, L.L.
2015-01-01
As tools for quantitative label-free mass spectrometry (MS) rapidly develop, a consensus about the best practices is not apparent. In the work described here we compared popular statistical methods for detecting differential protein expression from quantitative MS data using both controlled
Feldman, Naomi H.; Griffiths, Thomas L.; Morgan, James L.
2009-01-01
A variety of studies have demonstrated that organizing stimuli into categories can affect the way the stimuli are perceived. We explore the influence of categories on perception through one such phenomenon, the perceptual magnet effect, in which discriminability between vowels is reduced near prototypical vowel sounds. We present a Bayesian model…
Directory of Open Access Journals (Sweden)
Ye Ping
2005-12-01
Background: Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example, greater probability of membership in a single protein complex) may be inferred between genes that share synthetic lethal interaction partners than genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets. Results: We have developed Genetic Interaction Motif Finding (GIMF), an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast). Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is efficient in computation, requires no training and automatically down-weights promiscuous genes with high degrees. Conclusion: GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is mostly suitable for building gene modules around seed genes. Optimal choice of one single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes ... The generic probabilistic framework of GIMF may be used to group other types of biological entities such as proteins based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic
Schimek, Michael G; Budinská, Eva; Kugler, Karl G; Švendová, Vendula; Ding, Jie; Lin, Shili
2015-06-01
High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, they have made an enormous amount of resources available in databases. The collection of these valuable data has been routine for more than a decade. Despite different technologies, many experiments share the same goal. For instance, the aims of RNA-seq studies often coincide with those of differential gene expression experiments based on microarrays. As such, it would be logical to utilize all available data. However, there is a lack of biostatistical tools for the integration of results obtained from different technologies. Although diverse technological platforms produce different raw data, one commonality for experiments with the same goal is that all the outcomes can be transformed into a platform-independent data format - rankings - for the same set of items. Here we present the R package TopKLists, which allows for statistical inference on the lengths of informative (top-k) partial lists, for stochastic aggregation of full or partial lists, and for graphical exploration of the input and consolidated output. A graphical user interface has also been implemented for providing access to the underlying algorithms. To illustrate the applicability and usefulness of the package, we integrated microRNA data of non-small cell lung cancer across different measurement techniques and drew conclusions. The package can be obtained from CRAN under an LGPL-3 license.
Studies on statistical optimization of sulforaphane production from broccoli seed
Wu, Yuanfeng; Mao, Jianwei; Mei, Lehe; Liu, Shiwang
2013-01-01
Background: Natural sulforaphane (SF) has been of increasing interest for nutraceutical and pharmaceutical industries due to its anti-cancer effect. The main objective of the present work was to optimize the production of SF from broccoli seed using response surface methodology. Results: Three major factors (hydrolysis time, water volume and ethyl acetate volume) were screened out through Plackett-Burman (PB) factorial design. The methods of steepest ascent combined with central composite des...
An Optimization Method for Simulator Using Probability Statistic Model
Institute of Scientific and Technical Information of China (English)
None
2006-01-01
An optimization method that can be easily applied in a retargetable simulator is presented. The substance of this method is to reduce the redundant information in the operation code that is caused by the variation in execution frequencies of instructions. By recoding the operation code in the loading part of the simulator, the number of bit comparisons needed to identify an instruction is reduced, and the performance of the simulator is thereby improved. The theoretical analysis and experimental results both prove the validity of this method.
Black-Box Optimization Using Geodesics in Statistical Manifolds
Directory of Open Access Journals (Sweden)
Jérémy Bensadon
2015-01-01
Information geometric optimization (IGO) is a general framework for stochastic optimization problems aiming at limiting the influence of arbitrary parametrization choices: the initial problem is transformed into the optimization of a smooth function on a Riemannian manifold, defining a parametrization-invariant first order differential equation and, thus, yielding an approximately parametrization-invariant algorithm (up to second order in the step size). We define the geodesic IGO update, a fully parametrization-invariant algorithm using the Riemannian structure, and we compute it for the manifold of Gaussians, thanks to Noether’s theorem. However, in similar algorithms, such as CMA-ES (Covariance Matrix Adaptation - Evolution Strategy) and xNES (exponential Natural Evolution Strategy), the time steps for the mean and the covariance are decoupled. We suggest two ways of doing so: twisted geodesic IGO (GIGO) and blockwise GIGO. Finally, we show that while the xNES algorithm is not GIGO, it is an instance of blockwise GIGO applied to the mean and covariance matrix separately. Therefore, xNES has an almost parametrization-invariant description.
Ortega-Minakata, R. A.; Torres-Papaqui, J. P.; Andernach, H.; Islas-Islas, J. M.
2014-05-01
We quantify the statistical evidence of the relation between the inferred morphology and the emission-line activity type of galaxies for a large sample of galaxies. We compare the distribution of the inferred morphologies of galaxies of different dominant activity types, showing that the difference in the median morphological type between the samples of different activity types is significant. We also test the significance of the difference in the mean morphological type between all the activity-type samples using an ANOVA model with a modified Tukey test that takes into account heteroscedasticity and the unequal sample sizes. We show this test in the form of simultaneous confidence intervals for all pairwise comparisons of the mean morphological types of the samples. Using this test, scarcely applied in astronomy, we conclude that there are statistically significant differences in the inferred morphologies of galaxies of different dominant activity types.
DEFF Research Database (Denmark)
Andersen, J.S.; Bedaux, J.J.M.; Kooijman, S.A.L.M.;
2000-01-01
This paper describes the influence of design characteristics on the statistical inference for an ecotoxicological hazard-based model using simulated survival data. The design characteristics of interest are the number and spacing of observations (counts) in time, the number and spacing of exposure...
Causality in Statistical Inference for Causes of Disease
Institute of Scientific and Technical Information of China (English)
邓平基
2012-01-01
Etiological inference based on observational studies is a hard problem in medical science. Causality is not easy to prove, because the causes of disease often involve social, psychological, environmental and physiological factors. Statistical models rest on prior assumptions, some of which are not empirically falsifiable. This paper discusses the concept of causality in statistical inference and reveals some assumptions rooted in causal statistical inference, in order to show the limitations of statistical inference for etiology.
Convex Optimization Methods for Graphs and Statistical Modeling
2011-06-01
requirements that the graph be triangle-free and square-free. Of course such graph reconstruction problems may be infeasible in general, as there may be ... over C1, C2 is motivated by a similar procedure in statistics and signal processing, which goes by the name of “matched filtering.” Of course other ... h is the height of the cap over the equator. Via elementary trigonometry, the solid angle that K subtends is given by π/2 − sin−1(h). Hence, if h(β
Multivariate Statistical Process Optimization in the Industrial Production of Enzymes
DEFF Research Database (Denmark)
Klimkiewicz, Anna
In modern biotech production, a massive number of diverse measurements, with a broad diversity in information content and quality, is stored in data historians. The potential of this enormous amount of data is currently under-employed in process optimization efforts. This is a result of the deman... and difficulties related to ‘recycling’ of historical data from full-scale manufacturing of industrial enzymes. First, the crucial and tedious step of retrieving the data from the systems is presented. The prerequisites that need to be comprehended are discussed, such as sensor accuracy and reliability, aspects... related to the actual measuring frequency and non-equidistance retaining strategies in data storage. Different regimes of data extraction can be employed, and some might introduce undesirable artifacts in the final analysis results (POSTER II1). Several signal processing techniques are also briefly...
Multivariate Statistical Process Optimization in the Industrial Production of Enzymes
DEFF Research Database (Denmark)
Klimkiewicz, Anna
ultrafiltration operation is limited by the membrane fouling phenomena where the production capacity - monitored as flow through the membrane or flux - decreases over time. The flux varies considerably from run to run within the same product and likewise between different products. This variability clearly affects ... the study revealed that the less demanding in-line flow cell setup outperformed the on-line arrangement. The former was satisfactorily robust towards different products (amylases and proteases) and associated processing parameters such as temperature and processing speed. This dissertation work shows...
Hartmann, Alexander K
2005-01-01
A concise, comprehensive introduction to the topic of statistical physics of combinatorial optimization, bringing together theoretical concepts and algorithms from computer science with analytical methods from physics. The result bridges the gap between statistical physics and combinatorial optimization, investigating problems taken from theoretical computing, such as the vertex-cover problem, with the concepts and methods of theoretical physics. The authors cover rapid developments and analytical methods that are both extremely complex and spread by word-of-mouth, providing all the necessary
Optimal Inference for Instrumental Variables Regression with non-Gaussian Errors
DEFF Research Database (Denmark)
Cattaneo, Matias D.; Crump, Richard K.; Jansson, Michael
This paper is concerned with inference on the coefficient on the endogenous regressor in a linear instrumental variables model with a single endogenous regressor, nonrandom exogenous regressors and instruments, and i.i.d. errors whose distribution is unknown. It is shown that under mild smoothness...
Optimization of a small passive wind turbine based on mixed Weibull-turbulence statistics of wind
2008-01-01
A "low cost full passive structure" of wind turbine system is proposed. The efficiency of such a device can be obtained only if the design parameters are mutually adapted through an optimization design approach. An original wind profile generation process mixing Weibull and turbulence statistics is presented. The optimization results are compared with those obtained from a particular but typical time cycle of wind speed.
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
Final Report: Large-Scale Optimization for Bayesian Inference in Complex Systems
Energy Technology Data Exchange (ETDEWEB)
Ghattas, Omar [The University of Texas at Austin
2013-10-15
The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focuses on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. Our research is directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. Our efforts are integrated in the context of a challenging testbed problem that considers subsurface reacting flow and transport. The MIT component of the SAGUARO Project addresses the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas-Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to achieve their speedups.
Directory of Open Access Journals (Sweden)
Ricardo de Matos Simoes
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
de Matos Simoes, Ricardo; Emmert-Streib, Frank
2011-01-01
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
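The combination this abstract reports as performing best, the Miller-Madow estimator with equal-width discretization, can be sketched as follows. This is an illustrative sketch, not the code used in the study: the function names, the default of 10 bins and the per-variable binning (rather than the "global equal width" variant) are assumptions. The Miller-Madow correction adds (m - 1) / (2n) to the empirical entropy, where m is the number of non-empty bins and n the sample size; entropies are in nats.

```python
import numpy as np


def entropy_miller_madow(counts):
    """Miller-Madow bias-corrected entropy (in nats) from bin counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    n = counts.sum()
    nonzero = counts[counts > 0]
    p = nonzero / n
    h_empirical = -np.sum(p * np.log(p))
    # Correction term: (number of non-empty bins - 1) / (2 * sample size).
    return h_empirical + (nonzero.size - 1) / (2.0 * n)


def mutual_information_mm(x, y, bins=10):
    """MI estimate I(X;Y) = H(X) + H(Y) - H(X,Y) with Miller-Madow entropies.

    Each variable is discretized into `bins` equal-width intervals over its
    own observed range (assumption: per-variable, not global, binning).
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = entropy_miller_madow(joint.sum(axis=1))
    h_y = entropy_miller_madow(joint.sum(axis=0))
    h_xy = entropy_miller_madow(joint)
    return h_x + h_y - h_xy
```

For two independent samples the estimate should be close to zero (it can be slightly negative because of the correction), while strongly dependent variables give a clearly positive value.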
我如古, 博之; 山城, 毅; 渡久地, 實; Ganeko, Hiroyuki; Yamashiro, Tsuyoshi; Toguchi, Minoru
1999-01-01
Many methods for tracking three-dimensional human gestures have been proposed using multiview schemes. However, at present these methods are still far from practical application because of their high cost. This paper describes a new method for tracking three-dimensional human gestures from an image sequence taken with an ocellar CCD camera. The tracking system is built on statistical inference and a three-dimensional human model, and the occlusion problem is solved by bottom...
Joelsson, Daniel; Moravec, Phil; Troutman, Matthew; Pigeon, Joseph; DePhillips, Pete
2008-08-20
Transferring manual ELISAs to automated platforms requires optimizing the assays for each particular robotic platform. These optimization experiments are often time consuming and difficult to perform using a traditional one-factor-at-a-time strategy. In this manuscript we describe the development of an automated process using statistical design of experiments (DOE) to quickly optimize immunoassays for precision and robustness on the Tecan EVO liquid handler. By using fractional factorials and a split-plot design, five incubation time variables and four reagent concentration variables can be optimized in a short period of time.
Yang, Yahong; Zhao, Fuqing; Hong, Yi; Yu, Dongmei
2005-12-01
Integrating process planning with scheduling while accounting for the capacity and cost of the manufacturing system's workshop is a critical issue. The concurrency between the two can also eliminate redundant processes and optimize the entire production cycle, but most integrated process planning and scheduling methods consider only the time aspects of the alternative machines when constructing schedules. In this paper, a fuzzy inference system (FIS) for choosing alternative machines in integrated process planning and scheduling of a job shop manufacturing system is presented. Instead of choosing alternative machines randomly, machines are selected based on their reliability. The mean time to failure (MTF) values are input to a fuzzy inference mechanism, which outputs the machine reliability. Each machine is then penalized based on the fuzzy output, so that the most reliable machine has the highest priority to be chosen. To overcome the problem of under-utilized machines, which sometimes arises for unreliable machines, particle swarm optimization (PSO) is used to balance the load across all machines. A simulation study shows that the system can be used as an alternative way of choosing machines in integrated process planning and scheduling.
Institute of Scientific and Technical Information of China (English)
Wu Xiaojun; Yang Jingyu; Josef Kittler; Wang Shitong; Liu Tongming; Kieron Messer
2004-01-01
A study has been made of the essence of optimal uncorrelated discriminant vectors. A whitening transform is constructed by means of the eigendecomposition of the population scatter matrix, which makes the population scatter matrix an identity matrix in the transformed sample space whether or not the population scatter matrix is singular. Thus, the optimal discriminant vectors obtained by conventional linear discriminant analysis (LDA) methods are statistically uncorrelated. The research indicates that the essence of the statistically uncorrelated discriminant transform is the whitening transform plus the conventional linear discriminant transform. The distinguishing characteristic of the proposed method is that the obtained optimal discriminant vectors are not only orthogonal but also statistically uncorrelated. The proposed method is applicable to all problems of algebraic feature extraction. Numerical experiments on several facial databases show the effectiveness of the proposed method.
Harari, Gil
2014-01-01
Statistical significance, commonly expressed as a p-value, and the confidence interval (CI) are common statistical measures and are essential for the statistical analysis of studies in medicine and the life sciences. These measures provide complementary information about the statistical probability and conclusions regarding the clinical significance of study findings. This article is intended to describe the methodologies, compare the methods, assess their suitability for the different needs of study results analysis, and explain the situations in which each method should be used.
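The complementary roles of the two measures can be seen in a small sketch that computes both from the same sample. It assumes a large-sample normal (z) approximation for a one-sample test of a mean; the function name and the 95% critical value are choices made for this example, not something taken from the article, and a t-based interval would be the usual choice for small samples.

```python
import math
import statistics


def z_test_and_ci(sample, mu0, z_crit=1.959963984540054):
    """Two-sided z-test p-value for H0: mean == mu0, plus a 95% CI for the mean.

    Uses the normal approximation, so it is only a sketch for large samples.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = (mean - mu0) / std_err
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (mean - z_crit * std_err, mean + z_crit * std_err)
    return p_value, ci
```

Under this approximation the two outputs agree by construction: the hypothesized value `mu0` falls outside the 95% interval exactly when the p-value is below 0.05, while the interval additionally shows the range of plausible effect sizes.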
Duda, Jarek
2007-01-01
In this paper it is shown how to encode information almost optimally in valuations of a discrete lattice with translationally invariant constraints. The method is based on finding a statistical description of such valuations and turning it into a statistical algorithm, which allows a valuation with given statistics to be constructed deterministically. The optimal statistics generate valuations with a uniform distribution, which gives the maximum information capacity. It is shown that this approach can in practice get as close to the capacity of the model as desired (found numerically: a loss of 1e-10 bit/node for the Hard Square model). An alternative to Huffman coding is also presented, which is more precise and practical for changing probability distributions.
Optimization of BP Algorithm in Repeated Inference
Institute of Scientific and Technical Information of China (English)
吴孝滨; 任志平
2011-01-01
Applying the belief propagation (BP) algorithm to maximum expected utility (MEU) problems tends to produce a large amount of redundant computation. To address this, the standard BP algorithm is optimized: during inference, for nodes that are only weakly affected by the change of conditions, the result of the previous inference is reused directly instead of recomputing their marginal probabilities, and it is proved that this optimization does not significantly affect the inference result. The algorithm is tested on a combinatorial auction model. Simulation results show that, compared with the standard BP algorithm, the optimized algorithm improves convergence efficiency when solving utility maximization problems without significantly affecting the quality of the solutions.
Optimization of sequences in CDMA systems: a statistical-mechanics approach
Kitagawa, Koichiro
2008-01-01
The statistical mechanics approach is useful not only in analyzing the macroscopic performance of wireless communication systems, but also in discussing their design problems. In this paper, we discuss a design problem of spreading sequences in code-division multiple-access (CDMA) systems, as an example demonstrating the usefulness of the statistical mechanics approach. We analyze, via the replica method, the average mutual information between inputs and outputs of a randomly-spread CDMA channel, and discuss the optimization problem with the average mutual information as the measure of optimization. It is shown that the average mutual information is maximized by orthogonally-invariant random Welch bound equality (WBE) spreading sequences.
Institute of Scientific and Technical Information of China (English)
ZHAO Zhen-shan; XU Guo-zhi
2007-01-01
In real multiple-input multiple-output (MIMO) systems, perfect channel state information (CSI) may be costly or impossible to acquire, but the channel statistical information can be considered relatively stationary during long-term transmission. The statistical information can be obtained at the receiver and fed back to the transmitter, and it does not require frequent updates. By exploiting channel mean and covariance information at the transmitter simultaneously, this paper investigates the optimal transmission strategy for spatially correlated MIMO channels. An upper bound on the ergodic capacity is derived and taken as the performance criterion. Simulation results are also given to show the performance improvement of the optimal transmission strategy.
Padmanaban, Sethuraman; Balaji, Nagarajan; Muthukumaran, Chandrasekaran; Tamilarasan, Krishnamurthi
2015-01-01
Statistical experimental designs were applied to optimize the fermentation medium for exopolysaccharide (EPS) production. Plackett–Burman design was applied to identify the significance of seven medium variables, in which sweet potato and yeast extract were found to be the significant variables for EPS production. Central composite design was applied to evaluate the optimum condition of the selected variables. Maximum EPS production of 9.3 g/L was obtained with the predicted optimal level of ...
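For readers unfamiliar with the screening step mentioned here, the sketch below builds the standard 12-run Plackett-Burman design by cyclically shifting a generator row and appending a row of low levels, then estimates main effects as the difference between the mean response at the high and low level of each factor. Using the 12-run design for the seven medium variables is an assumption (the abstract does not state the run count), and the function names are hypothetical.

```python
# Standard generator row of the 12-run Plackett-Burman design (+1 = high, -1 = low).
PB12_GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12(n_factors):
    """Return a 12-run two-level screening design for up to 11 factors."""
    if not 1 <= n_factors <= 11:
        raise ValueError("the 12-run design supports 1 to 11 factors")
    rows = []
    for shift in range(11):
        # Cyclic right shift of the generator row by `shift` positions.
        row = PB12_GENERATOR[-shift:] + PB12_GENERATOR[:-shift]
        rows.append(row[:n_factors])
    rows.append([-1] * n_factors)  # final run: every factor at its low level
    return rows


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    half = len(design) / 2
    n_factors = len(design[0])
    return [sum(row[j] * y for row, y in zip(design, responses)) / half
            for j in range(n_factors)]
```

Factors whose estimated main effects stand out from the rest would then be carried forward to a response-surface design such as the central composite design mentioned in the abstract.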
Halpin, Peter F; Stam, Henderikus J
2006-01-01
The application of statistical testing in psychological research over the period of 1940-1960 is examined in order to address psychologists' reconciliation of the extant controversy between the Fisher and Neyman-Pearson approaches. Textbooks of psychological statistics and the psychological journal literature are reviewed to examine the presence of what Gigerenzer (1993) called a hybrid model of statistical testing. Such a model is present in the textbooks, although the mathematically incomplete character of this model precludes the appearance of a similarly hybridized approach to statistical testing in the research literature. The implications of this hybrid model for psychological research and the statistical testing controversy are discussed.
Yang, Yuqing; Chen, Ning; Chen, Ting
2017-01-25
The inference of associations between environmental factors and microbes and among microbes is critical to interpreting metagenomic data, but compositional bias, indirect associations resulting from common factors, and variance within metagenomic sequencing data limit the discovery of associations. To account for these problems, we propose metagenomic Lognormal-Dirichlet-Multinomial (mLDM), a hierarchical Bayesian model with sparsity constraints, to estimate absolute microbial abundance and simultaneously infer both conditionally dependent associations among microbes and direct associations between microbes and environmental factors. We empirically show the effectiveness of the mLDM model using synthetic data, data from the TARA Oceans project, and a colorectal cancer dataset. Finally, we apply mLDM to 16S sequencing data from the western English Channel and report several associations. Our model can be used on both natural environmental and human metagenomic datasets, promoting the understanding of associations in the microbial community.
Optimal inference strategies and their implications for the linear noise approximation
Hartich, David
2016-01-01
We study the information loss of a class of inference strategies that is solely based on time averaging. For an array of independent binary sensors (e.g., receptors) measuring a weak random signal (e.g., ligand concentration) this information loss is up to 0.5 bit per measurement irrespective of the number of sensors. We derive a condition related to the local detailed balance relation that determines whether or not such a loss of information occurs. Specifically, if the free energy difference arising from the signal is symmetrically distributed among the forward and backward rates, time integration mechanisms will capture the full information about the signal. As an implication, for the linear noise approximation, we can identify the same loss of information, arising from its inherent simplification of the dynamics.
Shinzato, Takashi
2016-12-01
The portfolio optimization problem in which the variances of the return rates of assets are not identical is analyzed in this paper using the methodology of statistical mechanical informatics, specifically, replica analysis. We defined two characteristic quantities of an optimal portfolio, namely, minimal investment risk and investment concentration, in order to solve the portfolio optimization problem and analytically determined their asymptotical behaviors using replica analysis. Numerical experiments were also performed, and a comparison between the results of our simulation and those obtained via replica analysis validated our proposed method.
Farias, L Albano
2010-01-01
We analyze the statistics of observables in continuous variable quantum teleportation in the formalism of the characteristic function. We derive expressions for average values of output state observables, in particular cumulants, which are additive in terms of the input state and the resource of teleportation. Working with squeezed Bell-like states, which may be optimized in a free parameter for better teleportation performance, we discuss the relation between resources optimal for fidelity and for different observable averages. We obtain the values of the free parameter which optimize the central moments and cumulants up to fourth order. For the cumulants the distortion between in and out states due to teleportation depends only on the resource. We obtain optimal parameters for the second and fourth order cumulants which do not depend on the squeezing of the resource. The second order central moment, which is equal to the second order cumulant, and the photon number average are optimized by the same resource. W...
On Difference of Convex Optimization to Visualize Statistical Data and Dissimilarities
DEFF Research Database (Denmark)
Carrizosa, Emilio; Guerrero, Vanesa; Morales, Dolores Romero
2016-01-01
In this talk we address the problem of visualizing in a bounded region a set of individuals, which has attached a dissimilarity measure and a statistical value. This problem, which extends the standard Multidimensional Scaling Analysis, is written as a global optimization problem whose objective ...
DEFF Research Database (Denmark)
Stefani, Alessio; Nielsen, Kristian; Rasmussen, Henrik K.;
2012-01-01
POFs) with hexagonal hole structures we developed a program for cleaving quality optimization, which reads in a microscope image of the fiber end-facet and determines the core-shift and the statistics of the hole diameter, hole-to-hole pitch, hole ellipticity, and direction of major ellipse axis. For 125μm in diameter...
Compartmental analysis of renal physiology using nuclear medicine data and statistical optimization
Garbarino, Sara; Brignone, Massimo; Massollo, Michela; Sambuceti, Gianmario; Piana, Michele
2012-01-01
This paper describes a general approach to the compartmental modeling of nuclear data based on spectral analysis and statistical optimization. We utilize the renal physiology as test case and validate the method against both synthetic data and real measurements acquired during two micro-PET experiments with murine models.
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Energy Technology Data Exchange (ETDEWEB)
Blanc, Guillermo A. [Observatories of the Carnegie Institution for Science, 813 Santa Barbara Street, Pasadena, CA 91101 (United States); Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A. [Research School of Astronomy and Astrophysics, Australian National University, Cotter Road, Weston, ACT 2611 (Australia)
2015-01-10
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances.
Rankin, Jeffery W; Rubenson, Jonas; Hutchinson, John R
2016-05-01
Owing to their cursorial background, ostriches (Struthio camelus) walk and run with high metabolic economy, can reach very fast running speeds and quickly execute cutting manoeuvres. These capabilities are believed to be a result of their ability to coordinate muscles to take advantage of specialized passive limb structures. This study aimed to infer the functional roles of ostrich pelvic limb muscles during gait. Existing gait data were combined with a newly developed musculoskeletal model to generate simulations of ostrich walking and running that predict muscle excitations, force and mechanical work. Consistent with previous avian electromyography studies, predicted excitation patterns showed that individual muscles tended to be excited primarily during only stance or swing. Work and force estimates show that ostrich gaits are partially hip-driven with the bi-articular hip-knee muscles driving stance mechanics. Conversely, the knee extensors acted as brakes, absorbing energy. The digital extensors generated large amounts of both negative and positive mechanical work, with increased magnitudes during running, providing further evidence that ostriches make extensive use of tendinous elastic energy storage to improve economy. The simulations also highlight the need to carefully consider non-muscular soft tissues that may play a role in ostrich gait.
Mukherjee, Kushal; Gupta, Shalabh; Ray, Asok; Wettergren, Thomas A
2011-06-01
This paper presents a statistical-mechanics-inspired procedure for optimization of the sensor field configuration to detect mobile targets. The key idea is to capture the low-dimensional behavior of the sensor field configurations across the Pareto front in a multiobjective scenario for optimal sensor deployment, where the nondominated points are concentrated within a small region of the large-dimensional decision space. The sensor distribution is constructed using location-dependent energy-like functions and intensive temperature-like parameters in the sense of statistical mechanics. This low-dimensional representation is shown to permit rapid optimization of the sensor field distribution on a high-fidelity simulation test bed of distributed sensor networks.
Design and statistical optimization of osmotically driven capsule based on push-pull technology.
Shaikh, Wasim; Deshmukh, Prashant K; Patil, Ganesh B; Chatap, Vivekanand K; Bari, Sanjay B
2013-01-01
In the present investigation an attempt was made to develop and statistically optimize an osmotically active capsule tailor-made from the concept of bilayer (push-pull) osmotic tablet technology. The capsule comprised an active (drug) layer and a push (osmogen) layer. The active layer was compressed in the form of a tablet by mixing a known amount of drug with formulation excipients. Similarly, the push layer was made by compressing mannitol with formulation excipients. Finally, both layers were packed in a hard gelatin capsule having a small aperture at the top and coated with a semipermeable membrane to form the osmotically active capsule. Formulated and optimized capsules were characterized by Fourier transform infrared (FT-IR) spectroscopy, differential scanning calorimetry (DSC), scanning electron microscopy, in vitro drug release studies, and release models and kinetics. The statistically optimized formulation showed good correlation between predicted and experimental results, which further confirms the practicability and validity of the model.
Blanc, Guillermo A; Vogt, Frédéric P A; Dopita, Michael A
2014-01-01
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of HII regions and star-forming galaxies using strong nebular emission lines (SEL). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photo-ionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics the method is flexible and not tied to a particular photo-ionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extra-galactic HII regions we assess the performance of commonly used SEL abundance diagnostics. W...
Saeki, Hiroyuki; Tango, Toshiro; Wang, Jinfang
2017-01-01
In clinical investigations of diagnostic procedures to indicate noninferiority, efficacy is generally evaluated on the basis of results from independent multiple raters. For each subject, if two diagnostic procedures are performed and some units are evaluated, the difference in proportions for matched-pair data is correlated between the two diagnostic procedures and within the subject, i.e. the data are clustered. In this article, we propose a noninferiority test to infer the difference in the correlated proportions of clustered data between the two diagnostic procedures. The proposed noninferiority test was validated in a Monte Carlo simulation study. Empirical sizes of the noninferiority test were close to the nominal level. The proposed test is illustrated on data of aneurysm diagnostic procedures for patients with acute subarachnoid hemorrhage.
The impact on midlevel vision of statistically optimal divisive normalization in V1.
Coen-Cagli, Ruben; Schwartz, Odelia
2013-07-15
The first two areas of the primate visual cortex (V1, V2) provide a paradigmatic example of hierarchical computation in the brain. However, neither the functional properties of V2 nor the interactions between the two areas are well understood. One key aspect is that the statistics of the inputs received by V2 depend on the nonlinear response properties of V1. Here, we focused on divisive normalization, a canonical nonlinear computation that is observed in many neural areas and modalities. We simulated V1 responses with (and without) different forms of surround normalization derived from statistical models of natural scenes, including canonical normalization and a statistically optimal extension that accounted for image nonhomogeneities. The statistics of the V1 population responses differed markedly across models. We then addressed how V2 receptive fields pool the responses of V1 model units with different tuning. We assumed this is achieved by learning without supervision a linear representation that removes correlations, which could be accomplished with principal component analysis. This approach revealed V2-like feature selectivity when we used the optimal normalization and, to a lesser extent, the canonical one but not in the absence of both. We compared the resulting two-stage models on two perceptual tasks; while models encompassing V1 surround normalization performed better at object recognition, only statistically optimal normalization provided systematic advantages in a task more closely matched to midlevel vision, namely figure/ground judgment. Our results suggest that experiments probing midlevel areas might benefit from using stimuli designed to engage the computations that characterize V1 optimality.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Teimouri, Reza; Sohrabpoor, Hamed
2013-12-01
The electrochemical machining (ECM) process is growing in importance because of specific advantages that can be exploited during machining operations. The process offers higher machining rates, better accuracy and control, and a wider range of machinable materials. The large number of influential parameters in the process makes its prediction and the selection of optimal values complex, especially when the process is programmed for machining hard materials. In the present work, adaptive neuro-fuzzy inference systems (ANFIS) were used to create predictive models, based on experimental observations, of the effects of electrolyte concentration, electrolyte flow rate, applied voltage and feed rate on material removal rate (MRR) and surface roughness (SR). ANFIS 3D surfaces were then plotted to analyze the effects of the process parameters on MRR and SR. Finally, the cuckoo optimization algorithm (COA) was used to select solutions in which the process simultaneously reaches maximum material removal rate and minimum surface roughness. Results indicated that the ANFIS technique is superior in modeling MRR and SR, with high prediction accuracy. The results obtained with COA were also compared with those from confirmatory experiments, which validate the applicability and suitability of the proposed techniques for enhancing the performance of the ECM process.
Directory of Open Access Journals (Sweden)
Jan Kneissler
2015-04-01
Predictive coding appears to be one of the fundamental working principles of brain processing. Amongst other aspects, brains often predict the sensory consequences of their own actions. Predictive coding resembles Kalman filtering, where incoming sensory information is filtered to produce prediction errors for subsequent adaptation and learning. However, to generate prediction errors given motor commands, a suitable temporal forward model is required to generate predictions. While in engineering applications it is usually assumed that this forward model is known, the brain has to learn it. When filtering sensory input and learning from the residual signal in parallel, a fundamental problem arises: the system can enter a delusional loop when filtering the sensory information using an overly trusted forward model. In this case, learning stalls before accurate convergence because uncertainty about the forward model is not properly accommodated. We present a Bayes-optimal solution to this generic and pernicious problem for the case of linear forward models, which we call Predictive Inference and Adaptive Filtering (PIAF). PIAF filters incoming sensory information and learns the forward model simultaneously. We show that PIAF is formally related to Kalman filtering and to the Recursive Least Squares linear approximation method, but combines these procedures in a Bayes-optimal fashion. Numerical evaluations confirm that the delusional loop is precluded and that the learning of the forward model is more than ten times faster when compared to a naive combination of Kalman filtering and Recursive Least Squares.
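To make the Recursive Least Squares ingredient concrete, here is a minimal scalar sketch of plain RLS learning a linear forward model y ≈ w · x from a stream of (x, y) pairs. It is not PIAF and omits the Kalman filtering of the sensory input entirely; the function name, the forgetting factor, and the large initial uncertainty `p0` are assumptions made for illustration.

```python
def rls_fit(xs, ys, forgetting=1.0, p0=1e6):
    """Scalar recursive least squares for the linear model y ~ w * x.

    w is the current weight estimate and p its (scaled) uncertainty;
    a large p0 encodes weak prior confidence in the initial w = 0.
    """
    w, p = 0.0, p0
    for x, y in zip(xs, ys):
        gain = p * x / (forgetting + x * p * x)  # how much to trust this sample
        w += gain * (y - w * x)                  # correct w by the prediction error
        p = (p - gain * x * p) / forgetting      # shrink the uncertainty
    return w, p
```

The uncertainty `p` shrinks with every sample regardless of how reliable the sample actually was, which is the kind of overconfidence that can arise when the inputs to the learner have themselves been filtered with an overly trusted model.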
Statistical optimization of insulin-loaded Pluronic F-127 gels for buccal delivery of basal insulin.
Das, Nilanjana; Madan, Parshotam; Lin, Senshang
2012-01-01
The principle of statistical optimization was employed to fabricate insulin-loaded Pluronic F-127 (PF-127) gel formulations having the potential for buccal delivery of basal insulin. A two-level resolution III fractional factorial design was applied to simultaneously evaluate five independent formulation variables: PF-127 concentration, insulin concentration, sodium sulfate concentration, hydroxypropylmethyl cellulose (HPMC) concentration, and presence of sodium glycocholate. The amount of insulin released and permeated from gels as well as gelation time and mucoadhesion force of gels were measured and used as dependent response variables for formulation optimization. Optimization of a gel formulation was achieved by applying constrained optimization via regression analysis. In vitro permeation flux of insulin from the optimized formulation through porcine buccal mucosa was 93.17 (±0.058, n = 3) μg/cm². Plasma insulin levels following buccal administration of the optimized formulation at 10, 25 and 50 IU/kg to healthy rats were found to be dose dependent, and basal insulin levels were maintained for at least 8 h. Furthermore, continuous hypoglycemia for at least 8 h was observed with 89%, 51% and 25% of blood glucose reduction, respectively, for these three doses. The results of this investigation support the feasibility of developing optimized buccal insulin-loaded Pluronic F-127 gels for basal insulin delivery.
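A two-level resolution III design for five factors can be written down in eight runs, as the sketch below shows. It assumes the common 2^(5-2) construction with generators D = AB and E = AC; the abstract does not say which generators or how many runs were used, so both the generators and the function name are illustrative only.

```python
from itertools import product


def fractional_factorial_2_5_2():
    """Eight-run 2^(5-2) design: full factorial in A, B, C with D = A*B and E = A*C.

    Levels are coded -1 (low) and +1 (high). Main effects are aliased with
    two-factor interactions, which is what makes the design resolution III.
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b, a * c))
    return runs
```

Each of the five columns would be mapped to one formulation variable, for example coding the presence or absence of sodium glycocholate as +1 and -1, and the eight rows give the factor settings for the eight gel formulations to prepare.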
Pan, C M; Fan, Y T; Xing, Y; Hou, H W; Zhang, M L
2008-05-01
Statistically based experimental designs were applied to optimize process parameters for hydrogen production from glucose by Clostridium sp. Fanp2, which was isolated from the effluent sludge of an anaerobic hydrogen-producing bioreactor. The important factors influencing hydrogen production, identified by initial Plackett-Burman screening, were glucose, phosphate buffer and vitamin solution. The path of steepest ascent was followed to approach the optimal region of the three significant factors. Box-Behnken design and response surface analysis were adopted to further investigate the mutual interaction between the variables and to identify the optimal values that bring maximum hydrogen production. Experimental results showed that glucose, vitamin solution and phosphate buffer concentration each had a significant individual influence on the specific hydrogen production potential (Ps). In addition, glucose and vitamin solution, and glucose and phosphate buffer, were interdependent. The optimal conditions for the maximal Ps were: glucose 23.75 g/l, phosphate buffer 0.159 M and vitamin solution 13.3 ml/l. Using this statistical optimization method, the hydrogen production from glucose was increased from 2248.5 to 4165.9 ml H2/l.
Wang, Peng; Liu, Xia; Wang, Yifei; Ruan, Hui; Zheng, XiaoDong
2011-01-01
A cane molasses-based medium for the biomass production of biocontrol agent Rhodosporidium paludigenum was statistically optimized. Molasses concentration (after pretreatment), yeast extract, and initial pH were identified by the Plackett-Burman design to show significant influence on the biomass production. The three factors were further optimized by central composite design and response-surface methodology. The statistical analysis indicated the optimum values of the variables were 89.98 g/L for cane molasses, 2.35 g/L for yeast extract and an initial pH of 8.48. The biomass yield at the optimal culture achieved 15.89 g/L in flask fermentation, which was 2.1 times higher than that at the initial NYDB medium. In a 10-L fermenter, 18.97 g/L of biomass was obtained after 36 hr of cultivation. Moreover, the biocontrol efficacy of the yeast was investigated after culture optimization. The results showed the yeast harvested in the optimal medium maintained its initial biocontrol properties by reducing the percentage of decayed apples to below 20%.
Reproducibility-optimized test statistic for ranking genes in microarray studies.
Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero
2008-01-01
A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on the Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.
Statistical Mechanics of On-Line Learning Using Correlated Examples and Its Optimal Scheduling
Fujii, Takashi; Ito, Hidetaka; Miyoshi, Seiji
2017-08-01
We theoretically study the generalization capability of on-line learning using several correlated input vectors in each update in a statistical-mechanical manner. We consider a model organized with linear perceptrons with Gaussian noise. First, in a noiseless case, we analytically derive the optimal learning rate as a function of the number of examples used in one update and their correlation. Next, we analytically show that the use of correlated examples is effective if the optimal learning rate is used, even when there is some noise. Furthermore, we propose a novel algorithm that raises the generalization capability by increasing the number of examples used in one update with time.
Directory of Open Access Journals (Sweden)
Wei Wu
We analyzed the spike discharge patterns of two types of neurons in the rodent peripheral gustatory system, Na specialists (NS) and acid generalists (AG) to lingual stimulation with NaCl, acetic acid, and mixtures of the two stimuli. Previous computational investigations found that both spike rate and spike timing contribute to taste quality coding. These studies used commonly accepted computational methods, but they do not provide a consistent statistical evaluation of spike trains. In this paper, we adopted a new computational framework that treated each spike train as an individual data point for computing summary statistics such as mean and variance in the spike train space. We found that these statistical summaries properly characterized the firing patterns (e. g. template and variability) and quantified the differences between NS and AG neurons. The same framework was also used to assess the discrimination performance of NS and AG neurons and to remove spontaneous background activity or "noise" from the spike train responses. The results indicated that the new metric system provided the desired decoding performance and noise-removal improved stimulus classification accuracy, especially of neurons with high spontaneous rates. In summary, this new method naturally conducts statistical analysis and neural decoding under one consistent framework, and the results demonstrated that individual peripheral-gustatory neurons generate a unique and reliable firing pattern during sensory stimulation and that this pattern can be reliably decoded.
Wu, Wei; Mast, Thomas G; Ziembko, Christopher; Breza, Joseph M; Contreras, Robert J
2013-01-01
We analyzed the spike discharge patterns of two types of neurons in the rodent peripheral gustatory system, Na specialists (NS) and acid generalists (AG) to lingual stimulation with NaCl, acetic acid, and mixtures of the two stimuli. Previous computational investigations found that both spike rate and spike timing contribute to taste quality coding. These studies used commonly accepted computational methods, but they do not provide a consistent statistical evaluation of spike trains. In this paper, we adopted a new computational framework that treated each spike train as an individual data point for computing summary statistics such as mean and variance in the spike train space. We found that these statistical summaries properly characterized the firing patterns (e. g. template and variability) and quantified the differences between NS and AG neurons. The same framework was also used to assess the discrimination performance of NS and AG neurons and to remove spontaneous background activity or "noise" from the spike train responses. The results indicated that the new metric system provided the desired decoding performance and noise-removal improved stimulus classification accuracy, especially of neurons with high spontaneous rates. In summary, this new method naturally conducts statistical analysis and neural decoding under one consistent framework, and the results demonstrated that individual peripheral-gustatory neurons generate a unique and reliable firing pattern during sensory stimulation and that this pattern can be reliably decoded.
Schochet, Peter Z.
2015-01-01
This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…
Cluster statistics and quasisoliton dynamics in microscopic optimal-velocity models
Yang, Bo; Xu, Xihua; Pang, John Z. F.; Monterola, Christopher
2016-04-01
Using the non-linear optimal velocity models as an example, we show that there exists an emergent intrinsic scale that characterizes the interaction strength between multiple clusters appearing in the solutions of such models. The interaction characterizes the dynamics of the localized quasisoliton structures given by the time derivative of the headways, and the intrinsic scale is analogous to the "charge" of the quasisolitons, leading to non-trivial cluster statistics from the random perturbations to the initial steady states of uniform headways. The cluster statistics depend both on the quasisoliton charge and the density of the traffic. The intrinsic scale is also related to an emergent quantity that gives the extremum headways in the cluster formation, as well as the coexistence curve separating the absolute stable phase from the metastable phase. The relationship is qualitatively universal for general optimal velocity models.
Fernández-González, Daniel; Martín-Duarte, Ramón; Ruiz-Bustinza, Íñigo; Mochón, Javier; González-Gasca, Carmen; Verdeja, Luis Felipe
2016-08-01
Blast furnace operators expect to get sinter with homogenous and regular properties (chemical and mechanical), necessary to ensure regular blast furnace operation. Blends for sintering also include several iron by-products and other wastes that are obtained in different processes inside the steelworks. Due to their source, the availability of such materials is not always consistent, but their total production should be consumed in the sintering process, to both save money and recycle wastes. The main scope of this paper is to obtain the least expensive iron ore blend for the sintering process, which will provide suitable chemical and mechanical features for the homogeneous and regular operation of the blast furnace. The systematic use of statistical tools was employed to analyze historical data, including linear and partial correlations applied to the data and fuzzy clustering based on the Sugeno Fuzzy Inference System to establish relationships among the available variables.
Optimization of Rolling Process for Bi(2223)/Ag Superconducting Tapes by a Statistical Method
Institute of Scientific and Technical Information of China (English)
None listed
2001-01-01
Ag-sheathed (Bi,Pb)2Sr2Ca2Cu3Ox tapes were prepared by the powder-in-tube method. The influences of rolling parameters on superconducting characteristics of Bi(2223)/Ag tapes were analyzed qualitatively with a statistical method. The results demonstrate that roll diameter and reduction per pass significantly influence the properties of superconducting tapes while roll speed does less and working friction the least. An optimized rolling process was therefore achieved according to the above results.
Directory of Open Access Journals (Sweden)
Wills Rachael A
2009-05-01
Background: The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods: This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results: Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion: In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
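The sensitivity of the frequentist result to the assumed number of silent comparisons can be illustrated with the standard Dunn-Sidak formula, p_adj = 1 - (1 - p)^m. The following is a minimal sketch, not the authors' analysis: the function names and the example values of p and m are hypothetical, and the formula assumes the m comparisons are independent.

```python
def sidak_adjust(p, m):
    """Dunn-Sidak adjusted p-value for one test out of m independent comparisons."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 - (1.0 - p) ** m


def sidak_threshold(alpha, m):
    """Per-comparison significance level that keeps the family-wise level at alpha."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical illustration: the same raw p-value under different guesses for m.
for m in (1, 10, 100, 1000):
    print(m, round(sidak_adjust(0.001, m), 4))
```

Because m is unobserved for a one-off cluster report, the adjusted value can only be shown as a function of the guess for m, which is the subjectivity the abstract refers to.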
Hawe, David; Hernández Fernández, Francisco R.; O’Suilleabháin, Liam; Huang, Jian; Wolsztynski, Eric; O’Sullivan, Finbarr
2012-01-01
In dynamic mode, positron emission tomography (PET) can be used to track the evolution of injected radio-labelled molecules in living tissue. This is a powerful diagnostic imaging technique that provides a unique opportunity to probe the status of healthy and pathological tissue by examining how it processes substrates. The spatial aspect of PET is well established in the computational statistics literature. This article focuses on its temporal aspect. The interpretation of PET time-course da...
Directory of Open Access Journals (Sweden)
Zhicai Zhang
2007-01-01
Statistical analyses were applied to optimize the medium composition for the mycelial growth and polysaccharide production by Tremella aurantialba in shake flask cultures. Firstly, four significant factors (xylan, peptone, wheat bran and corn powder) on mycelial growth and polysaccharide yield (p≤0.05) were obtained using one-at-a-time design. Subsequently, in order to study the mutual interactions between variables, the effects of these factors were further investigated using four-factor, three-level orthogonal test design and the optimal composition was (in g/L): xylan 40, peptone 10, wheat bran 20, corn powder 20, KH2PO4 1.2 and MgSO4·7H2O 0.6. Finally, the maximum mycelium yield and polysaccharide production in 50-litre stirred-tank bioreactor reached 36.8 and 3.01 g/L under the optimized medium, respectively.
Yang, Xiaoyan; Patel, Sulabh; Sheng, Ye; Pal, Dhananjay; Mitra, Ashim K
2014-06-01
The aim of this investigation was to develop hydrocortisone butyrate (HB)-loaded poly(D,L-lactic-co-glycolic acid) (PLGA) nanoparticles (NP) with ideal encapsulation efficiency (EE), particle size, and drug loading (DL) under emulsion solvent evaporation technique utilizing various experimental statistical design modules. Experimental designs were used to investigate specific effects of independent variables during preparation of HB-loaded PLGA NP and corresponding responses in optimizing the formulation. Plackett-Burman design for independent variables was first conducted to prescreen various formulation and process variables during the development of NP. Selected primary variables were further optimized by central composite design. This process leads to an optimum formulation with desired EE, particle size, and DL. Contour plots and response surface curves display visual diagrammatic relationships between the experimental responses and input variables. The concentration of PLGA, drug, and polyvinyl alcohol and sonication time were the critical factors influencing the responses analyzed. Optimized formulation showed EE of 90.6%, particle size of 164.3 nm, and DL of 64.35%. This study demonstrates that statistical experimental design methodology can optimize the formulation and process variables to achieve favorable responses for HB-loaded NP.
Formulation and optimization of rifampicin microparticles by Box-Behnken statistical design.
Maurya, D P; Sultana, Yasmin; Aqil, Mohd; Ali, A
2012-01-01
The objective of the present study was to optimize and evaluate in vitro gastroretentive performance of rifampicin microparticles. Formulations were optimized using design of experiments by employing a 4-factor, 3-level Box-Behnken statistical design. Independent variables studied were the ratio of polymers (Eudragit RSPO: ethyl cellulose), inert drug dispersing agent (talc), surfactant (sodium dodecyl sulfate) and stirring speed. The dependent variables were particle size and entrapment efficiency. Response surface plots were drawn, the statistical validity of the polynomials was established and the optimized formulation was characterized by Fourier Transform-InfraRed spectroscopy (FT-IR) and differential scanning calorimetry (DSC). Entrapment efficiency and particle size were determined. The designed microparticles have average particle size from 14.10 μm to 45.63 μm and entrapment efficiency from 38.14% to 94.81%. Optimized microparticles showed a particle size of 51.53 μm and drug entrapment of 83.43%, with sustained drug release behavior up to 12 h. In the present study, rifampicin microspheres were successfully prepared by a quasi-emulsion solvent diffusion technique for prolonged drug release. FT-IR and DSC studies did not reveal any significant drug interactions. The drug release was found to be controlled for more than 12 h, following a zero-order release pattern.
Hartcher-O'Brien, Jess; Di Luca, Massimiliano; Ernst, Marc O.
2014-01-01
Often multisensory information is integrated in a statistically optimal fashion where each sensory source is weighted according to its precision. This integration scheme is statistically optimal because it theoretically results in unbiased perceptual estimates with the highest precision possible. There is a current lack of consensus about how the nervous system processes multiple sensory cues to elapsed time. In order to shed light upon this, we adopt a computational approach to pinpoint the integration strategy underlying duration estimation of audio/visual stimuli. One of the assumptions of our computational approach is that the multisensory signals redundantly specify the same stimulus property. Our results clearly show that despite claims to the contrary, perceived duration is the result of an optimal weighting process, similar to that adopted for estimates of space. That is, participants weight the audio and visual information to arrive at the most precise, single duration estimate possible. The work also disentangles how different integration strategies – i.e. considering the time of onset/offset of signals - might alter the final estimate. As such we provide the first concrete evidence of an optimal integration strategy in human duration estimates. PMID:24594578
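The precision-weighting scheme described above has a standard closed form: each cue is weighted by its inverse variance, normalized so the weights sum to one, and the combined estimate has a variance no larger than that of the best single cue. The sketch below illustrates that textbook rule only; the function name and the example durations and noise levels are hypothetical and are not taken from the study.

```python
def integrate_cues(estimates, sigmas):
    """Combine single-cue estimates by inverse-variance (precision) weighting.

    estimates: per-cue estimates of the same stimulus property (e.g. duration in ms)
    sigmas:    per-cue standard deviations (smaller sigma = more reliable cue)
    Returns (combined_estimate, weights, combined_sigma).
    """
    precisions = [1.0 / (s * s) for s in sigmas]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    combined = sum(w * x for w, x in zip(weights, estimates))
    combined_sigma = (1.0 / total) ** 0.5
    return combined, weights, combined_sigma


# Hypothetical example: an auditory cue twice as precise (in sigma) as a visual cue.
estimate, weights, sigma = integrate_cues([500.0, 600.0], [1.0, 2.0])
print(estimate, weights, sigma)
```

With these illustrative inputs the auditory cue receives four times the weight of the visual cue, and the combined standard deviation falls below that of either cue alone, which is what makes the scheme statistically optimal under the assumption of independent, unbiased cues.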
Cross-Layer Optimization of Two-Way Relaying for Statistical QoS Guarantees
Lin, Cen; Tao, Meixia
2012-01-01
Two-way relaying promises considerable improvements on spectral efficiency in wireless relay networks. While most existing works focus on physical layer approaches to exploit its capacity gain, the benefits of two-way relaying on upper layers are much less investigated. In this paper, we study the cross-layer design and optimization for delay quality-of-service (QoS) provisioning in two-way relay systems. Our goal is to find the optimal transmission policy to maximize the weighted sum throughput of the two users in the physical layer while guaranteeing the individual statistical delay-QoS requirement for each user in the datalink layer. This statistical delay-QoS requirement is characterized by the QoS exponent. By integrating the concept of effective capacity, the cross-layer optimization problem is equivalent to a weighted sum effective capacity maximization problem. We derive the jointly optimal power and rate adaptation policies for both three-phase and two-phase two-way relay protocols. Numerical results...
Kaur, Baljinder; Chakraborty, Debkumar
2013-11-01
An isolate of P. acidilactici capable of producing vanillin from rice bran was isolated from a milk product. Response Surface Methodology (RSM) was employed for statistical media and process optimization for production of biovanillin. Statistical medium optimization was done in two steps involving Plackett-Burman design and central composite response design. The RSM-optimized vanillin production medium consisted of 15% (w/v) rice bran, 0.5% (w/v) peptone, 0.1% (w/v) ammonium nitrate, 0.005% (w/v) ferulic acid, 0.005% (w/v) magnesium sulphate, and 0.1% (v/v) tween-80, pH 5.6, at a temperature of 37 °C under shaking conditions at 180 rpm. 1.269 g/L vanillin was obtained within 24 h of incubation in optimized culture medium. This is the first report indicating such a high vanillin yield obtained during biotransformation of ferulic acid to vanillin using a Pediococcal isolate.
Shen, Samuel S. P.; Wied, Olaf; Weithmann, Alexander; Regele, Tobias; Bailey, Barbara A.; Lawrimore, Jay H.
2016-07-01
This paper describes six different temporal climate regimes of the contiguous United States (CONUS) according to interdecadal variations of surface air temperature (SAT) and precipitation using the United States Historical Climatology Network (USHCN) monthly data (Tmax, Tmin, Tmean, and precipitation) from 1895 to 2010. Our analysis is based on the probability distribution, mean, standard deviation, skewness, kurtosis, Kolmogorov-Smirnov (KS) test, and Welch's t test. The relevant statistical parameters are computed from gridded monthly SAT and precipitation data. SAT variations lead to classification of four regimes: 1895-1930 (cool), 1931-1960 (warm), 1961-1985 (cool), and 1986-2010 (warm), while precipitation variations lead to a classification of two regimes: 1895-1975 (dry) and 1976-2010 (wet). The KS test shows that any two regimes of the above six are statistically significantly different from each other due to clear shifts of the probability density functions. Extremes of SAT and precipitation identify the ten hottest, coldest, driest, and wettest years. Welch's t test is used to discern significant differences among these extremes. The spatial patterns of the six climate regimes and some years of extreme climate are analyzed. Although the two most recent decades are the warmest since 1895 and many of the hottest years measured by CONUS Tmin and Tmean fall in these two decades, the hottest year according to the CONUS Tmax anomalies is 1934 (1.37 °C), which is very close to the second-hottest Tmax year, 2006 (1.35 °C).
Roy, Pallab; Shahiwala, Aliasgar
2009-06-28
The present work conceptualizes a specific technology, based on combining floating and pulsatile principles, to develop a drug delivery system intended for chronotherapy in nocturnal acid breakthrough. This approach is achieved by a programmed delivery of ranitidine hydrochloride from a floating tablet with time-lagged coating. In this study, the functionality of the outer polymer coating in predicting lag time and drug release was statistically analyzed using the response surface methodology (RSM). RSM was employed for designing the experiment, generating mathematical models and the optimization study. The chosen independent variables, i.e. percentage weight ratios of ethyl cellulose to hydroxypropyl methyl cellulose in the coating formulation and coating level (% weight gain), were optimized with a 3² full factorial design. Lag time prior to drug release and cumulative percentage drug release in 7 h were selected as responses. Results revealed that both the coating composition and the coating level are significant factors affecting the drug release profile. A second-order polynomial equation fitted to the data was used to predict the responses in the optimal region. The optimized formulation prepared according to computer-determined levels provided a release profile which was close to the predicted values. The proposed mathematical model is found to be robust and accurate for optimization of time-lagged coating formulations for programmable pulsatile release of ranitidine hydrochloride, consistent with the demands of nocturnal acid breakthrough.
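A full factorial design with two factors at three levels each has nine runs, and a second-order polynomial in two factors has six terms (intercept, two linear, one interaction, two quadratic). The sketch below enumerates such a design in coded units and builds the corresponding model terms; the coded levels -1, 0, +1 and the function names are assumptions for illustration, and no data or coefficients from the study are reproduced.

```python
from itertools import product


def full_factorial_3x3(levels=(-1, 0, 1)):
    """All 9 runs of a two-factor, three-level full factorial design (coded units)."""
    return list(product(levels, repeat=2))


def quadratic_terms(x1, x2):
    """Regressors of a second-order model: 1, x1, x2, x1*x2, x1^2, x2^2."""
    return [1, x1, x2, x1 * x2, x1 * x1, x2 * x2]


# Here x1 could stand for the coded polymer ratio and x2 for the coded coating level.
design = full_factorial_3x3()
model_matrix = [quadratic_terms(x1, x2) for x1, x2 in design]

for run, row in zip(design, model_matrix):
    print(run, row)
```

Nine runs against six coefficients leaves three degrees of freedom for estimating lack of fit, which is one reason this design is a common choice for fitting a quadratic response surface in two factors.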
Zan, X Z; Liu, W B; Hu, M X; Shen, L Z
2016-12-19
A salient problem in translational genomics is the use of gene regulatory networks to determine therapeutic intervention strategies. Theoretically, in a complete network, the optimal policy performs better than the suboptimal policy. However, this theory may not hold if we intervene in a system based on a control policy derived from imprecise inferred networks, especially in the small-sample scenario. In this paper, we compare the performance of the unconstrained (UC) policy with that of the mean-first-passage-time (MFPT) policy in terms of the quality of the determined control gene and the effectiveness of the policy. Our simulation results reveal that the quality of the control gene determined by the robust MFPT policy is better in the small-sample scenario, whereas the sensitive UC policy performs better in the large-sample scenario. Furthermore, given the same control gene, the MFPT policy is more efficient than the UC policy for the small-sample scenario. Owing to these two features, the MFPT policy performs better in the small-sample scenario and the UC policy performs better only in the large-sample scenario. Additionally, using a relatively complex model (gene number N is more than 1) is beneficial for the intervention process, especially for the sensitive UC policy.
Niwas, Ram; Osama, Khwaja; Khan, Saif; Haque, Shafiul; Tripathi, C. K. M.; Mishra, B. N.
2015-01-01
Cholesterol oxidase (COD) is a bi-functional FAD-containing oxidoreductase which catalyzes the oxidation of cholesterol into 4-cholesten-3-one. The wider biological functions and clinical applications of COD have urged the screening, isolation and characterization of newer microbes from diverse habitats as a source of COD and optimization and over-production of COD for various uses. The practicability of statistical/ artificial intelligence techniques, such as response surface methodology (RSM), artificial neural network (ANN) and genetic algorithm (GA) have been tested to optimize the medium composition for the production of COD from novel strain Streptomyces sp. NCIM 5500. All experiments were performed according to the five factor central composite design (CCD) and the generated data was analysed using RSM and ANN. GA was employed to optimize the models generated by RSM and ANN. Based upon the predicted COD concentration, the model developed with ANN was found to be superior to the model developed with RSM. The RSM-GA approach predicted maximum of 6.283 U/mL COD production, whereas the ANN-GA approach predicted a maximum of 9.93 U/mL COD concentration. The optimum concentrations of the medium variables predicted through ANN-GA approach were: 1.431 g/50 mL soybean, 1.389 g/50 mL maltose, 0.029 g/50 mL MgSO4, 0.45 g/50 mL NaCl and 2.235 ml/50 mL glycerol. The experimental COD concentration was concurrent with the GA predicted yield and led to 9.75 U/mL COD production, which was nearly two times higher than the yield (4.2 U/mL) obtained with the un-optimized medium. This is the very first time we are reporting the statistical versus artificial intelligence based modeling and optimization of COD production by Streptomyces sp. NCIM 5500. PMID:26368924
Li, Y.; Kirchengast, G.; Scherllin-Pirscher, B.; Norman, R.; Yuan, Y. B.; Fritzer, J.; Schwaerz, M.; Zhang, K.
2015-01-01
We introduce a new dynamic statistical optimization algorithm to initialize ionosphere-corrected bending angles of Global Navigation Satellite System (GNSS) based radio occultation (RO) measurements. The new algorithm estimates background and observation error covariance matrices with geographically-varying uncertainty profiles and realistic global-mean correlation matrices. The error covariance matrices estimated by the new approach are more accurate and realistic than in simplified existing approaches and can therefore be used in statistical optimization to provide optimal bending angle profiles for high-altitude initialization of the subsequent Abel transform retrieval of refractivity. The new algorithm is evaluated against the existing Wegener Center Occultation Processing System version 5.6 (OPSv5.6) algorithm, using simulated data on two test days from January and July 2008 and real observed CHAMP and COSMIC measurements from the complete months of January and July 2008. The following is achieved for the new method's performance compared to OPSv5.6: (1) significant reduction in random errors (standard deviations) of optimized bending angles down to about two-thirds of their size or more; (2) reduction of the systematic differences in optimized bending angles for simulated MetOp data; (3) improved retrieval of refractivity and temperature profiles; (4) produces realistically estimated global-mean correlation matrices and realistic uncertainty fields for the background and observations. Overall the results indicate high suitability for employing the new dynamic approach in the processing of long-term RO data into a reference climate record, leading to well characterized and high-quality atmospheric profiles over the entire stratosphere.
Li, Y.; Kirchengast, G.; Scherllin-Pirscher, B.; Norman, R.; Yuan, Y. B.; Fritzer, J.; Schwaerz, M.; Zhang, K.
2015-08-01
We introduce a new dynamic statistical optimization algorithm to initialize ionosphere-corrected bending angles of Global Navigation Satellite System (GNSS)-based radio occultation (RO) measurements. The new algorithm estimates background and observation error covariance matrices with geographically varying uncertainty profiles and realistic global-mean correlation matrices. The error covariance matrices estimated by the new approach are more accurate and realistic than in simplified existing approaches and can therefore be used in statistical optimization to provide optimal bending angle profiles for high-altitude initialization of the subsequent Abel transform retrieval of refractivity. The new algorithm is evaluated against the existing Wegener Center Occultation Processing System version 5.6 (OPSv5.6) algorithm, using simulated data on two test days from January and July 2008 and real observed CHAllenging Minisatellite Payload (CHAMP) and Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) measurements from the complete months of January and July 2008. The following is achieved for the new method's performance compared to OPSv5.6: (1) significant reduction of random errors (standard deviations) of optimized bending angles down to about half of their size or more; (2) reduction of the systematic differences in optimized bending angles for simulated MetOp data; (3) improved retrieval of refractivity and temperature profiles; and (4) realistically estimated global-mean correlation matrices and realistic uncertainty fields for the background and observations. Overall the results indicate high suitability for employing the new dynamic approach in the processing of long-term RO data into a reference climate record, leading to well-characterized and high-quality atmospheric profiles over the entire stratosphere.
Directory of Open Access Journals (Sweden)
Y. Li
2015-01-01
We introduce a new dynamic statistical optimization algorithm to initialize ionosphere-corrected bending angles of Global Navigation Satellite System (GNSS) based radio occultation (RO) measurements. The new algorithm estimates background and observation error covariance matrices with geographically-varying uncertainty profiles and realistic global-mean correlation matrices. The error covariance matrices estimated by the new approach are more accurate and realistic than in simplified existing approaches and can therefore be used in statistical optimization to provide optimal bending angle profiles for high-altitude initialization of the subsequent Abel transform retrieval of refractivity. The new algorithm is evaluated against the existing Wegener Center Occultation Processing System version 5.6 (OPSv5.6) algorithm, using simulated data on two test days from January and July 2008 and real observed CHAMP and COSMIC measurements from the complete months of January and July 2008. The following is achieved for the new method's performance compared to OPSv5.6: (1) significant reduction in random errors (standard deviations) of optimized bending angles down to about two-thirds of their size or more; (2) reduction of the systematic differences in optimized bending angles for simulated MetOp data; (3) improved retrieval of refractivity and temperature profiles; (4) realistically estimated global-mean correlation matrices and realistic uncertainty fields for the background and observations. Overall the results indicate high suitability for employing the new dynamic approach in the processing of long-term RO data into a reference climate record, leading to well characterized and high-quality atmospheric profiles over the entire stratosphere.
Pathak, Lakshmi; Singh, Vineeta; Niwas, Ram; Osama, Khwaja; Khan, Saif; Haque, Shafiul; Tripathi, C K M; Mishra, B N
2015-01-01
Cholesterol oxidase (COD) is a bi-functional FAD-containing oxidoreductase which catalyzes the oxidation of cholesterol into 4-cholesten-3-one. The wider biological functions and clinical applications of COD have urged the screening, isolation and characterization of newer microbes from diverse habitats as a source of COD and optimization and over-production of COD for various uses. The practicability of statistical/ artificial intelligence techniques, such as response surface methodology (RSM), artificial neural network (ANN) and genetic algorithm (GA) have been tested to optimize the medium composition for the production of COD from novel strain Streptomyces sp. NCIM 5500. All experiments were performed according to the five factor central composite design (CCD) and the generated data was analysed using RSM and ANN. GA was employed to optimize the models generated by RSM and ANN. Based upon the predicted COD concentration, the model developed with ANN was found to be superior to the model developed with RSM. The RSM-GA approach predicted maximum of 6.283 U/mL COD production, whereas the ANN-GA approach predicted a maximum of 9.93 U/mL COD concentration. The optimum concentrations of the medium variables predicted through ANN-GA approach were: 1.431 g/50 mL soybean, 1.389 g/50 mL maltose, 0.029 g/50 mL MgSO4, 0.45 g/50 mL NaCl and 2.235 ml/50 mL glycerol. The experimental COD concentration was concurrent with the GA predicted yield and led to 9.75 U/mL COD production, which was nearly two times higher than the yield (4.2 U/mL) obtained with the un-optimized medium. This is the very first time we are reporting the statistical versus artificial intelligence based modeling and optimization of COD production by Streptomyces sp. NCIM 5500.
Robust optimization of the output voltage of nanogenerators by statistical design of experiments
Song, Jinhui
2010-09-01
Nanogenerators were first demonstrated by deflecting aligned ZnO nanowires using a conductive atomic force microscopy (AFM) tip. The output of a nanogenerator is affected by three parameters: tip normal force, tip scanning speed, and tip abrasion. In this work, systematic experimental studies have been carried out to examine the combined effects of these three parameters on the output, using statistical design of experiments. A statistical model has been built to analyze the data and predict the optimal parameter settings. For an AFM tip of cone angle 70° coated with Pt, and ZnO nanowires with a diameter of 50 nm and lengths of 600 nm to 1 μm, the optimized parameters for the nanogenerator were found to be a normal force of 137 nN and scanning speed of 40 μm/s, rather than the conventional settings of 120 nN for the normal force and 30 μm/s for the scanning speed. A nanogenerator with the optimized settings has three times the average output voltage of one with the conventional settings. © 2010 Tsinghua University Press and Springer-Verlag Berlin Heidelberg.
Directory of Open Access Journals (Sweden)
Himanshu Bhatt
2014-01-01
The purpose of the research was to present Budesonide (BUD) as a novel formulation showing improved aqueous solubility, which may decrease variability in Cmax and Tmax found in inflammatory bowel disease (IBD) patients, and drug targeting to colon. To improve aqueous solubility, solid dispersion (SD) of the BUD with poloxamer 188 was prepared by melting method. Physical characterization of solid dispersion was performed. The SD was used to prepare tablet equivalent to 9 mg of BUD. The tablet was coated with enteric polymers Eudragit S100 and Eudragit L100 to target colon. The ratio of polymers and percentage coating was optimized using statistical design. Variables studied in design were ratio of enteric polymers and the effect of percentage coating on in vitro drug release. Dissolution at different pH showed that drug release in colon could be modified by optimizing the ratio of polymers and percentage coating. The dissolution data showed that the percentage coating and ratio of polymers are very important to get lag time and optimum formulation. The optimized batch from statistical design was kept under accelerated condition for three months. After accelerated stability study, there was no significant change in the drug release.
Optimization of biohydrogen production from sweet sorghum syrup using statistical methods
Energy Technology Data Exchange (ETDEWEB)
Saraphirom, Piyawadee [Department of Biology, Faculty of Science and Technology, Rajabhat Maha Sarakham University, A.Muang, Maha Sarakham 44000 (Thailand); Department of Biotechnology, Faculty of Technology, Khon Kaen University, A. Muang, Khon Kaen 40002 (Thailand); Reungsang, Alissara [Department of Biotechnology, Faculty of Technology, Khon Kaen University, A. Muang, Khon Kaen 40002 (Thailand); Fermentation Research Center for Value Added of Agricultural Products, Faculty of Technology, Khon Kaen University, A. Muang, Khon Kaen 40002 (Thailand)
2010-12-15
This study employed statistically based experimental designs to optimize fermentation conditions for hydrogen production from sweet sorghum syrup by anaerobic mixed cultures. Initial screening of important factors influencing hydrogen production, i.e., total sugar, initial pH, nutrient solution, iron (II) sulphate (FeSO4), peptone and sodium bicarbonate was conducted by the Plackett-Burman method. Results indicated that only FeSO4 had statistically significant (P ≤ 0.005) influences on specific hydrogen production (P_s) while total sugar and initial pH had an interdependent effect on P_s. Optimal conditions for the maximal P_s were 25 g/L total sugar, 4.75 initial pH and 1.45 g/L FeSO4 in which P_s of 6897 mL H2/L was estimated. Estimated optimum conditions revealed only 0.04% difference from the actual P_s of 6864 mL H2/L which suggested that the optimal conditions obtained can be practically applied to produce hydrogen from sweet sorghum syrup with the least error.
Energy Technology Data Exchange (ETDEWEB)
Ramimoghadam, Donya [Nanotechnology & Catalysis Research Centre (NANOCAT), IPS Building, University of Malaya, 50603 Kuala Lumpur (Malaysia); Bagheri, Samira, E-mail: samira_bagheri@edu.um.my [Nanotechnology & Catalysis Research Centre (NANOCAT), IPS Building, University of Malaya, 50603 Kuala Lumpur (Malaysia); Yousefi, Amin Termeh [ChECA IKohza, Department of Environmental & Green Technology (EGT), Malaysia Japan International Institute of Technology (MJIIT), University Technology Malaysia - UTM, Kuala Lumpur (Malaysia); Abd Hamid, Sharifah Bee [Nanotechnology & Catalysis Research Centre (NANOCAT), IPS Building, University of Malaya, 50603 Kuala Lumpur (Malaysia)
2015-11-01
In this study, nanomagnetite particles have been successfully prepared via the coprecipitation method. The effect of the key explanatory variables on the saturation magnetization of synthetic nanomagnetite particles was investigated using the response surface methodology (RSM). The correlation of the involved parameters with the growth process was examined by employing the central composite design method through designating set up experiments that will determine the interaction of the variables. The vibrating sample magnetometer (VSM) was used to confirm the statistical analysis. Furthermore, the regression analysis monitors the priority of the variables' influence on the saturation magnetization of nanomagnetite particles by developing the statistical model of the saturation magnetization. According to the investigated model, the highest interaction of variable belongs to the pH and temperature with the optimized condition of 9–11, and 75–85 °C, respectively. The response obtained by VSM suggests that the saturation magnetization of nanomagnetite particles can be controlled by restricting the effective parameters. Highlights: • Nanomagnetite particles have been prepared via the coprecipitation method. • Effects of key variables on M_s of synthetic nanomagnetite investigated by RSM. • The VSM was used to confirm the statistical analysis. • Optimized condition belongs to pH of 9–11, and temperature of 75–85 °C.
Garelli, F M; Espinosa, M O; Gürtler, R E
2012-05-01
Understanding the processes that affect Aedes aegypti (L.) (Diptera: Culicidae) may serve as a starting point to create and/or improve vector control strategies. For this purpose, we performed statistical modeling of three entomological surveys conducted in Clorinda City, northern Argentina. Previous 'basic' models of presence or absence of larvae and/or pupae (infestation) and the number of pupae in infested containers (productivity), mainly based on physical characteristics of containers, were expanded to include variables selected a priori reflecting water use practices, vector-related context factors, the history of chemical control, and climate. Model selection was performed using Akaike's Information Criterion. In total, 5,431 water-holding containers were inspected and 12,369 Ae. aegypti pupae collected from 963 positive containers. Large tanks were the most productive container type. Variables reflecting every putative process considered, except for history of chemical control, were selected in the best models obtained for infestation and productivity. The associations found were very strong, particularly in the case of infestation. Water use practices and vector-related context factors were the most important ones, as evidenced by their impact on Akaike's Information Criterion scores of the infestation model. Risk maps based on empirical data and model predictions showed a heterogeneous distribution of entomological risk. An integrated vector control strategy is recommended, aiming at community participation for healthier water use practices and targeting large tanks for key elements such as lid status, water addition frequency and water use.
Directory of Open Access Journals (Sweden)
Andrew Nosakhare Amenaghawon
2014-07-01
Response surface methodology (RSM) was employed for the analysis of the simultaneous effect of acid concentration, pretreatment time and temperature on the total reducing sugar concentration obtained during acid hydrolysis of corn stover. A three-variable, three-level Box-Behnken design (BBD) was used to develop a statistical model for the optimization of the process variables. The optimal hydrolysis conditions that resulted in the maximum total reducing sugar concentration were an acid concentration of 1.72% (w/w), a temperature of 169.26 °C and a pretreatment time of 48.73 minutes. Under these conditions, the total reducing sugar concentration obtained was 23.41 g/L. Validation of the model indicated no difference between predicted and observed values.
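As a rough illustration of what a three-variable, three-level Box-Behnken design looks like, the sketch below generates the coded runs (-1, 0, +1) using the usual pairwise construction for three factors. The function name `box_behnken` and the number of replicated centre points are assumptions made for illustration; the abstract does not state how many centre runs were used or how the design was generated.

```python
from itertools import combinations, product


def box_behnken(n_factors=3, n_center=3):
    """Return coded Box-Behnken runs (-1, 0, +1) as a list of tuples.

    n_center (replicated centre points) is an assumption; the abstract
    does not report it. The pairwise construction below is the standard
    one for three factors.
    """
    runs = []
    # For every pair of factors, run the 2x2 factorial at +/-1 while
    # holding the remaining factor(s) at the centre level 0.
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * n_factors
            point[i], point[j] = a, b
            runs.append(tuple(point))
    # Replicated centre points, used to estimate pure error.
    runs.extend([(0,) * n_factors] * n_center)
    return runs


# Columns: acid concentration, temperature, pretreatment time (coded units).
design = box_behnken()
for run in design:
    print(run)
```

With three factors this yields 12 edge-midpoint runs plus the centre replicates; each coded level would then be mapped to a physical value within the range chosen by the experimenter.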
Optimizing Friction Stir Welding via Statistical Design of Tool Geometry and Process Parameters
Blignault, C.; Hattingh, D. G.; James, M. N.
2012-06-01
This article considers optimization procedures for friction stir welding (FSW) in 5083-H321 aluminum alloy, via control of weld process parameters and tool design modifications. It demonstrates the potential utility of the "force footprint" (FF) diagram in providing a real-time graphical user interface (GUI) for process optimization of FSW. Multiple force, torque, and temperature responses were recorded during FS welding using 24 different tool pin geometries, and these data were statistically analyzed to determine the relative influence of a number of combinations of important process and tool geometry parameters on tensile strength. Desirability profile charts are presented, which show the influence of seven key combinations of weld process variables on tensile strength. The model developed in this study allows the weld tensile strength to be predicted for other combinations of tool geometry and process parameters to fall within an average error of 13%. General guidelines for tool profile selection and the likelihood of influencing weld tensile strength are also provided.
Mohajeri, Leila; Aziz, Hamidi Abdul; Isa, Mohamed Hasnain; Zahed, Mohammad Ali
2010-02-01
This work studied the bioremediation of weathered crude oil (WCO) in coastal sediment samples using central composite face centered design (CCFD) under response surface methodology (RSM). Initial oil concentration, biomass, nitrogen and phosphorus concentrations were used as independent variables (factors) and oil removal as the dependent variable (response) in a 60-day trial. A statistically significant model for WCO removal was obtained. The coefficient of determination (R² = 0.9732) and probability value (P < 0.0001) demonstrated significance for the regression model. Numerical optimization based on a desirability function was carried out for initial oil concentrations of 2, 16 and 30 g per kg sediment, and 83.13, 78.06 and 69.92 per cent removal were observed respectively, compared to 77.13, 74.17 and 69.87 per cent removal for un-optimized results.
Directory of Open Access Journals (Sweden)
Clarich Alberto
2005-01-01
The purpose of this work is to optimize the stator shape of an axial compressor, in order to maximize the global efficiency of the machine, fixing the rotor shape. We have used a 3D parametric mesh and the CFX-Tascflow code for the flow simulation. To find out the most important variables in this problem, we have run a preliminary series of designs, whose results have been analyzed by a statistical tool. This analysis has helped us to choose the most appropriate variables and their ranges in order to implement the optimization algorithm more efficiently and rapidly. For the simulation of the fluid flow through the machine, we have used a cluster of 12 processors.
Statistical mechanics of the inverse Ising problem and the optimal objective function
Berg, Johannes
2017-08-01
The inverse Ising problem seeks to reconstruct the parameters of an Ising Hamiltonian on the basis of spin configurations sampled from the Boltzmann measure. Over the last decade, many applications of the inverse Ising problem have arisen, driven by the advent of large-scale data across different scientific disciplines. Recently, strategies to solve the inverse Ising problem based on convex optimisation have proven to be very successful. These approaches maximise particular objective functions with respect to the model parameters. Examples are the pseudolikelihood method and interaction screening. In this paper, we establish a link between approaches to the inverse Ising problem based on convex optimisation and the statistical physics of disordered systems. We characterise the performance of an arbitrary objective function and calculate the objective function which optimally reconstructs the model parameters. We evaluate the optimal objective function within a replica-symmetric ansatz and compare the results of the optimal objective function with other reconstruction methods. Apart from giving a theoretical underpinning to solving the inverse Ising problem by convex optimisation, the optimal objective function outperforms state-of-the-art methods, albeit by a small margin.
Supersonic acoustic intensity with statistically optimized near-field acoustic holography
DEFF Research Database (Denmark)
Fernandez Grande, Efren; Jacobsen, Finn
2011-01-01
and circulating energy in the near-field of the source. This quantity is of concern because it makes it possible to identify the regions of a source that contribute to the far field radiation, which is often the ultimate concern in noise control. Therefore, this is a very useful analysis tool complementary...... to the information provided by the near-field acoustic holography technique. This study proposes a version of the supersonic acoustic intensity applied to statistically optimized near-field acoustic holography (SONAH). The theory, numerical results and an experimental study are presented. The possibility of using...
DEFF Research Database (Denmark)
Stefani, Alessio; Nielsen, Kristian; Rasmussen, Henrik K.
2012-01-01
We fabricated an electronically controlled polymer optical fiber cleaver, which uses a razor-blade guillotine and provides independent control of fiber temperature, blade temperature, and cleaving speed. To determine the optimum cleaving conditions of microstructured polymer optical fibers (m......POFs) with hexagonal hole structures we developed a program for cleaving quality optimization, which reads in a microscope image of the fiber end-facet and determines the core-shift and the statistics of the hole diameter, hole-to-hole pitch, hole ellipticity, and direction of major ellipse axis. For 125μm in diameter...
Directory of Open Access Journals (Sweden)
Aristides T Hatjimihail
BACKGROUND: An open problem in clinical chemistry is the estimation of the optimal sampling time intervals for the application of statistical quality control (QC) procedures that are based on the measurement of control materials. This is a probabilistic risk assessment problem that requires reliability analysis of the analytical system, and the estimation of the risk caused by the measurement error. METHODOLOGY/PRINCIPAL FINDINGS: Assuming that the states of the analytical system are the reliability state, the maintenance state, the critical-failure modes and their combinations, we can define risk functions based on the mean time of the states, their measurement error and the medically acceptable measurement error. Consequently, a residual risk measure rr can be defined for each sampling time interval. The rr depends on the state probability vectors of the analytical system, the state transition probability matrices before and after each application of the QC procedure and the state mean time matrices. The optimal sampling time intervals can be defined as those minimizing a QC-related cost measure while the rr is acceptable. I developed an algorithm that estimates the rr for any QC sampling time interval of a QC procedure applied to analytical systems with an arbitrary number of critical-failure modes, assuming any failure time and measurement error probability density function for each mode. Furthermore, given the acceptable rr, it can estimate the optimal QC sampling time intervals. CONCLUSIONS/SIGNIFICANCE: It is possible to rationally estimate the optimal QC sampling time intervals of an analytical system to sustain an acceptable residual risk with the minimum QC-related cost. The optimization requires the reliability analysis of the analytical system and the risk analysis of the measurement error.
Sun, Hongyue; Luo, Shuai; Jin, Ran; He, Zhen
2017-07-01
Mathematical modeling is an important tool to investigate the performance of microbial fuel cells (MFCs) towards their optimized design. To overcome the shortcomings of traditional MFC models, an ensemble model is developed in this study by integrating an engineering model with statistical analytics for extrapolation scenarios. Such an ensemble model can reduce the effort of parameter calibration and requires fewer measurement data to achieve accuracy comparable to a traditional statistical model under both normal and extreme operation regions. Based on different weights between current generation and organic removal efficiency, the ensemble model can give recommended input factor settings to achieve the best current generation and organic removal efficiency. The model predicts a set of optimal design factors for the present tubular MFCs, including an anode flow rate of 3.47 mL min⁻¹, an organic concentration of 0.71 g L⁻¹, and a catholyte pumping flow rate of 14.74 mL min⁻¹, to achieve a peak current of 39.2 mA. To maintain 100% organic removal efficiency, the anode flow rate and organic concentration should be kept below 1.04 mL min⁻¹ and 0.22 g L⁻¹, respectively. The developed ensemble model can potentially be modified to model other types of MFCs or bioelectrochemical systems.
Takabe, Satoshi; Hukushima, Koji
2016-05-01
Typical behavior of the linear programming (LP) problem is studied as a relaxation of the minimum vertex cover (min-VC), a type of integer programming (IP) problem. A lattice-gas model on the Erdős-Rényi random graphs of α-uniform hyperedges is proposed to express both the LP and IP problems of the min-VC in the common statistical mechanical model with a one-parameter family. Statistical mechanical analyses reveal for α = 2 that the LP optimal solution is typically equal to that given by the IP below the critical average degree c = e in the thermodynamic limit. The critical threshold for good accuracy of the relaxation extends the mathematical result c = 1 and coincides with the replica symmetry-breaking threshold of the IP. The LP relaxation for the minimum hitting sets with α ≥ 3, minimum vertex covers on α-uniform random graphs, is also studied. Analytic and numerical results strongly suggest that the LP relaxation fails to estimate optimal values above the critical average degree c = e/(α − 1) where the replica symmetry is broken.
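To make the relation between the IP and its LP relaxation concrete, here is a minimal sketch on tiny ordinary graphs (α = 2). It is not the statistical-mechanical analysis of the paper. It relies on the known half-integrality of the vertex-cover LP (an optimal solution exists with every variable in {0, 1/2, 1}), so an exhaustive search over those three levels returns the LP optimum without an LP solver. The helper name `min_cover` and the two example graphs are assumptions chosen for illustration.

```python
from itertools import product


def min_cover(n_vertices, edges, levels):
    """Minimise x_0 + ... + x_{n-1} subject to x_u + x_v >= 1 on every edge,
    by exhaustive search with each x_i restricted to `levels`.

    levels=(0, 1)      -> the integer program (true minimum vertex cover)
    levels=(0, 0.5, 1) -> the LP relaxation optimum, by half-integrality
    """
    best = None
    for x in product(levels, repeat=n_vertices):
        if all(x[u] + x[v] >= 1 for u, v in edges):
            total = sum(x)
            if best is None or total < best:
                best = total
    return best


# Odd cycle: the relaxation is strictly smaller than the integer optimum.
triangle = [(0, 1), (1, 2), (0, 2)]
ip_triangle = min_cover(3, triangle, (0, 1))
lp_triangle = min_cover(3, triangle, (0, 0.5, 1))

# Tree (a path): the relaxation is tight.
path = [(0, 1), (1, 2)]
ip_path = min_cover(3, path, (0, 1))
lp_path = min_cover(3, path, (0, 0.5, 1))

print("triangle: IP =", ip_triangle, " LP =", lp_triangle)
print("path:     IP =", ip_path, " LP =", lp_path)
```

The exhaustive search scales as 3^n and is only meant for toy instances; it shows the two regimes the abstract contrasts, one where the LP value matches the IP value and one where it does not.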
Manning, Robert M.
1990-01-01
A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of the optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics data base is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0-5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.
Forward and backward inference in spatial cognition.
Directory of Open Access Journals (Sweden)
Will D Penny
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
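The idea of combining a path-integration estimate with sensory input by forward inference can be sketched as one predict-and-update step of a discrete Bayes filter. This is a generic textbook construction, not the model from the paper; the three-location circular track, the transition probabilities and the sensory likelihoods below are all hypothetical values chosen for illustration.

```python
def forward_step(belief, transition, likelihood):
    """One step of forward inference over discrete locations.

    belief[i]        : P(location = i | observations so far)
    transition[i][j] : P(next = j | current = i), the path-integration model
    likelihood[j]    : P(current sensory observation | location = j)
    Returns the updated belief and the probability of the observation.
    """
    n = len(belief)
    # Predict: push the current belief through the motion model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: weight each location by the sensory evidence, then normalise.
    unnormalised = [predicted[j] * likelihood[j] for j in range(n)]
    evidence = sum(unnormalised)
    return [u / evidence for u in unnormalised], evidence


# Hypothetical circular track with 3 locations: the agent advances one
# location with probability 0.8 and stays put with probability 0.2.
transition = [[0.2, 0.8, 0.0],
              [0.0, 0.2, 0.8],
              [0.8, 0.0, 0.2]]
belief = [1.0, 0.0, 0.0]          # start: certainly at location 0
likelihood = [0.1, 0.7, 0.2]      # hypothetical cue favouring location 1

posterior, evidence = forward_step(belief, transition, likelihood)
print("posterior over locations:", [round(p, 3) for p in posterior])
print("probability of the observation:", round(evidence, 3))
```

The second return value is the likelihood of the sensory observation under this particular environment model; comparing that quantity across alternative models is one way to read the abstract's remark about working out which environment you are in.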
Directory of Open Access Journals (Sweden)
Lakshmi Pathak
Cholesterol oxidase (COD) is a bi-functional FAD-containing oxidoreductase which catalyzes the oxidation of cholesterol into 4-cholesten-3-one. The wider biological functions and clinical applications of COD have urged the screening, isolation and characterization of newer microbes from diverse habitats as a source of COD and optimization and over-production of COD for various uses. The practicability of statistical/ artificial intelligence techniques, such as response surface methodology (RSM), artificial neural network (ANN) and genetic algorithm (GA) have been tested to optimize the medium composition for the production of COD from novel strain Streptomyces sp. NCIM 5500. All experiments were performed according to the five factor central composite design (CCD) and the generated data was analysed using RSM and ANN. GA was employed to optimize the models generated by RSM and ANN. Based upon the predicted COD concentration, the model developed with ANN was found to be superior to the model developed with RSM. The RSM-GA approach predicted maximum of 6.283 U/mL COD production, whereas the ANN-GA approach predicted a maximum of 9.93 U/mL COD concentration. The optimum concentrations of the medium variables predicted through ANN-GA approach were: 1.431 g/50 mL soybean, 1.389 g/50 mL maltose, 0.029 g/50 mL MgSO4, 0.45 g/50 mL NaCl and 2.235 ml/50 mL glycerol. The experimental COD concentration was concurrent with the GA predicted yield and led to 9.75 U/mL COD production, which was nearly two times higher than the yield (4.2 U/mL) obtained with the un-optimized medium. This is the very first time we are reporting the statistical versus artificial intelligence based modeling and optimization of COD production by Streptomyces sp. NCIM 5500.
Directory of Open Access Journals (Sweden)
Baljinder Kaur
2015-01-01
In the present study, the biobleaching potential of white rot fungus Cordyceps militaris MTCC3936 was investigated. For preliminary screening, decolorization properties of C. militaris were comparatively studied using whole cells in agar-based and liquid culture systems. Preliminary investigation in liquid culture systems revealed 100% decolorization achieved within 3 days of incubation for reactive yellow 18, 6 days for reactive red 31, 7 days for reactive black 8, and 11 days for reactive green 19 and reactive red 74. RSM was further used to study the effect of three independent variables such as pH, incubation time, and concentration of dye on decolorization properties of cell free supernatant of C. militaris. RSM based statistical analysis revealed that dye decolorization by cell free supernatants of C. militaris is more efficient than whole cell based system. The optimized conditions for decolorization of synthetic dyes were identified as dye concentration of 300 ppm, incubation time of 48 h, and optimal pH value as 5.5, except for reactive red 31 (for which the model was nonsignificant). The maximum dye decolorizations achieved under optimized conditions for reactive yellow 18, reactive green 19, reactive red 74, and reactive black 8 were 73.07, 65.36, 55.37, and 68.59%, respectively.
Kaur, Baljinder; Kumar, Balvir; Kaur, Navneet
2015-01-01
In the present study, the biobleaching potential of white rot fungus Cordyceps militaris MTCC3936 was investigated. For preliminary screening, decolorization properties of C. militaris were comparatively studied using whole cells in agar-based and liquid culture systems. Preliminary investigation in liquid culture systems revealed 100% decolorization achieved within 3 days of incubation for reactive yellow 18, 6 days for reactive red 31, 7 days for reactive black 8, and 11 days for reactive green 19 and reactive red 74. RSM was further used to study the effect of three independent variables such as pH, incubation time, and concentration of dye on decolorization properties of cell free supernatant of C. militaris. RSM based statistical analysis revealed that dye decolorization by cell free supernatants of C. militaris is more efficient than whole cell based system. The optimized conditions for decolorization of synthetic dyes were identified as dye concentration of 300 ppm, incubation time of 48 h, and optimal pH value as 5.5, except for reactive red 31 (for which the model was nonsignificant). The maximum dye decolorizations achieved under optimized conditions for reactive yellow 18, reactive green 19, reactive red 74, and reactive black 8 were 73.07, 65.36, 55.37, and 68.59%, respectively. PMID:25722980
Hoell, Simon; Omenzetter, Piotr
2017-04-01
The increasing demand for carbon neutral energy in a challenging economic environment is a driving factor for erecting ever larger wind turbines in harsh environments using novel wind turbine blade (WTB) designs characterized by high flexibilities and lower buckling capacities. To counteract the resulting increase in operation and maintenance costs, efficient structural health monitoring systems can be employed to prevent dramatic failures and to schedule maintenance actions according to the true structural state. This paper presents a novel methodology for classifying structural damages using vibrational responses from a single sensor. The method is based on statistical classification using Bayes' theorem and an advanced statistic, which allows controlling the performance by varying the number of samples which represent the current state. This is done for multivariate damage sensitive features (DSFs) defined as partial autocorrelation coefficients (PACCs) estimated from vibrational responses and principal component analysis scores from PACCs. Additionally, optimal DSFs are composed not only for damage classification but also for damage detection based on binary statistical hypothesis testing, where feature selections are found with a fast forward procedure. The method is applied to laboratory experiments with a small scale WTB with wind-like excitation and non-destructive damage scenarios. The obtained results demonstrate the advantages of the proposed procedure and are promising for future applications of vibration-based structural health monitoring in WTBs.
Institute of Scientific and Technical Information of China (English)
Li Wenbing; Yu Longjiang; Zhou Pengpeng
2006-01-01
The culture of Magnetospirillum magneticum WM-1 depends on several control factors that have great effect on the magnetic cells concentration. Investigation into the optimal culture conditions needs a large number of experiments. So it is desirable to minimize the number of experiments and maximize the information gained from them. The orthogonal design of experiments and mathematical statistical method are considered as effective methods to optimize the culture condition of magnetotactic bacteria WM-1 for high magnetic cells concentration. The effects of the four factors, namely pH value of medium, oxygen concentration of gas phase in the serum bottle, C:C (m_tartaric acid : m_succinic acid) ratio and NaNO3 concentration, are simultaneously investigated by only sixteen experiments through the orthogonal design L16(4^4) method. The optimal culture condition is obtained. At the optimal culture condition (pH 7.0, an oxygen concentration of 4.0%, C:C (m_tartaric acid : m_succinic acid) ratio of 1:2 and NaNO3 at 100 mg L⁻¹), the magnetic cells concentration is promoted to 6.5×10^7 cells mL⁻¹, approximately 8.3% higher than that under the initial conditions. The pH value of medium is a very important factor for magnetic cells concentration. It can be proved that the orthogonal design of experiment is of 90% confidence. Ferric iron uptake follows Michaelis-Menten kinetics with a Km of 2.5 μM and a Vmax of 0.83 min⁻¹.
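The Michaelis-Menten statement at the end of the abstract can be turned into a small worked example. The sketch below evaluates the standard rate law v = Vmax · S / (Km + S) with the two constants quoted in the abstract; the substrate concentrations in the loop and the function name `uptake_rate` are assumptions for illustration, not values from the study.

```python
KM_UM = 2.5           # Km in micromolar, as quoted in the abstract
VMAX_PER_MIN = 0.83   # Vmax in min^-1, as quoted in the abstract


def uptake_rate(substrate_um, km=KM_UM, vmax=VMAX_PER_MIN):
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * substrate_um / (km + substrate_um)


# Illustrative ferric iron concentrations in micromolar (assumed values).
for s in (0.5, 2.5, 10.0, 100.0):
    print(f"S = {s:6.1f} uM -> v = {uptake_rate(s):.3f} min^-1")
```

At S = Km the rate is half of Vmax by construction, and the rate saturates towards Vmax as the substrate concentration grows well beyond Km.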
Haque, Shafiul; Khan, Saif; Wahid, Mohd; Dar, Sajad A; Soni, Nipunjot; Mandal, Raju K; Singh, Vineeta; Tiwari, Dileep; Lohani, Mohtashim; Areeshi, Mohammed Y; Govender, Thavendran; Kruger, Hendrik G; Jawed, Arshad
2016-01-01
For a commercially viable recombinant intracellular protein production process, efficient cell lysis and protein release is a major bottleneck. The recovery of the recombinant protein cholesterol oxidase (COD) was studied in a continuous bead milling process. A full factorial response surface methodology (RSM) design was employed and compared to artificial neural networks coupled with a genetic algorithm (ANN-GA). Significant process variables, cell slurry feed rate (A), bead load (B), cell load (C), and run time (D), were investigated and optimized for maximizing COD recovery. RSM predicted an optimum at a feed rate of 310.73 mL/h, bead loading of 79.9% (v/v), cell loading OD600nm of 74, and run time of 29.9 min, with a recovery of ~3.2 g/L. ANN-GA predicted a maximum COD recovery of ~3.5 g/L at an optimum feed rate of 258.08 mL/h, bead loading of 80% (v/v), cell loading OD600nm of 73.99, and run time of 32 min. An overall 3.7-fold increase in productivity is obtained when compared to a batch process. Optimization and comparison of statistical vs. artificial intelligence techniques in a continuous bead milling process has been attempted for the very first time in our study. We were able to successfully represent the complex non-linear multivariable dependence of enzyme recovery on bead milling parameters. The quadratic second order response functions are not flexible enough to represent such complex non-linear dependence. ANNs, being summation functions over multiple layers, are capable of representing complex non-linear dependence of variables, in this case enzyme recovery as a function of bead milling parameters. Since GA can even optimize discontinuous functions, the present study serves as an example of using machine learning (ANN) in combination with evolutionary optimization (GA) for representing undefined biological functions, which is the case for common industrial processes involving biological moieties.
STATISTICAL INFERENCE FOR CONTAMINATION DISTRIBUTION*
Institute of Scientific and Technical Information of China (English)
WU Shanshan; MIAO Baiqi
2001-01-01
For the contamination distribution model F(x) = (1 − α)F1(x) + αF2(x), the estimators of α and F1(x) are studied when F2(x) is known, and the strong consistency of the two estimators is proved. Then the rate of consistency of the estimator α̂ and a testing problem are discussed.
Statistical inference for Cox processes
DEFF Research Database (Denmark)
Møller, Jesper; Waagepetersen, Rasmus Plenge
2002-01-01
and space-time, spatial and spatio-temporal process modelling, nonparametric methods for clustering, and spatio-temporal cluster modelling. Many figures, some in full color, complement the text, and a single section of references cited makes it easy to locate source material. Leading specialists...
Statistical Arbitrage and Optimal Trading with Transaction Costs in Futures Markets
Tsagaris, Theodoros
2008-01-01
We consider the Brownian market model and the problem of expected utility maximization of terminal wealth. We, specifically, examine the problem of maximizing the utility of terminal wealth under the presence of transaction costs of a fund/agent investing in futures markets. We offer some preliminary remarks about statistical arbitrage strategies and we set the framework for futures markets, and introduce concepts such as margin, gearing and slippage. The setting is of discrete time, and the price evolution of the futures prices is modelled as discrete random sequence involving Ito's sums. We assume the drift and the Brownian motion driving the return process are non-observable and the transaction costs are represented by the bid-ask spread. We provide explicit solution to the optimal portfolio process, and we offer an example using logarithmic utility.
Secure Wireless Communication and Optimal Power Control under Statistical Queueing Constraints
Qiao, Deli; Velipasalar, Senem
2010-01-01
In this paper, secure transmission of information over fading broadcast channels is studied in the presence of statistical queueing constraints. Effective capacity is employed as a performance metric to identify the secure throughput of the system, i.e., effective secure throughput. It is assumed that perfect channel side information (CSI) is available at both the transmitter and the receivers. Initially, the scenario in which the transmitter sends common messages to two receivers and confidential messages to one receiver is considered. For this case, effective secure throughput region, which is the region of constant arrival rates of common and confidential messages that can be supported by the buffer-constrained transmitter and fading broadcast channel, is defined. It is proven that this effective throughput region is convex. Then, the optimal power control policies that achieve the boundary points of the effective secure throughput region are investigated and an algorithm for the numerical computation of t...
Andronov, I. L.; Chinarova, L. L.; Kudashkina, L. S.; Marsakova, V. I.; Tkachenko, M. G.
2016-06-01
We have elaborated a set of new algorithms and programs for advanced time series analysis of (generally) multi-component multi-channel observations with irregularly spaced times of observations, which is a common case for large photometric surveys. Our previous methods for periodogram, scalegram, wavelet, and autocorrelation analysis, as well as for "running" or "sub-interval" local approximations, were self-reviewed in (2003ASPC..292..391A). For an approximation of the phase light curves of nearly-periodic pulsating stars, we use a Trigonometric Polynomial (TP) fit of the statistically optimal degree and initial period improvement using differential corrections (1994OAP.....7...49A). For the determination of parameters of "characteristic points" (minima, maxima, crossings of some constant value, etc.) we use a set of methods self-reviewed in 2005ASPC..335...37A. Results of the analysis of the catalogs compiled using these programs are presented in 2014AASP....4....3A. For more complicated signals, we use "phenomenological approximations" with "special shapes" based on functions defined on sub-intervals rather than on the complete interval. E.g., for the Algol-type stars we developed the NAV ("New Algol Variable") algorithm (2012Ap.....55..536A, 2012arXiv1212.6707A, 2015JASS...32..127A), which was compared to common methods of Trigonometric Polynomial (TP) fit or local Algebraic Polynomial (A) fit of a fixed or (alternately) statistically optimal degree. The method allows one to determine the minimal set of parameters required for the "General Catalogue of Variable Stars", as well as an extended set of phenomenological and astrophysical parameters which may be used for the classification. In total, more than 1900 variable stars were studied in our group using these methods in the framework of the "Inter-Longitude Astronomy" campaign (2010OAP....23....8A) and the "Ukrainian Virtual Observatory" project (2012KPCB...28...85V).
Navya, P N; Pushpa, S Murthy
2013-08-01
Coffee cherry husk (CH) is one of the major by-products of the coffee processing industry and contains 43 ± 5.9% cellulose. Screening of fungal organisms for cellulase production was carried out and the potential organism was identified as Rhizopus stolonifer by internal transcribed spacer (ITS)-5.8S rDNA analysis. A systematic study with response surface methodology (RSM) based on CCRD was used to study the interactions among variables such as pH (3-7), moisture (40-80%) and progression duration (72-168 h) of the fermentation process to maximize enzyme production. Under the optimized cultivation conditions, R. stolonifer synthesized 22,109 U/gds. Model validation at optimum operating conditions showed excellent agreement between the experimental results and the predicted responses at a confidence level of 95%. The endoglucanase thus produced was utilized for ethanol production by simultaneous saccharification and fermentation, and a maximum of 65.5 g/L of ethanol was obtained. This fungal cellulase has also been reported to be an efficient detergent additive and promising for commercial use. The present study demonstrates coffee husk as a significant bioprocess substrate. Statistical optimization of the major parameters for cellulase production can be highly applicable at industrial scale. Furthermore, value addition to coffee husk with sustainable waste management, leading to environmental conservation, can be achieved.
Kumar, Animesh; Garg, Tarun; Sarma, Ganti S; Rath, Goutam; Goyal, Amit Kumar
2015-04-01
Migraine is a chronic disorder characterized by significant headache and various associated symptoms which worsen with exertion. Zolmitriptan is approved for use in the acute treatment of migraine and related vascular headaches, but triptans are limited by high pain recurrence due to rapid drug elimination. A combinational formulation of a triptan and a nonsteroidal anti-inflammatory drug may provide quicker and longer-lasting relief from the subsequent pain during the attack. In this study, we formulate a zolmitriptan (ZT) and ketorolac tromethamine (KT) loaded thermoreversible in-situ mucoadhesive intranasal gel (TMISG) formulation, which gels at the nasal mucosal temperature and contains a bioadhesive polymer (xyloglucan) that lengthens the residence time and thereby enhances the bioavailability of the combined drugs. This study uses a Box-Behnken design for the first time to develop and optimize the TMISG and to assess factors affecting the critical quality attributes. Histopathological study of the nasal mucosa suggested that the formulation was safe for nasal administration. The statistical difference in absolute bioavailability between the oral and intranasal routes suggested that the intranasal route gave an almost 21% increase in bioavailability for ZT and a 16% increase for KT over oral formulations. The optimized formulation would help mitigate migraine-associated symptoms much better than the currently available formulations.
Gao, Ya; Cheng, Wenchi; Zhang, Hailin
2017-08-23
Energy harvesting, which offers a never-ending energy supply, has emerged as a prominent technology to prolong the lifetime and reduce costs for the battery-powered wireless sensor networks. However, how to improve the energy efficiency while guaranteeing the quality of service (QoS) for energy harvesting based wireless sensor networks is still an open problem. In this paper, we develop statistical delay-bounded QoS-driven power control policies to maximize the effective energy efficiency (EEE), which is defined as the spectrum efficiency under given specified QoS constraints per unit harvested energy, for energy harvesting based wireless sensor networks. For the battery-infinite wireless sensor networks, our developed QoS-driven power control policy converges to the Energy harvesting Water Filling (E-WF) scheme and the Energy harvesting Channel Inversion (E-CI) scheme under the very loose and stringent QoS constraints, respectively. For the battery-finite wireless sensor networks, our developed QoS-driven power control policy becomes the Truncated energy harvesting Water Filling (T-WF) scheme and the Truncated energy harvesting Channel Inversion (T-CI) scheme under the very loose and stringent QoS constraints, respectively. Furthermore, we evaluate the outage probabilities to theoretically analyze the performance of our developed QoS-driven power control policies. The obtained numerical results validate our analysis and show that our developed optimal power control policies can optimize the EEE over energy harvesting based wireless sensor networks.
Jain, S; Srinath, Ms; Narendra, C; Reddy, Sn; Sindhu, A
2010-10-01
The objective of this study was to evaluate the effect of formulation variables on the release properties, floating lag time, and hardness, when developing floating tablets of Ranitidine hydrochloride, by the statistical optimization technique. The formulations were prepared based on a 3² factorial design, with polymer ratio (HPMC 100 KM: Xanthan gum) and the amount of aerosil as two independent formulation variables. The four dependent (response) variables considered were: percentage of drug release at the first hour, T50% (time taken to release 50% of the drug), floating lag time, and hardness of the tablet. The release profile data was subjected to a curve fitting analysis, to describe the release mechanism of the drug from the floating tablet. An increase in drug release was observed with an increase in the polymer ratio, and as the amount of aerosil increased, the hardness of the tablet also increased, without causing any change in the floating lag time. The desirability function was used to optimize the response variables, each having a different target, and the observed responses were in accordance with the experimental values. The results demonstrate the feasibility of the model in the development of floating tablets containing Ranitidine hydrochloride.
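As an illustration of the 3² factorial layout mentioned in the abstract above, the following minimal sketch enumerates the nine runs in coded levels. The coded values -1, 0, +1 and the function name `full_factorial` are assumptions made for illustration; the abstract does not state the actual factor levels.

```python
from itertools import product

# Coded levels for a three-level factor: low, middle, high (assumed coding).
LEVELS = (-1, 0, 1)

def full_factorial(n_factors, levels=LEVELS):
    """Return every combination of coded levels for n_factors factors."""
    return list(product(levels, repeat=n_factors))

# Two factors, as in the abstract: polymer ratio and amount of aerosil.
runs = full_factorial(2)
for polymer_ratio, aerosil in runs:
    print(polymer_ratio, aerosil)
```

Each printed pair is one formulation to prepare; the four responses would then be measured for each run.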
Enhanced Bio-Ethanol Production from Industrial Potato Waste by Statistical Medium Optimization.
Izmirlioglu, Gulten; Demirci, Ali
2015-10-15
Industrial wastes are of great interest as a substrate in production of value-added products to reduce cost, while managing the waste economically and environmentally. Bio-ethanol production from industrial wastes has gained attention because of its abundance, availability, and rich carbon and nitrogen content. In this study, industrial potato waste was used as a carbon source and a medium was optimized for ethanol production by using statistical designs. The effect of various medium components on ethanol production was evaluated. Yeast extract, malt extract, and MgSO₄·7H₂O showed significantly positive effects, whereas KH₂PO₄ and CaCl₂·2H₂O had a significantly negative effect (p-value < 0.05). Using response surface methodology, a medium consisting of 40.4 g/L (dry basis) industrial waste potato, 50 g/L malt extract, and 4.84 g/L MgSO₄·7H₂O was found optimal and yielded 24.6 g/L ethanol at 30 °C, 150 rpm, and 48 h of fermentation. In conclusion, this study demonstrated that industrial potato waste can be used effectively to enhance bioethanol production.
Enhanced Bio-Ethanol Production from Industrial Potato Waste by Statistical Medium Optimization
Directory of Open Access Journals (Sweden)
Gulten Izmirlioglu
2015-10-01
Industrial wastes are of great interest as a substrate in production of value-added products to reduce cost, while managing the waste economically and environmentally. Bio-ethanol production from industrial wastes has gained attention because of its abundance, availability, and rich carbon and nitrogen content. In this study, industrial potato waste was used as a carbon source and a medium was optimized for ethanol production by using statistical designs. The effect of various medium components on ethanol production was evaluated. Yeast extract, malt extract, and MgSO4·7H2O showed significantly positive effects, whereas KH2PO4 and CaCl2·2H2O had a significantly negative effect (p-value < 0.05). Using response surface methodology, a medium consisting of 40.4 g/L (dry basis) industrial waste potato, 50 g/L malt extract, and 4.84 g/L MgSO4·7H2O was found optimal and yielded 24.6 g/L ethanol at 30 °C, 150 rpm, and 48 h of fermentation. In conclusion, this study demonstrated that industrial potato waste can be used effectively to enhance bioethanol production.
Agarwal, Vaibhav; Bansal, Mayank
2013-08-01
The present work focuses on the use of mimosa seed gum to develop a drug delivery system making combined use of floating and pulsatile principles, for the chrono-prevention of nocturnal acid breakthrough. The desired aim was achieved by fabricating a floating delivery system bearing a time-lagged coating of Mimosa pudica seed polymer for the programmed release of famotidine. Response surface methodology was the statistical tool employed for experiment design, mathematical model generation and optimization. A 3² full factorial design was used in designing the experiment. The % weight ratio of mimosa gum to hydroxypropyl methylcellulose in the coating combination and the coating weight were the independent variables, whereas the lag time and the cumulative % drug release in 360 minutes were the observed responses. Results revealed that both the coating composition and the coating weight significantly affected the release of drug from the dosage form. The optimized formulation, prepared according to the Design-Expert® software, gave predicted responses in close proximity to the experimental responses, thus confirming the robustness as well as accuracy of the predicted model for the utilization of a natural polymer like mimosa seed gum for the chronotherapeutic treatment of nocturnal acid breakthrough.
Directory of Open Access Journals (Sweden)
L. J. Ong
2016-09-01
In this study, the effect of extraction parameters (ethanol concentration, sonication time, and solvent-to-sample ratio) on Ficus deltoidea leaves was investigated using ultrasound-assisted extraction by response surface methodology (RSM). Total phenolic content (TPC) of F. deltoidea extracts was determined using the Folin-Ciocalteu method and expressed in gallic acid equivalents (GAE) per g. A Box-Behnken statistical design (BBD) was the tool used to find the optimal conditions for maximum TPC. In addition, the extraction yield was measured and stated as a percentage. The optimized TPC attained was 455.78 mg GAE/g at 64% ethanol concentration, 10 minutes sonication time, and 20 mL/g solvent-to-sample ratio, whereas the greatest extraction yield was 33% with an ethanol concentration of 70%, sonication time of 40 minutes, and solvent-to-sample ratio of 40 mL/g. The determination coefficient, R², for TPC indicates that 99.5% of the variability in the response could be explained by the ANOVA model, and the predicted R² of 0.9681 is in reasonable agreement with the adjusted R² of 0.9890. The present study shows that ethanol-water as solvent, a short time of 10 minutes, and an adequate solvent-to-sample ratio (20 mL/g) are the best conditions for extraction.
Design of Off-statistics Axial-flow Fans by Means of Vortex Law Optimization
Institute of Scientific and Technical Information of China (English)
Andrea Lazari; Andrea Cattanei
2014-01-01
Off-statistics input data sets are common in axial-flow fans design and may easily result in some violation of the requirements of a good aerodynamic blade design. In order to circumvent this problem, in the present paper, a solution to the radial equilibrium equation is found which minimizes the outlet kinetic energy and fulfills the aerodynamic constraints, thus ensuring that the resulting blade has acceptable aerodynamic performance. The presented method is based on the optimization of a three-parameters vortex law and of the meridional channel size. The aerodynamic quantities to be employed as constraints are individuated and their suitable ranges of variation are proposed. The method is validated by means of a design with critical input data values and CFD analysis. Then, by means of systematic computations with different input data sets, some correlations and charts are obtained which are analogous to classic correlations based on statistical investigations on existing machines. Such new correlations help size a fan of given characteristics as well as study the feasibility of a given design.
Design of off-statistics axial-flow fans by means of vortex law optimization
Lazari, Andrea; Cattanei, Andrea
2014-12-01
Off-statistics input data sets are common in axial-flow fans design and may easily result in some violation of the requirements of a good aerodynamic blade design. In order to circumvent this problem, in the present paper, a solution to the radial equilibrium equation is found which minimizes the outlet kinetic energy and fulfills the aerodynamic constraints, thus ensuring that the resulting blade has acceptable aerodynamic performance. The presented method is based on the optimization of a three-parameters vortex law and of the meridional channel size. The aerodynamic quantities to be employed as constraints are individuated and their suitable ranges of variation are proposed. The method is validated by means of a design with critical input data values and CFD analysis. Then, by means of systematic computations with different input data sets, some correlations and charts are obtained which are analogous to classic correlations based on statistical investigations on existing machines. Such new correlations help size a fan of given characteristics as well as study the feasibility of a given design.
Optimization of hydrogels for transdermal delivery of lisinopril by Box-Behnken statistical design.
Gannu, Ramesh; Yamsani, Vamshi Vishnu; Yamsani, Shravan Kumar; Palem, Chinna Reddy; Yamsani, Madhusudan Rao
2009-01-01
The aim of this study was to investigate the combined influence of three independent variables on the permeation kinetics of lisinopril from hydrogels for transdermal delivery. A three-factor, three-level Box-Behnken design was used to optimize the independent variables, Carbopol 971 P ($X_1$), menthol ($X_2$), and propylene glycol ($X_3$). Fifteen batches were prepared and evaluated for responses as dependent variables. The dependent variables selected were cumulative amount permeated across rat abdominal skin in 24 h ($Q_{24}$; $Y_1$), flux ($Y_2$), and lag time ($Y_3$). Aloe juice was investigated for the first time as a vehicle for hydrogel preparation. The ex vivo permeation study was conducted using Franz diffusion cells. Mathematical equations and response surface plots were used to relate the dependent and independent variables. The regression equation generated for the cumulative permeation of LSP in 24 h ($Q_{24}$) was $Y_1 = 1443.3 - 602.59X_1 + 93.24X_2 + 91.75X_3 - 18.95X_1X_2 - 140.93X_1X_3 - 4.43X_2X_3 - 152.63X_1^2 - 150.03X_2^2 - 213.9X_3^2$. The statistical validity of the polynomials was established, and optimized formulation factors were selected by feasibility and grid search. Validation of the optimization study with 15 confirmatory runs indicated a high degree of prognostic ability of response surface methodology. The use of the Box-Behnken design approach helped in identifying the critical formulation parameters in the transdermal delivery of lisinopril from hydrogels.
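The quadratic model and grid search described in the abstract above can be sketched as follows. The coefficients are copied from the quoted regression equation for Y1; the coded range [-1, 1] for each factor, the use of three center points (which would give the fifteen batches mentioned), and the names `q24`, `box_behnken`, and `grid_search` are assumptions made for illustration, not details confirmed by the abstract.

```python
from itertools import combinations, product

def q24(x1, x2, x3):
    """Predicted cumulative permeation in 24 h (Y1) at coded factor levels.

    Coefficients are taken from the regression equation quoted in the abstract.
    """
    return (1443.3 - 602.59 * x1 + 93.24 * x2 + 91.75 * x3
            - 18.95 * x1 * x2 - 140.93 * x1 * x3 - 4.43 * x2 * x3
            - 152.63 * x1 ** 2 - 150.03 * x2 ** 2 - 213.9 * x3 ** 2)

def box_behnken(n_factors=3, n_center=3):
    """Edge-midpoint runs (each pair of factors at +/-1, the rest at 0) plus center runs."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * n_factors
            point[i], point[j] = a, b
            runs.append(tuple(point))
    runs.extend([(0,) * n_factors] * n_center)
    return runs

def grid_search(step=0.1):
    """Scan the assumed coded cube [-1, 1]^3 and return (best Y1, best point)."""
    n = int(round(2 / step))
    axis = [-1 + k * step for k in range(n + 1)]
    return max((q24(*p), p) for p in product(axis, repeat=3))

design = box_behnken()
print(len(design))  # 15 runs: 12 edge midpoints plus 3 center points

best_y, best_x = grid_search()
print(best_x, round(best_y, 1))
```

A plain grid scan is used here because the abstract names grid search; any feasibility constraints on the other responses (flux, lag time) would need to be added as filters on the candidate points.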
Gannu, Ramesh; Yamsani, Vamshi Vishnu; Palem, Chinna Reddy; Yamsani, Shravan Kumar; Yamsani, Madhusudan Rao
2010-01-01
The objective of the investigation was to optimize the iontophoresis process parameters of lisinopril (LSP) by 3 x 3 factorial design, Box-Behnken statistical design. LSP is an ideal candidate for iontophoretic delivery to avoid the incomplete absorption problem associated with its oral administration. Independent variables selected were current ($X_1$), salt (sodium chloride) concentration ($X_2$) and medium/pH ($X_3$). The dependent variables studied were amount of LSP permeated in 4 h ($Y_1$: $Q_4$), 24 h ($Y_2$: $Q_{24}$) and lag time ($Y_3$). Mathematical equations and response surface plots were used to relate the dependent and independent variables. The regression equations generated for the iontophoretic permeation were $Y_1 = 1.98 + 1.23X_1 - 0.49X_2 + 0.025X_3 - 0.49X_1X_2 + 0.040X_1X_3 - 0.010X_2X_3 + 0.58X_1^2 - 0.17X_2^2 - 0.18X_3^2$; $Y_2 = 7.28 + 3.32X_1 - 1.52X_2 + 0.22X_3 - 1.30X_1X_2 + 0.49X_1X_3 - 0.090X_2X_3 + 0.79X_1^2 - 0.62X_2^2 - 0.33X_3^2$; and $Y_3 = 0.60 + 0.0038X_1 + 0.12X_2 - 0.011X_3 + 0.005X_1X_2 - 0.018X_1X_3 - 0.015X_2X_3 - 0.00075X_1^2 + 0.017X_2^2 - 0.11X_3^2$. The statistical validity of the polynomials was established and optimized process parameters were selected by feasibility and grid search. Validation of the optimization study with 8 confirmatory runs indicated a high degree of prognostic ability of response surface methodology. The use of the Box-Behnken design approach helped in identifying the critical process parameters in the iontophoretic delivery of lisinopril.
Directory of Open Access Journals (Sweden)
Shafiul Haque
2016-11-01
For a commercially viable recombinant intracellular protein production process, efficient cell lysis and protein release is a major bottleneck. The recovery of the recombinant protein cholesterol oxidase (COD) was studied in a continuous bead milling process. A full factorial Response Surface Model (RSM) design was employed and compared to Artificial Neural Networks coupled with Genetic Algorithm (ANN-GA). Significant process variables, cell slurry feed rate (A), bead load (B), cell load (C) and run time (D), were investigated and optimized for maximizing COD recovery. RSM predicted an optimum at a feed rate of 310.73 mL/h, bead loading of 79.9% (v/v), cell loading OD600 nm of 74, and run time of 29.9 min, with a recovery of ~3.2 g/L. ANN coupled with GA predicted a maximum COD recovery of ~3.5 g/L at an optimum feed rate of 258.08 mL/h, bead loading of 80% (v/v), cell loading OD600 nm of 73.99, and run time of 32 min. An overall 3.7-fold increase in productivity was obtained compared to a batch process. Optimization and comparison of statistical vs. artificial intelligence techniques in a continuous bead milling process has been attempted for the first time in our study. We were able to successfully represent the complex non-linear multivariable dependence of enzyme recovery on bead milling parameters. Quadratic second-order response functions are not flexible enough to represent such complex non-linear dependence. An ANN, being a summation function of multiple layers, is capable of representing the complex non-linear dependence of variables, in this case enzyme recovery as a function of bead milling parameters. Since a GA can optimize even discontinuous functions, the present study is an example of using machine learning (ANN) in combination with evolutionary optimization (GA) to represent undefined biological functions, which is the case for common industrial processes involving biological moieties.
Directory of Open Access Journals (Sweden)
Jamal I. Daoud
2010-01-01
Problem statement: Palm oil mill effluent discharged by the oil palm industries is considered a highly polluting mixed effluent; it is abundant (about 20 million tonnes year⁻¹) and contributes to serious environmental problems through the pollution of water bodies. Approach: The aim of this study was to identify the potential of a low-cost substrate such as Palm Oil Mill Effluent (POME) for the production of cellulase enzyme by liquid state bioconversion. The filamentous fungus Trichoderma harzianum was used for liquid state bioconversion of POME for cellulase production. Statistical optimization was carried out to evaluate the physico-chemical parameters (factors) for maximum cellulase production by a 2-level fractional factorial design with six central points. The polynomial regression model was developed using the experimental data, including the linear, quadratic and interaction effects of the factors. The factors involved were substrate (POME) and co-substrate (wheat flour) concentrations, temperature, pH, inoculum and agitation. Results: Statistical analysis showed that the optimum conditions were: temperature of 30°C, substrate concentration of 2%, wheat flour concentration of 3%, pH of 4, inoculum of 3% and agitation of 200 rpm. Under these conditions, the model predicted the enzyme production to be about 14 FPU mL⁻¹. Analysis of Variance (ANOVA) of the design showed a high coefficient of determination (R²) value of 0.999, thus ensuring a highly satisfactory adjustment of the quadratic model to the experimental data. Conclusion/Recommendations: This study indicates a better solution for waste management through the utilization of POME for cellulase production that could be used in industrial applications such as bioethanol production.
GOSIM: A multi-scale iterative multiple-point statistics algorithm with global optimization
Yang, Liang; Hou, Weisheng; Cui, Chanjie; Cui, Jie
2016-04-01
Most current multiple-point statistics (MPS) algorithms are based on a sequential simulation procedure, during which grid values are updated according to the local data events. Because the realization is updated only once during the sequential process, errors that occur while updating data events cannot be corrected. Error accumulation during simulations decreases the realization quality. Aimed at improving simulation quality, this study presents an MPS algorithm based on global optimization, called GOSIM. An objective function is defined for representing the dissimilarity between a realization and the TI in GOSIM, which is minimized by a multi-scale EM-like iterative method that contains an E-step and M-step in each iteration. The E-step searches for TI patterns that are most similar to the realization and match the conditioning data. A modified PatchMatch algorithm is used to accelerate the search process in E-step. M-step updates the realization based on the most similar patterns found in E-step and matches the global statistics of TI. During categorical data simulation, k-means clustering is used for transforming the obtained continuous realization into a categorical realization. The qualitative and quantitative comparison results of GOSIM, MS-CCSIM and SNESIM suggest that GOSIM has a better pattern reproduction ability for both unconditional and conditional simulations. A sensitivity analysis illustrates that pattern size significantly impacts the time costs and simulation quality. In conditional simulations, the weights of conditioning data should be as small as possible to maintain a good simulation quality. The study shows that big iteration numbers at coarser scales increase simulation quality and small iteration numbers at finer scales significantly save simulation time.
Bonavito, N. L.; Gordon, C. L.; Inguva, R.; Serafino, G. N.; Barnes, R. A.
1994-01-01
NASA's Mission to Planet Earth (MTPE) will address important interdisciplinary and environmental issues such as global warming, ozone depletion, deforestation, acid rain, and the like with its long term satellite observations of the Earth and with its comprehensive Data and Information System. Extensive sets of satellite observations supporting MTPE will be provided by the Earth Observing System (EOS), while more specific process related observations will be provided by smaller Earth Probes. MTPE will use data from ground and airborne scientific investigations to supplement and validate the global observations obtained from satellite imagery, while the EOS satellites will support interdisciplinary research and model development. This is important for understanding the processes that control the global environment and for improving the prediction of events. In this paper we illustrate the potential for powerful artificial intelligence (AI) techniques when used in the analysis of the formidable problems that exist in the NASA Earth Science programs and of those to be encountered in the future MTPE and EOS programs. These techniques, based on the logical and probabilistic reasoning aspects of plausible inference, strongly emphasize the synergetic relation between data and information. As such, they are ideally suited for the analysis of the massive data streams to be provided by both MTPE and EOS. To demonstrate this, we address both the satellite imagery and model enhancement issues for the problem of ozone profile retrieval through a method based on plausible scientific inferencing. Since in the retrieval problem, the atmospheric ozone profile that is consistent with a given set of measured radiances may not be unique, an optimum statistical method is used to estimate a 'best' profile solution from the radiances and from additional a priori information.
Tien Bui, Dieu; Pradhan, Biswajeet; Nampak, Haleh; Bui, Quang-Thanh; Tran, Quynh-An; Nguyen, Quoc-Phi
2016-09-01
This paper proposes a new artificial intelligence approach based on neural fuzzy inference system and metaheuristic optimization for flood susceptibility modeling, namely MONF. In the new approach, the neural fuzzy inference system was used to create an initial flood susceptibility model and then the model was optimized using two metaheuristic algorithms, Evolutionary Genetic and Particle Swarm Optimization. A high-frequency tropical cyclone area of the Tuong Duong district in Central Vietnam was used as a case study. First, a GIS database for the study area was constructed. The database that includes 76 historical flood inundated areas and ten flood influencing factors was used to develop and validate the proposed model. Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Receiver Operating Characteristic (ROC) curve, and area under the ROC curve (AUC) were used to assess the model performance and its prediction capability. Experimental results showed that the proposed model has high performance on both the training (RMSE = 0.306, MAE = 0.094, AUC = 0.962) and validation dataset (RMSE = 0.362, MAE = 0.130, AUC = 0.911). The usability of the proposed model was evaluated by comparing with those obtained from state-of-the art benchmark soft computing techniques such as J48 Decision Tree, Random Forest, Multi-layer Perceptron Neural Network, Support Vector Machine, and Adaptive Neuro Fuzzy Inference System. The results show that the proposed MONF model outperforms the above benchmark models; we conclude that the MONF model is a new alternative tool that should be used in flood susceptibility mapping. The result in this study is useful for planners and decision makers for sustainable management of flood-prone areas.
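The three scalar metrics named in the abstract above (RMSE, MAE, AUC) can be computed as in the following minimal sketch. The labels and scores are hypothetical toy values, not data from the study, and AUC is computed with the pairwise-ranking definition (ties count one half) rather than by integrating an ROC curve.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical flood (1) / non-flood (0) labels and model susceptibility scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]

print(rmse(labels, scores), mae(labels, scores), auc(labels, scores))
```

Lower RMSE and MAE and higher AUC indicate a better fit, which is how the training and validation figures in the abstract are meant to be read.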
Directory of Open Access Journals (Sweden)
Gage Brian F
2010-11-01
Background: There is currently much interest in pharmacogenetics: determining variation in genes that regulate drug effects, with a particular emphasis on improving drug safety and efficacy. The ability to determine such variation motivates the application of personalized drug therapies that utilize a patient's genetic makeup to determine a safe and effective drug at the correct dose. To ascertain whether a genotype-guided drug therapy improves patient care, a personalized medicine intervention may be evaluated within the framework of a randomized controlled trial. The statistical design of this type of personalized medicine intervention requires special considerations: the distribution of relevant allelic variants in the study population; and whether the pharmacogenetic intervention is equally effective across subpopulations defined by allelic variants. Methods: The statistical design of the Clarification of Optimal Anticoagulation through Genetics (COAG) trial serves as an illustrative example of a personalized medicine intervention that uses each subject's genotype information. The COAG trial is a multicenter, double blind, randomized clinical trial that will compare two approaches to initiation of warfarin therapy: genotype-guided dosing, the initiation of warfarin therapy based on algorithms using clinical information and genotypes for polymorphisms in CYP2C9 and VKORC1; and clinical-guided dosing, the initiation of warfarin therapy based on algorithms using only clinical information. Results: We determine an absolute minimum detectable difference of 5.49% based on an assumed 60% population prevalence of zero or multiple genetic variants in either CYP2C9 or VKORC1 and an assumed 15% relative effectiveness of genotype-guided warfarin initiation for those with zero or multiple genetic variants. Thus we calculate a sample size of 1238 to achieve a power level of 80% for the primary outcome. We show that reasonable departures from these
Directory of Open Access Journals (Sweden)
Manuel Vargas-Yáñez
2005-12-01
Optimal Statistical Interpolation is widely applied in the analysis of oceanographic data. This technique requires knowing some statistics of the analysed fields such as the covariance function and the noise-to-signal ratio. The different parameters needed should be obtained from historical data sets, but, in contrast with the case of meteorology, these data sets are not frequently available for oceanographic purposes. Here we show that using routine hydrographic samplings can provide a good estimate of the statistics needed to perform an Optimal Statistical Interpolation. These data sets allow the covariance function to be estimated between two points with different horizontal and vertical coordinates, taking into account the possible lack of homogeneity and isotropy, which is rarely considered. This also allows us to accomplish a three-dimensional analysis of hydrographic data, which yields smaller analysis errors than a traditional two-dimensional analysis would.
Ren, Jie; Lin, Wei-Tie; Shen, Yan-Jing; Wang, Ju-Fang; Luo, Xiao-Chun; Xie, Ming-Quan
2008-11-01
A sequential statistical experimental design (Plackett-Burman, factorial, response surface and steepest ascent experiments) was applied to optimize the culture medium of nitrite oxidizing bacteria for improving the nitrite oxidizing rate. The estimated optimum medium composition for the nitrite oxidizing rate was as follows: NaHCO3, 1.86 g L⁻¹; NaNO2, 2.04 g L⁻¹; Na2CO3, 0.2 g L⁻¹; NaCl, 0.2 g L⁻¹; KH2PO4, 0.1 g L⁻¹; MgSO4·7H2O, 0.1 g L⁻¹; and FeSO4·7H2O, 0.01 g L⁻¹. The nitrite oxidizing rate was increased by 48.0% and reached a maximum of 859.5 ± 8.4 mg NO2-N/(g MLSS·d), compared with 580.7 ± 25.8 mg NO2-N/(g MLSS·d). In the field trial, 50 L of nitrite oxidizing bacteria concentrate (1.99 g VSS/L) with 850 mg NO2-N/(g MLSS·d) were added to 0.6 ha of the aquaculture water. Nitrite level in all treated ponds remained very low compared to the steady increase observed in all of the control ponds during 7 days.
Wall, Alan T; Gee, Kent L; Neilsen, Tracianne B; McKinley, Richard L; James, Michael M
2016-04-01
The identification of acoustic sources is critical to targeted noise reduction efforts for jets on high-performance tactical aircraft. This paper describes the imaging of acoustic sources from a tactical jet using near-field acoustical holography techniques. The measurement consists of a series of scans over the hologram with a dense microphone array. Partial field decomposition methods are performed to generate coherent holograms. Numerical extrapolation of data beyond the measurement aperture mitigates artifacts near the aperture edges. A multisource equivalent wave model is used that includes the effects of the ground reflection on the measurement. Multisource statistically optimized near-field acoustical holography (M-SONAH) is used to reconstruct apparent source distributions between 20 and 1250 Hz at four engine powers. It is shown that M-SONAH produces accurate field reconstructions for both inward and outward propagation in the region spanned by the physical hologram measurement. Reconstructions across the set of engine powers and frequencies suggests that directivity depends mainly on estimated source location; sources farther downstream radiate at a higher angle relative to the inlet axis. At some frequencies and engine powers, reconstructed fields exhibit multiple radiation lobes originating from overlapped source regions, which is a phenomenon relatively recently reported for full-scale jets.
Directory of Open Access Journals (Sweden)
Sun Yong
2014-01-01
The hydrolysis of corn stover using hydrochloric acid was studied. The kinetic parameters of the mathematical models for predicting the yields of xylose, glucose, furfural and acetic acid were obtained, and the corresponding xylose generation activation energy of 100 kJ/mol was determined. The characterization of corn stover using different techniques during hydrolysis indicated an effective removal of xylan and a slight alteration of the structures of cellulose and lignin. A 2³ five-level Central Composite Design (CCD) was used to develop a statistical model for the optimization of process variables including acid concentration, pretreatment temperature and time. The optimum conditions determined by this model were found to be 108 °C for 80 minutes with an acid concentration of 5.8%. Under these conditions, the maximised results are the following: xylose 19.93 g/L, glucose 1.2 g/L, furfural 1.5 g/L, acetic acid 1.3 g/L. The validation of the model indicates a good agreement between the experimental results and the predicted values.
Energy Technology Data Exchange (ETDEWEB)
Pourmortazavi, Seied Mahdi, E-mail: pourmortazavi@yahoo.com [Faculty of Material and Manufacturing Technologies, Malek Ashtar University of Technology, P.O. Box 16765-3454, Tehran (Iran, Islamic Republic of); Babaee, Saeed; Ashtiani, Fatemeh Shamsi [Faculty of Chemistry & Chemical Engineering, Malek Ashtar University of Technology, Tehran (Iran, Islamic Republic of)
2015-09-15
Highlights: • Surface of magnesium particles was modified with Viton via solvent/non-solvent method. • FT-IR, SEM, EDX, Map analysis, and TG/DSC techniques were employed to characterize the coated particles. • Coating process factors were optimized by Taguchi robust design. • The importance of coating conditions on resistance of coated magnesium against oxidation was studied. Abstract: The surface of magnesium particles was modified by coating with Viton as an energetic polymer using the solvent/non-solvent technique. The Taguchi robust method was utilized as a statistical experiment design to evaluate the role of coating process parameters. The coated magnesium particles were characterized by various techniques, i.e., Fourier transform infrared (FT-IR) spectroscopy, scanning electron microscopy (SEM), energy-dispersive X-ray spectroscopy (EDX), thermogravimetry (TG), and differential scanning calorimetry (DSC). The results showed that coating the magnesium powder with Viton leads to a higher resistance of the metal against oxidation in the presence of air atmosphere. Meanwhile, tuning of the coating process parameters (i.e., percent of Viton, flow rate of non-solvent addition, and type of solvent) influences the resistance of the metal particles against thermal oxidation. Coating of magnesium particles yields Viton coated particles with higher thermal stability (632 °C) in comparison with the pure magnesium powder, which commences oxidation in the presence of air atmosphere at a lower temperature of 260 °C.
Institute of Scientific and Technical Information of China (English)
Bahram Behnajady; Javad Moghaddam
2015-01-01
The neutral zinc sulfate solution obtained from the hydrometallurgical processing of Angouran zinc concentrate contains cadmium, nickel and cobalt impurities that must be removed before electrowinning. Therefore, cadmium and nickel are usually cemented out by addition of zinc dust, and the remaining nickel and cobalt are cemented out in a second stage with zinc powder and arsenic trioxide. In this research, a new approach is described for determination of effective parameters and optimization of the zinc electrolyte hot purification process using statistical design of experiments. The Taguchi method based on orthogonal array design (OAD) has been used to arrange the experimental runs. The experimental conditions involved in the work are as follows: the temperature range of 70−90 °C for reaction temperature (T), 30−90 min for reaction time (t), 2−4 g/L for zinc powder mass concentration (M), one to five series for zinc dust particle size distributions (S1−S5), and 0.1−0.5 g/L for arsenic trioxide mass concentration (C). Optimum conditions for hot purification obtained in this work are T4 (85 °C), t4 = 75 min, M4 = 3.5 g/L, S4 (Series 4), and C2 = 0.2 g/L.
Scaling of plane-wave functions in statistically optimized near-field acoustic holography.
Hald, Jørgen
2014-11-01
Statistically Optimized Near-field Acoustic Holography (SONAH) is a Patch Holography method, meaning that it can be applied in cases where the measurement area covers only part of the source surface. The method performs projections directly in the spatial domain, avoiding the use of spatial discrete Fourier transforms and the associated errors. First, an inverse problem is solved using regularization. For each calculation point a multiplication must then be performed with two transfer vectors--one to get the sound pressure and the other to get the particle velocity. Considering SONAH based on sound pressure measurements, existing derivations consider only pressure reconstruction when setting up the inverse problem, so the evanescent wave amplification associated with the calculation of particle velocity is not taken into account in the regularized solution of the inverse problem. The present paper introduces a scaling of the applied plane wave functions that takes the amplification into account, and it is shown that the previously published virtual source-plane retraction has almost the same effect. The effectiveness of the different solutions is verified through a set of simulated measurements.
Bettonvil, B.W.M.; Del Castillo, E.; Kleijnen, Jack P.C.
2005-01-01
This paper derives a novel procedure for testing the Karush-Kuhn-Tucker (KKT) first-order optimality conditions in models with multiple random responses. Such models arise in simulation-based optimization with multivariate outputs. This paper focuses on expensive simulations, which have small sample
El-Naggar, Noura El-Ahmady; Abdelwahed, Nayera A M
2014-01-01
Central composite design was chosen to determine the combined effects of four process variables (AgNO3 concentration, incubation period, pH level and inoculum size) on the extracellular biosynthesis of silver nanoparticles (AgNPs) by Streptomyces viridochromogenes. Statistical analysis of the results showed that incubation period, initial pH level and inoculum size had significant effects (P < …) on the biosynthesis of silver nanoparticles at their individual level. The maximum biosynthesis of silver nanoparticles was achieved at a concentration of 0.5% (v/v) of 1 mM AgNO3, incubation period of 96 h, initial pH of 9 and inoculum size of 2% (v/v). After optimization, the biosynthesis of silver nanoparticles was improved by approximately 5-fold as compared to that of the unoptimized conditions. The synthetic process of silver nanoparticle generation using the reduction of aqueous Ag+ ion by the culture supernatants of S. viridochromogenes was quite fast, and silver nanoparticles were formed immediately by the addition of AgNO3 solution (1 mM) to the cell-free supernatant. Initial characterization of silver nanoparticles was performed by visual observation of color change from yellow to intense brown color. UV-visible spectrophotometry for measuring surface plasmon resonance showed a single absorption peak at 400 nm, which confirmed the presence of silver nanoparticles. Fourier Transform Infrared Spectroscopy analysis provided evidence for proteins as possible reducing and capping agents for stabilizing the nanoparticles. Transmission Electron Microscopy revealed the extracellular formation of spherical silver nanoparticles in the size range of 2.15-7.27 nm. Compared to the cell-free supernatant, the biosynthesized AgNPs revealed superior antimicrobial activity against Gram-negative, Gram-positive bacterial strains and Candida albicans.
On the statistical optimality of CO2 atmospheric inversions assimilating CO2 column retrievals
Directory of Open Access Journals (Sweden)
F. Chevallier
2015-04-01
The extending archive of the Greenhouse Gases Observing SATellite (GOSAT) measurements (now covering about six years) allows increasingly robust statistics to be computed that document the performance of the corresponding retrievals of the column-average dry air-mole fraction of CO2 (XCO2). Here, we compare a model simulation constrained by surface air-sample measurements with one of the GOSAT retrieval products (NASA's ACOS). The retrieval-minus-model differences result from various error sources, both in the retrievals and in the simulation: we discuss the plausibility of the origin of the major patterns. We find systematic retrieval errors over the dark surfaces of high-latitude lands and over African savannahs. More importantly, we also find a systematic over-fit of the GOSAT radiances by the retrievals over land for the high-gain detector mode, which is the usual observation mode. The over-fit is partially compensated by the retrieval bias-correction. These issues are likely common to other retrieval products and may explain some of the surprising and inconsistent CO2 atmospheric inversion results obtained with the existing GOSAT retrieval products. We suggest that reducing the observation weight in the retrieval schemes (for instance so that retrieval increments to the retrieval prior values are halved for the studied retrieval product) would significantly improve the retrieval quality and reduce the need for (or at least reduce the complexity of) ad-hoc retrieval bias correction. More generally, we demonstrate that atmospheric inversions cannot be rigorously optimal when assimilating XCO2 retrievals, even with averaging kernels.
Statistical Optimization of Process Variables for Antibiotic Activity of Xenorhabdus bovienii
Cao, Xue-Qiang; Zhu, Ming-Xuan; Zhang, Xing; Wang, Yong-Hong
2012-01-01
The production of secondary metabolites with antibiotic properties is a common characteristic of the entomopathogenic bacteria Xenorhabdus spp. These metabolites not only have diverse chemical structures but also have a wide range of bioactivities of medicinal and agricultural interest. Culture variables are critical to the production of secondary metabolites of microorganisms. Manipulating culture process variables can promote secondary metabolite biosynthesis and thus facilitate the discovery of novel natural products. This work was conducted to evaluate the effects of five process variables (initial pH, medium volume, rotary speed, temperature, and inoculation volume) on the antibiotic production of Xenorhabdus bovienii YL002 using response surface methodology. A 2⁵⁻¹ factorial central composite design was chosen to determine the combined effects of the five variables, and to design a minimum number of experiments. The experimental and predicted antibiotic activity of X. bovienii YL002 was in close agreement. Statistical analysis of the results showed that initial pH, medium volume, rotary speed and temperature had a significant effect (P < …) on the antibiotic production of X. bovienii YL002 at their individual level; medium volume and rotary speed showed a significant effect at a combined level and was most significant at an individual level. The maximum antibiotic activity (287.5 U/mL) was achieved at the initial pH of 8.24, medium volume of 54 mL in 250 mL flask, rotary speed of 208 rpm, temperature of 32.0 °C and inoculation volume of 13.8%. After optimization, the antibiotic activity was improved by 23.02% as compared with that of unoptimized conditions. PMID: 22701637
Angelastro, A.; Campanelli, S. L.; Casalino, G.
2017-09-01
This paper presents a study on process parameters and building strategy for the deposition of Colmonoy 227-F powder by CO2 laser with a focal spot diameter of 0.3 mm. Colmonoy 227-F is a nickel alloy especially designed for mold manufacturing. The substrate material is a 10 mm thick plate of AISI 304 steel. A commercial CO2 laser welding machine was equipped with a low-cost powder feeding system. In this work, following another one in which laser power, scanning speed and powder flow rate had been studied, the effects of two important process parameters, i.e. hatch spacing and step height, on the properties of the built parts were analysed. The explored ranges of hatch spacing and step height were respectively 150-300 μm and 100-200 μm, whose dimensions were comparable with that of the laser spot. The roughness, adhesion, microstructure, microhardness and density of the manufactured specimens were studied for multi-layer samples, which were made of 30 layers. The statistical significance of the studied process parameters was assessed by the analysis of the variance. The process parameters used allowed to obtain both first layer-to-substrate and layer-to-layer good adhesions. The microstructure was fine and almost defect-free. The microhardness of the deposited material was about 100 HV higher than that of the starting powder. The density as high as 98% of that of the same bulk alloy was more than satisfactory. Finally, simultaneous optimization of density and roughness was performed using the contour plots.
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
Lee, Kwang-Min; Rhee, Chang-Hoon; Kang, Choong-Kyung; Kim, Jung-Hoe
2006-10-01
The production of recombinant anti-HIV peptide, T-20, in Escherichia coli was optimized by successive multifactor statistical experimental designs: a 2⁴⁻¹ fractional factorial, a 2³ full factorial, and a 2² rotational central composite design, applied in that order. The effects of media compositions (glucose, NPK sources, MgSO4, and trace elements), induction level, induction timing (optical density at induction process), and induction duration (culture time after induction) on T-20 production were studied by using a statistical response surface method. A series of iterative experimental designs was employed to determine optimal fermentation conditions (media and process factors). Optimal ranges characterized by %T-20 (proportion of peptide to the total cell protein) were observed, narrowed down, and further investigated to determine the optimal combination of culture conditions, which was as follows: 9, 6, 10, and 1 mL of glucose, NPK sources, MgSO4, and trace elements, respectively, in a total of 100 mL of medium induced at an OD of 0.55-0.75 with 0.7 mM isopropyl-beta-D-thiogalactopyranoside for an induction duration of 4 h. Under these conditions, up to 14% of T-20 was obtained. This statistical optimization allowed the production of T-20 to be increased more than twofold (from 6 to 14%) within a shorter induction duration (from 6 to 4 h) at the shake-flask scale.
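As an illustration of the first screening stage named in the abstract above, the following minimal Python sketch enumerates the eight runs of a half-fraction two-level design for four factors in coded units. The generator D = A·B·C is an assumed, commonly used choice; the abstract does not state which generator or factor assignment the authors used, and the function name is hypothetical.

```python
from itertools import product


def fractional_factorial_2_4_1():
    """Return the 8 runs of a two-level half-fraction design for 4 factors.

    Factors A, B and C form a full two-level design (8 runs). The fourth
    factor is set from the assumed generator D = A*B*C, so D is aliased
    with the three-factor interaction. Levels are coded as -1 and +1.
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b * c))
    return runs


design = fractional_factorial_2_4_1()
for run in design:
    print(run)
```

Half of the 16 runs of the full four-factor design are kept, which is why such a fraction is attractive for an initial screen before the full factorial and central composite stages.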
King, Gary; Rosen, Ori; Tanner, Martin A.
2004-09-01
This collection of essays brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half-decade has witnessed an explosion of research in ecological inference--the process of trying to infer individual behavior from aggregate data. Although uncertainties and information lost in aggregation make ecological inference one of the most problematic types of research to rely on, these inferences are required in many academic fields, as well as by legislatures and the Courts in redistricting, by business in marketing research, and by governments in policy analysis.
Institute of Scientific and Technical Information of China (English)
CHEN Jie; XIN Bin; PENG ZhiHong; PAN Feng
2009-01-01
This brief paper reports a hybrid algorithm we developed recently to solve the global optimization problems of multimodal functions, by combining the advantages of two powerful population-based metaheuristics: differential evolution (DE) and particle swarm optimization (PSO). In the hybrid, denoted by DEPSO, each individual in one generation chooses its evolution method, DE or PSO, in a statistical learning way. The choice depends on the relative success ratio of the two methods in a previous learning period. The proposed DEPSO is compared with its PSO and DE parents, two advanced DE variants, one of which is suggested by the originators of DE, two advanced PSO variants, one of which is acknowledged as a recent standard by the PSO community, and also a previous DEPSO. Benchmark tests demonstrate that the DEPSO is more competent for the global optimization of multimodal functions due to its high optimization quality.
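The selection rule described in the abstract above (each individual picks DE or PSO according to the two methods' relative success ratios over a previous learning period) can be sketched as follows. This is only one plausible reading, not the authors' published rule: the normalisation of the two success ratios, the probability floor, and all function and parameter names are assumptions made for illustration.

```python
import random


def de_probability(de_success, de_trials, pso_success, pso_trials, floor=0.05):
    """Probability of choosing DE, from success counts in the last learning period.

    Assumed rule: each method's success ratio is successes / trials, and the
    probability of picking DE is its share of the two ratios. The `floor`
    keeps both methods selectable so neither can be starved of trials.
    """
    r_de = de_success / de_trials if de_trials else 0.0
    r_pso = pso_success / pso_trials if pso_trials else 0.0
    total = r_de + r_pso
    p_de = 0.5 if total == 0 else r_de / total
    return min(1.0 - floor, max(floor, p_de))


def choose_method(p_de, rng=random):
    """Draw the update rule for one individual in the current generation."""
    return "DE" if rng.random() < p_de else "PSO"


p = de_probability(de_success=30, de_trials=100, pso_success=10, pso_trials=100)
print(p, [choose_method(p) for _ in range(5)])
```

Here a "success" would typically mean that the offspring improved on its parent, though that definition is also an assumption of this sketch.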
Statistical optimization of dithranol-loaded solid lipid nanoparticles using factorial design
Directory of Open Access Journals (Sweden)
Makarand Suresh Gambhire
2011-09-01
This study describes a 3² full factorial experimental design to optimize the formulation of dithranol (DTH)-loaded solid lipid nanoparticles (SLN) by the pre-emulsion ultrasonication method. The variables drug:lipid ratio and sonication time were studied at three levels and arranged in a 3² factorial design to study the influence on the response variables particle size and % entrapment efficiency (%EE). From the statistical analysis of the data, polynomial equations were generated. The particle size and %EE for the 9 batches (R1 to R9) showed a wide variation of 219-348 nm and 51.33-71.80%, respectively. The physical characteristics of DTH-loaded SLN were evaluated using a particle size analyzer, differential scanning calorimetry and X-ray diffraction. The results of the optimized formulation showed an average particle size of 219 nm and entrapment efficiency of 69.88%. Ex-vivo drug penetration using rat skin showed about a 2-fold increase in localization of DTH in skin as compared to the marketed preparation of DTH.
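To make the design in the abstract above concrete, this short Python sketch lists the nine runs of a two-factor, three-level full factorial in coded units and the regressors of the usual second-order polynomial fitted to such data. The coding of levels as -1, 0, +1, the order in which runs are labelled R1 to R9, and the exact polynomial form are assumptions; the abstract gives neither the level values nor the fitted equations.

```python
from itertools import product

# Coded levels: -1 (low), 0 (middle), +1 (high).
LEVELS = (-1, 0, 1)


def full_factorial_3x3():
    """Return the 9 runs of a two-factor, three-level design as (x1, x2) pairs.

    Here x1 stands for the drug:lipid ratio and x2 for the sonication time,
    both in coded units.
    """
    return list(product(LEVELS, repeat=2))


def quadratic_terms(x1, x2):
    """Regressors of a second-order model: 1, x1, x2, x1*x2, x1^2, x2^2."""
    return (1, x1, x2, x1 * x2, x1 ** 2, x2 ** 2)


runs = full_factorial_3x3()
for label, (x1, x2) in enumerate(runs, start=1):
    print(f"R{label}: x1={x1:+d}, x2={x2:+d}, terms={quadratic_terms(x1, x2)}")
```

Nine runs are enough to estimate the six coefficients of this polynomial for each response (particle size and %EE), leaving three degrees of freedom for checking the fit.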
DEFF Research Database (Denmark)
Thorlund, Kristian; Wetterslev, Jørn; Awad, Tahany;
2011-01-01
In random-effects model meta-analysis, the conventional DerSimonian-Laird (DL) estimator typically underestimates the between-trial variance. Alternative variance estimators have been proposed to address this bias. This study aims to empirically compare statistical inferences from random-effects model meta-analyses on the basis of the DL estimator and four alternative estimators, as well as distributional assumptions (normal distribution and t-distribution) about the pooled intervention effect. We evaluated the discrepancies of p-values, 95% confidence intervals (CIs) in statistically significant meta-analyses, and the degree (percentage) of statistical heterogeneity (e.g. I²) across 920 Cochrane primary outcome meta-analyses. In total, 414 of the 920 meta-analyses were statistically significant with the DL meta-analysis, and 506 were not. Compared with the DL estimator, the four...
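For readers unfamiliar with the DL estimator discussed in the abstract above, the following Python sketch computes the standard method-of-moments between-trial variance, the resulting random-effects pooled estimate, and I². The trial effects and variances in the example are made-up numbers for illustration only, and the function name and returned dictionary keys are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis (method of moments).

    effects   -- per-trial effect estimates y_i
    variances -- per-trial within-trial variances v_i
    Returns a dict with tau2 (between-trial variance), the pooled effect,
    its standard error, Cochran's Q and I2 (as a fraction of 1).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return {"tau2": tau2, "pooled": pooled, "se": (1.0 / sw_star) ** 0.5,
            "Q": q, "I2": i2}


# Hypothetical example: two trials with equal precision and different effects.
result = dersimonian_laird(effects=[0.0, 4.0], variances=[1.0, 1.0])
print(result)
```

The truncation of tau2 at zero is part of the standard estimator. The alternative estimators compared in the study are not reproduced here.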
Energy Technology Data Exchange (ETDEWEB)
Allen, J; Velsko, S
2009-11-16
This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease require a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV and HIV were examined to investigate the question of how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; however, too much genetic change poses challenges, which can weaken confidence in inferred links. High rates of infection coupled with relatively weak selective pressure observed in the SARS-CoV and FMDV data lead to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question. When the number of observed nucleotide mutations is limited due to characteristics of the
Gannu, Ramesh; Palem, Chinna Reddy; Yamsani, Shravan Kumar; Yamsani, Vamshi Vishnu; Yamsani, Madhusudan Rao
2010-06-01
The purpose of the present study was to develop and optimize a reservoir-based transdermal therapeutic system (TTS) for buspirone (BUSP), a low bioavailable drug. A three-factor, three-level Box-Behnken design was employed to optimize the TTS. Hydroxypropyl methylcellulose, D-limonene and propylene glycol were varied as independent variables; cumulative amount permeated across rat abdominal skin in 24 h, flux and lag time were selected as dependent variables. Mathematical equations and response surface plots were used to relate the dependent and independent variables. The statistical validity of polynomials was established, and optimized formulation factors were selected by feasibility and grid search. Validation of the optimization study with seven confirmatory runs indicated high degree of prognostic ability of response surface methodology. BUSP-OPT (optimized formulation) showed a flux of 104.6 µg cm⁻² h⁻¹, which could meet target flux. The bioavailability studies in rabbits showed that about 2.65 times improvement (p statistical design and could provide an effective treatment in the management of anxiety.
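The three-factor, three-level Box-Behnken design mentioned in the abstract above has a simple structure that can be sketched in a few lines of Python: every pair of factors is run at its four low/high combinations while the third factor is held at its middle level, and centre points are added. The number of centre points (three here) and the function name are assumptions; the abstract does not report the run count or level values.

```python
from itertools import combinations, product


def box_behnken_3(n_center=3):
    """Return a three-factor Box-Behnken design in coded units (-1, 0, +1).

    Each of the 3 factor pairs contributes 4 edge-midpoint runs (12 in all);
    `n_center` replicate centre points (0, 0, 0) are appended. The number of
    centre points is an assumed default, not a value from the abstract.
    """
    runs = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0, 0, 0]
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0, 0, 0)] * n_center)
    return runs


bbd = box_behnken_3()
for run in bbd:
    print(run)
```

Unlike a full three-level factorial (27 runs), this layout never sets all three factors to an extreme level at once, which keeps the run count low and avoids corner conditions.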
Zhang, Ying; Wang, Yang; Wang, Zhi-Gang; Wang, Xi; Guo, Huo-Sheng; Meng, Dong-Fang; Wong, Po-Keung
2012-01-01
Statistical experimental designs provided by statistical analysis system (SAS) software were applied to optimize the fermentation medium composition for the production of atrazine-degrading Acinetobacter sp. DNS32 in shake-flask cultures. A "Plackett-Burman Design" was employed to evaluate the effects of different components in the medium. The concentrations of corn flour, soybean flour, and K2HPO4 were found to significantly influence Acinetobacter sp. DNS32 production. The steepest ascent method was employed to determine the optimal regions of these three significant factors. Then, these three factors were optimized using central composite design of "response surface methodology." The optimized fermentation medium was composed as follows (g/L): corn flour 39.49, soybean flour 25.64, CaCO3 3, K2HPO4 3.27, MgSO4·7H2O 0.2, and NaCl 0.2. The predicted and verified values in the medium with optimized concentrations of components in shake-flask experiments were 7.079 × 10⁸ CFU/mL and 7.194 × 10⁸ CFU/mL, respectively. The validated model can precisely predict the growth of the atrazine-degrading bacterium, Acinetobacter sp. DNS32.
Westfall, Jacob; Kenny, David A; Judd, Charles M
2014-10-01
Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
Directory of Open Access Journals (Sweden)
Chien-Lin Huang
2015-01-01
This study aims to construct a typhoon precipitation forecast model providing forecasts one to six hours in advance using optimal model parameters and structures retrieved from a combination of the adaptive network-based fuzzy inference system (ANFIS) and artificial intelligence. To enhance the accuracy of the precipitation forecast, two structures were then used to establish the precipitation forecast model for a specific lead-time: a single-model structure and a dual-model hybrid structure where the forecast models of higher and lower precipitation were integrated. In order to rapidly, automatically, and accurately retrieve the optimal parameters and structures of the ANFIS-based precipitation forecast model, a tabu search was applied to identify the adjacent radius in subtractive clustering when constructing the ANFIS structure. The coupled structure was also employed to establish a precipitation forecast model across short and long lead-times in order to improve the accuracy of long-term precipitation forecasts. The study area is the Shimen Reservoir, and the analyzed period is from 2001 to 2009. Results showed that the optimal initial ANFIS parameters selected by the tabu search, combined with the dual-model hybrid method and the coupled structure, offered advantages in computational efficiency and gave highly reliable typhoon precipitation forecasts across short to long lead-times.
Kano, Masayuki; Miyazaki, Shin'ichi; Ishikawa, Yoichi; Hiyoshi, Yoshihisa; Ito, Kosuke; Hirahara, Kazuro
2015-10-01
Data assimilation is a technique that optimizes the parameters used in a numerical model, under the constraint of the model dynamics, to achieve a better fit to observations. Optimized parameters can be utilized for subsequent prediction with a numerical model, and the predicted physical variables are presumably closer to observations that will be available in the future, at least compared to those obtained without the optimization through data assimilation. In this work, an adjoint data assimilation system is developed for optimizing a relatively large number of spatially inhomogeneous frictional parameters during the afterslip period, in which the physical constraints are a quasi-dynamic equation of motion and a laboratory-derived rate and state dependent friction law that describe the temporal evolution of slip velocity at subduction zones. The observed variable is estimated slip velocity on the plate interface. Before applying this method to the real data assimilation for the afterslip of the 2003 Tokachi-oki earthquake, a synthetic data assimilation experiment is conducted to examine the feasibility of optimizing the frictional parameters in the afterslip area. It is confirmed that the current system is capable of optimizing the frictional parameters A-B, A and L by adopting the physical constraint based on a numerical model if observations capture the acceleration and decaying phases of slip on the plate interface. On the other hand, it is unlikely to constrain the frictional parameters in the region where the amplitude of afterslip is less than 1.0 cm d⁻¹. Next, real data assimilation for the 2003 Tokachi-oki earthquake is conducted to incorporate slip velocity data inferred from time dependent inversion of Global Navigation Satellite System time-series. The optimized values of A-B, A and L are O(10 kPa), O(10² kPa) and O(10 mm), respectively. The optimized frictional parameters yield the better fit to the observations and the better prediction skill of slip
Verma, Harish Kumar; Jain, Cheshta
2016-09-01
In this article, a hybrid algorithm of particle swarm optimization (PSO) with statistical parameters (HSPSO) is proposed. Basic PSO has low search precision on shifted multimodal problems because it falls into one of a number of local minima. The proposed approach uses statistical characteristics to update the velocity of each particle, which helps particles avoid local minima and search for the global optimum with improved convergence. The performance of the newly developed algorithm is verified using various standard multimodal, multivariable, shifted hybrid composition benchmark problems. Further, HSPSO is compared with variants of PSO in controlling the frequency of a hybrid renewable energy system comprising a solar system, wind system, diesel generator, aqua electrolyzer and ultracapacitor. A significant improvement in the convergence characteristics of the HSPSO algorithm over other variants of PSO is observed in solving both the benchmark optimization problems and the renewable hybrid system problem.
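For orientation, the baseline that the abstract refers to as basic PSO can be sketched as follows. This is a minimal illustration only, not the authors' HSPSO: the statistical velocity update is not described in the abstract and is not reproduced here. The shifted sphere objective, the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size and the seed are all assumptions chosen for the example.

```python
import random

def shifted_sphere(x, shift):
    """Sphere function whose minimum (value 0) sits at `shift`, not the origin."""
    return sum((xi - si) ** 2 for xi, si in zip(x, shift))

def basic_pso(objective, dim, bounds, n_particles=20, n_iter=200,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move and clamp to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

SHIFT = [1.5, -2.0]  # hypothetical shift of the optimum
best_x, best_f = basic_pso(lambda x: shifted_sphere(x, SHIFT),
                           dim=2, bounds=(-5.0, 5.0))
```

On a unimodal function such as this one the basic update converges easily; the difficulty the abstract describes arises on multimodal landscapes, where the pull toward `gbest` can trap the swarm in a local minimum.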
Zhu, Jian-Rong; Li, Jian; Zhang, Chun-Mei; Wang, Qin
2017-10-01
The decoy-state method has been widely used in commercial quantum key distribution (QKD) systems. In view of the practical decoy-state QKD with both source errors and statistical fluctuations, we propose a universal model of full parameter optimization in biased decoy-state QKD with phase-randomized sources. Besides, we adopt this model to carry out simulations of two widely used sources: weak coherent source (WCS) and heralded single-photon source (HSPS). Results show that full parameter optimization can significantly improve not only the secure transmission distance but also the final key generation rate. And when taking source errors and statistical fluctuations into account, the performance of decoy-state QKD using HSPS suffered less than that of decoy-state QKD using WCS.
Chen, Shuming; Wang, Lianhui; Song, Jiqang; Wang, Dengfeng; Chen, Jing
The interior sound pressure levels of a commercial vehicle cab at the driver’s right ear position and head rest position are used as evaluation indices of the vehicle’s acoustic performance. A statistical energy analysis model of the commercial vehicle cab was created. The simulated interior acoustic performance of the cab agrees well with the experimental results. A response surface model was built to determine the relationship between the sound package parameters and the evaluation indices of the interior acoustic performance of the cab. A multi-objective optimization was performed using the NSGA-II algorithm with the weighting coefficient method. The presented method provides a new approach to the multi-objective optimization design of acoustic performance in the field of vehicle noise analysis and control.
National Research Council Canada - National Science Library
Pathak, Lakshmi; Singh, Vineeta; Niwas, Ram; Osama, Khwaja; Khan, Saif; Haque, Shafiul; Tripathi, C K M; Mishra, B N
2015-01-01
.... The wider biological functions and clinical applications of COD have urged the screening, isolation and characterization of newer microbes from diverse habitats as a source of COD and optimization...
Modeling and optimization of Electrical Discharge Machining (EDM) using statistical design
Directory of Open Access Journals (Sweden)
Hegab Husein A.
2015-01-01
Modeling and optimization of nontraditional machining is still an ongoing area of research. The objective of this work is to optimize the Electrical Discharge Machining process parameters of an Aluminum-multiwall carbon nanotube composite (AL-CNT) model. Material Removal Rate (MRR), Electrode Wear Ratio (EWR) and Average Surface Roughness (Ra) are the primary objectives. The machining parameters are machining-on time (sec), discharge current (A), voltage (V), total depth of cut (mm), and %wt. CNT added. Mathematical models for all responses as functions of the significant process parameters are developed using Response Surface Methodology (RSM). Experimental results show that the optimum levels for material removal rate are %wt. CNT (0%), a high level of discharge current (6 A) and a low level of voltage (50 V), while the optimum levels for electrode wear ratio are %wt. CNT (5%) and a high level of discharge current (6 A), and the optimum levels for average surface roughness are %wt. CNT (0%), a low level of discharge current (2 A) and a high level of depth of cut (1 mm). Single-objective optimization is formulated and solved via a Genetic Algorithm. A multi-objective optimization model is then formulated for the three responses of interest. This methodology gathers experimental results, builds mathematical models in the domain of interest and optimizes the process models. As such, process analysis, modeling, design and optimization are achieved.
De Groot, M.; Vernooij, M.W.; Klein, S.; Arfan Ikram, M.; Vos, F.M.; Smith, S.M.; Niessen, W.J.; Andersson, J.L.R.
2013-01-01
Anatomical alignment in neuroimaging studies is of such importance that considerable effort is put into improving the registration used to establish spatial correspondence. Tract-based spatial statistics (TBSS) is a popular method for comparing diffusion characteristics across subjects. TBSS
Gheshlaghi, R; Scharer, J M; Moo-Young, M; Douglas, P L
2008-12-01
Modified resolution and overall separation factors used to quantify the separation of complex chromatography systems are described. These factors were proven to be applicable to the optimization of amino acid resolution in reverse-phase (RP) HPLC chromatograms. To optimize precolumn derivatization with phenylisothiocyanate, a 2^(5-1) fractional factorial design in triplicate was employed. The five independent variables for optimizing the overall separation factor were triethylamine content of the aqueous buffer, pH of the aqueous buffer, separation temperature, methanol/acetonitrile concentration ratio in the organic eluant, and mobile phase flow rate. Of these, triethylamine concentration and methanol/acetonitrile concentration ratio were the most important. The methodology captured the interaction between variables. Temperature appeared in the interaction terms; consequently, it was included in the hierarchic model. The preliminary model based on the factorial experiments was not able to explain the response curvature in the design space; therefore, a central composite design was used to provide a quadratic model. Constrained nonlinear programming was used for optimization purposes. The quadratic model predicted the optimal levels of the variables. In this study, the best levels of the five independent variables that provide the maximum modified resolution for each pair of consecutive amino acids appearing in the chromatograph were determined. These results are of utmost importance for accurate analysis of a subset of amino acids.
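A half-fraction design of this kind can be sketched in coded units as below. The abstract does not state which generator the authors used; the common choice E = ABCD (the highest-resolution half fraction for five factors) is assumed here, and the factor letters A to E are placeholders for the five variables listed above.

```python
from itertools import product

def half_fraction_2_5_1():
    """16-run two-level design for five factors in coded units (-1/+1).

    Factors A-D form a full 2^4 factorial; the fifth column is generated
    as E = A*B*C*D (an assumed generator, not taken from the abstract).
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append((a, b, c, d, a * b * c * d))
    return runs

design = half_fraction_2_5_1()
# "In triplicate": every run is performed three times.
triplicate = [run for run in design for _ in range(3)]
```

The design needs 16 distinct runs instead of the 32 of a full 2^5 factorial, at the cost of aliasing each main effect with a four-factor interaction under the assumed generator.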
Chen, Yuancai; Lin, Che-Jen; Jones, Gavin; Fu, Shiyu; Zhan, Huaiyu
2011-02-01
A fractional factorial design (FFD) and a response surface methodology (RSM) were used to optimize the inoculum composition of six strains for treatment of synthetic domestic wastewater. The model predicted the highest overall specific substrate utilization rate (q) of 6.88 g TOC/(d-gVSS). The value is in accordance with the actual maximum q, and is 1.5 and 1.97 times greater than those without optimization for 4 and 6 strains respectively. Additionally, the shortest time to reach stationary phase (3.5 h) and highest maximum total organic carbon (TOC) removal efficiency (92%) were also achieved under the optimum condition. The results indicated that the FFD and RSM are powerful screening and optimizing tools for the microbial community. The experimental approaches enhance the overall specific rate of substrate utilization as well as other biodegradation parameters.
Statistical optimization of single-cell production from Taxus cuspidata plant cell aggregates.
Gaurav, Vishal; Roberts, Susan C
2011-01-01
Flow-cytometric characterization of plant cell culture growth and metabolism at the single-cell level is a method superior to traditional culture average measurements for collecting population information. Investigation of culture heterogeneity and production variability by obtaining information about different culture subpopulations is crucial for optimizing bio-processes for enhanced productivity. Obtaining high yields of intact and viable single cells from aggregated plant cell cultures is an enabling criterion for their analysis and isolation using high-throughput flow cytometric methods. The critical parameters affecting the enzymatic isolation of single cells from aggregated Taxus cuspidata plant cell suspensions were optimized using response-surface methodology and factorial central composite design. Using a design of experiments approach, the output response single-cell yield (SCY, percentage of cell clusters containing only a single cell) was optimized. Optimal conditions were defined for the independent parameters cellulase concentration, pectolyase Y-23 concentration, and centrifugation speed to be 0.045% (w/v), 0.7% (w/v), and 1200 × g, respectively. At these optimal conditions, the model predicted a maximum SCY of 48%. The experimental data exhibited a 72% increase over previously attained values and additionally validated the model predictions. More than 99% of the isolated cells were viable and suitable for rapid analysis through flow cytometry, thus enabling the collection of population information from cells that accurately represent aggregated suspensions. These isolated cells can be further studied to gain insight into both growth and secondary metabolite production, which can be used for bio-process optimization.
Statistical analysis of piloted simulation of real time trajectory optimization algorithms
Price, D. B.
1982-01-01
A simulation of time-optimal intercept algorithms for on-board computation of control commands is described. The effects of three different display modes and two different computation modes on the pilots' ability to intercept a moving target in minimum time were tested. Both computation modes employed singular perturbation theory to help simplify the two-point boundary value problem associated with trajectory optimization. Target intercept time was affected by both the display and computation modes chosen, but the display mode chosen was the only significant influence on the miss distance.
Oberoi, Harinder Singh; Vadlani, Praveen V; Saida, Lavudi; Bansal, Sunil; Hughes, Joshua D
2011-07-01
Dried and ground banana peel biomass (BP) after hydrothermal sterilization pretreatment was used for ethanol production using simultaneous saccharification and fermentation (SSF). Central composite design (CCD) was used to optimize concentrations of cellulase and pectinase, temperature and time for ethanol production from BP using SSF. Analysis of variance showed a high coefficient of determination (R^2) value of 0.92 for ethanol production. On the basis of model graphs and numerical optimization, the validation was done in a laboratory batch fermenter with cellulase, pectinase, temperature and time of 9 cellulase filter paper units/gram cellulose (FPU/g-cellulose), 72 international units/gram pectin (IU/g-pectin), 37 °C and 15 h, respectively. The experiment using optimized parameters in the batch fermenter not only resulted in a higher ethanol concentration than the one predicted by the model equation, but also saved fermentation time. This study demonstrated that both hydrothermal pretreatment and SSF could be successfully carried out in a single vessel, and that use of optimized process parameters helped achieve significant ethanol productivity, indicating commercial potential for the process. To the best of our knowledge, ethanol concentration and ethanol productivity of 28.2 g/l and 2.3 g/l/h, respectively, from banana peels have not been reported to date.
Energy Technology Data Exchange (ETDEWEB)
Cho, Su Gil; Jang, Jun Yong; Kim, Ji Hoon; Lee, Tae Hee [Hanyang University, Seoul (Korea, Republic of); Lee, Min Uk [Romax Technology Ltd., Seoul (Korea, Republic of); Choi, Jong Su; Hong, Sup [Korea Research Institute of Ships and Ocean Engineering, Daejeon (Korea, Republic of)
2015-04-15
Sequential surrogate model-based global optimization algorithms, such as super-EGO, have been developed to increase the efficiency of commonly used global optimization techniques as well as to ensure the accuracy of optimization. However, earlier studies have drawbacks because there are three phases in the optimization loop and empirical parameters. We propose a united sampling criterion to simplify the algorithm and to achieve the global optimum of problems with constraints without any empirical parameters. It is able to select the points located in a feasible region with high model uncertainty as well as the points along the boundary of constraint at the lowest objective value. The mean squared error determines which criterion is more dominant among the infill sampling criterion and boundary sampling criterion. Also, the method guarantees the accuracy of the surrogate model because the sample points are not located within extremely small regions like super-EGO. The performance of the proposed method, including the solvability of a problem, convergence properties, and efficiency, is validated through nonlinear numerical examples with disconnected feasible regions.
Directory of Open Access Journals (Sweden)
Damilola Isaac Adebiyi
2016-06-01
The cold spray coating process involves many process parameters which make the process very complex, and highly dependent and sensitive to small changes in these parameters. This results in a small operational window of the parameters. Consequently, mathematical optimization of the process parameters is key, not only to achieving deposition but also improving the coating quality. This study focuses on the mathematical identification and experimental justification of the optimum process parameters for cold spray coating of titanium alloy with silicon carbide (SiC). The continuity, momentum and the energy equations governing the flow through the low-pressure cold spray nozzle were solved by introducing a constitutive equation to close the system. This was used to calculate the critical velocity for the deposition of SiC. In order to determine the input temperature that yields the calculated velocity, the distribution of velocity, temperature, and pressure in the cold spray nozzle were analyzed, and the exit values were predicted using the meshing tool of Solidworks. Coatings fabricated using the optimized parameters and some non-optimized parameters are compared. The coating of the CFD-optimized parameters yielded lower porosity and higher hardness.
Bao, Yingling; Zhengfang, Ye
2013-01-01
Statistical methodology was applied to the optimization of ammonium oxidation by Nitrosomonas europaea for biomass concentration (C_B), nitrite yield (Y_N) and ammonium removal (R_A). Initial screening by Plackett-Burman design was performed to select the major variables out of nineteen factors, among which NH4Cl concentration (C_N), trace element solution (TES), agitation speed (AS), and fermentation time (T) were found to have significant effects. The path of steepest ascent and response surface methodology were applied to optimize the levels of the selected factors. Finally, multi-objective optimization was used to obtain the optimal condition as a compromise among the three desirable objectives, through a weighted coefficient method coupled with entropy measurement methodology. These models enabled us to identify the optimum operation conditions (C_N = 84.1 mM; TES = 0.74 ml; AS = 100 rpm and T = 78 h), under which C_B = 3.386×10^8 cells/ml, Y_N = 1.98 mg/mg and R_A = 97.76% were simultaneously obtained. The optimized conditions were shown to be feasible through verification tests.
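The abstract does not give the formulas behind the weighted coefficient method coupled with entropy measurement. One widely used formulation, the entropy weight method, is sketched below as an assumption about what such a scheme can look like: objectives whose values vary more across candidate conditions carry more information and receive larger weights. The score matrix is hypothetical illustration data, not results from the study.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method for a matrix of non-negative scores.

    Rows are candidate conditions, columns are objectives. A column whose
    values are all equal has maximal entropy and gets weight zero.
    """
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        entropy = 0.0
        for x in col:
            p = x / total
            if p > 0:                      # treat 0 * log(0) as 0
                entropy -= k * p * math.log(p)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]

# Hypothetical normalized scores (larger is better) for three objectives
# at four candidate operating conditions.
scores = [
    [0.60, 0.90, 0.95],
    [0.80, 0.70, 0.97],
    [0.95, 0.50, 0.96],
    [0.70, 0.85, 0.98],
]
weights = entropy_weights(scores)
# Weighted-sum compromise score for each candidate condition.
composite = [sum(w * x for w, x in zip(weights, row)) for row in scores]
best_index = max(range(len(scores)), key=lambda i: composite[i])
```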
Subramanyam, Busetty; Das, Ashutosh
2014-01-01
In adsorption studies, describing the sorption process and identifying the best-fitting isotherm model are key analyses for investigating the theoretical hypothesis. Hence, numerous statistical analyses have been used extensively to assess how well the experimental equilibrium adsorption values agree with the predicted equilibrium values. In the present study, several statistical error analyses were carried out to evaluate the fitness of the adsorption isotherm models, including the Pearson correlation, the coefficient of determination and the Chi-square test. The ANOVA test was carried out to evaluate the significance of the various error functions, and the coefficient of dispersion was evaluated for the linearised and non-linearised models. The adsorption of phenol onto natural soil (local name Kalathur soil) was carried out in batch mode at 30 ± 2 °C. To obtain a holistic view of the analysis, the isotherm parameters were estimated and compared between linear and non-linear isotherm models. The results revealed that the above-mentioned error functions and statistical functions could be used to determine the best-fitting isotherm.
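Three of the fit measures named in the abstract can be written down compactly. The sketch below uses the forms commonly seen in the adsorption literature, which is an assumption since the abstract does not state its exact definitions; in particular, the Chi-square statistic is taken as the sum of squared residuals divided by the model-predicted value. The names `q_exp` and `q_calc` (experimental and model-predicted equilibrium uptake) and the numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(q_exp, q_calc):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(q_exp) / len(q_exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(q_exp, q_calc))
    ss_tot = sum((e - mean) ** 2 for e in q_exp)
    return 1.0 - ss_res / ss_tot

def chi_square(q_exp, q_calc):
    """Chi-square error function: residuals scaled by the predicted value."""
    return sum((e - c) ** 2 / c for e, c in zip(q_exp, q_calc))

# Hypothetical uptake values (mg/g), for illustration only.
q_exp = [1.2, 2.1, 2.9, 3.4, 3.8]
q_calc = [1.1, 2.2, 2.8, 3.5, 3.7]
fit_summary = {
    "pearson_r": pearson_r(q_exp, q_calc),
    "r_squared": r_squared(q_exp, q_calc),
    "chi_square": chi_square(q_exp, q_calc),
}
```

A smaller Chi-square and a coefficient of determination closer to 1 both indicate a better fit, so computing them for each candidate isotherm gives a simple ranking.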
Improving Statistical Machine Translation Through N-best List Re-ranking and Optimization
2014-03-27
Language Processing, 1352–1362. Association for Computational Linguistics, Edinburgh, Scotland , UK., July 2011. URL http://www.aclweb.org/anthology/D11-1125...Josef. “Minimum Error Rate Training in Statistical Machine Translation”. Erhard Hinrichs and Dan Roth (editors), Proceedings of the 41st Annual Meeting of
Balder, E.J.
1980-01-01
By employing fundamental results from “geometric” functional analysis and the theory of multifunctions we formulate a general model for (nonsequential) statistical decision theory, which extends Wald's classical model. From central results that hold for the model we derive a general theorem on the e
Institute of Scientific and Technical Information of China (English)
祁燕申
2001-01-01
In Bayes statistical inference, the posterior probability is obtained from the prior probability through experiments. This paper proves that the posterior probabilities form a semigroup with respect to experiments, and hence that sequential experiments and cumulative experiments yield the same posterior probability.
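The equivalence of sequential and cumulative updating is easy to check in a simple conjugate case. The sketch below is only an illustration under an assumed Beta prior with Bernoulli trials; the paper's result is stated for general experiments, and the outcome lists are made up.

```python
def update(prior, outcomes):
    """Beta(a, b) prior plus Bernoulli outcomes (1 = success) -> Beta posterior."""
    a, b = prior
    successes = sum(outcomes)
    return (a + successes, b + len(outcomes) - successes)

prior = (1, 1)                      # uniform Beta(1, 1) prior (assumed)
experiment_1 = [1, 0, 1]            # hypothetical outcomes
experiment_2 = [1, 1, 0, 0]

# Sequential: the posterior after experiment 1 becomes the prior for experiment 2.
sequential = update(update(prior, experiment_1), experiment_2)
# Cumulative: pool both experiments and update once.
cumulative = update(prior, experiment_1 + experiment_2)
# Order of the experiments does not matter either.
reversed_order = update(update(prior, experiment_2), experiment_1)
```

Composing updates behaves like an associative operation on experiments, which is the semigroup structure the abstract refers to.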
Directory of Open Access Journals (Sweden)
Thomas B Kepler
2013-04-01
One of the key phenomena in the adaptive immune response to infection and immunization is affinity maturation, during which antibody genes are mutated and selected, typically resulting in a substantial increase in binding affinity to the eliciting antigen. Advances in technology on several fronts have made it possible to clone large numbers of heavy-chain light-chain pairs from individual B cells and thereby identify whole sets of clonally related antibodies. These collections could provide the information necessary to reconstruct their own history - the sequence of changes introduced into the lineage during the development of the clone - and to study affinity maturation in detail. But the success of such a program depends entirely on accurately inferring the founding ancestor and the other unobserved intermediates. Given a set of clonally related immunoglobulin V-region genes, the method described here allows one to compute the posterior distribution over their possible ancestors, thereby giving a thorough accounting of the uncertainty inherent in the reconstruction. I demonstrate the application of this method on heavy-chain and light-chain clones, assess the reliability of the inference, and discuss the sources of uncertainty.
Elmizadeh, Hamideh; Khanmohammadi, Mohammadreza; Ghasemi, Keyvan; Hassanzadeh, Gholamreza; Nassiri-Asl, Marjan; Garmarudi, Amir Bagheri
2013-06-01
Chitosan nanoparticles and magnetic chitosan nanoparticles can be applied as delivery systems for the anti-Alzheimer drug tacrine. An investigation was carried out to elucidate the influence of process parameters on the mean particle size of chitosan nanoparticles produced by spontaneous emulsification. The method was optimized using design of experiments (DOE) by employing a 3-factor, 3-level Box-Behnken statistical design. This statistical design was used in order to achieve the minimum size and a suitable morphology of the nanoparticles. Magnetic chitosan nanoparticles were also synthesized according to the optimal method. The designed nanoparticles have average particle sizes from 33.64 to 74.87 nm, as determined by field emission scanning electron microscopy (FE-SEM). Drug loading in the nanoparticles as drug delivery systems was carried out according to the presented optimal method, and an appropriate drug loading capacity was shown by ultraviolet spectrophotometry. Chitosan and magnetic chitosan nanoparticles as drug delivery systems were characterized by Diffuse Reflectance Fourier Transform Mid Infrared spectroscopy (DR-FTMIR).
Xu, Sheng; Adiga, Nagesh; Ba, Shan; Dasgupta, Tirthankar; Wu, C F Jeff; Wang, Zhong Lin
2009-07-28
Controlling the morphology of as-synthesized nanostructures is usually challenging, and general theoretical guidance for the experimental approach is lacking. In this study, a novel way of optimizing the aspect ratio of hydrothermally grown ZnO nanowire (NW) arrays is presented by utilizing a systematic statistical design and analysis method. In this work, we use the pick-the-winner rule and one-pair-at-a-time main effect analysis to sequentially design the experiments and identify optimal reaction settings. By controlling the hydrothermal reaction parameters (reaction temperature, time, precursor concentration, and capping agent), we improved the aspect ratio of ZnO NWs from around 10 to nearly 23. The effect of noise on the experimental results was identified and successfully reduced, and the statistical design and analysis methods were very effective in reducing the number of experiments performed and in identifying the optimal experimental settings. In addition, the antireflection spectrum of the as-synthesized ZnO NWs clearly shows that a higher aspect ratio of the ZnO NW arrays leads to about 30% stronger suppression in the UV-vis range emission. This shows great potential for applications as antireflective coating layers in photovoltaic devices.
Demarque, Daniel P.; Fitts, Sonia Maria F.; Boaretto, Amanda G.; da Silva, Júlio César Leite; Vieira, Maria C.; Franco, Vanessa N. P.; Teixeira, Caroline B.; Toffoli-Kadri, Mônica C.; Carollo, Carlos A.
2015-01-01
Achyrocline alata, known as Jateí-ka-há, is traditionally used to treat several health problems, including inflammations and infections. This study aimed to optimize an active extract against Streptococcus mutans, the main bacteria that causes caries. The extract was developed using an accelerated solvent extraction and chemometric calculations. Factorial design and response surface methodologies were used to determine the most important variables, such as active compound selectivity. The standardized extraction recovered 99% of the four main compounds, gnaphaliin, helipyrone, obtusifolin and lepidissipyrone, which represent 44% of the extract. The optimized extract of A. alata has a MIC of 62.5 μg/mL against S. mutans and could be used in mouth care products. PMID:25710523
Hong, Xin; Zhang, Xiaoxiao
2008-12-01
The AcrySof ReSTOR intraocular lens (IOL) is a multifocal lens with state-of-the-art apodized diffractive technology, and is indicated for visual correction of aphakia secondary to removal of cataractous lenses in adult patients with/without presbyopia, who desire near, intermediate, and distance vision with increased spectacle independence. The multifocal design results in some optical contrast reduction, which may be improved by reducing spherical aberration. A novel patent-pending approach was undertaken to investigate the optical performance of aspheric lens designs. Simulated eyes using human normal distributions were corrected with different lens designs in a Monte Carlo simulation that allowed for variability in multiple surgical parameters (e.g. positioning error, biometric variation). Monte Carlo optimized results indicated that a lens spherical aberration of -0.10 microm provided optimal distance image quality.
Institute of Scientific and Technical Information of China (English)
None listed
2006-01-01
The optimization of nutrient levels for the production of recombinant hyperthermophilic esterase by E. coli was carried out with response surface methodology (RSM) based on the central composite rotatable design (CCRD). A 2^4 central composite rotatable design was used to study the combined effect of nutritional constituents such as yeast extract, peptone, mineral salt and trace metals. The P-value of the coefficient for the linear effect of peptone concentration was 0.0081, and that for trace metals solution was less than 0.0001, suggesting that these were the principal variables with a significant effect on hyperthermophilic esterase production. The predicted optimal hyperthermophilic esterase yield was 269.17 U/mL, whereas an actual experimental value of 284.58 U/mL was obtained.
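The run layout of a four-factor central composite rotatable design can be sketched in coded units as follows. The factorial and axial points follow the standard construction, with the axial distance set to the fourth root of the number of factorial runs for rotatability. The number of center points is not given in the abstract, so `n_center=6` is purely an assumed value.

```python
from itertools import product

def central_composite_rotatable(k, n_center=6):
    """Coded design points of a rotatable central composite design.

    Returns (runs, alpha). `n_center` is an assumed replicate count.
    """
    alpha = (2 ** k) ** 0.25          # rotatability condition
    # 2^k factorial (corner) points at coded levels -1 / +1
    factorial = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    # 2k axial (star) points at distance alpha along each factor axis
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [tuple([0.0] * k)] * n_center
    return factorial + axial + center, alpha

# Four factors, e.g. yeast extract, peptone, mineral salt, trace metals.
runs, alpha = central_composite_rotatable(4)
```

For four factors this gives 16 factorial points, 8 axial points at coded distance 2, and the chosen number of center replicates, which is enough to fit a full second-order model.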
Statistical optimization of aqueous extraction of pectin from waste durian rinds.
Maran, J Prakash
2015-02-01
The objective of the present study was to investigate and optimize the aqueous extraction conditions, namely solid-liquid (SL) ratio (1:5-1:15 g/ml), pH (2-3), extraction time (20-60 min) and extraction temperature (75-95 °C), for maximum extraction of pectin from durian rinds using a four-factor, three-level Box-Behnken response surface design. The experimental data obtained were fitted to a second-order polynomial equation using multiple regression analysis and analyzed by analysis of variance (ANOVA). The optimum extraction conditions were found to be as follows: SL ratio of 1:10 g/ml, pH of 2.8, extraction time of 43 min and extraction temperature of 86 °C. Under the optimal conditions, the experimental pectin yield (9.1%) agreed well with the predicted yield (9.3%).
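The run layout of a four-factor, three-level Box-Behnken design can be sketched as below, using the factor ranges quoted in the abstract to decode the coded levels. The pairwise construction shown is the standard one for three to five factors. The number of center replicates (`n_center=3`) and the dictionary key names are assumptions made for the example.

```python
from itertools import combinations, product

def box_behnken(k, n_center=3):
    """Coded Box-Behnken runs: each factor pair at -1/+1, all others at 0.

    Valid as the standard construction for k = 3, 4 or 5 factors.
    `n_center` is an assumed number of center replicates.
    """
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * k
            point[i], point[j] = a, b
            runs.append(tuple(point))
    runs.extend([tuple([0] * k)] * n_center)
    return runs

# Low and high levels taken from the ranges in the abstract; the SL ratio
# 1:5 to 1:15 g/ml is expressed as ml of liquid per gram of solid.
RANGES = {
    "liquid_ml_per_g_solid": (5.0, 15.0),
    "pH": (2.0, 3.0),
    "time_min": (20.0, 60.0),
    "temperature_C": (75.0, 95.0),
}

def decode(run):
    """Map coded levels (-1, 0, +1) to actual factor settings."""
    return {
        name: (lo + hi) / 2 + coded * (hi - lo) / 2
        for coded, (name, (lo, hi)) in zip(run, RANGES.items())
    }

design = box_behnken(4)
settings = [decode(run) for run in design]
```

With four factors this yields 24 edge-midpoint runs plus the center replicates, and no run sits at a corner where all factors are at their extremes simultaneously.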
Directory of Open Access Journals (Sweden)
M. Mihelich
2014-11-01
We derive rigorous results on the link between the principle of maximum entropy production and the principle of maximum Kolmogorov–Sinai entropy using a Markov model of passive scalar diffusion called the Zero Range Process. We show analytically that both the entropy production and the Kolmogorov–Sinai entropy, seen as functions of f, admit a unique maximum, denoted fmaxEP and fmaxKS respectively. The behavior of these two maxima is explored as a function of the system disequilibrium and the system resolution N. The main result of this article is that fmaxEP and fmaxKS have the same Taylor expansion at first order in the deviation from equilibrium. We find that fmaxEP hardly depends on N whereas fmaxKS depends strongly on N. In particular, for a fixed difference of potential between the reservoirs, fmaxEP(N) tends towards a non-zero value, while fmaxKS(N) tends to 0 when N goes to infinity. For values of N typical of those adopted by Paltridge and climatologists (N ≈ 10 ~ 100), we show that fmaxEP and fmaxKS coincide even far from equilibrium. Finally, we show that one can find an optimal resolution N* such that fmaxEP and fmaxKS coincide, at least up to a second-order parameter proportional to the non-equilibrium fluxes imposed at the boundaries. We find that the optimal resolution N* depends on the non-equilibrium fluxes, so that deeper convection should be represented on finer grids. This result points to the inadequacy of using a single grid for representing convection in climate and weather models. Moreover, the application of this principle to passive scalar transport parametrization is therefore expected to provide both the value of the optimal flux and the optimal number of degrees of freedom (resolution) to describe the system.
Statistical optimization of supercapacitor pilot plant manufacturing and process scale-up
Ajina, Ahmida
2015-01-01
In recent years, the electrical double layer capacitor (EDLC) has become one of the most popular energy storage devices. This can be attributed to its high capacity, long life cycle and fast charge/discharge rates. However, it has some drawbacks, mainly that it stores less energy than batteries. Hence, there is a need to optimize the EDLC to increase its capacity and decrease its equivalent series resistance (ESR), resulting in a supercapacitor that is able to charge quickly and will hold ...
Xiao, Yun-Zhu; Wu, Duan-Kai; Zhao, Si-Yang; Lin, Wei-Min; Gao, Xiang-Yang
2015-01-01
Proteases from halotolerant and halophilic microorganisms were found in traditional Chinese fish sauce. In this study, 30 fungi were isolated from fermented fish sauce in five growth media based on their morphology. However, only one strain, YL-1, which was identified as Penicillium citrinum by internal transcribed spacer (ITS) sequence analysis, can produce alkaline protease. This study is the first to report that a protease-producing fungus strain was isolated and identified in traditional Chinese fish sauce. Furthermore, the culture conditions of alkaline protease production by P. citrinum YL-1 in solid-state fermentation were optimized by response surface methodology. First, three variables including peptone, initial pH, and moisture content were selected by Plackett-Burman design as the significant variables for alkaline protease production. The Box-Behnken design was then adopted to further investigate the interaction effects between the three variables on alkaline protease production and determine the optimal values of the variables. The maximal production (94.30 U/mL) of alkaline protease by P. citrinum YL-1 took place under the optimal conditions of peptone, initial pH, and moisture content (v/w) of 35.5 g/L, 7.73, and 136%, respectively.
El-Say, Khalid M; El-Helw, Abdel-Rahim M; Ahmed, Osama A A; Hosny, Khaled M; Ahmed, Tarek A; Kharshoum, Rasha M; Fahmy, Usama A; Alsawahli, Majed
2015-01-01
The purpose was to improve the encapsulation efficiency of cetirizine hydrochloride (CTZ) microspheres as a model for water-soluble drugs and to control its release by applying response surface methodology. A 3^3 Box-Behnken design was used to determine the effect of drug/polymer ratio (X1), surfactant concentration (X2) and stirring speed (X3) on the mean particle size (Y1), percentage encapsulation efficiency (Y2) and cumulative percent drug released over 12 h (Y3). The emulsion solvent evaporation (ESE) technique was applied, utilizing Eudragit RS100 as the coating polymer and Span 80 as the surfactant. All formulations were evaluated for micromeritic properties and morphologically characterized by scanning electron microscopy (SEM). The relative bioavailability of the optimized microspheres was compared with that of a marketed CTZ product after oral administration to healthy human volunteers using a double-blind, randomized, cross-over design. The results revealed that the mean particle sizes of the microspheres ranged from 62 to 348 µm and the efficiency of entrapment ranged from 36.3% to 70.1%. The optimized CTZ microspheres exhibited a slow and controlled release over 12 h. The pharmacokinetic data of the optimized CTZ microspheres showed prolonged tmax, decreased Cmax and an AUC0-∞ value of 3309 ± 211 ng h/ml, indicating a relative bioavailability improved by 169.4% compared with marketed tablets.