WorldWideScience

Sample records for large statistical sample

  1. A course in mathematical statistics and large sample theory

    CERN Document Server

    Bhattacharya, Rabi; Patrangenaru, Victor

    2016-01-01

    This graduate-level textbook is primarily aimed at graduate students of statistics, mathematics, science, and engineering who have had an undergraduate course in statistics, an upper division course in analysis, and some acquaintance with measure theoretic probability. It provides a rigorous presentation of the core of mathematical statistics. Part I of this book constitutes a one-semester course on basic parametric mathematical statistics. Part II deals with the large sample theory of statistics — parametric and nonparametric, and its contents may be covered in one semester as well. Part III provides brief accounts of a number of topics of current interest for practitioners and other disciplines whose work involves statistical methods. Features include: large sample theory with many worked examples, numerical calculations, and simulations to illustrate the theory; appendices that provide ready access to a number of standard results, with many proofs; solutions to a number of selected exercises from Part I; and Part II exercises with ...

  2. Multivariate statistics high-dimensional and large-sample approximations

    CERN Document Server

    Fujikoshi, Yasunori; Shimizu, Ryoichi

    2010-01-01

    A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications, Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-sample approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic ...

  3. Statistical characterization of a large geochemical database and effect of sample size

    Science.gov (United States)

    Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

    2005-01-01

    The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompassed 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States) and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. This is partly due to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively ...
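
    The Q-Q-plot-based screening described above can be illustrated with a short, hedged sketch: the data below are simulated (a lognormal background plus a small enriched subpopulation), not NGS values, and scipy/matplotlib simply stand in for whatever software the authors actually used.

      # Hypothetical sketch: lognormal Q-Q plot for one element, looking for
      # kinks (slope changes) that suggest subpopulations, in the spirit of
      # the analysis described above. Simulated data, not NGS values.
      import numpy as np
      import scipy.stats as st
      import matplotlib.pyplot as plt

      rng = np.random.default_rng(0)
      # A "background" lognormal population plus a small enriched subpopulation.
      background = rng.lognormal(mean=3.0, sigma=0.5, size=9500)
      enriched = rng.lognormal(mean=5.5, sigma=0.3, size=500)
      conc = np.concatenate([background, enriched])

      # Q-Q plot of log-concentrations against normal quantiles; a single
      # lognormal population plots as one straight line, while subpopulations
      # appear as distinct linear segments separated by kinks.
      st.probplot(np.log(conc), dist="norm", plot=plt)
      plt.xlabel("theoretical normal quantiles")
      plt.ylabel("log concentration")
      plt.title("Lognormal Q-Q plot (simulated element)")
      plt.show()

      # A formal normality test on the log values, analogous to the tests the
      # authors report failing for all 27 elements:
      print(st.normaltest(np.log(conc)))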

  4. Gene coexpression measures in large heterogeneous samples using count statistics.

    Science.gov (United States)

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
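
    As a loose illustration of the idea of counting local patterns of expression ranks (a generic stand-in, not the specific statistics proposed in the paper), one can slide a window over a time course and count windows in which two genes share the same within-window rank ordering, so that association confined to a subregion of the time domain still contributes.

      # Illustrative sketch only: a simple count statistic over local rank patterns.
      import numpy as np
      from scipy.stats import rankdata

      def local_rank_agreement(x, y, k=4):
          """Count length-k windows in which x and y have identical rank patterns."""
          n = len(x)
          count = 0
          for start in range(n - k + 1):
              rx = rankdata(x[start:start + k])
              ry = rankdata(y[start:start + k])
              count += np.array_equal(rx, ry)
          return count

      rng = np.random.default_rng(1)
      t = np.arange(30)
      x = np.sin(t / 3) + 0.3 * rng.normal(size=30)
      y = np.where(t < 15, np.sin(t / 3), rng.normal(size=30))  # coupled only early on
      print(local_rank_agreement(x, y, k=4))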

  5. The large sample size fallacy.

    Science.gov (United States)

    Lantz, Björn

    2013-06-01

    Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.
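
    A minimal numerical illustration of the fallacy (simulated data, not taken from any nursing study): with very large groups, a trivial mean difference produces an extreme p-value while the standardized effect size stays negligible.

      # Large sample size fallacy in two lines of arithmetic: huge n, tiny effect.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(42)
      n = 200_000
      a = rng.normal(loc=0.00, scale=1.0, size=n)
      b = rng.normal(loc=0.02, scale=1.0, size=n)   # trivial true difference

      t, p = stats.ttest_ind(a, b)
      pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
      cohens_d = (b.mean() - a.mean()) / pooled_sd

      print(f"p-value   = {p:.2e}")          # typically far below 0.001
      print(f"Cohen's d = {cohens_d:.3f}")   # about 0.02: practically meaningless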

  6. [Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

    Science.gov (United States)

    Suzukawa, Yumi; Toyoda, Hideki

    2012-04-01

    This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in fields such as perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.
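
    An a priori power calculation of the kind implied above can be sketched with statsmodels; the numbers are generic and the choice of package is an assumption, not something used in the journal study.

      # Relating effect size, sample size, alpha and power for a two-sample t-test.
      from statsmodels.stats.power import TTestIndPower

      analysis = TTestIndPower()

      # Sample size per group needed to detect a medium effect (d = 0.5).
      n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
      print(f"n per group for d=0.5: {n_needed:.1f}")   # roughly 64

      # Achieved power of a small study (n = 20 per group) for the same effect.
      power = analysis.power(effect_size=0.5, nobs1=20, alpha=0.05, ratio=1.0)
      print(f"power with n=20 per group: {power:.2f}")  # well below 0.8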

  7. Sampling, Probability Models and Statistical Reasoning: Statistical Inference

    Indian Academy of Sciences (India)

    Mohan Delampady and V. R. Padmawar. Sampling, Probability Models and Statistical Reasoning: Statistical Inference. General Article, Resonance – Journal of Science Education, Volume 1, Issue 5, May 1996, pp. 49-58 ...

  8. Statistical distribution sampling

    Science.gov (United States)

    Johnson, E. S.

    1975-01-01

    Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.

  9. Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly.

    Science.gov (United States)

    Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz

    2015-03-01

    FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.

  10. Statistics of LES simulations of large wind farms

    DEFF Research Database (Denmark)

    Andersen, Søren Juhl; Sørensen, Jens Nørkær; Mikkelsen, Robert Flemming

    2016-01-01

    ... The statistical moments appear to collapse and hence the turbulence inside large wind farms can potentially be scaled accordingly. The thrust coefficient is estimated by two different reference velocities and the generic CT expression by Frandsen. A reference velocity derived from the power production is shown ... to give very good agreement and furthermore enables the very good estimation of the thrust force using only the steady CT-curve, even for very short time samples. Finally, the effective turbulence inside large wind farms and the equivalent loads are examined ...

  11. Effect of model choice and sample size on statistical tolerance limits

    International Nuclear Information System (INIS)

    Duran, B.S.; Campbell, K.

    1980-03-01

    Statistical tolerance limits are estimates of large (or small) quantiles of a distribution, quantities which are very sensitive to the shape of the tail of the distribution. The exact nature of this tail behavior cannot be ascertained from small samples, so statistical tolerance limits are frequently computed using a statistical model chosen on the basis of theoretical considerations or prior experience with similar populations. This report illustrates the effects of such choices on the computations.
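
    A minimal sketch of the sensitivity being described (simulated data, hypothetical numbers): the same small sample gives very different estimates of a large quantile depending on whether a normal or a lognormal model is assumed, and the sample itself is too small to tell the models apart.

      # Model choice vs. tail quantile: fit two models to one small sample.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)
      sample = rng.lognormal(mean=1.0, sigma=0.6, size=15)   # small sample

      # Model 1: assume the data are normal.
      mu, sd = sample.mean(), sample.std(ddof=1)
      q99_normal = stats.norm.ppf(0.99, loc=mu, scale=sd)

      # Model 2: assume the data are lognormal (fit on the log scale).
      logs = np.log(sample)
      q99_lognormal = np.exp(stats.norm.ppf(0.99, loc=logs.mean(),
                                            scale=logs.std(ddof=1)))

      print(f"99th percentile, normal model   : {q99_normal:.2f}")
      print(f"99th percentile, lognormal model: {q99_lognormal:.2f}")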

  12. Statistical Symbolic Execution with Informed Sampling

    Science.gov (United States)

    Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco

    2014-01-01

    Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that the informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
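
    The Bayesian estimation step alone can be sketched as follows; this is a toy illustration with hypothetical counts, not code from Symbolic PathFinder and not the informed-sampling machinery itself.

      # Beta-Binomial estimate of the probability of reaching a target event,
      # given k "hits" among n Monte Carlo samples of program paths.
      from scipy import stats

      n_samples = 5000      # sampled paths (hypothetical numbers)
      k_hits = 37           # paths that reached the target event

      posterior = stats.beta(1 + k_hits, 1 + n_samples - k_hits)  # Beta(1,1) prior
      print(f"posterior mean  : {posterior.mean():.4f}")
      print(f"95% credible int: {posterior.ppf(0.025):.4f} .. {posterior.ppf(0.975):.4f}")

      # Hypothesis-testing flavour: posterior probability that the reachability
      # probability exceeds a threshold of interest, e.g. 0.01.
      print(f"P(p > 0.01 | data) = {1 - posterior.cdf(0.01):.3f}")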

  13. Statistical sampling method for releasing decontaminated vehicles

    International Nuclear Information System (INIS)

    Lively, J.W.; Ware, J.A.

    1996-01-01

    Earth moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method (MIL-STD-105E, "Sampling Procedures and Tables for Inspection by Attributes") for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates for and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been successfully used for 1 year at the former uranium mill site in Monticello, Utah (a CERCLA-regulated clean-up site). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on the Monticello Projects has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site.
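
    A back-of-the-envelope sketch of the underlying attribute-sampling logic (not the MIL-STD-105E tables themselves, and with hypothetical numbers): how likely is a random sample of n vehicles to miss a given non-conforming fraction entirely, and how large must n be for a chosen confidence of catching at least one?

      import math

      def prob_all_clean(p, n):
          """Probability that n randomly sampled vehicles all pass, given a true
          non-conforming fraction p (independent draws assumed)."""
          return (1.0 - p) ** n

      def n_for_confidence(p, confidence):
          """Smallest n such that at least one non-conforming vehicle is sampled
          with the given confidence."""
          return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

      print(prob_all_clean(p=0.05, n=20))               # ~0.36: 20 samples is not enough
      print(n_for_confidence(p=0.05, confidence=0.95))  # 59 vehicles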

  14. The large deviation approach to statistical mechanics

    International Nuclear Information System (INIS)

    Touchette, Hugo

    2009-01-01

    The theory of large deviations is concerned with the exponential decay of probabilities of large fluctuations in random systems. These probabilities are important in many fields of study, including statistics, finance, and engineering, as they often yield valuable information about the large fluctuations of a random system around its most probable state or trajectory. In the context of equilibrium statistical mechanics, the theory of large deviations provides exponential-order estimates of probabilities that refine and generalize Einstein's theory of fluctuations. This review explores this and other connections between large deviation theory and statistical mechanics, in an effort to show that the mathematical language of statistical mechanics is the language of large deviation theory. The first part of the review presents the basics of large deviation theory, and works out many of its classical applications related to sums of random variables and Markov processes. The second part goes through many problems and results of statistical mechanics, and shows how these can be formulated and derived within the context of large deviation theory. The problems and results treated cover a wide range of physical systems, including equilibrium many-particle systems, noise-perturbed dynamics, nonequilibrium systems, as well as multifractals, disordered systems, and chaotic systems. This review also covers many fundamental aspects of statistical mechanics, such as the derivation of variational principles characterizing equilibrium and nonequilibrium states, the breaking of the Legendre transform for nonconcave entropies, and the characterization of nonequilibrium fluctuations through fluctuation relations.
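
    As general background for the phrase "exponential decay of probabilities" (standard textbook material, not a result specific to this review), a large deviation principle for the empirical mean of i.i.d. variables can be written, in LaTeX notation, as

      P\!\left(\tfrac{S_n}{n} \approx x\right) \;\asymp\; e^{-n I(x)}, \qquad S_n = X_1 + \dots + X_n ,

    where I is the rate function. For Bernoulli(p) variables, Cramér's theorem gives

      I(x) \;=\; x \ln\frac{x}{p} + (1-x)\ln\frac{1-x}{1-p}, \qquad 0 \le x \le 1 ,

    which vanishes only at the typical value x = p, so fluctuations away from it are exponentially suppressed in n.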

  15. The large deviation approach to statistical mechanics

    Science.gov (United States)

    Touchette, Hugo

    2009-07-01

    The theory of large deviations is concerned with the exponential decay of probabilities of large fluctuations in random systems. These probabilities are important in many fields of study, including statistics, finance, and engineering, as they often yield valuable information about the large fluctuations of a random system around its most probable state or trajectory. In the context of equilibrium statistical mechanics, the theory of large deviations provides exponential-order estimates of probabilities that refine and generalize Einstein’s theory of fluctuations. This review explores this and other connections between large deviation theory and statistical mechanics, in an effort to show that the mathematical language of statistical mechanics is the language of large deviation theory. The first part of the review presents the basics of large deviation theory, and works out many of its classical applications related to sums of random variables and Markov processes. The second part goes through many problems and results of statistical mechanics, and shows how these can be formulated and derived within the context of large deviation theory. The problems and results treated cover a wide range of physical systems, including equilibrium many-particle systems, noise-perturbed dynamics, nonequilibrium systems, as well as multifractals, disordered systems, and chaotic systems. This review also covers many fundamental aspects of statistical mechanics, such as the derivation of variational principles characterizing equilibrium and nonequilibrium states, the breaking of the Legendre transform for nonconcave entropies, and the characterization of nonequilibrium fluctuations through fluctuation relations.

  16. 42 CFR 402.109 - Statistical sampling.

    Science.gov (United States)

    2010-10-01

    ... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study, if based upon an appropriate sampling and computed by valid statistical methods, constitute prima... § 402.1. (c) Burden of proof. Once CMS or OIG has made a prima facie case, the burden is on the...

  17. Statistical searches for microlensing events in large, non-uniformly sampled time-domain surveys: A test using palomar transient factory data

    Energy Technology Data Exchange (ETDEWEB)

    Price-Whelan, Adrian M.; Agüeros, Marcel A. [Department of Astronomy, Columbia University, 550 W 120th Street, New York, NY 10027 (United States); Fournier, Amanda P. [Department of Physics, Broida Hall, University of California, Santa Barbara, CA 93106 (United States); Street, Rachel [Las Cumbres Observatory Global Telescope Network, Inc., 6740 Cortona Drive, Suite 102, Santa Barbara, CA 93117 (United States); Ofek, Eran O. [Benoziyo Center for Astrophysics, Weizmann Institute of Science, 76100 Rehovot (Israel); Covey, Kevin R. [Lowell Observatory, 1400 West Mars Hill Road, Flagstaff, AZ 86001 (United States); Levitan, David; Sesar, Branimir [Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA 91125 (United States); Laher, Russ R.; Surace, Jason, E-mail: adrn@astro.columbia.edu [Spitzer Science Center, California Institute of Technology, Mail Stop 314-6, Pasadena, CA 91125 (United States)

    2014-01-20

    Many photometric time-domain surveys are driven by specific goals, such as searches for supernovae or transiting exoplanets, which set the cadence with which fields are re-imaged. In the case of the Palomar Transient Factory (PTF), several sub-surveys are conducted in parallel, leading to non-uniform sampling over its ∼20,000 deg² footprint. While the median 7.26 deg² PTF field has been imaged ∼40 times in the R band, ∼2300 deg² have been observed >100 times. We use PTF data to study the trade-off between searching for microlensing events in a survey whose footprint is much larger than that of typical microlensing searches, but with far-from-optimal time sampling. To examine the probability that microlensing events can be recovered in these data, we test statistics used on uniformly sampled data to identify variables and transients. We find that the von Neumann ratio performs best for identifying simulated microlensing events in our data. We develop a selection method using this statistic and apply it to data from fields with >10 R-band observations, 1.1 × 10⁹ light curves, uncovering three candidate microlensing events. We lack simultaneous, multi-color photometry to confirm these as microlensing events. However, their number is consistent with predictions for the event rate in the PTF footprint over the survey's three years of operations, as estimated from near-field microlensing models. This work can help constrain all-sky event rate predictions and tests microlensing signal recovery in large data sets, which will be useful to future time-domain surveys, such as that planned with the Large Synoptic Survey Telescope.
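
    The von Neumann ratio mentioned above is simple to compute; the sketch below uses simulated light curves and one common normalization, and does not reproduce the detrending or selection thresholds used in the PTF analysis.

      # Von Neumann ratio: mean squared successive difference over the variance.
      # Values near 2 indicate uncorrelated noise; much smaller values indicate
      # smooth, correlated variability such as a microlensing-like bump.
      import numpy as np

      def von_neumann_ratio(mag):
          """eta = sum((m[i+1]-m[i])^2) / ((n-1) * var(m))."""
          mag = np.asarray(mag, dtype=float)
          diffs = np.diff(mag)
          return np.sum(diffs ** 2) / ((len(mag) - 1) * np.var(mag))

      rng = np.random.default_rng(3)
      t = np.linspace(0, 100, 60)
      noise = 0.05 * rng.normal(size=t.size)
      flat = 18.0 + noise
      bump = 18.0 - 0.8 * np.exp(-0.5 * ((t - 50) / 8) ** 2) + noise  # smooth brightening

      print(von_neumann_ratio(flat))   # close to 2 for pure noise
      print(von_neumann_ratio(bump))   # well below 2 for a smooth event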

  18. Weighted statistical parameters for irregularly sampled time series

    Science.gov (United States)

    Rimoldini, Lorenzo

    2014-01-01

    Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
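
    A minimal illustration of why such weighting helps, using simple gap-based, trapezoid-style weights as a stand-in for the adaptive scheme actually proposed in the paper (which also accounts for the noise level): a clump of epochs dominates the unweighted mean but not the weighted one.

      import numpy as np

      def interval_weights(t):
          """Trapezoidal weights: half the gap to the previous and next epochs."""
          t = np.asarray(t, dtype=float)
          gaps = np.diff(t)
          w = np.empty_like(t)
          w[0] = gaps[0] / 2
          w[-1] = gaps[-1] / 2
          w[1:-1] = (gaps[:-1] + gaps[1:]) / 2
          return w / w.sum()

      rng = np.random.default_rng(5)
      # A clump of 40 epochs early on, then 10 sparse epochs.
      t = np.sort(np.concatenate([rng.uniform(0, 5, 40), rng.uniform(5, 100, 10)]))
      x = np.sin(2 * np.pi * t / 50.0)

      w = interval_weights(t)
      print("unweighted mean:", x.mean())        # dominated by the clump
      print("weighted mean  :", np.sum(w * x))   # closer to the true time average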

  19. Extreme value statistics and thermodynamics of earthquakes. Large earthquakes

    Energy Technology Data Exchange (ETDEWEB)

    Lavenda, B. [Camerino Univ., Camerino, MC (Italy); Cipollone, E. [ENEA, Centro Ricerche Casaccia, S. Maria di Galeria, RM (Italy). National Centre for Research on Thermodynamics

    2000-06-01

    A compound Poisson process is used to derive a new shape parameter which can be used to discriminate between large earthquakes and aftershock sequences. Sample exceedance distributions of large earthquakes are fitted to the Pareto tail and the actual distribution of the maximum to the Fréchet distribution, while the sample distribution of aftershocks is fitted to a Beta distribution and the distribution of the minimum to the Weibull distribution for the smallest value. The transition between initial sample distributions and asymptotic extreme value distributions shows that self-similar power laws are transformed into nonscaling exponential distributions so that neither self-similarity nor the Gutenberg-Richter law can be considered universal. The energy-magnitude transformation converts the Fréchet distribution into the Gumbel distribution, originally proposed by Epstein and Lomnitz, and not the Gompertz distribution as in the Lomnitz-Adler and Lomnitz generalization of the Gutenberg-Richter law. Numerical comparison is made with the Lomnitz-Adler and Lomnitz analysis using the same catalogue of Chinese earthquakes. An analogy is drawn between large earthquakes and high-energy particle physics. A generalized equation of state is used to transform the Gamma density into the order-statistic Fréchet distribution. Earthquake temperature and volume are determined as functions of the energy. Large insurance claims based on the Pareto distribution, which does not have a right endpoint, show why there cannot be a maximum earthquake energy.

  20. Contributions to sampling statistics

    CERN Document Server

    Conti, Pier; Ranalli, Maria

    2014-01-01

    This book contains a selection of the papers presented at the ITACOSM 2013 Conference, held in Milan in June 2013. ITACOSM is the bi-annual meeting of the Survey Sampling Group S2G of the Italian Statistical Society, intended as an international  forum of scientific discussion on the developments of theory and application of survey sampling methodologies and applications in human and natural sciences. The book gathers research papers carefully selected from both invited and contributed sessions of the conference. The whole book appears to be a relevant contribution to various key aspects of sampling methodology and techniques; it deals with some hot topics in sampling theory, such as calibration, quantile-regression and multiple frame surveys, and with innovative methodologies in important topics of both sampling theory and applications. Contributions cut across current sampling methodologies such as interval estimation for complex samples, randomized responses, bootstrap, weighting, modeling, imputati...

  1. Statistical literacy and sample survey results

    Science.gov (United States)

    McAlevey, Lynn; Sullivan, Charles

    2010-10-01

    Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.

  2. Statistical sampling approaches for soil monitoring

    NARCIS (Netherlands)

    Brus, D.J.

    2014-01-01

    This paper describes three statistical sampling approaches for regional soil monitoring, a design-based, a model-based and a hybrid approach. In the model-based approach a space-time model is exploited to predict global statistical parameters of interest such as the space-time mean. In the hybrid

  3. Extreme value statistics and thermodynamics of earthquakes: large earthquakes

    Directory of Open Access Journals (Sweden)

    B. H. Lavenda

    2000-06-01

    Full Text Available A compound Poisson process is used to derive a new shape parameter which can be used to discriminate between large earthquakes and aftershock sequences. Sample exceedance distributions of large earthquakes are fitted to the Pareto tail and the actual distribution of the maximum to the Fréchet distribution, while the sample distribution of aftershocks is fitted to a Beta distribution and the distribution of the minimum to the Weibull distribution for the smallest value. The transition between initial sample distributions and asymptotic extreme value distributions shows that self-similar power laws are transformed into nonscaling exponential distributions so that neither self-similarity nor the Gutenberg-Richter law can be considered universal. The energy-magnitude transformation converts the Fréchet distribution into the Gumbel distribution, originally proposed by Epstein and Lomnitz, and not the Gompertz distribution as in the Lomnitz-Adler and Lomnitz generalization of the Gutenberg-Richter law. Numerical comparison is made with the Lomnitz-Adler and Lomnitz analysis using the same Catalogue of Chinese Earthquakes. An analogy is drawn between large earthquakes and high energy particle physics. A generalized equation of state is used to transform the Gamma density into the order-statistic Fréchet distribution. Earthquake temperature and volume are determined as functions of the energy. Large insurance claims based on the Pareto distribution, which does not have a right endpoint, show why there cannot be a maximum earthquake energy.

  4. Speeding Up Non-Parametric Bootstrap Computations for Statistics Based on Sample Moments in Small/Moderate Sample Size Applications.

    Directory of Open Access Journals (Sweden)

    Elias Chaibub Neto

    Full Text Available In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson's sample correlation coefficient, and compared its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling.
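
    A minimal sketch of the multinomial-weighting formulation described above, applied to bootstrapping a sample mean in Python (the authors' implementation is in R; this is an independent illustration of the idea, not their code): draw a B x n matrix of multinomial counts, normalise to weights, and obtain all bootstrap replications with a single matrix-vector product instead of resampling in a loop.

      import numpy as np

      rng = np.random.default_rng(11)
      x = rng.exponential(scale=2.0, size=50)   # observed sample (n = 50)
      n, B = x.size, 10_000                     # sample size, bootstrap replications

      # Each row: multinomial counts summing to n, divided by n to give weights.
      weights = rng.multinomial(n, np.full(n, 1.0 / n), size=B) / n

      boot_means = weights @ x                  # all B replications at once
      print("bootstrap SE of the mean:", boot_means.std(ddof=1))
      print("theoretical SE estimate :", x.std(ddof=1) / np.sqrt(n))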

  5. Statistical aspects of food safety sampling

    NARCIS (Netherlands)

    Jongenburger, I.; Besten, den H.M.W.; Zwietering, M.H.

    2015-01-01

    In food safety management, sampling is an important tool for verifying control. Sampling by nature is a stochastic process. However, uncertainty regarding results is made even greater by the uneven distribution of microorganisms in a batch of food. This article reviews statistical aspects of

  6. Statistical process control charts for attribute data involving very large sample sizes: a review of problems and solutions.

    Science.gov (United States)

    Mohammed, Mohammed A; Panesar, Jagdeep S; Laney, David B; Wilson, Richard

    2013-04-01

    The use of statistical process control (SPC) charts in healthcare is increasing. The primary purpose of SPC is to distinguish between common-cause variation which is attributable to the underlying process, and special-cause variation which is extrinsic to the underlying process. This is important because improvement under common-cause variation requires action on the process, whereas special-cause variation merits an investigation to first find the cause. Nonetheless, when dealing with attribute or count data (eg, number of emergency admissions) involving very large sample sizes, traditional SPC charts often produce tight control limits with most of the data points appearing outside the control limits. This can give a false impression of common and special-cause variation, and potentially misguide the user into taking the wrong actions. Given the growing availability of large datasets from routinely collected databases in healthcare, there is a need to present a review of this problem (which arises because traditional attribute charts only consider within-subgroup variation) and its solutions (which consider within and between-subgroup variation), which involve the use of the well-established measurements chart and the more recently developed attribute charts based on Laney's innovative approach. We close by making some suggestions for practice.
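
    A sketch of the kind of adjustment involved (hypothetical simulated counts; consult Laney's papers and the original article before using this operationally): the classical p-chart limits shrink like 1/sqrt(n) and become uselessly tight for huge denominators, while the p'-chart inflates them by an estimate of the between-subgroup variation obtained from the moving range of the z-scores.

      import numpy as np

      rng = np.random.default_rng(8)
      n = np.full(24, 50_000)                           # very large monthly denominators
      p = rng.normal(0.10, 0.005, size=24).clip(0, 1)   # true rate drifts month to month
      counts = rng.binomial(n, p)
      phat = counts / n

      pbar = counts.sum() / n.sum()
      sigma_p = np.sqrt(pbar * (1 - pbar) / n)          # within-subgroup (binomial) sigma

      # Classical p-chart limits: far too tight when n is huge.
      ucl_p, lcl_p = pbar + 3 * sigma_p, pbar - 3 * sigma_p

      # Laney p' chart: scale limits by sigma_z, the short-term variation of the z-scores.
      z = (phat - pbar) / sigma_p
      sigma_z = np.mean(np.abs(np.diff(z))) / 1.128
      ucl_pp, lcl_pp = pbar + 3 * sigma_p * sigma_z, pbar - 3 * sigma_p * sigma_z

      print("points outside classical p-chart limits:",
            np.sum((phat > ucl_p) | (phat < lcl_p)))
      print("points outside Laney p'-chart limits   :",
            np.sum((phat > ucl_pp) | (phat < lcl_pp)))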

  7. Some Statistics for Measuring Large-Scale Structure

    OpenAIRE

    Brandenberger, Robert H.; Kaplan, David M.; Ramsey, Stephen A.

    1993-01-01

    Good statistics for measuring large-scale structure in the Universe must be able to distinguish between different models of structure formation. In this paper, two and three dimensional "counts in cell" statistics and a new "discrete genus statistic" are applied to toy versions of several popular theories of structure formation: random phase cold dark matter model, cosmic string models, and global texture scenario. All three statistics appear quite promising in terms of differentiating betw...

  8. Audit sampling: A qualitative study on the role of statistical and non-statistical sampling approaches on audit practices in Sweden

    OpenAIRE

    Ayam, Rufus Tekoh

    2011-01-01

    PURPOSE: The two approaches to audit sampling, statistical and non-statistical, are examined in this study. The overall purpose of the study is to explore the extent to which statistical and non-statistical sampling approaches are currently utilized by independent auditors during auditing practices. Moreover, the study also seeks to achieve two additional purposes: the first is to find out whether auditors utilize different sampling techniques when auditing SMEs (Small and Medium-Sized Ente...

  9. Large area synchrotron X-ray fluorescence mapping of biological samples

    International Nuclear Information System (INIS)

    Kempson, I.; Thierry, B.; Smith, E.; Gao, M.; De Jonge, M.

    2014-01-01

    Large area mapping of inorganic material in biological samples has suffered severely from prohibitively long acquisition times. With the advent of new detector technology we can now generate statistically relevant information for studying cell populations, inter-variability and bioinorganic chemistry in large specimens. We have been implementing ultrafast synchrotron-based XRF mapping afforded by the MAIA detector for large area mapping of biological material. For example, a 2.5 million pixel map can be acquired in 3 hours, compared to a typical synchrotron XRF set-up needing over 1 month of uninterrupted beamtime. Of particular focus to us is the fate of metals and nanoparticles in cells, 3D tissue models and animal tissues. The large area scanning has for the first time provided statistically significant information on sufficiently large numbers of cells to yield data on intercellular variability in uptake of nanoparticles. Techniques such as flow cytometry generally require analysis of thousands of cells for statistically meaningful comparison, due to the large degree of variability. Large area XRF now gives comparable information in a quantifiable manner. Furthermore, we can now image localised deposition of nanoparticles in tissues that would be highly improbable to 'find' by typical XRF imaging. In addition, the ultrafast nature also makes it viable to conduct 3D XRF tomography over large dimensions. This technology opens new opportunities in biomonitoring and understanding metal and nanoparticle fate ex-vivo. Following from this is extension to molecular imaging through specific antibody-targeted nanoparticles to label specific tissues and monitor cellular processes or biological consequences.

  10. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Science.gov (United States)

    Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J

    2017-01-01

    Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical

  11. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Directory of Open Access Journals (Sweden)

    David C Pavlacky

    Full Text Available Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous

  12. Measuring radioactive half-lives via statistical sampling in practice

    Science.gov (United States)

    Lorusso, G.; Collins, S. M.; Jagan, K.; Hitt, G. W.; Sadek, A. M.; Aitken-Smith, P. M.; Bridi, D.; Keightley, J. D.

    2017-10-01

    The statistical sampling method for the measurement of radioactive decay half-lives exhibits intriguing features such as that the half-life is approximately the median of a distribution closely resembling a Cauchy distribution. Whilst initial theoretical considerations suggested that in certain cases the method could have significant advantages, accurate measurements by statistical sampling have proven difficult, for they require an exercise in non-standard statistical analysis. As a consequence, no half-life measurement using this method has yet been reported and no comparison with traditional methods has ever been made. We used a Monte Carlo approach to address these analysis difficulties, and present the first experimental measurement of a radioisotope half-life (211Pb) by statistical sampling in good agreement with the literature recommended value. Our work also focused on the comparison between statistical sampling and exponential regression analysis, and concluded that exponential regression achieves generally the highest accuracy.

  13. Galaxies distribution in the universe: large-scale statistics and structures

    International Nuclear Information System (INIS)

    Maurogordato, Sophie

    1988-01-01

    This research thesis addresses the distribution of galaxies in the Universe, and more particularly large-scale statistics and structures. Based on an assessment of the main statistical techniques in use, the author outlines the need to develop tools complementary to correlation functions in order to characterise the distribution. She introduces a new indicator: the probability that a volume randomly placed in the distribution is void. This allows a characterisation of void properties at the scales studied (up to 10 h⁻¹ Mpc) in the Harvard Smithsonian Center for Astrophysics Redshift Survey, or CfA catalog. A systematic analysis of statistical properties of different sub-samples has then been performed with respect to the size and location, luminosity class, and morphological type. This analysis is then extended to different scenarios of structure formation. A program of radial velocity measurements based on observations allows the determination of possible relationships between apparent structures. The author also presents results of the search for southern extensions of the Perseus supercluster [fr]
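
    The void-probability indicator can be sketched as follows on a toy clustered catalogue (an illustration of the general idea, not the estimator or data actually used in the thesis): place random test spheres of radius r and record the fraction that contain no galaxies.

      import numpy as np
      from scipy.spatial import cKDTree

      rng = np.random.default_rng(2)
      # Toy "galaxy" catalogue in a unit box: 5000 points clustered around 50 seeds.
      seeds = rng.uniform(0, 1, size=(50, 3))
      galaxies = (seeds[rng.integers(0, 50, size=5000)]
                  + 0.02 * rng.normal(size=(5000, 3))) % 1.0
      tree = cKDTree(galaxies)

      def void_probability(r, n_trials=20_000):
          """Fraction of random test spheres of radius r containing no galaxy."""
          centres = rng.uniform(0, 1, size=(n_trials, 3))
          dists, _ = tree.query(centres, k=1)   # sphere is empty iff nearest galaxy > r
          return np.mean(dists > r)

      for r in (0.01, 0.03, 0.05):
          print(f"P0(r={r}) = {void_probability(r):.3f}")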

  14. Statistical benchmark for BosonSampling

    International Nuclear Information System (INIS)

    Walschaers, Mattia; Mayer, Klaus; Buchleitner, Andreas; Kuipers, Jack; Urbina, Juan-Diego; Richter, Klaus; Tichy, Malte Christopher

    2016-01-01

    Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church–Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows one to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects. (fast track communication)

  15. Statistical conditional sampling for variable-resolution video compression.

    Directory of Open Access Journals (Sweden)

    Alexander Wong

    Full Text Available In this study, we investigate a variable-resolution approach to video compression based on Conditional Random Field and statistical conditional sampling in order to further improve compression rate while maintaining high-quality video. In the proposed approach, representative key-frames within a video shot are identified and stored at full resolution. The remaining frames within the video shot are stored and compressed at a reduced resolution. At the decompression stage, a region-based dictionary is constructed from the key-frames and used to restore the reduced resolution frames to the original resolution via statistical conditional sampling. The sampling approach is based on the conditional probability of the CRF modeling by use of the constructed dictionary. Experimental results show that the proposed variable-resolution approach via statistical conditional sampling has potential for improving compression rates when compared to compressing the video at full resolution, while achieving higher video quality when compared to compressing the video at reduced resolution.

  16. Statistical sampling techniques as applied to OSE inspections

    International Nuclear Information System (INIS)

    Davis, J.J.; Cote, R.W.

    1987-01-01

    The need has been recognized for statistically valid methods for gathering information during OSE inspections and for interpretation of results, both from performance testing and from records reviews, interviews, etc. Battelle Columbus Division, under contract to DOE OSE, has performed and is continuing to perform work in the area of statistical methodology for OSE inspections. This paper presents some of the sampling methodology currently being developed for use during OSE inspections. Topics include population definition, sample size requirements, level of confidence and practical logistical constraints associated with the conduct of an inspection based on random sampling. Sequential sampling schemes and sampling from finite populations are also discussed. The methods described are applicable to various data gathering activities, ranging from the sampling and examination of classified documents to the sampling of Protective Force security inspectors for skill testing.

  17. Statistical Analysis Of Tank 19F Floor Sample Results

    International Nuclear Information System (INIS)

    Harris, S.

    2010-01-01

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by an UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
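
    A common form for such a one-sided upper confidence limit on a mean concentration (stated here as general background, not as the exact formula used in the report) is, in LaTeX notation,

      \mathrm{UCL}_{95\%} \;=\; \bar{x} \;+\; t_{0.95,\,n-1}\,\frac{s}{\sqrt{n}} ,

    where \bar{x} and s are the mean and standard deviation of the n = 6 current scrape sample results and t_{0.95, n-1} is the 95th percentile of Student's t distribution with n - 1 degrees of freedom; the limit widens as the scatter s grows and tightens as the number of samples n increases.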

  18. STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL95%) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).

  19. Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

    Energy Technology Data Exchange (ETDEWEB)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-04-01

    The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
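
    A hedged sketch of the kinds of tests enumerated above, applied to a single simulated (input, output) scatterplot; the grid sizes, Latin hypercube sampling and two-phase flow model of the actual study are not reproduced here.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(9)
      x = rng.uniform(0, 1, 300)                          # sampled input variable
      y = np.sin(3 * x) + 0.3 * rng.normal(size=300)      # model output

      # (1) linear relationship and (2) monotonic relationship
      print("Pearson :", stats.pearsonr(x, y))
      print("Spearman:", stats.spearmanr(x, y))

      # (3) trend in central tendency: Kruskal-Wallis across quantile bins of x
      bins = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
      groups = [y[np.digitize(x, bins) == i] for i in range(5)]
      print("Kruskal :", stats.kruskal(*groups))
      # (4) trends in variability would compare variances/IQRs across the same bins.

      # (5) deviation from randomness: chi-square on a 5x5 grid of counts
      counts, _, _ = np.histogram2d(x, y, bins=5)
      chi2, pval, dof, expected = stats.chi2_contingency(counts)
      print("Chi-square:", chi2, pval)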

  20. Developing Students' Reasoning about Samples and Sampling Variability as a Path to Expert Statistical Thinking

    Science.gov (United States)

    Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani

    2015-01-01

    This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…

  1. Shell model in large spaces and statistical spectroscopy

    International Nuclear Information System (INIS)

    Kota, V.K.B.

    1996-01-01

    For many nuclear structure problems of current interest it is essential to deal with the shell model in large spaces. For this, three different approaches are now in use and two of them are: (i) the conventional shell model diagonalization approach but taking into account new advances in computer technology; (ii) the shell model Monte Carlo method. A brief overview of these two methods is given. Large space shell model studies raise fundamental questions regarding the information content of the shell model spectrum of complex nuclei. This led to the third approach: the statistical spectroscopy methods. The principles of statistical spectroscopy have their basis in nuclear quantum chaos and they are described in some detail (and substantiated by large-scale shell model calculations). (author)

  2. Exploring Technostress: Results of a Large Sample Factor Analysis

    Directory of Open Access Journals (Sweden)

    Steponas Jonušauskas

    2016-06-01

    Full Text Available With reference to the results of a large sample factor analysis, the article aims to propose a framework for examining technostress in a population. The survey and principal component analysis of a sample consisting of 1013 individuals who use ICT in their everyday work were implemented in the research. Thirteen factors combine 68 questions and explain 59.13 per cent of the dispersion in the answers. Based on the factor analysis, the questionnaire was reframed and prepared to reasonably analyze the respondents’ answers, revealing technostress causes and consequences as well as technostress prevalence in the population in a statistically validated pattern. Key elements of technostress identified by the factor analysis can serve as a basis for the construction of technostress measurement scales in further research.

  3. Large sample neutron activation analysis of a reference inhomogeneous sample

    International Nuclear Information System (INIS)

    Vasilopoulou, T.; Athens National Technical University, Athens; Tzika, F.; Stamatelatos, I.E.; Koster-Ammerlaan, M.J.J.

    2011-01-01

    A benchmark experiment was performed for Neutron Activation Analysis (NAA) of a large inhomogeneous sample. The reference sample was developed in-house and consisted of an SiO2 matrix and an Al-Zn alloy 'inhomogeneity' body. Monte Carlo simulations were employed to derive appropriate correction factors for neutron self-shielding during irradiation as well as for self-attenuation of gamma rays and sample geometry during counting. The large sample neutron activation analysis (LSNAA) results were compared against reference values and the trueness of the technique was evaluated. An agreement within ±10% was observed between LSNAA and reference elemental mass values, for all matrix and inhomogeneity elements except samarium, provided that the inhomogeneity body was fully simulated. However, in cases where the inhomogeneity was treated as unknown, the results showed reasonable agreement for most matrix elements, while large discrepancies were observed for the inhomogeneity elements. This study provided a quantification of the uncertainties associated with inhomogeneity in large sample analysis and contributed to the identification of the needs for future development of LSNAA facilities for the analysis of inhomogeneous samples. (author)

  4. Illustrating Sampling Distribution of a Statistic: Minitab Revisited

    Science.gov (United States)

    Johnson, H. Dean; Evans, Marc A.

    2008-01-01

    Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…

  5. Calculating Confidence, Uncertainty, and Numbers of Samples When Using Statistical Sampling Approaches to Characterize and Clear Contaminated Areas

    Energy Technology Data Exchange (ETDEWEB)

    Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.; Amidan, Brett G.

    2013-04-27

    This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating (1) the number of samples required to achieve a specified confidence in characterization and clearance decisions, and (2) the confidence in making characterization and clearance decisions for a specified number of samples, for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that a decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically, if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account
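
    A hedged, textbook-style sketch (not the report's formulas) of the kind of calculation described above: the number of random samples needed for a given confidence that contamination would have been detected, allowing for a non-zero false negative rate:

        import math

        def n_samples(conf=0.95, p_cont=0.01, fnr=0.0):
            """Samples needed so that, with confidence conf, at least one sample detects
            contamination covering a fraction p_cont of possible sample locations, when
            each sample of a contaminated spot is missed with probability fnr."""
            p_detect_one = p_cont * (1.0 - fnr)
            return math.ceil(math.log(1.0 - conf) / math.log(1.0 - p_detect_one))

        print(n_samples())            # FNR = 0
        print(n_samples(fnr=0.10))    # a 10% false negative rate raises the sample number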

  6. Large Sample Neutron Activation Analysis of Heterogeneous Samples

    International Nuclear Information System (INIS)

    Stamatelatos, I.E.; Vasilopoulou, T.; Tzika, F.

    2018-01-01

    A Large Sample Neutron Activation Analysis (LSNAA) technique was developed for non-destructive analysis of heterogeneous bulk samples. The technique incorporated collimated scanning and combined experimental measurements with Monte Carlo simulations for the identification of inhomogeneities in large volume samples and the correction of their effect on the interpretation of gamma-spectrometry data. Corrections were applied for the effects of neutron self-shielding, gamma-ray attenuation, geometrical factor and heterogeneous activity distribution within the sample. A benchmark experiment was performed to investigate the effect of heterogeneity on the accuracy of LSNAA. Moreover, a ceramic vase was analyzed as a whole, demonstrating the feasibility of the technique. The LSNAA results were compared against results obtained by INAA and a satisfactory agreement between the two methods was observed. This study showed that LSNAA is a technique capable of performing accurate non-destructive, multi-elemental compositional analysis of heterogeneous objects. It also revealed the great potential of the technique for the analysis of precious objects and artefacts that need to be preserved intact and cannot be damaged for sampling purposes. (author)

  7. Pierre Gy's sampling theory and sampling practice heterogeneity, sampling correctness, and statistical process control

    CERN Document Server

    Pitard, Francis F

    1993-01-01

    Pierre Gy's Sampling Theory and Sampling Practice, Second Edition is a concise, step-by-step guide for process variability management and methods. Updated and expanded, this new edition provides a comprehensive study of heterogeneity, covering the basic principles of sampling theory and its various applications. It presents many practical examples to allow readers to select appropriate sampling protocols and assess the validity of sampling protocols from others. The variability of dynamic process streams using variography is discussed to help bridge sampling theory with statistical process control. Many descriptions of good sampling devices, as well as descriptions of poor ones, are featured to educate readers on what to look for when purchasing sampling systems. The book uses its accessible, tutorial style to focus on professional selection and use of methods. The book will be a valuable guide for mineral processing engineers; metallurgists; geologists; miners; chemists; environmental scientists; and practit...

  8. The application of statistical and/or non-statistical sampling techniques by internal audit functions in the South African banking industry

    Directory of Open Access Journals (Sweden)

    D.P. van der Nest

    2015-03-01

    Full Text Available This article explores the use by internal audit functions of audit sampling techniques in order to test the effectiveness of controls in the banking sector. The article focuses specifically on the use of statistical and/or non-statistical sampling techniques by internal auditors. The focus of the research for this article was internal audit functions in the banking sector of South Africa. The results discussed in the article indicate that audit sampling is still used frequently as an audit evidence-gathering technique. Non-statistical sampling techniques are used more frequently than statistical sampling techniques for the evaluation of the sample. In addition, both techniques are regarded as important for the determination of the sample size and the selection of the sample items

  9. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, P; Kleibergen, F

    2003-01-01

    We consider the K-statistic, Kleibergen's (2002, Econometrica 70, 1781-1803) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Whereas Kleibergen (2002) especially analyzes the asymptotic behavior of the statistic, we focus on finite-sample properties in, a

  10. Statistical analyses to support guidelines for marine avian sampling. Final report

    Science.gov (United States)

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and to map p-values from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of the Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both short-term and long-term variability. We conclude by presenting a schematic “decision tree” showing how this power analysis approach would fit into a general framework for avian survey design, and discuss implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.

  11. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    Science.gov (United States)

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and the Satorra-Bentler mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds a new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of the Satorra-Bentler statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  12. Statistical measurement of power spectrum density of large aperture optical component

    International Nuclear Information System (INIS)

    Xu Jiancheng; Xu Qiao; Chai Liqun

    2010-01-01

    According to the requirements of ICF, a method based on statistical theory has been proposed to measure the power spectrum density (PSD) of large aperture optical components. The method breaks the large-aperture wavefront into small regions and obtains the PSD of the large-aperture wavefront by weighted averaging of the PSDs of the regions, where the weight factor is each region's area. Simulation and experiment demonstrate the effectiveness of the proposed method. They also show that the PSDs of the large-aperture wavefront obtained by the statistical method and by the sub-aperture stitching method agree well when the number of small regions is no less than 8 x 8. The statistical method is not sensitive to translation-stage errors or environmental instabilities, and it is therefore appropriate for PSD measurement during the process of optical fabrication. (authors)
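
    A minimal sketch (surrogate wavefront data, simplified 1-D PSD estimate) of the area-weighted averaging described above:

        import numpy as np

        def region_psd(w, dx):
            """Illustrative 1-D PSD of one region, from its row-averaged profile."""
            profile = w.mean(axis=0)
            spectrum = np.fft.rfft(profile - profile.mean())
            return (np.abs(spectrum) ** 2) * dx / len(profile)

        wavefront = np.random.default_rng(2).normal(0, 1e-3, (1024, 1024))  # surrogate map
        dx, n = 0.5, 8                              # assumed pixel pitch; 8 x 8 regions
        blocks = [wavefront[i*128:(i+1)*128, j*128:(j+1)*128]
                  for i in range(n) for j in range(n)]
        psds = np.array([region_psd(b, dx) for b in blocks])
        areas = np.array([b.size for b in blocks], dtype=float)
        psd_large = np.average(psds, axis=0, weights=areas)  # weight factor = region area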

  13. Statistical sampling for holdup measurement

    International Nuclear Information System (INIS)

    Picard, R.R.; Pillay, K.K.S.

    1986-01-01

    Nuclear materials holdup is a serious problem in many operating facilities. Estimating amounts of holdup is important for materials accounting and, sometimes, for process safety. Clearly, measuring holdup in all pieces of equipment is not a viable option in terms of time, money, and radiation exposure to personnel. Furthermore, 100% measurement is not only impractical but unnecessary for developing estimated values. Principles of statistical sampling are valuable in the design of cost-effective holdup monitoring plans and in quantifying uncertainties in holdup estimates. The purpose of this paper is to describe those principles and to illustrate their use

  14. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Directory of Open Access Journals (Sweden)

    Alexander P. Kartun-Giles

    2018-04-01

    Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.
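
    A hedged sketch of a generic hidden-variable ("fitness") construction that produces a sparse power-law network; it illustrates the general idea only, not the exact projective process proposed in the paper:

        import numpy as np

        rng = np.random.default_rng(7)
        N = 2000
        h = rng.pareto(2.5, N) + 1.0                          # power-law hidden variables
        p = np.minimum(1.0, np.outer(h, h) / (h.mean() * N))  # connection probabilities
        adj = np.triu(rng.random((N, N)) < p, k=1)            # sample each pair once
        adj = adj | adj.T
        print("average degree:", adj.sum() / N)               # stays O(1) as N grows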

  15. Optimal sampling designs for large-scale fishery sample surveys in Greece

    Directory of Open Access Journals (Sweden)

    G. BAZIGOS

    2007-12-01

    The paper deals with the optimization of the following three large scale sample surveys: the biological sample survey of commercial landings (BSCL), the experimental fishing sample survey (EFSS), and the commercial landings and effort sample survey (CLES).

  16. Sampling methods to the statistical control of the production of blood components.

    Science.gov (United States)

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood component specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The reliability of the control depends on the sampling. However, a correct sampling methodology does not seem to be applied systematically. Commonly, the sampling is intended only to comply with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model is arguably not grounded in a consistent sampling technique. This could be a severe limitation in detecting abnormal patterns and in assuring that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and of the related statistical process control decisions for the purpose for which they are suggested. Copyright © 2017 Elsevier Ltd. All rights reserved.
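
    A hedged sketch of the second option mentioned above, sampling based on the proportion of a finite population (standard finite-population correction; all parameter values are illustrative):

        import math

        def sample_size(N, p=0.01, margin=0.01, z=1.96):
            n0 = z**2 * p * (1 - p) / margin**2           # infinite-population size
            return math.ceil(n0 / (1 + (n0 - 1) / N))     # finite-population correction

        print(sample_size(N=500))   # e.g. a lot of 500 blood components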

  17. Statistical assessment of fish behavior from split-beam hydro-acoustic sampling

    International Nuclear Information System (INIS)

    McKinstry, Craig A.; Simmons, Mary Ann; Simmons, Carver S.; Johnson, Robert L.

    2005-01-01

    Statistical methods are presented for using echo-traces from split-beam hydro-acoustic sampling to assess fish behavior in response to a stimulus. The data presented are from a study designed to assess the response of free-ranging, lake-resident fish, primarily kokanee (Oncorhynchus nerka) and rainbow trout (Oncorhynchus mykiss), to high intensity strobe lights, and was conducted at Grand Coulee Dam on the Columbia River in northern Washington State. The lights were deployed immediately upstream from the turbine intakes, in a region exposed to daily alternating periods of high and low flows. The study design included five down-looking split-beam transducers positioned in a line at incremental distances upstream from the strobe lights, and treatments applied in randomized pseudo-replicate blocks. Statistical methods included the use of odds ratios from fitted loglinear models. Fish-track velocity vectors were modeled using circular probability distributions. Both analyses are depicted graphically. Study results suggest large increases in fish activity in the presence of the strobe lights, most notably at night and during periods of low flow. The lights also induced notable bimodality in the angular distributions of the fish track velocity vectors. Statistical summaries are presented along with interpretations of fish behavior.

  18. Large-Deviation Results for Discriminant Statistics of Gaussian Locally Stationary Processes

    Directory of Open Access Journals (Sweden)

    Junichi Hirukawa

    2012-01-01

    Full Text Available This paper discusses the large-deviation principle of discriminant statistics for Gaussian locally stationary processes. First, large-deviation theorems for quadratic forms and the log-likelihood ratio for a Gaussian locally stationary process with a mean function are proved. Their asymptotics are described by the large deviation rate functions. Second, we consider situations where the processes are misspecified as stationary. In these misspecified cases, we formally construct the log-likelihood ratio discriminant statistics and derive large deviation theorems for them. Since they are complicated, they are evaluated and illustrated by numerical examples. We find that misspecifying the process as stationary seriously affects the discrimination.

  19. Statistical distribution of the local purity in a large quantum system

    International Nuclear Information System (INIS)

    De Pasquale, A; Pascazio, S; Facchi, P; Giovannetti, V; Parisi, G; Scardicchio, A

    2012-01-01

    The local purity of large many-body quantum systems can be studied by following a statistical mechanical approach based on a random matrix model. Restricting the analysis to the case of global pure states, this method proved to be successful, and a full characterization of the statistical properties of the local purity was obtained by computing the partition function of the problem. Here we generalize these techniques to the case of global mixed states. In this context, by uniformly sampling the phase space of states with assigned global mixedness, we determine the exact expression of the first two moments of the local purity and a general expression for the moments of higher order. This generalizes previous results obtained for globally pure configurations. Furthermore, through the introduction of a partition function for a suitable canonical ensemble, we compute the approximate expression of the first moment of the marginal purity in the high-temperature regime. In the process, we establish a formal connection with the theory of quantum twirling maps that provides an alternative, possibly fruitful, way of performing the calculation. (paper)

  20. Sampling Large Graphs for Anticipatory Analytics

    Science.gov (United States)

    2015-05-15

    …low. C. Random Area Sampling: Random area sampling [8] is a “snowball” sampling method in which a set of random seed vertices are selected and areas… …systems, greater human-in-the-loop involvement, or through complex algorithms. We are investigating the use of sampling to mitigate these challenges

  1. New Hybrid Monte Carlo methods for efficient sampling. From physics to biology and statistics

    International Nuclear Information System (INIS)

    Akhmatskaya, Elena; Reich, Sebastian

    2011-01-01

    We introduce a class of novel hybrid methods for detailed simulations of large complex systems in physics, biology, materials science and statistics. These generalized shadow Hybrid Monte Carlo (GSHMC) methods combine the advantages of stochastic and deterministic simulation techniques. They utilize a partial momentum update to retain some of the dynamical information, employ modified Hamiltonians to overcome exponential performance degradation with the system’s size, and make use of the multi-scale nature of complex systems. Variants of GSHMC were developed for atomistic simulation, particle simulation and statistics: GSHMC (thermodynamically consistent implementation of constant-temperature molecular dynamics), MTS-GSHMC (multiple-time-stepping GSHMC), meso-GSHMC (Metropolis corrected dissipative particle dynamics (DPD) method), and a generalized shadow Hamiltonian Monte Carlo, GSHmMC (a GSHMC for statistical simulations). All of these are compatible with other enhanced sampling techniques and suitable for massively parallel computing, allowing for a range of multi-level parallel strategies. A brief description of the GSHMC approach, examples of its application on high performance computers and a comparison with other existing techniques are given. Our approach is shown to resolve such problems as resonance instabilities of the MTS methods and non-preservation of thermodynamic equilibrium properties in DPD, and to outperform known methods in sampling efficiency by an order of magnitude. (author)

  2. The Role of the Sampling Distribution in Understanding Statistical Inference

    Science.gov (United States)

    Lipson, Kay

    2003-01-01

    Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…

  3. Parameter sampling capabilities of sequential and simultaneous data assimilation: II. Statistical analysis of numerical results

    International Nuclear Information System (INIS)

    Fossum, Kristian; Mannseth, Trond

    2014-01-01

    We assess and compare parameter sampling capabilities of one sequential and one simultaneous Bayesian, ensemble-based, joint state-parameter (JS) estimation method. In the companion paper, part I (Fossum and Mannseth 2014 Inverse Problems 30 114002), analytical investigations lead us to propose three claims, essentially stating that the sequential method can be expected to outperform the simultaneous method for weakly nonlinear forward models. Here, we assess the reliability and robustness of these claims through statistical analysis of results from a range of numerical experiments. Samples generated by the two approximate JS methods are compared to samples from the posterior distribution generated by a Markov chain Monte Carlo method, using four approximate measures of distance between probability distributions. Forward-model nonlinearity is assessed from a stochastic nonlinearity measure allowing for sufficiently large model dimensions. Both toy models (with low computational complexity, and where the nonlinearity is fairly easy to control) and two-phase porous-media flow models (corresponding to down-scaled versions of problems to which the JS methods have been frequently applied recently) are considered in the numerical experiments. Results from the statistical analysis show strong support of all three claims stated in part I. (paper)

  4. Software engineering the mixed model for genome-wide association studies on large samples.

    Science.gov (United States)

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.

  5. Statistical methods for detecting differentially abundant features in clinical metagenomic samples.

    Directory of Open Access Journals (Sweden)

    James Robert White

    2009-04-01

    Full Text Available Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software
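
    A rough sketch in the spirit of the approach described above (not the Metastats code): Fisher's exact test for a sparsely sampled feature plus Benjamini-Hochberg control of the false discovery rate across features; all counts are invented:

        import numpy as np
        from scipy.stats import fisher_exact

        def bh_fdr(pvals):
            """Benjamini-Hochberg adjusted p-values."""
            p = np.asarray(pvals, dtype=float)
            order = np.argsort(p)
            ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
            ranked = np.minimum.accumulate(ranked[::-1])[::-1]
            adj = np.empty_like(p)
            adj[order] = np.minimum(ranked, 1.0)
            return adj

        # counts of one sparse feature vs. the remaining reads, in two populations
        table = np.array([[3, 0], [997, 1000]])
        _, p_sparse = fisher_exact(table)
        pvals = [p_sparse, 0.04, 0.2, 0.001]   # p-values for several features (invented)
        print(bh_fdr(pvals))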

  6. Comparison of pure and 'Latinized' centroidal Voronoi tessellation against various other statistical sampling methods

    International Nuclear Information System (INIS)

    Romero, Vicente J.; Burkardt, John V.; Gunzburger, Max D.; Peterson, Janet S.

    2006-01-01

    A recently developed centroidal Voronoi tessellation (CVT) sampling method is investigated here to assess its suitability for use in statistical sampling applications. CVT efficiently generates a highly uniform distribution of sample points over arbitrarily shaped M-dimensional parameter spaces. On several 2-D test problems CVT has recently been found to provide exceedingly effective and efficient point distributions for response surface generation. Additionally, for statistical function integration and estimation of response statistics associated with uniformly distributed random-variable inputs (uncorrelated), CVT has been found in initial investigations to provide superior point sets when compared against Latin hypercube and simple random Monte Carlo methods and Halton and Hammersley quasi-random sequence methods. In this paper, the performance of all these sampling methods and a new variant ('Latinized' CVT) are further compared for non-uniform input distributions. Specifically, given uncorrelated normal inputs in a 2-D test problem, statistical sampling efficiencies are compared for resolving various statistics of response: mean, variance, and exceedance probabilities
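
    For context, a small sketch comparing Latin hypercube sampling with simple random Monte Carlo sampling on an arbitrary 2-D test function (a generic comparison, not the CVT method of the paper):

        import numpy as np
        from scipy.stats import qmc

        def response(u):                         # illustrative 2-D test function
            return np.sin(np.pi * u[:, 0]) * u[:, 1] ** 2

        rng = np.random.default_rng(3)
        n, means_lhs, means_srs = 64, [], []
        for _ in range(200):
            lhs = qmc.LatinHypercube(d=2, seed=rng).random(n)
            srs = rng.random((n, 2))
            means_lhs.append(response(lhs).mean())
            means_srs.append(response(srs).mean())
        # a smaller spread of the mean estimate indicates higher sampling efficiency
        print("LHS estimator std:", np.std(means_lhs))
        print("SRS estimator std:", np.std(means_srs))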

  7. TRAN-STAT: statistics for environmental studies, Number 22. Comparison of soil-sampling techniques for plutonium at Rocky Flats

    International Nuclear Information System (INIS)

    Gilbert, R.O.; Bernhardt, D.E.; Hahn, P.B.

    1983-01-01

    A summary of a field soil sampling study conducted around the Rocky Flats Colorado plant in May 1977 is presented. Several different soil sampling techniques that had been used in the area were applied at four different sites. One objective was to compare the average 239-240Pu concentration values obtained by the various soil sampling techniques used. There was also interest in determining whether there are differences in the reproducibility of the various techniques and how the techniques compared with the proposed EPA technique of sampling to a 1 cm depth. Statistically significant differences in average concentrations between the techniques were found. The differences could be largely related to the differences in sampling depth, the primary physical variable between the techniques. The reproducibility of the techniques was evaluated by comparing coefficients of variation. Differences between coefficients of variation were not statistically significant. Average (median) coefficients ranged from 21 to 42 percent for the five sampling techniques. A laboratory study indicated that various sample treatment and particle sizing techniques could increase the concentration of plutonium in the less than 10 micrometer size fraction by up to a factor of about 4 compared to the 2 mm size fraction

  8. Statistical analyses of scatterplots to identify important factors in large-scale simulations, 2: robustness of techniques

    International Nuclear Information System (INIS)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-01-01

    The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (i) linear relationships with correlation coefficients, (ii) monotonic relationships with rank correlation coefficients, (iii) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (iv) trends in variability as defined by variances and interquartile ranges, and (v) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (i) Type I errors are unavoidable, (ii) Type II errors can occur when inappropriate analysis procedures are used, (iii) physical explanations should always be sought for why statistical procedures identify variables as being important, and (iv) the identification of important variables tends to be stable for independent Latin hypercube samples

  9. The structure of Diagnostic and Statistical Manual of Mental Disorders (4th edition, text revision) personality disorder symptoms in a large national sample.

    Science.gov (United States)

    Trull, Timothy J; Vergés, Alvaro; Wood, Phillip K; Jahng, Seungmin; Sher, Kenneth J

    2012-10-01

    We examined the latent structure underlying the criteria for DSM-IV-TR (American Psychiatric Association, 2000, Diagnostic and statistical manual of mental disorders (4th ed., text revision). Washington, DC: Author.) personality disorders in a large nationally representative sample of U.S. adults. Personality disorder symptom data were collected using a structured diagnostic interview from approximately 35,000 adults assessed over two waves of data collection in the National Epidemiologic Survey on Alcohol and Related Conditions. Our analyses suggested that a seven-factor solution provided the best fit for the data, and these factors were marked primarily by one or at most two personality disorder criteria sets. A series of regression analyses that used external validators tapping Axis I psychopathology, treatment for mental health problems, functioning scores, interpersonal conflict, and suicidal ideation and behavior provided support for the seven-factor solution. We discuss these findings in the context of previous studies that have examined the structure underlying the personality disorder criteria as well as the current proposals for DSM-5 personality disorders. (PsycINFO Database Record (c) 2012 APA, all rights reserved).

  10. Polish Phoneme Statistics Obtained On Large Set Of Written Texts

    Directory of Open Access Journals (Sweden)

    Bartosz Ziółko

    2009-01-01

    Full Text Available The phonetical statistics were collected from several Polish corpora. The paper is a summary of the data which are phoneme n-grams and some phenomena in the statistics. Triphone statistics apply context-dependent speech units which have an important role in speech recognition systems and were never calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.
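
    A minimal sketch of collecting phoneme n-gram (e.g. triphone) counts from a phonetic transcription; the SAMPA-like symbols below are only placeholders:

        from collections import Counter

        def ngram_counts(phonemes, n=3):
            return Counter(tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1))

        transcription = ["v", "j", "e", "dz", "a"]   # hypothetical SAMPA phoneme sequence
        print(ngram_counts(transcription, n=3))      # triphone counts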

  11. Notes on the Implementation of Non-Parametric Statistics within the Westinghouse Realistic Large Break LOCA Evaluation Model (ASTRUM)

    International Nuclear Information System (INIS)

    Frepoli, Cesare; Oriani, Luca

    2006-01-01

    In recent years, non-parametric or order statistics methods have been widely used to assess the impact of the uncertainties within Best-Estimate LOCA evaluation models. The bounding of the uncertainties is achieved with a direct Monte Carlo sampling of the uncertainty attributes, with the minimum trial number selected to 'stabilize' the estimation of the critical output values (peak cladding temperature (PCT), local maximum oxidation (LMO), and core-wide oxidation (CWO)). A non-parametric order statistics uncertainty analysis was recently implemented within the Westinghouse Realistic Large Break LOCA evaluation model, also referred to as the 'Automated Statistical Treatment of Uncertainty Method' (ASTRUM). The implementation or interpretation of order statistics in safety analysis is not fully consistent within the industry. This has led to an extensive public debate among regulators and researchers which can be found in the open literature. The USNRC-approved Westinghouse method follows a rigorous implementation of the order statistics theory, which leads to the execution of 124 simulations within a Large Break LOCA analysis. This is a solid approach which guarantees that a bounding value (at 95% probability) of the 95th percentile for each of the three 10 CFR 50.46 ECCS design acceptance criteria (PCT, LMO and CWO) is obtained. The objective of this paper is to provide additional insights into the ASTRUM statistical approach, with a more in-depth analysis of the pros and cons of order statistics and of the Westinghouse approach in the implementation of this statistical methodology. (authors)
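
    A hedged sketch of the order-statistics run count: under one common formulation (a Guba-Makai-type criterion, assumed here to be the relevant one), the smallest N such that the largest of N random runs bounds the 95th percentile of each of three outputs with 95% confidence reproduces the 124 runs cited above:

        from scipy.stats import binom

        def min_runs(p_outputs=3, gamma=0.95, beta=0.95):
            n = p_outputs
            # confidence that the top p_outputs runs bound the gamma-quantiles of the outputs
            while binom.cdf(n - p_outputs, n, gamma) < beta:
                n += 1
            return n

        print(min_runs())   # -> 124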

  12. Multiparametric statistics

    CERN Document Server

    Serdobolskii, Vadim Ivanovich

    2007-01-01

    This monograph presents the mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size but possibly much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way to the solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of infinite sampling often break down in the solution of real problems and, depending on the data, can be inefficient, unstable and even inapplicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...

  13. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    Science.gov (United States)

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  14. Improving Statistics Education through Simulations: The Case of the Sampling Distribution.

    Science.gov (United States)

    Earley, Mark A.

    This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…
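
    A simple simulation in the spirit of such classroom activities (population and sample sizes are illustrative): the sampling distribution of the mean narrows and becomes more normal as the sample size grows:

        import numpy as np

        rng = np.random.default_rng(4)
        population = rng.exponential(scale=2.0, size=100_000)   # a skewed population

        for n in (5, 30, 100):
            means = rng.choice(population, size=(10_000, n)).mean(axis=1)
            print(f"n={n:>3}: mean of sample means = {means.mean():.3f}, "
                  f"SE = {means.std(ddof=1):.3f}")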

  15. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    Science.gov (United States)

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  16. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    Energy Technology Data Exchange (ETDEWEB)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung [Hanyang Univ., Seoul (Korea, Republic of); Noh, Jae Man [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2013-10-15

    The uncertainty evaluation with the statistical method is performed by repeating the transport calculation with sampling of the directly perturbed nuclear data. Hence, a reliable uncertainty result can be obtained by analyzing the results of the numerous transport calculations. One of the known problems in uncertainty analysis with the statistical approach is that sampling the cross sections from a normal (Gaussian) distribution with a relatively large standard deviation leads to sampling errors, such as the sampling of negative cross sections. Some correction methods have been noted; however, these methods can distort the distribution of the sampled cross sections. In this study, a sampling method for the nuclear data is proposed that uses a lognormal distribution. After that, criticality calculations with the sampled nuclear data are performed and the results are compared with those from the normal distribution conventionally used in previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling errors. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was performed with the normal and lognormal distributions. The uncertainties, which are caused by the covariance of the (n,.) cross sections, were evaluated by solving the GODIVA problem. The results show that the sampling method with the lognormal distribution can efficiently solve the negative sampling problem referred to in previous studies. It is expected that this study will contribute to increasing the accuracy of sampling-based uncertainty analysis.
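
    A minimal sketch of the idea above: drawing a cross section with a given mean and standard deviation from a lognormal distribution guarantees positive samples (values are illustrative):

        import numpy as np

        def sample_lognormal(xs_mean, xs_std, size, rng):
            sigma2 = np.log(1.0 + (xs_std / xs_mean) ** 2)
            mu = np.log(xs_mean) - 0.5 * sigma2
            return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=size)

        rng = np.random.default_rng(5)
        xs = sample_lognormal(xs_mean=2.0, xs_std=1.0, size=100_000, rng=rng)  # 50% rel. std
        print(xs.mean(), xs.std(), (xs <= 0).any())   # mean/std preserved, no negative values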

  17. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    International Nuclear Information System (INIS)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung; Noh, Jae Man

    2013-01-01

    The uncertainty evaluation with the statistical method is performed by repeating the transport calculation with sampling of the directly perturbed nuclear data. Hence, a reliable uncertainty result can be obtained by analyzing the results of the numerous transport calculations. One of the known problems in uncertainty analysis with the statistical approach is that sampling the cross sections from a normal (Gaussian) distribution with a relatively large standard deviation leads to sampling errors, such as the sampling of negative cross sections. Some correction methods have been noted; however, these methods can distort the distribution of the sampled cross sections. In this study, a sampling method for the nuclear data is proposed that uses a lognormal distribution. After that, criticality calculations with the sampled nuclear data are performed and the results are compared with those from the normal distribution conventionally used in previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling errors. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was performed with the normal and lognormal distributions. The uncertainties, which are caused by the covariance of the (n,.) cross sections, were evaluated by solving the GODIVA problem. The results show that the sampling method with the lognormal distribution can efficiently solve the negative sampling problem referred to in previous studies. It is expected that this study will contribute to increasing the accuracy of sampling-based uncertainty analysis.

  18. Scalability on LHS (Latin Hypercube Sampling) samples for use in uncertainty analysis of large numerical models

    International Nuclear Information System (INIS)

    Baron, Jorge H.; Nunez Mac Leod, J.E.

    2000-01-01

    The present paper deals with the utilization of advanced statistical sampling methods to perform uncertainty and sensitivity analysis on numerical models. Such models may represent physical phenomena, logical structures (such as boolean expressions) or other systems, and various of their intrinsic parameters and/or input variables are usually treated simultaneously as random variables. In the present paper a simple method to scale up Latin Hypercube Sampling (LHS) samples is presented, starting with a small sample and doubling its size at each step, making it possible to reuse the numerical model results already obtained with the smaller sample. The method does not distort the statistical properties of the random variables and does not add any bias to the samples. The result is that a significant reduction in the running time of the numerical models can be achieved (by re-using the previously run samples), keeping all the advantages of LHS, until an acceptable representation level is achieved in the output variables. (author)

  19. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    Science.gov (United States)

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
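
    A hedged sketch of a gene-sampling gene-set test that uses the absolute gene statistic (simulated data, not the authors' implementation): the set score is the mean |t| of member genes, and the null distribution comes from resampling random gene sets of the same size:

        import numpy as np

        rng = np.random.default_rng(6)
        gene_t = rng.standard_normal(10_000)                 # per-gene t-like statistics
        gene_set = rng.choice(10_000, size=50, replace=False)

        score = np.abs(gene_t[gene_set]).mean()
        null = np.array([np.abs(rng.choice(gene_t, size=50, replace=False)).mean()
                         for _ in range(5_000)])
        p_value = (np.sum(null >= score) + 1) / (len(null) + 1)
        print(p_value)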

  20. Statistics of geodesics in large quadrangulations

    International Nuclear Information System (INIS)

    Bouttier, J; Guitter, E

    2008-01-01

    We study the statistical properties of geodesics, i.e. paths of minimal length, in large random planar quadrangulations. We extend Schaeffer's well-labeled tree bijection to the case of quadrangulations with a marked geodesic, leading to the notion of 'spine trees', amenable to a direct enumeration. We obtain the generating functions for quadrangulations with a marked geodesic of fixed length, as well as with a set of 'confluent geodesics', i.e. a collection of non-intersecting minimal paths connecting two given points. In the limit of quadrangulations with a large area n, we find in particular an average number 3 x 2^i of geodesics between two fixed points at distance i >> 1 from each other. We show that, for generic endpoints, two confluent geodesics remain close to each other and have an extensive number of contacts. This property fails for a few 'exceptional' endpoints which can be linked by truly distinct geodesics. Results are presented both in the case of finite length i and in the scaling limit i ∼ n^(1/4). In particular, we give the scaling distribution of the exceptional points

  1. Nomogram for sample size calculation on a straightforward basis for the kappa statistic.

    Science.gov (United States)

    Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo

    2014-09-01

    Kappa is a widely used measure of agreement. However, it may not be straightforward to use in some situations, such as sample size calculation, due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation to consider the level of agreement under a certain marginal prevalence in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and a goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. Sample size formulae using a simple proportion of agreement instead of a kappa statistic, and nomograms to eliminate the inconvenience of using a mathematical formula, were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
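
    A small illustration of the kappa statistic and the kappa paradox mentioned above (the 2x2 table is invented): raw agreement of 90% can still yield a kappa near 0.1 when the marginal prevalence is extreme:

        def kappa(a, b, c, d):
            """2x2 agreement table: a = both raters 'yes', d = both 'no', b and c = disagreements."""
            n = a + b + c + d
            po = (a + d) / n                                       # observed agreement
            pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2    # chance agreement
            return po, (po - pe) / (1 - pe)

        print(kappa(1, 5, 5, 89))   # (0.90, ~0.11): high agreement, low kappa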

  2. The statistical-inference approach to generalized thermodynamics

    International Nuclear Information System (INIS)

    Lavenda, B.H.; Scherer, C.

    1987-01-01

    Limit theorems, such as the central-limit theorem and the weak law of large numbers, are applicable to statistical thermodynamics for sufficiently large sample sizes of independent and identically distributed observations performed on extensive thermodynamic (chance) variables. The estimation of the intensive thermodynamic quantities is a problem in parametric statistical estimation. The normal approximation to the Gibbs distribution is justified by the analysis of large deviations. Statistical thermodynamics is generalized to include the statistical estimation of variance as well as mean values

  3. Testing statistical hypotheses

    CERN Document Server

    Lehmann, E L

    2005-01-01

    The third edition of Testing Statistical Hypotheses updates and expands upon the classic graduate text, emphasizing optimality theory for hypothesis testing and confidence sets. The principal additions include a rigorous treatment of large sample optimality, together with the requisite tools. In addition, an introduction to the theory of resampling methods such as the bootstrap is developed. The sections on multiple testing and goodness of fit testing are expanded. The text is suitable for Ph.D. students in statistics and includes over 300 new problems out of a total of more than 760. E.L. Lehmann is Professor of Statistics Emeritus at the University of California, Berkeley. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences, and the recipient of honorary degrees from the University of Leiden, The Netherlands and the University of Chicago. He is the author of Elements of Large-Sample Theory and (with George Casella) he is also the author of Theory of Point Estimat...

  4. A simulative comparison of respondent driven sampling with incentivized snowball sampling--the "strudel effect".

    Science.gov (United States)

    Gyarmathy, V Anna; Johnston, Lisa G; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A

    2014-02-01

    Respondent driven sampling (RDS) and incentivized snowball sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania ("original sample") to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1-12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called "strudel effect" is discussed in the paper. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  5. Statistics and Dynamics in the Large-scale Structure of the Universe

    International Nuclear Information System (INIS)

    Matsubara, Takahiko

    2006-01-01

    In cosmology, observations and theories are related to each other by statistics in most cases. Especially, statistical methods play central roles in analyzing fluctuations in the universe, which are seeds of the present structure of the universe. The confrontation of the statistics and dynamics is one of the key methods to unveil the structure and evolution of the universe. I will review some of the major statistical methods in cosmology, in connection with linear and nonlinear dynamics of the large-scale structure of the universe. The present status of analyses of the observational data such as the Sloan Digital Sky Survey, and the future prospects to constrain the nature of exotic components of the universe such as the dark energy will be presented

  6. Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size

    Directory of Open Access Journals (Sweden)

    R. Eric Heidel

    2016-01-01

    Full Text Available Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
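
    For illustration, an a priori sample size calculation touching the five components listed above, here for a two-group comparison of a continuous outcome (the effect size, alpha and power are assumed values):

        from statsmodels.stats.power import TTestIndPower

        n_per_group = TTestIndPower().solve_power(effect_size=0.5,   # Cohen's d (assumed)
                                                  alpha=0.05,
                                                  power=0.80,
                                                  alternative='two-sided')
        print(round(n_per_group))   # about 64 participants per group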

  7. Statistical sampling strategies

    International Nuclear Information System (INIS)

    Andres, T.H.

    1987-01-01

    Systems assessment codes use mathematical models to simulate natural and engineered systems. Probabilistic systems assessment codes carry out multiple simulations to reveal the uncertainty in values of output variables due to uncertainty in the values of the model parameters. In this paper, methods are described for sampling sets of parameter values to be used in a probabilistic systems assessment code. Three Monte Carlo parameter selection methods are discussed: simple random sampling, Latin hypercube sampling, and sampling using two-level orthogonal arrays. Three post-selection transformations are also described: truncation, importance transformation, and discretization. Advantages and disadvantages of each method are summarized

  8. Current fluctuations and statistics during a large deviation event in an exactly solvable transport model

    International Nuclear Information System (INIS)

    Hurtado, Pablo I; Garrido, Pedro L

    2009-01-01

    We study the distribution of the time-integrated current in an exactly solvable toy model of heat conduction, both analytically and numerically. The simplicity of the model allows us to derive the full current large deviation function and the system statistics during a large deviation event. In this way we unveil a relation between system statistics at the end of a large deviation event and for intermediate times. The mid-time statistics is independent of the sign of the current, a reflection of the time-reversal symmetry of microscopic dynamics, while the end-time statistics does depend on the current sign, and also on its microscopic definition. We compare our exact results with simulations based on the direct evaluation of large deviation functions, analyzing the finite-size corrections of this simulation method and deriving detailed bounds for its applicability. We also show how the Gallavotti–Cohen fluctuation theorem can be used to determine the range of validity of simulation results

  9. Statistical sampling and modelling for cork oak and eucalyptus stands

    NARCIS (Netherlands)

    Paulo, M.J.

    2002-01-01

    This thesis focuses on the use of modern statistical methods to solve problems on sampling, optimal cutting time and agricultural modelling in Portuguese cork oak and eucalyptus stands. The results are contained in five chapters that have been submitted for publication

  10. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Scheid Anika

    2012-07-01

    Full Text Available Abstract Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent stochastic context-free grammar (SCFG that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples, where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones, then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst

  11. Statistical inference

    CERN Document Server

    Rohatgi, Vijay K

    2003-01-01

    Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth

  12. A Simple Sampling Method for Estimating the Accuracy of Large Scale Record Linkage Projects.

    Science.gov (United States)

    Boyd, James H; Guiver, Tenniel; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Anderson, Phil; Dickinson, Teresa

    2016-05-17

    Record linkage techniques allow different data collections to be brought together to provide a wider picture of the health status of individuals. Ensuring high linkage quality is important to guarantee the quality and integrity of research. Current methods for measuring linkage quality typically focus on precision (the proportion of accepted links that are correct), given the difficulty of measuring the proportion of false negatives. The aim of this work is to introduce and evaluate a sampling based method to estimate both precision and recall following record linkage. In the sampling based method, record-pairs from each threshold (including those below the identified cut-off for acceptance) are sampled and clerically reviewed. These results are then applied to the entire set of record-pairs, providing estimates of false positives and false negatives. This method was evaluated on a synthetically generated dataset, where the true match status (which records belonged to the same person) was known. The sampled estimates of linkage quality were relatively close to the actual linkage quality metrics calculated for the whole synthetic dataset. The precision and recall measures for seven reviewers were very consistent, with little variation in the clerical assessment results (overall agreement using the Fleiss Kappa statistic was 0.601). This method is presented as a possible means of accurately estimating matching quality and refining linkages in population level linkage studies. The sampling approach is especially important for large project linkages where the number of record pairs produced may be very large, often running into millions.
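
    The following sketch illustrates, on purely hypothetical band sizes and review counts, how clerical-review results sampled from each score band (including bands below the acceptance cut-off) can be scaled up to whole-of-linkage estimates of precision and recall.

      # Hypothetical example: record-pairs are grouped into score bands (including
      # bands below the acceptance cut-off). A random sample from each band is
      # clerically reviewed, and the observed match rate is scaled to the band size.
      bands = [
          # (total pairs in band, pairs sampled, sampled pairs that are true matches, accepted?)
          (100000, 200, 198, True),    # high-score band, above cut-off
          (20000,  200, 150, True),    # borderline band, above cut-off
          (50000,  200,  30, False),   # band just below cut-off
          (500000, 200,   2, False),   # low-score band
      ]

      tp = fp = fn = 0.0
      for total, sampled, matches, accepted in bands:
          est_matches = total * matches / sampled   # scale the sample result to the band
          if accepted:
              tp += est_matches
              fp += total - est_matches
          else:
              fn += est_matches                      # missed true matches below the cut-off

      precision = tp / (tp + fp)
      recall = tp / (tp + fn)
      print(f"estimated precision = {precision:.3f}, recall = {recall:.3f}")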

  13. A simulative comparison of respondent driven sampling with incentivized snowball sampling – the “strudel effect”

    Science.gov (United States)

    Gyarmathy, V. Anna; Johnston, Lisa G.; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A.

    2014-01-01

    Background Respondent driven sampling (RDS) and Incentivized Snowball Sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). Methods We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania (“original sample”) to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. Results The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1 to 12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. Conclusions When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called “strudel effect” is discussed in the paper. PMID:24360650

  14. A comparative analysis of the statistical properties of large mobile phone calling networks.

    Science.gov (United States)

    Li, Ming-Xia; Jiang, Zhi-Qiang; Xie, Wen-Jie; Miccichè, Salvatore; Tumminello, Michele; Zhou, Wei-Xing; Mantegna, Rosario N

    2014-05-30

    Mobile phone calling is one of the most widely used communication methods in modern society. The records of calls among mobile phone users provide a valuable proxy for understanding human communication patterns embedded in social networks. Mobile phone users call each other, forming a directed calling network. If only reciprocal calls are considered, we obtain an undirected mutual calling network. The preferential communication behavior between two connected users can be statistically tested, and it results in two Bonferroni networks with statistically validated edges. We perform a comparative analysis of the statistical properties of these four networks, which are constructed from the calling records of more than nine million individuals in Shanghai over a period of 110 days. We find that these networks share many common structural properties and also exhibit idiosyncratic features when compared with previously studied large mobile calling networks. The empirical findings provide an intriguing picture of a representative large social network that might shed new light on the modelling of large social networks.

  15. Strong laws for L- and U-statistics

    NARCIS (Netherlands)

    Aaronson, J; Burton, R; Dehling, H; Gilat, D; Hill, T; Weiss, B

    Strong laws of large numbers are given for L-statistics (linear combinations of order statistics) and for U-statistics (averages of kernels of random samples) for ergodic stationary processes, extending classical theorems of Hoeffding and of Helmers for iid sequences. Examples are given to show

  16. The large break LOCA evaluation method with the simplified statistic approach

    International Nuclear Information System (INIS)

    Kamata, Shinya; Kubo, Kazuo

    2004-01-01

    In 1989, the USNRC published the Code Scaling, Applicability and Uncertainty (CSAU) evaluation methodology for large break LOCA, which supported the revised rule for Emergency Core Cooling System performance. USNRC Regulatory Guide 1.157 requires that the peak cladding temperature (PCT) not exceed 2200°F with high probability (95th percentile). In recent years, overseas countries have developed statistical methodologies and best estimate codes with models that provide more realistic simulation of the phenomena, based on the CSAU evaluation methodology. To calculate the PCT probability distribution by Monte Carlo trials, there are approaches such as the response surface technique using polynomials, the order statistics method, etc. In order to perform a rational statistical analysis, Mitsubishi Heavy Industries, Ltd. (MHI) set out to develop a statistical LOCA method using the best estimate LOCA code MCOBRA/TRAC and the simplified code HOTSPOT. HOTSPOT is a Monte Carlo heat conduction solver used to evaluate the uncertainties of the significant fuel parameters at the PCT positions of the hot rod. Direct uncertainty sensitivity studies can be performed without a response surface because the Monte Carlo simulation for key parameters can be performed in a short time using HOTSPOT. With regard to the parameter uncertainties, MHI established a treatment in which bounding conditions are given for the LOCA boundary and plant initial conditions, and the Monte Carlo simulation using HOTSPOT is applied to the significant fuel parameters. The paper describes the large break LOCA evaluation method with the simplified statistical approach and the results of applying the method to a representative four-loop nuclear power plant. (author)
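
    The order statistics method mentioned above can be illustrated with a generic Wilks-type calculation; the sketch below is not the MHI methodology or the HOTSPOT code, and the PCT values it draws are hypothetical stand-ins for best estimate code runs.

      # Generic order-statistics (Wilks) sketch: find the smallest number of random
      # code runs N such that the maximum of N sampled PCT values bounds the 95th
      # percentile with 95% confidence, i.e. 1 - 0.95**N >= 0.95 (N = 59).
      import numpy as np

      def wilks_runs(quantile=0.95, confidence=0.95):
          n = 1
          while 1 - quantile ** n < confidence:
              n += 1
          return n

      if __name__ == "__main__":
          n_runs = wilks_runs()                  # 59 for the classic 95/95 criterion
          rng = np.random.default_rng(0)
          # Stand-in for N best estimate code runs with randomly sampled inputs:
          pct_values = rng.normal(1800.0, 80.0, size=n_runs)   # hypothetical PCTs in deg F
          bounding_pct = pct_values.max()        # 95/95 upper tolerance bound
          print(n_runs, round(bounding_pct, 1))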

  17. On the accuracy of protein determination in large biological samples by prompt gamma neutron activation analysis

    International Nuclear Information System (INIS)

    Kasviki, K.; Stamatelatos, I.E.; Yannakopoulou, E.; Papadopoulou, P.; Kalef-Ezra, J.

    2007-01-01

    A prompt gamma neutron activation analysis (PGNAA) facility has been developed for the determination of nitrogen and thus total protein in large volume biological samples or the whole body of small animals. In the present work, the accuracy of nitrogen determination by PGNAA in phantoms of known composition as well as in four raw ground meat samples of about 1 kg mass was examined. Dumas combustion and Kjeldahl techniques were also used for the assessment of nitrogen concentration in the meat samples. No statistically significant differences were found between the concentrations assessed by the three techniques. The results of this work demonstrate the applicability of PGNAA for the assessment of total protein in biological samples of 0.25-1.5 kg mass, such as a meat sample or the body of small animal even in vivo with an equivalent radiation dose of about 40 mSv

  18. On the accuracy of protein determination in large biological samples by prompt gamma neutron activation analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kasviki, K. [Institute of Nuclear Technology and Radiation Protection, NCSR 'Demokritos', Aghia Paraskevi, Attikis 15310 (Greece); Medical Physics Laboratory, Medical School, University of Ioannina, Ioannina 45110 (Greece); Stamatelatos, I.E. [Institute of Nuclear Technology and Radiation Protection, NCSR 'Demokritos', Aghia Paraskevi, Attikis 15310 (Greece)], E-mail: ion@ipta.demokritos.gr; Yannakopoulou, E. [Institute of Physical Chemistry, NCSR 'Demokritos', Aghia Paraskevi, Attikis 15310 (Greece); Papadopoulou, P. [Institute of Technology of Agricultural Products, NAGREF, Lycovrissi, Attikis 14123 (Greece); Kalef-Ezra, J. [Medical Physics Laboratory, Medical School, University of Ioannina, Ioannina 45110 (Greece)]

    2007-10-15

    A prompt gamma neutron activation analysis (PGNAA) facility has been developed for the determination of nitrogen and thus total protein in large volume biological samples or the whole body of small animals. In the present work, the accuracy of nitrogen determination by PGNAA in phantoms of known composition as well as in four raw ground meat samples of about 1 kg mass was examined. Dumas combustion and Kjeldahl techniques were also used for the assessment of nitrogen concentration in the meat samples. No statistically significant differences were found between the concentrations assessed by the three techniques. The results of this work demonstrate the applicability of PGNAA for the assessment of total protein in biological samples of 0.25-1.5 kg mass, such as a meat sample or the body of small animal even in vivo with an equivalent radiation dose of about 40 mSv.

  19. Statistical Methods and Tools for Hanford Staged Feed Tank Sampling

    Energy Technology Data Exchange (ETDEWEB)

    Fountain, Matthew S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Brigantic, Robert T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Peterson, Reid A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2013-10-01

    This report summarizes work conducted by Pacific Northwest National Laboratory to technically evaluate the current approach to staged feed sampling of high-level waste (HLW) sludge to meet waste acceptance criteria (WAC) for transfer from tank farms to the Hanford Waste Treatment and Immobilization Plant (WTP). The current sampling and analysis approach is detailed in the document titled Initial Data Quality Objectives for WTP Feed Acceptance Criteria, 24590-WTP-RPT-MGT-11-014, Revision 0 (Arakali et al. 2011). The goal of this current work is to evaluate and provide recommendations to support a defensible, technical and statistical basis for the staged feed sampling approach that meets WAC data quality objectives (DQOs).

  20. Sampling large random knots in a confined space

    International Nuclear Information System (INIS)

    Arsuaga, J; Blackstone, T; Diao, Y; Hinson, K; Karadayi, E; Saito, M

    2007-01-01

    DNA knots formed under extreme conditions of condensation, as in bacteriophage P4, are difficult to analyze experimentally and theoretically. In this paper, we propose to use the uniform random polygon model as a supplementary method to the existing methods for generating random knots in confinement. The uniform random polygon model allows us to sample knots with large crossing numbers and also to generate large diagrammatically prime knot diagrams. We show numerically that uniform random polygons sample knots with large minimum crossing numbers and certain complicated knot invariants (as those observed experimentally). We do this in terms of the knot determinants or colorings. Our numerical results suggest that the average determinant of a uniform random polygon of n vertices grows faster than O(e^{n^2}). We also investigate the complexity of prime knot diagrams. We show rigorously that the probability that a randomly selected 2D uniform random polygon of n vertices is almost diagrammatically prime goes to 1 as n goes to infinity. Furthermore, the average number of crossings in such a diagram is at the order of O(n^2). Therefore, the two-dimensional uniform random polygons offer an effective way in sampling large (prime) knots, which can be useful in various applications
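
    A minimal sketch of the uniform random polygon model is given below: vertices are drawn independently and uniformly in the unit cube, joined in order into a closed polygon, and the number of crossings in a planar projection is counted as a crude complexity measure. This is an illustration of the model only, not the authors' code, and degenerate (collinear) projections are ignored.

      # Sketch of the uniform random polygon (URP) model: n vertices are drawn
      # independently and uniformly in the unit cube and joined in order into a
      # closed polygon. As a crude complexity measure, we count crossings between
      # non-adjacent edges in the projection onto the xy-plane.
      import numpy as np

      def segments_cross(p1, p2, p3, p4):
          def orient(a, b, c):
              return np.sign((b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0]))
          return (orient(p1, p2, p3) != orient(p1, p2, p4) and
                  orient(p3, p4, p1) != orient(p3, p4, p2))

      def projected_crossings(n_vertices, rng):
          verts = rng.random((n_vertices, 3))          # uniform in the unit cube
          proj = verts[:, :2]                          # project onto the xy-plane
          edges = [(proj[i], proj[(i + 1) % n_vertices]) for i in range(n_vertices)]
          count = 0
          for i in range(n_vertices):
              for j in range(i + 2, n_vertices):
                  if i == 0 and j == n_vertices - 1:
                      continue                         # adjacent edges around the closure
                  if segments_cross(*edges[i], *edges[j]):
                      count += 1
          return count

      if __name__ == "__main__":
          rng = np.random.default_rng(1)
          print([projected_crossings(50, rng) for _ in range(3)])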

  1. Sampling large random knots in a confined space

    Science.gov (United States)

    Arsuaga, J.; Blackstone, T.; Diao, Y.; Hinson, K.; Karadayi, E.; Saito, M.

    2007-09-01

    DNA knots formed under extreme conditions of condensation, as in bacteriophage P4, are difficult to analyze experimentally and theoretically. In this paper, we propose to use the uniform random polygon model as a supplementary method to the existing methods for generating random knots in confinement. The uniform random polygon model allows us to sample knots with large crossing numbers and also to generate large diagrammatically prime knot diagrams. We show numerically that uniform random polygons sample knots with large minimum crossing numbers and certain complicated knot invariants (as those observed experimentally). We do this in terms of the knot determinants or colorings. Our numerical results suggest that the average determinant of a uniform random polygon of n vertices grows faster than O(e^{n^2}) . We also investigate the complexity of prime knot diagrams. We show rigorously that the probability that a randomly selected 2D uniform random polygon of n vertices is almost diagrammatically prime goes to 1 as n goes to infinity. Furthermore, the average number of crossings in such a diagram is at the order of O(n2). Therefore, the two-dimensional uniform random polygons offer an effective way in sampling large (prime) knots, which can be useful in various applications.

  2. Sampling large random knots in a confined space

    Energy Technology Data Exchange (ETDEWEB)

    Arsuaga, J [Department of Mathematics, San Francisco State University, 1600 Holloway Ave, San Francisco, CA 94132 (United States); Blackstone, T [Department of Computer Science, San Francisco State University, 1600 Holloway Ave., San Francisco, CA 94132 (United States); Diao, Y [Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223 (United States); Hinson, K [Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223 (United States); Karadayi, E [Department of Mathematics, University of South Florida, 4202 E Fowler Avenue, Tampa, FL 33620 (United States); Saito, M [Department of Mathematics, University of South Florida, 4202 E Fowler Avenue, Tampa, FL 33620 (United States)

    2007-09-28

    DNA knots formed under extreme conditions of condensation, as in bacteriophage P4, are difficult to analyze experimentally and theoretically. In this paper, we propose to use the uniform random polygon model as a supplementary method to the existing methods for generating random knots in confinement. The uniform random polygon model allows us to sample knots with large crossing numbers and also to generate large diagrammatically prime knot diagrams. We show numerically that uniform random polygons sample knots with large minimum crossing numbers and certain complicated knot invariants (as those observed experimentally). We do this in terms of the knot determinants or colorings. Our numerical results suggest that the average determinant of a uniform random polygon of n vertices grows faster than O(e^{n^2}). We also investigate the complexity of prime knot diagrams. We show rigorously that the probability that a randomly selected 2D uniform random polygon of n vertices is almost diagrammatically prime goes to 1 as n goes to infinity. Furthermore, the average number of crossings in such a diagram is at the order of O(n^2). Therefore, the two-dimensional uniform random polygons offer an effective way in sampling large (prime) knots, which can be useful in various applications.

  3. Importance sampling large deviations in nonequilibrium steady states. I

    Science.gov (United States)

    Ray, Ushnish; Chan, Garnet Kin-Lic; Limmer, David T.

    2018-03-01

    Large deviation functions contain information on the stability and response of systems driven into nonequilibrium steady states and in such a way are similar to free energies for systems at equilibrium. As with equilibrium free energies, evaluating large deviation functions numerically for all but the simplest systems is difficult because by construction they depend on exponentially rare events. In this first paper of a series, we evaluate different trajectory-based sampling methods capable of computing large deviation functions of time integrated observables within nonequilibrium steady states. We illustrate some convergence criteria and best practices using a number of different models, including a biased Brownian walker, a driven lattice gas, and a model of self-assembly. We show how two popular methods for sampling trajectory ensembles, transition path sampling and diffusion Monte Carlo, suffer from exponentially diverging correlations in trajectory space as a function of the bias parameter when estimating large deviation functions. Improving the efficiencies of these algorithms requires introducing guiding functions for the trajectories.
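
    To illustrate why such trajectory-based methods are needed, the sketch below estimates the scaled cumulant generating function of a time-integrated current for a simple discrete-time biased random walker by direct (brute-force) sampling; the walker, bias, and parameter values are illustrative assumptions, and the estimate degrades as the biasing parameter grows, which is precisely the regime where importance sampling or cloning becomes necessary.

      # Naive (direct-sampling) estimate of the scaled cumulant generating function
      # psi(lmbda) = (1/T) * log E[exp(lmbda * J_T)] for the time-integrated current
      # J_T of a discrete-time biased random walker. For large |lmbda| the average
      # is dominated by exponentially rare trajectories, which importance sampling
      # and cloning methods are designed to capture efficiently.
      import numpy as np

      def scgf_direct(lmbda, p_right=0.6, T=200, n_traj=20000, seed=0):
          rng = np.random.default_rng(seed)
          steps = rng.choice([1, -1], size=(n_traj, T), p=[p_right, 1 - p_right])
          J = steps.sum(axis=1)                          # time-integrated current
          return np.log(np.mean(np.exp(lmbda * J))) / T

      if __name__ == "__main__":
          for lam in (0.05, 0.2, 0.5):
              est = scgf_direct(lam)
              exact = np.log(0.6 * np.exp(lam) + 0.4 * np.exp(-lam))   # iid steps
              print(f"lambda={lam:.2f}  direct={est:.4f}  exact={exact:.4f}")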

  4. Importance sampling large deviations in nonequilibrium steady states. I.

    Science.gov (United States)

    Ray, Ushnish; Chan, Garnet Kin-Lic; Limmer, David T

    2018-03-28

    Large deviation functions contain information on the stability and response of systems driven into nonequilibrium steady states and in such a way are similar to free energies for systems at equilibrium. As with equilibrium free energies, evaluating large deviation functions numerically for all but the simplest systems is difficult because by construction they depend on exponentially rare events. In this first paper of a series, we evaluate different trajectory-based sampling methods capable of computing large deviation functions of time integrated observables within nonequilibrium steady states. We illustrate some convergence criteria and best practices using a number of different models, including a biased Brownian walker, a driven lattice gas, and a model of self-assembly. We show how two popular methods for sampling trajectory ensembles, transition path sampling and diffusion Monte Carlo, suffer from exponentially diverging correlations in trajectory space as a function of the bias parameter when estimating large deviation functions. Improving the efficiencies of these algorithms requires introducing guiding functions for the trajectories.

  5. Quantum probability, choice in large worlds, and the statistical structure of reality.

    Science.gov (United States)

    Ross, Don; Ladyman, James

    2013-06-01

    Classical probability models of incentive response are inadequate in "large worlds," where the dimensions of relative risk and the dimensions of similarity in outcome comparisons typically differ. Quantum probability models for choice in large worlds may be motivated pragmatically - there is no third theory - or metaphysically: statistical processing in the brain adapts to the true scale-relative structure of the universe.

  6. Statistical Model of Extreme Shear

    DEFF Research Database (Denmark)

    Larsen, Gunner Chr.; Hansen, Kurt Schaldemose

    2004-01-01

    In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements...... are consistent, given the inevitable uncertainties associated with the model as well as with the extreme value data analysis. Keywords: Statistical model, extreme wind conditions, statistical analysis, turbulence, wind loading, wind shear, wind turbines.

  7. Cosmological implications of a large complete quasar sample.

    Science.gov (United States)

    Segal, I E; Nicoll, J F

    1998-04-28

    Objective and reproducible determinations of the probabilistic significance levels of the deviations between theoretical cosmological prediction and direct model-independent observation are made for the Large Bright Quasar Sample [Foltz, C., Chaffee, F. H., Hewett, P. C., MacAlpine, G. M., Turnshek, D. A., et al. (1987) Astron. J. 94, 1423-1460]. The Expanding Universe model as represented by the Friedman-Lemaitre cosmology with parameters qo = 0, Lambda = 0 denoted as C1 and chronometric cosmology (no relevant adjustable parameters) denoted as C2 are the cosmologies considered. The mean and the dispersion of the apparent magnitudes and the slope of the apparent magnitude-redshift relation are the directly observed statistics predicted. The C1 predictions of these cosmology-independent quantities are deviant by as much as 11sigma from direct observation; none of the C2 predictions deviate by >2sigma. The C1 deviations may be reconciled with theory by the hypothesis of quasar "evolution," which, however, appears incapable of being substantiated through direct observation. The excellent quantitative agreement of the C1 deviations with those predicted by C2 without adjustable parameters for the results of analysis predicated on C1 indicates that the evolution hypothesis may well be a theoretical artifact.

  8. Statistical sampling applied to the radiological characterization of historical waste

    Directory of Open Access Journals (Sweden)

    Zaffora Biagio

    2016-01-01

    Full Text Available The evaluation of the activity of radionuclides in radioactive waste is required for its disposal in final repositories. Easy-to-measure nuclides, like γ-emitters and high-energy X-rays, can be measured via non-destructive nuclear techniques from outside a waste package. Some radionuclides are difficult-to-measure (DTM from outside a package because they are α- or β-emitters. The present article discusses the application of linear regression, scaling factors (SF and the so-called “mean activity method” to estimate the activity of DTM nuclides on metallic waste produced at the European Organization for Nuclear Research (CERN. Various statistical sampling techniques including simple random sampling, systematic sampling, stratified and authoritative sampling are described and applied to 2 waste populations of activated copper cables. The bootstrap is introduced as a tool to estimate average activities and standard errors in waste characterization. The analysis of the DTM Ni-63 is used as an example. Experimental and theoretical values of SFs are calculated and compared. Guidelines for sampling historical waste using probabilistic and non-probabilistic sampling are finally given.
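
    A minimal sketch of the bootstrap step described above is given below, applied to a hypothetical simple random sample of specific activities of a difficult-to-measure nuclide; the data values are invented for illustration.

      # Bootstrap sketch for waste characterization: estimate the mean specific
      # activity of a difficult-to-measure nuclide (e.g. Ni-63) and its standard
      # error from a simple random sample of items. Activity values are hypothetical.
      import numpy as np

      rng = np.random.default_rng(7)
      sample_activity = np.array([0.8, 1.2, 0.5, 2.1, 0.9, 1.7, 0.6, 1.1, 3.0, 0.7])  # Bq/g

      n_boot = 10000
      boot_means = np.array([
          rng.choice(sample_activity, size=sample_activity.size, replace=True).mean()
          for _ in range(n_boot)
      ])

      print(f"mean activity = {sample_activity.mean():.2f} Bq/g")
      print(f"bootstrap SE  = {boot_means.std(ddof=1):.2f} Bq/g")
      print(f"95% percentile interval = "
            f"({np.percentile(boot_means, 2.5):.2f}, {np.percentile(boot_means, 97.5):.2f}) Bq/g")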

  9. On the coupling of statistic sum of canonical and large canonical ensemble of interacting particles

    International Nuclear Information System (INIS)

    Vall, A.N.

    2000-01-01

    The possibility of refining the known result, based on the analytic properties of the grand statistical sum as a function of the absolute activity, for the boundary integral contribution to the statistical sum is considered. A strict asymptotic relation between the statistical sums of the canonical and large canonical ensembles of interacting particles is derived. [ru]

  10. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    Energy Technology Data Exchange (ETDEWEB)

    Martinez, Gregory D. [University of California, Physics and Astronomy Department, Los Angeles, CA (United States); McKay, James; Scott, Pat [Imperial College London, Department of Physics, Blackett Laboratory, London (United Kingdom); Farmer, Ben; Conrad, Jan [AlbaNova University Centre, Oskar Klein Centre for Cosmoparticle Physics, Stockholm (Sweden); Stockholm University, Department of Physics, Stockholm (Sweden); Roebber, Elinore [McGill University, Department of Physics, Montreal, QC (Canada); Putze, Antje [LAPTh, Universite de Savoie, CNRS, Annecy-le-Vieux (France); Collaboration: The GAMBIT Scanner Workgroup

    2017-11-15

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics. (orig.)
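
    As a generic illustration of the differential evolution family of algorithms (not of Diver itself or of its interface to ScannerBit), the following sketch maximizes a hypothetical two-dimensional log-likelihood with a rand/1/bin differential evolution scheme.

      # Generic differential evolution (rand/1/bin) sketch for maximizing a
      # log-likelihood over a box; illustrative only, not the Diver sampler itself.
      import numpy as np

      def log_like(x):                            # hypothetical 2-D target
          return -0.5 * np.sum((x - np.array([1.0, -2.0])) ** 2 / 0.3 ** 2)

      def differential_evolution(f, bounds, pop_size=30, F=0.8, CR=0.9, n_gen=200, seed=0):
          rng = np.random.default_rng(seed)
          lo, hi = bounds[:, 0], bounds[:, 1]
          pop = lo + rng.random((pop_size, len(lo))) * (hi - lo)
          fit = np.array([f(x) for x in pop])
          for _ in range(n_gen):
              for i in range(pop_size):
                  # Mutation: combine three distinct members other than i.
                  a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
                  mutant = np.clip(a + F * (b - c), lo, hi)
                  # Binomial crossover, keeping at least one mutant coordinate.
                  cross = rng.random(len(lo)) < CR
                  cross[rng.integers(len(lo))] = True
                  trial = np.where(cross, mutant, pop[i])
                  ft = f(trial)
                  if ft >= fit[i]:                # greedy selection (maximization)
                      pop[i], fit[i] = trial, ft
          best = np.argmax(fit)
          return pop[best], fit[best]

      if __name__ == "__main__":
          bounds = np.array([[-10.0, 10.0], [-10.0, 10.0]])
          x_best, ll_best = differential_evolution(log_like, bounds)
          print(x_best, ll_best)   # best-fit point should approach (1.0, -2.0)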

  11. Testing the statistical isotropy of large scale structure with multipole vectors

    International Nuclear Information System (INIS)

    Zunckel, Caroline; Huterer, Dragan; Starkman, Glenn D.

    2011-01-01

    A fundamental assumption in cosmology is that of statistical isotropy - that the Universe, on average, looks the same in every direction in the sky. Statistical isotropy has recently been tested stringently using cosmic microwave background data, leading to intriguing results on large angular scales. Here we apply some of the same techniques used in the cosmic microwave background to the distribution of galaxies on the sky. Using the multipole vector approach, where each multipole in the harmonic decomposition of galaxy density field is described by unit vectors and an amplitude, we lay out the basic formalism of how to reconstruct the multipole vectors and their statistics out of galaxy survey catalogs. We apply the algorithm to synthetic galaxy maps, and study the sensitivity of the multipole vector reconstruction accuracy to the density, depth, sky coverage, and pixelization of galaxy catalog maps.

  12. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    Science.gov (United States)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
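
    A minimal sketch of the parallel tempering idea described above is given below for a bimodal one-dimensional density; the target, temperature ladder, and proposal width are illustrative assumptions.

      # Minimal parallel tempering sketch: several Metropolis walkers run at
      # different "temperatures" on a bimodal density, and neighbouring replicas
      # occasionally attempt a state swap accepted with the Metropolis criterion.
      import numpy as np

      def log_p(x):                               # bimodal target with two far-apart wells
          return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

      def parallel_tempering(n_steps=20000, temps=(1.0, 2.0, 4.0, 8.0), seed=0):
          rng = np.random.default_rng(seed)
          x = np.zeros(len(temps))                # one walker per temperature
          cold_chain = []
          for step in range(n_steps):
              # Local Metropolis move at each temperature (tempered density p(x)^(1/T)).
              for k, T in enumerate(temps):
                  prop = x[k] + rng.normal(0.0, 1.0)
                  if np.log(rng.random()) < (log_p(prop) - log_p(x[k])) / T:
                      x[k] = prop
              # Attempt a swap between a random pair of neighbouring temperatures.
              k = rng.integers(len(temps) - 1)
              delta = (1.0 / temps[k] - 1.0 / temps[k + 1]) * (log_p(x[k + 1]) - log_p(x[k]))
              if np.log(rng.random()) < delta:
                  x[k], x[k + 1] = x[k + 1], x[k]
              cold_chain.append(x[0])
          return np.array(cold_chain)

      if __name__ == "__main__":
          chain = parallel_tempering()
          # Both modes should be visited; a single cold walker would often stay stuck in one.
          print(np.mean(chain > 0), np.mean(chain < 0))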

  13. Statistical sampling plans

    International Nuclear Information System (INIS)

    Jaech, J.L.

    1984-01-01

    In auditing and in inspection, one selects a number of items by some set of procedures and performs measurements which are compared with the operator's values. This session considers the problem of how to select the samples to be measured, and what kinds of measurements to make. In the inspection situation, the ultimate aim is to independently verify the operator's material balance. The effectiveness of the sample plan in achieving this objective is briefly considered. The discussion focuses on the model plant

  14. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, W.H.M.; Meijer, R.R.; Sijtsma, K.

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier's person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  15. The Statistics of Radio Astronomical Polarimetry: Disjoint, Superposed, and Composite Samples

    Energy Technology Data Exchange (ETDEWEB)

    Straten, W. van [Centre for Astrophysics and Supercomputing, Swinburne University of Technology, Hawthorn, VIC 3122 (Australia); Tiburzi, C., E-mail: willem.van.straten@aut.ac.nz [Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, D-53121 Bonn (Germany)

    2017-02-01

    A statistical framework is presented for the study of the orthogonally polarized modes of radio pulsar emission via the covariances between the Stokes parameters. To accommodate the typically heavy-tailed distributions of single-pulse radio flux density, the fourth-order joint cumulants of the electric field are used to describe the superposition of modes with arbitrary probability distributions. The framework is used to consider the distinction between superposed and disjoint modes, with particular attention to the effects of integration over finite samples. If the interval over which the polarization state is estimated is longer than the timescale for switching between two or more disjoint modes of emission, then the modes are unresolved by the instrument. The resulting composite sample mean exhibits properties that have been attributed to mode superposition, such as depolarization. Because the distinction between disjoint modes and a composite sample of unresolved disjoint modes depends on the temporal resolution of the observing instrumentation, the arguments in favor of superposed modes of pulsar emission are revisited, and observational evidence for disjoint modes is described. In principle, the four-dimensional covariance matrix that describes the distribution of sample mean Stokes parameters can be used to distinguish between disjoint modes, superposed modes, and a composite sample of unresolved disjoint modes. More comprehensive and conclusive interpretation of the covariance matrix requires more detailed consideration of various relevant phenomena, including temporally correlated subpulse modulation (e.g., jitter), statistical dependence between modes (e.g., covariant intensities and partial coherence), and multipath propagation effects (e.g., scintillation and scattering).

  16. Analysis of large soil samples for actinides

    Science.gov (United States)

    Maxwell III, Sherrod L. [Aiken, SC]

    2009-03-24

    A method of analyzing relatively large soil samples for actinides employs a separation process that includes cerium fluoride precipitation for removing the soil matrix; plutonium, americium, and curium are precipitated with cerium and hydrofluoric acid, followed by separation of these actinides using chromatography cartridges.

  17. Estimating statistical uncertainty of Monte Carlo efficiency-gain in the context of a correlated sampling Monte Carlo code for brachytherapy treatment planning with non-normal dose distribution.

    Science.gov (United States)

    Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr

    2012-01-01

    Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task, especially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.
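
    The bootstrap construction of a shortest 95% confidence interval can be sketched generically as follows; the simulated per-history scores and the simplified efficiency-gain definition are illustrative assumptions, not the quantities used in the study.

      # Bootstrap sketch for the shortest 95% confidence interval of a ratio-type
      # statistic such as a Monte Carlo efficiency gain. The per-history scores for
      # the "conventional" and "correlated" estimators are simulated here and are
      # purely illustrative; in practice they come from the Monte Carlo runs.
      import numpy as np

      rng = np.random.default_rng(3)
      conv = rng.lognormal(mean=0.0, sigma=1.5, size=5000)    # heavy-tailed scores
      corr = rng.lognormal(mean=0.0, sigma=0.4, size=5000)

      def efficiency_gain(a, b):
          # With equal per-history cost, the gain reduces to a ratio of relative variances.
          return (a.var(ddof=1) / a.mean() ** 2) / (b.var(ddof=1) / b.mean() ** 2)

      n_boot = 5000
      gains = np.empty(n_boot)
      for i in range(n_boot):
          ia = rng.integers(0, conv.size, conv.size)          # resample each run separately
          ib = rng.integers(0, corr.size, corr.size)
          gains[i] = efficiency_gain(conv[ia], corr[ib])

      gains.sort()
      k = int(np.ceil(0.95 * n_boot))                         # width of a 95% window
      widths = gains[k - 1:] - gains[:n_boot - k + 1]
      lo = int(np.argmin(widths))                             # start of the shortest window
      print(f"gain = {efficiency_gain(conv, corr):.1f}, "
            f"shortest 95% CI = ({gains[lo]:.1f}, {gains[lo + k - 1]:.1f})")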

  18. TRAN-STAT, Issue No. 3, January 1978. Topics discussed: some statistical aspects of compositing field samples

    International Nuclear Information System (INIS)

    Gilbert, R.O.

    1978-01-01

    Some statistical aspects of compositing field samples of soils for determining the content of Pu are discussed. Some of the potential problems involved in pooling samples are reviewed. This is followed by more detailed discussions and examples of compositing designs, adequacy of mixing, statistical models and their role in compositing, and related topics

  19. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    Science.gov (United States)

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. To date, multiple challenges in protein MS data analysis remain: management of large-scale and complex data sets; MS peak identification and indexing; and high-dimensional differential analysis of peaks with false discovery rate (FDR) control for the concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data with a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
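
    A minimal sketch of the peak-level significance filtering step is given below, using the Benjamini-Hochberg procedure to control the false discovery rate over hypothetical per-peak p-values; this illustrates the general approach rather than the specific pipeline behind the web portal.

      # Minimal sketch: per-peak p-values (e.g. from tests comparing two sample
      # categories) are thresholded with the Benjamini-Hochberg procedure to
      # control the false discovery rate. The p-values below are hypothetical.
      import numpy as np

      def benjamini_hochberg(pvals, fdr=0.05):
          pvals = np.asarray(pvals)
          order = np.argsort(pvals)
          m = len(pvals)
          thresholds = fdr * (np.arange(1, m + 1) / m)
          below = pvals[order] <= thresholds
          significant = np.zeros(m, dtype=bool)
          if below.any():
              cutoff = np.max(np.nonzero(below)[0])    # largest k with p_(k) <= k*q/m
              significant[order[:cutoff + 1]] = True
          return significant

      if __name__ == "__main__":
          peak_pvals = [0.0002, 0.009, 0.011, 0.04, 0.2, 0.5, 0.8, 0.031, 0.0005, 0.6]
          flags = benjamini_hochberg(peak_pvals, fdr=0.05)
          print([i for i, s in enumerate(flags) if s])   # indices of significant peaks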

  20. Statistical Model of Extreme Shear

    DEFF Research Database (Denmark)

    Hansen, Kurt Schaldemose; Larsen, Gunner Chr.

    2005-01-01

    In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of full-scale measurements recorded with a high sampling rate...

  1. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, Wilco H.M.; Meijer, R.R.; Sijtsma, Klaas

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier’s person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  2. Bias expansion of spatial statistics and approximation of differenced ...

    Indian Academy of Sciences (India)

    Investigations of spatial statistics, computed from lattice data in the plane, can lead to a special lattice point counting problem. The statistical goal is to expand the asymptotic expectation or large-sample bias of certain spatial covariance estimators, where this bias typically depends on the shape of a spatial sampling region.

  3. Elementary methods for statistical systems, mean field, large-n, and duality

    International Nuclear Information System (INIS)

    Itzykson, C.

    1983-01-01

    Renormalizable field theories are singled out by such precise restraints that regularization schemes must be used to break these invariances. Statistical methods can be adapted to these problems where asymptotically free models fail. This lecture surveys approximation schemes developed in the context of statistical mechanics. The confluence point of statistical mechanics and field theory is the use of discretized path integrals, where continuous space-time has been replaced by a regular lattice. Dynamic variables, a Boltzmann weight factor, and boundary conditions are the ingredients. Mean field approximations (field equations, the random field transform, and gauge invariant systems) are surveyed. In the large-N limit, vector models are found to simplify tremendously. The reasons why matrix models drawn from SU(n) gauge theories do not simplify are discussed. In the epilogue, random curves versus random surfaces are offered as an example where global and local symmetries are not alike

  4. Effects of (α,n) contaminants and sample multiplication on statistical neutron correlation measurements

    International Nuclear Information System (INIS)

    Dowdy, E.J.; Hansen, G.E.; Robba, A.A.; Pratt, J.C.

    1980-01-01

    The complete formalism for the use of statistical neutron fluctuation measurements for the nondestructive assay of fissionable materials has been developed. This formalism includes the effect of detector deadtime, neutron multiplicity, random neutron pulse contributions from (α,n) contaminants in the sample, and the sample multiplication of both fission-related and background neutrons

  5. STATISTICAL LANDMARKS AND PRACTICAL ISSUES REGARDING THE USE OF SIMPLE RANDOM SAMPLING IN MARKET RESEARCHES

    Directory of Open Access Journals (Sweden)

    CODRUŢA DURA

    2010-01-01

    Full Text Available The sample represents a particular segment of the statistical population chosen to represent it as a whole. The representativeness of the sample determines the accuracy for estimations made on the basis of calculating the research indicators and the inferential statistics. The method of random sampling is part of probabilistic methods which can be used within marketing research and it is characterized by the fact that it imposes the requirement that each unit belonging to the statistical population should have an equal chance of being selected for the sampling process. When the simple random sampling is meant to be rigorously put into practice, it is recommended to use the technique of random number tables in order to configure the sample which will provide information that the marketer needs. The paper also details the practical procedure implemented in order to create a sample for a marketing research by generating random numbers using the facilities offered by Microsoft Excel.
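
    The spreadsheet procedure described above has a direct programmatic analogue: the sketch below draws a simple random sample without replacement from a sampling frame using generated random numbers; the frame size and sample size are illustrative.

      # Simple random sampling sketch: each unit of the sampling frame has the same
      # chance of selection; the sample is drawn without replacement using generated
      # random numbers (the programmatic analogue of a random number table).
      import random

      frame = [f"respondent_{i:04d}" for i in range(1, 1201)]   # hypothetical frame of 1200 units
      random.seed(2010)                                         # fixed seed for reproducibility
      sample = random.sample(frame, k=120)                      # 10% simple random sample

      print(len(sample), sample[:5])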

  6. Replicability of time-varying connectivity patterns in large resting state fMRI samples.

    Science.gov (United States)

    Abrol, Anees; Damaraju, Eswar; Miller, Robyn L; Stephen, Julia M; Claus, Eric D; Mayer, Andrew R; Calhoun, Vince D

    2017-12-01

    The past few years have seen an emergence of approaches that leverage temporal changes in whole-brain patterns of functional connectivity (the chronnectome). In this chronnectome study, we investigate the replicability of the human brain's inter-regional coupling dynamics during rest by evaluating two different dynamic functional network connectivity (dFNC) analysis frameworks using 7 500 functional magnetic resonance imaging (fMRI) datasets. To quantify the extent to which the emergent functional connectivity (FC) patterns are reproducible, we characterize the temporal dynamics by deriving several summary measures across multiple large, independent age-matched samples. Reproducibility was demonstrated through the existence of basic connectivity patterns (FC states) amidst an ensemble of inter-regional connections. Furthermore, application of the methods to conservatively configured (statistically stationary, linear and Gaussian) surrogate datasets revealed that some of the studied state summary measures were indeed statistically significant and also suggested that this class of null model did not explain the fMRI data fully. This extensive testing of reproducibility of similarity statistics also suggests that the estimated FC states are robust against variation in data quality, analysis, grouping, and decomposition methods. We conclude that future investigations probing the functional and neurophysiological relevance of time-varying connectivity assume critical importance. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Subclinical delusional ideation and appreciation of sample size and heterogeneity in statistical judgment.

    Science.gov (United States)

    Galbraith, Niall D; Manktelow, Ken I; Morris, Neil G

    2010-11-01

    Previous studies demonstrate that people high in delusional ideation exhibit a data-gathering bias on inductive reasoning tasks. The current study set out to investigate the factors that may underpin such a bias by examining healthy individuals, classified as either high or low scorers on the Peters et al. Delusions Inventory (PDI). More specifically, whether high PDI scorers have a relatively poor appreciation of sample size and heterogeneity when making statistical judgments. In Expt 1, high PDI scorers made higher probability estimates when generalizing from a sample of 1 with regard to the heterogeneous human property of obesity. In Expt 2, this effect was replicated and was also observed in relation to the heterogeneous property of aggression. The findings suggest that delusion-prone individuals are less appreciative of the importance of sample size when making statistical judgments about heterogeneous properties; this may underpin the data gathering bias observed in previous studies. There was some support for the hypothesis that threatening material would exacerbate high PDI scorers' indifference to sample size.

  8. Sample Size Requirements for Assessing Statistical Moments of Simulated Crop Yield Distributions

    NARCIS (Netherlands)

    Lehmann, N.; Finger, R.; Klein, T.; Calanca, P.

    2013-01-01

    Mechanistic crop growth models are becoming increasingly important in agricultural research and are extensively used in climate change impact assessments. In such studies, statistics of crop yields are usually evaluated without the explicit consideration of sample size requirements. The purpose of

  9. Gibbs sampling on large lattice with GMRF

    Science.gov (United States)

    Marcotte, Denis; Allard, Denis

    2018-02-01

    Gibbs sampling is routinely used to sample truncated Gaussian distributions. These distributions naturally occur when associating latent Gaussian fields to category fields obtained by discrete simulation methods like multipoint, sequential indicator simulation and object-based simulation. The latent Gaussians are often used in data assimilation and history matching algorithms. When Gibbs sampling is applied on a large lattice, the computing cost can become prohibitive. The usual practice of using local neighborhoods is unsatisfactory, as it can diverge and does not exactly reproduce the desired covariance. A better approach is to use Gaussian Markov Random Fields (GMRF), which make it possible to compute the conditional distributions at any point without having to compute and invert the full covariance matrix. As the GMRF is locally defined, it allows simultaneous updating of all points that do not share neighbors (coding sets). We propose a new simultaneous Gibbs updating strategy on coding sets that can be efficiently computed by convolution and applied with an acceptance/rejection method in the truncated case. We study empirically the speed of convergence and the effects of the choice of boundary conditions, the correlation range, and GMRF smoothness. We show that the convergence is slower in the Gaussian case on the torus than for the finite case studied in the literature. However, in the truncated Gaussian case, we show that short scale correlation is quickly restored and the conditioning categories at each lattice point imprint the long scale correlation. Hence our approach makes it possible to realistically apply Gibbs sampling on large 2D or 3D lattices with the desired GMRF covariance.
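
    A minimal sketch of the coding-set (checkerboard) Gibbs update for a first-order GMRF on a periodic lattice is given below; the precision parameters are illustrative, and the truncation/acceptance-rejection step discussed in the abstract is omitted for brevity.

      # Checkerboard Gibbs sketch for a first-order Gaussian Markov random field on a
      # 2-D lattice (torus). With a precision matrix having diagonal d and value -b
      # for each of the 4 neighbours, the full conditional of a site is
      #   N( (b/d) * sum(neighbours), 1/d ),
      # so all sites of one colour (a coding set) can be updated simultaneously.
      import numpy as np

      def gibbs_gmrf(shape=(64, 64), b=0.24, d=1.0, n_sweeps=500, seed=0):
          rng = np.random.default_rng(seed)
          x = np.zeros(shape)
          rows, cols = np.indices(shape)
          colour = (rows + cols) % 2                      # 0 = "black" sites, 1 = "white" sites
          for _ in range(n_sweeps):
              for c in (0, 1):
                  # Sum of the 4 neighbours (periodic boundary), computed by shifts.
                  nb = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
                        np.roll(x, 1, 1) + np.roll(x, -1, 1))
                  mean = (b / d) * nb
                  update = mean + rng.normal(0.0, 1.0 / np.sqrt(d), size=shape)
                  x = np.where(colour == c, update, x)    # simultaneous coding-set update
          return x

      if __name__ == "__main__":
          field = gibbs_gmrf()
          # Report the field standard deviation and the lag-1 spatial correlation.
          print(field.std(), np.corrcoef(field[:, :-1].ravel(), field[:, 1:].ravel())[0, 1])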

  10. Efficient Partitioning of Large Databases without Query Statistics

    Directory of Open Access Journals (Sweden)

    Shahidul Islam KHAN

    2016-11-01

    Full Text Available An efficient way of improving the performance of a database management system is distributed processing. Distribution of data involves fragmentation or partitioning, replication, and the allocation process. Previous research works provided partitioning based on empirical data about the type and frequency of the queries. These solutions are not suitable at the initial stage of a distributed database as query statistics are not available then. In this paper, I have presented a fragmentation technique, Matrix based Fragmentation (MMF), which can be applied at the initial stage as well as at later stages of distributed databases. Instead of using empirical data, I have developed a matrix, Modified Create, Read, Update and Delete (MCRUD), to partition a large database properly. Allocation of fragments is done simultaneously in my proposed technique. So using MMF, no additional complexity is added for allocating the fragments to the sites of a distributed database as fragmentation is synchronized with allocation. The performance of a DDBMS can be improved significantly by avoiding frequent remote access and high data transfer among the sites. Results show that the proposed technique can solve the initial partitioning problem of large distributed databases.

  11. A spinner magnetometer for large Apollo lunar samples

    Science.gov (United States)

    Uehara, M.; Gattacceca, J.; Quesnel, Y.; Lepaulard, C.; Lima, E. A.; Manfredi, M.; Rochette, P.

    2017-10-01

    We developed a spinner magnetometer to measure the natural remanent magnetization of large Apollo lunar rocks in the storage vault of the Lunar Sample Laboratory Facility (LSLF) of NASA. The magnetometer mainly consists of a commercially available three-axial fluxgate sensor and a hand-rotating sample table with an optical encoder recording the rotation angles. The distance between the sample and the sensor is adjustable according to the sample size and magnetization intensity. The sensor and the sample are placed in a two-layer mu-metal shield to measure the sample natural remanent magnetization. The magnetic signals are acquired together with the rotation angle to obtain stacking of the measured signals over multiple revolutions. The developed magnetometer has a sensitivity of 5 × 10⁻⁷ Am² at the standard sensor-to-sample distance of 15 cm. This sensitivity is sufficient to measure the natural remanent magnetization of almost all the lunar basalt and breccia samples with mass above 10 g in the LSLF vault.

  12. A spinner magnetometer for large Apollo lunar samples.

    Science.gov (United States)

    Uehara, M; Gattacceca, J; Quesnel, Y; Lepaulard, C; Lima, E A; Manfredi, M; Rochette, P

    2017-10-01

    We developed a spinner magnetometer to measure the natural remanent magnetization of large Apollo lunar rocks in the storage vault of the Lunar Sample Laboratory Facility (LSLF) of NASA. The magnetometer mainly consists of a commercially available three-axial fluxgate sensor and a hand-rotating sample table with an optical encoder recording the rotation angles. The distance between the sample and the sensor is adjustable according to the sample size and magnetization intensity. The sensor and the sample are placed in a two-layer mu-metal shield to measure the sample natural remanent magnetization. The magnetic signals are acquired together with the rotation angle to obtain stacking of the measured signals over multiple revolutions. The developed magnetometer has a sensitivity of 5 × 10⁻⁷ Am² at the standard sensor-to-sample distance of 15 cm. This sensitivity is sufficient to measure the natural remanent magnetization of almost all the lunar basalt and breccia samples with mass above 10 g in the LSLF vault.

  13. Statistical Sampling For In-Service Inspection Of Liquid Waste Tanks At The Savannah River Site

    International Nuclear Information System (INIS)

    Harris, S.

    2011-01-01

    Savannah River Remediation, LLC (SRR) is implementing a statistical sampling strategy for In-Service Inspection (ISI) of Liquid Waste (LW) Tanks at the United States Department of Energy's Savannah River Site (SRS) in Aiken, South Carolina. As a component of SRS's corrosion control program, the ISI program assesses tank wall structural integrity through the use of ultrasonic testing (UT). The statistical strategy for ISI is based on the random sampling of a number of vertically oriented unit areas, called strips, within each tank. The number of strips to inspect was determined so as to attain, over time, a high probability of observing at least one of the worst 5% in terms of pitting and corrosion across all tanks. The probability estimation to determine the number of strips to inspect was performed using the hypergeometric distribution. Statistical tolerance limits for pit depth and corrosion rates were calculated by fitting the lognormal distribution to the data. In addition to the strip sampling strategy, a single strip within each tank was identified to serve as the baseline for a longitudinal assessment of the tank safe operational life. The statistical sampling strategy enables the ISI program to develop individual profiles of LW tank wall structural integrity that collectively provide a high confidence in their safety and integrity over operational lifetimes.
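
    As a rough illustration of the sample-size logic described above, the sketch below uses the hypergeometric distribution to find the smallest number of randomly selected strips needed to include, with a chosen probability, at least one of the worst 5% of strips; the population size and confidence level are hypothetical placeholders, not values from the SRS program:

```python
from scipy.stats import hypergeom

def strips_to_inspect(N, worst_fraction=0.05, confidence=0.95):
    """Smallest sample size n such that a simple random sample of n strips
    contains at least one of the worst `worst_fraction` strips with the
    requested probability (hypergeometric model, sampling without replacement)."""
    K = max(1, int(round(worst_fraction * N)))  # number of "worst" strips in the population
    for n in range(1, N + 1):
        p_none = hypergeom.pmf(0, N, K, n)      # P(sample misses all worst strips)
        if 1.0 - p_none >= confidence:
            return n
    return N

# Hypothetical population of 200 candidate strips across the tanks.
print(strips_to_inspect(N=200))
```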

  14. Neurocognitive impairment in a large sample of homeless adults with mental illness.

    Science.gov (United States)

    Stergiopoulos, V; Cusi, A; Bekele, T; Skosireva, A; Latimer, E; Schütz, C; Fernando, I; Rourke, S B

    2015-04-01

    This study examines neurocognitive functioning in a large, well-characterized sample of homeless adults with mental illness and assesses demographic and clinical factors associated with neurocognitive performance. A total of 1500 homeless adults with mental illness enrolled in the At Home Chez Soi study completed neuropsychological measures assessing speed of information processing, memory, and executive functioning. Sociodemographic and clinical data were also collected. Linear regression analyses were conducted to examine factors associated with neurocognitive performance. Approximately half of our sample met criteria for psychosis, major depressive disorder, and alcohol or substance use disorder, and nearly half had experienced severe traumatic brain injury. Overall, 72% of participants demonstrated cognitive impairment, including deficits in processing speed (48%), verbal learning (71%) and recall (67%), and executive functioning (38%). The overall statistical model explained 19.8% of the variance in the neurocognitive summary score, with reduced neurocognitive performance associated with older age, lower education, first language other than English or French, Black or Other ethnicity, and the presence of psychosis. Homeless adults with mental illness experience impairment in multiple neuropsychological domains. Much of the variance in our sample's cognitive performance remains unexplained, highlighting the need for further research in the mechanisms underlying cognitive impairment in this population. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. Statistical Analysis and validation

    NARCIS (Netherlands)

    Hoefsloot, H.C.J.; Horvatovich, P.; Bischoff, R.

    2013-01-01

    In this chapter guidelines are given for the selection of a few biomarker candidates from a large number of compounds with a relatively low number of samples. The main concepts concerning the statistical validation of the search for biomarkers are discussed. These complicated methods and concepts are

  16. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    Science.gov (United States)

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  17. Large Sample Neutron Activation Analysis: A Challenge in Cultural Heritage Studies

    International Nuclear Information System (INIS)

    Stamatelatos, I.E.; Tzika, F.

    2007-01-01

    Large sample neutron activation analysis complements and significantly extends the analytical tools available for cultural heritage and authentication studies, providing unique applications of non-destructive, multi-element analysis of materials that are too precious to damage for sampling purposes, representative sampling of heterogeneous materials, or even analysis of whole objects. In this work, correction factors for neutron self-shielding, gamma-ray attenuation and volume distribution of the activity in large volume samples composed of iron and ceramic material were derived. Moreover, the effect of inhomogeneity on the accuracy of the technique was examined.

  18. A cost-saving statistically based screening technique for focused sampling of a lead-contaminated site

    International Nuclear Information System (INIS)

    Moscati, A.F. Jr.; Hediger, E.M.; Rupp, M.J.

    1986-01-01

    High concentrations of lead in soils along an abandoned railroad line prompted a remedial investigation to characterize the extent of contamination across a 7-acre site. Contamination was thought to be spotty across the site, reflecting its past use in battery recycling operations at discrete locations. A screening technique was employed to delineate the more highly contaminated areas by testing a statistically determined minimum number of random samples from each of seven discrete site areas. The approach not only quickly identified those site areas which would require more extensive grid sampling, but also provided a statistically defensible basis for excluding other site areas from further consideration, thus saving the cost of additional sample collection and analysis. The reduction in the number of samples collected in "clean" areas of the site ranged from 45 to 60%.

  19. Ecotoxicology statistical sampling

    International Nuclear Information System (INIS)

    Saona, G.

    2012-01-01

    This presentation introduces general concepts in ecotoxicological sampling design, such as the distribution of organic or inorganic contaminants, microbiological contamination, and the choice of sampling positions for ecotoxicological bioassays within an ecosystem.

  20. A statistical rationale for establishing process quality control limits using fixed sample size, for critical current verification of SSC superconducting wire

    International Nuclear Information System (INIS)

    Pollock, D.A.; Brown, G.; Capone, D.W. II; Christopherson, D.; Seuntjens, J.M.; Woltz, J.

    1992-01-01

    This work has demonstrated the statistical concepts behind the XBAR R method for determining sample limits to verify billet Ic performance and process uniformity. Using a preliminary population estimate for μ and σ from a stable production lot of only 5 billets, we have shown that reasonable sensitivity to systematic process drift and random within-billet variation may be achieved by using per-billet subgroups of moderate size. The effects of subgroup size (n) and sampling risk (α and β) on the calculated control limits have been shown to be important factors that need to be carefully considered when selecting the actual number of measurements to be used per billet for each supplier process. Given the present method of testing, in which individual wire samples are ramped to Ic only once, with measurement uncertainty due to repeatability and reproducibility (typically > 1.4%), large subgroups (i.e., >30 per billet) appear to be unnecessary, except as an inspection tool to confirm wire process history for each spool. The introduction of the XBAR R method or a similar Statistical Quality Control procedure is recommended for use in the superconducting wire production program, particularly when the program transitions from requiring tests for all pieces of wire to sampling each production unit.
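
    A minimal sketch of the XBAR-R (X-bar and R chart) control-limit calculation referenced above, using the standard tabulated chart constants for small subgroup sizes; the critical-current values below are made-up placeholder data, not SSC wire results:

```python
import numpy as np

# Standard X-bar / R control chart constants (A2, D3, D4) for a few subgroup sizes n.
CONSTANTS = {2: (1.880, 0.0, 3.267),
             4: (0.729, 0.0, 2.282),
             5: (0.577, 0.0, 2.114)}

def xbar_r_limits(subgroups):
    """Compute X-bar and R chart centre lines and control limits.
    `subgroups` is a 2-D array: one row of Ic measurements per billet."""
    data = np.asarray(subgroups, dtype=float)
    n = data.shape[1]
    A2, D3, D4 = CONSTANTS[n]
    xbar = data.mean(axis=1)                   # per-billet subgroup means
    r = data.max(axis=1) - data.min(axis=1)    # per-billet ranges
    xbarbar, rbar = xbar.mean(), r.mean()
    return {"xbar": (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar),
            "range": (D3 * rbar, rbar, D4 * rbar)}

# Placeholder critical-current data (A) for 5 billets, 4 samples per billet.
ic = [[272, 268, 275, 270], [269, 271, 266, 273],
      [274, 270, 272, 268], [267, 272, 270, 269], [271, 269, 273, 270]]
print(xbar_r_limits(ic))
```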

  1. Some connections between importance sampling and enhanced sampling methods in molecular dynamics.

    Science.gov (United States)

    Lie, H C; Quer, J

    2017-11-21

    In molecular dynamics, enhanced sampling methods enable the collection of better statistics of rare events from a reference or target distribution. We show that a large class of these methods is based on the idea of importance sampling from mathematical statistics. We illustrate this connection by comparing the Hartmann-Schütte method for rare event simulation (J. Stat. Mech. Theor. Exp. 2012, P11004) and the Valsson-Parrinello method of variationally enhanced sampling [Phys. Rev. Lett. 113, 090601 (2014)]. We use this connection in order to discuss how recent results from the Monte Carlo methods literature can guide the development of enhanced sampling methods.
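
    As a bare-bones illustration of the importance-sampling idea the paper builds on, the sketch below estimates a small tail probability of a standard normal by sampling from a shifted (biased) proposal and reweighting each draw by the ratio of target to proposal densities; the target event and proposal shift are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100_000
threshold = 4.0                      # rare event: X > 4 under the standard normal

# Naive Monte Carlo: almost no samples land in the tail.
naive = (rng.standard_normal(n) > threshold).mean()

# Importance sampling: draw from a proposal centred on the rare region,
# then reweight by target density / proposal density.
x = rng.normal(loc=threshold, scale=1.0, size=n)
weights = norm.pdf(x) / norm.pdf(x, loc=threshold, scale=1.0)
is_estimate = np.mean((x > threshold) * weights)

print(f"naive: {naive:.2e}  importance sampling: {is_estimate:.2e}  "
      f"exact: {norm.sf(threshold):.2e}")
```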

  2. Survey of statistical and sampling needs for environmental monitoring of commercial low-level radioactive waste disposal facilities

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Thomas, J.M.

    1986-07-01

    This project was designed to develop guidance for implementing 10 CFR Part 61 and to determine the overall needs for sampling and statistical work in characterizing, surveying, monitoring, and closing commercial low-level waste sites. When cost-effectiveness and statistical reliability are of prime importance, then double sampling, compositing, and stratification (with optimal allocation) are identified as key issues. If the principal concern is avoiding questionable statistical practice, then the applicability of kriging (for assessing spatial pattern), methods for routine monitoring, and use of standard textbook formulae in reporting monitoring results should be reevaluated. Other important issues identified include sampling for estimating model parameters and the use of data from left-censored (less than detectable limits) distributions

  3. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying; Stein, Michael L.

    2014-01-01

    For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n³) operations and O(n²) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.

  5. Combining censored and uncensored data in a U-statistic: design and sample size implications for cell therapy research.

    Science.gov (United States)

    Moyé, Lemuel A; Lai, Dejian; Jing, Kaiyan; Baraniuk, Mary Sarah; Kwak, Minjung; Penn, Marc S; Wu, Colon O

    2011-01-01

    The assumptions that anchor large clinical trials are rooted in smaller, Phase II studies. In addition to specifying the target population, intervention delivery, and patient follow-up duration, physician-scientists who design these Phase II studies must select the appropriate response variables (endpoints). However, endpoint measures can be problematic. If the endpoint assesses the change in a continuous measure over time, then the occurrence of an intervening significant clinical event (SCE), such as death, can preclude the follow-up measurement. Finally, the ideal continuous endpoint measurement may be contraindicated in a fraction of the study patients, a change that requires a less precise substitution in this subset of participants. A score function that is based on the U-statistic can address these issues of 1) intercurrent SCEs and 2) response variable ascertainments that use different measurements of different precision. The scoring statistic is easy to apply, clinically relevant, and provides flexibility for the investigators' prospective design decisions. Sample size and power formulations for this statistic are provided as functions of clinical event rates and effect size estimates that are easy for investigators to identify and discuss. Examples are provided from current cardiovascular cell therapy research.

  6. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach.

    Directory of Open Access Journals (Sweden)

    Simon Boitard

    2016-03-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.
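
    The folded allele frequency spectrum mentioned above can be computed directly from unpolarized genotype calls; a minimal sketch, assuming a small diploid genotype matrix coded as 0/1/2 alternate-allele counts (the data here are made up for illustration and are unrelated to the cattle panels):

```python
import numpy as np

def folded_sfs(genotypes):
    """Folded allele frequency spectrum from an (individuals x SNPs) matrix of
    diploid genotypes coded 0/1/2; no knowledge of the ancestral allele is needed."""
    g = np.asarray(genotypes, dtype=int)
    n_chrom = 2 * g.shape[0]                                   # sampled chromosomes
    alt_counts = g.sum(axis=0)                                 # alternate-allele count per SNP
    minor = np.minimum(alt_counts, n_chrom - alt_counts)       # fold the spectrum
    return np.bincount(minor, minlength=n_chrom // 2 + 1)

# Toy data: 4 diploid individuals, 6 SNPs.
geno = [[0, 1, 2, 0, 1, 0],
        [0, 0, 2, 1, 1, 0],
        [1, 0, 1, 0, 2, 0],
        [0, 0, 2, 0, 2, 1]]
print(folded_sfs(geno))   # counts of SNPs with minor-allele count 0, 1, 2, 3, 4
```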

  7. A review of methods for sampling large airborne particles and associated radioactivity

    International Nuclear Information System (INIS)

    Garland, J.A.; Nicholson, K.W.

    1990-01-01

    Radioactive particles, tens of μm or more in diameter, are unlikely to be emitted directly from nuclear facilities with exhaust gas cleansing systems, but may arise in the case of an accident or where resuspension from contaminated surfaces is significant. Such particles may dominate deposition and, according to some workers, may contribute to inhalation doses. Quantitative sampling of large airborne particles is difficult because of their inertia and large sedimentation velocities. The literature describes conditions for unbiased sampling and the magnitude of sampling errors for idealised sampling inlets in steady winds. However, few air samplers for outdoor use have been assessed for adequacy of sampling. Many size selective sampling methods are found in the literature but few are suitable at the low concentrations that are often encountered in the environment. A number of approaches for unbiased sampling of large particles have been found in the literature. Some are identified as meriting further study, for application in the measurement of airborne radioactivity. (author)

  8. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    Directory of Open Access Journals (Sweden)

    M.M. Mohie El-Din

    2011-10-01

    In this paper, two sample Bayesian prediction intervals for order statistics (OS) are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two-sample case. The class of the inverse exponential-type distributions includes several important distributions such as the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.

  9. 105-DR Large Sodium Fire Facility decontamination, sampling, and analysis plan

    International Nuclear Information System (INIS)

    Knaus, Z.C.

    1995-01-01

    This is the decontamination, sampling, and analysis plan for the closure activities at the 105-DR Large Sodium Fire Facility at Hanford Reservation. This document supports the 105-DR Large Sodium Fire Facility Closure Plan, DOE-RL-90-25. The 105-DR LSFF, which operated from about 1972 to 1986, was a research laboratory that occupied the former ventilation supply room on the southwest side of the 105-DR Reactor facility in the 100-D Area of the Hanford Site. The LSFF was established to investigate fire fighting and safety associated with alkali metal fires in the liquid metal fast breeder reactor facilities. The decontamination, sampling, and analysis plan identifies the decontamination procedures, sampling locations, any special handling requirements, quality control samples, required chemical analysis, and data validation needed to meet the requirements of the 105-DR Large Sodium Fire Facility Closure Plan in compliance with the Resource Conservation and Recovery Act

  10. Constrained statistical inference: sample-size tables for ANOVA and regression

    Directory of Open Access Journals (Sweden)

    Leonard eVanbrabant

    2015-01-01

    Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient beta1 is larger than beta2 and beta3. The corresponding hypothesis is H: beta1 > {beta2, beta3} and this is known as an (order) constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction when an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a prespecified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30% to 50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., beta1 > beta2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., beta1 > 0).

  11. Statistical analyses of digital collections: Using a large corpus of systematic reviews to study non-citations

    DEFF Research Database (Denmark)

    Frandsen, Tove Faber; Nicolaisen, Jeppe

    2017-01-01

    Using statistical methods to analyse digital material makes it possible to detect patterns in big data that we would otherwise not be able to detect. This paper seeks to exemplify this fact by statistically analysing a large corpus of references in systematic reviews. The aim

  12. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    Science.gov (United States)

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
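
    A small illustration of the standardization step behind such normal-distribution probability statements; this is a generic example rather than anything from the article, and the mean, standard deviation, and cut-offs are arbitrary:

```python
from scipy.stats import norm

mu, sigma = 120.0, 15.0          # hypothetical population mean and standard deviation
x_low, x_high = 105.0, 150.0     # interval of interest

# Standardize to z-scores and use the standard normal CDF.
z_low, z_high = (x_low - mu) / sigma, (x_high - mu) / sigma
p = norm.cdf(z_high) - norm.cdf(z_low)
print(f"P({x_low} < X < {x_high}) = {p:.3f}")   # ~0.819 for these values
```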

  13. Large sample NAA facility and methodology development

    International Nuclear Information System (INIS)

    Roth, C.; Gugiu, D.; Barbos, D.; Datcu, A.; Aioanei, L.; Dobrea, D.; Taroiu, I. E.; Bucsa, A.; Ghinescu, A.

    2013-01-01

    A Large Sample Neutron Activation Analysis (LSNAA) facility has been developed at the TRIGA Annular Core Pulsed Reactor (ACPR) operated by the Institute for Nuclear Research in Pitesti, Romania. The central irradiation cavity of the ACPR core can accommodate a large irradiation device. The ACPR neutron flux characteristics are well known and spectrum adjustment techniques have been successfully applied to enhance the thermal component of the neutron flux in the central irradiation cavity. An analysis methodology was developed by using the MCNP code in order to estimate counting efficiency and correction factors for the major perturbing phenomena. Test experiments, comparison with classical instrumental neutron activation analysis (INAA) methods and an international inter-comparison exercise have been performed to validate the new methodology. (authors)

  14. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    Science.gov (United States)

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power to identify these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform an integrative analysis of Crohn's Disease data from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240,000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS .

  15. Sample preparation method for ICP-MS measurement of 99Tc in a large amount of environmental samples

    International Nuclear Information System (INIS)

    Kondo, M.; Seki, R.

    2002-01-01

    Sample preparation for the measurement of 99Tc in large amounts of soil and water samples by ICP-MS has been developed using 95mTc as a yield tracer. This method is based on the conventional method for small soil samples using incineration, acid digestion, extraction chromatography (TEVA resin) and ICP-MS measurement. Preliminary concentration of Tc by co-precipitation with ferric oxide has been introduced. Matrix materials in large samples were removed more thoroughly than in the previous method while maintaining a high recovery of Tc. The recovery of Tc was 70-80% for 100 g soil samples and 60-70% for 500 g of soil and 500 L of water samples. The detection limit of this method was evaluated as 0.054 mBq/kg in 500 g soil and 0.032 μBq/L in 500 L water. The determined value of 99Tc in the IAEA-375 sample (soil collected near the Chernobyl Nuclear Reactor) was 0.25 ± 0.02 Bq/kg. (author)

  16. Supporting Students to Develop Concepts Underlying Sampling and to Shuttle Between Contextual and Statistical Spheres

    NARCIS (Netherlands)

    Bakker, A.; Dierdorp, A.; Maanen, J.A. van; Eijkelhof, H.M.C.

    2012-01-01

    To stimulate students’ shuttling between contextual and statistical spheres, we based tasks on professional practices. This article focuses on two tasks to support reasoning about sampling by students aged 16-17. The purpose of the tasks was to find out which smaller sample size would have been

  17. Basics of modern mathematical statistics

    CERN Document Server

    Spokoiny, Vladimir

    2015-01-01

    This textbook provides a unified and self-contained presentation of the main approaches to and ideas of mathematical statistics. It collects the basic mathematical ideas and tools needed as a basis for more serious studies or even independent research in statistics. The majority of existing textbooks in mathematical statistics follow the classical asymptotic framework. Yet, as modern statistics has changed rapidly in recent years, new methods and approaches have appeared. The emphasis is on finite sample behavior, large parameter dimensions, and model misspecifications. The present book provides a fully self-contained introduction to the world of modern mathematical statistics, collecting the basic knowledge, concepts and findings needed for doing further research in the modern theoretical and applied statistics. This textbook is primarily intended for graduate and postdoc students and young researchers who are interested in modern statistical methods.

  18. Statistical issues in reporting quality data: small samples and casemix variation.

    Science.gov (United States)

    Zaslavsky, A M

    2001-12-01

    To present two key statistical issues that arise in analysis and reporting of quality data. Casemix variation is relevant to quality reporting when the units being measured have differing distributions of patient characteristics that also affect the quality outcome. When this is the case, adjustment using stratification or regression may be appropriate. Such adjustments may be controversial when the patient characteristic does not have an obvious relationship to the outcome. Stratified reporting poses problems for sample size and reporting format, but may be useful when casemix effects vary across units. Although there are no absolute standards of reliability, high reliabilities (interunit F ≥ 10 or reliability ≥ 0.9) are desirable for distinguishing above- and below-average units. When small or unequal sample sizes complicate reporting, precision may be improved using indirect estimation techniques that incorporate auxiliary information, and 'shrinkage' estimation can help to summarize the strength of evidence about units with small samples. With broader understanding of casemix adjustment and methods for analyzing small samples, quality data can be analysed and reported more accurately.
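
    A toy sketch of the 'shrinkage' idea mentioned above: each unit's mean is pulled toward the overall mean in proportion to how unreliable its own (small-sample) estimate is. The reliability formula and the data are illustrative assumptions, not taken from the article:

```python
import numpy as np

def shrink_unit_means(unit_means, unit_ns, within_var, between_var):
    """Empirical-Bayes style shrinkage: reliability = between-unit variance /
    (between-unit variance + within-unit variance / n); low-n units are pulled
    harder toward the overall mean."""
    unit_means = np.asarray(unit_means, dtype=float)
    unit_ns = np.asarray(unit_ns, dtype=float)
    reliability = between_var / (between_var + within_var / unit_ns)
    grand_mean = np.average(unit_means, weights=unit_ns)
    return reliability * unit_means + (1.0 - reliability) * grand_mean

# Hypothetical quality scores for four units with very different sample sizes.
means = [62.0, 80.0, 71.0, 90.0]
ns = [400, 12, 150, 5]
print(shrink_unit_means(means, ns, within_var=100.0, between_var=25.0))
```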

  19. Statistical Analysis of Large Simulated Yield Datasets for Studying Climate Effects

    Science.gov (United States)

    Makowski, David; Asseng, Senthold; Ewert, Frank; Bassu, Simona; Durand, Jean-Louis; Martre, Pierre; Adam, Myriam; Aggarwal, Pramod K.; Angulo, Carlos; Baron, Chritian

    2015-01-01

    Many studies have been carried out during the last decade to study the effect of climate change on crop yields and other key crop characteristics. In these studies, one or several crop models were used to simulate crop growth and development for different climate scenarios that correspond to different projections of atmospheric CO2 concentration, temperature, and rainfall changes (Semenov et al., 1996; Tubiello and Ewert, 2002; White et al., 2011). The Agricultural Model Intercomparison and Improvement Project (AgMIP; Rosenzweig et al., 2013) builds on these studies with the goal of using an ensemble of multiple crop models in order to assess effects of climate change scenarios for several crops in contrasting environments. These studies generate large datasets, including thousands of simulated crop yield data. They include series of yield values obtained by combining several crop models with different climate scenarios that are defined by several climatic variables (temperature, CO2, rainfall, etc.). Such datasets potentially provide useful information on the possible effects of different climate change scenarios on crop yields. However, it is sometimes difficult to analyze these datasets and to summarize them in a useful way due to their structural complexity; simulated yield data can differ among contrasting climate scenarios, sites, and crop models. Another issue is that it is not straightforward to extrapolate the results obtained for the scenarios to alternative climate change scenarios not initially included in the simulation protocols. Additional dynamic crop model simulations for new climate change scenarios are an option but this approach is costly, especially when a large number of crop models are used to generate the simulated data, as in AgMIP. Statistical models have been used to analyze responses of measured yield data to climate variables in past studies (Lobell et al., 2011), but the use of a statistical model to analyze yields simulated by complex

  20. Statistical evaluation of the data obtained from the K East Basin Sandfilter Backwash Pit samples

    International Nuclear Information System (INIS)

    Welsh, T.L.

    1994-01-01

    Samples were obtained from different locations in the K East Basin Sandfilter Backwash Pit to characterize the sludge material. These samples were analyzed chemically for elements, radionuclides, and residual compounds. The analytical results were statistically analyzed to determine the mean analyte content and the associated variability for each mean value.

  1. Discrimination of handlebar grip samples by fourier transform infrared microspectroscopy analysis and statistics

    Directory of Open Access Journals (Sweden)

    Zeyu Lin

    2017-01-01

    In this paper, the authors present a study on the discrimination of handlebar grip samples, to provide an effective forensic science service for hit-and-run traffic cases. 50 bicycle handlebar grip samples, 49 electric bike handlebar grip samples, and 96 motorcycle handlebar grip samples were randomly collected by the local police in Beijing (China). Fourier transform infrared microspectroscopy (FTIR) was used as the analytical technique. Target absorption selection, data pretreatment, and discrimination of linked and unlinked samples were then the three steps used to improve the discrimination of FTIR spectra collected from different handlebar grip samples. Principal component analysis and receiver operating characteristic curves were used to evaluate the different data selection and data pretreatment methods, respectively. It is possible to explore the evidential value of handlebar grip residue evidence through instrumental analysis and statistical treatment, and this provides a universal discrimination method for other forensic science samples as well.

  2. Application of Conventional and K0-Based Internal Monostandard NAA Using Reactor Neutrons for Compositional Analysis of Large Samples

    International Nuclear Information System (INIS)

    Reddy, A.V.R.; Acharya, R.; Swain, K. K.; Pujari, P.K.

    2018-01-01

    Large sample neutron activation analysis (LSNAA) work was carried out for samples of coal, uranium ore, stainless steel, ancient and new clay potteries, dross and a clay pottery replica from Peru using low-flux, highly thermalized irradiation sites. Large as well as non-standard geometry samples (1 g - 0.5 kg) were irradiated using the thermal column (TC) facility of the Apsara reactor as well as the graphite reflector position of the critical facility (CF) at Bhabha Atomic Research Centre, Mumbai. Small (10 - 500 mg) samples were also irradiated at the core position of the Apsara reactor, the pneumatic carrier facility (PCF) of the Dhruva reactor and the pneumatic fast transfer facility (PFTS) of the KAMINI reactor. Irradiation positions were characterized using an indium flux monitor for TC and CF, whereas multiple monitors were used at the other positions. Radioactive assay was carried out using high resolution gamma ray spectrometry. The k0-based internal monostandard NAA (IM-NAA) method was used to determine elemental concentration ratios with respect to Na in coal and uranium ore samples, Sc in pottery samples and Fe in stainless steel. In situ relative detection efficiency for each irradiated sample was obtained using γ rays of activation products in the required energy range. Representative sample sizes were arrived at for coal and uranium ore from plots of La/Na ratios as a function of sample mass. For the stainless steel sample of SS 304L, absolute concentrations were calculated from concentration ratios by a mass balance approach since all the major elements (Fe, Cr, Ni and Mn) were amenable to NAA. Concentration ratios obtained by IM-NAA were used for a provenance study of 30 clay potteries obtained from excavated Buddhist sites of AP, India. The La to Ce concentration ratios were used for preliminary grouping, and concentration ratios of 15 elements with respect to Sc were used in statistical cluster analysis for confirmation of grouping. Concentrations of Au and Ag were determined in not so

  3. CORRELATION ANALYSIS OF A LARGE SAMPLE OF NARROW-LINE SEYFERT 1 GALAXIES: LINKING CENTRAL ENGINE AND HOST PROPERTIES

    International Nuclear Information System (INIS)

    Xu Dawei; Komossa, S.; Wang Jing; Yuan Weimin; Zhou Hongyan; Lu Honglin; Li Cheng; Grupe, Dirk

    2012-01-01

    We present a statistical study of a large, homogeneously analyzed sample of narrow-line Seyfert 1 (NLS1) galaxies, accompanied by a comparison sample of broad-line Seyfert 1 (BLS1) galaxies. Optical emission-line and continuum properties are subjected to correlation analyses, in order to identify the main drivers of the correlation space of active galactic nuclei (AGNs), and of NLS1 galaxies in particular. For the first time, we have established the density of the narrow-line region as a key parameter in Eigenvector 1 space, as important as the Eddington ratio L/L_Edd. This is important because it links the properties of the central engine with the properties of the host galaxy, i.e., the interstellar medium (ISM). We also confirm previously found correlations involving the line width of Hβ and the strength of the Fe II and [O III] λ5007 emission lines, and we confirm the important role played by L/L_Edd in driving the properties of NLS1 galaxies. A spatial correlation analysis shows that the large-scale environments of the BLS1 and NLS1 galaxies of our sample are similar. If mergers are rare in our sample, accretion-driven winds, on the one hand, or bar-driven inflows, on the other hand, may account for the strong dependence of Eigenvector 1 on ISM density.

  4. Utilization of AHWR critical facility for research and development work on large sample NAA

    International Nuclear Information System (INIS)

    Acharya, R.; Dasari, K.B.; Pujari, P.K.; Swain, K.K.; Reddy, A.V.R.; Verma, S.K.; De, S.K.

    2014-01-01

    The graphite reflector position of the AHWR critical facility (CF) was utilized for analysis of large size (g-kg scale) samples using internal monostandard neutron activation analysis (IM-NAA). The reactor position was characterized by the cadmium ratio method using an In monitor for total flux and the sub-cadmium to epithermal flux ratio (f). Large sample neutron activation analysis (LSNAA) work was carried out for samples of stainless steel, ancient and new clay potteries and dross. Large as well as non-standard geometry samples (1 g - 0.5 kg) were irradiated. Radioactive assay was carried out using high resolution gamma ray spectrometry. Concentration ratios obtained by IM-NAA were used for a provenance study of 30 clay potteries obtained from excavated Buddhist sites of AP, India. Concentrations of Au and Ag were determined in three large, not so homogeneous samples of dross. An X-Z rotary scanning unit has been installed for counting large and not so homogeneous samples. (author)

  5. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    International Nuclear Information System (INIS)

    Harris, S.P.

    1997-01-01

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. Samples in small 3 ml containers (inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and because no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock-up Facility using 3 ml inserts and 15 ml peanut vials. A number of the insert samples were analyzed by Cold Chem and compared with full peanut vial samples analyzed by the current methods. The remaining inserts were analyzed by

  6. Ship detection using STFT sea background statistical modeling for large-scale oceansat remote sensing image

    Science.gov (United States)

    Wang, Lixia; Pei, Jihong; Xie, Weixin; Liu, Jinyuan

    2018-03-01

    Large-scale oceansat remote sensing images cover a large area of sea surface, whose fluctuation can be considered a non-stationary process. The Short-Time Fourier Transform (STFT) is a suitable analysis tool for time-varying non-stationary signals. In this paper, a novel ship detection method using 2-D STFT sea-background statistical modeling for large-scale oceansat remote sensing images is proposed. First, the large-scale oceansat remote sensing image is divided into small sub-blocks, and the 2-D STFT is applied to each sub-block individually. Second, the 2-D STFT spectra of the sub-blocks are studied, and a clear difference in characteristics between sea background and non-sea background is found. Finally, a statistical model for all valid frequency points in the STFT spectrum of the sea background is given, and a ship detection method based on 2-D STFT spectrum modeling is proposed. Experimental results show that the proposed algorithm can detect ship targets with a high recall rate and a low miss rate.
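
    A compact sketch of the block-wise 2-D spectral analysis described above: the image is tiled into sub-blocks, each block is windowed and transformed with a 2-D FFT, and a simple spectral statistic is computed per block. The block size, window, and statistic are illustrative choices, not the paper's exact model:

```python
import numpy as np

def blockwise_spectra(image, block=32):
    """Tile an image into non-overlapping blocks, apply a 2-D windowed FFT to
    each block, and return a per-block spectral energy statistic."""
    img = np.asarray(image, dtype=float)
    h, w = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    window = np.outer(np.hanning(block), np.hanning(block))
    stats = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            tile = img[i:i + block, j:j + block] * window
            spectrum = np.abs(np.fft.fftshift(np.fft.fft2(tile)))
            stats.append(spectrum.mean())     # crude per-block spectral statistic
    return np.array(stats).reshape(h // block, w // block)

# Synthetic "sea" image with a bright ship-like blob in one corner.
rng = np.random.default_rng(1)
sea = rng.normal(0.0, 1.0, size=(128, 128))
sea[96:104, 96:104] += 8.0
print(blockwise_spectra(sea).round(1))
```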

  7. Statistical sampling plan for the TRU waste assay facility

    International Nuclear Information System (INIS)

    Beauchamp, J.J.; Wright, T.; Schultz, F.J.; Haff, K.; Monroe, R.J.

    1983-08-01

    Due to limited space, there is a need to dispose appropriately of the Oak Ridge National Laboratory transuranic waste which is presently stored below ground in 55-gal (208-l) drums within weather-resistant structures. Waste containing less than 100 nCi/g of transuranics can be removed from the present storage and buried, while waste containing greater than 100 nCi/g must continue to be retrievably stored. To make the measurements needed to determine which drums can be buried, a transuranic Neutron Interrogation Assay System (NIAS) has been developed at Los Alamos National Laboratory; it can make the needed measurements much faster than previous techniques, which involved γ-ray spectroscopy. The previous techniques are reliable but time consuming. Therefore, a validation study has been planned to determine the ability of the NIAS to make adequate measurements. The validation of the NIAS will be based on a paired comparison of a sample of measurements made by the previous techniques and the NIAS. The purpose of this report is to describe the proposed sampling plan and the statistical analyses needed to validate the NIAS. 5 references, 4 figures, 5 tables

  8. Exploring Technostress: Results of a Large Sample Factor Analysis

    OpenAIRE

    Jonušauskas, Steponas; Raišienė, Agota Giedrė

    2016-01-01

    With reference to the results of a large sample factor analysis, the article aims to propose a framework for examining technostress in a population. The survey and principal component analysis of a sample consisting of 1013 individuals who use ICT in their everyday work were implemented in the research. 13 factors combine 68 questions and explain 59.13 per cent of the dispersion in answers. Based on the factor analysis, the questionnaire was reframed and prepared to reasonably analyze the respondents’ an...

  9. Remote sensing data with the conditional latin hypercube sampling and geostatistical approach to delineate landscape changes induced by large chronological physical disturbances.

    Science.gov (United States)

    Lin, Yu-Pin; Chu, Hone-Jay; Wang, Cheng-Long; Yu, Hsiao-Hsuan; Wang, Yung-Chieh

    2009-01-01

    This study applies variogram analyses of normalized difference vegetation index (NDVI) images derived from SPOT HRV images obtained before and after the Chi-Chi earthquake in the Chenyulan watershed, Taiwan, as well as images after four large typhoons, to delineate the spatial patterns, spatial structures and spatial variability of landscapes caused by these large disturbances. The conditional Latin hypercube sampling approach was applied to select samples from multiple NDVI images. Kriging and sequential Gaussian simulation with sufficient samples were then used to generate maps of NDVI images. The variography of the NDVI image results demonstrates that spatial patterns of disturbed landscapes were successfully delineated by variogram analysis in the study areas. The high-magnitude Chi-Chi earthquake created spatial landscape variations in the study area. After the earthquake, the cumulative impacts of typhoons on landscape patterns depended on the magnitudes and paths of the typhoons, but were not always evident in the spatiotemporal variability of landscapes in the study area. The statistics and spatial structures of multiple NDVI images were captured by 3,000 samples from 62,500 grid cells in the NDVI images. Kriging and sequential Gaussian simulation with the 3,000 samples effectively reproduced the spatial patterns of the NDVI images. Overall, the proposed approach, which integrates the conditional Latin hypercube sampling approach, variograms, kriging and sequential Gaussian simulation of remotely sensed images, efficiently monitors, samples and maps the effects of large chronological disturbances on the spatial characteristics of landscape changes, including spatial variability and heterogeneity.

  10. Statistics and sampling in transuranic studies

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Gilbert, R.O.

    1980-01-01

    The existing data on transuranics in the environment exhibit a remarkably high variability from sample to sample (coefficients of variation of 100% or greater). This chapter stresses the necessity of adequate sample size and suggests various ways to increase sampling efficiency. Objectives in sampling are regarded as being of great importance in making decisions as to sampling methodology. Four different classes of sampling methods are described: (1) descriptive sampling, (2) sampling for spatial pattern, (3) analytical sampling, and (4) sampling for modeling. A number of research needs are identified in the various sampling categories along with several problems that appear to be common to two or more such areas

  11. Transport Coefficients from Large Deviation Functions

    Directory of Open Access Journals (Sweden)

    Chloe Ya Gao

    2017-10-01

    We describe a method for computing transport coefficients from the direct evaluation of large deviation functions. This method is general, relying on only equilibrium fluctuations, and is statistically efficient, employing trajectory based importance sampling. Equilibrium fluctuations of molecular currents are characterized by their large deviation functions, which are scaled cumulant generating functions analogous to the free energies. A diffusion Monte Carlo algorithm is used to evaluate the large deviation functions, from which arbitrary transport coefficients are derivable. We find significant statistical improvement over traditional Green–Kubo based calculations. The systematic and statistical errors of this method are analyzed in the context of specific transport coefficient calculations, including the shear viscosity, interfacial friction coefficient, and thermal conductivity.

  12. Transport Coefficients from Large Deviation Functions

    Science.gov (United States)

    Gao, Chloe; Limmer, David

    2017-10-01

    We describe a method for computing transport coefficients from the direct evaluation of large deviation functions. This method is general, relying on only equilibrium fluctuations, and is statistically efficient, employing trajectory based importance sampling. Equilibrium fluctuations of molecular currents are characterized by their large deviation functions, which are scaled cumulant generating functions analogous to the free energy. A diffusion Monte Carlo algorithm is used to evaluate the large deviation functions, from which arbitrary transport coefficients are derivable. We find significant statistical improvement over traditional Green-Kubo based calculations. The systematic and statistical errors of this method are analyzed in the context of specific transport coefficient calculations, including the shear viscosity, interfacial friction coefficient, and thermal conductivity.

  13. 105-DR Large sodium fire facility soil sampling data evaluation report

    International Nuclear Information System (INIS)

    Adler, J.G.

    1996-01-01

    This report evaluates the soil sampling activities, soil sample analysis, and soil sample data associated with the closure activities at the 105-DR Large Sodium Fire Facility. The evaluation compares these activities to the regulatory requirements for meeting clean closure. The report concludes that there is no soil contamination from the waste treatment activities

  14. 4P: fast computing of population genetics statistics from large DNA polymorphism panels.

    Science.gov (United States)

    Benazzo, Andrea; Panziera, Alex; Bertorelle, Giorgio

    2015-01-01

    Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses on large panels of genomic data. It is also particularly suitable for analyzing multiple data sets produced in simulation studies. Unix, Windows, and macOS versions are provided, as well as the source code for easier pipeline implementations.
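
    As a rough illustration of the kind of per-population statistic such tools compute, the sketch below builds a two-population joint allele frequency spectrum from a small SNP matrix; the genotype coding and population labels are made up for the example (this is not 4P's input format or code):

```python
import numpy as np

def joint_sfs(genotypes, pop_labels):
    """Joint allele frequency spectrum for two populations from an
    (individuals x SNPs) matrix of diploid genotypes coded 0/1/2."""
    g = np.asarray(genotypes, dtype=int)
    labels = np.asarray(pop_labels)
    pops = sorted(set(labels))
    counts = [g[labels == p].sum(axis=0) for p in pops]     # derived-allele counts per pop
    sizes = [2 * int(np.sum(labels == p)) for p in pops]    # chromosomes per pop
    sfs = np.zeros((sizes[0] + 1, sizes[1] + 1), dtype=int)
    for c0, c1 in zip(counts[0], counts[1]):
        sfs[c0, c1] += 1                                    # one SNP counted per cell
    return sfs

# Toy data: 6 individuals (3 per population), 5 SNPs.
geno = [[0, 1, 2, 0, 1], [0, 0, 1, 1, 2], [1, 0, 2, 0, 1],
        [2, 1, 0, 0, 2], [1, 2, 0, 1, 2], [2, 2, 1, 0, 2]]
pops = ["A", "A", "A", "B", "B", "B"]
print(joint_sfs(geno, pops))
```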

  15. The ESO Diffuse Interstellar Bands Large Exploration Survey (EDIBLES) . I. Project description, survey sample, and quality assessment

    Science.gov (United States)

    Cox, Nick L. J.; Cami, Jan; Farhang, Amin; Smoker, Jonathan; Monreal-Ibero, Ana; Lallement, Rosine; Sarre, Peter J.; Marshall, Charlotte C. M.; Smith, Keith T.; Evans, Christopher J.; Royer, Pierre; Linnartz, Harold; Cordiner, Martin A.; Joblin, Christine; van Loon, Jacco Th.; Foing, Bernard H.; Bhatt, Neil H.; Bron, Emeric; Elyajouri, Meriem; de Koter, Alex; Ehrenfreund, Pascale; Javadi, Atefeh; Kaper, Lex; Khosroshadi, Habib G.; Laverick, Mike; Le Petit, Franck; Mulas, Giacomo; Roueff, Evelyne; Salama, Farid; Spaans, Marco

    2017-10-01

    The carriers of the diffuse interstellar bands (DIBs) are largely unidentified molecules ubiquitously present in the interstellar medium (ISM). After decades of study, two strong and possibly three weak near-infrared DIBs have recently been attributed to the C60+ fullerene based on observational and laboratory measurements. There is great promise for the identification of the over 400 other known DIBs, as this result could provide chemical hints towards other possible carriers. In an effort to systematically study the properties of the DIB carriers, we have initiated a new large-scale observational survey: the ESO Diffuse Interstellar Bands Large Exploration Survey (EDIBLES). The main objective is to build on and extend existing DIB surveys to make a major step forward in characterising the physical and chemical conditions for a statistically significant sample of interstellar lines-of-sight, with the goal to reverse-engineer key molecular properties of the DIB carriers. EDIBLES is a filler Large Programme using the Ultraviolet and Visual Echelle Spectrograph at the Very Large Telescope at Paranal, Chile. It is designed to provide an observationally unbiased view of the presence and behaviour of the DIBs towards early-spectral-type stars whose lines-of-sight probe the diffuse-to-translucent ISM. Such a complete dataset will provide a deep census of the atomic and molecular content, physical conditions, chemical abundances and elemental depletion levels for each sightline. Achieving these goals requires a homogeneous set of high-quality data in terms of resolution (R ~ 70,000-100,000), sensitivity (S/N up to 1000 per resolution element), and spectral coverage (305-1042 nm), as well as a large sample size (100+ sightlines). In this first paper the goals, objectives and methodology of the EDIBLES programme are described and an initial assessment of the data is provided.

  16. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.P. [Westinghouse Savannah River Company, AIKEN, SC (United States)

    1997-09-18

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Either new DWPF analytical method could result in a two to three fold improvement in sample analysis time. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. The insert sample is named after the initial trials which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and because no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock

  17. Sampling design in large-scale vegetation studies: Do not sacrifice ecological thinking to statistical purism!

    Czech Academy of Sciences Publication Activity Database

    Roleček, J.; Chytrý, M.; Hájek, Michal; Lvončík, S.; Tichý, L.

    2007-01-01

    Vol. 42 (2007), pp. 199-208. ISSN 1211-9520. R&D Projects: GA AV ČR IAA6163303; GA ČR(CZ) GA206/05/0020. Grant - others: GA AV ČR(CZ) KJB601630504. Institutional research plan: CEZ:AV0Z60050516. Keywords: Ecological methodology * Large-scale vegetation patterns * Macroecology. Subject RIV: EF - Botanics. Impact factor: 1.133, year: 2007

  18. Statistical Modeling of Large-Scale Signal Path Loss in Underwater Acoustic Networks

    Directory of Open Access Journals (Sweden)

    Manuel Perez Malumbres

    2013-02-01

    Full Text Available In an underwater acoustic channel, the propagation conditions are known to vary in time, causing the received signal strength to deviate from the nominal value predicted by a deterministic propagation model. To facilitate a large-scale system design in such conditions (e.g., power allocation), we have developed a statistical propagation model in which the transmission loss is treated as a random variable. By repeatedly computing the acoustic field, using ray tracing for a set of varying environmental conditions (surface height, wave activity, small node displacements around nominal locations, etc.), an ensemble of transmission losses is compiled and later used to infer the statistical model parameters. A reasonable agreement is found with a log-normal distribution, whose mean obeys a log-distance law and whose variance appears to be constant for a certain range of inter-node distances in a given deployment location. The statistical model is deemed useful for higher-level system planning, where simulation is needed to assess the performance of candidate network protocols under various resource allocation policies, i.e., to determine the transmit power and bandwidth allocation necessary to achieve a desired level of performance (connectivity, throughput, reliability, etc.).
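
    As an illustration of the modelling step described above, a minimal sketch is given below: it fits a log-distance mean trend to an ensemble of transmission losses and checks that the dB-domain residuals are approximately normal (i.e., log-normal loss in linear units). The sketch is in Python (NumPy/SciPy) and uses synthetic data with made-up parameter values; it is not the authors' ray-tracing ensemble or their exact fitting procedure.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        # Hypothetical ensemble: transmission loss TL (dB) computed for many
        # environmental realizations at each inter-node distance d (m).
        d = np.repeat(np.array([200.0, 500.0, 1000.0, 2000.0, 5000.0]), 200)
        k_true, tl0_true, sigma_true = 1.7, 40.0, 3.0    # assumed "true" values for the demo
        tl = tl0_true + 10.0 * k_true * np.log10(d / 100.0) + rng.normal(0.0, sigma_true, d.size)

        # Fit the mean trend TL(d) = TL0 + 10 * k * log10(d / d_ref) by least squares.
        x = 10.0 * np.log10(d / 100.0)
        k_hat, tl0_hat = np.polyfit(x, tl, 1)
        residuals = tl - (tl0_hat + k_hat * x)

        print(f"path-loss exponent k ~ {k_hat:.2f}, TL0 ~ {tl0_hat:.1f} dB")
        print(f"residual standard deviation ~ {residuals.std(ddof=2):.2f} dB")
        # Normality of the dB residuals corresponds to a log-normal loss in linear units.
        print(stats.normaltest(residuals))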

  19. Associations between sociodemographic, sampling and health factors and various salivary cortisol indicators in a large sample without psychopathology

    NARCIS (Netherlands)

    Vreeburg, Sophie A.; Kruijtzer, Boudewijn P.; van Pelt, Johannes; van Dyck, Richard; DeRijk, Roel H.; Hoogendijk, Witte J. G.; Smit, Johannes H.; Zitman, Frans G.; Penninx, Brenda

    Background: Cortisol levels are increasingly often assessed in large-scale psychosomatic research. Although determinants of different salivary cortisol indicators have been described, they have not yet been systematically studied within the same study with a large sample size. Sociodemographic,

  20. A Note on the Large Sample Properties of Estimators Based on Generalized Linear Models for Correlated Pseudo-observations

    DEFF Research Database (Denmark)

    Jacobsen, Martin; Martinussen, Torben

    2016-01-01

    Pseudo-values have proven very useful in censored data analysis in complex settings such as multi-state models. The approach was originally suggested by Andersen et al. (Biometrika, 90, 2003, 335), who also suggested estimating standard errors using classical generalized estimating equation results. These results were studied more formally in Graw et al. (Lifetime Data Anal., 15, 2009, 241), which derived some key results based on a second-order von Mises expansion. However, results concerning large sample properties of estimates based on regression models for pseudo-values still seem unclear. In this paper, we study these large sample properties in the simple setting of survival probabilities and show that the estimating function can be written as a U-statistic of second order, giving rise to an additional term that does not vanish asymptotically. We further show that previously advocated standard error...

  1. Detecting the Land-Cover Changes Induced by Large-Physical Disturbances Using Landscape Metrics, Spatial Sampling, Simulation and Spatial Analysis

    Directory of Open Access Journals (Sweden)

    Hone-Jay Chu

    2009-08-01

    Full Text Available The objectives of the study are to integrate conditional Latin Hypercube Sampling (cLHS), sequential Gaussian simulation (SGS) and spatial analysis of remotely sensed images in order to monitor the effects of large chronological disturbances on the spatial characteristics of landscape changes, including spatial heterogeneity and variability. The multiple NDVI images demonstrate that the spatial patterns of the disturbed landscapes were successfully delineated by spatial analyses such as the variogram, Moran's I and landscape metrics in the study area. The hybrid method delineates the spatial patterns and spatial variability of landscapes caused by these large disturbances. The cLHS approach is applied to select samples from Normalized Difference Vegetation Index (NDVI) images derived from SPOT HRV images in the Chenyulan watershed of Taiwan, and SGS with sufficient samples is then used to generate maps of the NDVI images. Finally, the simulated NDVI maps are verified using indices such as the correlation coefficient and the mean absolute error (MAE). The statistics and spatial structures of the multiple NDVI images therefore present a very robust behaviour, which supports the use of the index for quantifying landscape spatial patterns and land cover change. In addition, the results, made available through Open Geospatial techniques, can be accessed from web-based and end-user applications for watershed management.
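
    To make the spatial-autocorrelation step concrete, the following is a minimal sketch of global Moran's I for a gridded index such as NDVI, written in Python/NumPy with rook (4-neighbour) binary weights; the toy grids are illustrative and are not the study's SPOT-derived data.

        import numpy as np

        def morans_i(grid: np.ndarray) -> float:
            """Global Moran's I for a 2-D grid using rook (4-neighbour) binary weights."""
            z = grid - grid.mean()
            num = 0.0   # sum_ij w_ij * z_i * z_j
            wsum = 0.0  # sum_ij w_ij
            rows, cols = grid.shape
            for dr, dc in ((0, 1), (1, 0)):        # right and down neighbours, each pair once
                zi = z[: rows - dr, : cols - dc]
                zj = z[dr:, dc:]
                num += 2.0 * np.sum(zi * zj)       # weights are symmetric, so double each pair
                wsum += 2.0 * zi.size
            return (grid.size / wsum) * (num / np.sum(z ** 2))

        rng = np.random.default_rng(1)
        noise = rng.normal(size=(50, 50))              # spatially random field -> I near 0
        smooth = np.cumsum(np.cumsum(noise, 0), 1)     # strongly autocorrelated field -> I near 1
        print(morans_i(noise), morans_i(smooth))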

  2. Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples

    Energy Technology Data Exchange (ETDEWEB)

    Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J

    2007-10-24

    Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharide, pure protein and protein mixture samples, LDA, PLSDA, and SIMCA were all found to produce excellent classification, given that a sufficient number of compound variables was calculated. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data that can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples.
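
    As a rough illustration of the PCA-followed-by-classification workflow described above, the sketch below chains PCA with LDA using scikit-learn on a synthetic spectra-like matrix. It is not the authors' actual pipeline or data; the class structure, dimensions and noise levels are invented for the demonstration.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(0)

        # Synthetic stand-in for ToF-SIMS spectra: 90 samples x 500 peak intensities,
        # three classes separated along a handful of latent directions.
        n_per_class, n_peaks, n_latent = 30, 500, 5
        labels = np.repeat([0, 1, 2], n_per_class)
        loadings = rng.normal(size=(n_latent, n_peaks))
        scores = rng.normal(size=(labels.size, n_latent)) + 2.0 * labels[:, None]
        X = scores @ loadings + rng.normal(scale=0.5, size=(labels.size, n_peaks))

        # PCA for dimensionality reduction followed by LDA for classification,
        # evaluated with cross-validation.
        model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
        acc = cross_val_score(model, X, labels, cv=5)
        print(f"cross-validated accuracy: {acc.mean():.2f} +/- {acc.std():.2f}")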

  3. Examination of statistical noise in SPECT image and sampling pitch

    International Nuclear Information System (INIS)

    Takaki, Akihiro; Soma, Tsutomu; Murase, Kenya; Watanabe, Hiroyuki; Murakami, Tomonori; Kawakami, Kazunori; Teraoka, Satomi; Kojima, Akihiro; Matsumoto, Masanori

    2008-01-01

    Statistical noise in single photon emission computed tomography (SPECT) images was examined for its relation to the total count and to the sampling pitch, using a simulation and a phantom experiment to obtain projection data under defined conditions. The SPECT simulation assumed a virtual, homogeneous water column (20 cm diameter) as the absorbing mass. The phantom experiment used a 3D Hoffman brain phantom (Data Spectrum Corp.) filled with 370 MBq of 99m Tc-pertechnetate solution and a facing two-detector SPECT machine with a low-energy/high-resolution collimator, E-CAM (Siemens). The projection data from the two methods were reconstructed by filtered back projection to produce transaxial images. The noise was evaluated visually, by the root mean square uncertainty calculated from the average count and the standard deviation (SD) in a region of interest (ROI) defined in the reconstructed images, and by the normalized mean square difference between each reconstructed slice and a reference image obtained with a common sampling pitch, for both the simulation and the phantom. In conclusion, it is recommended that the sampling pitch be set in the machine to approximate the value given by the sampling theorem, even though the projection counts per angular direction are then smaller for the same total data acquisition time. (R.T.)
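
    A minimal sketch of the ROI-based noise metric mentioned above (relative root mean square uncertainty, i.e., SD divided by mean within an ROI) is given below in Python/NumPy; the image is a synthetic Poisson-noise placeholder, not SPECT data.

        import numpy as np

        def roi_relative_noise(image: np.ndarray, roi_mask: np.ndarray) -> float:
            """Relative RMS uncertainty in an ROI: standard deviation / mean of pixel counts."""
            pixels = image[roi_mask]
            return pixels.std(ddof=1) / pixels.mean()

        rng = np.random.default_rng(0)
        true_counts = 100.0                                   # hypothetical mean counts per pixel
        image = rng.poisson(true_counts, size=(128, 128)).astype(float)
        roi = np.zeros((128, 128), dtype=bool)
        roi[48:80, 48:80] = True
        print(f"relative noise ~ {roi_relative_noise(image, roi):.3f} "
              f"(Poisson expectation ~ {1 / np.sqrt(true_counts):.3f})")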

  4. Data management in large-scale collaborative toxicity studies: how to file experimental data for automated statistical analysis.

    Science.gov (United States)

    Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette

    2013-06-01

    High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.
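
    The file-handling automation described above could, for example, look like the following Python/pandas sketch, which collates one Excel workbook per compound into a single standardized long-format table; the directory layout, sheet name and column names are assumptions for illustration, not the ACuteTox conventions.

        from pathlib import Path
        import pandas as pd

        # Hypothetical layout: one Excel workbook per compound, each with a sheet named
        # "raw" holding columns "concentration" and "response". The loop collates all
        # workbooks into one standardized long-format table for automated analysis.
        def collect_experiments(data_dir: str) -> pd.DataFrame:
            frames = []
            for path in sorted(Path(data_dir).glob("*.xlsx")):
                df = pd.read_excel(path, sheet_name="raw", usecols=["concentration", "response"])
                df["compound"] = path.stem          # track the source file / compound
                frames.append(df)
            return pd.concat(frames, ignore_index=True)

        if __name__ == "__main__":
            tidy = collect_experiments("./toxicity_data")
            tidy.to_csv("all_concentration_response.csv", index=False)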

  5. Gentile statistics with a large maximum occupation number

    International Nuclear Information System (INIS)

    Dai Wusheng; Xie Mi

    2004-01-01

    In Gentile statistics the maximum occupation number can take on unrestricted integers, 1 < n < ∞; the Bose-Einstein case is not recovered from Gentile statistics as n goes to N. Attention is also concentrated on the contribution of the ground state, which was ignored in the related literature. The thermodynamic behavior of a ν-dimensional Gentile ideal gas of particles with dispersion E = p^s/2m, where ν and s are arbitrary, is analyzed in detail. Moreover, we provide an alternative derivation of the partition function for Gentile statistics.

  6. Scalar energy fluctuations in Large-Eddy Simulation of turbulent flames: Statistical budgets and mesh quality criterion

    Energy Technology Data Exchange (ETDEWEB)

    Vervisch, Luc; Domingo, Pascale; Lodato, Guido [CORIA - CNRS and INSA de Rouen, Technopole du Madrillet, BP 8, 76801 Saint-Etienne-du-Rouvray (France); Veynante, Denis [EM2C - CNRS and Ecole Centrale Paris, Grande Voie des Vignes, 92295 Chatenay-Malabry (France)

    2010-04-15

    Large-Eddy Simulation (LES) provides space-filtered quantities to compare with measurements, which usually have been obtained using a different filtering operation; hence, numerical and experimental results can be examined side-by-side in a statistical sense only. Instantaneous, space-filtered and statistically time-averaged signals feature different characteristic length-scales, which can be combined in dimensionless ratios. From two canonical manufactured turbulent solutions, a turbulent flame and a passive scalar turbulent mixing layer, the critical values of these ratios under which measured and computed variances (resolved plus sub-grid scale) can be compared without resorting to additional residual terms are first determined. It is shown that actual Direct Numerical Simulation can hardly accommodate a sufficiently large range of length-scales to perform statistical studies of LES filtered reactive scalar-fields energy budget based on sub-grid scale variances; an estimation of the minimum Reynolds number allowing for such DNS studies is given. From these developments, a reliability mesh criterion emerges for scalar LES and scaling for scalar sub-grid scale energy is discussed. (author)

  7. Assessing the validity of single-item life satisfaction measures: results from three large samples.

    Science.gov (United States)

    Cheung, Felix; Lucas, Richard E

    2014-12-01

    The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS)-a more psychometrically established measure. Two large samples from Washington (N = 13,064) and Oregon (N = 2,277) recruited by the Behavioral Risk Factor Surveillance System and a representative German sample (N = 1,312) recruited by the Germany Socio-Economic Panel were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62-0.64; disattenuated r = 0.78-0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001-0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015-0.042). Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use.

  8. Statistical energy as a tool for binning-free, multivariate goodness-of-fit tests, two-sample comparison and unfolding

    International Nuclear Information System (INIS)

    Aslan, B.; Zech, G.

    2005-01-01

    We introduce the novel concept of statistical energy as a statistical tool. We define the statistical energy of statistical distributions in a similar way to the energy of electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests, and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications.
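
    A minimal sketch of a two-sample energy statistic in the spirit described above is shown below (a common formulation, not necessarily the authors' exact normalization), using Python with NumPy/SciPy and synthetic data.

        import numpy as np
        from scipy.spatial.distance import cdist

        def energy_statistic(x: np.ndarray, y: np.ndarray) -> float:
            """Two-sample energy statistic: small values suggest a common parent distribution.

            E = 2*mean(|x_i - y_j|) - mean(|x_i - x_i'|) - mean(|y_j - y_j'|)
            """
            x = np.atleast_2d(x).reshape(len(x), -1)
            y = np.atleast_2d(y).reshape(len(y), -1)
            dxy = cdist(x, y).mean()
            dxx = cdist(x, x).mean()
            dyy = cdist(y, y).mean()
            return 2.0 * dxy - dxx - dyy

        rng = np.random.default_rng(0)
        a = rng.normal(0.0, 1.0, size=(300, 2))
        b = rng.normal(0.0, 1.0, size=(300, 2))   # same parent distribution
        c = rng.normal(0.5, 1.0, size=(300, 2))   # shifted distribution
        print(energy_statistic(a, b), energy_statistic(a, c))  # the second should be clearly larger
        # A p-value can be obtained by a permutation test over the pooled sample.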

  9. Double sampling with multiple imputation to answer large sample meta-research questions: Introduction and illustration by evaluating adherence to two simple CONSORT guidelines

    Directory of Open Access Journals (Sweden)

    Patrice L. Capers

    2015-03-01

    Full Text Available BACKGROUND: Meta-research can involve manual retrieval and evaluation of research, which is resource intensive. The creation of high-throughput methods (e.g., search heuristics, crowdsourcing) has improved the feasibility of large meta-research questions, but possibly at the cost of accuracy. OBJECTIVE: To evaluate the use of double sampling combined with multiple imputation (DS+MI) to address meta-research questions, using as an example the adherence of PubMed entries to two simple Consolidated Standards of Reporting Trials (CONSORT) guidelines for titles and abstracts. METHODS: For the DS large sample, we retrieved all PubMed entries satisfying the filters: RCT; human; abstract available; and English language (n=322,107). For the DS subsample, we randomly sampled 500 entries from the large sample. The large sample was evaluated with a lower rigor, higher throughput (RLOTHI) method using search heuristics, while the subsample was evaluated using a higher rigor, lower throughput (RHITLO) human rating method. Multiple imputation of the missing-completely-at-random RHITLO data for the large sample was informed by: RHITLO data from the subsample; RLOTHI data from the large sample; whether a study was an RCT; and country and year of publication. RESULTS: The RHITLO and RLOTHI methods in the subsample largely agreed (phi coefficients: title=1.00, abstract=0.92). Compliance with abstract and title criteria has increased over time, with non-US countries improving more rapidly. DS+MI logistic regression estimates were more precise than subsample estimates (e.g., 95% CI for change in title and abstract compliance by year: subsample RHITLO 1.050-1.174 vs. DS+MI 1.082-1.151). As evidence of improved accuracy, DS+MI coefficient estimates were closer to RHITLO than the large sample RLOTHI estimates. CONCLUSIONS: Our results support our hypothesis that DS+MI results in improved precision and accuracy. This method is flexible and may provide a practical way to examine large corpora of

  10. Spatial statistics of hydrography and water chemistry in a eutrophic boreal lake based on sounding and water samples.

    Science.gov (United States)

    Leppäranta, Matti; Lewis, John E; Heini, Anniina; Arvola, Lauri

    2018-06-04

    Spatial variability, an essential characteristic of lake ecosystems, has often been neglected in field research and monitoring. In this study, we apply spatial statistical methods to the key physical and chemical variables and chlorophyll a over eight sampling dates in two consecutive years in a large (area 103 km²) eutrophic boreal lake in southern Finland. On the four summer sampling dates the water body was vertically and horizontally heterogeneous except for color and DOC; on the two winter ice-covered dates dissolved oxygen (DO) was vertically stratified, while on the two autumn dates no significant spatial differences in any of the measured variables were found. The chlorophyll a concentration was one order of magnitude lower under the ice cover than in open water. The Moran statistic for spatial correlation was significant for chlorophyll a and NO2+NO3-N in all summer situations and for dissolved oxygen and pH in three cases. In summer, the mass centers of the chemicals were within 1.5 km of the geometric center of the lake, and the second-moment radius ranged from 3.7 to 4.1 km, compared with 3.9 km for the homogeneous situation. The lateral length scales of the studied variables were 1.5-2.5 km, about 1 km longer in the surface layer. The detected spatial "noise" strongly suggests that horizontal as well as vertical variation should be considered when eutrophic lake ecosystems, in particular, are monitored.

  11. Algorithm for computing significance levels using the Kolmogorov-Smirnov statistic and valid for both large and small samples

    Energy Technology Data Exchange (ETDEWEB)

    Kurtz, S.E.; Fields, D.E.

    1983-10-01

    The KSTEST code presented here is designed to perform the Kolmogorov-Smirnov one-sample test. The code may be used as a stand-alone program or the principal subroutines may be excerpted and used to service other programs. The Kolmogorov-Smirnov one-sample test is a nonparametric goodness-of-fit test. A number of codes to perform this test are in existence, but they suffer from the inability to provide meaningful results in the case of small sample sizes (number of values less than or equal to 80). The KSTEST code overcomes this inadequacy by using two distinct algorithms. If the sample size is greater than 80, an asymptotic series developed by Smirnov is evaluated. If the sample size is 80 or less, a table of values generated by Birnbaum is referenced. Valid results can be obtained from KSTEST when the sample contains from 3 to 300 data points. The program was developed on a Digital Equipment Corporation PDP-10 computer using the FORTRAN-10 language. The code size is approximately 450 card images and the typical CPU execution time is 0.19 s.
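
    The two-regime idea behind KSTEST (an exact small-sample computation versus the asymptotic Smirnov distribution) can be mimicked with standard tools; the sketch below uses SciPy's one-sample Kolmogorov-Smirnov test, which likewise switches between exact and asymptotic p-value methods, rather than the original FORTRAN-10 code.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        # One-sample Kolmogorov-Smirnov goodness-of-fit test against a standard normal.
        # With the default method, SciPy chooses between an exact small-sample
        # computation and the asymptotic (Smirnov) distribution.
        small = rng.normal(size=30)                 # small sample (exact-method territory)
        large = rng.normal(size=500)                # large sample (asymptotic territory)

        for name, sample in (("n=30", small), ("n=500", large)):
            d, p = stats.kstest(sample, "norm")
            print(f"{name}: D = {d:.3f}, p = {p:.3f}")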

  12. Statistical methods for including two-body forces in large system calculations

    International Nuclear Information System (INIS)

    Grimes, S.M.

    1980-07-01

    Large systems of interacting particles are often treated by assuming that the effect on any one particle of the remaining N-1 may be approximated by an average potential. This approach reduces the problem to that of finding the bound-state solutions for a particle in a potential; statistical mechanics is then used to obtain the properties of the many-body system. In some physical systems this approach may not be acceptable, because the two-body force component cannot be treated in this one-body limit. A technique for incorporating two-body forces in such calculations in a more realistic fashion is described. 1 figure

  13. Application of nonparametric statistics to material strength/reliability assessment

    International Nuclear Information System (INIS)

    Arai, Taketoshi

    1992-01-01

    An advanced material technology requires a database on a wide variety of material behavior, which needs to be established experimentally. It may often happen that experiments are practically limited in terms of reproducibility or the range of test parameters. Statistical methods can be applied to quantify such uncertainties in the manner required from the reliability point of view. Statistical assessment involves determining a most probable value and the maximum and/or minimum value as a one-sided or two-sided confidence limit. A scatter of test data can be approximated by a theoretical distribution only if the goodness of fit satisfies a test criterion. Alternatively, nonparametric statistics (NPS), or distribution-free statistics, can be applied. Mathematical procedures based on NPS are well established for dealing with most reliability problems; they handle only the order statistics of a sample. Mathematical formulas and some applications to engineering assessments are described. They include confidence limits of the median, the population coverage of a sample, the required minimum sample size, and confidence limits of the fracture probability. These applications demonstrate that nonparametric statistical estimation is useful for logical decision making in cases where a large uncertainty exists. (author)
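
    As an example of the order-statistics reasoning mentioned above, a distribution-free confidence interval for the median can be read directly from the binomial distribution. The sketch below implements this standard result in Python/SciPy; the Weibull "strength" data are invented for illustration.

        import numpy as np
        from scipy import stats

        def median_ci(sample, confidence=0.95):
            """Distribution-free confidence interval for the median from order statistics.

            Each observation falls below the true median with probability 1/2, so the
            number of observations below the median is Binomial(n, 0.5); the interval
            [x_(k), x_(n+1-k)] then covers the median with at least the stated probability.
            """
            x = np.sort(np.asarray(sample))
            n = len(x)
            k = max(int(stats.binom.ppf((1 - confidence) / 2, n, 0.5)), 1)   # lower rank (1-based)
            coverage = 1 - 2 * stats.binom.cdf(k - 1, n, 0.5)                # actual (conservative) coverage
            return x[k - 1], x[n - k], coverage

        rng = np.random.default_rng(0)
        data = rng.weibull(1.5, size=40) * 300.0        # hypothetical strength data (MPa)
        lo, hi, cov = median_ci(data)
        print(f"median ~ {np.median(data):.1f}, {cov:.3f}-level CI: [{lo:.1f}, {hi:.1f}]")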

  14. Sample Reuse in Statistical Remodeling.

    Science.gov (United States)

    1987-08-01

    as the jackknife and bootstrap, is an expansion of the functional, T(Fn), or of its distribution function or both. Frangos and Schucany (1987a) used...accelerated bootstrap. In the same report Frangos and Schucany demonstrated the small sample superiority of that approach over the proposals that take...higher order terms of an Edgeworth expansion into account. In a second report Frangos and Schucany (1987b) examined the small sample performance of

  15. Optimal design of sampling and mapping schemes in the radiometric exploration of Chipilapa, El Salvador (Geo-statistics)

    International Nuclear Information System (INIS)

    Balcazar G, M.; Flores R, J.H.

    1992-01-01

    As part of the radiometric surface exploration carried out in the geothermal field of Chipilapa, El Salvador, geo-statistical parameters were derived from the variogram calculated from the field data. The maximum correlation distance of the radon samples in the different observation directions (N-S, E-W, NW-SE, NE-SW) was 121 m, which sets the monitoring grid for future prospecting in the same area. From this, an optimization (minimum cost) of the spacing of the field samples was derived by means of geo-statistical techniques, without losing the detection of the anomaly. (Author)

  16. 'Intelligent' approach to radioimmunoassay sample counting employing a microprocessor controlled sample counter

    International Nuclear Information System (INIS)

    Ekins, R.P.; Sufi, S.; Malan, P.G.

    1977-01-01

    The enormous impact on medical science in the last two decades of microanalytical techniques employing radioisotopic labels has, in turn, generated a large demand for automatic radioisotopic sample counters. Such instruments frequently comprise the most important item of capital equipment required in the use of radioimmunoassay and related techniques and often form a principal bottleneck in the flow of samples through a busy laboratory. It is therefore particularly imperative that such instruments should be used 'intelligently' and in an optimal fashion to avoid both the very large capital expenditure involved in the unnecessary proliferation of instruments and the time delays arising from their sub-optimal use. The majority of the current generation of radioactive sample counters nevertheless rely on primitive control mechanisms based on a simplistic statistical theory of radioactive sample counting which preclude their efficient and rational use. The fundamental principle upon which this approach is based is that it is useless to continue counting a radioactive sample for a time longer than that required to yield a significant increase in the precision of the measurement. Thus, since substantial experimental errors occur during sample preparation, these errors should be assessed and must be related to the counting errors for that sample. It is the objective of this presentation to demonstrate that the combination of a realistic statistical assessment of radioactive sample measurement, together with the more sophisticated control mechanisms that modern microprocessor technology makes possible, may often enable savings in counter usage of the order of 5-10 fold to be made. (orig.)
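
    The stopping rule implied above (stop counting once the Poisson counting error contributes little beyond the sample-preparation error) can be sketched as follows; the threshold and the numerical values are illustrative assumptions, not the instrument's actual control algorithm.

        import math

        def required_counts(prep_cv: float, max_counting_share: float = 0.1) -> int:
            """Counts needed so that the Poisson counting variance adds at most
            `max_counting_share` of the preparation variance to the total.

            The relative counting error for N counts is 1/sqrt(N) (Poisson), so we need
            1/N <= max_counting_share * prep_cv**2.
            """
            return math.ceil(1.0 / (max_counting_share * prep_cv ** 2))

        prep_cv = 0.05          # assumed 5% relative error from pipetting / sample preparation
        count_rate = 2000.0     # assumed counts per minute for this sample
        n = required_counts(prep_cv)
        print(f"stop after ~{n} counts (~{n / count_rate:.1f} min at {count_rate:.0f} cpm)")
        # Counting longer than this only marginally improves the overall precision,
        # which is the rationale for 'intelligent' counter control.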

  17. Economic and Humanistic Burden of Osteoarthritis: A Systematic Review of Large Sample Studies.

    Science.gov (United States)

    Xie, Feng; Kovic, Bruno; Jin, Xuejing; He, Xiaoning; Wang, Mengxiao; Silvestre, Camila

    2016-11-01

    Osteoarthritis (OA) consumes a significant amount of healthcare resources, and impairs the health-related quality of life (HRQoL) of patients. Previous reviews have consistently found substantial variations in the costs of OA across studies and countries. The comparability between studies was poor and limited the detection of the true differences between these studies. To review large sample studies on measuring the economic and/or humanistic burden of OA published since May 2006. We searched MEDLINE and EMBASE databases using comprehensive search strategies to identify studies reporting economic burden and HRQoL of OA. We included large sample studies if they had a sample size ≥1000 and measured the cost and/or HRQoL of OA. Reviewers worked independently and in duplicate, performing a cross-check between groups to verify agreement. Within- and between-group consolidation was performed to resolve discrepancies, with outstanding discrepancies being resolved by an arbitrator. The Kappa statistic was reported to assess the agreement between the reviewers. All costs were adjusted in their original currency to year 2015 using published inflation rates for the country where the study was conducted, and then converted to 2015 US dollars. A total of 651 articles were screened by title and abstract, 94 were reviewed in full text, and 28 were included in the final review. The Kappa value was 0.794. Twenty studies reported direct costs and nine reported indirect costs. The total annual average direct costs varied from US$1442 to US$21,335, both in USA. The annual average indirect costs ranged from US$238 to US$29,935. Twelve studies measured HRQoL using various instruments. The Short Form 12 version 2 scores ranged from 35.0 to 51.3 for the physical component, and from 43.5 to 55.0 for the mental component. Health utilities varied from 0.30 for severe OA to 0.77 for mild OA. Per-patient OA costs are considerable and a patient's quality of life remains poor. Variations in

  18. Sampling strategy for a large scale indoor radiation survey - a pilot project

    International Nuclear Information System (INIS)

    Strand, T.; Stranden, E.

    1986-01-01

    Optimisation of a stratified random sampling strategy for large scale indoor radiation surveys is discussed. It is based on the results from a small scale pilot project where variances in dose rates within different categories of houses were assessed. By selecting a predetermined precision level for the mean dose rate in a given region, the number of measurements needed can be optimised. The results of a pilot project in Norway are presented together with the development of the final sampling strategy for a planned large scale survey. (author)
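
    One standard way to turn pilot-project variances into an optimised measurement plan is Neyman allocation across house-category strata, as in the Python sketch below; the stratum sizes, standard deviations and target precision are invented numbers, and the paper's actual scheme may differ.

        import math

        # Hypothetical pilot-project results: stratum sizes (number of houses per
        # category) and the dose-rate standard deviation observed within each stratum.
        strata = {
            "wood":     {"N": 6000, "sd": 12.0},
            "concrete": {"N": 3000, "sd": 25.0},
            "basement": {"N": 1000, "sd": 40.0},
        }
        target_se = 1.5   # desired standard error of the regional mean dose rate (nGy/h)

        # Neyman allocation: total n for the target SE, then n_h proportional to N_h * sd_h
        # (finite-population correction ignored for simplicity).
        weights = {name: s["N"] * s["sd"] for name, s in strata.items()}
        N_total = sum(s["N"] for s in strata.values())
        n_total = (sum(weights.values()) / N_total) ** 2 / target_se ** 2
        for name, w in weights.items():
            n_h = math.ceil(n_total * w / sum(weights.values()))
            print(f"{name}: measure ~{n_h} houses")
        print(f"total ~{math.ceil(n_total)} measurements for SE ~ {target_se}")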

  19. An open-flow pulse ionization chamber for alpha spectrometry of large-area samples

    International Nuclear Information System (INIS)

    Johansson, L.; Roos, B.; Samuelsson, C.

    1992-01-01

    The presented open-flow pulse ionization chamber was developed to make alpha spectrometry of large-area surfaces easy. One side of the chamber is left open, where the sample is to be placed. The sample acts as a chamber wall and thereby defines the detector volume. The sample area can be as large as 400 cm². To prevent air from entering the volume there is a constant gas flow through the detector, coming in at the bottom of the chamber and leaking at the sides of the sample. The method results in good energy resolution and has considerable applicability in retrospective radon research. Alpha spectra obtained in the retrospective measurements originate from 210 Po, built up in the sample from the radon daughters recoiled into the glass surface. (au)

  20. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Cabalin, L.M.; Gonzalez, A. [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain); Ruiz, J. [Department of Applied Physics I, University of Malaga, E-29071 Malaga (Spain); Laserna, J.J., E-mail: laserna@uma.e [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain)

    2010-08-15

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s^-1. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  1. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Science.gov (United States)

    Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.

    2010-08-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s^-1. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  2. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Cabalin, L.M.; Gonzalez, A.; Ruiz, J.; Laserna, J.J.

    2010-01-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s^-1. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  3. Lack of association between digit ratio (2D:4D) and assertiveness: replication in a large sample.

    Science.gov (United States)

    Voracek, Martin

    2009-12-01

    Findings regarding within-sex associations of digit ratio (2D:4D), a putative pointer to long-lasting effects of prenatal androgen action, and sexually differentiated personality traits have generally been inconsistent or unreplicable, suggesting that effects in this domain, if any, are likely small. In contrast to evidence from Wilson's important 1983 study, a forerunner of modern 2D:4D research, two recent studies, in 2005 and 2008 by Freeman et al. and Hampson et al., showed that assertiveness, a presumably male-typed personality trait, was not associated with 2D:4D; however, these studies were clearly statistically underpowered. Hence this study examined the question anew, based on a large sample of 491 men and 627 women. Assertiveness was only modestly sexually differentiated, favoring men, and was a positive correlate of age and education and a negative correlate of weight and Body Mass Index among women, but not men. Replicating the two prior studies, 2D:4D was throughout unrelated to assertiveness scores. This null finding was preserved with controls for correlates of assertiveness, in nonparametric analysis, and with tests for curvilinear relations. Implications of this specific null finding, now replicated in a large sample, are discussed for studies of 2D:4D and personality in general, along with novel research approaches for proceeding in this field.

  4. The outlier sample effects on multivariate statistical data processing geochemical stream sediment survey (Moghangegh region, North West of Iran)

    International Nuclear Information System (INIS)

    Ghanbari, Y.; Habibnia, A.; Memar, A.

    2009-01-01

    In a geochemical stream sediment survey of the Moghangegh Region in north-west Iran (1:50,000 sheet), 152 samples were collected. After analysis and processing of the data, it was revealed that the Yb, Sc, Ni, Li, Eu, Cd, Co and As contents of one sample were far higher than in the other samples. After identifying this sample as an outlier, its effect on multivariate statistical data processing, and the damage outliers can cause in geochemical exploration, was investigated. Pearson and Spearman correlation coefficients and cluster analysis were used for the multivariate studies, and scatter plots of selected elements together with regression lines are given for the cases of 152 and 151 samples, with the results compared. The investigation of the multivariate statistical results showed that the presence of an outlier sample may manifest itself in the following relations between elements: a true relation between two elements, neither of which has an anomalous value in the outlier sample; a false relation between two elements, one of which has an anomalous value in the outlier sample; and a completely false relation between two elements, both of which have anomalous values in the outlier sample.
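
    A tiny illustration of how a single outlier sample can fabricate an apparent correlation between two elements is given below (Python/SciPy, synthetic data rather than the survey's values): the Pearson coefficient jumps once the outlier is included, while the rank-based Spearman coefficient barely moves.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        # 151 'ordinary' samples: two elements that are genuinely uncorrelated.
        a = rng.lognormal(mean=2.0, sigma=0.3, size=151)
        b = rng.lognormal(mean=1.0, sigma=0.3, size=151)

        # Add one outlier sample with extreme contents of both elements.
        a_out = np.append(a, 200.0)
        b_out = np.append(b, 80.0)

        for label, x, y in (("151 samples", a, b), ("152 samples (with outlier)", a_out, b_out)):
            r_p, _ = stats.pearsonr(x, y)
            r_s, _ = stats.spearmanr(x, y)
            print(f"{label}: Pearson r = {r_p:+.2f}, Spearman rho = {r_s:+.2f}")
        # The outlier inflates the Pearson coefficient dramatically, while the
        # rank-based Spearman coefficient is barely affected.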

  5. Fast concentration of dissolved forms of cesium radioisotopes from large seawater samples

    International Nuclear Information System (INIS)

    Jan Kamenik; Henrieta Dulaiova; Ferdinand Sebesta; Kamila St'astna; Czech Technical University, Prague

    2013-01-01

    The method developed for cesium concentration from large freshwater samples was tested and adapted for the analysis of cesium radionuclides in seawater. Concentration of dissolved forms of cesium in large seawater samples (about 100 L) was performed using the composite absorbers AMP-PAN and KNiFC-PAN, with ammonium molybdophosphate and potassium-nickel hexacyanoferrate(II) as the active components, respectively, and polyacrylonitrile as the binding polymer. A specially designed chromatography column with a bed volume (BV) of 25 mL allowed fast flow rates of seawater (up to 1,200 BV h^-1). The recovery yields were determined by ICP-MS analysis of stable cesium added to the seawater sample. Both absorbers proved usable for cesium concentration from large seawater samples. The KNiFC-PAN material was slightly more effective in concentrating cesium from acidified seawater (recovery yield around 93% at 700 BV h^-1). This material showed similar efficiency in concentrating cesium from natural seawater. The activity concentrations of 137 Cs determined in seawater from the central Pacific Ocean were 1.5 ± 0.1 and 1.4 ± 0.1 Bq m^-3 for an offshore (January 2012) and a coastal (February 2012) locality, respectively; 134 Cs activities were below the detection limit. (author)

  6. Large-eddy simulation in a mixing tee junction: High-order turbulent statistics analysis

    International Nuclear Information System (INIS)

    Howard, Richard J.A.; Serre, Eric

    2015-01-01

    Highlights: • Mixing and thermal fluctuations in a junction are studied using large eddy simulation. • Adiabatic and conducting steel wall boundaries are tested. • Wall thermal fluctuations are not the same between the flow and the solid. • Solid thermal fluctuations cannot be predicted from the fluid thermal fluctuations. • High-order turbulent statistics show that the turbulent transport term is important. - Abstract: This study analyses the mixing and thermal fluctuations induced in a mixing tee junction with circular cross-sections when cold water flowing in a pipe is joined by hot water from a branch pipe. This configuration is representative of industrial piping systems in which temperature fluctuations in the fluid may cause thermal fatigue damage on the walls. Implicit large-eddy simulations (LES) are performed for equal inflow rates corresponding to a bulk Reynolds number Re = 39,080. Two different thermal boundary conditions are studied for the pipe walls; an insulating adiabatic boundary and a conducting steel wall boundary. The predicted flow structures show a satisfactory agreement with the literature. The velocity and thermal fields (including high-order statistics) are not affected by the heat transfer with the steel walls. However, predicted thermal fluctuations at the boundary are not the same between the flow and the solid, showing that solid thermal fluctuations cannot be predicted from knowledge of the fluid thermal fluctuations alone. The analysis of high-order turbulent statistics provides a better understanding of the turbulence features. In particular, the budgets of the turbulent kinetic energy and temperature variance allow a comparative analysis of the dissipation, production and transport terms. It is found that the turbulent transport term is an important term that acts to balance the production. We therefore use a priori tests to evaluate three different models for the triple correlation.

  7. Exact distributions of two-sample rank statistics and block rank statistics using computer algebra

    NARCIS (Netherlands)

    Wiel, van de M.A.

    1998-01-01

    We derive generating functions for various rank statistics and we use computer algebra to compute the exact null distribution of these statistics. We present various techniques for reducing time and memory space used by the computations. We use the results to write Mathematica notebooks for

  8. Solving Large-Scale Computational Problems Using Insights from Statistical Physics

    Energy Technology Data Exchange (ETDEWEB)

    Selman, Bart [Cornell University

    2012-02-29

    Many challenging problems in computer science and related fields can be formulated as constraint satisfaction problems. Such problems consist of a set of discrete variables and a set of constraints between those variables, and represent a general class of so-called NP-complete problems. The goal is to find a value assignment to the variables that satisfies all constraints, generally requiring a search through an exponentially large space of variable-value assignments. Models for disordered systems, as studied in statistical physics, can provide important new insights into the nature of constraint satisfaction problems. Recently, work in this area has resulted in the discovery of a new method for solving such problems, called the survey propagation (SP) method. With SP, we can solve problems with millions of variables and constraints, an improvement of two orders of magnitude over previous methods.

  9. Assessing the Validity of Single-item Life Satisfaction Measures: Results from Three Large Samples

    Science.gov (United States)

    Cheung, Felix; Lucas, Richard E.

    2014-01-01

    Purpose The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS) - a more psychometrically established measure. Methods Two large samples from Washington (N=13,064) and Oregon (N=2,277) recruited by the Behavioral Risk Factor Surveillance System (BRFSS) and a representative German sample (N=1,312) recruited by the Germany Socio-Economic Panel (GSOEP) were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Results Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62 - 0.64; disattenuated r = 0.78 - 0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001 - 0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015 - 0.042). Conclusions Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use. PMID:24890827

  10. Efficient bootstrap estimates for tail statistics

    Science.gov (United States)

    Breivik, Øyvind; Aarnes, Ole Johan

    2017-03-01

    Bootstrap resamples can be used to investigate the tail of empirical distributions as well as return value estimates from the extremal behaviour of the sample. Specifically, the confidence intervals on return value estimates or bounds on in-sample tail statistics can be obtained using bootstrap techniques. However, non-parametric bootstrapping from the entire sample is expensive. It is shown here that it suffices to bootstrap from a small subset consisting of the highest entries in the sequence to make estimates that are essentially identical to bootstraps from the entire sample. Similarly, bootstrap estimates of confidence intervals of threshold return estimates are found to be well approximated by using a subset consisting of the highest entries. This has practical consequences in fields such as meteorology, oceanography and hydrology where return values are calculated from very large gridded model integrations spanning decades at high temporal resolution or from large ensembles of independent and identically distributed model fields. In such cases the computational savings are substantial.
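
    The idea can be sketched as follows: bootstrap a fixed in-sample tail statistic (here the m-th largest value) from the full sample and from only the k largest entries, and compare the confidence intervals. This is a minimal Python/NumPy sketch with synthetic Gumbel data, not the authors' implementation.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 20_000
        sample = rng.gumbel(loc=10.0, scale=2.0, size=n)   # stand-in for a long model time series
        m = 20                                             # tail statistic: the 20th largest value
        n_boot = 500

        def full_bootstrap_ci(x, m, n_boot, rng):
            """95% CI of the m-th largest value via ordinary nonparametric bootstrap."""
            est = [np.sort(rng.choice(x, size=x.size, replace=True))[-m] for _ in range(n_boot)]
            return np.percentile(est, [2.5, 97.5])

        def topk_bootstrap_ci(x, m, k, n_boot, rng):
            """Same CI, but resampling only from the k largest entries.

            In a full resample, every value above the k-th largest original entry must
            come from the top-k subset, so it suffices to draw a Binomial(n, k/n) number
            of entries from that subset and take the m-th largest (valid while m << k).
            """
            top = np.sort(x)[-k:]
            est = []
            for _ in range(n_boot):
                j = rng.binomial(x.size, k / x.size)
                est.append(np.sort(rng.choice(top, size=j, replace=True))[-m])
            return np.percentile(est, [2.5, 97.5])

        print("full-sample bootstrap CI:", full_bootstrap_ci(sample, m, n_boot, rng))
        print("top-1000 bootstrap CI:   ", topk_bootstrap_ci(sample, m, 1000, n_boot, rng))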

  11. A self-sampling method to obtain large volumes of undiluted cervicovaginal secretions.

    Science.gov (United States)

    Boskey, Elizabeth R; Moench, Thomas R; Hees, Paul S; Cone, Richard A

    2003-02-01

    Studies of vaginal physiology and pathophysiology sometimes require larger volumes of undiluted cervicovaginal secretions than can be obtained by current methods. A convenient method for self-sampling these secretions outside a clinical setting can facilitate such studies of reproductive health. The goal was to develop a vaginal self-sampling method for collecting large volumes of undiluted cervicovaginal secretions. A menstrual collection device (the Instead cup) was inserted briefly into the vagina to collect secretions that were then retrieved from the cup by centrifugation in a 50-ml conical tube. All 16 women asked to perform this procedure found it feasible and acceptable. Among 27 samples, an average of 0.5 g of secretions (range, 0.1-1.5 g) was collected. This is a rapid and convenient self-sampling method for obtaining relatively large volumes of undiluted cervicovaginal secretions. It should prove suitable for a wide range of assays, including those involving sexually transmitted diseases, microbicides, vaginal physiology, immunology, and pathophysiology.

  12. Sample preparation for large-scale bioanalytical studies based on liquid chromatographic techniques.

    Science.gov (United States)

    Medvedovici, Andrei; Bacalum, Elena; David, Victor

    2018-01-01

    The quality of the analytical data obtained in large-scale and long-term bioanalytical studies based on liquid chromatography depends on a number of experimental factors, including the choice of sample preparation method. This review discusses this tedious part of bioanalytical studies, applied to large-scale samples and using liquid chromatography coupled with different detector types as the core analytical technique. The main sample preparation methods included in this paper are protein precipitation, liquid-liquid extraction, solid-phase extraction, derivatization and their variants. They are discussed in terms of analytical performance, fields of application, advantages and disadvantages. The cited literature covers mainly the analytical achievements of the last decade, although several earlier papers have become more valuable in time and are also included in this review. Copyright © 2017 John Wiley & Sons, Ltd.

  13. Comparison of Two Methods for Estimating the Sampling-Related Uncertainty of Satellite Rainfall Averages Based on a Large Radar Data Set

    Science.gov (United States)

    Lau, William K. M. (Technical Monitor); Bell, Thomas L.; Steiner, Matthias; Zhang, Yu; Wood, Eric F.

    2002-01-01

    The uncertainty of rainfall estimated from averages of discrete samples collected by a satellite is assessed using a multi-year radar data set covering a large portion of the United States. The sampling-related uncertainty of rainfall estimates is evaluated for all combinations of 100 km, 200 km, and 500 km space domains, 1 day, 5 day, and 30 day rainfall accumulations, and regular sampling time intervals of 1 h, 3 h, 6 h, 8 h, and 12 h. These extensive analyses are combined to characterize the sampling uncertainty as a function of space and time domain, sampling frequency, and rainfall characteristics by means of a simple scaling law. Moreover, it is shown that both parametric and non-parametric statistical techniques of estimating the sampling uncertainty produce comparable results. Sampling uncertainty estimates, however, do depend on the choice of technique for obtaining them. They can also vary considerably from case to case, reflecting the great variability of natural rainfall, and should therefore be expressed in probabilistic terms. Rainfall calibration errors are shown to affect comparison of results obtained by studies based on data from different climate regions and/or observation platforms.

  14. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of Poisson statistics.

    Science.gov (United States)

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
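
    The Poisson reasoning behind such repetitive-sampling assays can be sketched as follows: if each replicate reaction is positive whenever it receives at least one proviral copy, the mean copy number per replicate follows from the fraction of negative replicates (a standard limiting-dilution argument). The Python sketch below uses made-up replicate counts and a simple binomial approximation for the confidence interval; it is not the authors' exact estimator.

        import math

        def poisson_copies_per_reaction(n_replicates: int, n_negative: int) -> float:
            """Mean target copies per replicate from the negative fraction.

            P(negative) = exp(-lambda)  =>  lambda = -ln(n_negative / n_replicates).
            """
            if n_negative == 0:
                raise ValueError("all replicates positive: lambda cannot be estimated")
            return -math.log(n_negative / n_replicates)

        # Hypothetical outcome of a 42-replicate assay: 27 negative wells.
        replicates, negatives = 42, 27
        lam = poisson_copies_per_reaction(replicates, negatives)
        print(f"estimated copies per reaction: {lam:.3f}")

        # Approximate 95% CI via the binomial standard error of the negative fraction.
        p = negatives / replicates
        se_p = math.sqrt(p * (1 - p) / replicates)
        lo, hi = -math.log(min(p + 1.96 * se_p, 1.0)), -math.log(max(p - 1.96 * se_p, 1e-9))
        print(f"~95% CI: [{lo:.3f}, {hi:.3f}]")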

  15. Performance modeling, loss networks, and statistical multiplexing

    CERN Document Server

    Mazumdar, Ravi

    2009-01-01

    This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I

  16. The Impact of a Flipped Classroom Model of Learning on a Large Undergraduate Statistics Class

    Science.gov (United States)

    Nielson, Perpetua Lynne; Bean, Nathan William Bean; Larsen, Ross Allen Andrew

    2018-01-01

    We examine the impact of a flipped classroom model of learning on student performance and satisfaction in a large undergraduate introductory statistics class. Two professors each taught a lecture-section and a flipped-class section. Using MANCOVA, a linear combination of final exam scores, average quiz scores, and course ratings was compared for…

  17. An 'intelligent' approach to radioimmunoassay sample counting employing a microprocessor-controlled sample counter

    International Nuclear Information System (INIS)

    Ekins, R.P.; Sufi, S.; Malan, P.G.

    1978-01-01

    The enormous impact on medical science in the last two decades of microanalytical techniques employing radioisotopic labels has, in turn, generated a large demand for automatic radioisotopic sample counters. Such instruments frequently comprise the most important item of capital equipment required in the use of radioimmunoassay and related techniques and often form a principal bottleneck in the flow of samples through a busy laboratory. It is therefore imperative that such instruments should be used 'intelligently' and in an optimal fashion to avoid both the very large capital expenditure involved in the unnecessary proliferation of instruments and the time delays arising from their sub-optimal use. Most of the current generation of radioactive sample counters nevertheless rely on primitive control mechanisms based on a simplistic statistical theory of radioactive sample counting which preclude their efficient and rational use. The fundamental principle upon which this approach is based is that it is useless to continue counting a radioactive sample for a time longer than that required to yield a significant increase in the precision of the measurement. Thus, since substantial experimental errors occur during sample preparation, these errors should be assessed and must be related to the counting errors for that sample. The objective of the paper is to demonstrate that the combination of a realistic statistical assessment of radioactive sample measurement, together with the more sophisticated control mechanisms that modern microprocessor technology makes possible, may often enable savings in counter usage of the order of 5- to 10-fold to be made. (author)

  18. Finite-sample instrumental variables Inference using an Asymptotically Pivotal Statistic

    NARCIS (Netherlands)

    Bekker, P.; Kleibergen, F.R.

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic, this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  19. Uncertainty budget in internal monostandard NAA for small and large size samples analysis

    International Nuclear Information System (INIS)

    Dasari, K.B.; Acharya, R.

    2014-01-01

    Evaluation of the total uncertainty budget of a determined concentration value is important under a quality assurance programme. Concentration calculations in NAA are carried out by the relative method or by the k0-based internal monostandard NAA (IM-NAA) method. The IM-NAA method has been used for the analysis of small and large samples of clay pottery. An attempt was made to identify the uncertainty components in IM-NAA, and the uncertainty budget for La in both small and large size samples has been evaluated and compared. (author)

  20. A preliminary study on identification of Thai rice samples by INAA and statistical analysis

    Science.gov (United States)

    Kongsri, S.; Kukusamude, C.

    2017-09-01

    This study aims to investigate the elemental compositions of 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify rice according to type and cultivar using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified with SRM 1568a Rice Flour; all elements were found to be in good agreement with the certified values. The precisions in terms of %RSD were lower than 7%. The LODs were in the range of 0.01 to 29 mg kg-1. The concentrations of the 9 elements in the Thai rice samples were evaluated and used as chemical indicators to identify the type of rice sample. The results showed that Mg, Cl, As, Br, Mn, K, Rb, and Zn concentrations differ significantly between Thai jasmine rice and Sung Yod rice samples, whereas there was no evidence of a significant difference for Al at the 95% confidence level. Our results may provide preliminary information for discrimination of rice samples and may serve as a useful database of Thai rice.

  1. Statistical and quantitative research

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

    Environmental impacts may escape detection if the statistical tests used to analyze data from field studies are inadequate or the field design is not appropriate. To alleviate this problem, PNL scientists are doing theoretical research which will provide the basis for new sampling schemes or better methods to analyze and present data. Such efforts have resulted in recommendations about the optimal size of study plots, sampling intensity, field replication, and program duration. Costs associated with any of these factors can be substantial if, for example, attention is not paid to the adequacy of a sampling scheme. In the study of dynamics of large-mammal populations, the findings are sometimes surprising. For example, the survival of a grizzly bear population may hinge on the loss of one or two adult females per year

  2. Statistical inference involving binomial and negative binomial parameters.

    Science.gov (United States)

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2009-05-01

    Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
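    To make the setting concrete, the sketch below implements a generic likelihood-ratio test of H0: p1 = p2 when p1 is estimated by negative binomial (geometric) sampling up to the first success and p2 by binomial sampling, and checks its Type I error by simulation. This is an illustration of the sampling situation only, not a reimplementation of the exact tests derived in the paper.

```python
# Generic likelihood-ratio test for a geometric-vs-binomial comparison (illustrative).
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2

def loglik(p_geo, p_bin, t, x, n):
    """Geometric part (t trials to the 1st success) plus binomial part
    (x successes in n trials), up to additive constants."""
    return (xlogy(1, p_geo) + xlogy(t - 1, 1 - p_geo)
            + xlogy(x, p_bin) + xlogy(n - x, 1 - p_bin))

def lrt_pvalue(t, x, n):
    p1, p2 = 1.0 / t, x / n                 # unrestricted MLEs
    p0 = (1.0 + x) / (t + n)                # MLE under H0: p1 = p2
    stat = 2.0 * (loglik(p1, p2, t, x, n) - loglik(p0, p0, t, x, n))
    return chi2.sf(stat, df=1)

# Type I error check under H0 with p = 0.3 and binomial n = 30.
rng = np.random.default_rng(1)
p, n, reps = 0.3, 30, 20_000
pvals = [lrt_pvalue(rng.geometric(p), rng.binomial(n, p), n) for _ in range(reps)]
print("empirical Type I error at alpha = 0.05:", np.mean(np.array(pvals) < 0.05))
```

With a single geometric observation the chi-square approximation is rough, which is exactly the small-sample regime the paper's purpose-built tests address.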

  3. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, Paul A.; Kleibergen, Frank

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  4. A critical discussion of null hypothesis significance testing and statistical power analysis within psychological research

    DEFF Research Database (Denmark)

    Jones, Allan; Sommerlund, Bo

    2007-01-01

    The uses of null hypothesis significance testing (NHST) and statistical power analysis within psychological research are critically discussed. The article looks at the problems of relying solely on NHST when dealing with small and large sample sizes. The use of power-analysis in estimating...... the potential error introduced by small and large samples is advocated. Power analysis is not recommended as a replacement to NHST but as an additional source of information about the phenomena under investigation. Moreover, the importance of conceptual analysis in relation to statistical analysis of hypothesis...

  5. Evaluation of Respondent-Driven Sampling

    Science.gov (United States)

    McCreesh, Nicky; Frost, Simon; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda Ndagire; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G

    2012-01-01

    Background Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex-workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total-population data. Methods Total-population data on age, tribe, religion, socioeconomic status, sexual activity and HIV status were available on a population of 2402 male household-heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, employing current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). Results We recruited 927 household-heads. Full and small RDS samples were largely representative of the total population, but both samples under-represented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven-sampling statistical-inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven-sampling bootstrap 95% confidence intervals included the population proportion. Conclusions Respondent-driven sampling produced a generally representative sample of this well-connected non-hidden population. However, current respondent-driven-sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience-sampling

  6. Evaluation of respondent-driven sampling.

    Science.gov (United States)

    McCreesh, Nicky; Frost, Simon D W; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda N; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G

    2012-01-01

    Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total population data. Total population data on age, tribe, religion, socioeconomic status, sexual activity, and HIV status were available on a population of 2402 male household heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, using current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). We recruited 927 household heads. Full and small RDS samples were largely representative of the total population, but both samples underrepresented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven sampling statistical inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven sampling bootstrap 95% confidence intervals included the population proportion. Respondent-driven sampling produced a generally representative sample of this well-connected nonhidden population. However, current respondent-driven sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience sampling method, and caution is required

  7. CONFIDENCE LEVELS AND/VS. STATISTICAL HYPOTHESIS TESTING IN STATISTICAL ANALYSIS. CASE STUDY

    Directory of Open Access Journals (Sweden)

    ILEANA BRUDIU

    2009-05-01

    Full Text Available Parameter estimation with confidence intervals and statistical hypothesis testing are used in statistical analysis to draw conclusions about a population from a sample extracted from it. The case study presented in the paper aims to highlight the importance of the sample size used in the study and how it is reflected in the results obtained when using confidence intervals and hypothesis tests. While statistical hypothesis testing only gives a "yes" or "no" answer to certain questions, statistical estimation using confidence intervals provides more information than a test statistic: it shows the high degree of uncertainty arising from small samples and puts findings that are "marginally significant" or "almost significant" (p very close to 0.05) into perspective.
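    The point about sample size can be made concrete with a short sketch: a Wald confidence interval for a proportion narrows only with the square root of n, so small samples leave wide intervals even when a test comes out "almost significant". The proportion and sample sizes below are arbitrary illustrations, not the case-study data.

```python
# Width of a 95% Wald confidence interval for a proportion at several sample sizes.
import math

def wald_ci(p_hat, n, z=1.96):
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

for n in (20, 50, 200, 1000):
    lo, hi = wald_ci(0.6, n)
    print(f"n={n:5d}: 95% CI ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```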

  8. Some challenges with statistical inference in adaptive designs.

    Science.gov (United States)

    Hung, H M James; Wang, Sue-Jane; Yang, Peiling

    2014-01-01

    Adaptive designs have generated a great deal of attention in clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.

  9. Multivariate statistical analysis a high-dimensional approach

    CERN Document Server

    Serdobolskii, V

    2000-01-01

    In the last few decades the accumulation of large amounts of information in numerous applications has stimulated an increased interest in multivariate analysis. Computer technologies allow one to use multi-dimensional and multi-parametric models successfully. At the same time, an interest arose in statistical analysis with a deficiency of sample data. Nevertheless, it is difficult to describe the recent state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except for a few specific cases. The simplest problem of estimating the mean vector with minimum quadratic risk is unsolved, even for normal distributions. Commonly used standard linear multivariate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution depending on the data. Programs included in standard statistical packages cannot process 'multi-collinear data' and there are no theoretical recommendations ...

  10. Robust statistical methods with R

    CERN Document Server

    Jureckova, Jana

    2005-01-01

    Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application.The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance of measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of real parameter, large sample properties, and goodness-of-fit tests. It...

  11. Adaptive sampling rate control for networked systems based on statistical characteristics of packet disordering.

    Science.gov (United States)

    Li, Jin-Na; Er, Meng-Joo; Tan, Yen-Kheng; Yu, Hai-Bin; Zeng, Peng

    2015-09-01

    This paper investigates an adaptive sampling rate control scheme for networked control systems (NCSs) subject to packet disordering. The main objectives of the proposed scheme are (a) to avoid heavy packet disordering existing in communication networks and (b) to stabilize NCSs with packet disordering, transmission delay and packet loss. First, a novel sampling rate control algorithm based on statistical characteristics of disordering entropy is proposed; secondly, an augmented closed-loop NCS that consists of a plant, a sampler and a state-feedback controller is transformed into an uncertain and stochastic system, which facilitates the controller design. Then, a sufficient condition for stochastic stability in terms of Linear Matrix Inequalities (LMIs) is given. Moreover, an adaptive tracking controller is designed such that the sampling period tracks a desired sampling period, which represents a significant contribution. Finally, experimental results are given to illustrate the effectiveness and advantages of the proposed scheme.

  12. Sampling-based approaches to improve estimation of mortality among patient dropouts: experience from a large PEPFAR-funded program in Western Kenya.

    Directory of Open Access Journals (Sweden)

    Constantin T Yiannoutsos

    Full Text Available Monitoring and evaluation (M&E) of HIV care and treatment programs is impacted by losses to follow-up (LTFU) in the patient population. The severity of this effect is undeniable but its extent unknown. Tracing all lost patients addresses this but census methods are not feasible in programs involving rapid scale-up of HIV treatment in the developing world. Sampling-based approaches and statistical adjustment are the only scalable methods permitting accurate estimation of M&E indices. In a large antiretroviral therapy (ART) program in western Kenya, we assessed the impact of LTFU on estimating patient mortality among 8,977 adult clients of whom 3,624 were LTFU. Overall, dropouts were more likely male (36.8% versus 33.7%; p = 0.003), and younger than non-dropouts (35.3 versus 35.7 years old; p = 0.020), with lower median CD4 count at enrollment (160 versus 189 cells/ml; p<0.001) and WHO stage 3-4 disease (47.5% versus 41.1%; p<0.001). Urban clinic clients were 75.0% of non-dropouts but 70.3% of dropouts (p<0.001). Of the 3,624 dropouts, 1,143 were sought and 621 had their vital status ascertained. Statistical techniques were used to adjust mortality estimates based on information obtained from located LTFU patients. Observed mortality estimates one year after enrollment were 1.7% (95% CI 1.3%-2.0%), revised to 2.8% (2.3%-3.1%) when deaths discovered through outreach were added, and adjusted to 9.2% (7.8%-10.6%) and 9.9% (8.4%-11.5%) through statistical modeling, depending on the method used. The estimates 12 months after ART initiation were 1.7% (1.3%-2.2%), 3.4% (2.9%-4.0%), 10.5% (8.7%-12.3%) and 10.7% (8.9%-12.6%), respectively. Conclusions/Significance: Assessment of the impact of LTFU is critical in program M&E, as estimated mortality based on passive monitoring may underestimate true mortality by up to 80%. This bias can be ameliorated by tracing a sample of dropouts and statistically adjusting the mortality estimates to properly evaluate and guide large
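    A minimal sketch of the sampling-based adjustment idea, using the cohort sizes quoted in the abstract but assumed death counts: deaths found among the traced subsample of dropouts are weighted up to all dropouts before being added to the passively observed deaths. Real analyses would use time-to-event methods rather than simple proportions.

```python
# Illustrative weighting-up of deaths found in a traced subsample of dropouts.
n_total       = 8977     # cohort size (from the abstract)
n_dropouts    = 3624     # patients lost to follow-up (from the abstract)
n_traced      = 621      # dropouts whose vital status was ascertained (from the abstract)
deaths_obs    = 90       # deaths known from passive follow-up (assumed number)
deaths_traced = 55       # deaths discovered among traced dropouts (assumed number)

naive = deaths_obs / n_total
# weight the traced dropouts' death proportion up to all dropouts
adjusted = (deaths_obs + (deaths_traced / n_traced) * n_dropouts) / n_total
print(f"naive mortality estimate:   {naive:.1%}")
print(f"sampling-adjusted estimate: {adjusted:.1%}")
```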

  13. A large sample of Kohonen-selected SDSS quasars with weak emission lines: selection effects and statistical properties

    Science.gov (United States)

    Meusinger, H.; Balafkan, N.

    2014-08-01

    Aims: A tiny fraction of the quasar population shows remarkably weak emission lines. Several hypotheses have been developed, but the weak line quasar (WLQ) phenomenon still remains puzzling. The aim of this study was to create a sizeable sample of WLQs and WLQ-like objects and to evaluate various properties of this sample. Methods: We performed a search for WLQs in the spectroscopic data from the Sloan Digital Sky Survey Data Release 7 based on Kohonen self-organising maps for nearly 10^5 quasar spectra. The final sample consists of 365 quasars in the redshift range z = 0.6 - 4.2 (z¯ = 1.50 ± 0.45) and includes in particular a subsample of 46 WLQs with equivalent widths WMg ii ... Attention was paid to selection effects. Results: The WLQs have, on average, significantly higher luminosities, Eddington ratios, and accretion rates. About half of the excess comes from a selection bias, but an intrinsic excess remains probably caused primarily by higher accretion rates. The spectral energy distribution shows a bluer continuum at rest-frame wavelengths ≳1500 Å. The variability in the optical and UV is relatively low, even taking the variability-luminosity anti-correlation into account. The percentage of radio detected quasars and of core-dominant radio sources is significantly higher than for the control sample, whereas the mean radio-loudness is lower. Conclusions: The properties of our WLQ sample can be consistently understood assuming that it consists of a mix of quasars at the beginning of a stage of increased accretion activity and of beamed radio-quiet quasars. The higher luminosities and Eddington ratios in combination with a bluer spectral energy distribution can be explained by hotter continua, i.e. higher accretion rates. If quasar activity consists of subphases with different accretion rates, a change towards a higher rate is probably accompanied by an only slow development of the broad line region. The composite WLQ spectrum can be reasonably matched by the

  14. An examination of the RCMAS-2 scores across gender, ethnic background, and age in a large Asian school sample.

    Science.gov (United States)

    Ang, Rebecca P; Lowe, Patricia A; Yusof, Noradlin

    2011-12-01

    The present study investigated the factor structure, reliability, convergent and discriminant validity, and U.S. norms of the Revised Children's Manifest Anxiety Scale, Second Edition (RCMAS-2; C. R. Reynolds & B. O. Richmond, 2008a) scores in a Singapore sample of 1,618 school-age children and adolescents. Although there were small statistically significant differences in the average RCMAS-2 T scores found across various demographic groupings, on the whole, the U.S. norms appear adequate for use in the Asian Singapore sample. Results from item bias analyses suggested that biased items detected had small effects and were counterbalanced across gender and ethnicity, and hence, their relative impact on test score variation appears to be minimal. Results of factor analyses on the RCMAS-2 scores supported the presence of a large general anxiety factor, the Total Anxiety factor, and the 5-factor structure found in U.S. samples was replicated. Both the large general anxiety factor and the 5-factor solution were invariant across gender and ethnic background. Internal consistency estimates ranged from adequate to good, and 2-week test-retest reliability estimates were comparable to previous studies. Evidence providing support for convergent and discriminant validity of the RCMAS-2 scores was also found. Taken together, findings provide additional cross-cultural evidence of the appropriateness and usefulness of the RCMAS-2 as a measure of anxiety in Asian Singaporean school-age children and adolescents.

  15. An examination of smoking behavior and opinions about smoke-free environments in a large sample of sexual and gender minority community members.

    Science.gov (United States)

    McElroy, Jane A; Everett, Kevin D; Zaniletti, Isabella

    2011-06-01

    The purpose of this study is to more completely quantify smoking rate and support for smoke-free policies in private and public environments from a large sample of self-identified sexual and gender minority (SGM) populations. A targeted sampling strategy recruited participants from 4 Missouri Pride Festivals and online surveys targeted to SGM populations during the summer of 2008. A 24-item survey gathered information on gender and sexual orientation, smoking status, and questions assessing behaviors and preferences related to smoke-free policies. The project recruited participants through Pride Festivals (n = 2,676) and Web-based surveys (n = 231) representing numerous sexual and gender orientations and the racial composite of the state of Missouri. Differences were found between the Pride Festivals sample and the Web-based sample, including smoking rates, with current smoking for the Web-based sample (22%) significantly less than the Pride Festivals sample (37%; p times more likely to be current smokers compared with the study's heterosexual group (n = 436; p = .005). Statistically fewer SGM racial minorities (33%) are current smokers compared with SGM Whites (37%; p = .04). Support and preferences for public and private smoke-free environments were generally low in the SGM population. The strategic targeting method achieved a large and diverse sample. The findings of high rates of smoking coupled with generally low levels of support for smoke-free public policies in the SGM community highlight the need for additional research to inform programmatic attempts to reduce tobacco use and increase support for smoke-free environments.

  16. Predictability of the recent slowdown and subsequent recovery of large-scale surface warming using statistical methods

    Science.gov (United States)

    Mann, Michael E.; Steinman, Byron A.; Miller, Sonya K.; Frankcombe, Leela M.; England, Matthew H.; Cheung, Anson H.

    2016-04-01

    The temporary slowdown in large-scale surface warming during the early 2000s has been attributed to both external and internal sources of climate variability. Using semiempirical estimates of the internal low-frequency variability component in Northern Hemisphere, Atlantic, and Pacific surface temperatures in concert with statistical hindcast experiments, we investigate whether the slowdown and its recent recovery were predictable. We conclude that the internal variability of the North Pacific, which played a critical role in the slowdown, does not appear to have been predictable using statistical forecast methods. An additional minor contribution from the North Atlantic, by contrast, appears to exhibit some predictability. While our analyses focus on combining semiempirical estimates of internal climatic variability with statistical hindcast experiments, possible implications for initialized model predictions are also discussed.

  17. Automated, feature-based image alignment for high-resolution imaging mass spectrometry of large biological samples

    NARCIS (Netherlands)

    Broersen, A.; Liere, van R.; Altelaar, A.F.M.; Heeren, R.M.A.; McDonnell, L.A.

    2008-01-01

    High-resolution imaging mass spectrometry of large biological samples is the goal of several research groups. In mosaic imaging, the most common method, the large sample is divided into a mosaic of small areas that are then analyzed with high resolution. Here we present an automated alignment

  18. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing

  19. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell
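    As a generic illustration of the Friedman-Rafsky idea (not the FlowMap-FR implementation), the sketch below builds a minimal spanning tree on the pooled points, counts the edges that join points from different samples, and calibrates that count by permuting the sample labels. The data are synthetic Gaussian clouds standing in for two cell populations.

```python
# MST-based two-sample comparison in the spirit of the Friedman-Rafsky statistic.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def fr_cross_edges(labels, edges):
    """Number of MST edges whose endpoints come from different samples."""
    return int(np.sum(labels[edges[:, 0]] != labels[edges[:, 1]]))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(150, 3))          # sample A (synthetic)
y = rng.normal(0.8, 1.0, size=(120, 3))          # sample B (shifted)
pooled = np.vstack([x, y])
labels = np.array([0] * len(x) + [1] * len(y))

mst = minimum_spanning_tree(squareform(pdist(pooled))).tocoo()
edges = np.column_stack([mst.row, mst.col])

observed = fr_cross_edges(labels, edges)
# Permutation null: the MST of the pooled points does not change when labels
# are shuffled, so only the cross-edge count needs recomputing.
perm = [fr_cross_edges(rng.permutation(labels), edges) for _ in range(2000)]
p_value = np.mean(np.array(perm) <= observed)    # few cross edges => separated samples
print(f"cross-sample MST edges: {observed}, permutation p = {p_value:.4f}")
```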

  20. Determination of Sr-90 in milk samples from the study of statistical results

    Directory of Open Access Journals (Sweden)

    Otero-Pazos Alberto

    2017-01-01

    Full Text Available The determination of 90Sr in milk samples is the main objective of radiation monitoring laboratories because of its environmental importance. In this paper the activity concentration of 39 milk samples was obtained through radiochemical separation based on selective retention of Sr in a cationic resin (Dowex 50WX8, 50-100 mesh) and subsequent determination by a low-level proportional gas counter. The results were checked by measuring the Sr concentration with the flame atomic absorption spectroscopy technique, to finally obtain the mass of 90Sr. A statistical treatment of the data was performed using linear regressions. This yielded a reliable estimate, first, of the mass of 90Sr based on the gravimetric technique and, secondly, of the counts per minute of the third measurement at 90Sr/90Y equilibrium, without having to perform that analysis. These estimates have been verified with 19 milk samples, obtaining overlapping results. The novelty of the manuscript is the possibility of determining the concentration of 90Sr in milk samples without the need to perform the third measurement at equilibrium.
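    A minimal sketch of the regression idea with made-up numbers: fit a linear calibration between an earlier measured count rate and the equilibrium result, then predict new samples from the calibration line. The variable names and values are hypothetical and do not reproduce the paper's data.

```python
# Illustrative linear calibration and prediction with scipy.
import numpy as np
from scipy.stats import linregress

# hypothetical training data from previously analysed milk samples
early_cpm       = np.array([118., 95., 140., 102., 87., 130., 110.])   # early count rate
equilibrium_cpm = np.array([131., 104., 156., 113., 95., 144., 121.])  # third measurement

fit = linregress(early_cpm, equilibrium_cpm)
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}, r^2={fit.rvalue**2:.3f}")

# predict the equilibrium count rate of a new sample without waiting for equilibrium
new_cpm = 125.0
print("predicted equilibrium cpm:", fit.slope * new_cpm + fit.intercept)
```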

  1. Environmental restoration and statistics: Issues and needs

    International Nuclear Information System (INIS)

    Gilbert, R.O.

    1991-10-01

    Statisticians have a vital role to play in environmental restoration (ER) activities. One facet of that role is to point out where additional work is needed to develop statistical sampling plans and data analyses that meet the needs of ER. This paper is an attempt to show where statistics fits into the ER process. The statistician, as member of the ER planning team, works collaboratively with the team to develop the site characterization sampling design, so that data of the quality and quantity required by the specified data quality objectives (DQOs) are obtained. At the same time, the statistician works with the rest of the planning team to design and implement, when appropriate, the observational approach to streamline the ER process and reduce costs. The statistician will also provide the expertise needed to select or develop appropriate tools for statistical analysis that are suited for problems that are common to waste-site data. These data problems include highly heterogeneous waste forms, large variability in concentrations over space, correlated data, data that do not have a normal (Gaussian) distribution, and measurements below detection limits. Other problems include environmental transport and risk models that yield highly uncertain predictions, and the need to effectively communicate to the public highly technical information, such as sampling plans, site characterization data, statistical analysis results, and risk estimates. Even though some statistical analysis methods are available ''off the shelf'' for use in ER, these problems require the development of additional statistical tools, as discussed in this paper. 29 refs

  2. Sampling large landscapes with small-scale stratification-User's Manual

    Science.gov (United States)

    Bart, Jonathan

    2011-01-01

    This manual explains procedures for partitioning a large landscape into plots, assigning the plots to strata, and selecting plots in each stratum to be surveyed. These steps are referred to as the "sampling large landscapes (SLL) process." We assume that users of the manual have a moderate knowledge of ArcGIS and Microsoft® Excel. The manual is written for a single user but in many cases, some steps will be carried out by a biologist designing the survey and some steps will be carried out by a quantitative assistant. Thus, the manual essentially may be passed back and forth between these users. The SLL process primarily has been used to survey birds, and we refer to birds as subjects of the counts. The process, however, could be used to count any objects.

  3. Transport Coefficients from Large Deviation Functions

    OpenAIRE

    Gao, Chloe Ya; Limmer, David T.

    2017-01-01

    We describe a method for computing transport coefficients from the direct evaluation of large deviation functions. This method is general, relying on only equilibrium fluctuations, and is statistically efficient, employing trajectory based importance sampling. Equilibrium fluctuations of molecular currents are characterized by their large deviation functions, which are scaled cumulant generating functions analogous to the free energies. A diffusion Monte Carlo algorithm is used to evaluate th...

  4. Effect of the Target Motion Sampling Temperature Treatment Method on the Statistics and Performance

    Science.gov (United States)

    Viitanen, Tuomas; Leppänen, Jaakko

    2014-06-01

    Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of the TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviances of all estimators, including the flux estimator, by tens of percents in the vicinity of very strong resonances. This effect is actually not related to the usage of sampled responses, but is instead an inherent property of the TMS tracking method and concerns both EBT and 0 K calculations.

  5. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    International Nuclear Information System (INIS)

    Edjabou, Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn; Petersen, Claus; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2015-01-01

    Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single

  6. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Edjabou, Maklawe Essonanawe, E-mail: vine@env.dtu.dk [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Petersen, Claus [Econet AS, Omøgade 8, 2.sal, 2100 Copenhagen (Denmark); Scheutz, Charlotte; Astrup, Thomas Fruergaard [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark)

    2015-02-15

    Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single

  7. Prospective elementary and secondary school mathematics teachers’ statistical reasoning

    Directory of Open Access Journals (Sweden)

    Rabia KARATOPRAK

    2015-04-01

    Full Text Available This study investigated prospective elementary (PEMTs) and secondary (PSMTs) school mathematics teachers’ statistical reasoning. The study began with the adaptation of the Statistical Reasoning Assessment (Garfield, 2003) test. Then, the test was administered to 82 PEMTs and 91 PSMTs in a metropolitan city of Turkey. Results showed that both groups were equally successful in understanding independence, and understanding importance of large samples. However, results from selecting appropriate measures of center together with the misconceptions assessing the same subscales showed that both groups selected mode rather than mean as an appropriate average. This suggested their lack of attention to the categorical and interval/ratio variables while examining data. Similarly, both groups were successful in interpreting and computing probability; however, they had equiprobability bias, law of small numbers and representativeness misconceptions. The results imply a change in some questions in the Statistical Reasoning Assessment test and that teacher training programs should include statistics courses focusing on studying characteristics of samples.

  8. Absolute activity determinations on large volume geological samples independent of self-absorption effects

    International Nuclear Information System (INIS)

    Wilson, W.E.

    1980-01-01

    This paper describes a method for measuring the absolute activity of large volume samples by γ-spectroscopy independent of self-absorption effects using Ge detectors. The method yields accurate matrix independent results at the expense of replicative counting of the unknown sample. (orig./HP)

  9. Statistical Modeling of Large Wind Plant System's Generation - A Case Study

    International Nuclear Information System (INIS)

    Sabolic, D.

    2014-01-01

    This paper presents simplistic, yet very accurate, descriptive statistical models of various static and dynamic parameters of energy output from a large system of wind plants operated by Bonneville Power Administration (BPA), USA. The system's size at the end of 2013 was 4515 MW of installed capacity. The 5-minute readings from the beginning of 2007 to the end of 2013, recorded and published by BPA, were used to derive a number of experimental distributions, which were then used to devise theoretic statistical models with merely one or two parameters. In spite of the simplicity, they reproduced experimental data with great accuracy, which was checked by rigorous tests of goodness-of-fit. Statistical distribution functions were obtained for the following wind generation-related quantities: total generation as percentage of total installed capacity; change in total generation power in 5, 10, 15, 20, 25, 30, 45, and 60 minutes as percentage of total installed capacity; duration of intervals with total generated power, expressed as percentage of total installed capacity, lower than certain pre-specified level. Limitation of total installed wind plant capacity, when it is determined by regulation demand from wind plants, is discussed, too. The models presented here can be utilized in analyses related to power system economics/policy, which is also briefly discussed in the paper. (author).
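    In the spirit of the one- and two-parameter models described, the sketch below fits a two-parameter beta distribution to generation expressed as a fraction of installed capacity and runs a Kolmogorov-Smirnov check. The beta family and the synthetic data are assumptions for illustration; they are not the distributions chosen in the paper or the BPA readings.

```python
# Fit a two-parameter distribution to normalised generation and check the fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# stand-in for 5-minute generation readings normalised by installed capacity
frac = np.clip(rng.beta(0.9, 2.5, size=50_000) + rng.normal(0, 0.01, 50_000),
               1e-4, 1 - 1e-4)

a, b, loc, scale = stats.beta.fit(frac, floc=0, fscale=1)   # fix support to [0, 1]
d_stat, p_ks = stats.kstest(frac, "beta", args=(a, b, loc, scale))
print(f"beta(a={a:.2f}, b={b:.2f}); KS D={d_stat:.4f}, p={p_ks:.3f}")
# Note: the KS p-value is optimistic when the parameters were estimated from the
# same data; a parametric bootstrap gives an honest goodness-of-fit test.
```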

  10. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (Rumm) Program in Health Outcome Measurement.

    Science.gov (United States)

    Hagell, Peter; Westergren, Albert

    Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data simulated to fit the Rasch model for 25-item dichotomous scales, with sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

  11. Understanding Statistics and Statistics Education: A Chinese Perspective

    Science.gov (United States)

    Shi, Ning-Zhong; He, Xuming; Tao, Jian

    2009-01-01

    In recent years, statistics education in China has made great strides. However, there still exists a fairly large gap with the advanced levels of statistics education in more developed countries. In this paper, we identify some existing problems in statistics education in Chinese schools and make some proposals as to how they may be overcome. We…

  12. Statistical sampling methods for soils monitoring

    Science.gov (United States)

    Ann M. Abbott

    2010-01-01

    Development of the best sampling design to answer a research question should be an interactive venture between the land manager or researcher and statisticians, and is the result of answering various questions. A series of questions that can be asked to guide the researcher in making decisions that will arrive at an effective sampling plan are described, and a case...

  13. An easy and low cost option for economic statistical process control ...

    African Journals Online (AJOL)

    a large number of nonconforming products are manufactured. ... size, n, sampling interval, h, and control limit parameter, k, that minimize the ...... [11] Montgomery DC, 2001, Introduction to statistical quality control, 4th Edition, John Wiley, New.

  14. Procedure for plutonium analysis of large (100g) soil and sediment samples

    International Nuclear Information System (INIS)

    Meadows, J.W.T.; Schweiger, J.S.; Mendoza, B.; Stone, R.

    1975-01-01

    A method for the complete dissolution of large soil or sediment samples is described. This method is in routine usage at Lawrence Livermore Laboratory for the analysis of fall-out levels of Pu in soils and sediments. Intercomparison with partial dissolution (leach) techniques shows the complete dissolution method to be superior for the determination of plutonium in a wide variety of environmental samples. (author)

  15. Some statistical aspects of the cleanup of Enewetak Atoll

    International Nuclear Information System (INIS)

    Barnes, M.G.; Giacomini, J.J.; Friesen, H.N.

    1979-01-01

    Cleaning up the radionuclide contamination at Enewetak Atoll has involved a number of statistical design problems. Theoretical considerations led to choosing a grid sampling pattern; practical problems sometimes lead to resampling on a finer grid. Other problems associated with using grids have been both physical and statistical. The standard sampling system is an in situ intrinsic gamma detector which measures americium concentration. The cleanup guidelines include plutonium concentration, so additional sampling of soil is required to establish Pu/Am ratios. The soil sampling design included both guidelines for location of the samples and also a special pattern of subsamples making up composite samples. The large variance of the soil sample results makes comparison between the two types difficult anyway, but this is compounded by vegetation attenuation of the in situ readings, soil disturbance influences, and differences in devegetation methods. The constraints inherent in doing what amounts to a research and development project, on a limited budget of time and money, in a field engineering environment are also considered

  16. Fast sampling from a Hidden Markov Model posterior for large data

    DEFF Research Database (Denmark)

    Bonnevie, Rasmus; Hansen, Lars Kai

    2014-01-01

    Hidden Markov Models are of interest in a broad set of applications including modern data driven systems involving very large data sets. However, approximate inference methods based on Bayesian averaging are precluded in such applications as each sampling step requires a full sweep over the data...

  17. 17 CFR Appendix B to Part 420 - Sample Large Position Report

    Science.gov (United States)

    2010-04-01

    Appendix B to Part 420 (Commodity and Securities Exchanges; Department of the Treasury regulations), effective 2010-04-01, reproduces a sample large position report, including dollar-amount lines for positions held as collateral for financial derivatives and other securities transactions and a total memorandum entry.

  18. The relation between statistical power and inference in fMRI.

    Directory of Open Access Journals (Sweden)

    Henk R Cremers

    Full Text Available Statistically underpowered studies can result in experimental failure even when all other experimental considerations have been addressed impeccably. In fMRI the combination of a large number of dependent variables, a relatively small number of observations (subjects), and a need to correct for multiple comparisons can decrease statistical power dramatically. This problem has been clearly addressed yet remains controversial, especially in regard to the expected effect sizes in fMRI, and especially for between-subjects effects such as group comparisons and brain-behavior correlations. We aimed to clarify the power problem by considering and contrasting two simulated scenarios of such possible brain-behavior correlations: weak diffuse effects and strong localized effects. Sampling from these scenarios shows that, particularly in the weak diffuse scenario, common sample sizes (n = 20-30) display extremely low statistical power, poorly represent the actual effects in the full sample, and show large variation on subsequent replications. Empirical data from the Human Connectome Project resembles the weak diffuse scenario much more than the localized strong scenario, which underscores the extent of the power problem for many studies. Possible solutions to the power problem include increasing the sample size, using less stringent thresholds, or focusing on a region-of-interest. However, these approaches are not always feasible and some have major drawbacks. The most prominent solutions that may help address the power problem include model-based (multivariate) prediction methods and meta-analyses with related synthesis-oriented approaches.
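    A minimal simulation of the "weak diffuse" scenario: many voxels each carrying a small true brain-behavior correlation, analysed with Bonferroni-corrected voxelwise correlation tests at a typical sample size. The effect size, voxel count, and threshold below are illustrative assumptions, not the paper's simulation settings.

```python
# Voxelwise power under a weak, diffuse brain-behavior correlation (illustrative).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_subjects, n_voxels, true_r, alpha = 25, 1000, 0.2, 0.05

def one_study():
    behavior = rng.standard_normal(n_subjects)
    noise = rng.standard_normal((n_subjects, n_voxels))
    voxels = true_r * behavior[:, None] + np.sqrt(1 - true_r**2) * noise
    b = behavior - behavior.mean()
    v = voxels - voxels.mean(axis=0)
    r = (b @ v) / (np.linalg.norm(b) * np.linalg.norm(v, axis=0))
    t = r * np.sqrt((n_subjects - 2) / (1 - r**2))
    p = 2 * stats.t.sf(np.abs(t), df=n_subjects - 2)
    return np.mean(p < alpha / n_voxels)      # fraction of truly affected voxels detected

detected = [one_study() for _ in range(200)]
print(f"mean voxelwise power with Bonferroni at n={n_subjects}: {np.mean(detected):.3%}")
```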

  19. Statistical theory and inference

    CERN Document Server

    Olive, David J

    2014-01-01

    This text is for  a one semester graduate course in statistical theory and covers minimal and complete sufficient statistics, maximum likelihood estimators, method of moments, bias and mean square error, uniform minimum variance estimators and the Cramer-Rao lower bound, an introduction to large sample theory, likelihood ratio tests and uniformly most powerful  tests and the Neyman Pearson Lemma. A major goal of this text is to make these topics much more accessible to students by using the theory of exponential families. Exponential families, indicator functions and the support of the distribution are used throughout the text to simplify the theory. More than 50 ``brand name" distributions are used to illustrate the theory with many examples of exponential families, maximum likelihood estimators and uniformly minimum variance unbiased estimators. There are many homework problems with over 30 pages of solutions.

  20. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  1. The significance of Sampling Design on Inference: An Analysis of Binary Outcome Model of Children’s Schooling Using Indonesian Large Multi-stage Sampling Data

    OpenAIRE

    Ekki Syamsulhakim

    2008-01-01

    This paper aims to exercise a rather recent trend in applied microeconometrics, namely the effect of sampling design on statistical inference, especially on binary outcome models. Much theoretical research in econometrics has shown the inappropriateness of applying i.i.d.-assumed statistical analysis to non-i.i.d. data. This research has provided proofs showing that applying i.i.d.-assumed analysis to non-i.i.d. observations would result in inflated standard errors, which could make the esti...
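    One standard remedy in this setting is cluster-robust inference. The sketch below, on simulated two-stage (clustered) data, compares naive i.i.d. standard errors with cluster-robust ones from statsmodels for a binary-outcome (logit) regression; the data-generating values are arbitrary and the model is not the paper's specification.

```python
# Compare naive and cluster-robust standard errors on simulated clustered data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n_clusters, m = 100, 20                                  # e.g. clusters x households
cluster = np.repeat(np.arange(n_clusters), m)
u = rng.normal(0.0, 0.8, n_clusters)[cluster]            # shared within-cluster effect
x = rng.normal(size=n_clusters * m)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 0.3 * x + u)))
y = rng.binomial(1, p)

X = sm.add_constant(pd.DataFrame({"x": x}))
naive = sm.GLM(y, X, family=sm.families.Binomial()).fit()            # ignores clustering
robust = sm.GLM(y, X, family=sm.families.Binomial()).fit(
    cov_type="cluster", cov_kwds={"groups": cluster})                 # clustered SEs
print("naive SE(x):         ", float(naive.bse["x"]))
print("cluster-robust SE(x):", float(robust.bse["x"]))
```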

  2. MULTI-LEVEL SAMPLING APPROACH FOR CONTINOUS LOSS DETECTION USING ITERATIVE WINDOW AND STATISTICAL MODEL

    OpenAIRE

    Mohd Fo'ad Rohani; Mohd Aizaini Maarof; Ali Selamat; Houssain Kettani

    2010-01-01

    This paper proposes a Multi-Level Sampling (MLS) approach for continuous Loss of Self-Similarity (LoSS) detection using iterative window. The method defines LoSS based on Second Order Self-Similarity (SOSS) statistical model. The Optimization Method (OM) is used to estimate self-similarity parameter since it is fast and more accurate in comparison with other estimation methods known in the literature. Probability of LoSS detection is introduced to measure continuous LoSS detection performance...

  3. Sampling of charged liquid radwaste stored in large tanks

    International Nuclear Information System (INIS)

    Tchemitcheff, E.; Domage, M.; Bernard-Bruls, X.

    1995-01-01

    The final safe disposal of radwaste, in France and elsewhere, entails, for liquid effluents, their conversion to a stable solid form, hence implying their conditioning. The production of conditioned waste with the requisite quality, traceability of the characteristics of the packages produced, and safe operation of the conditioning processes, implies at least the accurate knowledge of the chemical and radiochemical properties of the effluents concerned. The problem in sampling the normally charged effluents is aggravated for effluents that have been stored for several years in very large tanks, without stirring and retrieval systems. In 1992, SGN was asked by Cogema to study the retrieval and conditioning of LL/ML chemical sludge and spent ion-exchange resins produced in the operation of the UP2 400 plant at La Hague, and stored temporarily in rectangular silos and tanks. The sampling aspect was crucial for validating the inventories, identifying the problems liable to arise in the aging of the effluents, dimensioning the retrieval systems and checking the transferability and compatibility with the downstream conditioning process. Two innovative self-contained systems were developed and built for sampling operations, positioned above the tanks concerned. Both systems have been operated in active conditions and have proved totally satisfactory for taking representative samples. Today SGN can propose industrially proven overall solutions, adaptable to the various constraints of many spent fuel cycle operators

  4. STATISTICAL EVALUATION OF SMALL SCALE MIXING DEMONSTRATION SAMPLING AND BATCH TRANSFER PERFORMANCE - 12093

    Energy Technology Data Exchange (ETDEWEB)

    GREER DA; THIEN MG

    2012-01-12

    The ability to effectively mix, sample, certify, and deliver consistent batches of High Level Waste (HLW) feed from the Hanford Double Shell Tanks (DST) to the Waste Treatment and Immobilization Plant (WTP) presents a significant mission risk with potential to impact mission length and the quantity of HLW glass produced. DOE's Tank Operations Contractor, Washington River Protection Solutions (WRPS) has previously presented the results of mixing performance in two different sizes of small scale DSTs to support scale up estimates of full scale DST mixing performance. Currently, sufficient sampling of DSTs is one of the largest programmatic risks that could prevent timely delivery of high level waste to the WTP. WRPS has performed small scale mixing and sampling demonstrations to study the ability to sufficiently sample the tanks. The statistical evaluation of the demonstration results which lead to the conclusion that the two scales of small DST are behaving similarly and that full scale performance is predictable will be presented. This work is essential to reduce the risk of requiring a new dedicated feed sampling facility and will guide future optimization work to ensure the waste feed delivery mission will be accomplished successfully. This paper will focus on the analytical data collected from mixing, sampling, and batch transfer testing from the small scale mixing demonstration tanks and how those data are being interpreted to begin to understand the relationship between samples taken prior to transfer and samples from the subsequent batches transferred. An overview of the types of data collected and examples of typical raw data will be provided. The paper will then discuss the processing and manipulation of the data which is necessary to begin evaluating sampling and batch transfer performance. This discussion will also include the evaluation of the analytical measurement capability with regard to the simulant material used in the demonstration tests. The

  5. Development of Large Sample Neutron Activation Technique for New Applications in Thailand

    International Nuclear Information System (INIS)

    Laoharojanaphand, S.; Tippayakul, C.; Wonglee, S.; Channuie, J.

    2018-01-01

    The development of Large Sample Neutron Activation Analysis (LSNAA) in Thailand is presented in this paper. The technique was first developed with rice as the test sample. The Thai Research Reactor-1/Modification 1 (TRR-1/M1) was used as the neutron source. The first step was to select and characterize an appropriate irradiation facility for the research. An out-core irradiation facility (A4 position) was attempted first. The results obtained with the A4 facility were then used as guides for the subsequent experiments with the thermal column facility. The characterization of the thermal column was performed with Cu wire to determine the spatial flux distribution with and without a rice sample. The flux depression without the rice sample was observed to be less than 30%, while the flux depression with the rice sample increased to within 60%. Flux monitors internal to the rice sample were used to determine the average flux over the rice sample. The gamma self-shielding effect during gamma measurement was corrected using Monte Carlo simulation. The ratio between the efficiencies of the volume source and the point source for each energy point was calculated with the MCNPX code. The research team adopted the k0-NAA methodology to calculate the element concentrations. The k0-NAA program developed by the IAEA was set up to simulate the conditions of the irradiation and measurement facilities used in this research. The element concentrations in the bulk rice sample were then calculated taking into account the flux depression and gamma efficiency corrections. At the moment, the results still show large discrepancies with the reference values; however, more research on the validation will be performed to identify sources of errors. Moreover, this LS-NAA technique was introduced for the activation analysis of the IAEA archaeological mock-up. The results are provided in this report. (author)

  6. CO2 isotope analyses using large air samples collected on intercontinental flights by the CARIBIC Boeing 767

    NARCIS (Netherlands)

    Assonov, S.S.; Brenninkmeijer, C.A.M.; Koeppel, C.; Röckmann, T.

    2009-01-01

    Analytical details for 13C and 18O isotope analyses of atmospheric CO2 in large air samples are given. The large air samples of nominally 300 L were collected during the passenger aircraft-based atmospheric chemistry research project CARIBIC and analyzed for a large number of trace gases and

  7. The problem of large samples. An activation analysis study of electronic waste material

    International Nuclear Information System (INIS)

    Segebade, C.; Goerner, W.; Bode, P.

    2007-01-01

    Large-volume instrumental photon activation analysis (IPAA) was used for the investigation of shredded electronic waste material. Sample masses from 1 to 150 grams were analyzed to obtain an estimate of the minimum sample size to be taken to achieve a representativeness of the results which is satisfactory for a defined investigation task. Furthermore, the influence of irradiation and measurement parameters upon the quality of the analytical results was studied. Finally, the analytical data obtained from IPAA and instrumental neutron activation analysis (INAA), both carried out in a large-volume mode, were compared. Only part of the values was found to be in satisfactory agreement. (author)

  8. Statistical comparison of the geometry of second-phase particles

    Energy Technology Data Exchange (ETDEWEB)

    Benes, Viktor, E-mail: benesv@karlin.mff.cuni.cz [Charles University in Prague, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Sokolovska 83, 186 75 Prague 8-Karlin (Czech Republic); Lechnerova, Radka, E-mail: radka.lech@seznam.cz [Private College on Economical Studies, Ltd., Lindnerova 575/1, 180 00 Prague 8-Liben (Czech Republic); Klebanov, Lev [Charles University in Prague, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Sokolovska 83, 186 75 Prague 8-Karlin (Czech Republic); Slamova, Margarita, E-mail: slamova@vyzkum-kovu.cz [Research Institute for Metals, Ltd., Panenske Brezany 50, 250 70 Odolena Voda (Czech Republic); Slama, Peter [Research Institute for Metals, Ltd., Panenske Brezany 50, 250 70 Odolena Voda (Czech Republic)

    2009-10-15

    In microscopic studies of materials, there is often a need to provide a statistical test as to whether two microstructures are different or not. Typically, there are some random objects (particles, grains, pores) and the comparison concerns their density, individual geometrical parameters and their spatial distribution. The problem is that neighbouring objects observed in a single window cannot be assumed to be stochastically independent, therefore classical statistical testing based on random sampling is not applicable. The aim of the present paper is to develop a test based on N-distances in probability theory. Using the measurements from a few independent windows, we consider a two-sample test, which involves a large amount of information collected from each window. An application is presented consisting in a comparison of metallographic samples of aluminium alloys, and the results are interpreted.

  9. Statistical comparison of the geometry of second-phase particles

    International Nuclear Information System (INIS)

    Benes, Viktor; Lechnerova, Radka; Klebanov, Lev; Slamova, Margarita; Slama, Peter

    2009-01-01

    In microscopic studies of materials, there is often a need to provide a statistical test as to whether two microstructures are different or not. Typically, there are some random objects (particles, grains, pores) and the comparison concerns their density, individual geometrical parameters and their spatial distribution. The problem is that neighbouring objects observed in a single window cannot be assumed to be stochastically independent, therefore classical statistical testing based on random sampling is not applicable. The aim of the present paper is to develop a test based on N-distances in probability theory. Using the measurements from a few independent windows, we consider a two-sample test, which involves a large amount of information collected from each window. An application is presented consisting in a comparison of metallographic samples of aluminium alloys, and the results are interpreted.

  10. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.

    Science.gov (United States)

    Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L

    2012-12-01

    Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.

  11. Effect of the Target Motion Sampling temperature treatment method on the statistics and performance

    International Nuclear Information System (INIS)

    Viitanen, Tuomas; Leppänen, Jaakko

    2015-01-01

    Highlights: • Use of the Target Motion Sampling (TMS) method with collision estimators is studied. • The expected values of the estimators agree with NJOY-based reference. • In most practical cases also the variances of the estimators are unaffected by TMS. • Transport calculation slow-down due to TMS dominates the impact on figures-of-merit. - Abstract: Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviations of all estimators, including the flux estimator, by tens of percent in the vicinity of very

  12. Sampling Errors in Monthly Rainfall Totals for TRMM and SSM/I, Based on Statistics of Retrieved Rain Rates and Simple Models

    Science.gov (United States)

    Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates due to changes in rain statistics arising 1) from evolution of the official algorithms used to process the data, and 2) from differences relative to other remote sensing systems such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.

  13. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    Energy Technology Data Exchange (ETDEWEB)

    Kyle, Jennifer E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Casey, Cameron P. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Stratton, Kelly G. [National Security Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zink, Erika M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Kim, Young-Mo [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zheng, Xueyun [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Monroe, Matthew E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Weitz, Karl K. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Bloodsworth, Kent J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Orton, Daniel J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Ibrahim, Yehia M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Moore, Ronald J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Lee, Christine G. [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Research Service, Portland Veterans Affairs Medical Center, Portland OR USA; Pedersen, Catherine [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Orwoll, Eric [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Smith, Richard D. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Burnum-Johnson, Kristin E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Baker, Erin S. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA

    2017-02-05

    The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples such as smaller blood volume required, storage at room temperature, and ability for sampling in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating that degradation occurring in the DBS samples affects quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types, indicating that lipid quantitation was more stable across the sample types.

  14. Acceptance sampling using judgmental and randomly selected samples

    Energy Technology Data Exchange (ETDEWEB)

    Sego, Landon H.; Shulman, Stanley A.; Anderson, Kevin K.; Wilson, John E.; Pulsipher, Brent A.; Sieber, W. Karl

    2010-09-01

    We present a Bayesian model for acceptance sampling where the population consists of two groups, each with different levels of risk of containing unacceptable items. Expert opinion, or judgment, may be required to distinguish between the high- and low-risk groups. Hence, high-risk items are likely to be identified (and sampled) using expert judgment, while the remaining low-risk items are sampled randomly. We focus on the situation where all observed samples must be acceptable. Consequently, the objective of the statistical inference is to quantify the probability that a large percentage of the unsampled items in the population are also acceptable. We demonstrate that traditional (frequentist) acceptance sampling and simpler Bayesian formulations of the problem are essentially special cases of the proposed model. We explore the properties of the model in detail, and discuss the conditions necessary to ensure that required sample sizes are a non-decreasing function of the population size. The method is applicable to a variety of acceptance sampling problems, and, in particular, to environmental sampling where the objective is to demonstrate the safety of reoccupying a remediated facility that has been contaminated with a lethal agent.
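
    The following is a minimal sketch, in Python, of the flavor of calculation the abstract describes for the all-acceptable case: given n sampled items that were all acceptable and a Beta prior on the per-item acceptability probability, estimate the posterior probability that at least a fraction f of the unsampled items are also acceptable. It collapses the paper's two-group (judgmental/random) structure into a single homogeneous group, and all numbers are illustrative assumptions rather than values from the paper.

    ```python
    import numpy as np

    def prob_fraction_acceptable(N, n, f=0.95, a=1.0, b=1.0, draws=100_000, seed=0):
        """Posterior probability that at least a fraction f of the N - n unsampled
        items are acceptable, given that all n sampled items were acceptable.
        Single-group simplification with a Beta(a, b) prior on the per-item
        acceptability probability (hypothetical parameters)."""
        rng = np.random.default_rng(seed)
        remaining = N - n
        k_required = int(np.ceil(f * remaining))
        theta = rng.beta(a + n, b, size=draws)        # posterior after n successes, 0 failures
        acceptable = rng.binomial(remaining, theta)   # unsampled items that turn out acceptable
        return np.mean(acceptable >= k_required)

    print(prob_fraction_acceptable(N=1000, n=59, f=0.95))
    ```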

  15. Large sample neutron activation analysis: establishment at CDTN/CNEN, Brazil

    Energy Technology Data Exchange (ETDEWEB)

    Menezes, Maria Angela de B.C., E-mail: menezes@cdtn.b [Centro de Desenvolvimento da Tecnologia Nuclear (CDTN/CNEN-MG), Belo Horizonte, MG (Brazil); Jacimovic, Radojko, E-mail: radojko.jacimovic@ijs.s [Jozef Stefan Institute, Ljubljana (Slovenia). Dept. of Environmental Sciences. Group for Radiochemistry and Radioecology

    2011-07-01

    In order to improve the application of the neutron activation technique at CDTN/CNEN, large sample instrumental neutron activation analysis is being established under the IAEA BRA 14798 and FAPEMIG APQ-01259-09 projects. This procedure, LS-INAA, usually requires special facilities for the activation as well as for the detection. However, the TRIGA Mark I IPR R1 reactor at CDTN/CNEN has not been adapted for such irradiation, and the usual gamma spectrometry has been carried out. To start the establishment of the LS-INAA, a 5 g sample of the IAEA/Soil 7 reference material was analyzed by the k0-standardized method. This paper is about the detector efficiency over the volume source using the KayWin v2.23 and ANGLE V3.0 software. (author)

  16. Bayesian stratified sampling to assess corpus utility

    Energy Technology Data Exchange (ETDEWEB)

    Hochberg, J.; Scovel, C.; Thomas, T.; Hall, S.

    1998-12-01

    This paper describes a method for asking statistical questions about a large text corpus. The authors exemplify the method by addressing the question, "What percentage of Federal Register documents are real documents, of possible interest to a text researcher or analyst?" They estimate an answer to this question by evaluating 200 documents selected from a corpus of 45,820 Federal Register documents. Bayesian analysis and stratified sampling are used to reduce the sampling uncertainty of the estimate from over 3,100 documents to fewer than 1,000. A possible application of the method is to establish baseline statistics used to estimate recall rates for information retrieval systems.
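
    A hedged sketch of the kind of Bayesian stratified estimate described above: each stratum gets an independent Beta-Binomial posterior for the proportion of "real" documents, and the stratum posteriors are combined, weighted by stratum size, into a population-level estimate. The stratum sizes and counts below are invented for illustration; only the corpus total (45,820) and the overall sample size (200) echo the abstract.

    ```python
    import numpy as np

    # Hypothetical strata: (documents in stratum, documents sampled, judged "real")
    strata = [(30_000, 120, 102),
              (10_000,  50,  21),
              ( 5_820,  30,  28)]   # totals 45,820 documents and 200 sampled

    rng = np.random.default_rng(1)
    draws = 50_000
    total_docs = sum(N for N, _, _ in strata)

    # Independent Beta(1, 1) prior in each stratum; combine stratum posteriors
    # into a posterior for the corpus-wide proportion of "real" documents.
    population_real = np.zeros(draws)
    for N, n, k in strata:
        population_real += N * rng.beta(1 + k, 1 + n - k, size=draws)
    proportion = population_real / total_docs

    print(f"posterior mean: {proportion.mean():.3f}")
    print("95% credible interval:", np.percentile(proportion, [2.5, 97.5]).round(3))
    ```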

  17. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    Science.gov (United States)

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...

  18. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    DEFF Research Database (Denmark)

    Edjabou, Vincent Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona

    2015-01-01

    Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10-50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste...

  19. Statistics for the LHC: Quantifying our Scientific Narrative (1/4)

    CERN Multimedia

    CERN. Geneva

    2011-01-01

    Now that the LHC physics program is well under way and results have begun to pour out of the experiments, the statistical methodology used for these results is a hot topic. This is a challenge at the LHC, as we have sensitivity to discover new physics in a stage of the experiments where systematic uncertainties can still be quite large. The emphasis of these lectures is how we can translate the scientific narrative of why we think we know what we know into quantitative statistical statements about the presence or absence of new physics. Topics will include statistical modeling, incorporation of control samples to constrain systematics, and Bayesian and Frequentist statistical tests that are capable of answering these questions.

  20. Validation Of Intermediate Large Sample Analysis (With Sizes Up to 100 G) and Associated Facility Improvement

    International Nuclear Information System (INIS)

    Bode, P.; Koster-Ammerlaan, M.J.J.

    2018-01-01

    Pragmatic rather than physical correction factors for neutron and gamma-ray shielding were studied for samples of intermediate size, i.e. up to the 10-100 gram range. It was found that for most biological and geological materials, the neutron self-shielding is less than 5 % and the gamma-ray self-attenuation can easily be estimated. A trueness control material of 1 kg size was made from left-over materials used in laboratory intercomparisons. A design study for a large sample pool-side facility, handling plate-type volumes, had to be stopped because of a reduction in the human resources available for this CRP. The large sample NAA facilities were made available to guest scientists from Greece and Brazil. The laboratory for neutron activation analysis participated in the world’s first laboratory intercomparison utilizing large samples. (author)

  1. Evaluation of environmental sampling methods for detection of Salmonella enterica in a large animal veterinary hospital.

    Science.gov (United States)

    Goeman, Valerie R; Tinkler, Stacy H; Hammac, G Kenitra; Ruple, Audrey

    2018-04-01

    Environmental surveillance for Salmonella enterica can be used for early detection of contamination; thus routine sampling is an integral component of infection control programs in hospital environments. At the Purdue University Veterinary Teaching Hospital (PUVTH), the technique regularly employed in the large animal hospital for sample collection uses sterile gauze sponges for environmental sampling, which has proven labor-intensive and time-consuming. Alternative sampling methods use Swiffer brand electrostatic wipes for environmental sample collection, which are reportedly effective and efficient. It was hypothesized that use of Swiffer wipes for sample collection would be more efficient and less costly than the use of gauze sponges. A head-to-head comparison between the 2 sampling methods was conducted in the PUVTH large animal hospital and relative agreement, cost-effectiveness, and sampling efficiency were compared. There was fair agreement in culture results between the 2 sampling methods, but Swiffer wipes required less time and less physical effort to collect samples and were more cost-effective.

  2. Extracting climate signals from large hydrological data cubes using multivariate statistics - an example for the Mediterranean basin

    Science.gov (United States)

    Kauer, Agnes; Dorigo, Wouter; Bauer-Marschallinger, Bernhard

    2017-04-01

    Global warming is expected to change ocean-atmosphere oscillation patterns, e.g. the El Nino Southern Oscillation, and may thus have a substantial impact on water resources over land. Yet, the link between climate oscillations and terrestrial hydrology has large uncertainties. In particular, the climate in the Mediterranean basin is expected to be sensitive to global warming as it may exacerbate insufficient and irregular water supply and lead to more frequent and intense droughts and heavy precipitation events. The ever increasing need for water in tourism and agriculture reinforces the problem. Therefore, the monitoring and better understanding of the hydrological cycle are crucial for this area. This study seeks to quantify the effect of regional climate modes, e.g. the Northern Atlantic Oscillation (NAO), on the hydrological cycle in the Mediterranean. We apply Empirical Orthogonal Functions (EOF) to a wide range of hydrological datasets to extract the major modes of variation over the study period. We use more than ten datasets describing precipitation, soil moisture, evapotranspiration, and changes in water mass with study periods ranging from one to three decades depending on the dataset. The resulting EOFs are then examined for correlations with regional climate modes using Spearman rank correlation analysis. This is done for the entire time span of the EOFs and for monthly and seasonally sampled data. We find relationships between the hydrological datasets and the climate modes NAO, Arctic Oscillation (AO), Eastern Atlantic (EA), and Tropical Northern Atlantic (TNA). Analyses of monthly and seasonally sampled data reveal high correlations especially in the winter months. However, the spatial extent of the data cube considered for the analyses has a large impact on the results. Our statistical analyses suggest an impact of regional climate modes on the hydrological cycle in the Mediterranean area and may provide valuable input for evaluating process
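
    A small sketch of the analysis chain described above, under the assumption that EOFs are computed by singular value decomposition of the anomaly field and that the leading principal-component time series is then rank-correlated with a climate index. The data here are synthetic stand-ins, not the study's datasets.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_months, n_gridcells = 240, 500                   # 20 years of synthetic monthly fields
    field = rng.standard_normal((n_months, n_gridcells))
    nao_index = rng.standard_normal(n_months)          # stand-in for a climate-mode index

    # EOF analysis: singular value decomposition of the anomaly matrix (time x space)
    anomalies = field - field.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    pc1 = u[:, 0] * s[0]                               # leading principal-component time series
    explained = s[0] ** 2 / np.sum(s ** 2)             # fraction of variance explained by EOF1

    # Link the leading mode to the climate index with a Spearman rank correlation
    rho, p_value = stats.spearmanr(pc1, nao_index)
    print(f"EOF1 explains {explained:.1%} of variance; Spearman rho = {rho:.2f} (p = {p_value:.2f})")
    ```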

  3. Overweight and Obesity: Prevalence and Correlates in a Large Clinical Sample of Children with Autism Spectrum Disorder

    Science.gov (United States)

    Zuckerman, Katharine E.; Hill, Alison P.; Guion, Kimberly; Voltolina, Lisa; Fombonne, Eric

    2014-01-01

    Autism Spectrum Disorders (ASDs) and childhood obesity (OBY) are rising public health concerns. This study aimed to evaluate the prevalence of overweight (OWT) and OBY in a sample of 376 Oregon children with ASD, and to assess correlates of OWT and OBY in this sample. We used descriptive statistics, bivariate, and focused multivariate analyses to…

  4. A Statistical Primer: Understanding Descriptive and Inferential Statistics

    OpenAIRE

    Gillian Byrne

    2007-01-01

    As libraries and librarians move more towards evidence‐based decision making, the data being generated in libraries is growing. Understanding the basics of statistical analysis is crucial for evidence‐based practice (EBP), in order to correctly design and analyze research as well as to evaluate the research of others. This article covers the fundamentals of descriptive and inferential statistics, from hypothesis construction to sampling to common statistical techniques including chi‐square, co...

  5. Statistical properties of the surface velocity field in the northern Gulf of Mexico sampled by GLAD drifters

    OpenAIRE

    Mariano, A.J.; Ryan, E.H.; Huntley, H.S.; Laurindo, L.C.; Coelho, E.; Ozgokmen, TM; Berta, M.; Bogucki, D; Chen, S.S.; Curcic, M.; Drouin, K.L.; Gough, M; Haus, BK; Haza, A.C.; Hogan, P

    2016-01-01

    The Grand LAgrangian Deployment (GLAD) used multiscale sampling and GPS technology to observe time series of drifter positions with initial drifter separation of O(100 m) to O(10 km), and nominal 5 min sampling, during the summer and fall of 2012 in the northern Gulf of Mexico. Histograms of the velocity field and its statistical parameters are non-Gaussian; most are multimodal. The dominant periods for the surface velocity field are 1–2 days due to inertial oscillations, tides, and the sea b...

  6. Relationship between accuracy and number of samples on statistical quantity and contour map of environmental gamma-ray dose rate. Example of random sampling

    International Nuclear Information System (INIS)

    Matsuda, Hideharu; Minato, Susumu

    2002-01-01

    The accuracy of statistical quantities such as the mean value and the contour map obtained by measurement of the environmental gamma-ray dose rate was evaluated by random sampling of 5 different model distribution maps constructed using the mean slope, -1.3, of power spectra calculated from actually measured values. The values were derived from 58 natural gamma dose rate data sets reported worldwide, with means ranging over 10-100 nGy/h and areas of 10^-3 to 10^7 km^2. The accuracy of the mean value was found to be around ±7% even for 60 or 80 samplings (the most frequent number), and the standard deviation had an accuracy less than 1/4-1/3 of the means. The correlation coefficient of the frequency distribution was found to be 0.860 or more for 200-400 samplings (the most frequent number), but that of the contour map was 0.502-0.770. (K.H.)
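
    A rough illustration of the kind of random-sampling experiment described above: repeatedly draw random subsets of grid points from a model dose-rate map and see how the error of the sample mean shrinks with the number of samples. The synthetic field below is a simple trend-plus-noise stand-in, not a map with the -1.3 power-spectrum slope used in the study.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Synthetic "dose-rate" field on a 200 x 200 grid: smooth trend plus noise
    # (a stand-in only; the study used model maps with a -1.3 power-spectrum slope).
    y, x = np.mgrid[0:200, 0:200]
    field = 60 + 20 * np.sin(x / 30) * np.cos(y / 40) + 5 * rng.standard_normal((200, 200))
    true_mean = field.mean()

    for n_samples in (20, 60, 80, 200, 400):
        errors = []
        for _ in range(500):                                  # repeat the random sampling
            idx = rng.integers(0, field.size, size=n_samples)
            errors.append(abs(field.ravel()[idx].mean() - true_mean) / true_mean)
        print(f"n = {n_samples:4d}: median relative error of the mean = {np.median(errors):.1%}")
    ```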

  7. The use of mass spectrometry for analysing metabolite biomarkers in epidemiology: methodological and statistical considerations for application to large numbers of biological samples.

    Science.gov (United States)

    Lind, Mads V; Savolainen, Otto I; Ross, Alastair B

    2016-08-01

    Data quality is critical for epidemiology, and as scientific understanding expands, the range of data available for epidemiological studies and the types of tools used for measurement have also expanded. It is essential for the epidemiologist to have a grasp of the issues involved with different measurement tools. One tool that is increasingly being used for measuring biomarkers in epidemiological cohorts is mass spectrometry (MS), because of the high specificity and sensitivity of MS-based methods and the expanding range of biomarkers that can be measured. Further, the ability of MS to quantify many biomarkers simultaneously is an advantage compared with single-biomarker methods. However, as with all methods used to measure biomarkers, there are a number of pitfalls to consider which may have an impact on results when used in epidemiology. In this review we discuss the use of MS for biomarker analyses, focusing on metabolites and their application and potential issues related to large-scale epidemiology studies, the use of MS "omics" approaches for biomarker discovery and how MS-based results can be used for increasing biological knowledge gained from epidemiological studies. Better understanding of the possibilities and possible problems related to MS-based measurements will help the epidemiologist in their discussions with analytical chemists and lead to the use of the most appropriate statistical tools for these data.

  8. Efficient statistical tests to compare Youden index: accounting for contingency correlation.

    Science.gov (United States)

    Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan

    2015-04-30

    The Youden index is widely utilized in studies evaluating the accuracy of diagnostic tests and the performance of predictive, prognostic, or risk models. However, both one-sample and two-independent-sample tests on the Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. In addition, a paired-sample test on the Youden index is currently unavailable. This article develops efficient statistical inference procedures for one-sample, independent, and paired-sample tests on the Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For the one-sample and two-independent-sample tests, the variances are estimated by the Delta method, and the statistical inference is based on the central limit theory, which is then verified by bootstrap estimates. For the paired-sample test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of the kappa statistic, so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
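
    For orientation, a naive sketch of a one-sample test on the Youden index that treats sensitivity and specificity as independent binomial proportions, which is exactly the simplification the paper improves on by adding the contingency-correlation (covariance) terms. The counts and the null value are hypothetical.

    ```python
    import math
    from scipy import stats

    def youden_naive(tp, fn, tn, fp):
        """Youden index J = sensitivity + specificity - 1, with a naive variance
        that treats sensitivity and specificity as independent binomial
        proportions (the covariance terms discussed in the paper are omitted)."""
        se = tp / (tp + fn)
        sp = tn / (tn + fp)
        j = se + sp - 1
        var = se * (1 - se) / (tp + fn) + sp * (1 - sp) / (tn + fp)
        return j, var

    # Hypothetical one-sample test of H0: J = 0.3
    j, var = youden_naive(tp=80, fn=20, tn=140, fp=60)
    z = (j - 0.3) / math.sqrt(var)
    print(f"J = {j:.3f}, z = {z:.2f}, two-sided p = {2 * stats.norm.sf(abs(z)):.3f}")
    ```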

  9. Thermal neutron self-shielding correction factors for large sample instrumental neutron activation analysis using the MCNP code

    International Nuclear Information System (INIS)

    Tzika, F.; Stamatelatos, I.E.

    2004-01-01

    Thermal neutron self-shielding within large samples was studied using the Monte Carlo neutron transport code MCNP. The code enabled a three-dimensional modeling of the actual source and geometry configuration including reactor core, graphite pile and sample. Neutron flux self-shielding correction factors derived for a set of materials of interest for large sample neutron activation analysis are presented and evaluated. Simulations were experimentally verified by measurements performed using activation foils. The results of this study can be applied in order to determine neutron self-shielding factors of unknown samples from the thermal neutron fluxes measured at the surface of the sample

  10. Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.

    Science.gov (United States)

    Breunig, Nancy A.

    Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…

  11. Radioimmunoassay of h-TSH - methodological suggestions for dealing with medium to large numbers of samples

    International Nuclear Information System (INIS)

    Mahlstedt, J.

    1977-01-01

    The article deals with practical aspects of establishing a TSH-RIA for patients, with particular regard to predetermined quality criteria. Methodological suggestions are made for medium to large numbers of samples with the target of reducing monotonous precision working steps by means of simple aids. The quality criteria required are well met, while the test procedure is well adapted to the rhythm of work and may be carried out without loss of precision even with large numbers of samples. (orig.) [de

  12. Understanding Computational Bayesian Statistics

    CERN Document Server

    Bolstad, William M

    2011-01-01

    A hands-on introduction to computational statistics from a Bayesian point of view Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic

  13. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations.

    Directory of Open Access Journals (Sweden)

    Emma Lightfoot

    Full Text Available Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals' homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately; however, it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific
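
    A brief sketch of the two outlier rules the abstract recommends (the IQR rule and the median absolute deviation from the median), applied to a set of invented δ18O values; thresholds follow common conventions (1.5 x IQR, 3 x scaled MAD) and are not taken from the paper.

    ```python
    import numpy as np

    d18o = np.array([26.1, 26.4, 25.9, 26.8, 26.2, 25.7, 26.5, 28.9, 26.0, 24.1])  # invented values

    # Rule 1: interquartile range (IQR)
    q1, q3 = np.percentile(d18o, [25, 75])
    iqr = q3 - q1
    iqr_outliers = d18o[(d18o < q1 - 1.5 * iqr) | (d18o > q3 + 1.5 * iqr)]

    # Rule 2: median absolute deviation from the median (MAD), scaled to be
    # consistent with the standard deviation under normality (factor 1.4826)
    med = np.median(d18o)
    mad = 1.4826 * np.median(np.abs(d18o - med))
    mad_outliers = d18o[np.abs(d18o - med) > 3 * mad]

    print("IQR outliers:", iqr_outliers)
    print("MAD outliers:", mad_outliers)
    ```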

  14. Elemental mapping of large samples by external ion beam analysis with sub-millimeter resolution and its applications

    Science.gov (United States)

    Silva, T. F.; Rodrigues, C. L.; Added, N.; Rizzutto, M. A.; Tabacniks, M. H.; Mangiarotti, A.; Curado, J. F.; Aguirre, F. R.; Aguero, N. F.; Allegro, P. R. P.; Campos, P. H. O. V.; Restrepo, J. M.; Trindade, G. F.; Antonio, M. R.; Assis, R. F.; Leite, A. R.

    2018-05-01

    The elemental mapping of large areas using ion beam techniques is a desired capability for several scientific communities involved in topics ranging from geoscience to cultural heritage. Usually, the constraints for large-area mapping are not met in setups employing micro- and nano-probes implemented all over the world. A novel setup for mapping large-sized samples in an external beam was recently built at the University of São Paulo employing a broad MeV-proton probe with sub-millimeter dimension, coupled to a high-precision, large-range XYZ robotic stage (60 cm range on all axes and 5 μm precision ensured by optical sensors). An important issue in large-area mapping is how to deal with the irregularities of the sample's surface, which may introduce artifacts in the images due to the variation of the measuring conditions. In our setup, we implemented an automatic system based on machine vision to correct the position of the sample to compensate for its surface irregularities. As an additional benefit, a 3D digital reconstruction of the scanned surface can also be obtained. Using this new and unique setup, we have produced large-area elemental maps of ceramics, stones, fossils, and other sorts of samples.

  15. Rapid separation method for 237Np and Pu isotopes in large soil samples

    Energy Technology Data Exchange (ETDEWEB)

    Maxwell, Sherrod L., E-mail: sherrod.maxwell@srs.go [Savannah River Nuclear Solutions, LLC, Building 735-B, Aiken, SC 29808 (United States); Culligan, Brian K.; Noyes, Gary W. [Savannah River Nuclear Solutions, LLC, Building 735-B, Aiken, SC 29808 (United States)

    2011-07-15

    A new rapid method for the determination of 237Np and Pu isotopes in soil and sediment samples has been developed at the Savannah River Site Environmental Lab (Aiken, SC, USA) that can be used for large soil samples. The new soil method utilizes an acid leaching method, iron/titanium hydroxide precipitation, a lanthanum fluoride soil matrix removal step, and a rapid column separation process with TEVA Resin. The large soil matrix is removed easily and rapidly using these two simple precipitations with high chemical recoveries and effective removal of interferences. Vacuum box technology and rapid flow rates are used to reduce analytical time.

  16. Introduction to probability and statistics for science, engineering, and finance

    CERN Document Server

    Rosenkrantz, Walter A

    2008-01-01

    Data Analysis Orientation The Role and Scope of Statistics in Science and Engineering Types of Data: Examples from Engineering, Public Health, and Finance The Frequency Distribution of a Variable Defined on a Population Quantiles of a Distribution Measures of Location (Central Value) and Variability Covariance, Correlation, and Regression: Computing a Stock's Beta Mathematical Details and Derivations Large Data Sets Probability Theory Orientation Sample Space, Events, Axioms of Probability Theory Mathematical Models of Random Sampling Conditional Probability and Baye

  17. [A comparison of convenience sampling and purposive sampling].

    Science.gov (United States)

    Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien

    2014-06-01

    Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and "purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation not by statistical power analysis.

  18. Large scale statistical inference of signaling pathways from RNAi and microarray data

    Directory of Open Access Journals (Sweden)

    Poustka Annemarie

    2007-10-01

    Full Text Available Abstract Background The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway. Results In this paper we address this challenging problem by extending previous work by Markowetz et al., who proposed a statistical framework to score network hypotheses in a Bayesian manner. Our extensions go in three directions: First, we introduce a way to omit the data discretization step needed in the original framework via a calculation based on p-values instead. Second, we show how prior assumptions on the network structure can be incorporated into the scoring scheme using regularization techniques. Third and most important, we propose methods to scale up the original approach, which is limited to around 5 genes, to large scale networks. Conclusion Comparisons of these methods on artificial data are conducted. Our proposed module network is employed to infer the signaling network between 13 genes in the ER-α pathway in human MCF-7 breast cancer cells. Using a bootstrapping approach this reconstruction can be found with good statistical stability. The code for the module network inference method is available in the latest version of the R-package nem, which can be obtained from the Bioconductor homepage.

  19. Statistical Image Properties in Large Subsets of Traditional Art, Bad Art, and Abstract Art.

    Science.gov (United States)

    Redies, Christoph; Brachmann, Anselm

    2017-01-01

    Several statistical image properties have been associated with large subsets of traditional visual artworks. Here, we investigate some of these properties in three categories of art that differ in artistic claim and prestige: (1) Traditional art of different cultural origin from established museums and art collections (oil paintings and graphic art of Western provenance, Islamic book illustration and Chinese paintings), (2) Bad Art from two museums that collect contemporary artworks of lesser importance (© Museum Of Bad Art [MOBA], Somerville, and Official Bad Art Museum of Art [OBAMA], Seattle), and (3) twentieth century abstract art of Western provenance from two prestigious museums (Tate Gallery and Kunstsammlung Nordrhein-Westfalen). We measured the following four statistical image properties: the fractal dimension (a measure relating to subjective complexity); self-similarity (a measure of how much the sections of an image resemble the image as a whole), 1st-order entropy of edge orientations (a measure of how uniformly different orientations are represented in an image); and 2nd-order entropy of edge orientations (a measure of how independent edge orientations are across an image). As shown previously, traditional artworks of different styles share similar values for these measures. The values for Bad Art and twentieth century abstract art show a considerable overlap with those of traditional art, but we also identified numerous examples of Bad Art and abstract art that deviate from traditional art. By measuring statistical image properties, we quantify such differences in image composition for the first time.
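
    As an illustration of one of the four measures, the following sketch computes a first-order entropy of edge orientations from the gradient directions of an image; the Sobel filter, edge threshold, and bin count are generic choices and not the authors' published pipeline.

    ```python
    import numpy as np
    from scipy import ndimage

    def edge_orientation_entropy(image, bins=48):
        """First-order entropy of edge orientations: histogram the gradient
        directions of edge pixels and compute the Shannon entropy (bits).
        Threshold and bin count are illustrative, not the published protocol."""
        img = image.astype(float)
        gy = ndimage.sobel(img, axis=0)
        gx = ndimage.sobel(img, axis=1)
        magnitude = np.hypot(gx, gy)
        angle = np.arctan2(gy, gx)                       # orientation in (-pi, pi]
        edges = magnitude > 0.1 * magnitude.max()        # keep only clear edges
        hist, _ = np.histogram(angle[edges], bins=bins, range=(-np.pi, np.pi))
        p = hist / hist.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))                   # maximum possible is log2(bins)

    rng = np.random.default_rng(2)
    print(edge_orientation_entropy(rng.random((256, 256))))
    ```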

  20. Matrix Sampling of Items in Large-Scale Assessments

    Directory of Open Access Journals (Sweden)

    Ruth A. Childs

    2003-07-01

    Full Text Available Matrix sampling of items, that is, division of a set of items into different versions of a test form, is used by several large-scale testing programs. Like other test designs, matrixed designs have both advantages and disadvantages. For example, testing time per student is less than if each student received all the items, but the comparability of student scores may decrease. Also, curriculum coverage is maintained, but reporting of scores becomes more complex. In this paper, matrixed designs are compared with more traditional designs in nine categories of costs: development costs, materials costs, administration costs, educational costs, scoring costs, reliability costs, comparability costs, validity costs, and reporting costs. In choosing among test designs, a testing program should examine the costs in light of its mandate(s), the content of the tests, and the financial resources available, among other considerations.

  1. Big Data, Small Sample.

    Science.gov (United States)

    Gerlovina, Inna; van der Laan, Mark J; Hubbard, Alan

    2017-05-20

    Multiple comparisons and small sample size, common characteristics of many types of "Big Data" including those that are produced by genomic studies, present specific challenges that affect reliability of inference. Use of multiple testing procedures necessitates calculation of very small tail probabilities of a test statistic distribution. Results based on large deviation theory provide a formal condition that is necessary to guarantee error rate control given practical sample sizes, linking the number of tests and the sample size; this condition, however, is rarely satisfied. Using methods that are based on Edgeworth expansions (relying especially on the work of Peter Hall), we explore the impact of departures of sampling distributions from typical assumptions on actual error rates. Our investigation illustrates how far the actual error rates can be from the declared nominal levels, suggesting potentially widespread problems with error rate control, specifically excessive false positives. This is an important factor that contributes to the "reproducibility crisis". We also review some other commonly used methods (such as permutation and methods based on finite sampling inequalities) in their application to multiple testing/small sample data. We point out that Edgeworth expansions, providing higher order approximations to the sampling distribution, offer a promising direction for data analysis that could improve reliability of studies relying on large numbers of comparisons with modest sample sizes.
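
    A small simulation (not the Edgeworth-expansion analysis used in the paper) can illustrate the basic concern: with a skewed parent distribution, a small sample, and the very small significance levels typical of multiple testing, the actual tail error rate of a t-test can differ markedly from the nominal level. All settings below are illustrative.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, alpha, n_sim = 10, 1e-3, 200_000        # small sample, multiple-testing-sized alpha

    # Skewed null data: exponential shifted to mean zero, so H0 (mean = 0) is true
    x = rng.exponential(1.0, size=(n_sim, n)) - 1.0
    t = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n))

    t_crit = stats.t.ppf(1 - alpha, df=n - 1)  # nominal one-sided critical value
    actual = np.mean(t > t_crit)
    print(f"nominal one-sided level: {alpha:.0e}, actual rejection rate: {actual:.1e}")
    ```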

  2. Isotope dilution and sampling factors of the quality assurance and TQM of environmental analysis

    International Nuclear Information System (INIS)

    Macasek, F.

    1999-01-01

    Sampling and preparatory treatment of environmental objects is discussed from the view of their information content, functional speciation of the pollutant, statistical distribution treatment and uncertainty assessment. During homogenization of large samples, substantial information may be lost and the validity of environmental information becomes vague. Isotope dilution analysis is discussed as the most valuable tool for both the validity of analysis and the evaluation of sample variance. Data collection for a non-parametric statistical treatment of series of 'non-representative' sub-samples, and physico-chemical speciation of the analyte, may actually better fulfill criteria of similarity and representativeness. Large samples are often required due to detection limits of analysis, but the representativeness of environmental samples should be understood not only through the mean analyte concentration, but also through its spatial and time variance. Hence, heuristic analytical scenarios and interpretation of results must be designed by cooperation of environmentalists and analytical chemists. (author)

  3. Large scale study of tooth enamel

    International Nuclear Information System (INIS)

    Bodart, F.; Deconninck, G.; Martin, M.T.

    Human tooth enamel contains traces of foreign elements. The presence of these elements is related to the history and the environment of the human body and can be considered as the signature of perturbations which occur during the growth of a tooth. A map of the distribution of these traces on a large scale sample of the population will constitute a reference for further investigations of environmental effects. One hundred eighty samples of teeth were first analyzed using PIXE, backscattering and nuclear reaction techniques. The results were analyzed using statistical methods. Correlations between O, F, Na, P, Ca, Mn, Fe, Cu, Zn, Pb and Sr were observed and cluster analysis was in progress. The techniques described in the present work have been developed in order to establish a method for the exploration of very large samples of the Belgian population. (author)

  4. Random sampling of elementary flux modes in large-scale metabolic networks.

    Science.gov (United States)

    Machado, Daniel; Soons, Zita; Patil, Kiran Raosaheb; Ferreira, Eugénio C; Rocha, Isabel

    2012-09-15

    The description of a metabolic network in terms of elementary (flux) modes (EMs) provides an important framework for metabolic pathway analysis. However, their application to large networks has been hampered by the combinatorial explosion in the number of modes. In this work, we develop a method for generating random samples of EMs without computing the whole set. Our algorithm is an adaptation of the canonical basis approach, where we add an additional filtering step which, at each iteration, selects a random subset of the new combinations of modes. In order to obtain an unbiased sample, all candidates are assigned the same probability of getting selected. This approach avoids the exponential growth of the number of modes during computation, thus generating a random sample of the complete set of EMs within reasonable time. We generated samples of different sizes for a metabolic network of Escherichia coli, and observed that they preserve several properties of the full EM set. It is also shown that EM sampling can be used for rational strain design. A well distributed sample, that is representative of the complete set of EMs, should be suitable to most EM-based methods for analysis and optimization of metabolic networks. Source code for a cross-platform implementation in Python is freely available at http://code.google.com/p/emsampler. dmachado@deb.uminho.pt Supplementary data are available at Bioinformatics online.

  5. Proteomic Biomarker Discovery in 1000 Human Plasma Samples with Mass Spectrometry.

    Science.gov (United States)

    Cominetti, Ornella; Núñez Galindo, Antonio; Corthésy, John; Oller Moreno, Sergio; Irincheeva, Irina; Valsesia, Armand; Astrup, Arne; Saris, Wim H M; Hager, Jörg; Kussmann, Martin; Dayon, Loïc

    2016-02-05

    The overall impact of proteomics on clinical research and its translation has lagged behind expectations. One recognized caveat is the limited size (subject numbers) of (pre)clinical studies performed at the discovery stage, the findings of which fail to be replicated in larger verification/validation trials. Compromised study designs and insufficient statistical power are consequences of the to-date still limited capacity of mass spectrometry (MS)-based workflows to handle large numbers of samples in a realistic time frame, while delivering comprehensive proteome coverages. We developed a highly automated proteomic biomarker discovery workflow. Herein, we have applied this approach to analyze 1000 plasma samples from the multicentered human dietary intervention study "DiOGenes". Study design, sample randomization, tracking, and logistics were the foundations of our large-scale study. We checked the quality of the MS data and provided descriptive statistics. The data set was interrogated for proteins with most stable expression levels in that set of plasma samples. We evaluated standard clinical variables that typically impact forthcoming results and assessed body mass index-associated and gender-specific proteins at two time points. We demonstrate that analyzing a large number of human plasma samples for biomarker discovery with MS using isobaric tagging is feasible, providing robust and consistent biological results.

  6. Statistics on the parameters of nonisothermal ionospheric plasma in large mesospheric electric fields

    Science.gov (United States)

    Martynenko, S.; Rozumenko, V.; Tyrnov, O.; Manson, A.; Meek, C.

    The large V/m electric fields inherent in the mesosphere play an essential role in lower ionospheric electrodynamics. They must be the cause of large variations in the electron temperature and the electron collision frequency at D region altitudes, and consequently the ionospheric plasma in the lower part of the D region undergoes a transition into a nonisothermal state. This study is based on the databases of large mesospheric electric fields collected with the 2.2-MHz radar of the Institute of Space and Atmospheric Studies, University of Saskatchewan, Canada (52°N geographic latitude, 60.4°N geomagnetic latitude) and with the 2.3-MHz radar of the Kharkiv V. Karazin National University (49.6°N geographic latitude, 45.6°N geomagnetic latitude). The statistical analysis of these data is presented in Meek, C. E., A. H. Manson, S. I. Martynenko, V. T. Rozumenko, O. F. Tyrnov, Remote sensing of mesospheric electric fields using MF radars, Journal of Atmospheric and Solar-Terrestrial Physics, in press. The large mesospheric electric fields are experimentally established to follow a Rayleigh distribution in the interval 0
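
    As an aside, a Rayleigh fit such as the one reported above can be checked with standard tools. The sketch below uses synthetic field magnitudes in place of the radar-derived data and is only meant to illustrate the fitting and goodness-of-fit step, not to reproduce the study's analysis.

```python
import numpy as np
from scipy import stats

# Synthetic field magnitudes (V/m) standing in for the radar-derived data
rng = np.random.default_rng(1)
fields = rng.rayleigh(scale=0.6, size=5000)

# Fit a Rayleigh distribution with the location fixed at zero
loc, scale = stats.rayleigh.fit(fields, floc=0)

# Kolmogorov-Smirnov test of the fitted model against the sample
ks_stat, p_value = stats.kstest(fields, "rayleigh", args=(loc, scale))
print(f"fitted scale = {scale:.3f}, KS statistic = {ks_stat:.3f}, p = {p_value:.2f}")
```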

  7. Computational statistics handbook with Matlab

    CERN Document Server

    Martinez, Wendy L

    2007-01-01

    Prefaces Introduction What Is Computational Statistics? An Overview of the Book Probability Concepts Introduction Probability Conditional Probability and Independence Expectation Common Distributions Sampling Concepts Introduction Sampling Terminology and Concepts Sampling Distributions Parameter Estimation Empirical Distribution Function Generating Random Variables Introduction General Techniques for Generating Random Variables Generating Continuous Random Variables Generating Discrete Random Variables Exploratory Data Analysis Introduction Exploring Univariate Data Exploring Bivariate and Trivariate Data Exploring Multidimensional Data Finding Structure Introduction Projecting Data Principal Component Analysis Projection Pursuit EDA Independent Component Analysis Grand Tour Nonlinear Dimensionality Reduction Monte Carlo Methods for Inferential Statistics Introduction Classical Inferential Statistics Monte Carlo Methods for Inferential Statist...

  8. Study of a large rapid ashing apparatus and a rapid dry ashing method for biological samples and its application

    International Nuclear Information System (INIS)

    Jin Meisun; Wang Benli; Liu Wencang

    1988-04-01

    A large rapid dry-ashing apparatus and a rapid ashing method for biological samples are described. The apparatus consists of a specially made ashing furnace, a gas supply system and a temperature-programming control cabinet. Ashing experiments with the apparatus showed the following advantages: (1) high ashing speed and savings in electric energy; (2) the apparatus can ash a large number of samples at a time; (3) the ashed sample is pure white (or spotless), loose and easily soluble, with little residual char; (4) fresh samples can also be ashed directly. The apparatus is suitable for ashing large quantities of environmental samples containing low-level radioactive trace elements, as well as medical, food and agricultural research samples.

  9. A fast learning method for large scale and multi-class samples of SVM

    Science.gov (United States)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class SVM (Support Vector Machine) classification, based on a binary tree, is presented to address the low learning efficiency of SVMs when processing large-scale multi-class samples. The paper adopts a bottom-up method to build the binary tree hierarchy; according to the resulting hierarchy, a sub-classifier learns from the corresponding samples of each node. During learning, several class clusters are generated by a first clustering of the training samples. Central points are first extracted from those clusters that contain only one type of sample. For clusters that contain two types of samples, the numbers of clusters for their positive and negative samples are set according to their degree of mixture, a secondary clustering is then performed, and central points are extracted from the resulting sub-clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by the extracted central points. Simulation experiments show that this fast learning method, based on multi-level clustering, maintains high classification accuracy, greatly reduces the number of samples and effectively improves learning efficiency.
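
    The core idea, replacing a large training set with cluster centres before SVM training, can be illustrated with a minimal sketch. This is not the paper's binary-tree algorithm (the node hierarchy and mixture-degree rules are omitted); it only shows the sample-reduction step, with illustrative parameters such as 50 clusters per class.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=20000, n_features=10, n_classes=2, random_state=0)

# Reduce each class to a set of cluster centres, then train the SVM on the
# centres only: far fewer support-vector candidates than the raw samples.
centres, labels = [], []
for cls in np.unique(y):
    km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X[y == cls])
    centres.append(km.cluster_centers_)
    labels.append(np.full(50, cls))

X_red = np.vstack(centres)
y_red = np.concatenate(labels)

clf = SVC(kernel="rbf").fit(X_red, y_red)
print("accuracy on full set:", clf.score(X, y))
```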

  10. Development of digital gamma-activation autoradiography for analysis of samples of large area

    International Nuclear Information System (INIS)

    Kolotov, V.P.; Grozdov, D.S.; Dogadkin, N.N.; Korobkov, V.I.

    2011-01-01

    Gamma-activation autoradiography is a prospective method for screening detection of inclusions of precious metals in geochemical samples. Its characteristics allow analysis of thin sections of large size (tens of cm²), which favourably distinguishes it from other methods for local analysis. At the same time, the activating field of the accelerator bremsstrahlung displays a sharp intensity decrease with distance along the axis. A method for activation dose "equalization" during irradiation of large-size thin sections has been developed. The method is based on the use of a hardware-software system. This includes a device for moving the sample during the irradiation, a program for computer modelling of the acquired activation dose for the chosen kinematics of the sample movement, and a program for pixel-by-pixel correction of the autoradiographic images. For detection of inclusions of precious metals, a method for analysis of the acquired dose dynamics during sample decay has been developed. The method is based on software processing, pixel by pixel, of a time series of coaxial autoradiographic images and the generation of secondary meta-images allowing interpretation regarding the presence of interesting inclusions based on half-lives. The method is tested on the analysis of copper-nickel polymetallic ores. The developed solutions considerably expand the possible applications of digital gamma-activation autoradiography. (orig.)

  11. Development of digital gamma-activation autoradiography for analysis of samples of large area

    Energy Technology Data Exchange (ETDEWEB)

    Kolotov, V.P.; Grozdov, D.S.; Dogadkin, N.N.; Korobkov, V.I. [Russian Academy of Sciences, Moscow (Russian Federation). Vernadsky Inst. of Geochemistry and Analytical Chemistry

    2011-07-01

    Gamma-activation autoradiography is a prospective method for screening detection of inclusions of precious metals in geochemical samples. Its characteristics allow analysis of thin sections of large size (tens of cm²), which favourably distinguishes it from other methods for local analysis. At the same time, the activating field of the accelerator bremsstrahlung displays a sharp intensity decrease with distance along the axis. A method for activation dose "equalization" during irradiation of large-size thin sections has been developed. The method is based on the use of a hardware-software system. This includes a device for moving the sample during the irradiation, a program for computer modelling of the acquired activation dose for the chosen kinematics of the sample movement, and a program for pixel-by-pixel correction of the autoradiographic images. For detection of inclusions of precious metals, a method for analysis of the acquired dose dynamics during sample decay has been developed. The method is based on software processing, pixel by pixel, of a time series of coaxial autoradiographic images and the generation of secondary meta-images allowing interpretation regarding the presence of interesting inclusions based on half-lives. The method is tested on the analysis of copper-nickel polymetallic ores. The developed solutions considerably expand the possible applications of digital gamma-activation autoradiography. (orig.)
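
    The pixel-by-pixel half-life analysis described in both records can be sketched as follows: given a stack of co-registered autoradiographic images acquired at known times after activation, fit log(intensity) against time for every pixel and convert the fitted decay constant into a half-life map. Array shapes, acquisition times and the synthetic data below are assumptions made for the illustration.

```python
import numpy as np

# Stack of co-registered autoradiographic images: shape (n_times, ny, nx),
# acquired at times t (hours) after activation.
t = np.array([1.0, 3.0, 6.0, 12.0, 24.0])
rng = np.random.default_rng(0)
images = np.exp(-0.05 * t)[:, None, None] * (1000 + 50 * rng.random((5, 64, 64)))

# Per-pixel linear fit of log(intensity) against time gives the decay
# constant lambda; the half-life map is ln(2) / lambda.
log_img = np.log(images.reshape(len(t), -1))
slope = np.polyfit(t, log_img, 1)[0]          # one slope per pixel
half_life = np.log(2) / -slope.reshape(64, 64)

print("median half-life estimate (h):", np.median(half_life))
```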

  12. Significance levels for studies with correlated test statistics.

    Science.gov (United States)

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
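
    For orientation, the permutation procedure referred to above (before any conditioning on the histogram spread) can be sketched as follows: permute the sample units, recompute all test statistics, and record the largest absolute value to build the null distribution of the maximum. The data and group labels below are synthetic and purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))          # 40 samples, 500 hypotheses (e.g. genes)
group = np.repeat([0, 1], 20)           # two-group labels for the sample units

def max_abs_t(data, labels):
    t, _ = stats.ttest_ind(data[labels == 0], data[labels == 1], axis=0)
    return np.max(np.abs(t))

observed = max_abs_t(X, group)
perm = np.array([max_abs_t(X, rng.permutation(group)) for _ in range(1000)])
p_global = (1 + np.sum(perm >= observed)) / (1 + len(perm))
print(f"global p-value ~ {p_global:.3f}")
```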

  13. The estimation of differential counting measurements of positive quantities with relatively large statistical errors

    International Nuclear Information System (INIS)

    Vincent, C.H.

    1982-01-01

    Bayes' principle is applied to the differential counting measurement of a positive quantity in which the statistical errors are not necessarily small in relation to the true value of the quantity. The methods of estimation derived are found to give consistent results and to avoid the anomalous negative estimates sometimes obtained by conventional methods. One of the methods given provides a simple means of deriving the required estimates from conventionally presented results and appears to have wide potential applications. Both methods provide the actual posterior probability distribution of the quantity to be measured. A particularly important potential application is the correction of counts on low-radioactivity samples for background. (orig.)

  14. Sample-path large deviations in credit risk

    NARCIS (Netherlands)

    Leijdekker, V.J.G.; Mandjes, M.R.H.; Spreij, P.J.C.

    2011-01-01

    The event of large losses plays an important role in credit risk. As these large losses are typically rare, and portfolios usually consist of a large number of positions, large deviation theory is the natural tool to analyze the tail asymptotics of the probabilities involved. We first derive a

  15. Relationship of fish indices with sampling effort and land use change in a large Mediterranean river.

    Science.gov (United States)

    Almeida, David; Alcaraz-Hernández, Juan Diego; Merciai, Roberto; Benejam, Lluís; García-Berthou, Emili

    2017-12-15

    Fish are invaluable ecological indicators in freshwater ecosystems but have been less used for ecological assessments in large Mediterranean rivers. We evaluated the effects of sampling effort (transect length) on fish metrics, such as species richness and two fish indices (the new European Fish Index EFI+ and a regional index, IBICAT2b), in the mainstem of a large Mediterranean river. For this purpose, we sampled by boat electrofishing five sites each with 10 consecutive transects corresponding to a total length of 20 times the river width (European standard required by the Water Framework Directive) and we also analysed the effect of sampling area on previous surveys. Species accumulation curves and richness extrapolation estimates in general suggested that species richness was reasonably estimated with transect lengths of 10 times the river width or less. The EFI+ index was significantly affected by sampling area, both for our samplings and previous data. Surprisingly, EFI+ values in general decreased with increasing sampling area, despite the higher observed richness, likely because the expected values of metrics were higher. By contrast, the regional fish index was not dependent on sampling area, likely because it does not use a predictive model. Both fish indices, but particularly the EFI+, decreased with less forest cover percentage, even within the smaller disturbance gradient in the river type studied (mainstem of a large Mediterranean river, where environmental pressures are more general). Although the two fish-based indices are very different in terms of their development, methodology, and metrics used, they were significantly correlated and provided a similar assessment of ecological status. Our results reinforce the importance of standardization of sampling methods for bioassessment and suggest that predictive models that use sampling area as a predictor might be more affected by differences in sampling effort than simpler biotic indices. Copyright

  16. An introduction to Bayesian statistics in health psychology.

    Science.gov (United States)

    Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske

    2017-09-01

    The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.

  17. Concepts in sample size determination

    Directory of Open Access Journals (Sweden)

    Umadevi K Rao

    2012-01-01

    Full Text Available Investigators involved in clinical, epidemiological or translational research, have the drive to publish their results so that they can extrapolate their findings to the population. This begins with the preliminary step of deciding the topic to be studied, the subjects and the type of study design. In this context, the researcher must determine how many subjects would be required for the proposed study. Thus, the number of individuals to be included in the study, i.e., the sample size is an important consideration in the design of many clinical studies. The sample size determination should be based on the difference in the outcome between the two groups studied as in an analytical study, as well as on the accepted p value for statistical significance and the required statistical power to test a hypothesis. The accepted risk of type I error or alpha value, which by convention is set at the 0.05 level in biomedical research defines the cutoff point at which the p value obtained in the study is judged as significant or not. The power in clinical research is the likelihood of finding a statistically significant result when it exists and is typically set to >80%. This is necessary since the most rigorously executed studies may fail to answer the research question if the sample size is too small. Alternatively, a study with too large a sample size will be difficult and will result in waste of time and resources. Thus, the goal of sample size planning is to estimate an appropriate number of subjects for a given study design. This article describes the concepts in estimating the sample size.
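
    The ingredients listed above (the difference between groups, the accepted alpha, and the required power) combine into the standard approximate formula for a two-group comparison of means, n per group = 2(z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2. The sketch below is a generic illustration with made-up numbers, not a calculation from the article.

```python
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample comparison
    of means: n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

# Example: detect a 5-unit difference when the standard deviation is 10
print(round(n_per_group(delta=5, sigma=10)))   # roughly 63 per group
```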

  18. Size and shape characteristics of drumlins, derived from a large sample, and associated scaling laws

    Science.gov (United States)

    Clark, Chris D.; Hughes, Anna L. C.; Greenwood, Sarah L.; Spagnolo, Matteo; Ng, Felix S. L.

    2009-04-01

    Ice sheets flowing across a sedimentary bed usually produce a landscape of blister-like landforms streamlined in the direction of the ice flow, with each bump of the order of 10² to 10³ m in length and 10¹ m in relief. Such landforms, known as drumlins, have mystified investigators for over a hundred years. A satisfactory explanation for their formation, and thus an appreciation of their glaciological significance, has remained elusive. A recent advance has been in numerical modelling of the land-forming process. In anticipation of future modelling endeavours, this paper is motivated by the requirement for robust data on drumlin size and shape for model testing. From a systematic programme of drumlin mapping from digital elevation models and satellite images of Britain and Ireland, we used a geographic information system to compile a range of statistics on length L, width W, and elongation ratio E (where E = L/W) for a large sample. Mean L is found to be 629 m (n = 58,983), mean W is 209 m and mean E is 2.9 (n = 37,043). Most drumlins are between 250 and 1000 metres in length; between 120 and 300 metres in width; and between 1.7 and 4.1 times as long as they are wide. Analysis of such data and plots of drumlin width against length reveals some new insights. All frequency distributions are unimodal, from which we infer that the geomorphological label of 'drumlin' is fair in that this is a true single population of landforms, rather than an amalgam of different landform types. Drumlin size shows a clear minimum bound of around 100 m (horizontal). Maybe drumlins are generated at many scales and this is the minimum, or this value may be an indication of the fundamental scale of bump generation ('proto-drumlins') prior to them growing and elongating. A relationship between drumlin width and length is found (with r² = 0.48), approximately W = 7L^(1/2) when measured in metres. A surprising and sharply-defined line bounds the data cloud plotted in E-W

  19. Statistical techniques for sampling and monitoring natural resources

    Science.gov (United States)

    Hans T. Schreuder; Richard Ernst; Hugo Ramirez-Maldonado

    2004-01-01

    We present the statistical theory of inventory and monitoring from a probabilistic point of view. We start with the basics and show the interrelationships between designs and estimators illustrating the methods with a small artificial population as well as with a mapped realistic population. For such applications, useful open source software is given in Appendix 4....

  20. Latent spatial models and sampling design for landscape genetics

    Science.gov (United States)

    Hanks, Ephraim M.; Hooten, Mevin B.; Knick, Steven T.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Cross, Todd B.; Schwartz, Michael K.

    2016-01-01

    We propose a spatially-explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spatial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial random effect to allow for spatial correlation between genetic observations. We illustrate how modern dimension reduction approaches to spatial statistics can allow for efficient computation in landscape genetic statistical models covering large spatial domains. We apply our approach to propose a retrospective spatial sampling design for greater sage-grouse (Centrocercus urophasianus) population genetics in the western United States.

  1. Analysis of statistical misconception in terms of statistical reasoning

    Science.gov (United States)

    Maryati, I.; Priatna, N.

    2018-05-01

    Reasoning skills are needed by everyone in the globalization era, because every person has to be able to manage and use information from all over the world, which can now be obtained easily. Statistical reasoning skill is the ability to collect, group, process, and interpret information and to draw conclusions from it. Developing this skill can be done through various levels of education. However, the skill is often low because many people, students included, assume that statistics is just the ability to count and to use formulas. Students also still have a negative attitude toward courses related to research. The purpose of this research is to analyze students' misconceptions in a descriptive statistics course in relation to statistical reasoning skill. The observation was done by analyzing the results of a misconception test and a statistical reasoning skill test, and by observing the effect of students' misconceptions on statistical reasoning skill. The sample consisted of 32 students of the mathematics education department who had taken the descriptive statistics course. The mean score on the misconception test was 49.7 with a standard deviation of 10.6, whereas the mean score on the statistical reasoning skill test was 51.8 with a standard deviation of 8.5. If 65 is taken as the minimum score for standard achievement of course competence, the students' mean scores are below that standard. The results of the misconception study highlight which sub-topics should be given particular attention. Based on the assessment results, students' misconceptions occur in: 1) writing mathematical sentences and symbols correctly, 2) understanding basic definitions, and 3) determining which concept to use in solving a problem. For statistical reasoning skill, the assessment measured reasoning about: 1) data, 2) representation, 3) statistical format, 4) probability, 5) samples, and 6) association.

  2. Feasibility studies on large sample neutron activation analysis using a low power research reactor

    International Nuclear Information System (INIS)

    Gyampo, O.

    2008-06-01

    Instrumental neutron activation analysis (INAA) using the Ghana Research Reactor-1 (GHARR-1) can be applied directly to samples with masses in the gram range. Sample weights were in the range of 0.5 g to 5 g; the representativeness of the sample is therefore improved, as well as the sensitivity. Irradiation of samples was done using a low-power research reactor. The correction for neutron self-shielding within the sample is determined from a measurement of the neutron flux depression just outside the sample. Correction for gamma-ray self-attenuation in the sample was performed via linear attenuation coefficients derived from transmission measurements. Quantitative and qualitative analysis of the data was done using gamma-ray spectrometry (HPGe detector). The results of this study on the possibilities of large sample NAA using a miniature neutron source reactor (MNSR) show clearly that the Ghana Research Reactor-1 (GHARR-1) at the National Nuclear Research Institute (NNRI) can be used for the analysis of samples of up to 5 grams (5 g) using the pneumatic transfer systems.
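
    The gamma-ray self-attenuation correction from a transmission measurement mentioned above is commonly computed, for a uniform slab-like sample, from the measured transmission T = I/I0. The sketch below illustrates that standard relation; the slab geometry is an assumption for the example and is not a description of the GHARR-1 procedure.

```python
import numpy as np

def self_attenuation_factor(transmission):
    """Correction factor for gamma self-attenuation in a uniform slab sample.

    transmission = I/I0 measured through the sample; with mu*t = -ln(T),
    the average attenuation over the slab is (1 - T) / (mu*t), so the
    correction to apply to the measured count rate is its inverse.
    """
    mu_t = -np.log(transmission)
    attenuation = (1.0 - transmission) / mu_t
    return 1.0 / attenuation

# Example: 70% of the external beam is transmitted through the sample
print(f"correction factor: {self_attenuation_factor(0.70):.3f}")
```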

  3. Software engineering the mixed model for genome-wide association studies on large samples

    Science.gov (United States)

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample siz...

  4. THE SLOAN DIGITAL SKY SURVEY QUASAR LENS SEARCH. IV. STATISTICAL LENS SAMPLE FROM THE FIFTH DATA RELEASE

    International Nuclear Information System (INIS)

    Inada, Naohisa; Oguri, Masamune; Shin, Min-Su; Kayo, Issha; Fukugita, Masataka; Strauss, Michael A.; Gott, J. Richard; Hennawi, Joseph F.; Morokuma, Tomoki; Becker, Robert H.; Gregg, Michael D.; White, Richard L.; Kochanek, Christopher S.; Chiu, Kuenley; Johnston, David E.; Clocchiatti, Alejandro; Richards, Gordon T.; Schneider, Donald P.; Frieman, Joshua A.

    2010-01-01

    We present the second report of our systematic search for strongly lensed quasars from the data of the Sloan Digital Sky Survey (SDSS). From extensive follow-up observations of 136 candidate objects, we find 36 lenses in the full sample of 77,429 spectroscopically confirmed quasars in the SDSS Data Release 5. We then define a complete sample of 19 lenses, including 11 from our previous search in the SDSS Data Release 3, from the sample of 36,287 quasars with i Λ = 0.84 +0.06/-0.08 (stat.) +0.09/-0.07 (syst.) assuming a flat universe, which is in good agreement with other cosmological observations. We also report the discoveries of seven binary quasars with separations ranging from 1.1 to 16.6 arcsec, which are identified in the course of our lens survey. This study concludes the construction of our statistical lens sample in the full SDSS-I data set.

  5. Kappa statistic for clustered matched-pair data.

    Science.gov (United States)

    Yang, Zhao; Zhou, Ming

    2014-07-10

    The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
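
    For context, the ordinary (unclustered) kappa statistic that the paper builds on is computed from the observed and chance-expected agreement. The sketch below shows that basic computation on made-up binary ratings; it does not implement the clustered variance estimator proposed in the paper.

```python
import numpy as np

def cohen_kappa(a, b):
    """Cohen's kappa for two binary raters on the same subjects."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)                                   # observed agreement
    p_yes = np.mean(a) * np.mean(b)                        # chance both say 1
    p_no = (1 - np.mean(a)) * (1 - np.mean(b))             # chance both say 0
    pe = p_yes + p_no                                      # expected agreement
    return (po - pe) / (1 - pe)

rater1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(f"kappa = {cohen_kappa(rater1, rater2):.2f}")
```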

  6. Optimum method to determine radioactivity in large tracts of land. In-situ gamma spectroscopy or sampling followed by laboratory measurement

    International Nuclear Information System (INIS)

    Bronson, Frazier

    2008-01-01

    In the process of decommissioning contaminated facilities, and in the conduct of normal operations involving radioactive material, it is frequently required to show that large areas of land are not contaminated, or, if contaminated, that the amount is below an acceptable level. However, it is quite rare for the radioactivity in the soil to be uniformly distributed. Rather, it is generally concentrated in a few isolated and probably unknown locations. One way to ascertain the status of the land is to take soil samples for subsequent measurement in the laboratory. Another way is to use in-situ gamma spectroscopy. In both cases, the non-uniform distribution of radioactivity can greatly compromise the accuracy of the assay, and makes uncertainty estimates much more complicated than simple propagation of counting statistics. This paper examines the process of determining the best way to estimate the activity on the tract of land, and gives quantitative estimates of measurement uncertainty for various conditions of radioactivity. When the distribution of radioactivity in the soil is not homogeneous, the sampling uncertainty is likely to be larger than the in-situ measurement uncertainty. (author)

  7. Determination of 129I in large soil samples after alkaline wet disintegration

    International Nuclear Information System (INIS)

    Bunzl, K.; Kracke, W.

    1992-01-01

    Large soil samples (up to 500 g) can conveniently be disintegrated by hydrogen peroxide in a utility tank under alkaline conditions in order to subsequently determine 129I by neutron activation analysis. Interfering elements such as Br are removed before neutron irradiation to reduce the radiation exposure of the personnel. The precision of the method is 129I also by the combustion method. (orig.)

  8. Adaptive sampling based on the cumulative distribution function of order statistics to delineate heavy-metal contaminated soils using kriging

    International Nuclear Information System (INIS)

    Juang, K.-W.; Lee, D.-Y.; Teng, Y.-L.

    2005-01-01

    Correctly classifying 'contaminated' areas in soils, based on the threshold for a contaminated site, is important for determining effective clean-up actions. Pollutant mapping by means of kriging is increasingly being used for the delineation of contaminated soils. However, areas where the kriged pollutant concentrations are close to the threshold have a high probability of being misclassified. In order to reduce the misclassification due to over- or under-estimation from kriging, an adaptive sampling method using the cumulative distribution function of order statistics (CDFOS) was developed to draw additional samples for delineating contaminated soils during kriging. A heavy-metal contaminated site in Hsinchu, Taiwan was used to illustrate this approach. The results showed that, compared with random sampling, adaptive sampling using CDFOS reduced the kriging estimation errors and misclassification rates, and thus appears to be a better choice than random sampling when additional sampling is required for delineating the 'contaminated' areas. - A sampling approach was derived for drawing additional samples while kriging
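
    As a simplified illustration of the general idea of targeting additional samples where misclassification is most likely, the sketch below flags grid cells whose kriged prediction lies within one kriging standard error of the regulatory threshold. This is not the CDFOS criterion of the paper, which is based on order statistics; the grid, threshold and kriging outputs below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
pred = rng.normal(loc=200.0, scale=60.0, size=(50, 50))   # kriged concentrations
krig_sd = rng.uniform(10.0, 40.0, size=(50, 50))          # kriging standard errors
threshold = 250.0                                          # regulatory limit

# Flag cells whose kriged value is within one kriging standard error of the
# threshold: these are the locations most likely to be misclassified and are
# candidates for the additional sampling round.
uncertain = np.abs(pred - threshold) < krig_sd
extra_sites = np.argwhere(uncertain)

print(f"{len(extra_sites)} grid cells proposed for additional sampling")
```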

  9. Statistical methods in physical mapping

    International Nuclear Information System (INIS)

    Nelson, D.O.

    1995-05-01

    One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like fragile X syndrome, cystic fibrosis and myotonic muscular dystrophy. This dissertation concentrates on constructing high-resolution physical maps. It demonstrates how probabilistic modeling and statistical analysis can aid molecular geneticists in the tasks of planning, execution, and evaluation of physical maps of chromosomes and large chromosomal regions. The dissertation is divided into six chapters. Chapter 1 provides an introduction to the field of physical mapping, describing the role of physical mapping in gene isolation and in past efforts at mapping chromosomal regions. The next two chapters review and extend known results on predicting progress in large mapping projects. Such predictions help project planners decide between various approaches and tactics for mapping large regions of the human genome. Chapter 2 shows how probability models have been used in the past to predict progress in mapping projects. Chapter 3 presents new results, based on stationary point process theory, for progress measures for mapping projects based on directed mapping strategies. Chapter 4 describes in detail the construction of an initial high-resolution physical map for human chromosome 19. This chapter introduces the probability and statistical models involved in map construction in the context of a large, ongoing physical mapping project. Chapter 5 concentrates on one such model, the trinomial model. This chapter contains new results on the large-sample behavior of this model, including distributional results, asymptotic moments, and detection error rates. In addition, it contains an optimality result concerning experimental procedures based on the trinomial model. The last chapter explores unsolved problems and describes future work.

  10. Statistical methods in physical mapping

    Energy Technology Data Exchange (ETDEWEB)

    Nelson, David O. [Univ. of California, Berkeley, CA (United States)

    1995-05-01

    One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like fragile X syndrome, cystic fibrosis and myotonic muscular dystrophy. This dissertation concentrates on constructing high-resolution physical maps. It demonstrates how probabilistic modeling and statistical analysis can aid molecular geneticists in the tasks of planning, execution, and evaluation of physical maps of chromosomes and large chromosomal regions. The dissertation is divided into six chapters. Chapter 1 provides an introduction to the field of physical mapping, describing the role of physical mapping in gene isolation and in past efforts at mapping chromosomal regions. The next two chapters review and extend known results on predicting progress in large mapping projects. Such predictions help project planners decide between various approaches and tactics for mapping large regions of the human genome. Chapter 2 shows how probability models have been used in the past to predict progress in mapping projects. Chapter 3 presents new results, based on stationary point process theory, for progress measures for mapping projects based on directed mapping strategies. Chapter 4 describes in detail the construction of an initial high-resolution physical map for human chromosome 19. This chapter introduces the probability and statistical models involved in map construction in the context of a large, ongoing physical mapping project. Chapter 5 concentrates on one such model, the trinomial model. This chapter contains new results on the large-sample behavior of this model, including distributional results, asymptotic moments, and detection error rates. In addition, it contains an optimality result concerning experimental procedures based on the trinomial model. The last chapter explores unsolved problems and describes future work.

  11. [The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].

    Science.gov (United States)

    Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel

    2017-01-01

    The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.

  12. The research protocol VI: How to choose the appropriate statistical test. Inferential statistics

    Directory of Open Access Journals (Sweden)

    Eric Flores-Ruiz

    2017-10-01

    Full Text Available The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.

  13. Determinants of salivary evening alpha-amylase in a large sample free of psychopathology

    NARCIS (Netherlands)

    Veen, Gerthe; Giltay, Erik J.; Vreeburg, Sophie A.; Licht, Carmilla M. M.; Cobbaert, Christa M.; Zitman, Frans G.; Penninx, Brenda W. J. H.

    Objective: Recently, salivary alpha-amylase (sAA) has been proposed as a suitable index for sympathetic activity and dysregulation of the autonomic nervous system (ANS). Although determinants of sAA have been described, they have not been studied within the same study with a large sample size

  14. Water pollution screening by large-volume injection of aqueous samples and application to GC/MS analysis of a river Elbe sample

    Energy Technology Data Exchange (ETDEWEB)

    Mueller, S.; Efer, J.; Engewald, W. [Leipzig Univ. (Germany). Inst. fuer Analytische Chemie

    1997-03-01

    The large-volume sampling of aqueous samples in a programmed temperature vaporizer (PTV) injector was used successfully for the target and non-target analysis of real samples. In this still rarely applied method, e.g., 1 mL of the water sample to be analyzed is slowly injected directly into the PTV. The vaporized water is eliminated through the split vent. The analytes are concentrated onto an adsorbent inside the insert and subsequently thermally desorbed. The capability of the method is demonstrated using a sample from the river Elbe. By coupling this method with a mass-selective detector in SIM mode (target analysis), pollutants can be determined at concentrations down to 0.01 µg/L. Furthermore, PTV enrichment is an effective and time-saving method for non-target analysis in SCAN mode. In a sample from the river Elbe over 20 compounds were identified. (orig.) With 3 figs., 2 tabs.

  15. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    Science.gov (United States)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    Urban catchments are typically characterised by a more flashy nature of the hydrological response compared to natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km² in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
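
    The inter-amount-time framework can be illustrated with a short sketch: instead of sampling flow at fixed time intervals, one records the time needed to accumulate each successive fixed volume increment. The discharge series, time step and increment below are synthetic and only meant to convey the construction.

```python
import numpy as np

def inter_amount_times(flow, dt, increment):
    """Times (in the units of dt) needed to accumulate successive fixed
    volume increments from a discharge series sampled at interval dt."""
    cumulative = np.cumsum(flow) * dt                    # cumulative volume
    targets = np.arange(increment, cumulative[-1], increment)
    crossing_idx = np.searchsorted(cumulative, targets)  # first step reaching each target
    times = crossing_idx * dt
    return np.diff(np.concatenate(([0.0], times)))

rng = np.random.default_rng(3)
flow = np.exp(rng.normal(0.0, 1.0, size=10000))          # skewed synthetic discharge
iat = inter_amount_times(flow, dt=1.0, increment=50.0)
print("mean and CV of inter-amount times:", iat.mean(), iat.std() / iat.mean())
```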

  16. A hard-to-read font reduces the framing effect in a large sample.

    Science.gov (United States)

    Korn, Christoph W; Ries, Juliane; Schalk, Lennart; Oganian, Yulia; Saalbach, Henrik

    2018-04-01

    How can apparent decision biases, such as the framing effect, be reduced? Intriguing findings within recent years indicate that foreign language settings reduce framing effects, which has been explained in terms of deeper cognitive processing. Because hard-to-read fonts have been argued to trigger deeper cognitive processing, so-called cognitive disfluency, we tested whether hard-to-read fonts reduce framing effects. We found no reliable evidence for an effect of hard-to-read fonts on four framing scenarios in a laboratory (final N = 158) and an online study (N = 271). However, in a preregistered online study with a rather large sample (N = 732), a hard-to-read font reduced the framing effect in the classic "Asian disease" scenario (in a one-sided test). This suggests that hard-to-read fonts can modulate decision biases, albeit with rather small effect sizes. Overall, our findings stress the importance of large samples for the reliability and replicability of modulations of decision biases.

  17. Optimizing liquid effluent monitoring at a large nuclear complex.

    Science.gov (United States)

    Chou, Charissa J; Barnett, D Brent; Johnson, Vernon G; Olson, Phil M

    2003-12-01

    Effluent monitoring typically requires a large number of analytes and samples during the initial or startup phase of a facility. Once a baseline is established, the analyte list and sampling frequency may be reduced. Although there is a large body of literature relevant to the initial design, few, if any, published papers exist on updating established effluent monitoring programs. This paper statistically evaluates four years of baseline data to optimize the liquid effluent monitoring efficiency of a centralized waste treatment and disposal facility at a large defense nuclear complex. Specific objectives were to: (1) assess temporal variability in analyte concentrations, (2) determine operational factors contributing to waste stream variability, (3) assess the probability of exceeding permit limits, and (4) streamline the sampling and analysis regime. Results indicated that the probability of exceeding permit limits was one in a million under normal facility operating conditions, sampling frequency could be reduced, and several analytes could be eliminated. Furthermore, indicators such as gross alpha and gross beta measurements could be used in lieu of more expensive specific isotopic analyses (radium, cesium-137, and strontium-90) for routine monitoring. Study results were used by the state regulatory agency to modify monitoring requirements for a new discharge permit, resulting in annual cost savings of US $223,000. This case study demonstrates that statistical evaluation of effluent contaminant variability coupled with process knowledge can help plant managers and regulators streamline analyte lists and sampling frequencies based on detection history and environmental risk.

  18. A Pipeline for Large Data Processing Using Regular Sampling for Unstructured Grids

    Energy Technology Data Exchange (ETDEWEB)

    Berres, Anne Sabine [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Adhinarayanan, Vignesh [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States); Turton, Terece [Univ. of Texas, Austin, TX (United States); Feng, Wu [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States); Rogers, David Honegger [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-05-12

    Large simulation data requires a lot of time and computational resources to compute, store, analyze, visualize, and run user studies. Today, the largest cost of a supercomputer is not hardware but maintenance, in particular energy consumption. Our goal is to balance energy consumption and cognitive value of visualizations of resulting data. This requires us to go through the entire processing pipeline, from simulation to user studies. To reduce the amount of resources, data can be sampled or compressed. While this adds more computation time, the computational overhead is negligible compared to the simulation time. We built a processing pipeline using regular sampling as an example. The reasons for this choice are two-fold: using a simple example reduces unnecessary complexity, as we know what to expect from the results. Furthermore, it provides a good baseline for future, more elaborate sampling methods. We measured time and energy for each test we did, and we conducted user studies on Amazon Mechanical Turk (AMT) for a range of different results we produced through sampling.

  19. National transportation statistics 2010

    Science.gov (United States)

    2010-01-01

    National Transportation Statistics presents statistics on the U.S. transportation system, including its physical components, safety record, economic performance, the human and natural environment, and national security. This is a large online documen...

  20. Some statistical and sampling needs for detecting spills or migration at commercial low-level radioactive waste disposal sites

    International Nuclear Information System (INIS)

    Thomas, J.M.; Eberhardt, L.L.; Skalski, J.R.; Simmons, M.A.

    1984-05-01

    As part of a larger study funded by the US Nuclear Regulatory Commission we have been investigating field sampling strategies and compositing as a means of detecting spills or migration at commercial low-level radioactive waste disposal sites. The overall project is designed to produce information for developing guidance on implementing 10 CFR part 61. Compositing (pooling samples) for detection is discussed first, followed by our development of a statistical test to allow a decision as to whether any component of a composite exceeds a prescribed maximum acceptable level. The question of optimal field sampling designs and an Apple computer program designed to show the difficulties in constructing efficient field designs and using compositing schemes are considered. 6 references, 3 figures, 3 tables

  1. Association between genetic variation in a region on chromosome 11 and schizophrenia in large samples from Europe

    DEFF Research Database (Denmark)

    Rietschel, M; Mattheisen, M; Degenhardt, F

    2012-01-01

    the recruitment of very large samples of patients and controls (that is tens of thousands), or large, potentially more homogeneous samples that have been recruited from confined geographical areas using identical diagnostic criteria. Applying the latter strategy, we performed a genome-wide association study (GWAS... between emotion regulation and cognition that is structurally and functionally abnormal in SCZ and bipolar disorder. Molecular Psychiatry advance online publication, 12 July 2011; doi:10.1038/mp.2011.80....

  2. Lensing corrections to the Eg(z) statistics from large scale structure

    Science.gov (United States)

    Moradinezhad Dizgah, Azadeh; Durrer, Ruth

    2016-09-01

    We study the impact of the often neglected lensing contribution to galaxy number counts on the Eg statistics, which is used to constrain deviations from GR. This contribution affects both the galaxy-galaxy and the convergence-galaxy spectra, and is larger for the latter. At the higher redshifts probed by upcoming surveys, for instance at z = 1.5, neglecting this term induces an error of (25-40)% in the spectra and therefore in the Eg statistics, which is constructed from the combination of the two. Moreover, including it renders the Eg statistics scale- and bias-dependent and hence puts its very objective into question.

  3. In-Depth Investigation of Statistical and Physicochemical Properties on the Field Study of the Intermittent Filling of Large Water Tanks

    Directory of Open Access Journals (Sweden)

    Do-Hwan Kim

    2017-01-01

    Full Text Available Large-demand customers, generally high-density dwellings and buildings, have dedicated ground or elevated water tanks to consistently supply drinking water to residents. Online field measurement for the Nonsan-2 district metered area demonstrated that intermittent replenishment by large-demand customers can disrupt the normal operation of a water distribution system by taking large quantities of water in short times when filling the tanks from distribution mains. Based on the previous results of field measurement of hydraulic and water quality parameters, statistical analysis is performed on the measured data in terms of autocorrelation, power spectral density, and cross-correlation. The statistical results show that an intermittent filling interval of 6.7 h and a diurnal demand pattern of 23.3 h are detected through autocorrelation analyses, the similarities of the flow-pressure and the turbidity-particle count data are confirmed as a function of frequency through power spectral density analyses, and a strong cross-correlation is observed in the flow-pressure and turbidity-particle count analyses. In addition, physicochemical results show that the intermittent refill of storage tanks by large-demand customers induces abnormal flow and pressure fluctuations and results in transient-induced turbid flow mainly composed of fine particles ranging from 2 to 4 μm and consisting of Fe, Si, and Al.

  4. A review of empirical research related to the use of small quantitative samples in clinical outcome scale development.

    Science.gov (United States)

    Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S

    2016-11-01

    There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.

  5. Explorations in statistics: the log transformation.

    Science.gov (United States)

    Curran-Everett, Douglas

    2018-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability-the standard deviation-varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.
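
    The two checks mentioned above, log-transforming skewed observations and bootstrapping the sample mean, can be sketched in a few lines. The data below are synthetic lognormal observations; the example only illustrates how the bootstrap distribution of the mean becomes more symmetric after the transformation.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.lognormal(mean=1.0, sigma=0.8, size=40)    # skewed raw observations
log_y = np.log(y)                                  # log-transformed observations

def bootstrap_means(x, n_boot=5000):
    # Resample the observations with replacement and return the means
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    return x[idx].mean(axis=1)

for label, data in [("raw", y), ("log", log_y)]:
    boot = bootstrap_means(data)
    skew = ((boot - boot.mean()) ** 3).mean() / boot.std() ** 3
    print(f"{label}: skewness of bootstrap means = {skew:.2f}")
```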

  6. The Statistics of Emission and Detection of Neutrons and Photons from Fissile Samples for Safeguard Applications

    International Nuclear Information System (INIS)

    Enqvist, Andreas

    2008-03-01

    One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identification and quantification of unknown materials, to prevent accidental and clandestine transports and uses of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguards area. These are low-order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived and compared to Monte Carlo simulations. An experimental approach is also addressed in which cross correlation measurements were made using liquid scintillation detectors. First, the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed, and it is discussed whether they can be used in applications to distinguish possible sample

  7. The Statistics of Emission and Detection of Neutrons and Photons from Fissile Samples for Safeguard Applications

    Energy Technology Data Exchange (ETDEWEB)

    Enqvist, Andreas

    2008-03-15

    One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identification and quantification of unknown materials, to prevent accidental and clandestine transports and uses of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguards area. These are low-order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived and compared to Monte Carlo simulations. An experimental approach is also addressed in which cross correlation measurements were made using liquid scintillation detectors. First, the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed, and it is discussed whether they can be used in applications to distinguish possible

  8. A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data.

    Science.gov (United States)

    Tang, Liansheng Larry; Balakrishnan, N

    2011-01-01

    The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, while in the latter the violation is due to some subjects not having observable measurements. For this reason, we propose here a random-sum Wilcoxon statistic for comparing two groups in the presence of ties, and derive its variance as well as its asymptotic distribution for large sample sizes. The proposed statistic includes the regular Wilcoxon rank-sum statistic. Finally, we apply the proposed statistic for summarizing location response operating characteristic data from a liver computed tomography study, and also for summarizing diagnostic accuracy of biomarker data.
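
    For context, the sketch below computes the ordinary fixed-sample-size Wilcoxon-Mann-Whitney comparison with SciPy; the random-sum extension proposed in the paper is not implemented here, and the simulated marker values and group sizes are illustrative assumptions only.

        # Minimal sketch of a standard Wilcoxon-Mann-Whitney two-group comparison.
        import numpy as np
        from scipy.stats import mannwhitneyu

        rng = np.random.default_rng(1)
        x = rng.normal(0.0, 1.0, size=40)   # e.g. marker values, non-diseased subjects
        y = rng.normal(0.5, 1.0, size=35)   # e.g. marker values, diseased subjects

        u_stat, p_value = mannwhitneyu(x, y, alternative="two-sided")
        auc = u_stat / (len(x) * len(y))    # U/(n*m) estimates P(X > Y) + 0.5*P(X = Y)
        print(u_stat, p_value, auc)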

  9. AMORE-HX: a multidimensional optimization of radial enhanced NMR-sampled hydrogen exchange

    International Nuclear Information System (INIS)

    Gledhill, John M.; Walters, Benjamin T.; Wand, A. Joshua

    2009-01-01

    The Cartesian sampled three-dimensional HNCO experiment is inherently limited in time resolution and sensitivity for the real-time measurement of protein hydrogen exchange. This is largely overcome by use of the radial HNCO experiment, which employs optimized sampling angles. The significant practical limitation presented by use of three-dimensional data is the large data storage and processing requirement, and this is largely overcome by taking advantage of the inherent capabilities of the 2D-FT to process selective frequency space without artifact or limitation. Decomposition of angle spectra into positive and negative ridge components provides increased resolution and allows statistical averaging of intensity and therefore increased precision. Strategies for averaging ridge cross sections within and between angle spectra are developed to allow further statistical approaches for increasing the precision of measured hydrogen occupancy. Intensity artifacts potentially introduced by over-pulsing are effectively eliminated by use of the BEST approach

  10. Sanov and central limit theorems for output statistics of quantum Markov chains

    Energy Technology Data Exchange (ETDEWEB)

    Horssen, Merlijn van, E-mail: merlijn.vanhorssen@nottingham.ac.uk [School of Physics and Astronomy, University of Nottingham, Nottingham NG7 2RD (United Kingdom); Guţă, Mădălin, E-mail: madalin.guta@nottingham.ac.uk [School of Mathematical Sciences, University of Nottingham, Nottingham NG7 2RD (United Kingdom)

    2015-02-15

    In this paper, we consider the statistics of repeated measurements on the output of a quantum Markov chain. We establish a large deviations result analogous to Sanov’s theorem for the multi-site empirical measure associated to finite sequences of consecutive outcomes of a classical stochastic process. Our result relies on the construction of an extended quantum transition operator (which keeps track of previous outcomes) in terms of which we compute moment generating functions, and whose spectral radius is related to the large deviations rate function. As a corollary to this, we obtain a central limit theorem for the empirical measure. Such higher level statistics may be used to uncover critical behaviour such as dynamical phase transitions, which are not captured by lower level statistics such as the sample mean. As a step in this direction, we give an example of a finite system whose level-1 (empirical mean) rate function is independent of a model parameter while the level-2 (empirical measure) rate is not.

  11. Advances in statistics

    Science.gov (United States)

    Howard Stauffer; Nadav Nur

    2005-01-01

    The papers included in the Advances in Statistics section of the Partners in Flight (PIF) 2002 Proceedings represent a small sample of statistical topics of current importance to Partners In Flight research scientists: hierarchical modeling, estimation of detection probabilities, and Bayesian applications. Sauer et al. (this volume) examines a hierarchical model...

  12. A NEW TEST OF THE STATISTICAL NATURE OF THE BRIGHTEST CLUSTER GALAXIES

    International Nuclear Information System (INIS)

    Lin, Yen-Ting; Ostriker, Jeremiah P.; Miller, Christopher J.

    2010-01-01

    A novel statistic is proposed to examine the hypothesis that all cluster galaxies are drawn from the same luminosity distribution (LD). In such a 'statistical model' of galaxy LD, the brightest cluster galaxies (BCGs) are simply the statistical extreme of the galaxy population. Using a large sample of nearby clusters, we show that BCGs in high luminosity clusters (e.g., L_tot ≳ 4 x 10^11 h_70^-2 L_sun) are unlikely (probability ≤ 3 x 10^-4) to be drawn from the LD defined by all red cluster galaxies more luminous than M_r = -20. On the other hand, BCGs in less luminous clusters are consistent with being the statistical extreme. Applying our method to the second brightest galaxies, we show that they are consistent with being the statistical extreme, which implies that the BCGs are also distinct from non-BCG luminous, red, cluster galaxies. We point out some issues with the interpretation of the classical tests proposed by Tremaine and Richstone (TR) that are designed to examine the statistical nature of BCGs, investigate the robustness of both our statistical test and those of TR against difficulties in photometry of galaxies of large angular size, and discuss the implication of our findings on surveys that use the luminous red galaxies to measure the baryon acoustic oscillation features in the galaxy power spectrum.
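
    The following Monte Carlo sketch is only a simplified analogue of the idea described above (not the paper's statistic): it asks how often the brightest of N galaxies drawn from a common luminosity distribution would be as bright as an observed BCG. The exponential magnitude distribution, cluster richness and observed magnitude are all hypothetical.

        # Minimal sketch: is an observed BCG consistent with being the statistical
        # extreme of N draws from a common luminosity distribution?
        import numpy as np

        rng = np.random.default_rng(2)
        n_members, n_trials = 200, 20_000
        observed_bcg_mag = -24.0                   # hypothetical observed BCG magnitude

        # hypothetical cluster LD: magnitudes brighter than M_r = -20, roughly exponential
        draws = -20.0 - rng.exponential(scale=0.8, size=(n_trials, n_members))
        brightest = draws.min(axis=1)              # brightest (most negative) per mock cluster
        p_value = np.mean(brightest <= observed_bcg_mag)
        print(p_value)  # a very small value would disfavour the purely statistical model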

  13. Geometric statistical inference

    International Nuclear Information System (INIS)

    Periwal, Vipul

    1999-01-01

    A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined

  14. Psychometric Evaluation of the Thought–Action Fusion Scale in a Large Clinical Sample

    Science.gov (United States)

    Meyer, Joseph F.; Brown, Timothy A.

    2015-01-01

    This study examined the psychometric properties of the 19-item Thought–Action Fusion (TAF) Scale, a measure of maladaptive cognitive intrusions, in a large clinical sample (N = 700). An exploratory factor analysis (n = 300) yielded two interpretable factors: TAF Moral (TAF-M) and TAF Likelihood (TAF-L). A confirmatory bifactor analysis was conducted on the second portion of the sample (n = 400) to account for possible sources of item covariance using a general TAF factor (subsuming TAF-M) alongside the TAF-L domain-specific factor. The bifactor model provided an acceptable fit to the sample data. Results indicated that global TAF was more strongly associated with a measure of obsessive-compulsiveness than measures of general worry and depression, and the TAF-L dimension was more strongly related to obsessive-compulsiveness than depression. Overall, results support the bifactor structure of the TAF in a clinical sample and its close relationship to its neighboring obsessive-compulsiveness construct. PMID:22315482

  15. Psychometric evaluation of the thought-action fusion scale in a large clinical sample.

    Science.gov (United States)

    Meyer, Joseph F; Brown, Timothy A

    2013-12-01

    This study examined the psychometric properties of the 19-item Thought-Action Fusion (TAF) Scale, a measure of maladaptive cognitive intrusions, in a large clinical sample (N = 700). An exploratory factor analysis (n = 300) yielded two interpretable factors: TAF Moral (TAF-M) and TAF Likelihood (TAF-L). A confirmatory bifactor analysis was conducted on the second portion of the sample (n = 400) to account for possible sources of item covariance using a general TAF factor (subsuming TAF-M) alongside the TAF-L domain-specific factor. The bifactor model provided an acceptable fit to the sample data. Results indicated that global TAF was more strongly associated with a measure of obsessive-compulsiveness than measures of general worry and depression, and the TAF-L dimension was more strongly related to obsessive-compulsiveness than depression. Overall, results support the bifactor structure of the TAF in a clinical sample and its close relationship to its neighboring obsessive-compulsiveness construct.

  16. Frontiers in statistical quality control 11

    CERN Document Server

    Schmid, Wolfgang

    2015-01-01

    The main focus of this edited volume is on three major areas of statistical quality control: statistical process control (SPC), acceptance sampling and design of experiments. The majority of the papers deal with statistical process control, while acceptance sampling and design of experiments are also treated to a lesser extent. The book is organized into four thematic parts, with Part I addressing statistical process control. Part II is devoted to acceptance sampling. Part III covers the design of experiments, while Part IV discusses related fields. The twenty-three papers in this volume stem from The 11th International Workshop on Intelligent Statistical Quality Control, which was held in Sydney, Australia from August 20 to August 23, 2013. The event was hosted by Professor Ross Sparks, CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia and was jointly organized by Professors S. Knoth, W. Schmid and Ross Sparks. The papers presented here were carefully selected and reviewed by the scientifi...

  17. Classroom Research: Assessment of Student Understanding of Sampling Distributions of Means and the Central Limit Theorem in Post-Calculus Probability and Statistics Classes

    Science.gov (United States)

    Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy

    2006-01-01

    We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…

  18. Empirical Statistical Power for Testing Multilocus Genotypic Effects under Unbalanced Designs Using a Gibbs Sampler

    Directory of Open Access Journals (Sweden)

    Chaeyoung Lee

    2012-11-01

    Full Text Available Epistasis that may explain a large portion of the phenotypic variation for complex economic traits of animals has been ignored in many genetic association studies. A Bayesian method was introduced to draw inferences about multilocus genotypic effects based on their marginal posterior distributions obtained by a Gibbs sampler. A simulation study was conducted to provide statistical powers under various unbalanced designs using this method. Data were simulated by combined designs of number of loci, within-genotype variance, and sample size in unbalanced designs with or without null combined genotype cells. Mean empirical statistical power was estimated for testing the posterior mean estimate of the combined genotype effect. A practical example of obtaining empirical statistical power estimates with a given sample size was provided under unbalanced designs. The empirical statistical powers would be useful for determining an optimal design when interactive associations of multiple loci with complex phenotypes are examined.

  19. Statistical analysis of error rate of large-scale single flux quantum logic circuit by considering fluctuation of timing parameters

    International Nuclear Information System (INIS)

    Yamanashi, Yuki; Masubuchi, Kota; Yoshikawa, Nobuyuki

    2016-01-01

    The relationship between the timing margin and the error rate of large-scale single flux quantum logic circuits is quantitatively investigated to establish a timing design guideline. We observed that the fluctuation in the set-up/hold time of single flux quantum logic gates caused by thermal noise is the most probable origin of logical errors in large-scale single flux quantum circuits. The appropriate timing margin for stable operation of a large-scale logic circuit is discussed by taking into account the fluctuation of the set-up/hold time and the timing jitter in single flux quantum circuits. As a case study, the dependence of the error rate of a 1-million-bit single flux quantum shift register on the timing margin is statistically analyzed. The result indicates that adjustment of the timing margin and the bias voltage is important for stable operation of a large-scale SFQ logic circuit.

  20. Statistical analysis of error rate of large-scale single flux quantum logic circuit by considering fluctuation of timing parameters

    Energy Technology Data Exchange (ETDEWEB)

    Yamanashi, Yuki, E-mail: yamanasi@ynu.ac.jp [Department of Electrical and Computer Engineering, Yokohama National University, Tokiwadai 79-5, Hodogaya-ku, Yokohama 240-8501 (Japan); Masubuchi, Kota; Yoshikawa, Nobuyuki [Department of Electrical and Computer Engineering, Yokohama National University, Tokiwadai 79-5, Hodogaya-ku, Yokohama 240-8501 (Japan)

    2016-11-15

    The relationship between the timing margin and the error rate of large-scale single flux quantum logic circuits is quantitatively investigated to establish a timing design guideline. We observed that the fluctuation in the set-up/hold time of single flux quantum logic gates caused by thermal noise is the most probable origin of logical errors in large-scale single flux quantum circuits. The appropriate timing margin for stable operation of a large-scale logic circuit is discussed by taking into account the fluctuation of the set-up/hold time and the timing jitter in single flux quantum circuits. As a case study, the dependence of the error rate of a 1-million-bit single flux quantum shift register on the timing margin is statistically analyzed. The result indicates that adjustment of the timing margin and the bias voltage is important for stable operation of a large-scale SFQ logic circuit.
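
    A rough back-of-the-envelope sketch of the relationship discussed above, assuming (purely for illustration) Gaussian set-up/hold-time fluctuations, independent gates and round numbers for the fluctuation width and gate count; it is not the authors' statistical model.

        # Minimal sketch: per-gate error probability when a Gaussian timing
        # fluctuation exceeds the timing margin, propagated to a large circuit.
        import numpy as np
        from scipy.stats import norm

        sigma_ps = 1.0                  # assumed std. dev. of set-up/hold-time fluctuation (ps)
        margins_ps = np.linspace(2, 8, 7)
        n_gates = 1_000_000             # e.g. a 1-million-bit shift register

        p_gate = norm.sf(margins_ps / sigma_ps)        # P(fluctuation > margin) per gate
        p_circuit = 1.0 - (1.0 - p_gate) ** n_gates    # P(at least one error in the circuit)
        for m, p in zip(margins_ps, p_circuit):
            print(f"margin = {m:.0f} ps -> circuit error probability ~ {p:.3g}")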

  1. Statistical Inference for Data Adaptive Target Parameters.

    Science.gov (United States)

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equal-size sub-samples, and use this partitioning to define V splits into an estimation sample (one of the V sub-samples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are being increasingly addressed by clever, yet ad hoc, pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
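
    The sketch below illustrates the V-fold sample-splitting scheme described above, with a toy "algorithm" (picking the column with the largest absolute mean) standing in for an arbitrary learning procedure; the data, the fold count and the choice of algorithm are assumptions made only for illustration.

        # Minimal sketch of a sample-split data adaptive target parameter:
        # the algorithm is run on each parameter-generating sample, the resulting
        # fold-specific target is estimated on the complementary estimation sample,
        # and the V estimates are averaged.
        import numpy as np
        from sklearn.model_selection import KFold

        rng = np.random.default_rng(3)
        X = rng.normal(size=(500, 5))
        X[:, 2] += 0.5                    # make column 2 the "interesting" one

        estimates = []
        for gen_idx, est_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
            j = np.argmax(np.abs(X[gen_idx].mean(axis=0)))   # data-adaptive target definition
            estimates.append(X[est_idx, j].mean())           # estimated on the held-out fold

        print(np.mean(estimates))          # the data adaptive target parameter estimate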

  2. Molecular dynamics based enhanced sampling of collective variables with very large time steps

    Science.gov (United States)

    Chen, Pei-Yang; Tuckerman, Mark E.

    2018-01-01

    Enhanced sampling techniques that target a set of collective variables and that use molecular dynamics as the driving engine have seen widespread application in the computational molecular sciences as a means to explore the free-energy landscapes of complex systems. The use of molecular dynamics as the fundamental driver of the sampling requires the introduction of a time step whose magnitude is limited by the fastest motions in a system. While standard multiple time-stepping methods allow larger time steps to be employed for the slower and computationally more expensive forces, the maximum achievable increase in time step is limited by resonance phenomena, which inextricably couple fast and slow motions. Recently, we introduced deterministic and stochastic resonance-free multiple time step algorithms for molecular dynamics that solve this resonance problem and allow ten- to twenty-fold gains in the large time step compared to standard multiple time step algorithms [P. Minary et al., Phys. Rev. Lett. 93, 150201 (2004); B. Leimkuhler et al., Mol. Phys. 111, 3579-3594 (2013)]. These methods are based on the imposition of isokinetic constraints that couple the physical system to Nosé-Hoover chains or Nosé-Hoover Langevin schemes. In this paper, we show how to adapt these methods for collective variable-based enhanced sampling techniques, specifically adiabatic free-energy dynamics/temperature-accelerated molecular dynamics, unified free-energy dynamics, and by extension, metadynamics, thus allowing simulations employing these methods to employ similarly very large time steps. The combination of resonance-free multiple time step integrators with free-energy-based enhanced sampling significantly improves the efficiency of conformational exploration.

  3. Statistical Methods and Sampling Design for Estimating Step Trends in Surface-Water Quality

    Science.gov (United States)

    Hirsch, Robert M.

    1988-01-01

    This paper addresses two components of the problem of estimating the magnitude of step trends in surface water quality. The first is finding a robust estimator appropriate to the data characteristics expected in water-quality time series. The J. L. Hodges-E. L. Lehmann class of estimators is found to be robust in comparison to other nonparametric and moment-based estimators. A seasonal Hodges-Lehmann estimator is developed and shown to have desirable properties. Second, the effectiveness of various sampling strategies is examined using Monte Carlo simulation coupled with application of this estimator. The simulation is based on a large set of total phosphorus data from the Potomac River. To assure that the simulated records have realistic properties, the data are modeled in a multiplicative fashion incorporating flow, hysteresis, seasonal, and noise components. The results demonstrate the importance of balancing the length of the two sampling periods and balancing the number of data values between the two periods.
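
    The sketch below shows one common form of the Hodges-Lehmann step-trend estimate (the median of all pairwise between-period differences) together with a simple seasonal variant that pools within-season differences. The simulated record and the exact seasonal pooling are illustrative assumptions, not a reproduction of the paper's Potomac River analysis.

        # Minimal sketch: Hodges-Lehmann estimate of a step trend between two periods.
        import numpy as np

        rng = np.random.default_rng(4)
        before = rng.lognormal(mean=0.0, sigma=0.5, size=60)
        after = rng.lognormal(mean=-0.3, sigma=0.5, size=60)   # simulated step decrease
        month = np.arange(60) % 12                             # month index for both periods

        def hodges_lehmann(x, y):
            """Median of all pairwise differences y_j - x_i."""
            return np.median(np.subtract.outer(y, x))

        print(hodges_lehmann(before, after))

        # seasonal version: pairwise differences restricted to the same month
        diffs = [np.subtract.outer(after[month == m], before[month == m]).ravel()
                 for m in range(12)]
        print(np.median(np.concatenate(diffs)))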

  4. Investigating sex differences in psychological predictors of snack intake among a large representative sample

    NARCIS (Netherlands)

    Adriaanse, M.A.; Evers, C.; Verhoeven, A.A.C.; de Ridder, D.T.D.

    It is often assumed that there are substantial sex differences in eating behaviour (e.g. women are more likely to be dieters or emotional eaters than men). The present study investigates this assumption in a large representative community sample while incorporating a comprehensive set of

  5. Mathematical background and attitudes toward statistics in a sample of Spanish college students.

    Science.gov (United States)

    Carmona, José; Martínez, Rafael J; Sánchez, Manuel

    2005-08-01

    To examine the relation of mathematical background and initial attitudes toward statistics of Spanish college students in the social sciences, the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. The analysis suggested that grades in previous courses are more related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related to students' affective responses to statistics but not to their valuing of statistics. Implications for possible research are discussed.

  6. Non-statistical effects in bond fission reactions of 1,2-difluoroethane

    Science.gov (United States)

    Schranz, Harold W.; Raff, Lionel M.; Thompson, Donald L.

    1991-08-01

    A microcanonical, classical variational transition-state theory based on the use of the efficient microcanonical sampling (EMS) procedure is applied to simple bond fission in 1,2-difluoroethane. Comparison is made with results of trajectory calculations performed on the same global potential-energy surface. Agreement between the statistical theory and trajectory results for C-C, C-F and C-H bond fissions is poor, with differences as large as a factor of 125. Most importantly, at the lower energy studied, 6.0 eV, the statistical calculations predict considerably slower rates than those computed from trajectories. We conclude from these results that the statistical assumptions inherent in the transition-state theory method are not valid for 1,2-difluoroethane, in spite of the fact that the total intramolecular energy transfer rate out of C-H and C-C normal and local modes is large relative to the bond fission rates. The IVR rate is not globally rapid and the trajectories do not access all of the energetically available phase space uniformly on the timescale of the reactions.

  7. Sample representativeness verification of the FADN CZ farm business sample

    Directory of Open Access Journals (Sweden)

    Marie Prášilová

    2011-01-01

    Full Text Available Sample representativeness verification is one of the key stages of statistical work. After having joined the European Union, the Czech Republic also joined the Farm Accountancy Data Network system of the Union. This is a sample of bodies and companies doing business in agriculture. Detailed production and economic data on the results of farming business are collected from that sample annually, and results for the entire population of the country's farms are then estimated and assessed. It is important, hence, that the sample be representative. Representativeness is to be assessed as to the number of farms included in the survey and also as to the degree of accordance of the measures and indices as related to the population. The paper deals with the special statistical techniques and methods of the FADN CZ sample representativeness verification, including the necessary sample size statement procedure. The Czech farm population data have been obtained from the Czech Statistical Office data bank.

  8. EFFECT OF MEASUREMENT ERRORS ON PREDICTED COSMOLOGICAL CONSTRAINTS FROM SHEAR PEAK STATISTICS WITH LARGE SYNOPTIC SURVEY TELESCOPE

    Energy Technology Data Exchange (ETDEWEB)

    Bard, D.; Chang, C.; Kahn, S. M.; Gilmore, K.; Marshall, S. [KIPAC, Stanford University, 452 Lomita Mall, Stanford, CA 94309 (United States); Kratochvil, J. M.; Huffenberger, K. M. [Department of Physics, University of Miami, Coral Gables, FL 33124 (United States); May, M. [Physics Department, Brookhaven National Laboratory, Upton, NY 11973 (United States); AlSayyad, Y.; Connolly, A.; Gibson, R. R.; Jones, L.; Krughoff, S. [Department of Astronomy, University of Washington, Seattle, WA 98195 (United States); Ahmad, Z.; Bankert, J.; Grace, E.; Hannel, M.; Lorenz, S. [Department of Physics, Purdue University, West Lafayette, IN 47907 (United States); Haiman, Z.; Jernigan, J. G., E-mail: djbard@slac.stanford.edu [Department of Astronomy and Astrophysics, Columbia University, New York, NY 10027 (United States); and others

    2013-09-01

    We study the effect of galaxy shape measurement errors on predicted cosmological constraints from the statistics of shear peak counts with the Large Synoptic Survey Telescope (LSST). We use the LSST Image Simulator in combination with cosmological N-body simulations to model realistic shear maps for different cosmological models. We include both galaxy shape noise and, for the first time, measurement errors on galaxy shapes. We find that the measurement errors considered have relatively little impact on the constraining power of shear peak counts for LSST.

  9. Contributions to statistics

    CERN Document Server

    Mahalanobis, P C

    1965-01-01

    Contributions to Statistics focuses on the processes, methodologies, and approaches involved in statistics. The book is presented to Professor P. C. Mahalanobis on the occasion of his 70th birthday. The selection first offers information on the recovery of ancillary information and combinatorial properties of partially balanced designs and association schemes. Discussions focus on combinatorial applications of the algebra of association matrices, sample size analogy, association matrices and the algebra of association schemes, and conceptual statistical experiments. The book then examines latt

  10. Towards the harmonization between National Forest Inventory and Forest Condition Monitoring. Consistency of plot allocation and effect of tree selection methods on sample statistics in Italy.

    Science.gov (United States)

    Gasparini, Patrizia; Di Cosmo, Lucio; Cenni, Enrico; Pompei, Enrico; Ferretti, Marco

    2013-07-01

    In the frame of a process aiming at harmonizing National Forest Inventory (NFI) and ICP Forests Level I Forest Condition Monitoring (FCM) in Italy, we investigated (a) the long-term consistency between FCM sample points (a subsample of the first NFI, 1985, NFI_1) and recent forest area estimates (after the second NFI, 2005, NFI_2) and (b) the effect of tree selection method (tree-based or plot-based) on sample composition and defoliation statistics. The two investigations were carried out on 261 and 252 FCM sites, respectively. Results show that some individual forest categories (larch and stone pine, Norway spruce, other coniferous, beech, temperate oaks and cork oak forests) are over-represented and others (hornbeam and hophornbeam, other deciduous broadleaved and holm oak forests) are under-represented in the FCM sample. This is probably due to a change in forest cover, which has increased by 1,559,200 ha from 1985 to 2005. In case of a shift from a tree-based to a plot-based selection method, 3,130 (46.7%) of the original 6,703 sample trees will be abandoned, and 1,473 new trees will be selected. The balance between exclusion of former sample trees and inclusion of new ones will be particularly unfavourable for conifers (with only 16.4% of excluded trees replaced by new ones) and less so for deciduous broadleaves (with 63.5% of excluded trees replaced). The total number of tree species surveyed will not be impacted, while the number of trees per species will, and the resulting (plot-based) sample composition will have a much larger frequency of deciduous broadleaved trees. The newly selected trees have, in general, smaller diameter at breast height (DBH) and defoliation scores. Given the larger rate of turnover, the deciduous broadleaved part of the sample will be more impacted. Our results suggest that both a revision of the FCM network to account for forest area change and a plot-based approach to permit statistical inference and avoid bias in the tree sample

  11. The use of test scores from large-scale assessment surveys: psychometric and statistical considerations

    Directory of Open Access Journals (Sweden)

    Henry Braun

    2017-11-01

    Full Text Available Abstract Background Economists are making increasing use of measures of student achievement obtained through large-scale survey assessments such as NAEP, TIMSS, and PISA. The construction of these measures, employing plausible value (PV methodology, is quite different from that of the more familiar test scores associated with assessments such as the SAT or ACT. These differences have important implications both for utilization and interpretation. Although much has been written about PVs, it appears that there are still misconceptions about whether and how to employ them in secondary analyses. Methods We address a range of technical issues, including those raised in a recent article that was written to inform economists using these databases. First, an extensive review of the relevant literature was conducted, with particular attention to key publications that describe the derivation and psychometric characteristics of such achievement measures. Second, a simulation study was carried out to compare the statistical properties of estimates based on the use of PVs with those based on other, commonly used methods. Results It is shown, through both theoretical analysis and simulation, that under fairly general conditions appropriate use of PV yields approximately unbiased estimates of model parameters in regression analyses of large scale survey data. The superiority of the PV methodology is particularly evident when measures of student achievement are employed as explanatory variables. Conclusions The PV methodology used to report student test performance in large scale surveys remains the state-of-the-art for secondary analyses of these databases.
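
    As a rough illustration of how plausible values are used in secondary analyses, the sketch below runs the same regression once per plausible value and combines the results with Rubin's combining rules. The regression, the five simulated plausible-value columns and the variable names are assumptions for illustration only.

        # Minimal sketch: regression with plausible values combined by Rubin's rules.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(5)
        n, n_pv = 2000, 5
        ses = rng.normal(size=n)                         # e.g. a socio-economic index
        true_score = 0.4 * ses + rng.normal(size=n)
        pvs = true_score[:, None] + rng.normal(scale=0.6, size=(n, n_pv))  # 5 plausible values

        betas, variances = [], []
        for v in range(n_pv):
            fit = sm.OLS(pvs[:, v], sm.add_constant(ses)).fit()
            betas.append(fit.params[1])
            variances.append(fit.bse[1] ** 2)

        betas, variances = np.array(betas), np.array(variances)
        b_bar = betas.mean()                             # combined point estimate
        total_var = variances.mean() + (1 + 1 / n_pv) * betas.var(ddof=1)
        print(b_bar, np.sqrt(total_var))                 # estimate and its combined standard error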

  12. Statistics Clinic

    Science.gov (United States)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  13. Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.

    Science.gov (United States)

    Ojeda, Mario Miguel; Sahai, Hardeo

    2002-01-01

    Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…

  14. Statistical basis for positive identification in forensic anthropology.

    Science.gov (United States)

    Steadman, Dawnie Wolfe; Adams, Bradley J; Konigsberg, Lyle W

    2006-09-01

    Forensic scientists are often expected to present the likelihood of DNA identifications in US courts based on comparative population data, yet forensic anthropologists tend not to quantify the strength of an osteological identification. Because forensic anthropologists are trained first and foremost as physical anthropologists, they emphasize estimation problems at the expense of evidentiary problems, but this approach must be reexamined. In this paper, the statistical bases for presenting osteological and dental evidence are outlined, using a forensic case as a motivating example. A brief overview of Bayesian statistics is provided, and methods to calculate likelihood ratios for five aspects of the biological profile are demonstrated. This paper emphasizes the definition of appropriate reference samples and of the "population at large," and points out the conceptual differences between them. Several databases are introduced for both reference information and to characterize the "population at large," and new data are compiled to calculate the frequency of specific characters, such as age or fractures, within the "population at large." Despite small individual likelihood ratios for age, sex, and stature in the case example, the power of this approach is that, assuming each likelihood ratio is independent, the product rule can be applied. In this particular example, it is over three million times more likely to obtain the observed osteological and dental data if the identification is correct than if the identification is incorrect. This likelihood ratio is a convincing statistic that can support the forensic anthropologist's opinion on personal identity in court. 2006 Wiley-Liss, Inc.
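
    A minimal sketch of the product rule mentioned above: assuming the individual lines of evidence are independent, per-trait likelihood ratios multiply, and the combined ratio can be applied to prior odds. All numerical values are hypothetical and are not taken from the case example.

        # Minimal sketch: combining independent likelihood ratios by the product rule.
        likelihood_ratios = {"age": 3.2, "sex": 2.0, "stature": 1.8, "dental": 150.0}

        combined_lr = 1.0
        for trait, lr in likelihood_ratios.items():
            combined_lr *= lr

        prior_odds = 1 / 10_000            # hypothetical prior odds of a correct identification
        posterior_odds = combined_lr * prior_odds
        print(combined_lr, posterior_odds)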

  15. Statistical hadronization and hadronic micro-canonical ensemble II

    International Nuclear Information System (INIS)

    Becattini, F.; Ferroni, L.

    2004-01-01

    We present a Monte Carlo calculation of the micro-canonical ensemble of the ideal hadron-resonance gas including all known states up to a mass of about 1.8 GeV and full quantum statistics. The micro-canonical average multiplicities of the various hadron species are found to converge to the canonical ones for moderately low values of the total energy, around 8 GeV, thus bearing out previous analyses of hadronic multiplicities in the canonical ensemble. The main numerical computing method is an importance sampling Monte Carlo algorithm using the product of Poisson distributions to generate multi-hadronic channels. It is shown that the use of this multi-Poisson distribution allows for an efficient and fast computation of averages, which can be further improved in the limit of very large clusters. We have also studied the fitness of a previously proposed computing method, based on the Metropolis Monte Carlo algorithm, for event generation in the statistical hadronization model. We find that the use of the multi-Poisson distribution as proposal matrix dramatically improves the computation performance. However, due to the correlation of subsequent samples, this method proves to be generally less robust and effective than the importance sampling method. (orig.)

  16. Large-volume constant-concentration sampling technique coupling with surface-enhanced Raman spectroscopy for rapid on-site gas analysis.

    Science.gov (United States)

    Zhang, Zhuomin; Zhan, Yisen; Huang, Yichun; Li, Gongke

    2017-08-05

    In this work, a portable large-volume constant-concentration (LVCC) sampling technique coupled with surface-enhanced Raman spectroscopy (SERS) was developed for rapid on-site gas analysis based on suitable derivatization methods. The LVCC sampling technique mainly consisted of a specially designed sampling cell, including a rigid sample container and a flexible sampling bag, and an absorption-derivatization module with a portable pump and a gas flowmeter. The LVCC sampling technique allowed a large, alterable and well-controlled sampling volume, which kept the concentration of the gas target in the headspace phase constant during the entire sampling process and made the sampling result more representative. Moreover, absorption and derivatization of the gas target during the LVCC sampling process were efficiently merged in one step using bromine-thiourea and OPA-NH4+ strategies for ethylene and SO2, respectively, which made the LVCC sampling technique conveniently adaptable to subsequent SERS analysis. Finally, a new LVCC sampling-SERS method was developed and successfully applied to rapid analysis of trace ethylene and SO2 from fruits. Trace ethylene and SO2 from real fruit samples could be accurately quantified by this method. The concentration fluctuations of ethylene and SO2 during the entire LVCC sampling process were proved to be minor, and the gas targets from real samples could be reliably determined by SERS. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Statistical data fusion for cross-tabulation

    NARCIS (Netherlands)

    Kamakura, W.A.; Wedel, M.

    The authors address the situation in which a researcher wants to cross-tabulate two sets of discrete variables collected in independent samples, but a subset of the variables is common to both samples. The authors propose a statistical data-fusion model that allows for statistical tests of

  18. Business statistics I essentials

    CERN Document Server

    Clark, Louise

    2014-01-01

    REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As its name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams, doing homework and will remain a lasting reference source for students, teachers, and professionals. Business Statistics I includes descriptive statistics, introduction to probability, probability distributions, sampling and sampling distributions, interval estimation, and hypothesis t

  19. Statistics 101 for Radiologists.

    Science.gov (United States)

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.

  20. Sample preparation and analysis of large 238PuO2 and ThO2 spheres

    International Nuclear Information System (INIS)

    Wise, R.L.; Selle, J.E.

    1975-01-01

    A program was initiated to determine the density gradient across a large spherical 238PuO2 sample produced by vacuum hot pressing. Due to the high thermal output of the ceramic, a thin section was necessary to prevent overheating of the plastic mount. Techniques were developed for cross sectioning, mounting, grinding, and polishing of the sample. The polished samples were then analyzed on a quantitative image analyzer to determine the density as a function of location across the sphere. The techniques for indexing, analyzing, and reducing the data are described. Typical results obtained on a ThO2 simulant sphere are given

  1. Seasonal rationalization of river water quality sampling locations: a comparative study of the modified Sanders and multivariate statistical approaches.

    Science.gov (United States)

    Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar

    2016-02-01

    The design of surface water quality sampling locations is a crucial decision-making process for rationalization of a monitoring network. The quantity, quality, and types of available data (watershed characteristics and water quality data) may affect the selection of an appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for the design of sampling locations. However, their performance may vary significantly with the quantity, quality, and types of available data. In this paper, an attempt has been made to evaluate the performance of these techniques by accounting for the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data are usually difficult to obtain, whereas watershed information may be made available through application of geospatial techniques. A case study of the Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at different sampling sites were estimated and accounted for using the modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The optimum numbers of sampling locations designed by the modified Sanders approach are eight for the monsoon season and seven for the non-monsoon season, while those designed by FA/PCA are eleven and nine, respectively. Little variation in the number and locations of the designed sampling sites was obtained by the two techniques, which shows the stability of the results. A geospatial analysis has also been carried out to check the significance of the designed sampling locations with respect to river basin characteristics and land use of the study area. Both methods are equally efficient; however, modified Sanders

  2. Use of a statistical model of the whole femur in a large scale, multi-model study of femoral neck fracture risk.

    Science.gov (United States)

    Bryan, Rebecca; Nair, Prasanth B; Taylor, Mark

    2009-09-18

    Interpatient variability is often overlooked in orthopaedic computational studies due to the substantial challenges involved in sourcing and generating large numbers of bone models. A statistical model of the whole femur incorporating both geometric and material property variation was developed as a potential solution to this problem. The statistical model was constructed using principal component analysis, applied to 21 individual computed tomography scans. To test the ability of the statistical model to generate realistic, unique, finite element (FE) femur models, it was used as a source of 1000 femurs to drive a study on femoral neck fracture risk. The study simulated the impact of an oblique fall to the side, a scenario known to account for a large proportion of hip fractures in the elderly and to have a lower fracture load than alternative loading approaches. FE model generation, application of subject-specific loading and boundary conditions, FE processing and post-processing of the solutions were completed automatically. The generated models were within the bounds of the training data used to create the statistical model, with a high mesh quality, and could be used directly by the FE solver without remeshing. The results indicated that 28 of the 1000 femurs were at highest risk of fracture. Closer analysis revealed the percentage of cortical bone in the proximal femur to be a crucial differentiator between the failed and non-failed groups. The likely fracture location was indicated to be intertrochanteric. Comparison to previous computational, clinical and experimental work revealed support for these findings.
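
    The sketch below shows the generic principal component analysis machinery behind a statistical model of this kind: training instances are stacked as vectors, principal modes are extracted, and new plausible instances are generated by sampling the mode scores. The synthetic training matrix, its dimensions and the number of retained modes are stand-ins, not the actual femur data.

        # Minimal sketch: PCA-based statistical model and generation of new instances.
        import numpy as np

        rng = np.random.default_rng(6)
        n_train, n_features = 21, 300            # e.g. 21 training shapes, stacked coordinates
        training = rng.normal(size=(n_train, n_features))

        mean_shape = training.mean(axis=0)
        centered = training - mean_shape
        _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
        mode_std = singular_values / np.sqrt(n_train - 1)   # std. dev. of each mode

        n_new, n_modes = 1000, 10
        scores = rng.normal(size=(n_new, n_modes)) * mode_std[:n_modes]
        new_instances = mean_shape + scores @ vt[:n_modes]  # 1000 synthetic instances
        print(new_instances.shape)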

  3. Sampling, Probability Models and Statistical Reasoning -RE ...

    Indian Academy of Sciences (India)

    random sampling allows data to be modelled with the help of probability ... g based on different trials to get an estimate of the experimental error. ... research interests lie in the .... if e is indeed the true value of the proportion of defectives in the.

  4. Development of the Large-Scale Statistical Analysis System of Satellites Observations Data with Grid Datafarm Architecture

    Science.gov (United States)

    Yamamoto, K.; Murata, K.; Kimura, E.; Honda, R.

    2006-12-01

    number of files and the elapsed time, parallel and distributed processing shortened the elapsed time to one fifth of that of sequential processing. On the other hand, sequential processing times were shorter in another experiment, in which each file was smaller than 100 KB. In that case, the elapsed time to scan one file is within one second, which implies that disk swapping took place during parallel processing on each node. We note that the operation became unstable when the number of files exceeded 1000. To overcome problem (iii), we developed an original data class. This class supports reading data files in various formats by converting them into a common internal format: it defines schemata for every type of data and encapsulates the structure of the data files. In addition, since this class provides a time re-sampling function, users can easily convert multiple data arrays with different time resolutions onto a common time resolution. Finally, using Gfarm, we achieved a high-performance environment for large-scale statistical data analyses. It should be noted that the present method is effective only when individual data files are large enough. At present, we are restructuring the new Gfarm environment with 8 nodes: each node has an Athlon 64 X2 dual-core 2 GHz CPU, 2 GB of memory and a 1.2 TB disk (using RAID0). Our original class is to be implemented on the new Gfarm environment. In the present talk, we show the latest results of applying the present system to data analyses with a huge number of satellite observation data files.

  5. Avoiding Pitfalls in the Statistical Analysis of Heterogeneous Tumors

    Directory of Open Access Journals (Sweden)

    Judith-Anne W. Chapman

    2009-01-01

    Full Text Available Information about tumors is usually obtained from a single assessment of a tumor sample, performed at some point in the course of the development and progression of the tumor, with patient characteristics being surrogates for natural history context. Differences between cells within individual tumors (intratumor heterogeneity) and between tumors of different patients (intertumor heterogeneity) may mean that a small sample is not representative of the tumor as a whole, particularly for solid tumors, which are the focus of this paper. This issue is of increasing importance as high-throughput technologies generate large multi-feature data sets in the areas of genomics, proteomics, and image analysis. Three potential pitfalls in statistical analysis are discussed (sampling, cut-points, and validation) and suggestions are made about how to avoid these pitfalls.

  6. Image Statistics

    Energy Technology Data Exchange (ETDEWEB)

    Wendelberger, Laura Jean [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-08-08

    In large datasets, it is time consuming or even impossible to pick out interesting images. Our proposed solution is to find statistics to quantify the information in each image and use those to identify and pick out images of interest.

  7. A Sorting Statistic with Application in Neurological Magnetic Resonance Imaging of Autism.

    Science.gov (United States)

    Levman, Jacob; Takahashi, Emi; Forgeron, Cynthia; MacDonald, Patrick; Stewart, Natalie; Lim, Ashley; Martel, Anne

    2018-01-01

    Effect size refers to the assessment of the extent of differences between two groups of samples on a single measurement. Assessing effect size in medical research is typically accomplished with Cohen's d statistic. Cohen's d statistic assumes that average values are good estimators of the position of a distribution of numbers and also assumes Gaussian (or bell-shaped) underlying data distributions. In this paper, we present an alternative evaluative statistic that can quantify differences between two data distributions in a manner that is similar to traditional effect size calculations; however, the proposed approach avoids making assumptions regarding the shape of the underlying data distribution. The proposed sorting statistic is compared with Cohen's d statistic and is demonstrated to be capable of identifying feature measurements of potential interest for which Cohen's d statistic implies the measurement would be of little use. This proposed sorting statistic has been evaluated on a large clinical autism dataset from Boston Children's Hospital, Harvard Medical School, demonstrating that it can potentially play a constructive role in future healthcare technologies.
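
    For context, the sketch below contrasts Cohen's d with a simple distribution-free (AUC-type) effect size on skewed data. It illustrates why rank-based summaries can disagree with d, but it is not the specific sorting statistic proposed in the paper; the simulated groups are assumptions.

        # Minimal sketch: Cohen's d versus a rank-based effect size on skewed data.
        import numpy as np
        from scipy.stats import mannwhitneyu

        rng = np.random.default_rng(7)
        group_a = rng.lognormal(mean=0.0, sigma=1.0, size=80)   # skewed, non-Gaussian
        group_b = rng.lognormal(mean=0.4, sigma=1.0, size=80)

        pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
        cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

        u, _ = mannwhitneyu(group_a, group_b, alternative="two-sided")
        p_b_gt_a = 1.0 - u / (len(group_a) * len(group_b))      # rank-based P(B > A)
        print(cohens_d, p_b_gt_a)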

  8. Limiting values of large deviation probabilities of quadratic statistics

    NARCIS (Netherlands)

    Jeurnink, Gerardus A.M.; Kallenberg, W.C.M.

    1990-01-01

    Application of exact Bahadur efficiencies in testing theory or exact inaccuracy rates in estimation theory needs evaluation of large deviation probabilities. Because of the complexity of the expressions, frequently a local limit of the nonlocal measure is considered. Local limits of large deviation

  9. Balanced sampling

    NARCIS (Netherlands)

    Brus, D.J.

    2015-01-01

    In balanced sampling a linear relation between the soil property of interest and one or more covariates with known means is exploited in selecting the sampling locations. Recent developments make this sampling design attractive for statistical soil surveys. This paper introduces balanced sampling

  10. Dark matter statistics for large galaxy catalogs: power spectra and covariance matrices

    Science.gov (United States)

    Klypin, Anatoly; Prada, Francisco

    2018-06-01

    Large-scale surveys of galaxies require accurate theoretical predictions of the dark matter clustering for thousands of mock galaxy catalogs. We demonstrate that this goal can be achieved with the new Parallel Particle-Mesh (PM) N-body code GLAM at a very low computational cost. We run ~22,000 simulations with ~2 billion particles that provide ~1% accuracy of the dark matter power spectra P(k) for wave-numbers up to k ~ 1 h Mpc^-1. Using this large data set we study the power spectrum covariance matrix. In contrast to many previous analytical and numerical results, we find that the covariance matrix normalised to the power spectrum, C(k, k')/[P(k)P(k')], has a complex structure of non-diagonal components: an upturn at small k, followed by a minimum at k ≈ 0.1-0.2 h Mpc^-1, and a maximum at k ≈ 0.5-0.6 h Mpc^-1. The normalised covariance matrix evolves strongly with redshift: C(k, k') ∝ δ^α(t) P(k)P(k'), where δ is the linear growth factor and α ≈ 1-1.25, which indicates that the covariance matrix depends on cosmological parameters. We also show that waves longer than 1 h^-1 Gpc have very little impact on the power spectrum and covariance matrix. This significantly reduces the computational costs and complexity of theoretical predictions: relatively small-volume ~(1 h^-1 Gpc)^3 simulations capture the necessary properties of dark matter clustering statistics. As our results also indicate, achieving ~1% errors in the covariance matrix for k < 0.50 h Mpc^-1 requires a resolution better than ε ~ 0.5 h^-1 Mpc.
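
    A minimal sketch of the covariance-estimation step described above: many mock power spectra are stacked, their covariance matrix is computed, and it is normalised by P(k)P(k'). The mock spectra below are random stand-ins with an assumed shape and scatter, not GLAM simulation output.

        # Minimal sketch: power spectrum covariance matrix from mock realizations.
        import numpy as np

        rng = np.random.default_rng(8)
        n_mocks, n_bins = 5000, 40
        k = np.linspace(0.02, 1.0, n_bins)                 # wave-numbers in h/Mpc
        p_true = 1.0e4 * k ** -1.5                         # hypothetical smooth P(k)
        mocks = p_true * (1.0 + 0.05 * rng.normal(size=(n_mocks, n_bins)))

        p_mean = mocks.mean(axis=0)
        cov = np.cov(mocks, rowvar=False)                  # C(k, k') over realizations
        cov_normalised = cov / np.outer(p_mean, p_mean)    # C(k, k') / [P(k) P(k')]
        print(cov_normalised.shape, np.diag(cov_normalised)[:3])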

  11. Peroral endoscopic myotomy can improve esophageal motility in patients with achalasia from a large sample self-control research (66 patients).

    Directory of Open Access Journals (Sweden)

    Shuangzhe Yao

    Full Text Available Peroral endoscopic myotomy (POEM) as a new approach to achalasia attracts broad attention. The primary objective of this study was to evaluate the results on esophageal motility after POEM through the first large-sample clinical research. We performed a self-control study of all patients (205 in total) who underwent POEM from 2010 to 2014 at our Digestive Endoscopic Center, 66 of whom underwent high resolution manometry (HRM) before and after POEM in our motility laboratory. Follow-ups lasted 5.6 months on average. Outcome variables analyzed included upper esophageal sphincter pressure (UESP), upper esophageal sphincter residual pressure (UESRP), lower esophageal sphincter pressure (LESP), lower esophageal sphincter residual pressure (LESRP) and esophageal body peristalsis. A statistical analysis was performed to illustrate how POEM impacts esophageal motility. The symptoms related to dysphagia were relieved in 95% of patients in the short term after POEM, while HRM showed a statistically significant reduction of UESRP, LESP and LESRP (P < 0.05); a statistically significant difference in LESP and LESRP reduction did not occur between the groups with and without prior endoscopic treatment (P > 0.05). POEM clearly relieved the symptoms related to dysphagia by lowering the pressure of the upper esophageal sphincter (UES) and lower esophageal sphincter (LES), and other endoscopic treatment before POEM did not affect the improvement of LES pressure. These results are concluded from our short-term follow-up study, while the long-term efficacy remains to be further illustrated. Chinese Clinical Trial Register ChiCTR-TRC-12002204.

  12. The Sloan Digital Sky Survey Quasar Lens Search. IV. Statistical Lens Sample from the Fifth Data Release

    Energy Technology Data Exchange (ETDEWEB)

    Inada, Naohisa; /Wako, RIKEN /Tokyo U., ICEPP; Oguri, Masamune; /Natl. Astron. Observ. of Japan /Stanford U., Phys. Dept.; Shin, Min-Su; /Michigan U. /Princeton U. Observ.; Kayo, Issha; /Tokyo U., ICRR; Strauss, Michael A.; /Princeton U. Observ.; Hennawi, Joseph F.; /UC, Berkeley /Heidelberg, Max Planck Inst. Astron.; Morokuma, Tomoki; /Natl. Astron. Observ. of Japan; Becker, Robert H.; /LLNL, Livermore /UC, Davis; White, Richard L.; /Baltimore, Space Telescope Sci.; Kochanek, Christopher S.; /Ohio State U.; Gregg, Michael D.; /LLNL, Livermore /UC, Davis /Exeter U.

    2010-05-01

    We present the second report of our systematic search for strongly lensed quasars from the data of the Sloan Digital Sky Survey (SDSS). From extensive follow-up observations of 136 candidate objects, we find 36 lenses in the full sample of 77,429 spectroscopically confirmed quasars in the SDSS Data Release 5. We then define a complete sample of 19 lenses, including 11 from our previous search in the SDSS Data Release 3, from the sample of 36,287 quasars with i < 19.1 in the redshift range 0.6 < z < 2.2, where we require the lenses to have image separations of 1'' < theta < 20'' and i-band magnitude differences between the two images smaller than 1.25 mag. Among the 19 lensed quasars, 3 have quadruple-image configurations, while the remaining 16 show double images. This lens sample constrains the cosmological constant to be Omega_Lambda = 0.84 +0.06/-0.08 (stat.) +0.09/-0.07 (syst.) assuming a flat universe, which is in good agreement with other cosmological observations. We also report the discoveries of 7 binary quasars with separations ranging from 1.1'' to 16.6'', which are identified in the course of our lens survey. This study concludes the construction of our statistical lens sample in the full SDSS-I data set.

  13. Statistical analyses of scatterplots to identify important factors in large-scale simulations, 1: Review and comparison of techniques

    International Nuclear Information System (INIS)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-01-01

    Procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses are described and illustrated. These procedures attempt to detect increasingly complex patterns in scatterplots and involve the identification of (i) linear relationships with correlation coefficients, (ii) monotonic relationships with rank correlation coefficients, (iii) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (iv) trends in variability as defined by variances and interquartile ranges, and (v) deviations from randomness as defined by the chi-square statistic. A sequence of example analyses with a large model for two-phase fluid flow illustrates how the individual procedures can differ in the variables that they identify as having effects on particular model outcomes. The example analyses indicate that the use of a sequence of procedures is a good analysis strategy and provides some assurance that an important effect is not overlooked
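
    The sketch below applies the first three steps of the progression described above (linear correlation, rank correlation, and a Kruskal-Wallis test for trends in central tendency across bins of the input) to one simulated input/output scatterplot; the model output, sample size and binning are illustrative assumptions.

        # Minimal sketch: pattern-detection statistics for one sensitivity-analysis scatterplot.
        import numpy as np
        from scipy.stats import pearsonr, spearmanr, kruskal

        rng = np.random.default_rng(9)
        x = rng.uniform(0, 1, size=500)                    # sampled input variable
        y = np.sin(3 * x) + 0.3 * rng.normal(size=500)     # nonlinear model output

        r, p_r = pearsonr(x, y)                            # (i) linear relationship
        rho, p_rho = spearmanr(x, y)                       # (ii) monotonic relationship
        bins = np.digitize(x, np.quantile(x, [0.2, 0.4, 0.6, 0.8]))
        groups = [y[bins == b] for b in range(5)]
        h, p_h = kruskal(*groups)                          # (iii) trend in central tendency
        print(p_r, p_rho, p_h)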

  14. Large-volume constant-concentration sampling technique coupling with surface-enhanced Raman spectroscopy for rapid on-site gas analysis

    Science.gov (United States)

    Zhang, Zhuomin; Zhan, Yisen; Huang, Yichun; Li, Gongke

    2017-08-01

    In this work, a portable large-volume constant-concentration (LVCC) sampling technique coupled with surface-enhanced Raman spectroscopy (SERS) was developed for rapid on-site gas analysis based on suitable derivatization methods. The LVCC sampling technique mainly consisted of a specially designed sampling cell, including a rigid sample container and a flexible sampling bag, and an absorption-derivatization module with a portable pump and a gas flowmeter. The LVCC sampling technique allowed a large, alterable and well-controlled sampling volume, which kept the concentration of the gas target in the headspace phase constant during the entire sampling process and made the sampling result more representative. Moreover, absorption and derivatization of the gas target during the LVCC sampling process were efficiently merged into one step using bromine-thiourea and OPA-NH4+ strategies for ethylene and SO2 respectively, which allowed the LVCC sampling technique to be conveniently adapted to subsequent SERS analysis. Finally, a new LVCC sampling-SERS method was developed and successfully applied to the rapid analysis of trace ethylene and SO2 from fruits. Trace ethylene and SO2 from real fruit samples could be accurately quantified by this method. Concentration fluctuations of ethylene and SO2 during the entire LVCC sampling process were shown to be minor, and recoveries for real samples were achieved in the range of 95.0-101% and 97.0-104%, respectively. It is expected that the portable LVCC sampling technique will pave the way for rapid on-site SERS analysis of accurate concentrations of trace gas targets in real samples.

  15. Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics

    Science.gov (United States)

    Dowding, Irene; Haufe, Stefan

    2018-01-01

    Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
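
    As a sketch of the general idea (not the paper's exact procedure or its specific effect-size formulas), the following Python fragment contrasts the naive group-level t-test on subject means with an inverse-variance-weighted combination that uses each subject's within-subject variance; the simulated data and the weighting scheme are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_trials = 20, 50
true_effect = 0.3
noise_sd = rng.uniform(0.5, 3.0, n_subjects)       # heterogeneous subject noise
data = [true_effect + sd * rng.normal(size=n_trials) for sd in noise_sd]

means = np.array([d.mean() for d in data])
sems2 = np.array([d.var(ddof=1) / n_trials for d in data])   # squared SEMs

# naive approach: one-sample t-test on the subject means, ignoring within-subject variance
t_naive, p_naive = stats.ttest_1samp(means, 0.0)

# precision-weighted approach: weight each subject by the inverse of its squared SEM
w = 1.0 / sems2
effect = np.sum(w * means) / np.sum(w)
se = np.sqrt(1.0 / np.sum(w))
z = effect / se
p_weighted = 2 * stats.norm.sf(abs(z))

print(p_naive, p_weighted)
```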

  16. Remote sensing estimation of the total phosphorus concentration in a large lake using band combinations and regional multivariate statistical modeling techniques.

    Science.gov (United States)

    Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi

    2015-03-15

    Remote sensing has been widely used for water quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate
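
    A minimal sketch of the band-combination regression idea described above; the band names, the synthetic reflectance-TP relationship, and the chosen combinations are hypothetical and do not reproduce the paper's HJ-1A CCD model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 200
# reflectance in four hypothetical CCD bands at n sampling sites
b1, b2, b3, b4 = (rng.uniform(0.02, 0.3, n) for _ in range(4))
# synthetic TP concentration driven by two band combinations plus noise
tp = 0.05 + 0.8 * (b3 / b2) + 0.4 * np.log(b4 / b1) + 0.02 * rng.normal(size=n)

# candidate predictors: individual bands plus simple band combinations
X = np.column_stack([b1, b2, b3, b4, b3 / b2, b4 / b1, np.log(b3 / b2)])
model = LinearRegression().fit(X, tp)
print(model.score(X, tp))        # coefficient of determination (R^2)
```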

  17. A study of diabetes mellitus within a large sample of Australian twins

    DEFF Research Database (Denmark)

    Condon, Julianne; Shaw, Joanne E; Luciano, Michelle

    2008-01-01

    Twin studies of diabetes mellitus can help elucidate genetic and environmental factors in etiology and can provide valuable biological samples for testing functional hypotheses, for example using expression and methylation studies of discordant pairs. We searched the volunteer Australian Twin Registry (19,387 pairs) for twins with diabetes using disease checklists from nine different surveys conducted from 1980-2000. After follow-up questionnaires to the twins and their doctors to confirm diagnoses, we eventually identified 46 pairs where one or both had type 1 diabetes (T1D), 113 pairs with type 2 diabetes (T2D), 41 female pairs with gestational diabetes (GD), 5 pairs with impaired glucose tolerance (IGT) and one pair with MODY. Heritabilities of T1D, T2D and GD were all high, but our samples did not have the power to detect effects of shared environment unless they were very large.

  18. Applied systems ecology: models, data, and statistical methods

    Energy Technology Data Exchange (ETDEWEB)

    Eberhardt, L L

    1976-01-01

    In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.

  19. Is Business Failure Due to Lack of Effort? Empirical Evidence from a Large Administrative Sample

    NARCIS (Netherlands)

    Ejrnaes, M.; Hochguertel, S.

    2013-01-01

    Does insurance provision reduce entrepreneurs' effort to avoid business failure? We exploit unique features of the voluntary Danish unemployment insurance (UI) scheme, that is available to the self-employed. Using a large sample of self-employed individuals, we estimate the causal effect of

  20. In-situ high resolution particle sampling by large time sequence inertial spectrometry

    International Nuclear Information System (INIS)

    Prodi, V.; Belosi, F.

    1990-09-01

    In situ sampling is always preferred, when possible, because of the artifacts that can arise when the aerosol has to flow through long sampling lines. On the other hand, the amount of possible losses can be calculated with some confidence only when the size distribution can be measured with a sufficient precision and the losses are not too large. This makes it desirable to sample directly in the vicinity of the aerosol source or containment. High temperature sampling devices with a detailed aerodynamic separation are extremely useful to this purpose. Several measurements are possible with the inertial spectrometer (INSPEC), but not with cascade impactors or cyclones. INSPEC - INertial SPECtrometer - has been conceived to measure the size distribution of aerosols by separating the particles while airborne according to their size and collecting them on a filter. It consists of a channel of rectangular cross-section with a 90 degree bend. Clean air is drawn through the channel, with a thin aerosol sheath injected close to the inner wall. Due to the bend, the particles are separated according to their size, leaving the original streamline by a distance which is a function of particle inertia and resistance, i.e. of aerodynamic diameter. The filter collects all the particles of the same aerodynamic size at the same distance from the inlet, in a continuous distribution. INSPEC particle separation at high temperature (up to 800 C) has been tested with Zirconia particles as calibration aerosols. The feasibility study has been concerned with resolution and time sequence sampling capabilities under high temperature (700 C)

  1. MANAGERIAL DECISION IN INNOVATIVE EDUCATION SYSTEMS STATISTICAL SURVEY BASED ON SAMPLE THEORY

    Directory of Open Access Journals (Sweden)

    Gheorghe SĂVOIU

    2012-12-01

    Full Text Available Before formulating the statistical hypotheses and carrying out the econometric testing itself, a breakdown of some technical issues is required. These relate to managerial decision-making in innovative educational systems: the educational-managerial phenomenon tested through statistical and mathematical methods, namely the significant difference in the perception of current qualities, knowledge, experience, behaviour and desirable health, obtained through a questionnaire applied to a stratified population in the educational environment, comprising staff with either educational activities or simultaneous managerial and educational activities. The details of the research, which is based on survey theory and turns the questionnaires and the statistical data processed from them into a working tool, are summarized below.

  2. Field sampling, preparation procedure and plutonium analyses of large freshwater samples

    International Nuclear Information System (INIS)

    Straelberg, E.; Bjerk, T.O.; Oestmo, K.; Brittain, J.E.

    2002-01-01

    This work is part of an investigation of the mobility of plutonium in freshwater systems containing humic substances. A well-defined bog-stream system located in the catchment area of a subalpine lake, Oevre Heimdalsvatn, Norway, is being studied. During the summer of 1999, six water samples were collected from the tributary stream Lektorbekken and the lake itself. However, the analyses showed that the plutonium concentration was below the detection limit in all the samples. Therefore renewed sampling at the same sites was carried out in August 2000. The results so far are in agreement with previous analyses from the Heimdalen area. However, 100 times higher concentrations are found in the lowlands in the eastern part of Norway. The reason for this is not understood, but may be caused by differences in the concentrations of humic substances and/or the fact that the mountain areas are covered with snow for a longer period of time every year. (LN)

  3. A topological analysis of large-scale structure, studied using the CMASS sample of SDSS-III

    International Nuclear Information System (INIS)

    Parihar, Prachi; Gott, J. Richard III; Vogeley, Michael S.; Choi, Yun-Young; Kim, Juhan; Kim, Sungsoo S.; Speare, Robert; Brownstein, Joel R.; Brinkmann, J.

    2014-01-01

    We study the three-dimensional genus topology of large-scale structure using the northern region of the CMASS Data Release 10 (DR10) sample of the SDSS-III Baryon Oscillation Spectroscopic Survey. We select galaxies with redshift 0.452 < z < 0.625 and with a stellar mass M_stellar > 10^11.56 M_☉. We study the topology at two smoothing lengths: R_G = 21 h^-1 Mpc and R_G = 34 h^-1 Mpc. The genus topology studied at the R_G = 21 h^-1 Mpc scale results in the highest genus amplitude observed to date. The CMASS sample yields a genus curve that is characteristic of one produced by Gaussian random phase initial conditions. The data thus support the standard model of inflation where random quantum fluctuations in the early universe produced Gaussian random phase initial conditions. Modest deviations in the observed genus from random phase are as expected from shot noise effects and the nonlinear evolution of structure. We suggest the use of a fitting formula motivated by perturbation theory to characterize the shift and asymmetries in the observed genus curve with a single parameter. We construct 54 mock SDSS CMASS surveys along the past light cone from the Horizon Run 3 (HR3) N-body simulations, where gravitationally bound dark matter subhalos are identified as the sites of galaxy formation. We study the genus topology of the HR3 mock surveys with the same geometry and sampling density as the observational sample and find the observed genus topology to be consistent with ΛCDM as simulated by the HR3 mock samples. We conclude that the topology of the large-scale structure in the SDSS CMASS sample is consistent with cosmological models having primordial Gaussian density fluctuations growing in accordance with general relativity to form galaxies in massive dark matter halos.
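
    For reference, the genus curve expected for a Gaussian random-phase field has the form g(nu) = A (1 - nu^2) exp(-nu^2 / 2), where nu is the density threshold in units of the standard deviation. The sketch below fits only the amplitude A to a synthetic genus curve; it is not the perturbation-theory fitting formula proposed in the paper, and the "observed" values are placeholders.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_genus(nu, amplitude):
    """Genus curve expected for Gaussian random-phase initial conditions."""
    return amplitude * (1.0 - nu**2) * np.exp(-nu**2 / 2.0)

nu = np.linspace(-3, 3, 61)                       # threshold in sigma units
rng = np.random.default_rng(3)
observed = gaussian_genus(nu, 280.0) + 5.0 * rng.normal(size=nu.size)  # synthetic

(amp_fit,), _ = curve_fit(gaussian_genus, nu, observed, p0=[100.0])
print(amp_fit)                                    # fitted genus amplitude
```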

  4. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

    Science.gov (United States)

    Hero, Alfred O.; Rajaratnam, Bala

    2015-01-01

    When can reliable inference be drawn in the “Big Data” context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data”. Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks. PMID:27087700

  5. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.

    Science.gov (United States)

    Hero, Alfred O; Rajaratnam, Bala

    2016-01-01

    When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.

  6. Gaussian vs. Bessel light-sheets: performance analysis in live large sample imaging

    Science.gov (United States)

    Reidt, Sascha L.; Correia, Ricardo B. C.; Donnachie, Mark; Weijer, Cornelis J.; MacDonald, Michael P.

    2017-08-01

    Lightsheet fluorescence microscopy (LSFM) has rapidly progressed in the past decade from an emerging technology into an established methodology. This progress has largely been driven by its suitability to developmental biology, where it is able to give excellent spatial-temporal resolution over relatively large fields of view with good contrast and low phototoxicity. In many respects it is superseding confocal microscopy. However, it is no magic bullet and still struggles to image deeply in more highly scattering samples. Many solutions to this challenge have been presented, including, Airy and Bessel illumination, 2-photon operation and deconvolution techniques. In this work, we show a comparison between a simple but effective Gaussian beam illumination and Bessel illumination for imaging in chicken embryos. Whilst Bessel illumination is shown to be of benefit when a greater depth of field is required, it is not possible to see any benefits for imaging into the highly scattering tissue of the chick embryo.

  7. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    Directory of Open Access Journals (Sweden)

    Ujjwal Maulik

    Full Text Available Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post

  8. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    Science.gov (United States)

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  9. Religion and the Unmaking of Prejudice toward Muslims: Evidence from a Large National Sample

    Science.gov (United States)

    Shaver, John H.; Troughton, Geoffrey; Sibley, Chris G.; Bulbulia, Joseph A.

    2016-01-01

    In the West, anti-Muslim sentiments are widespread. It has been theorized that inter-religious tensions fuel anti-Muslim prejudice, yet previous attempts to isolate sectarian motives have been inconclusive. Factors contributing to ambiguous results are: (1) failures to assess and adjust for multi-level denomination effects; (2) inattention to demographic covariates; (3) inadequate methods for comparing anti-Muslim prejudice relative to other minority group prejudices; and (4) ad hoc theories for the mechanisms that underpin prejudice and tolerance. Here we investigate anti-Muslim prejudice using a large national sample of non-Muslim New Zealanders (N = 13,955) who responded to the 2013 New Zealand Attitudes and Values Study. We address previous shortcomings by: (1) building Bayesian multivariate, multi-level regression models with denominations modeled as random effects; (2) including high-resolution demographic information that adjusts for factors known to influence prejudice; (3) simultaneously evaluating the relative strength of anti-Muslim prejudice by comparing it to anti-Arab prejudice and anti-immigrant prejudice within the same statistical model; and (4) testing predictions derived from the Evolutionary Lag Theory of religious prejudice and tolerance. This theory predicts that in countries such as New Zealand, with historically low levels of conflict, religion will tend to increase tolerance generally, and extend to minority religious groups. Results show that anti-Muslim and anti-Arab sentiments are confounded, widespread, and substantially higher than anti-immigrant sentiments. In support of the theory, the intensity of religious commitments was associated with a general increase in tolerance toward minority groups, including a poorly tolerated religious minority group: Muslims. Results clarify religion’s power to enhance tolerance in peaceful societies that are nevertheless afflicted by prejudice. PMID:26959976

  10. Religion and the Unmaking of Prejudice toward Muslims: Evidence from a Large National Sample.

    Science.gov (United States)

    Shaver, John H; Troughton, Geoffrey; Sibley, Chris G; Bulbulia, Joseph A

    2016-01-01

    In the West, anti-Muslim sentiments are widespread. It has been theorized that inter-religious tensions fuel anti-Muslim prejudice, yet previous attempts to isolate sectarian motives have been inconclusive. Factors contributing to ambiguous results are: (1) failures to assess and adjust for multi-level denomination effects; (2) inattention to demographic covariates; (3) inadequate methods for comparing anti-Muslim prejudice relative to other minority group prejudices; and (4) ad hoc theories for the mechanisms that underpin prejudice and tolerance. Here we investigate anti-Muslim prejudice using a large national sample of non-Muslim New Zealanders (N = 13,955) who responded to the 2013 New Zealand Attitudes and Values Study. We address previous shortcomings by: (1) building Bayesian multivariate, multi-level regression models with denominations modeled as random effects; (2) including high-resolution demographic information that adjusts for factors known to influence prejudice; (3) simultaneously evaluating the relative strength of anti-Muslim prejudice by comparing it to anti-Arab prejudice and anti-immigrant prejudice within the same statistical model; and (4) testing predictions derived from the Evolutionary Lag Theory of religious prejudice and tolerance. This theory predicts that in countries such as New Zealand, with historically low levels of conflict, religion will tend to increase tolerance generally, and extend to minority religious groups. Results show that anti-Muslim and anti-Arab sentiments are confounded, widespread, and substantially higher than anti-immigrant sentiments. In support of the theory, the intensity of religious commitments was associated with a general increase in tolerance toward minority groups, including a poorly tolerated religious minority group: Muslims. Results clarify religion's power to enhance tolerance in peaceful societies that are nevertheless afflicted by prejudice.

  11. Statistical correlations in an ideal gas of particles obeying fractional exclusion statistics.

    Science.gov (United States)

    Pellegrino, F M D; Angilella, G G N; March, N H; Pucci, R

    2007-12-01

    After a brief discussion of the concepts of fractional exchange and fractional exclusion statistics, we report partly analytical and partly numerical results on thermodynamic properties of assemblies of particles obeying fractional exclusion statistics. The effect of dimensionality is one focal point, the ratio mu/(k_B T) of chemical potential to thermal energy being obtained numerically as a function of a scaled particle density. Pair correlation functions are also presented as a function of the statistical parameter, with Friedel oscillations developing close to the fermion limit, for sufficiently large density.

  12. Accelerating inference for diffusions observed with measurement error and large sample sizes using approximate Bayesian computation

    DEFF Research Database (Denmark)

    Picchini, Umberto; Forman, Julie Lyng

    2016-01-01

    In recent years, dynamical modelling has been provided with a range of breakthrough methods to perform exact Bayesian inference. However, it is often computationally unfeasible to apply exact statistical methodologies in the context of large data sets and complex models. This paper considers a nonlinear stochastic differential equation model observed with correlated measurement errors and an application to protein folding modelling. An approximate Bayesian computation (ABC)-MCMC algorithm is suggested to allow inference for model parameters within reasonable time constraints. The ABC algorithm ... applications. A simulation study is conducted to compare our strategy with exact Bayesian inference, the latter resulting two orders of magnitude slower than ABC-MCMC for the considered set-up. Finally, the ABC algorithm is applied to a large protein dataset. The suggested methodology is fairly general...
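
    A minimal rejection-ABC sketch in Python for a toy Gaussian model, illustrating the general ABC idea of keeping parameter draws whose simulated summary statistics fall close to the observed ones; it is not the paper's ABC-MCMC algorithm or SDE model, and the prior, summary statistic and tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
observed = rng.normal(loc=2.0, scale=1.0, size=100)    # stand-in "data"
s_obs = observed.mean()                                # summary statistic

def simulate(theta, n=100):
    """Simulate data from the toy model for a candidate parameter value."""
    return rng.normal(loc=theta, scale=1.0, size=n)

accepted = []
epsilon = 0.1                                          # tolerance
for _ in range(20000):
    theta = rng.uniform(-5, 5)                         # draw from the prior
    s_sim = simulate(theta).mean()                     # summary of simulated data
    if abs(s_sim - s_obs) < epsilon:                   # keep draws close to the data
        accepted.append(theta)

print(np.mean(accepted), np.std(accepted))             # approximate posterior mean/sd
```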

  13. Waardenburg syndrome: Novel mutations in a large Brazilian sample.

    Science.gov (United States)

    Bocángel, Magnolia Astrid Pretell; Melo, Uirá Souto; Alves, Leandro Ucela; Pardono, Eliete; Lourenço, Naila Cristina Vilaça; Marcolino, Humberto Vicente Cezar; Otto, Paulo Alberto; Mingroni-Netto, Regina Célia

    2018-06-01

    This paper deals with the molecular investigation of Waardenburg syndrome (WS) in a sample of 49 clinically diagnosed probands (most from southeastern Brazil), 24 of them having the type 1 (WS1) variant (10 familial and 14 isolated cases) and 25 being affected by the type 2 (WS2) variant (five familial and 20 isolated cases). Sequential Sanger sequencing of all coding exons of PAX3, MITF, EDN3, EDNRB, SOX10 and SNAI2 genes, followed by CNV detection by MLPA of PAX3, MITF and SOX10 genes in selected cases revealed many novel pathogenic variants. Molecular screening, performed in all patients, revealed 19 causative variants (19/49 = 38.8%), six of them being large whole-exon deletions detected by MLPA, seven (four missense and three nonsense substitutions) resulting from single nucleotide substitutions (SNV), and six representing small indels. A pair of dizygotic affected female twins presented the c.430delC variant in SOX10, but the mutation, imputed to gonadal mosaicism, was not found in their unaffected parents. At least 10 novel causative mutations, described in this paper, were found in this Brazilian sample. Copy-number-variation detected by MLPA identified the causative mutation in 12.2% of our cases, corresponding to 31.6% of all causative mutations. In the majority of cases, the deletions were sporadic, since they were not present in the parents of isolated cases. Our results, as a whole, reinforce the fact that the screening of copy-number-variants by MLPA is a powerful tool to identify the molecular cause in WS patients. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  14. Statistical processing of large image sequences.

    Science.gov (United States)

    Khellah, F; Fieguth, P; Murray, M J; Allen, M

    2005-01-01

    The dynamic estimation of large-scale stochastic image sequences, as frequently encountered in remote sensing, is important in a variety of scientific applications. However, the size of such images makes conventional dynamic estimation methods, for example, the Kalman and related filters, impractical. In this paper, we present an approach that emulates the Kalman filter, but with considerably reduced computational and storage requirements. Our approach is illustrated in the context of a 512 x 512 image sequence of ocean surface temperature. The static estimation step, the primary contribution here, uses a mixture of stationary models to accurately mimic the effect of a nonstationary prior, simplifying both computational complexity and modeling. Our approach provides an efficient, stable, positive-definite model which is consistent with the given correlation structure. Thus, the methods of this paper may find application in modeling and single-frame estimation.

  15. Tracing the trajectory of skill learning with a very large sample of online game players.

    Science.gov (United States)

    Stafford, Tom; Dewar, Michael

    2014-02-01

    In the present study, we analyzed data from a very large sample (N = 854,064) of players of an online game involving rapid perception, decision making, and motor responding. Use of game data allowed us to connect, for the first time, rich details of training history with measures of performance from participants engaged for a sustained amount of time in effortful practice. We showed that lawful relations exist between practice amount and subsequent performance, and between practice spacing and subsequent performance. Our methodology allowed an in situ confirmation of results long established in the experimental literature on skill acquisition. Additionally, we showed that greater initial variation in performance is linked to higher subsequent performance, a result we link to the exploration/exploitation trade-off from the computational framework of reinforcement learning. We discuss the benefits and opportunities of behavioral data sets with very large sample sizes and suggest that this approach could be particularly fecund for studies of skill acquisition.

  16. Imaging a Large Sample with Selective Plane Illumination Microscopy Based on Multiple Fluorescent Microsphere Tracking

    Science.gov (United States)

    Ryu, Inkeon; Kim, Daekeun

    2018-04-01

    A typical selective plane illumination microscopy (SPIM) image size is basically limited by the field of view, which is a characteristic of the objective lens. If an image larger than the imaging area of the sample is to be obtained, image stitching, which combines step-scanned images into a single panoramic image, is required. However, accurately registering the step-scanned images is very difficult because the SPIM system uses a customized sample mount where uncertainties for the translational and the rotational motions exist. In this paper, an image registration technique based on multiple fluorescent microsphere tracking is proposed in the view of quantifying the constellations and measuring the distances between at least two fluorescent microspheres embedded in the sample. Image stitching results are demonstrated for optically cleared large tissue with various staining methods. Compensation for the effect of the sample rotation that occurs during the translational motion in the sample mount is also discussed.

  17. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    Science.gov (United States)

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784

  18. Statistical and Machine Learning forecasting methods: Concerns and ways forward.

    Science.gov (United States)

    Makridakis, Spyros; Spiliotis, Evangelos; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
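
    As a sketch of how such comparisons are typically scored (the paper's exact measures are not restated here), the following Python fragment computes two accuracy measures commonly used in the M competitions, sMAPE and MASE, for a seasonal-naive forecast of a toy monthly series.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    return 100 * np.mean(2 * np.abs(forecast - actual) /
                         (np.abs(actual) + np.abs(forecast)))

def mase(actual, forecast, insample, m=12):
    """Mean absolute scaled error, scaled by the in-sample seasonal-naive error."""
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale

rng = np.random.default_rng(3)
t = np.arange(78)
series = 50 + 10 * np.sin(t * 2 * np.pi / 12) + rng.normal(0, 1, 78)  # toy monthly data
insample, actual = series[:60], series[60:]

forecast = np.tile(insample[-12:], 2)[:18]        # seasonal-naive 18-step forecast

print(smape(actual, forecast), mase(actual, forecast, insample))
```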

  19. Inverse Statistics and Asset Allocation Efficiency

    Science.gov (United States)

    Bolgorian, Meysam

    In this paper, using inverse statistics analysis, the effect of the investment horizon on the efficiency of portfolio selection is examined. Inverse statistics analysis is a general tool, also known as the probability distribution of exit times, used for detecting the distribution of the time in which a stochastic process exits a given zone. This analysis was used in Refs. 1 and 2 for studying financial returns time series. This distribution provides an optimal investment horizon, i.e., the most likely horizon for gaining a specific return. Using samples of stocks from the Tehran Stock Exchange (TSE) as an emerging market and the S&P 500 as a developed market, the effect of the optimal investment horizon on asset allocation is assessed. It is found that taking the optimal investment horizon into account in the TSE leads to more efficiency for large portfolios, whereas for stocks selected from the S&P 500 this strategy does not produce more efficient portfolios regardless of portfolio size; instead, longer investment horizons provide more efficiency.
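
    A minimal sketch of the exit-time (inverse statistics) computation described above, applied to a simulated random walk rather than TSE or S&P 500 data; the return level, series length and histogram binning are illustrative assumptions, and the most probable exit time plays the role of the optimal investment horizon.

```python
import numpy as np

rng = np.random.default_rng(11)
log_price = np.cumsum(rng.normal(0.0005, 0.02, 5000))   # synthetic log prices

def exit_times(logp, rho):
    """First time the log price gains at least rho, for every starting day."""
    times = []
    for t in range(logp.size - 1):
        ahead = logp[t + 1:] - logp[t]
        hit = np.argmax(ahead >= rho)        # index of the first crossing...
        if ahead[hit] >= rho:                # ...argmax returns 0 when never reached
            times.append(hit + 1)
    return np.asarray(times)

tau = exit_times(log_price, rho=0.05)
hist, edges = np.histogram(tau, bins=np.arange(1, 200))
optimal_horizon = edges[np.argmax(hist)]     # most probable exit time
print(optimal_horizon)
```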

  20. SWOT ANALYSIS ON SAMPLING METHOD

    Directory of Open Access Journals (Sweden)

    CHIS ANCA OANA

    2014-07-01

    Full Text Available Audit sampling involves the application of audit procedures to less than 100% of items within an account balance or class of transactions. Our article aims to study audit sampling in the audit of financial statements. As an audit technique largely used, in both its statistical and nonstatistical forms, the method is very important for auditors. It should be applied correctly, for a fair view of financial statements, to satisfy the needs of all financial users. In order to be applied correctly the method must be understood by all its users and mainly by auditors. Otherwise the risk of not applying it correctly would cause loss of reputation and discredit, litigations and even prison. Since there is no unitary practice and methodology for applying the technique, the risk of applying it incorrectly is rather high. SWOT analysis is a technique that shows the advantages, disadvantages, threats and opportunities. We applied SWOT analysis to the study of the sampling method, from the perspective of three players: the audit company, the audited entity and the users of financial statements. The study shows that by applying the sampling method the audit company and the audited entity both save time, effort and money. The disadvantages of the method are the difficulty of applying and understanding it. Being largely used as an audit method and being a factor in a correct audit opinion, the sampling method’s advantages, disadvantages, threats and opportunities must be understood by auditors.

  1. The Brief Negative Symptom Scale (BNSS): Independent validation in a large sample of Italian patients with schizophrenia.

    Science.gov (United States)

    Mucci, A; Galderisi, S; Merlotti, E; Rossi, A; Rocca, P; Bucci, P; Piegari, G; Chieffi, M; Vignapiano, A; Maj, M

    2015-07-01

    The Brief Negative Symptom Scale (BNSS) was developed to address the main limitations of the existing scales for the assessment of negative symptoms of schizophrenia. The initial validation of the scale by the group involved in its development demonstrated good convergent and discriminant validity, and a factor structure confirming the two domains of negative symptoms (reduced emotional/verbal expression and anhedonia/asociality/avolition). However, only relatively small samples of patients with schizophrenia were investigated. Further independent validation in large clinical samples might be instrumental to the broad diffusion of the scale in clinical research. The present study aimed to examine the BNSS inter-rater reliability, convergent/discriminant validity and factor structure in a large Italian sample of outpatients with schizophrenia. Our results confirmed the excellent inter-rater reliability of the BNSS (the intraclass correlation coefficient ranged from 0.81 to 0.98 for individual items and was 0.98 for the total score). The convergent validity measures had r values from 0.62 to 0.77, while the divergent validity measures had r values from 0.20 to 0.28 in the main sample (n=912) and in a subsample without clinically significant levels of depression and extrapyramidal symptoms (n=496). The BNSS factor structure was supported in both groups. The study confirms that the BNSS is a promising measure for quantifying negative symptoms of schizophrenia in large multicenter clinical studies. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  2. Statistical concepts a second course

    CERN Document Server

    Lomax, Richard G

    2012-01-01

    Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes

  3. Estimation of global network statistics from incomplete data.

    Directory of Open Access Journals (Sweden)

    Catherine A Bliss

    Full Text Available Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
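
    The bias that such scaling methods correct for can be seen in a few lines; the sketch below (not the authors' estimators) subsamples nodes from a synthetic scale-free network and compares the naive mean degree of the induced subgraph with a simple 1/p rescaling, where p is the node-sampling proportion.

```python
import numpy as np
import networkx as nx

g = nx.barabasi_albert_graph(10000, 3, seed=5)        # synthetic scale-free network
true_mean = np.mean([d for _, d in g.degree()])

rng = np.random.default_rng(5)
p = 0.2                                               # observe 20% of the nodes
sampled = rng.choice(np.arange(10000), size=2000, replace=False)
sub = g.subgraph(sampled)                             # links among sampled nodes only
naive_mean = np.mean([d for _, d in sub.degree()])

# a link incident to a sampled node is observed only if its other endpoint is
# also sampled (probability p), so observed degrees are deflated by roughly p
print(true_mean, naive_mean, naive_mean / p)
```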

  4. An introduction to statistical computing a simulation-based approach

    CERN Document Server

    Voss, Jochen

    2014-01-01

    A comprehensive introduction to sampling-based methods in statistical computing The use of computers in mathematics and statistics has opened up a wide range of techniques for studying otherwise intractable problems.  Sampling-based simulation techniques are now an invaluable tool for exploring statistical models.  This book gives a comprehensive introduction to the exciting area of sampling-based methods. An Introduction to Statistical Computing introduces the classical topics of random number generation and Monte Carlo methods.  It also includes some advanced met
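
    A minimal example of the sampling-based approach the book introduces: a plain Monte Carlo estimate of the integral of exp(-x^2) over [0, 1], together with its standard error. This is a generic illustration, not an exercise taken from the book.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x = rng.uniform(0.0, 1.0, n)            # random number generation
values = np.exp(-x**2)                  # integrand evaluated at the samples

estimate = values.mean()                # Monte Carlo estimate of the integral
std_error = values.std(ddof=1) / np.sqrt(n)
print(estimate, std_error)              # true value is about 0.7468
```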

  5. Practical Statistics for the LHC

    CERN Document Server

    Cranmer, Kyle

    2015-05-22

    This document is a pedagogical introduction to statistics for particle physics. Emphasis is placed on the terminology, concepts, and methods being used at the Large Hadron Collider. The document addresses both the statistical tests applied to a model of the data and the modeling itself.

  6. A statistical evaluation of asbestos air concentrations

    International Nuclear Information System (INIS)

    Lange, J.H.

    1999-01-01

    Both area and personal air samples collected during an asbestos abatement project were matched and statistically analysed. Among the many parameters studied were fibre concentrations and their variability. Mean values for area and personal samples were 0.005 and 0.024 f cm^-3 of air, respectively. Summary values for area and personal samples suggest that exposures are low with no single exposure value exceeding the current OSHA TWA value of 0.1 f cm^-3 of air. Within- and between-worker analysis suggests that these data are homogeneous. Comparison of within- and between-worker values suggests that the exposure source and variability for abatement are more related to the process than individual practices. This supports the importance of control measures for abatement. Study results also suggest that area and personal samples are not statistically related, that is, there is no association observed for these two sampling methods when data are analysed by correlation or regression analysis. Personal samples were statistically higher in concentration than area samples. Area sampling cannot be used as a surrogate exposure for asbestos abatement workers. (author)

  7. Alignment between galaxies and large-scale structure

    International Nuclear Information System (INIS)

    Faltenbacher, A.; Li Cheng; White, Simon D. M.; Jing, Yi-Peng; Mao Shude; Wang Jie

    2009-01-01

    Based on the Sloan Digital Sky Survey DR6 (SDSS) and the Millennium Simulation (MS), we investigate the alignment between galaxies and large-scale structure. For this purpose, we develop two new statistical tools, namely the alignment correlation function and the cos(2θ)-statistic. The former is a two-dimensional extension of the traditional two-point correlation function and the latter is related to the ellipticity correlation function used for cosmic shear measurements. Both are based on the cross correlation between a sample of galaxies with orientations and a reference sample which represents the large-scale structure. We apply the new statistics to the SDSS galaxy catalog. The alignment correlation function reveals an overabundance of reference galaxies along the major axes of red, luminous (L ∼ L*) galaxies out to projected separations of 60 h^-1 Mpc. The signal increases with central galaxy luminosity. No alignment signal is detected for blue galaxies. The cos(2θ)-statistic yields very similar results. Starting from a MS semi-analytic galaxy catalog, we assign an orientation to each red, luminous and central galaxy, based on that of the central region of the host halo (with size similar to that of the stellar galaxy). As an alternative, we use the orientation of the host halo itself. We find a mean projected misalignment between a halo and its central region of ∼ 25 deg. The misalignment decreases slightly with increasing luminosity of the central galaxy. Using the orientations and luminosities of the semi-analytic galaxies, we repeat our alignment analysis on mock surveys of the MS. Agreement with the SDSS results is good if the central orientations are used. Predictions using the halo orientations as proxies for central galaxy orientations overestimate the observed alignment by more than a factor of 2. Finally, the large volume of the MS allows us to generate a two-dimensional map of the alignment correlation function, which shows the reference

  8. Sampling the Mouse Hippocampal Dentate Gyrus

    OpenAIRE

    Lisa Basler; Stephan Gerdes; David P. Wolfer; Lutz Slomianka

    2017-01-01

    Sampling is a critical step in procedures that generate quantitative morphological data in the neurosciences. Samples need to be representative to allow statistical evaluations, and samples need to deliver a precision that makes statistical evaluations not only possible but also meaningful. Sampling generated variability should, e.g., not be able to hide significant group differences from statistical detection if they are present. Estimators of the coefficient of error (CE) have been develope...

  9. Validation of the MOS Social Support Survey 6-item (MOS-SSS-6) measure with two large population-based samples of Australian women.

    Science.gov (United States)

    Holden, Libby; Lee, Christina; Hockey, Richard; Ware, Robert S; Dobson, Annette J

    2014-12-01

    This study aimed to validate a 6-item 1-factor global measure of social support developed from the Medical Outcomes Study Social Support Survey (MOS-SSS) for use in large epidemiological studies. Data were obtained from two large population-based samples of participants in the Australian Longitudinal Study on Women's Health. The two cohorts were aged 53-58 and 28-33 years at data collection (N = 10,616 and 8,977, respectively). Items selected for the 6-item 1-factor measure were derived from the factor structure obtained from unpublished work using an earlier wave of data from one of these cohorts. Descriptive statistics, including polychoric correlations, were used to describe the abbreviated scale. Cronbach's alpha was used to assess internal consistency and confirmatory factor analysis to assess scale validity. Concurrent validity was assessed using correlations between the new 6-item version and established 19-item version, and other concurrent variables. In both cohorts, the new 6-item 1-factor measure showed strong internal consistency and scale reliability. It had excellent goodness-of-fit indices, similar to those of the established 19-item measure. Both versions correlated similarly with concurrent measures. The 6-item 1-factor MOS-SSS measures global functional social support with fewer items than the established 19-item measure.
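
    The internal consistency reported above is typically quantified with Cronbach's alpha; a minimal sketch on simulated one-factor data (not the ALSWH cohorts) is shown below.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an array of shape (n_respondents, n_items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the total score
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(7)
latent = rng.normal(size=(500, 1))                       # one underlying factor
responses = latent + 0.8 * rng.normal(size=(500, 6))     # six noisy items
print(cronbach_alpha(responses))
```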

  10. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data.

    Science.gov (United States)

    Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S

    2015-02-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press.

  11. Ad hoc statistical consulting within a large research organization

    CSIR Research Space (South Africa)

    Elphinstone, CD

    2009-08-01

    Full Text Available requests were growing to the extent where it was difficult to manage them together with project and research workload. Also, the access to computing and some basic statistical literacy meant that a high proportion of advanced queries were received.... The challenge was to achieve this in a cost effective way with limited financial and personnel resources. Experience: Some of the challenges experienced with the HotSeat service: • Researchers consulting with a statistician after the data is collected...

  12. Effects of baryons on the statistical properties of large scale structure of the Universe

    International Nuclear Information System (INIS)

    Guillet, T.

    2010-01-01

    Observations of weak gravitational lensing will provide strong constraints on the cosmic expansion history and the growth rate of large scale structure, yielding clues to the properties and nature of dark energy. Their interpretation is impacted by baryonic physics, which are expected to modify the total matter distribution at small scales. My work has focused on determining and modeling the impact of baryons on the statistics of the large scale matter distribution in the Universe. Using numerical simulations, I have extracted the effect of baryons on the power spectrum, variance and skewness of the total density field as predicted by these simulations. I have shown that a model based on the halo model construction, featuring a concentrated central component to account for cool condensed baryons, is able to reproduce accurately, and down to very small scales, the measured amplifications of both the variance and skewness of the density field. Because of well-known issues with baryons in current cosmological simulations, I have extended the central component model to rely on as many observation-based ingredients as possible. As an application, I have studied the effect of baryons on the predictions of the upcoming Euclid weak lensing survey. During the course of this work, I have also worked at developing and extending the RAMSES code, in particular by developing a parallel self-gravity solver, which offers significant performance gains, in particular for the simulation of some astrophysical setups such as isolated galaxy or cluster simulations. (author) [fr

  13. Statistical Analysis of Big Data on Pharmacogenomics

    Science.gov (United States)

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
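    As a hedged illustration of one idea from this family of methods (entrywise hard-thresholding of a large sample covariance matrix; the threshold and the simulated data below are arbitrary choices, not taken from the paper):

    ```python
    import numpy as np

    def threshold_covariance(X, tau):
        # Hard-thresholding estimator: entries of the sample covariance with magnitude
        # below tau are set to zero; variances on the diagonal are left untouched.
        S = np.cov(X, rowvar=False)
        T = np.where(np.abs(S) >= tau, S, 0.0)
        np.fill_diagonal(T, np.diag(S))
        return T

    # Example: 50 samples of 200 "genes" (p >> n regime where regularization matters)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 200))
    S_hat = threshold_covariance(X, tau=0.3)
    ```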

  14. Time Series Analysis Based on Running Mann Whitney Z Statistics

    Science.gov (United States)

    A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
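    A minimal sketch of the general idea, assuming a normal approximation for the U-to-Z conversion (the abstract's own normalization step is truncated above) and an arbitrary window length:

    ```python
    import numpy as np
    from scipy.stats import mannwhitneyu

    def running_mw_z(x, half_window=25):
        # Slide a window over the series; at each centre point, compare the two
        # half-windows with a Mann-Whitney U test and convert U to a Z score.
        n1 = n2 = half_window
        m_u = n1 * n2 / 2.0
        s_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
        z = np.full(len(x), np.nan)
        for t in range(half_window, len(x) - half_window):
            u = mannwhitneyu(x[t - half_window:t], x[t:t + half_window]).statistic
            z[t] = (u - m_u) / s_u
        return z          # large |z| flags a shift in level between the two half-windows
    ```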

  15. A note on the kappa statistic for clustered dichotomous data.

    Science.gov (United States)

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾ 50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data set and two simulated clustered physician-patient dichotomous data sets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
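    For orientation, a sketch of the existing cluster bootstrap approach that the proposal is compared against (the data layout, number of resamples and confidence level are illustrative assumptions, not the paper's settings):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def cohen_kappa(r1, r2):
        # Cohen's kappa for two raters with dichotomous (0/1) ratings.
        po = np.mean(r1 == r2)
        p1, p2 = np.mean(r1), np.mean(r2)
        pe = p1 * p2 + (1 - p1) * (1 - p2)
        return (po - pe) / (1 - pe)

    def cluster_bootstrap_ci(clusters, n_boot=2000, alpha=0.05):
        # clusters: one (ratings_rater1, ratings_rater2) pair of arrays per physician;
        # whole clusters are resampled with replacement to respect within-cluster correlation.
        idx = np.arange(len(clusters))
        stats = []
        for _ in range(n_boot):
            pick = rng.choice(idx, size=len(clusters), replace=True)
            r1 = np.concatenate([clusters[i][0] for i in pick])
            r2 = np.concatenate([clusters[i][1] for i in pick])
            stats.append(cohen_kappa(r1, r2))
        return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    ```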

  16. Constructing and sampling directed graphs with given degree sequences

    International Nuclear Information System (INIS)

    Kim, H; Del Genio, C I; Bassler, K E; Toroczkai, Z

    2012-01-01

    The interactions between the components of complex networks are often directed. Proper modeling of such systems frequently requires the construction of ensembles of digraphs with a given sequence of in- and out-degrees. As the number of simple labeled graphs with a given degree sequence is typically very large even for short sequences, sampling methods are needed for statistical studies. Currently, there are two main classes of methods that generate samples. One of the existing methods first generates a restricted class of graphs and then uses a Markov chain Monte-Carlo algorithm based on edge swaps to generate other realizations. As the mixing time of this process is still unknown, the independence of the samples is not well controlled. The other class of methods is based on the configuration model that may lead to unacceptably many sample rejections due to self-loops and multiple edges. Here we present an algorithm that can directly construct all possible realizations of a given bi-degree sequence by simple digraphs. Our method is rejection-free, guarantees the independence of the constructed samples and provides their weight. The weights can then be used to compute statistical averages of network observables as if they were obtained from uniformly distributed sampling or from any other chosen distribution. (paper)
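    For contrast with the rejection-free construction described above, a sketch of the configuration-model approach it improves upon, pairing in- and out-stubs and rejecting realizations that contain self-loops or multiple edges (the example bi-degree sequence is invented):

    ```python
    import random

    def sample_simple_digraph(out_deg, in_deg, max_tries=1000):
        # Configuration-model sampler for a simple digraph with the given bi-degree
        # sequence; rejects realizations with self-loops or repeated edges.
        assert sum(out_deg) == sum(in_deg)
        out_stubs = [v for v, d in enumerate(out_deg) for _ in range(d)]
        in_stubs = [v for v, d in enumerate(in_deg) for _ in range(d)]
        for _ in range(max_tries):
            random.shuffle(in_stubs)
            edges, ok = set(), True
            for u, v in zip(out_stubs, in_stubs):
                if u == v or (u, v) in edges:   # self-loop or multi-edge: reject
                    ok = False
                    break
                edges.add((u, v))
            if ok:
                return edges
        raise RuntimeError("too many rejections; sequence may be hard to realize")

    # Example with a small, realizable bi-degree sequence
    print(sample_simple_digraph([2, 1, 1, 0], [1, 1, 1, 1]))
    ```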

  17. Monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm on permanent plots: sampling methods and statistical properties of data

    Science.gov (United States)

    A.R. Mason; H.G. Paul

    1994-01-01

    Procedures for monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm are recommended based on many years' experience in sampling these species in eastern Oregon and Washington. It is shown that statistically reliable estimates of larval density can be made for a population by sampling host trees in a series of permanent plots in a...

  18. FUNSTAT and statistical image representations

    Science.gov (United States)

    Parzen, E.

    1983-01-01

    General ideas of functional statistical inference (analysis of one sample and two samples, univariate and bivariate) are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.

  19. Statistically Controlling for Confounding Constructs Is Harder than You Think.

    Directory of Open Access Journals (Sweden)

    Jacob Westfall

    Full Text Available Social scientists often seek to demonstrate that a construct has incremental validity over and above other related constructs. However, these claims are typically supported by measurement-level models that fail to consider the effects of measurement (un)reliability. We use intuitive examples, Monte Carlo simulations, and a novel analytical framework to demonstrate that common strategies for establishing incremental construct validity using multiple regression analysis exhibit extremely high Type I error rates under parameter regimes common in many psychological domains. Counterintuitively, we find that error rates are highest, in some cases approaching 100%, when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious. We present a web application (http://jakewestfall.org/ivy/) that readers can use to explore the statistical properties of these and other incremental validity arguments. We conclude by reviewing SEM-based statistical approaches that appropriately control the Type I error rate when attempting to establish incremental validity.
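    A compact simulation in the spirit of the Monte Carlo argument (sample size, reliability and effect size are arbitrary choices, not the paper's settings): two unreliable measures of the same latent construct are entered together in a regression, and the second measure spuriously appears to carry incremental validity.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def one_sim(n=5000, reliability=0.7):
        # One latent construct T drives the outcome; x1 and x2 are two fallible measures of T.
        T = rng.standard_normal(n)
        y = 0.5 * T + rng.standard_normal(n)
        lam = np.sqrt(reliability)
        x1 = lam * T + np.sqrt(1 - reliability) * rng.standard_normal(n)
        x2 = lam * T + np.sqrt(1 - reliability) * rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
        t_stats = beta / se
        return abs(t_stats[2]) > 1.96        # x2 wrongly looks "incrementally valid"

    rate = np.mean([one_sim() for _ in range(500)])
    print(f"Construct-level false-positive rate: {rate:.2f}")
    ```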

  20. Large-volume injection of sample diluents not miscible with the mobile phase as an alternative approach in sample preparation for bioanalysis: an application for fenspiride bioequivalence.

    Science.gov (United States)

    Medvedovici, Andrei; Udrescu, Stefan; Albu, Florin; Tache, Florentin; David, Victor

    2011-09-01

    Liquid-liquid extraction of target compounds from biological matrices followed by the injection of a large volume from the organic layer into the chromatographic column operated under reversed-phase (RP) conditions would successfully combine the selectivity and the straightforward character of the procedure in order to enhance sensitivity, compared with the usual approach of involving solvent evaporation and residue re-dissolution. Large-volume injection of samples in diluents that are not miscible with the mobile phase was recently introduced in chromatographic practice. The risk of random errors produced during the manipulation of samples is also substantially reduced. A bioanalytical method designed for the bioequivalence of fenspiride containing pharmaceutical formulations was based on a sample preparation procedure involving extraction of the target analyte and the internal standard (trimetazidine) from alkalinized plasma samples in 1-octanol. A volume of 75 µl from the octanol layer was directly injected on a Zorbax SB C18 Rapid Resolution, 50 mm length × 4.6 mm internal diameter × 1.8 µm particle size column, with the RP separation being carried out under gradient elution conditions. Detection was made through positive ESI and MS/MS. Aspects related to method development and validation are discussed. The bioanalytical method was successfully applied to assess bioequivalence of a modified release pharmaceutical formulation containing 80 mg fenspiride hydrochloride during two different studies carried out as single-dose administration under fasting and fed conditions (four arms), and multiple doses administration, respectively. The quality attributes assigned to the bioanalytical method, as resulting from its application to the bioequivalence studies, are highlighted and fully demonstrate that sample preparation based on large-volume injection of immiscible diluents has an increased potential for application in bioanalysis.

  1. Psychometric Properties of the Penn State Worry Questionnaire for Children in a Large Clinical Sample

    Science.gov (United States)

    Pestle, Sarah L.; Chorpita, Bruce F.; Schiffman, Jason

    2008-01-01

    The Penn State Worry Questionnaire for Children (PSWQ-C; Chorpita, Tracey, Brown, Collica, & Barlow, 1997) is a 14-item self-report measure of worry in children and adolescents. Although the PSWQ-C has demonstrated favorable psychometric properties in small clinical and large community samples, this study represents the first psychometric…

  2. Statistical measures of galaxy clustering

    International Nuclear Information System (INIS)

    Porter, D.H.

    1988-01-01

    Consideration is given to the large-scale distribution of galaxies and ways in which this distribution may be statistically measured. Galaxy clustering is hierarchical in nature, so that the positions of clusters of galaxies are themselves spatially clustered. A simple identification of groups of galaxies would be an inadequate description of the true richness of galaxy clustering. Current observations of the large-scale structure of the universe and modern theories of cosmology may be studied with a statistical description of the spatial and velocity distributions of galaxies. 8 refs

  3. Analysis of reflection-peak wavelengths of sampled fiber Bragg gratings with large chirp.

    Science.gov (United States)

    Zou, Xihua; Pan, Wei; Luo, Bin

    2008-09-10

    The reflection-peak wavelengths (RPWs) in the spectra of sampled fiber Bragg gratings with large chirp (SFBGs-LC) are theoretically investigated. Such RPWs are divided into two parts, the RPWs of equivalent uniform SFBGs (U-SFBGs) and the wavelength shift caused by the large chirp in the grating period (CGP). We propose a quasi-equivalent transform to deal with the CGP. That is, the CGP is transferred into quasi-equivalent phase shifts to directly derive the Fourier transform of the refractive index modulation. Then, in the case of both the direct and the inverse Talbot effect, the wavelength shift is obtained from the Fourier transform. Finally, the RPWs of SFBGs-LC can be achieved by combining the wavelength shift and the RPWs of equivalent U-SFBGs. Several simulations are shown to numerically confirm these predicted RPWs of SFBGs-LC.

  4. Large-Scale Optimization for Bayesian Inference in Complex Systems

    Energy Technology Data Exchange (ETDEWEB)

    Willcox, Karen [MIT; Marzouk, Youssef [MIT

    2013-11-12

    The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focused on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. The project was a collaborative effort among MIT, the University of Texas at Austin, Georgia Institute of Technology, and Sandia National Laboratories. The research was directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. The MIT--Sandia component of the SAGUARO Project addressed the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas--Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to

  5. Statistical analysis of fuel failures in large break loss-of-coolant accident (LBLOCA) in EPR type nuclear power plant

    International Nuclear Information System (INIS)

    Arkoma, Asko; Hänninen, Markku; Rantamäki, Karin; Kurki, Joona; Hämäläinen, Anitta

    2015-01-01

    Highlights: • The number of failing fuel rods in a LB-LOCA in an EPR is evaluated. • 59 scenarios are simulated with the system code APROS. • 1000 rods per scenario are simulated with the fuel performance code FRAPTRAN-GENFLO. • All the rods in the reactor are simulated in the worst scenario. • Results suggest that the regulations set by the Finnish safety authority are met. - Abstract: In this paper, the number of failing fuel rods in a large break loss-of-coolant accident (LB-LOCA) in EPR-type nuclear power plant is evaluated using statistical methods. For this purpose, a statistical fuel failure analysis procedure has been developed. The developed method utilizes the results of nonparametric statistics, the Wilks’ formula in particular, and is based on the selection and variation of parameters that are important in accident conditions. The accident scenario is simulated with the coupled fuel performance – thermal hydraulics code FRAPTRAN-GENFLO using various parameter values and thermal hydraulic and power history boundary conditions between the simulations. The number of global scenarios is 59 (given by the Wilks’ formula), and 1000 rods are simulated in each scenario. The boundary conditions are obtained from a new statistical version of the system code APROS. As a result, in the worst global scenario, 1.2% of the simulated rods failed, and it can be concluded that the Finnish safety regulations are hereby met (max. 10% of the rods allowed to fail)
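    As a hedged aside (the abstract does not state which variant of the formula is applied), the quoted figure of 59 scenarios matches the first-order, one-sided 95%/95% Wilks criterion, i.e. the smallest N with 1 - 0.95^N >= 0.95:

    ```python
    from math import ceil, log

    beta = gamma = 0.95                      # coverage / confidence levels
    N = ceil(log(1 - beta) / log(gamma))
    print(N)                                 # 59: smallest N with 1 - 0.95**N >= 0.95
    ```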

  6. Statistical analysis of fuel failures in large break loss-of-coolant accident (LBLOCA) in EPR type nuclear power plant

    Energy Technology Data Exchange (ETDEWEB)

    Arkoma, Asko, E-mail: asko.arkoma@vtt.fi; Hänninen, Markku; Rantamäki, Karin; Kurki, Joona; Hämäläinen, Anitta

    2015-04-15

    Highlights: • The number of failing fuel rods in a LB-LOCA in an EPR is evaluated. • 59 scenarios are simulated with the system code APROS. • 1000 rods per scenario are simulated with the fuel performance code FRAPTRAN-GENFLO. • All the rods in the reactor are simulated in the worst scenario. • Results suggest that the regulations set by the Finnish safety authority are met. - Abstract: In this paper, the number of failing fuel rods in a large break loss-of-coolant accident (LB-LOCA) in EPR-type nuclear power plant is evaluated using statistical methods. For this purpose, a statistical fuel failure analysis procedure has been developed. The developed method utilizes the results of nonparametric statistics, the Wilks’ formula in particular, and is based on the selection and variation of parameters that are important in accident conditions. The accident scenario is simulated with the coupled fuel performance – thermal hydraulics code FRAPTRAN-GENFLO using various parameter values and thermal hydraulic and power history boundary conditions between the simulations. The number of global scenarios is 59 (given by the Wilks’ formula), and 1000 rods are simulated in each scenario. The boundary conditions are obtained from a new statistical version of the system code APROS. As a result, in the worst global scenario, 1.2% of the simulated rods failed, and it can be concluded that the Finnish safety regulations are hereby met (max. 10% of the rods allowed to fail)

  7. Subdomain sensitive statistical parsing using raw corpora

    NARCIS (Netherlands)

    Plank, B.; Sima'an, K.

    2008-01-01

    Modern statistical parsers are trained on large annotated corpora (treebanks). These treebanks usually consist of sentences addressing different subdomains (e.g. sports, politics, music), which implies that the statistics gathered by current statistical parsers are mixtures of subdomains of language

  8. Statistical mechanics in JINR

    International Nuclear Information System (INIS)

    Tonchev, N.; Shumovskij, A.S.

    1986-01-01

    The history of investigations conducted at the JINR in the field of statistical mechanics, beginning with the fundamental works by N.N. Bogolyubov on the microscopic theory of superconductivity, is presented. The ideas introduced in these works, and the methods developed in them, have largely determined the development of statistical mechanics at the JINR, and the Hartree-Fock-Bogolyubov variational principle has become an important method of modern nuclear theory. A brief review of the main achievements, connected with the development of statistical mechanics methods and their application in different fields of physical science, is given

  9. Superwind Outflows in Seyfert Galaxies? : Large-Scale Radio Maps of an Edge-On Sample

    Science.gov (United States)

    Colbert, E.; Gallimore, J.; Baum, S.; O'Dea, C.

    1995-03-01

    Large-scale galactic winds (superwinds) are commonly found flowing out of the nuclear region of ultraluminous infrared and powerful starburst galaxies. Stellar winds and supernovae from the nuclear starburst provide the energy to drive these superwinds. The outflowing gas escapes along the rotation axis, sweeping up and shock-heating clouds in the halo, which produces optical line emission, radio synchrotron emission, and X-rays. These features can most easily be studied in edge-on systems, so that the wind emission is not confused by that from the disk. We have begun a systematic search for superwind outflows in Seyfert galaxies. In an earlier optical emission-line survey, we found extended minor axis emission and/or double-peaked emission line profiles in >~30% of the sample objects. We present here large-scale (6cm VLA C-config) radio maps of 11 edge-on Seyfert galaxies, selected (without bias) from a distance-limited sample of 23 edge-on Seyferts. These data have been used to estimate the frequency of occurrence of superwinds. Preliminary results indicate that four (36%) of the 11 objects observed and six (26%) of the 23 objects in the distance-limited sample have extended radio emission oriented perpendicular to the galaxy disk. This emission may be produced by a galactic wind blowing out of the disk. Two (NGC 2992 and NGC 5506) of the nine objects for which we have both radio and optical data show good evidence for a galactic wind in both datasets. We suggest that galactic winds occur in >~30% of all Seyferts. A goal of this work is to find a diagnostic that can be used to distinguish between large-scale outflows that are driven by starbursts and those that are driven by an AGN. The presence of starburst-driven superwinds in Seyferts, if established, would have important implications for the connection between starburst galaxies and AGN.

  10. Radiological decontamination, survey, and statistical release method for vehicles

    International Nuclear Information System (INIS)

    Goodwill, M.E.; Lively, J.W.; Morris, R.L.

    1996-06-01

    Earth-moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been successfully used for 1 year at the former uranium millsite in Monticello, Utah (a cleanup site regulated under the Comprehensive Environmental Response, Compensation, and Liability Act). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on Monticello projects has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site

  11. The Kinematics of the Permitted C ii λ 6578 Line in a Large Sample of Planetary Nebulae

    Energy Technology Data Exchange (ETDEWEB)

    Richer, Michael G.; Suárez, Genaro; López, José Alberto; García Díaz, María Teresa, E-mail: richer@astrosen.unam.mx, E-mail: gsuarez@astro.unam.mx, E-mail: jal@astrosen.unam.mx, E-mail: tere@astro.unam.mx [Instituto de Astronomía, Universidad Nacional Autónoma de México, Ensenada, Baja California (Mexico)

    2017-03-01

    We present spectroscopic observations of the C ii λ 6578 permitted line for 83 lines of sight in 76 planetary nebulae at high spectral resolution, most of them obtained with the Manchester Echelle Spectrograph on the 2.1 m telescope at the Observatorio Astronómico Nacional on the Sierra San Pedro Mártir. We study the kinematics of the C ii λ 6578 permitted line with respect to other permitted and collisionally excited lines. Statistically, we find that the kinematics of the C ii λ 6578 line are not those expected if this line arises from the recombination of C{sup 2+} ions or the fluorescence of C{sup +} ions in ionization equilibrium in a chemically homogeneous nebular plasma, but instead its kinematics are those appropriate for a volume more internal than expected. The planetary nebulae in this sample have well-defined morphology and are restricted to a limited range in H α line widths (no large values) compared to their counterparts in the Milky Way bulge; both these features could be interpreted as the result of young nebular shells, an inference that is also supported by nebular modeling. Concerning the long-standing discrepancy between chemical abundances inferred from permitted and collisionally excited emission lines in photoionized nebulae, our results imply that multiple plasma components occur commonly in planetary nebulae.

  12. Applied statistical designs for the researcher

    CERN Document Server

    Paulson, Daryl S

    2003-01-01

    Research and Statistics; Basic Review of Parametric Statistics; Exploratory Data Analysis; Two Sample Tests; Completely Randomized One-Factor Analysis of Variance; One and Two Restrictions on Randomization; Completely Randomized Two-Factor Factorial Designs; Two-Factor Factorial Completely Randomized Blocked Designs; Useful Small Scale Pilot Designs; Nested Statistical Designs; Linear Regression; Nonparametric Statistics; Introduction to Research Synthesis and "Meta-Analysis" and Conclusory Remarks; References; Index.

  13. On the fairness of the main galaxy sample of SDSS

    International Nuclear Information System (INIS)

    Meng Kelai; Pan Jun; Feng Longlong; Ma Bin

    2011-01-01

    Flux-limited and volume-limited galaxy samples are constructed from the Sloan Digital Sky Survey (SDSS) data releases DR4, DR6 and DR7 for statistical analysis. The two-point correlation function ξ(s), the monopole of the three-point correlation function ζ_0, the projected two-point correlation function w_p and the pairwise velocity dispersion σ_12 are measured to test whether the galaxy samples are fair for these statistics. We find that, with the increasing sky coverage of subsequent SDSS data releases, ξ(s) of the flux-limited sample is extremely robust and insensitive to local structures at low redshift. However, for volume-limited samples fainter than L* at large scales s ≳ 10 h^-1 Mpc, the deviation of ξ(s) between different SDSS data releases (DR7, DR6 and DR4) increases with absolute magnitude. The case of ζ_0(s) is similar to that of ξ(s). In the weakly nonlinear regime, there is no agreement between ζ_0 of different data releases in any luminosity bin. Furthermore, w_p of volume-limited samples of DR7 in luminosity bins fainter than -M_r,0.1 = [18.5, 19.5] is significantly larger, and σ_12 of the two faintest volume-limited samples of DR7 displays a very different scale dependence than results from DR4 and DR6. Our findings call for caution in interpreting clustering analysis results of SDSS faint galaxy samples and higher order statistics of SDSS volume-limited samples in the weakly nonlinear regime. The first zero-crossing points of ξ(s) from volume-limited samples are also investigated and discussed. (research papers)

  14. Constrained statistical inference : sample-size tables for ANOVA and regression

    NARCIS (Netherlands)

    Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves

    2015-01-01

    Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and

  15. Assessment of statistical methods used in library-based approaches to microbial source tracking.

    Science.gov (United States)

    Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D

    2003-12-01

    Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
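    To make one of the compared strategies concrete, a hedged sketch of nearest-neighbour source assignment with a similarity threshold for excluding poorly matched isolates (the similarity measure and cutoff are illustrative, not those used in the study):

    ```python
    import numpy as np

    def classify_isolate(fingerprint, library, labels, min_similarity=0.8):
        # Nearest-neighbour source assignment with a threshold criterion: the unknown
        # isolate is matched to the most similar library profile, or left unclassified
        # if the best similarity falls below the cutoff (reducing false positives).
        sims = np.array([np.corrcoef(fingerprint, ref)[0, 1] for ref in library])
        best = int(np.argmax(sims))
        if sims[best] < min_similarity:
            return "unclassified"
        return labels[best]
    ```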

  16. Statistical aspects of the program of the Atomic Bomb Casualty Commission

    Energy Technology Data Exchange (ETDEWEB)

    Beebe, G W

    1961-02-24

    The Atomic Bomb Casualty Commission (ABCC) is a medical research institute in Hiroshima and Nagasaki devoted to long term study of the late effects of nuclear radiation upon man. The work draws its great interest from the paucity of existing information on the effect of radiation on man; from the unique radiation experience of the atomic bomb survivors; from the increasing utilization of nuclear energy in modern technology; and from humanitarian concern for the survivors of the bombs. The ABCC program provides the statistician with an important opportunity to apply the tools and concepts of statistics, for the inferences to be drawn are largely statistical inferences growing out of the comparison of samples defined as to radiation exposure. The work is of international as well as statistical interest by virtue of its subject matter and as a meeting-ground for statisticians trained in different countries.

  17. The quantitative LOD score: test statistic and sample size for exclusion and linkage of quantitative traits in human sibships.

    Science.gov (United States)

    Page, G P; Amos, C I; Boerwinkle, E

    1998-04-01

    We present a test statistic, the quantitative LOD (QLOD) score, for the testing of both linkage and exclusion of quantitative-trait loci in randomly selected human sibships. As with the traditional LOD score, the boundary values of 3, for linkage, and -2, for exclusion, can be used for the QLOD score. We investigated the sample sizes required for inferring exclusion and linkage, for various combinations of linked genetic variance, total heritability, recombination distance, and sibship size, using fixed-size sampling. The sample sizes required for both linkage and exclusion were not qualitatively different and depended on the percentage of variance being linked or excluded and on the total genetic variance. Information regarding linkage and exclusion in sibships larger than size 2 increased approximately as the number of possible pairs, n(n-1)/2, up to sibships of size 6. Increasing the recombination distance (θ) between the marker and the trait loci empirically reduced the power for both linkage and exclusion, approximately as a function of (1 - 2θ)^4.

  18. Application of k0-based internal monostandard NAA for large sample analysis of clay pottery. As a part of inter comparison exercise

    International Nuclear Information System (INIS)

    Acharya, R.; Dasari, K.B.; Pujari, P.K.; Swain, K.K.; Shinde, A.D.; Reddy, A.V.R.

    2014-01-01

    As a part of inter comparison exercise of an IAEA Coordinated Research Project on large sample neutron activation analysis, a large size and non standard geometry size pottery replica (obtained from Peru) was analyzed by k0-based internal monostandard neutron activation analysis (IM-NAA). Two large size sub samples (0.40 and 0.25 kg) were irradiated at graphite reflector position of AHWR Critical Facility in BARC, Trombay, Mumbai, India. Small samples (100-200 mg) were also analyzed by IM-NAA for comparison purpose. Radioactive assay was carried out using a 40 % relative efficiency HPGe detector. To examine homogeneity of the sample, counting was also carried out using X-Z rotary scanning unit. In situ relative detection efficiency was evaluated using gamma rays of the activation products in the irradiated sample in the energy range of 122-2,754 keV. Elemental concentration ratios with respect to Na of small size (100 mg mass) as well as large size (15 and 400 g) samples were used to check the homogeneity of the samples. Concentration ratios of 18 elements such as K, Sc, Cr, Mn, Fe, Co, Zn, As, Rb, Cs, La, Ce, Sm, Eu, Yb, Lu, Hf and Th with respect to Na (internal monostandard) were calculated using IM-NAA. Absolute concentrations were arrived at for both large and small samples using Na concentration, obtained from relative method of NAA. The percentage combined uncertainties at ±1 s confidence limit on the determined values were in the range of 3-9 %. Two IAEA reference materials SL-1 and SL-3 were analyzed by IM-NAA to evaluate accuracy of the method. (author)

  19. Frontiers in statistical quality control

    CERN Document Server

    Wilrich, Peter-Theodor

    2004-01-01

    This volume treats the four main categories of Statistical Quality Control: General SQC Methodology, On-line Control including Sampling Inspection and Statistical Process Control, Off-line Control with Data Analysis and Experimental Design, and fields related to Reliability. Experts with international reputations present their newest contributions.

  20. A statistical evaluation of asbestos air concentrations

    Energy Technology Data Exchange (ETDEWEB)

    Lange, J.H. [Envirosafe Training and Consultants, Pittsburgh, PA (United States)

    1999-07-01

    Both area and personal air samples collected during an asbestos abatement project were matched and statistically analysed. Among the many parameters studied were fibre concentrations and their variability. Mean values for area and personal samples were 0.005 and 0.024 f cm{sup -3} of air, respectively. Summary values for area and personal samples suggest that exposures are low with no single exposure value exceeding the current OSHA TWA value of 0.1 f cm{sup -3} of air. Within- and between-worker analysis suggests that these data are homogeneous. Comparison of within- and between-worker values suggests that the exposure source and variability for abatement are more related to the process than individual practices. This supports the importance of control measures for abatement. Study results also suggest that area and personal samples are not statistically related, that is, there is no association observed for these two sampling methods when data are analysed by correlation or regression analysis. Personal samples were statistically higher in concentration than area samples. Area sampling cannot be used as a surrogate exposure for asbestos abatement workers. (author)

  1. Inverse statistical physics of protein sequences: a key issues review.

    Science.gov (United States)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  2. Fundamentals of statistics

    CERN Document Server

    Mulholland, Henry

    1968-01-01

    Fundamentals of Statistics covers topics on the introduction, fundamentals, and science of statistics. The book discusses the collection, organization and representation of numerical data; elementary probability; the binomial and Poisson distributions; and the measures of central tendency. The text describes measures of dispersion for measuring the spread of a distribution; continuous distributions for measuring on a continuous scale; the properties and use of the normal distribution; and tests involving the normal or Student's t distributions. The use of control charts for sample means; the ranges

  3. How Sample Size Affects a Sampling Distribution

    Science.gov (United States)

    Mulekar, Madhuri S.; Siegel, Murray H.

    2009-01-01

    If students are to understand inferential statistics successfully, they must have a profound understanding of the nature of the sampling distribution. Specifically, they must comprehend the determination of the expected value and standard error of a sampling distribution as well as the meaning of the central limit theorem. Many students in a high…

  4. Lensing corrections to the E {sub g} ( z ) statistics from large scale structure

    Energy Technology Data Exchange (ETDEWEB)

    Dizgah, Azadeh Moradinezhad; Durrer, Ruth, E-mail: Azadeh.Moradinezhad@unige.ch, E-mail: Ruth.Durrer@unige.ch [Department of Theoretical Physics and Center for Astroparticle Physics, University of Geneva, 24 quai E. Ansermet, CH-1211 Geneva 4 (Switzerland)

    2016-09-01

    We study the impact of the often neglected lensing contribution to galaxy number counts on the E{sub g} statistics, which is used to constrain deviations from GR. This contribution affects both the galaxy-galaxy and the convergence-galaxy spectra, while it is larger for the latter. At higher redshifts probed by upcoming surveys, for instance at z = 1.5, neglecting this term induces an error of (25–40)% in the spectra and therefore on the E{sub g} statistics which is constructed from the combination of the two. Moreover, including it renders the E{sub g} statistics scale- and bias-dependent and hence puts into question its very objective.

  5. Large sample NAA of a pottery replica utilizing thermal neutron flux at AHWR critical facility and X-Z rotary scanning unit

    International Nuclear Information System (INIS)

    Acharya, R.; Dasari, K.B.; Pujari, P.K.; Swain, K.K.; Shinde, A.D.; Reddy, A.V.R.

    2013-01-01

    Large sample neutron activation analysis (LSNAA) of a clay pottery replica from Peru was carried out using the low neutron flux graphite reflector position of the Advanced Heavy Water Reactor (AHWR) critical facility. This work was taken up as a part of an inter-comparison exercise under the IAEA CRP on LSNAA of archaeological objects. The irradiated large size sample, placed on an X-Z rotary scanning unit, was assayed using a 40% relative efficiency HPGe detector. The k0-based internal monostandard NAA (IM-NAA) in conjunction with in situ relative detection efficiency was used to calculate concentration ratios of 12 elements with respect to Na. Analyses of both small and large size samples were carried out to check homogeneity and to arrive at absolute concentrations. (author)

  6. Direct Learning of Systematics-Aware Summary Statistics

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Complex machine learning tools, such as deep neural networks and gradient boosting algorithms, are increasingly being used to construct powerful discriminative features for High Energy Physics analyses. These methods are typically trained with simulated or auxiliary data samples by optimising some classification or regression surrogate objective. The learned feature representations are then used to build a sample-based statistical model to perform inference (e.g. interval estimation or hypothesis testing) over a set of parameters of interest. However, the effectiveness of the mentioned approach can be reduced by the presence of known uncertainties that cause differences between training and experimental data, included in the statistical model via nuisance parameters. This work presents an end-to-end algorithm, which leverages on existing deep learning technologies but directly aims to produce inference-optimal sample-summary statistics. By including the statistical model and a differentiable approximation of ...

  7. On the Sampling

    OpenAIRE

    Güleda Doğan

    2017-01-01

    This editorial is on statistical sampling, which is one of the two most important reasons for editorial rejection from our journal, Turkish Librarianship. The stages of quantitative research, the stage at which sampling takes place, the importance of sampling for a study, deciding on sample size, and sampling methods are summarised briefly.

  8. Statistical aspects of determinantal point processes

    DEFF Research Database (Denmark)

    Lavancier, Frédéric; Møller, Jesper; Rubak, Ege

    The statistical aspects of determinantal point processes (DPPs) seem largely unexplored. We review the appealing properties of DPPs, demonstrate that they are useful models for repulsiveness, detail a simulation procedure, and provide freely available software for simulation and statistical inference.

  9. Evaluation of single and two-stage adaptive sampling designs for estimation of density and abundance of freshwater mussels in a large river

    Science.gov (United States)

    Smith, D.R.; Rogala, J.T.; Gray, B.R.; Zigler, S.J.; Newton, T.J.

    2011-01-01

    Reliable estimates of abundance are needed to assess consequences of proposed habitat restoration and enhancement projects on freshwater mussels in the Upper Mississippi River (UMR). Although there is general guidance on sampling techniques for population assessment of freshwater mussels, the actual performance of sampling designs can depend critically on the population density and spatial distribution at the project site. To evaluate various sampling designs, we simulated sampling of populations, which varied in density and degree of spatial clustering. Because of logistics and costs of large river sampling and spatial clustering of freshwater mussels, we focused on adaptive and non-adaptive versions of single and two-stage sampling. The candidate designs performed similarly in terms of precision (CV) and probability of species detection for fixed sample size. Both CV and species detection were determined largely by density, spatial distribution and sample size. However, designs did differ in the rate that occupied quadrats were encountered. Occupied units had a higher probability of selection using adaptive designs than conventional designs. We used two measures of cost: sample size (i.e. number of quadrats) and distance travelled between the quadrats. Adaptive and two-stage designs tended to reduce distance between sampling units, and thus performed better when distance travelled was considered. Based on the comparisons, we provide general recommendations on the sampling designs for the freshwater mussels in the UMR, and presumably other large rivers.

  10. Samples in applied psychology: over a decade of research in review.

    Science.gov (United States)

    Shen, Winny; Kiger, Thomas B; Davies, Stacy E; Rasch, Rena L; Simon, Kara M; Ones, Deniz S

    2011-09-01

    This study examines sample characteristics of articles published in Journal of Applied Psychology (JAP) from 1995 to 2008. At the individual level, the overall median sample size over the period examined was approximately 173, which is generally adequate for detecting the average magnitude of effects of primary interest to researchers who publish in JAP. Samples using higher units of analyses (e.g., teams, departments/work units, and organizations) had lower median sample sizes (Mdn ≈ 65), yet were arguably robust given typical multilevel design choices of JAP authors despite the practical constraints of collecting data at higher units of analysis. A substantial proportion of studies used student samples (~40%); surprisingly, median sample sizes for student samples were smaller than working adult samples. Samples were more commonly occupationally homogeneous (~70%) than occupationally heterogeneous. U.S. and English-speaking participants made up the vast majority of samples, whereas Middle Eastern, African, and Latin American samples were largely unrepresented. On the basis of study results, recommendations are provided for authors, editors, and readers, which converge on 3 themes: (a) appropriateness and match between sample characteristics and research questions, (b) careful consideration of statistical power, and (c) the increased popularity of quantitative synthesis. Implications are discussed in terms of theory building, generalizability of research findings, and statistical power to detect effects. PsycINFO Database Record (c) 2011 APA, all rights reserved

  11. Preferential sampling in veterinary parasitological surveillance

    Directory of Open Access Journals (Sweden)

    Lorenzo Cecconi

    2016-04-01

    Full Text Available In parasitological surveillance of livestock, prevalence surveys are conducted on a sample of farms using several sampling designs. For example, opportunistic surveys or informative sampling designs are very common. Preferential sampling refers to any situation in which the spatial process and the sampling locations are not independent. Most examples of preferential sampling in the spatial statistics literature are in environmental statistics with a focus on pollutant monitors, and it has been shown that, if preferential sampling is present and is not accounted for in the statistical modelling and data analysis, statistical inference can be misleading. In this paper, working in the context of veterinary parasitology, we propose and use geostatistical models to predict the continuous and spatially-varying risk of a parasite infection. Specifically, breaking with the common practice in veterinary parasitological surveillance to ignore preferential sampling even though informative or opportunistic samples are very common, we specify a two-stage hierarchical Bayesian model that adjusts for preferential sampling and we apply it to data on Fasciola hepatica infection in sheep farms in Campania region (Southern Italy) in the years 2013-2014.

  12. Catch statistics for belugas in West Greenland 1862 to 1999

    Directory of Open Access Journals (Sweden)

    MP Heide-Jørgensen

    2002-07-01

    Full Text Available Information and statistics, including trade statistics, on catches of white whales or belugas (Delphinapterus leucas) in West Greenland since 1862 are presented. The period before 1952 was dominated by large catches south of 66° N that peaked with 1,380 reported kills in 1922. Catch levels in the past five decades are evaluated on the basis of official catch statistics, trade in mattak (whale skin), sampling of jaws and reports from local residents and other observers. Options are given for corrections of catch statistics based upon auxiliary statistics on trade of mattak, catches in previous decades for areas without reporting and on likely levels of loss rates in different hunting operations. The fractions of the reported catches that are caused by ice entrapments of whales are estimated. During 1954-1999 total reported catches ranged from 216 to 1,874 and they peaked around 1970. Correcting for underreporting and killed-but-lost whales increases the catch reports by 42% on average for 1954-1998. If the whales killed in ice entrapments are removed then the corrected catch estimate is on average 28% larger than the reported catches.

  13. On incomplete sampling under birth-death models and connections to the sampling-based coalescent.

    Science.gov (United States)

    Stadler, Tanja

    2009-11-07

    The constant rate birth-death process is used as a stochastic model for many biological systems, for example phylogenies or disease transmission. As the biological data are usually not fully available, it is crucial to understand the effect of incomplete sampling. In this paper, we analyze the constant rate birth-death process with incomplete sampling. We derive the density of the bifurcation events for trees on n leaves which evolved under this birth-death-sampling process. This density is used for calculating prior distributions in Bayesian inference programs and for efficiently simulating trees. We show that the birth-death-sampling process can be interpreted as a birth-death process with reduced rates and complete sampling. This shows that joint inference of birth rate, death rate and sampling probability is not possible. The birth-death-sampling process is compared to the sampling-based population genetics model, the coalescent. It is shown that despite many similarities between these two models, the distribution of bifurcation times remains different even in the case of very large population sizes. We illustrate these findings on a Hepatitis C virus dataset from Egypt. We show that the transmission time estimates are significantly different: the widely used Gamma statistic even changes its sign from negative to positive when switching from the coalescent to the birth-death process.

  14. "Best Practices in Using Large, Complex Samples: The Importance of Using Appropriate Weights and Design Effect Compensation"

    Directory of Open Access Journals (Sweden)

    Jason W. Osborne

    2011-09-01

    Full Text Available Large surveys often use probability sampling in order to obtain representative samples, and these data sets are valuable tools for researchers in all areas of science. Yet many researchers are not formally prepared to appropriately utilize these resources. Indeed, users of one popular dataset were generally found not to have modeled the analyses to take account of the complex sample (Johnson & Elliott, 1998), even when publishing in highly-regarded journals. It is well known that failure to appropriately model the complex sample can substantially bias the results of the analysis. Examples presented in this paper highlight the risk of error of inference and mis-estimation of parameters from failure to analyze these data sets appropriately.

  15. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic

    Science.gov (United States)

    Qi, Jin-Peng; Qi, Jie; Zhang, Qing

    2016-01-01

    Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364
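    For reference, the conventional exhaustive KS scan that BSTKS is designed to accelerate can be sketched as follows (the minimum segment length is an arbitrary choice; this is the slow baseline, not BSTKS itself):

    ```python
    import numpy as np
    from scipy.stats import ks_2samp

    def ks_change_point(x, min_seg=30):
        # Exhaustive scan: at every admissible split point, compare the left and right
        # segments with a two-sample KS test and keep the split with the largest statistic.
        best_k, best_stat = None, -np.inf
        for k in range(min_seg, len(x) - min_seg):
            stat, _ = ks_2samp(x[:k], x[k:])
            if stat > best_stat:
                best_k, best_stat = k, stat
        return best_k, best_stat
    ```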

  16. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic.

    Science.gov (United States)

    Qi, Jin-Peng; Qi, Jie; Zhang, Qing

    2016-01-01

    Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals.

  17. Large scale sample management and data analysis via MIRACLE

    DEFF Research Database (Denmark)

    Block, Ines; List, Markus; Pedersen, Marlene Lemvig

    Reverse-phase protein arrays (RPPAs) allow sensitive quantification of relative protein abundance in thousands of samples in parallel. In the past years the technology advanced based on improved methods and protocols concerning sample preparation and printing, antibody selection, optimization of staining conditions and mode of signal analysis. However, the sample management and data analysis still pose challenges because of the high number of samples, sample dilutions, customized array patterns, and various programs necessary for array construction and data processing. We developed a comprehensive and user-friendly web application called MIRACLE (MIcroarray R-based Analysis of Complex Lysate Experiments), which bridges the gap between sample management and array analysis by conveniently keeping track of the sample information from lysate preparation, through array construction and signal...

  18. Statistical significance of epidemiological data. Seminar: Evaluation of epidemiological studies

    International Nuclear Information System (INIS)

    Weber, K.H.

    1993-01-01

    In stochastic damages, the numbers of events, e.g. the persons who are affected by or have died of cancer, and thus the relative frequencies (incidence or mortality) are binomially distributed random variables. Their statistical fluctuations can be characterized by confidence intervals. For epidemiologic questions, especially for the analysis of stochastic damages in the low dose range, the following issues are interesting: - Is a sample (a group of persons) with a definite observed damage frequency part of the whole population? - Is an observed frequency difference between two groups of persons random or statistically significant? - Is an observed increase or decrease of the frequencies with increasing dose random or statistically significant and how large is the regression coefficient (= risk coefficient) in this case? These problems can be solved by statistical tests. So-called distribution-free tests and tests which are not bound to the supposition of normal distribution are of particular interest, such as: - χ²-independence test (test in contingency tables); - Fisher-Yates test; - trend test according to Cochran; - rank correlation test given by Spearman. These tests are explained in terms of selected epidemiologic data, e.g. of leukaemia clusters, of the cancer mortality of the Japanese A-bomb survivors especially in the low dose range as well as on the sample of the cancer mortality in the high background area in Yangjiang (China). (orig.) [de
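    The named tests are all available off the shelf; a small illustration with invented counts (not data from the studies cited above):

    ```python
    import numpy as np
    from scipy.stats import chi2_contingency, fisher_exact, spearmanr

    # Toy 2x2 table: cancer cases vs. non-cases in an exposed and an unexposed group
    table = np.array([[12, 488],     # exposed:   cases, non-cases
                      [ 5, 495]])    # unexposed: cases, non-cases
    chi2, p_chi2, dof, _ = chi2_contingency(table)   # chi-square independence test
    odds, p_fisher = fisher_exact(table)             # Fisher's exact test

    # Spearman rank correlation as a simple trend check across dose groups
    dose   = [0, 1, 2, 3, 4]
    deaths = [3, 4, 4, 6, 9]
    rho, p_trend = spearmanr(dose, deaths)
    ```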

  19. Examining gray matter structure associated with academic performance in a large sample of Chinese high school students

    OpenAIRE

    Song Wang; Ming Zhou; Taolin Chen; Xun Yang; Guangxiang Chen; Meiyun Wang; Qiyong Gong

    2017-01-01

    Achievement in school is crucial for students to be able to pursue successful careers and lead happy lives in the future. Although many psychological attributes have been found to be associated with academic performance, the neural substrates of academic performance remain largely unknown. Here, we investigated the relationship between brain structure and academic performance in a large sample of high school students via structural magnetic resonance imaging (S-MRI) using voxel-based morphome...

  20. Statistical power and the Rorschach: 1975-1991.

    Science.gov (United States)

    Acklin, M W; McDowell, C J; Orndoff, S

    1992-10-01

    The Rorschach Inkblot Test has been the source of long-standing controversies as to its nature and its psychometric properties. Consistent with behavioral science research in general, the concept of statistical power has been entirely ignored by Rorschach researchers. The concept of power is introduced and discussed, and a power survey of the Rorschach literature published between 1975 and 1991 in the Journal of Personality Assessment, Journal of Consulting and Clinical Psychology, Journal of Abnormal Psychology, Journal of Clinical Psychology, Journal of Personality, Psychological Bulletin, American Journal of Psychiatry, and Journal of Personality and Social Psychology was undertaken. Power was calculated for 2,300 statistical tests in 158 journal articles. Power to detect small, medium, and large effect sizes was .13, .56, and .85, respectively. Similar to the findings in other power surveys conducted on behavioral science research, we concluded that Rorschach research is underpowered to detect the differences under investigation. This undoubtedly contributes to the inconsistency of research findings, which has been a source of controversy and criticism over the decades. It appears that research conducted according to the Comprehensive System for the Rorschach is more powerful. Recommendations are offered for improving power and strengthening the design sensitivity of Rorschach research, including increasing sample sizes, use of parametric statistics, reduction of error variance, more accurate reporting of findings, and editorial policies reflecting concern about the magnitude of relationships beyond an exclusive focus on levels of statistical significance.
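
    As context for the power figures quoted above, the short calculation below shows how such values are typically obtained: the power of a two-sample t test (two-sided, alpha = .05) for Cohen's small, medium, and large effect sizes at an assumed 30 participants per group. The per-group n is an illustrative assumption, not a figure from the survey.

```python
# Power of a two-sample t test for small, medium, and large effects
# (illustrative n of 30 per group; not the survey's data).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    power = analysis.power(effect_size=d, nobs1=30, alpha=0.05, ratio=1.0)
    print(f"{label} effect (d = {d}): power = {power:.2f}")
```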

  1. Statistics II essentials

    CERN Document Server

    Milewski, Emil G

    2012-01-01

    REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As the name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams or doing homework, and they remain a lasting reference source for students, teachers, and professionals. Statistics II discusses sampling theory, statistical inference, independent and dependent variables, correlation theory, experimental design, count data, the chi-square test, and time se

  2. Statistical quality management using miniTAB 14

    International Nuclear Information System (INIS)

    An, Seong Jin

    2007-01-01

    This book explains statistical quality management, covering the definition of quality, quality management, quality costs, basic methods of quality management, principles of control charts, control charts for variables, control charts for attributes, capability analysis, other issues of statistical process control, acceptance sampling, sampling for variables acceptance, design and analysis of experiments, Taguchi quality engineering, response surface methodology, and reliability analysis.

  3. Small sample approach, and statistical and epidemiological aspects

    NARCIS (Netherlands)

    Offringa, Martin; van der Lee, Hanneke

    2011-01-01

    In this chapter, the design of pharmacokinetic studies and phase III trials in children is discussed. Classical approaches and relatively novel approaches, which may be more useful in the context of drug research in children, are discussed. The burden of repeated blood sampling in pediatric

  4. Simulating the complex output of rainfall and hydrological processes using the information contained in large data sets: the Direct Sampling approach.

    Science.gov (United States)

    Oriani, Fabio

    2017-04-01

    The unpredictable nature of rainfall makes its estimation as difficult as it is essential to hydrological applications. Stochastic simulation is often considered a convenient approach to assess the uncertainty of rainfall processes, but preserving their irregular behavior and variability at multiple scales is a challenge even for the most advanced techniques. In this presentation, an overview of the Direct Sampling technique [1] and its recent application to rainfall and hydrological data simulation [2, 3] is given. The algorithm, having its roots in multiple-point statistics, makes use of a training data set to simulate the outcome of a process without inferring any explicit probability measure: the data are simulated in time or space by sampling the training data set where a sufficiently similar group of neighbor data exists. This approach allows preserving complex statistical dependencies at different scales with a good approximation, while reducing the parameterization to the minimum. The strengths and weaknesses of the Direct Sampling approach are shown through a series of applications to rainfall and hydrological data: from time-series simulation to spatial rainfall fields conditioned by elevation or a climate scenario. In the era of vast databases, is this data-driven approach a valid alternative to parametric simulation techniques? [1] Mariethoz G., Renard P., and Straubhaar J. (2010), The Direct Sampling method to perform multiple-point geostatistical simulations, Water Resour. Res., 46(11), http://dx.doi.org/10.1029/2008WR007621 [2] Oriani F., Straubhaar J., Renard P., and Mariethoz G. (2014), Simulation of rainfall time series from different climatic regions using the direct sampling technique, Hydrol. Earth Syst. Sci., 18, 3015-3031, http://dx.doi.org/10.5194/hess-18-3015-2014 [3] Oriani F., Borghi A., Straubhaar J., Mariethoz G., Renard P. (2016), Missing data simulation inside flow rate time-series using multiple-point statistics, Environ. Model
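
    To make the sampling idea above concrete, the sketch below is a heavily simplified one-dimensional caricature of Direct Sampling: each new value is copied from the first position in the training series whose recent history is sufficiently similar to the already-simulated values. The neighborhood size, distance threshold, scan limit, and the synthetic gamma-distributed "rainfall" record are all illustrative assumptions, not the published algorithm or its parameters.

```python
# Simplified 1-D Direct Sampling sketch: copy values from a training series
# wherever the recent simulated history matches a training pattern.
import numpy as np

def direct_sampling_1d(train, n_sim, n_neigh=5, threshold=0.05, max_scan=2000, seed=0):
    rng = np.random.default_rng(seed)
    scale = max(float(train.max() - train.min()), 1e-12)   # normalize pattern distances
    sim = list(train[:n_neigh])                            # seed with a training snippet
    for _ in range(n_sim):
        target = np.asarray(sim[-n_neigh:])
        best_val, best_dist = train[n_neigh], np.inf
        for start in rng.integers(0, len(train) - n_neigh, size=max_scan):
            cand = train[start:start + n_neigh]
            dist = np.mean(np.abs(cand - target)) / scale
            if dist < best_dist:
                best_val, best_dist = train[start + n_neigh], dist
            if dist <= threshold:                          # accept first sufficiently similar pattern
                break
        sim.append(best_val)
    return np.asarray(sim[n_neigh:])

# synthetic, heavy-tailed stand-in for a daily rainfall record
train = np.random.default_rng(1).gamma(shape=0.4, scale=8.0, size=5000)
print(direct_sampling_1d(train, n_sim=365)[:10])
```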

  5. Statistical Computing

    Indian Academy of Sciences (India)

    inference and finite population sampling. Sudhakar Kunte. Elements of statistical computing are discussed in this series. ... which captain gets an option to decide whether to field first or bat first ... may of course not be fair, in the sense that the team which wins ... describe two methods of drawing a random number between 0.

  6. Sampling

    CERN Document Server

    Thompson, Steven K

    2012-01-01

    Praise for the Second Edition "This book has never had a competitor. It is the only book that takes a broad approach to sampling . . . any good personal statistics library should include a copy of this book." —Technometrics "Well-written . . . an excellent book on an important subject. Highly recommended." —Choice "An ideal reference for scientific researchers and other professionals who use sampling." —Zentralblatt Math Features new developments in the field combined with all aspects of obtaining, interpreting, and using sample data Sampling provides an up-to-date treat

  7. Deterministic methods for sensitivity and uncertainty analysis in large-scale computer models

    International Nuclear Information System (INIS)

    Worley, B.A.; Oblow, E.M.; Pin, F.G.; Maerker, R.E.; Horwedel, J.E.; Wright, R.Q.; Lucius, J.L.

    1987-01-01

    The fields of sensitivity and uncertainty analysis are dominated by statistical techniques when large-scale modeling codes are being analyzed. This paper reports on the development and availability of two systems, GRESS and ADGEN, that make use of computer calculus compilers to automate the implementation of deterministic sensitivity analysis capability into existing computer models. This automation removes the traditional limitation of deterministic sensitivity methods. The paper describes a deterministic uncertainty analysis method (DUA) that uses derivative information as a basis to propagate parameter probability distributions to obtain result probability distributions. The paper demonstrates the deterministic approach to sensitivity and uncertainty analysis as applied to a sample problem that models the flow of water through a borehole. The sample problem is used as a basis to compare the cumulative distribution function of the flow rate as calculated by the standard statistical methods and the DUA method. The DUA method gives a more accurate result based upon only two model executions compared to fifty executions in the statistical case
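
    The abstract contrasts derivative-based uncertainty propagation with brute-force statistical sampling; the sketch below illustrates that contrast with a first-order (delta-method) propagation through a deliberately simplified Darcy-type flow expression, checked against Monte Carlo. The flow function, parameter means, and standard deviations are invented stand-ins, not the paper's borehole model or the GRESS/ADGEN machinery.

```python
# First-order (derivative-based) uncertainty propagation vs. Monte Carlo,
# using a simplified Darcy-type flow relation as a stand-in model.
import numpy as np

def flow(p):
    K, grad, area = p            # hydraulic conductivity, head gradient, cross-sectional area
    return K * grad * area       # simplified Darcy flux (illustrative only)

mean = np.array([1e-4, 0.02, 3.0])      # hypothetical parameter means
std = np.array([2e-5, 0.004, 0.1])      # hypothetical parameter standard deviations

# derivative-based propagation (finite differences in place of generated derivative code)
eps = 1e-6 * mean
grads = np.array([(flow(mean + np.eye(3)[i] * eps[i]) - flow(mean)) / eps[i] for i in range(3)])
std_dua = np.sqrt(np.sum((grads * std) ** 2))

# Monte Carlo reference: many model runs instead of a handful
rng = np.random.default_rng(0)
samples = rng.normal(mean, std, size=(100_000, 3))
std_mc = np.std(flow(samples.T))

print(f"derivative-based std: {std_dua:.3e}, Monte Carlo std: {std_mc:.3e}")
```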

  8. Statistical Thermodynamic Approach to Vibrational Solitary Waves in Acetanilide

    Science.gov (United States)

    Vasconcellos, Áurea R.; Mesquita, Marcus V.; Luzzi, Roberto

    1998-03-01

    We analyze the behavior of the macroscopic thermodynamic state of polymers, centering on acetanilide. The nonlinear equations of evolution for the populations and the statistically averaged field amplitudes of CO-stretching modes are derived. The existence of excitations of the solitary wave type is evidenced. The infrared spectrum is calculated and compared with the experimental data of Careri et al. [Phys. Rev. Lett. 51, 104 (1983)], resulting in a good agreement. We also consider the situation of a nonthermally highly excited sample, predicting the occurrence of a large increase in the lifetime of the solitary wave excitation.

  9. Statistical Power in Meta-Analysis

    Science.gov (United States)

    Liu, Jin

    2015-01-01

    Statistical power is important in a meta-analysis study, although few studies have examined the performance of simulated power in meta-analysis. The purpose of this study is to inform researchers about statistical power estimation on two sample mean difference test under different situations: (1) the discrepancy between the analytical power and…

  10. Frontiers in statistical quality control

    CERN Document Server

    Wilrich, Peter-Theodor

    2001-01-01

    The book is a collection of papers presented at the 5th International Workshop on Intelligent Statistical Quality Control in Würzburg, Germany. Contributions deal with methodology and successful industrial applications. They can be grouped into four categories: Sampling Inspection, Statistical Process Control, Data Analysis and Process Capability Studies, and Experimental Design.

  11. Crowdsourcing for large-scale mosquito (Diptera: Culicidae) sampling

    Science.gov (United States)

    Sampling a cosmopolitan mosquito (Diptera: Culicidae) species throughout its range is logistically challenging and extremely resource intensive. Mosquito control programmes and regional networks operate at the local level and often conduct sampling activities across much of North America. A method f...

  12. Using Pre-Statistical Analysis to Streamline Monitoring Assessments

    International Nuclear Information System (INIS)

    Reed, J.K.

    1999-01-01

    A variety of statistical methods exist to aid evaluation of groundwater quality and subsequent decision making in regulatory programs. These methods are applied because of the large temporal and spatial extrapolations commonly applied to these data. In short, statistical conclusions often serve as a surrogate for knowledge. However, facilities with mature monitoring programs that have generated abundant data have inherently less uncertainty because of the sheer quantity of analytical results. In these cases, statistical tests can be less important, and 'expert' data analysis should assume an important screening role. The WSRC Environmental Protection Department, working with the General Separations Area BSRI Environmental Restoration project team, has developed a method for an Integrated Hydrogeological Analysis (IHA) of historical water quality data from the F and H Seepage Basins groundwater remediation project. The IHA combines common-sense analytical techniques and a GIS presentation that force direct, interactive evaluation of the data. The IHA can perform multiple data analysis tasks required by the RCRA permit. These include: (1) development of a groundwater quality baseline prior to remediation startup, (2) targeting of constituents for removal from the RCRA GWPS, (3) targeting of constituents for removal from the UIC permit, (4) targeting of constituents for reduced, (5) targeting of monitoring wells not producing representative samples, (6) reduction in statistical evaluation, and (7) identification of contamination from other facilities

  13. The Math Problem: Advertising Students' Attitudes toward Statistics

    Science.gov (United States)

    Fullerton, Jami A.; Kendrick, Alice

    2013-01-01

    This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…

  14. An Efficient and Reliable Statistical Method for Estimating Functional Connectivity in Large Scale Brain Networks Using Partial Correlation.

    Science.gov (United States)

    Wang, Yikai; Kang, Jian; Kemmer, Phebe B; Guo, Ying

    2016-01-01

    Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promise in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via the Constrained L1-minimization Approach (CLIME), a recently developed statistical method that is more efficient and demonstrates better performance than existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance than existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant
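
    As a rough illustration of the precision-matrix route to partial correlation described above, the sketch below estimates a sparse precision matrix and converts it to partial correlations; scikit-learn's graphical lasso is used as a readily available stand-in for CLIME, and the Dens-based tuning-parameter selection is not reproduced. The synthetic data matrix stands in for ROI time courses.

```python
# Partial correlation from an estimated sparse precision matrix
# (graphical lasso as a stand-in for CLIME).
import numpy as np
from sklearn.covariance import GraphicalLassoCV

def partial_correlation(time_series):
    """time_series: (n_timepoints, n_nodes) array of node signals."""
    theta = GraphicalLassoCV().fit(time_series).precision_
    d = np.sqrt(np.diag(theta))
    pcorr = -theta / np.outer(d, d)       # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))        # synthetic stand-in for 10 ROI time courses
print(partial_correlation(X).round(2))
```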

  15. Evaluation of statistical methods for quantifying fractal scaling in water-quality time series with irregular sampling

    Science.gov (United States)

    Zhang, Qian; Harman, Ciaran J.; Kirchner, James W.

    2018-02-01

    River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling - in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) - are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β = 0) to Brown noise (β = 2) and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb-Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among all methods for a wide range of
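
    To make the estimation problem above concrete, the sketch below applies the widely used Lomb-Scargle route to an irregularly sampled Brownian-like series and fits a spectral slope in log-log space; as the study notes, this estimator tends to be biased, so the example illustrates the approach rather than a recommended method. The sampling scheme, frequency grid, and synthetic series are illustrative assumptions.

```python
# Spectral-slope (beta) estimation for an irregularly sampled series
# via the Lomb-Scargle periodogram (illustrative; known to be biased).
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1000, 400))                    # irregular sampling times
dt = np.diff(t, prepend=0.0)
y = np.cumsum(rng.standard_normal(400) * np.sqrt(dt))     # Brownian-like series (true beta ~ 2)
y = y - y.mean()

freqs = np.logspace(-2.5, -0.5, 200)                      # frequencies in cycles per time unit
pgram = lombscargle(t, y, 2 * np.pi * freqs)              # lombscargle expects angular frequencies
beta = -np.polyfit(np.log10(freqs), np.log10(pgram), 1)[0]
print(f"estimated spectral slope beta ~ {beta:.2f}")
```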

  16. Equivalent statistics and data interpretation.

    Science.gov (United States)

    Francis, Gregory

    2017-08-01

    Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
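
    The equivalence described above can be made tangible with a small calculation: given a two-sample t statistic and the two sample sizes, the p value and Cohen's d (with an approximate confidence interval) follow directly, and a JZS Bayes factor would likewise be a function of the same three numbers. The values of t, n1, and n2 below are arbitrary illustrations.

```python
# From (t, n1, n2) to p value and Cohen's d with an approximate 95% CI;
# a JZS Bayes factor is likewise a function of these same quantities.
import numpy as np
from scipy import stats

def summaries_from_t(t, n1, n2):
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)              # two-sided p value
    d = t * np.sqrt(1 / n1 + 1 / n2)            # Cohen's d from the same t statistic
    se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * df))
    ci = (d - 1.96 * se_d, d + 1.96 * se_d)     # approximate 95% CI for d
    return p, d, ci

print(summaries_from_t(t=2.5, n1=30, n2=30))
```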

  17. Assessing the sustainable construction of large construction companies in Malaysia

    Science.gov (United States)

    Adewale, Bamgbade Jibril; Mohammed, Kamaruddeen Ahmed; Nasrun, Mohd Nawi Mohd

    2016-08-01

    Given increasing concern for sustainability issues in construction project delivery within the construction industry, this paper assesses the extent of sustainable construction among Malaysian large contractors in order to ascertain the level of the industry's impact on both the environment and society. Sustainable construction refers to the construction industry's responsibility to utilise finite resources efficiently while reducing construction impacts on both humans and the environment throughout the phases of construction. This study used proportionate stratified random sampling in a field study that yielded a sample of 172 contractors from 708 administered questionnaires. Data were collected from large contractors in the eleven states of peninsular Malaysia. Using a five-level rating scale (1 = Very Low; 2 = Low; 3 = Moderate; 4 = High; 5 = Very High), based on previous studies, to describe the level of sustainable construction of Malaysian contractors, statistical analysis reveals that the environmental, social, and economic sustainability of Malaysian large contractors is high.

  18. Simulated tempering distributed replica sampling: A practical guide to enhanced conformational sampling

    Energy Technology Data Exchange (ETDEWEB)

    Rauscher, Sarah; Pomes, Regis, E-mail: pomes@sickkids.ca

    2010-11-01

    Simulated tempering distributed replica sampling (STDR) is a generalized-ensemble method designed specifically for simulations of large molecular systems on shared and heterogeneous computing platforms [Rauscher, Neale and Pomes (2009) J. Chem. Theor. Comput. 5, 2640]. The STDR algorithm consists of an alternation of two steps: (1) a short molecular dynamics (MD) simulation; and (2) a stochastic temperature jump. Repeating these steps thousands of times results in a random walk in temperature, which allows the system to overcome energetic barriers, thereby enhancing conformational sampling. The aim of the present paper is to provide a practical guide to applying STDR to complex biomolecular systems. We discuss the details of our STDR implementation, which is a highly-parallel algorithm designed to maximize computational efficiency while simultaneously minimizing network communication and data storage requirements. Using a 35-residue disordered peptide in explicit water as a test system, we characterize the efficiency of the STDR algorithm with respect to both diffusion in temperature space and statistical convergence of structural properties. Importantly, we show that STDR provides a dramatic enhancement of conformational sampling compared to a canonical MD simulation.
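
    The temperature-jump half of the alternation described above can be sketched as a Metropolis move between neighbouring temperatures. The per-temperature weights, temperature ladder, and stand-in potential energies below are hypothetical placeholders (in practice the weights are estimated free energies and the energies come from the MD engine), so this is only a schematic of the acceptance rule, not the STDR implementation.

```python
# Schematic simulated-tempering temperature jump (Metropolis acceptance);
# weights, temperatures, and energies are hypothetical placeholders.
import numpy as np

kB = 0.0083145                          # Boltzmann constant, kJ/(mol K)
temps = np.array([300.0, 310.0, 320.0, 330.0])
weights = np.zeros_like(temps)          # placeholder per-temperature weights

def attempt_temperature_jump(i, energy, rng):
    """Propose a move from temperature index i to a random neighbour."""
    j = i + rng.choice([-1, 1])
    if j < 0 or j >= len(temps):
        return i                        # reject moves off the temperature ladder
    beta_i, beta_j = 1.0 / (kB * temps[i]), 1.0 / (kB * temps[j])
    log_acc = (beta_i - beta_j) * energy + (weights[j] - weights[i])
    return j if np.log(rng.random()) < log_acc else i

rng = np.random.default_rng(0)
idx = 0
for step in range(5):
    energy = -5000.0 + 50.0 * rng.standard_normal()   # stand-in for the MD potential energy
    idx = attempt_temperature_jump(idx, energy, rng)
    print(step, temps[idx])
```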

  19. Theory of sampling. A mini seminar under the NKS project SAMPSTRAT

    International Nuclear Information System (INIS)

    Holm, E.; Oestergaard, L.F.; Sidhu, R.

    2006-04-01

    In an emergency situation, a large number of matrices can be contaminated, and samples of these different matrices will be collected. These sample matrices may be, and often certainly are, heterogeneous and in general more unevenly distributed than fallout from nuclear tests or even the Chernobyl accident. On the basis of the reported data, conclusions are drawn and remedial actions are taken that entail social and economic costs for society. Therefore, the number of samples from each site, their size, and their further homogenization are of great importance. In an emergency situation, activities are generally high and the errors due to counting statistics are small. We could also imagine a situation in which a certain nuclear enterprise or activity has to close down or is prosecuted, based on sampling and analysis, for not following directives on the discharge of radioactivity into the environment. We therefore organized a seminar focusing on the above-mentioned problems. The seminar covered several important topics, such as an introduction to the Theory of Sampling (TOS), lot heterogeneity and sampling in practice, statistics for sampling in analytical chemistry, and representative mass reduction in sampling. Case studies were presented, such as sampling of heterogeneous bottom ash from municipal waste-incineration plants and sampling and inventories at Thule, Greenland, which also illustrated the difficulties of plutonium inventory calculations in sediments when hot particles are present. (au)

  20. Sample size determination and power

    CERN Document Server

    Ryan, Thomas P, Jr

    2013-01-01

    THOMAS P. RYAN, PhD, teaches advanced online statistics courses in sample size determination, design of experiments, engineering statistics, and regression analysis for Northwestern University and The Institute for Statistics Education.