WorldWideScience

Sample records for sample size statistically

  1. [Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

    Science.gov (United States)

    Suzukawa, Yumi; Toyoda, Hideki

    2012-04-01

    This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in fields such as perception, cognition, or learning, the effect sizes were relatively large although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, even meaningless effects could be detected. This implies that researchers who could not obtain large effect sizes may have used larger samples in order to obtain significant results.

  2. Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size

    Directory of Open Access Journals (Sweden)

    R. Eric Heidel

    2016-01-01

    Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
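
    As an illustration of how these components interact, here is a minimal R sketch of an a priori sample size calculation for a two-group comparison of a continuous outcome; the effect size (0.5 SD), alpha and power values are illustrative assumptions, not figures from the article.

    ```r
    # A priori sample size for a two-group comparison of a continuous outcome.
    # The assumed effect size (0.5 SD), alpha = 0.05 and power = 0.80 are
    # illustrative choices, not values taken from the article.
    power.t.test(delta = 0.5,        # difference in means, in outcome units
                 sd = 1,             # common standard deviation
                 sig.level = 0.05,   # two-sided type I error rate
                 power = 0.80,       # desired statistical power
                 type = "two.sample")
    # gives roughly 64 subjects per group; changing any one component changes n,
    # which is the interdependence ("isomorphism") the article describes
    ```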

  3. Nomogram for sample size calculation on a straightforward basis for the kappa statistic.

    Science.gov (United States)

    Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo

    2014-09-01

    Kappa is a widely used measure of agreement. However, it may not be straightforward in some situations, such as sample size calculation, because of the kappa paradox: high agreement but low kappa. Hence, in sample size calculation it seems reasonable to express the level of agreement under a certain marginal prevalence as a simple proportion of agreement rather than as a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and a goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. Sample size formulae using a simple proportion of agreement instead of a kappa statistic, and nomograms that eliminate the inconvenience of using a mathematical formula, were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
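
    The article derives its formula from the kappa statistic under the common correlation model; the sketch below is not that formula, only a generic normal-approximation calculation for testing a simple proportion of agreement, to illustrate the "proportion of agreement" framing. The function name and all numeric values are assumptions.

    ```r
    # Generic normal-approximation sample size for testing an observed proportion
    # of agreement p1 against a null value p0. This is NOT the formula derived in
    # the article (which is based on kappa under the common correlation model);
    # it only illustrates the "simple proportion of agreement" idea.
    n_agreement <- function(p0, p1, alpha = 0.05, power = 0.80) {
      za <- qnorm(1 - alpha / 2)
      zb <- qnorm(power)
      n  <- (za * sqrt(p0 * (1 - p0)) + zb * sqrt(p1 * (1 - p1)))^2 / (p1 - p0)^2
      ceiling(n)
    }
    n_agreement(p0 = 0.70, p1 = 0.85)   # pairs of ratings needed under these assumptions
    ```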

  4. Constrained statistical inference: sample-size tables for ANOVA and regression

    Directory of Open Access Journals (Sweden)

    Leonard eVanbrabant

    2015-01-01

    Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient beta1 is larger than beta2 and beta3. The corresponding hypothesis is H: beta1 > {beta2, beta3}, and this is known as an order-constrained hypothesis. A major advantage of testing such a hypothesis is that power is gained, so that a smaller sample size is needed. This article discusses the reduction in required sample size as an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a prespecified power (say, 0.80) for an increasing number of constraints. To obtain the sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30% to 50% when the complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., beta1 > beta2) results in higher power than assigning a positive or a negative sign to the parameters (e.g., beta1 > 0).
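
    As a simplified analogue of the power gain from ordering parameters, the R sketch below compares the per-group sample size for a directional (one-sided) versus a non-directional (two-sided) t-test; the article's tables come from Monte Carlo simulations of constrained ANOVA and regression, which this does not reproduce. The effect size and power values are assumptions.

    ```r
    # One-constraint analogue of the sample-size gain from a directional
    # hypothesis: one-sided vs. two-sided two-sample t-test at the same power.
    n_two_sided <- power.t.test(delta = 0.4, sd = 1, sig.level = 0.05,
                                power = 0.80, alternative = "two.sided")$n
    n_one_sided <- power.t.test(delta = 0.4, sd = 1, sig.level = 0.05,
                                power = 0.80, alternative = "one.sided")$n
    ceiling(c(two.sided = n_two_sided, one.sided = n_one_sided))
    # the directional hypothesis needs roughly 20% fewer subjects per group here
    ```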

  5. Subclinical delusional ideation and appreciation of sample size and heterogeneity in statistical judgment.

    Science.gov (United States)

    Galbraith, Niall D; Manktelow, Ken I; Morris, Neil G

    2010-11-01

    Previous studies demonstrate that people high in delusional ideation exhibit a data-gathering bias on inductive reasoning tasks. The current study set out to investigate the factors that may underpin such a bias by examining healthy individuals, classified as either high or low scorers on the Peters et al. Delusions Inventory (PDI); more specifically, whether high PDI scorers have a relatively poor appreciation of sample size and heterogeneity when making statistical judgments. In Expt 1, high PDI scorers made higher probability estimates when generalizing from a sample of 1 with regard to the heterogeneous human property of obesity. In Expt 2, this effect was replicated and was also observed in relation to the heterogeneous property of aggression. The findings suggest that delusion-prone individuals are less appreciative of the importance of sample size when making statistical judgments about heterogeneous properties; this may underpin the data-gathering bias observed in previous studies. There was some support for the hypothesis that threatening material would exacerbate high PDI scorers' indifference to sample size.

  6. Speeding Up Non-Parametric Bootstrap Computations for Statistics Based on Sample Moments in Small/Moderate Sample Size Applications.

    Directory of Open Access Journals (Sweden)

    Elias Chaibub Neto

    In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation on real and simulated data sets, when bootstrapping Pearson's sample correlation coefficient, and compare its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and numbers of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling.
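
    A minimal R sketch of the multinomial-weighting idea described above, applied to the sample mean rather than to Pearson's correlation; the sample size, number of replications and seed are arbitrary assumptions.

    ```r
    # Multinomial-weighting ("vectorized") bootstrap for a sample-moment
    # statistic, here the mean. Instead of resampling the data B times, draw a
    # matrix of multinomial counts and get all B replications in one product.
    set.seed(1)
    x <- rnorm(50)                       # observed sample
    n <- length(x)
    B <- 10000                           # number of bootstrap replications
    W <- rmultinom(B, size = n, prob = rep(1/n, n))   # n x B matrix of counts
    boot_means <- as.vector(crossprod(W, x)) / n      # all B bootstrap means at once
    sd(boot_means)                       # bootstrap standard error of the mean
    ```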

  7. Statistical characterization of a large geochemical database and effect of sample size

    Science.gov (United States)

    Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

    2005-01-01

    smaller numbers of data points showed that few elements passed standard statistical tests for normality or log-normality until sample size decreased to a few hundred data points. Large sample size enhances the power of statistical tests, and leads to rejection of most statistical hypotheses for real data sets. For large sample sizes (e.g., n > 1000), graphical methods such as histogram, stem-and-leaf, and probability plots are recommended for rough judgement of probability distribution if needed. © 2005 Elsevier Ltd. All rights reserved.

  8. Sample Size Requirements for Assessing Statistical Moments of Simulated Crop Yield Distributions

    NARCIS (Netherlands)

    Lehmann, N.; Finger, R.; Klein, T.; Calanca, P.

    2013-01-01

    Mechanistic crop growth models are becoming increasingly important in agricultural research and are extensively used in climate change impact assessments. In such studies, statistics of crop yields are usually evaluated without the explicit consideration of sample size requirements. The purpose of

  9. The large sample size fallacy.

    Science.gov (United States)

    Lantz, Björn

    2013-06-01

    Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.
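
    A small R illustration of the fallacy the article describes: with a very large sample, a trivial difference is extremely statistically significant even though the effect size is negligible. The group sizes and the 0.03 SD difference are illustrative assumptions.

    ```r
    # Large-sample-size fallacy: n = 50,000 per group makes a trivial 0.03 SD
    # difference "extremely statistically significant" despite a negligible
    # effect size. Numbers are illustrative only.
    set.seed(2)
    n <- 50000
    a <- rnorm(n, mean = 0.00)
    b <- rnorm(n, mean = 0.03)
    t.test(a, b)$p.value                                 # typically a tiny p-value
    (mean(b) - mean(a)) / sqrt((var(a) + var(b)) / 2)    # Cohen's d, about 0.03
    ```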

  10. Sample size determination and power

    CERN Document Server

    Ryan, Thomas P, Jr

    2013-01-01

    THOMAS P. RYAN, PhD, teaches online advanced statistics courses for Northwestern University and The Institute for Statistics Education in sample size determination, design of experiments, engineering statistics, and regression analysis.

  11. Concepts in sample size determination

    Directory of Open Access Journals (Sweden)

    Umadevi K Rao

    2012-01-01

    Investigators involved in clinical, epidemiological or translational research have the drive to publish their results so that they can extrapolate their findings to the population. This begins with the preliminary step of deciding the topic to be studied, the subjects and the type of study design. In this context, the researcher must determine how many subjects would be required for the proposed study. Thus, the number of individuals to be included in the study, i.e., the sample size, is an important consideration in the design of many clinical studies. The sample size determination should be based on the difference in the outcome between the two groups studied, as in an analytical study, as well as on the accepted p value for statistical significance and the required statistical power to test a hypothesis. The accepted risk of type I error or alpha value, which by convention is set at the 0.05 level in biomedical research, defines the cutoff point at which the p value obtained in the study is judged as significant or not. The power in clinical research is the likelihood of finding a statistically significant result when it exists and is typically set to >80%. This is necessary since the most rigorously executed studies may fail to answer the research question if the sample size is too small. Alternatively, a study with too large a sample size will be difficult to conduct and will result in a waste of time and resources. Thus, the goal of sample size planning is to estimate an appropriate number of subjects for a given study design. This article describes the concepts involved in estimating the sample size.
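
    A minimal R sketch of how these concepts translate into a calculation for a two-group comparison of proportions, using the conventional alpha of 0.05 and power of 80% mentioned in the article; the assumed outcome proportions are illustrative.

    ```r
    # Sample size for detecting a difference in outcome between two groups,
    # using alpha = 0.05 and power = 0.80. The assumed proportions (40% vs 25%)
    # are illustrative only.
    power.prop.test(p1 = 0.40, p2 = 0.25, sig.level = 0.05, power = 0.80)
    # about 152 subjects per group under these assumptions
    ```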

  12. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (RUMM) Program in Health Outcome Measurement.

    Science.gov (United States)

    Hagell, Peter; Westergren, Albert

    Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data simulated to fit the Rasch model, for 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

  13. Experimental determination of size distributions: analyzing proper sample sizes

    International Nuclear Information System (INIS)

    Buffo, A; Alopaeus, V

    2016-01-01

    The measurement of various particle size distributions is a crucial aspect of many applications in the process industry. Size distribution is often related to the final product quality, as in crystallization or polymerization. In other cases it is related to the correct evaluation of heat and mass transfer and of reaction rates, which depend on the interfacial area between the different phases, or to the assessment of yield stresses of polycrystalline metal/alloy samples. The experimental determination of such distributions often involves laborious sampling procedures, and the statistical significance of the outcome is rarely investigated. In this work, we propose a novel rigorous tool, based on inferential statistics, to determine the number of samples needed to obtain reliable measurements of size distribution, according to specific requirements defined a priori. Such a methodology can be adopted regardless of the measurement technique used. (paper)
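
    The article's tool targets entire size distributions; as a much simpler, hedged illustration of sample size planning for particle measurements, the sketch below computes how many samples are needed to estimate a mean size within a chosen margin of error, given a pilot standard deviation. The function name and all values are assumptions.

    ```r
    # NOT the article's method (which addresses whole size distributions); a
    # simpler illustration: number of samples needed to estimate a mean particle
    # size to within a margin of error E, given a pilot standard deviation s.
    n_for_mean <- function(s, E, conf = 0.95) {
      n <- max(2, ceiling((qnorm(1 - (1 - conf) / 2) * s / E)^2))  # normal start
      for (i in 1:10) {                                            # refine with t quantile
        n <- max(2, ceiling((qt(1 - (1 - conf) / 2, df = n - 1) * s / E)^2))
      }
      n
    }
    n_for_mean(s = 12, E = 3)   # e.g. pilot SD of 12 units, desired half-width of 3 units
    ```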

  14. The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.

    Science.gov (United States)

    Shen, Dan; Shen, Haipeng; Zhu, Hongtu; Marron, J S

    2016-10-01

    The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
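
    A small R simulation in the spirit of the single-spike case: with a fixed small sample size, the angle between the leading sample eigenvector and its population counterpart grows as the dimension increases relative to the product of sample size and spike size. The parameter values are illustrative assumptions, not the paper's settings.

    ```r
    # One-spike covariance, fixed small n, growing dimension d: the leading
    # sample eigenvector drifts away from the population eigenvector e1 as
    # d / (n * spike) grows. The dual (n x n) eigenproblem keeps this cheap.
    set.seed(3)
    angle_to_e1 <- function(n, d, spike) {
      X <- matrix(rnorm(n * d), n, d)
      X[, 1] <- X[, 1] * sqrt(spike)               # population covariance: diag(spike, 1, ..., 1)
      Xc <- scale(X, center = TRUE, scale = FALSE)
      u  <- eigen(tcrossprod(Xc), symmetric = TRUE)$vectors[, 1]   # n x n dual problem
      v  <- drop(crossprod(Xc, u)); v <- v / sqrt(sum(v^2))        # leading sample eigenvector
      acos(min(abs(v[1]), 1)) * 180 / pi                           # angle (degrees) to e1
    }
    sapply(c(100, 1000, 10000), function(d) angle_to_e1(n = 20, d = d, spike = 50))
    ```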

  15. [Practical aspects regarding sample size in clinical research].

    Science.gov (United States)

    Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S

    1996-01-01

    Knowledge of the right sample size lets us judge whether the results published in medical papers come from a suitable design and whether their conclusions are supported by the statistical analysis. To estimate the sample size we must consider the type I error, the type II error, the variance, the size of the effect, and the significance level and power of the test. To decide which mathematical formula to use, we must define the kind of study at hand: a prevalence study, a study of means, or a comparative study. In this paper we explain some basic topics of statistics and describe four simple examples of sample size estimation.

  16. Sample size calculation in metabolic phenotyping studies.

    Science.gov (United States)

    Billoir, Elise; Navratil, Vincent; Blaise, Benjamin J

    2015-09-01

    The number of samples needed to identify significant effects is a key question in biomedical studies, with consequences on experimental designs, costs and potential discoveries. In metabolic phenotyping studies, sample size determination remains a complex step. This is due particularly to the multiple hypothesis-testing framework and the top-down hypothesis-free approach, with no a priori known metabolic target. Until now, there was no standard procedure available to address this purpose. In this review, we discuss sample size estimation procedures for metabolic phenotyping studies. We release an automated implementation of the Data-driven Sample size Determination (DSD) algorithm for MATLAB and GNU Octave. Original research concerning DSD was published elsewhere. DSD allows the determination of an optimized sample size in metabolic phenotyping studies. The procedure uses analytical data only from a small pilot cohort to generate an expanded data set. The statistical recoupling of variables procedure is used to identify metabolic variables, and their intensity distributions are estimated by kernel smoothing or log-normal density fitting. Statistically significant metabolic variations are evaluated using the Benjamini-Yekutieli correction and processed for data sets of various sizes. Optimal sample size determination is achieved in a context of biomarker discovery (at least one statistically significant variation) or metabolic exploration (a maximum of statistically significant variations). The DSD toolbox is encoded in MATLAB R2008A (Mathworks, Natick, MA) for kernel and log-normal estimates, and in GNU Octave for log-normal estimates (kernel density estimates are not robust enough in GNU Octave). It is available at http://www.prabi.fr/redmine/projects/dsd/repository, with a tutorial at http://www.prabi.fr/redmine/projects/dsd/wiki. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  17. The quantitative LOD score: test statistic and sample size for exclusion and linkage of quantitative traits in human sibships.

    Science.gov (United States)

    Page, G P; Amos, C I; Boerwinkle, E

    1998-04-01

    We present a test statistic, the quantitative LOD (QLOD) score, for the testing of both linkage and exclusion of quantitative-trait loci in randomly selected human sibships. As with the traditional LOD score, the boundary values of 3, for linkage, and -2, for exclusion, can be used for the QLOD score. We investigated the sample sizes required for inferring exclusion and linkage, for various combinations of linked genetic variance, total heritability, recombination distance, and sibship size, using fixed-size sampling. The sample sizes required for both linkage and exclusion were not qualitatively different and depended on the percentage of variance being linked or excluded and on the total genetic variance. Information regarding linkage and exclusion in sibships larger than size 2 increased approximately as the number of all possible pairs, n(n-1)/2, up to sibships of size 6. Increasing the recombination distance (theta) between the marker and the trait loci empirically reduced the power for both linkage and exclusion, approximately as a function of (1 - 2*theta)^4.

  18. Effect of model choice and sample size on statistical tolerance limits

    International Nuclear Information System (INIS)

    Duran, B.S.; Campbell, K.

    1980-03-01

    Statistical tolerance limits are estimates of large (or small) quantiles of a distribution, quantities which are very sensitive to the shape of the tail of the distribution. The exact nature of this tail behavior cannot be ascertained from small samples, so statistical tolerance limits are frequently computed using a statistical model chosen on the basis of theoretical considerations or prior experience with similar populations. This report illustrates the effects of such choices on the computations.
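
    A hedged R sketch of the sensitivity the report studies: the standard exact one-sided normal tolerance factor (via the noncentral t distribution) applied to the same small sample under two different model choices, normal and lognormal. The data and the coverage/confidence levels are illustrative assumptions, not the report's settings.

    ```r
    # One-sided upper tolerance limit covering a proportion P of the population
    # with confidence conf, assuming a normal model; the exact factor k uses the
    # noncentral t distribution.
    k_factor <- function(n, P = 0.95, conf = 0.95) {
      qt(conf, df = n - 1, ncp = sqrt(n) * qnorm(P)) / sqrt(n)
    }
    set.seed(4)
    x <- rlnorm(20, meanlog = 0, sdlog = 0.5)       # data actually lognormal
    # Normal-model limit vs. lognormal-model limit from the same small sample:
    mean(x) + k_factor(length(x)) * sd(x)
    exp(mean(log(x)) + k_factor(length(x)) * sd(log(x)))
    # the two model choices give different limits from identical data, the
    # sensitivity to model choice and sample size that the report examines
    ```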

  19. Audit sampling: A qualitative study on the role of statistical and non-statistical sampling approaches on audit practices in Sweden

    OpenAIRE

    Ayam, Rufus Tekoh

    2011-01-01

    PURPOSE: The two approaches to audit sampling, statistical and non-statistical, have been examined in this study. The overall purpose of the study is to explore the extent to which statistical and non-statistical sampling approaches are utilized by independent auditors during auditing practices. Moreover, the study also seeks to achieve two additional purposes: the first is to find out whether auditors utilize different sampling techniques when auditing SMEs (Small and Medium-Sized Ente...

  20. Relative efficiency and sample size for cluster randomized trials with variable cluster sizes.

    Science.gov (United States)

    You, Zhiying; Williams, O Dale; Aban, Inmaculada; Kabagambe, Edmond Kato; Tiwari, Hemant K; Cutter, Gary

    2011-02-01

    The statistical power of cluster randomized trials depends on two sample size components, the number of clusters per group and the number of individuals within clusters (cluster size). Variable cluster sizes are common, and this variation alone may have a significant impact on study power. Previous approaches have taken this into account by either adjusting total sample size using a designated design effect or adjusting the number of clusters according to an assessment of the relative efficiency of unequal versus equal cluster sizes. This article defines a relative efficiency of unequal versus equal cluster sizes using noncentrality parameters, investigates properties of this measure, and proposes an approach for adjusting the required sample size accordingly. We focus on comparing two groups with normally distributed outcomes using the t-test, and use the noncentrality parameter to define the relative efficiency of unequal versus equal cluster sizes and show that statistical power depends only on this parameter for a given number of clusters. We calculate the sample size required for an unequal cluster sizes trial to have the same power as one with equal cluster sizes. Relative efficiency based on the noncentrality parameter is straightforward to calculate and easy to interpret. It connects the required mean cluster size directly to the required sample size with equal cluster sizes. Consequently, our approach first determines the sample size requirements with equal cluster sizes for a pre-specified study power and then calculates the required mean cluster size while keeping the number of clusters unchanged. Our approach allows adjustment in mean cluster size alone or simultaneous adjustment in mean cluster size and number of clusters, and is a flexible alternative to and a useful complement to existing methods. Comparison indicated that we have defined a relative efficiency that is greater than the relative efficiency in the literature under some conditions. Our measure
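
    For orientation only, the sketch below uses commonly cited design-effect approximations for equal and for variable cluster sizes; this is not the noncentrality-parameter-based relative efficiency defined in the article. All parameter values (effect size, mean cluster size, coefficient of variation, ICC) are assumptions.

    ```r
    # Commonly cited design-effect approximations (NOT the article's measure):
    # inflate an individually randomized sample size for clustering, with and
    # without an adjustment for variable cluster sizes (cv = coefficient of
    # variation of cluster size).
    n_individual <- 2 * ceiling(power.t.test(delta = 0.3, sd = 1, power = 0.80)$n)
    m_bar <- 20      # mean cluster size (assumed)
    cv    <- 0.6     # coefficient of variation of cluster sizes (assumed)
    icc   <- 0.05    # intracluster correlation (assumed)
    deff_equal   <- 1 + (m_bar - 1) * icc                  # equal cluster sizes
    deff_unequal <- 1 + ((cv^2 + 1) * m_bar - 1) * icc     # unequal cluster sizes
    ceiling(n_individual * c(deff_equal, deff_unequal))    # total subjects needed
    ```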

  1. Breaking Free of Sample Size Dogma to Perform Innovative Translational Research

    Science.gov (United States)

    Bacchetti, Peter; Deeks, Steven G.; McCune, Joseph M.

    2011-01-01

    Innovative clinical and translational research is often delayed or prevented by reviewers’ expectations that any study performed in humans must be shown in advance to have high statistical power. This supposed requirement is not justifiable and is contradicted by the reality that increasing sample size produces diminishing marginal returns. Studies of new ideas often must start small (sometimes even with an N of 1) because of cost and feasibility concerns, and recent statistical work shows that small sample sizes for such research can produce more projected scientific value per dollar spent than larger sample sizes. Renouncing false dogma about sample size would remove a serious barrier to innovation and translation. PMID:21677197

  2. The N-Pact Factor: Evaluating the Quality of Empirical Journals with Respect to Sample Size and Statistical Power

    Science.gov (United States)

    Fraley, R. Chris; Vazire, Simine

    2014-01-01

    The authors evaluate the quality of research reported in major journals in social-personality psychology by ranking those journals with respect to their N-pact Factors (NF): the statistical power of the empirical studies they publish to detect typical effect sizes. Power is a particularly important attribute for evaluating research quality because, relative to studies that have low power, studies that have high power are more likely to (a) provide accurate estimates of effects, (b) produce literatures with low false-positive rates, and (c) lead to replicable findings. The authors show that the average sample size in social-personality research is 104 and that the power to detect the typical effect size in the field is approximately 50%. Moreover, they show that there is considerable variation among journals in the sample sizes and power of the studies they publish, with some journals consistently publishing higher-power studies than others. The authors hope that these rankings will be of use to authors who are choosing where to submit their best work, provide hiring and promotion committees with a superior way of quantifying journal quality, and encourage competition among journals to improve their NF rankings. PMID:25296159

  3. Combining censored and uncensored data in a U-statistic: design and sample size implications for cell therapy research.

    Science.gov (United States)

    Moyé, Lemuel A; Lai, Dejian; Jing, Kaiyan; Baraniuk, Mary Sarah; Kwak, Minjung; Penn, Marc S; Wu, Colin O

    2011-01-01

    The assumptions that anchor large clinical trials are rooted in smaller, Phase II studies. In addition to specifying the target population, intervention delivery, and patient follow-up duration, physician-scientists who design these Phase II studies must select the appropriate response variables (endpoints). However, endpoint measures can be problematic. If the endpoint assesses the change in a continuous measure over time, then the occurrence of an intervening significant clinical event (SCE), such as death, can preclude the follow-up measurement. In addition, the ideal continuous endpoint measurement may be contraindicated in a fraction of the study patients, a circumstance that requires a less precise substitution in this subset of participants. A score function based on the U-statistic can address both issues: 1) intercurrent SCEs and 2) response variable ascertainments that use measurements of different precision. The scoring statistic is easy to apply, clinically relevant, and provides flexibility for the investigators' prospective design decisions. Sample size and power formulations for this statistic are provided as functions of clinical event rates and effect size estimates that are easy for investigators to identify and discuss. Examples are provided from current cardiovascular cell therapy research.

  4. Estimation of sample size and testing power (Part 4).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-01-01

    Sample size estimation is necessary for any experimental or survey research. An appropriate estimation of sample size based on known information and statistical knowledge is of great significance. This article introduces methods of sample size estimation for difference tests on data from a design with one factor at two levels, covering both quantitative and qualitative data; it presents the estimation formulas and their realization both directly from the formulas and through the POWER procedure of SAS software. In addition, the article presents worked examples, which should help researchers implement the repetition principle during the research design phase.

  5. Effect size, confidence intervals and statistical power in psychological research.

    Directory of Open Access Journals (Sweden)

    Téllez A.

    2015-07-01

    Quantitative psychological research is focused on detecting the occurrence of certain population phenomena by analyzing data from a sample, and statistics is a particularly helpful mathematical tool that is used by researchers to evaluate hypotheses and make decisions to accept or reject such hypotheses. In this paper, the various statistical tools used in psychological research are reviewed. The limitations of null hypothesis significance testing (NHST) and the advantages of using effect sizes and their respective confidence intervals are explained, as the latter two measurements can provide important information about the results of a study. These measurements also can facilitate data interpretation and easily detect trivial effects, enabling researchers to make decisions in a more clinically relevant fashion. Moreover, it is recommended to establish an appropriate sample size by calculating the optimum statistical power at the moment that the research is designed. Psychological journal editors are encouraged to follow APA recommendations strictly and ask authors of original research studies to report the effect size, its confidence intervals, statistical power and, when required, any measure of clinical significance. Additionally, we must attend to the teaching of statistics at the graduate level. At that level, students do not receive sufficient information concerning the importance of using different types of effect sizes and their confidence intervals according to the different types of research designs; instead, most of the information is focused on the various tools of NHST.

  6. A statistical rationale for establishing process quality control limits using fixed sample size, for critical current verification of SSC superconducting wire

    International Nuclear Information System (INIS)

    Pollock, D.A.; Brown, G.; Capone, D.W. II; Christopherson, D.; Seuntjens, J.M.; Woltz, J.

    1992-01-01

    This work has demonstrated the statistical concepts behind the XBAR-R method for determining sample limits to verify billet I_c performance and process uniformity. Using a preliminary population estimate for μ and σ from a stable production lot of only 5 billets, we have shown that reasonable sensitivity to systematic process drift and random within-billet variation may be achieved by using per-billet subgroup sizes of moderate proportions. The effects of subgroup size (n) and sampling risk (α and β) on the calculated control limits have been shown to be important factors that need to be carefully considered when selecting the actual number of measurements to be used per billet for each supplier process. Given the present method of testing, in which individual wire samples are ramped to I_c only once, with measurement uncertainty due to repeatability and reproducibility (typically > 1.4%), large subgroups (i.e. >30 per billet) appear to be unnecessary, except as an inspection tool to confirm wire process history for each spool. The introduction of the XBAR-R method or a similar Statistical Quality Control procedure is recommended for use in the superconducting wire production program, particularly when the program transitions from requiring tests for all pieces of wire to sampling each production unit.

  7. Statistical sampling techniques as applied to OSE inspections

    International Nuclear Information System (INIS)

    Davis, J.J.; Cote, R.W.

    1987-01-01

    The need has been recognized for statistically valid methods for gathering information during OSE inspections, and for interpretation of results, both from performance testing and from records reviews, interviews, etc. Battelle Columbus Division, under contract to DOE OSE, has performed and is continuing to perform work in the area of statistical methodology for OSE inspections. This paper presents some of the sampling methodology currently being developed for use during OSE inspections. Topics include population definition, sample size requirements, level of confidence, and practical logistical constraints associated with the conduct of an inspection based on random sampling. Sequential sampling schemes and sampling from finite populations are also discussed. The methods described are applicable to various data-gathering activities, ranging from the sampling and examination of classified documents to the sampling of Protective Force security inspectors for skill testing.

  8. Sample size in psychological research over the past 30 years.

    Science.gov (United States)

    Marszalek, Jacob M; Barber, Carolyn; Kohlhart, Julie; Holmes, Cooper B

    2011-04-01

    The American Psychological Association (APA) Task Force on Statistical Inference was formed in 1996 in response to a growing body of research demonstrating methodological issues that threatened the credibility of psychological research, and made recommendations to address them. One issue was the small, even dramatically inadequate, size of samples used in studies published by leading journals. The present study assessed the progress made since the Task Force's final report in 1999. Sample sizes reported in four leading APA journals in 1955, 1977, 1995, and 2006 were compared using nonparametric statistics, while data from the last two waves were fit to a hierarchical generalized linear growth model for more in-depth analysis. Overall, results indicate that the recommendations for increasing sample sizes have not been integrated in core psychological research, although results slightly vary by field. This and other implications are discussed in the context of current methodological critique and practice.

  9. Sample size methodology

    CERN Document Server

    Desu, M M

    2012-01-01

    One of the most important problems in designing an experiment or a survey is sample size determination and this book presents the currently available methodology. It includes both random sampling from standard probability distributions and from finite populations. Also discussed is sample size determination for estimating parameters in a Bayesian setting by considering the posterior distribution of the parameter and specifying the necessary requirements. The determination of the sample size is considered for ranking and selection problems as well as for the design of clinical trials. Appropria

  10. Sample size determination for mediation analysis of longitudinal data.

    Science.gov (United States)

    Pan, Haitao; Liu, Suyu; Miao, Danmin; Yuan, Ying

    2018-03-27

    Sample size planning for longitudinal data is crucial when designing mediation studies, because sufficient statistical power is not only required in grant applications and peer-reviewed publications but is also essential to reliable research results. However, sample size determination is not straightforward for mediation analysis of longitudinal designs. To facilitate planning the sample size for longitudinal mediation studies with a multilevel mediation model, this article provides the sample sizes required to achieve 80% power, obtained by simulation under various sizes of the mediation effect, within-subject correlations and numbers of repeated measures. The sample size calculation is based on three commonly used mediation tests: Sobel's method, the distribution of the product method and the bootstrap method. Among the three methods of testing the mediation effect, Sobel's method required the largest sample size to achieve 80% power. Bootstrapping and the distribution of the product method performed similarly and were more powerful than Sobel's method, as reflected by the relatively smaller sample sizes. For all three methods, the sample size required to achieve 80% power depended on the value of the ICC (i.e., the within-subject correlation); a larger ICC typically required a larger sample size. Simulation results also illustrated the advantage of the longitudinal study design. Sample size tables for the scenarios most often encountered in practice are also provided for convenient use. The extensive simulation study showed that the distribution of the product method and the bootstrap method have superior performance to Sobel's method, but the distribution of the product method is recommended for use in practice because of its lower computational load compared with the bootstrap method. An R package has been developed for the product method of sample size determination in longitudinal mediation study design.
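
    A minimal R implementation of Sobel's normal-theory test, one of the three mediation tests compared in the article; the coefficients and standard errors are illustrative assumptions, not values from the article's simulations.

    ```r
    # Sobel's test for a mediation effect a*b, using the first-order standard
    # error. Coefficient and standard-error values below are illustrative.
    sobel <- function(a, se_a, b, se_b) {
      z <- (a * b) / sqrt(b^2 * se_a^2 + a^2 * se_b^2)
      c(z = z, p = 2 * pnorm(-abs(z)))
    }
    sobel(a = 0.40, se_a = 0.10, b = 0.35, se_b = 0.12)
    # this normal-theory test is conservative, which is why the article finds it
    # needs larger samples than the bootstrap or distribution-of-product methods
    ```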

  11. The application of statistical and/or non-statistical sampling techniques by internal audit functions in the South African banking industry

    Directory of Open Access Journals (Sweden)

    D.P. van der Nest

    2015-03-01

    This article explores the use by internal audit functions of audit sampling techniques in order to test the effectiveness of controls in the banking sector. The article focuses specifically on the use of statistical and/or non-statistical sampling techniques by internal auditors. The focus of the research for this article was internal audit functions in the banking sector of South Africa. The results discussed in the article indicate that audit sampling is still used frequently as an audit evidence-gathering technique. Non-statistical sampling techniques are used more frequently than statistical sampling techniques for the evaluation of the sample. In addition, both techniques are regarded as important for the determination of the sample size and the selection of the sample items.

  12. A normative inference approach for optimal sample sizes in decisions from experience

    Science.gov (United States)

    Ostwald, Dirk; Starke, Ludger; Hertwig, Ralph

    2015-01-01

    “Decisions from experience” (DFE) refers to a body of work that emerged in research on behavioral decision making over the last decade. One of the major experimental paradigms employed to study experience-based choice is the “sampling paradigm,” which serves as a model of decision making under limited knowledge about the statistical structure of the world. In this paradigm respondents are presented with two payoff distributions, which, in contrast to standard approaches in behavioral economics, are specified not in terms of explicit outcome-probability information, but by the opportunity to sample outcomes from each distribution without economic consequences. Participants are encouraged to explore the distributions until they feel confident enough to decide which distribution they would prefer to draw from in a final trial involving real monetary payoffs. One commonly employed measure to characterize the behavior of participants in the sampling paradigm is the sample size, that is, the number of outcome draws which participants choose to obtain from each distribution prior to terminating sampling. A natural question that arises in this context concerns the “optimal” sample size, which could be used as a normative benchmark to evaluate human sampling behavior in DFE. In this theoretical study, we relate the DFE sampling paradigm to the classical statistical decision theoretic literature and, under a probabilistic inference assumption, evaluate optimal sample sizes for DFE. In our treatment we go beyond analytically established results by showing how the classical statistical decision theoretic framework can be used to derive optimal sample sizes under arbitrary, but numerically evaluable, constraints. Finally, we critically evaluate the value of deriving optimal sample sizes under this framework as testable predictions for the experimental study of sampling behavior in DFE. PMID:26441720

  13. Optimum sample size allocation to minimize cost or maximize power for the two-sample trimmed mean test.

    Science.gov (United States)

    Guo, Jiin-Huarng; Luh, Wei-Ming

    2009-05-01

    When planning a study, sample size determination is one of the most important tasks facing the researcher. The size will depend on the purpose of the study, the cost limitations, and the nature of the data. By specifying the standard deviation ratio and/or the sample size ratio, the present study considers the problem of heterogeneous variances and non-normality for Yuen's two-group test and develops sample size formulas to minimize the total cost or maximize the power of the test. For a given power, the sample size allocation ratio can be manipulated so that the proposed formulas can minimize the total cost, the total sample size, or the sum of total sample size and total cost. On the other hand, for a given total cost, the optimum sample size allocation ratio can maximize the statistical power of the test. After the sample size is determined, the present simulation applies Yuen's test to the sample generated, and then the procedure is validated in terms of Type I errors and power. Simulation results show that the proposed formulas can control Type I errors and achieve the desired power under the various conditions specified. Finally, the implications for determining sample sizes in experimental studies and future research are discussed.

  14. Sample size re-assessment leading to a raised sample size does not inflate type I error rate under mild conditions.

    Science.gov (United States)

    Broberg, Per

    2013-07-19

    One major concern with adaptive designs, such as sample size adjustable designs, has been the fear of inflating the type I error rate. In (Stat Med 23:1023-1038, 2004) it is, however, proven that when observations follow a normal distribution and the interim results show promise, meaning that the conditional power exceeds 50%, the type I error rate is protected. This bound and the distributional assumptions may seem to impose undesirable restrictions on the use of these designs. In (Stat Med 30:3267-3284, 2011) the possibility of going below 50% is explored and a region that permits an increased sample size without inflation is defined in terms of the conditional power at the interim. A criterion which is implicit in (Stat Med 30:3267-3284, 2011) is derived by elementary methods and expressed in terms of the test statistic at the interim to simplify practical use. Mathematical and computational details concerning this criterion are exhibited. Under very general conditions the type I error rate is preserved under sample size adjustable schemes that permit a raise. The main result states that for normally distributed observations, raising the sample size when the result looks promising, where the definition of promising depends on the amount of knowledge gathered so far, guarantees the protection of the type I error rate. Also, in the many situations where the test statistic approximately follows a normal law, the deviation from the main result remains negligible. This article provides details regarding the Weibull and binomial distributions and indicates how one may approach these distributions within the current setting. There is thus reason to consider such designs more often, since they offer a means of adjusting an important design feature at little or no cost in terms of error rate.

  15. Sample sizes and model comparison metrics for species distribution models

    Science.gov (United States)

    B.B. Hanberry; H.S. He; D.C. Dey

    2012-01-01

    Species distribution models use small samples to produce continuous distribution maps. The question of how small a sample can be to produce an accurate model generally has been answered based on comparisons to maximum sample sizes of 200 observations or fewer. In addition, model comparisons often are made with the kappa statistic, which has become controversial....

  16. Differentiating gold nanorod samples using particle size and shape distributions from transmission electron microscope images

    Science.gov (United States)

    Grulke, Eric A.; Wu, Xiaochun; Ji, Yinglu; Buhr, Egbert; Yamamoto, Kazuhiro; Song, Nam Woong; Stefaniak, Aleksandr B.; Schwegler-Berry, Diane; Burchett, Woodrow W.; Lambert, Joshua; Stromberg, Arnold J.

    2018-04-01

    Size and shape distributions of gold nanorod samples are critical to their physico-chemical properties, especially their longitudinal surface plasmon resonance. This interlaboratory comparison study developed methods for measuring and evaluating size and shape distributions for gold nanorod samples using transmission electron microscopy (TEM) images. The objective was to determine whether two different samples, which had different performance attributes in their application, were different with respect to their size and/or shape descriptor distributions. Touching particles in the captured images were identified using a ruggedness shape descriptor. Nanorods could be distinguished from nanocubes using an elongational shape descriptor. A non-parametric statistical test showed that cumulative distributions of an elongational shape descriptor, that is, the aspect ratio, were statistically different between the two samples for all laboratories. While the scale parameters of size and shape distributions were similar for both samples, the width parameters of size and shape distributions were statistically different. This protocol fulfills an important need for a standardized approach to measure gold nanorod size and shape distributions for applications in which quantitative measurements and comparisons are important. Furthermore, the validated protocol workflow can be automated, thus providing consistent and rapid measurements of nanorod size and shape distributions for researchers, regulatory agencies, and industry.
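
    The abstract does not name the specific non-parametric test, so the sketch below uses a two-sample Kolmogorov-Smirnov test on simulated aspect-ratio values as a stand-in for the comparison of cumulative shape-descriptor distributions between two samples; all data here are simulated, not TEM measurements.

    ```r
    # Sketch of the statistical comparison step: a non-parametric two-sample
    # test on the cumulative distributions of a shape descriptor (aspect ratio)
    # measured for two nanorod samples. Simulated stand-in data.
    set.seed(5)
    aspect_A <- rnorm(300, mean = 3.8, sd = 0.4)
    aspect_B <- rnorm(300, mean = 4.1, sd = 0.4)
    ks.test(aspect_A, aspect_B)    # Kolmogorov-Smirnov test of the two CDFs
    ```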

  17. Effect size and statistical power in the rodent fear conditioning literature - A systematic review.

    Science.gov (United States)

    Carneiro, Clarissa F D; Moulin, Thiago C; Macleod, Malcolm R; Amaral, Olavo B

    2018-01-01

    Proposals to increase research reproducibility frequently call for focusing on effect sizes instead of p values, as well as for increasing the statistical power of experiments. However, it is unclear to what extent these two concepts are indeed taken into account in basic biomedical science. To study this in a real-case scenario, we performed a systematic review of effect sizes and statistical power in studies on learning of rodent fear conditioning, a widely used behavioral task to evaluate memory. Our search criteria yielded 410 experiments comparing control and treated groups in 122 articles. Interventions had a mean effect size of 29.5%, and amnesia caused by memory-impairing interventions was nearly always partial. Mean statistical power to detect the average effect size observed in well-powered experiments with significant differences (37.2%) was 65%, and was lower among studies with non-significant results. Only one article reported a sample size calculation, and our estimated sample size to achieve 80% power considering typical effect sizes and variances (15 animals per group) was reached in only 12.2% of experiments. Actual effect sizes correlated with effect size inferences made by readers on the basis of textual descriptions of results only when findings were non-significant, and neither effect size nor power correlated with study quality indicators, number of citations or impact factor of the publishing journal. In summary, effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited. Failure to take these concepts into consideration might limit attempts to improve reproducibility in this field of science.

  18. Statistical Analysis Of Tank 19F Floor Sample Results

    International Nuclear Information System (INIS)

    Harris, S.

    2010-01-01

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor, and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data from all six current scrape samples were pooled. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by an UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
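
    A hedged sketch of a standard normal-theory UCL95% computed from the three quantities the report names (number of samples, average, standard deviation); the report does not state its exact formula, and the numbers below are illustrative, not Tank 19F results.

    ```r
    # One-sided upper 95% confidence limit on a mean analyte concentration as a
    # function of the number of samples, the average and the standard deviation.
    # Values are illustrative only.
    ucl95 <- function(xbar, s, n) xbar + qt(0.95, df = n - 1) * s / sqrt(n)
    ucl95(xbar = 12.4, s = 3.1, n = 6)
    ```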

  19. Predictors of Citation Rate in Psychology: Inconclusive Influence of Effect and Sample Size.

    Science.gov (United States)

    Hanel, Paul H P; Haase, Jennifer

    2017-01-01

    In the present article, we investigate predictors of how often a scientific article is cited. Specifically, we focus on the influence of two often neglected predictors of citation rate: effect size and sample size, using samples from two psychological topical areas. Both can be considered indicators of the importance of an article and of post hoc (or observed) statistical power, and should, especially in applied fields, predict citation rates. In Study 1, effect size did not have an influence on citation rates across a topical area, both with and without controlling for numerous variables that have been previously linked to citation rates. In contrast, sample size predicted citation rates, but only while controlling for other variables. In Study 2, sample size and, in part, effect size predicted citation rates, indicating that the relations vary even between scientific topical areas. Statistically significant results had more citations in Study 2 but not in Study 1. The results indicate that the importance (or power) of scientific findings may not be as strongly related to citation rate as is generally assumed.

  20. The PowerAtlas: a power and sample size atlas for microarray experimental design and research

    Directory of Open Access Journals (Sweden)

    Wang Jelai

    2006-02-01

    Background: Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies. Results: To address this challenge, we have developed a Microarray PowerAtlas. The atlas enables estimation of statistical power by allowing investigators to appropriately plan studies by building upon previous studies that have similar experimental characteristics. Currently, there are sample sizes and power estimates based on 632 experiments from Gene Expression Omnibus (GEO). The PowerAtlas also permits investigators to upload their own pilot data and derive power and sample size estimates from these data. This resource will be updated regularly with new datasets from GEO and other databases such as The Nottingham Arabidopsis Stock Center (NASC). Conclusion: This resource provides a valuable tool for investigators who are planning efficient microarray studies and estimating required sample sizes.

  1. Sampling, Probability Models and Statistical Reasoning: Statistical Inference

    Indian Academy of Sciences (India)

    Sampling, Probability Models and Statistical Reasoning: Statistical Inference. Mohan Delampady and V R Padmawar. General Article, Resonance – Journal of Science Education, Volume 1, Issue 5, May 1996, pp. 49-58.

  2. Sample size for morphological traits of pigeonpea

    Directory of Open Access Journals (Sweden)

    Giovani Facco

    2015-12-01

    The objectives of this study were to determine the sample size (i.e., the number of plants) required to accurately estimate the average of morphological traits of pigeonpea (Cajanus cajan L.) and to check for variability in sample size between evaluation periods and seasons. Two uniformity trials (i.e., experiments without treatments) were conducted over two growing seasons. In the first season (2011/2012), the seeds were sown by broadcast seeding, and in the second season (2012/2013), the seeds were sown in rows spaced 0.50 m apart. The ground area in each experiment was 1,848 m2, and 360 plants were marked in the central area, in a 2 m × 2 m grid. Three morphological traits (number of nodes, plant height and stem diameter) were evaluated 13 times during the first season and 22 times in the second season. Measurements of all three morphological traits were normally distributed, as confirmed by the Kolmogorov-Smirnov test. Randomness was confirmed using the Run Test, and descriptive statistics were calculated. For each trait, the sample size (n) was calculated for semiamplitudes of the confidence interval (i.e., estimation errors) equal to 2, 4, 6, ..., 20% of the estimated mean, with a confidence coefficient (1-α) of 95%. Subsequently, n was fixed at 360 plants, and the estimation error of the estimated percentage of the average for each trait was calculated. Variability of the sample size for the pigeonpea culture was observed between the morphological traits evaluated, among the evaluation periods and between seasons. Therefore, to estimate the average of these traits (number of nodes, plant height and stem diameter) with an accuracy of 6% of the estimated mean in the different evaluation periods and in both seasons, at least 136 plants must be evaluated throughout the pigeonpea crop cycle.
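
    A small R sketch of the sample size criterion used in the article: the number of plants needed so that the semiamplitude of the 95% confidence interval equals a chosen percentage of the mean. The function name and the mean/standard deviation values are illustrative assumptions, not the pigeonpea data.

    ```r
    # Sample size so that the semiamplitude of the (1 - alpha) confidence
    # interval equals E_percent of the mean. Mean and SD values are illustrative.
    n_semiamplitude <- function(mean, sd, E_percent, conf = 0.95) {
      E <- E_percent / 100 * mean                 # allowed half-width in trait units
      n <- max(2, ceiling((qnorm(1 - (1 - conf) / 2) * sd / E)^2))
      for (i in 1:10) n <- max(2, ceiling((qt(1 - (1 - conf) / 2, df = n - 1) * sd / E)^2))
      n
    }
    n_semiamplitude(mean = 100, sd = 25, E_percent = 6)   # cf. the 6% accuracy criterion
    ```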

  3. Evaluation of Approaches to Analyzing Continuous Correlated Eye Data When Sample Size Is Small.

    Science.gov (United States)

    Huang, Jing; Huang, Jiayan; Chen, Yong; Ying, Gui-Shuang

    2018-02-01

    To evaluate the performance of commonly used statistical methods for analyzing continuous correlated eye data when sample size is small. We simulated correlated continuous data from two designs: (1) two eyes of a subject in two comparison groups; (2) two eyes of a subject in the same comparison group, under various sample sizes (5-50), inter-eye correlations (0-0.75) and effect sizes (0-0.8). Simulated data were analyzed using the paired t-test, the two-sample t-test, the Wald and score tests using generalized estimating equations (GEE), and the F-test using a linear mixed-effects model (LMM). We compared type I error rates and statistical powers, and demonstrated the analysis approaches by analyzing two real datasets. In design 1, the paired t-test and LMM perform better than GEE, with nominal type I error rates and higher statistical power. In design 2, no test performs uniformly well: the two-sample t-test (using the average of the two eyes or a random eye) achieves better control of the type I error but yields lower statistical power. In both designs, the GEE Wald test inflates the type I error rate and the GEE score test has lower power. When sample size is small, some commonly used statistical methods do not perform well. The paired t-test and LMM perform best when the two eyes of a subject are in two different comparison groups, and the t-test using the average of the two eyes performs best when the two eyes are in the same comparison group. The study design should be considered when selecting the appropriate analysis approach.

  4. Sample Size and Saturation in PhD Studies Using Qualitative Interviews

    Directory of Open Access Journals (Sweden)

    Mark Mason

    2010-08-01

    Full Text Available A number of issues can affect sample size in qualitative research; however, the guiding principle should be the concept of saturation. This has been explored in detail by a number of authors but is still hotly debated, and some say little understood. A sample of PhD studies using qualitative approaches, with qualitative interviews as the method of data collection, was taken from theses.com and their contents analysed for sample size. Five hundred and sixty studies were identified that fitted the inclusion criteria. Results showed that the mean sample size was 31; however, the distribution was non-random, with a statistically significant proportion of studies presenting sample sizes that were multiples of ten. These results are discussed in relation to saturation. They suggest a pre-meditated approach that is not wholly congruent with the principles of qualitative research. URN: urn:nbn:de:0114-fqs100387

  5. Sample size estimation and sampling techniques for selecting a representative sample

    Directory of Open Access Journals (Sweden)

    Aamir Omair

    2014-01-01

    Full Text Available Introduction: The purpose of this article is to provide a general understanding of the concepts of sampling as applied to health-related research. Sample Size Estimation: It is important to select a representative sample in quantitative research in order to be able to generalize the results to the target population. The sample should be of the required sample size and must be selected using an appropriate probability sampling technique. There are many hidden biases which can adversely affect the outcome of the study. Important factors to consider for estimating the sample size include the size of the study population, the confidence level, the expected proportion of the outcome variable (for categorical variables) or the standard deviation of the outcome variable (for numerical variables), and the required precision (margin of accuracy) of the study. The greater the required precision, the larger the required sample size. Sampling Techniques: The probability sampling techniques applied for health-related research include simple random sampling, systematic random sampling, stratified random sampling, cluster sampling, and multistage sampling. These are recommended over the nonprobability sampling techniques, because the results of the study can be generalized to the target population.
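
    For the categorical-outcome case, the factors listed above enter the usual formula for estimating a proportion, n = z² p(1-p) / d², optionally corrected for a finite population. A sketch of that calculation (Cochran's formula; the variable names and example numbers are illustrative):

```python
import math
from scipy import stats

def sample_size_proportion(p, margin, confidence=0.95, population=None):
    """Cochran's sample-size formula for a proportion, with an
    optional finite-population correction."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)   # finite-population correction
    return math.ceil(n)

# e.g., 50% expected proportion, 5% precision, 95% confidence -> 385
print(sample_size_proportion(0.5, 0.05))
```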

  6. Effect size and statistical power in the rodent fear conditioning literature – A systematic review

    Science.gov (United States)

    Macleod, Malcolm R.

    2018-01-01

    Proposals to increase research reproducibility frequently call for focusing on effect sizes instead of p values, as well as for increasing the statistical power of experiments. However, it is unclear to what extent these two concepts are indeed taken into account in basic biomedical science. To study this in a real-case scenario, we performed a systematic review of effect sizes and statistical power in studies on learning of rodent fear conditioning, a widely used behavioral task to evaluate memory. Our search criteria yielded 410 experiments comparing control and treated groups in 122 articles. Interventions had a mean effect size of 29.5%, and amnesia caused by memory-impairing interventions was nearly always partial. Mean statistical power to detect the average effect size observed in well-powered experiments with significant differences (37.2%) was 65%, and was lower among studies with non-significant results. Only one article reported a sample size calculation, and our estimated sample size to achieve 80% power considering typical effect sizes and variances (15 animals per group) was reached in only 12.2% of experiments. Actual effect sizes correlated with effect size inferences made by readers on the basis of textual descriptions of results only when findings were non-significant, and neither effect size nor power correlated with study quality indicators, number of citations or impact factor of the publishing journal. In summary, effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited. Failure to take these concepts into consideration might limit attempts to improve reproducibility in this field of science. PMID:29698451

  7. Statistical distribution sampling

    Science.gov (United States)

    Johnson, E. S.

    1975-01-01

    Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.

  8. Statistical analyses to support guidelines for marine avian sampling. Final report

    Science.gov (United States)

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger than the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical
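
    Item 2 of the components listed above describes power curves for one-sample, one-tailed Monte Carlo tests of a small-sample mean against a reference distribution. A simplified sketch of that kind of power calculation, using an assumed negative-binomial reference for counts (the parameter values are illustrative, not taken from the report):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_power_hotspot(n, effect=3.0, ref_mean=1.0, ref_disp=0.5,
                     alpha=0.05, n_null=2000, n_sim=1000):
    """Power of a one-sample, one-tailed Monte Carlo test that the mean
    count in a spatial unit exceeds a negative-binomial reference
    distribution (a sketch; parameters are illustrative)."""
    def nb_draws(mean, disp, size):
        # numpy parameterises the negative binomial by (n, p)
        p = disp / (disp + mean)
        return rng.negative_binomial(disp, p, size)

    # null distribution of the sample mean under the reference
    null_means = nb_draws(ref_mean, ref_disp, (n_null, n)).mean(axis=1)
    crit = np.quantile(null_means, 1 - alpha)

    # proportion of "hotspot" samples whose mean exceeds the critical value
    hot_means = nb_draws(effect * ref_mean, ref_disp, (n_sim, n)).mean(axis=1)
    return float((hot_means > crit).mean())

print(mc_power_hotspot(n=10))
```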

  9. STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL95%) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
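
    The UCL95% described above has the standard one-sided form mean + t(0.95, n-1) · s / √n. A small sketch of that computation (the concentrations shown are made up for illustration, not Tank 18F results):

```python
import math
from scipy import stats

def ucl95(values):
    """One-sided upper 95% confidence limit on the mean:
    UCL95 = mean + t(0.95, n-1) * s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    t = stats.t.ppf(0.95, df=n - 1)
    return mean + t * s / math.sqrt(n)

# six hypothetical scrape-sample concentrations (illustrative numbers only)
print(ucl95([1.2, 0.9, 1.4, 1.1, 1.3, 1.0]))
```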

  10. Optimal sample size for probability of detection curves

    International Nuclear Information System (INIS)

    Annis, Charles; Gandossi, Luca; Martin, Oliver

    2013-01-01

    Highlights: • We investigate sample size requirements to develop probability of detection curves. • We develop simulations to determine effective inspection target sizes, number and distribution. • We summarize these findings and provide guidelines for the NDE practitioner. -- Abstract: The use of probability of detection curves to quantify the reliability of non-destructive examination (NDE) systems is common in the aeronautical industry, but relatively less so in the nuclear industry, at least in European countries. Due to the nature of the components being inspected, sample sizes tend to be much lower. This makes the manufacturing of test pieces with representative flaws, in sufficient numbers to draw statistical conclusions on the reliability of the NDT system under investigation, quite costly. The European Network for Inspection and Qualification (ENIQ) has developed an inspection qualification methodology, referred to as the ENIQ Methodology. It has become widely used in many European countries and provides assurance on the reliability of NDE systems, but only qualitatively. The need to quantify the output of inspection qualification has become more important as structural reliability modelling and quantitative risk-informed in-service inspection methodologies become more widely used. A measure of the NDE reliability is necessary to quantify risk reduction after inspection and probability of detection (POD) curves provide such a metric. The Joint Research Centre, Petten, The Netherlands supported ENIQ by investigating the question of the sample size required to determine a reliable POD curve. As mentioned earlier, manufacturing of test pieces with defects that are typically found in nuclear power plants (NPPs) is usually quite expensive. Thus there is a tendency to reduce sample sizes, which in turn increases the uncertainty associated with the resulting POD curve. The main question in conjunction with POD curves is the appropriate sample size. Not

  11. Computing Confidence Bounds for Power and Sample Size of the General Linear Univariate Model

    OpenAIRE

    Taylor, Douglas J.; Muller, Keith E.

    1995-01-01

    The power of a test, the probability of rejecting the null hypothesis in favor of an alternative, may be computed using estimates of one or more distributional parameters. Statisticians frequently fix mean values and calculate power or sample size using a variance estimate from an existing study. Hence computed power becomes a random variable for a fixed sample size. Likewise, the sample size necessary to achieve a fixed power varies randomly. Standard statistical practice requires reporting ...

  12. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    Science.gov (United States)

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra-Bentler's mean-scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of the Satorra-Bentler statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  13. Statistical sampling method for releasing decontaminated vehicles

    International Nuclear Information System (INIS)

    Lively, J.W.; Ware, J.A.

    1996-01-01

    Earth moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method (MIL-STD-105E, "Sampling Procedures and Tables for Inspection by Attributes") for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates for and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been successfully used for 1 year at the former uranium mill site in Monticello, Utah (a CERCLA regulated clean-up site). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on the Monticello Project has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site

  14. Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics

    DEFF Research Database (Denmark)

    Holland, Dominic; Wang, Yunpeng; Thompson, Wesley K

    2016-01-01

    Genome-wide Association Studies (GWAS) result in millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric......-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing...... for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately...

  15. Discrepancies in sample size calculations and data analyses reported in randomised trials: comparison of publications with protocols

    DEFF Research Database (Denmark)

    Chan, A.W.; Hrobjartsson, A.; Jorgensen, K.J.

    2008-01-01

    OBJECTIVE: To evaluate how often sample size calculations and methods of statistical analysis are pre-specified or changed in randomised trials. DESIGN: Retrospective cohort study. DATA SOURCE: Protocols and journal publications of published randomised parallel group trials initially approved...... in 1994-5 by the scientific-ethics committees for Copenhagen and Frederiksberg, Denmark (n=70). MAIN OUTCOME MEASURE: Proportion of protocols and publications that did not provide key information about sample size calculations and statistical methods; proportion of trials with discrepancies between...... of handling missing data was described in 16 protocols and 49 publications. 39/49 protocols and 42/43 publications reported the statistical test used to analyse primary outcome measures. Unacknowledged discrepancies between protocols and publications were found for sample size calculations (18/34 trials...

  16. Choosing a suitable sample size in descriptive sampling

    International Nuclear Information System (INIS)

    Lee, Yong Kyun; Choi, Dong Hoon; Cha, Kyung Joon

    2010-01-01

    Descriptive sampling (DS) is an alternative to crude Monte Carlo sampling (CMCS) in finding solutions to structural reliability problems. It is known to be an effective sampling method in approximating the distribution of a random variable because it uses the deterministic selection of sample values and their random permutation. However, because this method is difficult to apply to complex simulations, the sample size is occasionally determined without thorough consideration. Input sample variability may cause the sample size to change between runs, leading to poor simulation results. This paper proposes a numerical method for choosing a suitable sample size for use in DS. Using this method, one can estimate a more accurate probability of failure in a reliability problem while running a minimal number of simulations. The method is then applied to several examples and compared with CMCS and conventional DS to validate its usefulness and efficiency.
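
    The defining feature of descriptive sampling, deterministic sample values combined with a random permutation, can be sketched in a few lines: each input variable receives the quantiles of its distribution at evenly spaced probabilities, and the values are then shuffled independently. The distributions and sample size below are illustrative choices, not from the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def descriptive_sample(dists, n):
    """Descriptive sampling: deterministic quantile values for each input
    variable, independently shuffled to break cross-variable ordering."""
    u = (np.arange(1, n + 1) - 0.5) / n        # deterministic probabilities
    cols = []
    for dist in dists:
        vals = dist.ppf(u)                     # deterministic sample values
        rng.shuffle(vals)                      # random permutation
        cols.append(vals)
    return np.column_stack(cols)

# e.g., two random inputs of a reliability problem (illustrative distributions)
X = descriptive_sample([stats.norm(10, 2), stats.lognorm(0.3)], n=200)
print(X.shape, X.mean(axis=0))
```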

  17. Sample Size Calculations for Population Size Estimation Studies Using Multiplier Methods With Respondent-Driven Sampling Surveys.

    Science.gov (United States)

    Fearon, Elizabeth; Chabata, Sungai T; Thompson, Jennifer A; Cowan, Frances M; Hargreaves, James R

    2017-09-14

    While guidance exists for obtaining population size estimates using multiplier methods with respondent-driven sampling surveys, we lack specific guidance for making sample size decisions. To guide the design of multiplier method population size estimation studies using respondent-driven sampling surveys to reduce the random error around the estimate obtained. The population size estimate is obtained by dividing the number of individuals receiving a service or the number of unique objects distributed (M) by the proportion of individuals in a representative survey who report receipt of the service or object (P). We have developed an approach to sample size calculation, interpreting methods to estimate the variance around estimates obtained using multiplier methods in conjunction with research into design effects and respondent-driven sampling. We describe an application to estimate the number of female sex workers in Harare, Zimbabwe. There is high variance in estimates. Random error around the size estimate reflects uncertainty from M and P, particularly when the estimate of P in the respondent-driven sampling survey is low. As expected, sample size requirements are higher when the design effect of the survey is assumed to be greater. We suggest a method for investigating the effects of sample size on the precision of a population size estimate obtained using multipler methods and respondent-driven sampling. Uncertainty in the size estimate is high, particularly when P is small, so balancing against other potential sources of bias, we advise researchers to consider longer service attendance reference periods and to distribute more unique objects, which is likely to result in a higher estimate of P in the respondent-driven sampling survey. ©Elizabeth Fearon, Sungai T Chabata, Jennifer A Thompson, Frances M Cowan, James R Hargreaves. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 14.09.2017.
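
    The estimator described above is N = M / P. A sketch of the point estimate with a delta-method confidence interval, treating M as known and inflating the variance of P by an assumed design effect for the respondent-driven sampling survey (all numbers and the design-effect value are illustrative):

```python
import math
from scipy import stats

def multiplier_estimate_ci(M, p_hat, n, design_effect=2.0, confidence=0.95):
    """Population size estimate N = M / P from a multiplier study, with a
    delta-method confidence interval; Var(P) is inflated by a design effect
    for the respondent-driven sampling survey (sketch only, M treated as fixed)."""
    N_hat = M / p_hat
    var_p = design_effect * p_hat * (1 - p_hat) / n
    se_N = M * math.sqrt(var_p) / p_hat ** 2          # delta method
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return N_hat, (N_hat - z * se_N, N_hat + z * se_N)

# e.g., 1000 unique objects distributed, 20% reported receipt among 400 respondents
print(multiplier_estimate_ci(M=1000, p_hat=0.20, n=400))
```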

  18. [Formal sample size calculation and its limited validity in animal studies of medical basic research].

    Science.gov (United States)

    Mayer, B; Muche, R

    2013-01-01

    Animal studies are highly relevant for basic medical research, although their use is the subject of controversial public debate. Thus, an optimal sample size for these projects should be aimed at from a biometrical point of view. Statistical sample size calculation is usually the appropriate methodology in planning medical research projects. However, the required information is often not valid or is only available during the course of an animal experiment. This article critically discusses the validity of formal sample size calculation for animal studies. Within the discussion, some requirements are formulated to fundamentally regulate the process of sample size determination for animal experiments.

  19. A visual basic program to generate sediment grain-size statistics and to extrapolate particle distributions

    Science.gov (United States)

    Poppe, L.J.; Eliason, A.H.; Hastings, M.E.

    2004-01-01

    Measures that describe and summarize sediment grain-size distributions are important to geologists because of the large amount of information contained in textural data sets. Statistical methods are usually employed to simplify the necessary comparisons among samples and quantify the observed differences. The two statistical methods most commonly used by sedimentologists to describe particle distributions are mathematical moments (Krumbein and Pettijohn, 1938) and inclusive graphics (Folk, 1974). The choice of which of these statistical measures to use is typically governed by the amount of data available (Royse, 1970). If the entire distribution is known, the method of moments may be used; if the next to last accumulated percent is greater than 95, inclusive graphics statistics can be generated. Unfortunately, earlier programs designed to describe sediment grain-size distributions statistically do not run in a Windows environment, do not allow extrapolation of the distribution's tails, or do not generate both moment and graphic statistics (Kane and Hubert, 1963; Collias et al., 1963; Schlee and Webster, 1967; Poppe et al., 2000). Owing to analytical limitations, electro-resistance multichannel particle-size analyzers, such as Coulter Counters, commonly truncate the tails of the fine-fraction part of grain-size distributions. These devices do not detect fine clay in the 0.6–0.1 μm range (part of the 11-phi and all of the 12-phi and 13-phi fractions). Although size analyses performed down to 0.6 μm are adequate for most freshwater and nearshore marine sediments, samples from many deeper water marine environments (e.g. rise and abyssal plain) may contain significant material in the fine clay fraction, and these analyses benefit from extrapolation. The program (GSSTAT) described herein generates statistics to characterize sediment grain-size distributions and can extrapolate the fine-grained end of the particle distribution. It is written in Microsoft
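
    The method of moments referred to above computes the textural statistics directly from the weight percentage in each phi class. A sketch of those standard formulas (mean, sorting, skewness and kurtosis in phi units), independent of the GSSTAT implementation; the class midpoints and weights below are hypothetical:

```python
import numpy as np

def phi_moment_statistics(phi_midpoints, weight_percent):
    """Method-of-moments grain-size statistics (in phi units) computed
    from weight percentages per size class."""
    f = np.asarray(weight_percent) / 100.0
    m = np.asarray(phi_midpoints)
    mean = np.sum(f * m)
    sd = np.sqrt(np.sum(f * (m - mean) ** 2))        # sorting
    skew = np.sum(f * (m - mean) ** 3) / sd ** 3
    kurt = np.sum(f * (m - mean) ** 4) / sd ** 4
    return mean, sd, skew, kurt

# hypothetical half-phi classes and weight percentages (must sum to 100)
print(phi_moment_statistics([1.25, 1.75, 2.25, 2.75, 3.25],
                            [10, 25, 35, 20, 10]))
```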

  20. Statistical Symbolic Execution with Informed Sampling

    Science.gov (United States)

    Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco

    2014-01-01

    Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that the informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.

  1. Impact of shoe size in a sample of elderly individuals

    Directory of Open Access Journals (Sweden)

    Daniel López-López

    Full Text Available Summary Introduction: The use of an improper shoe size is common in older people and is believed to have a detrimental effect on the quality of life related to foot health. The objective is to describe and compare, in a sample of participants, the impact of shoes that fit properly or improperly, as well as to analyze the scores related to foot health and overall health. Method: A sample of 64 participants, with a mean age of 75.3±7.9 years, attended an outpatient center where self-report data were recorded, the sizes of the feet and footwear were measured, and the scores were compared between the group wearing the correct shoe size and the group wearing an incorrect shoe size, using the Spanish version of the Foot Health Status Questionnaire. Results: The group wearing an improper shoe size showed poorer quality of life regarding overall health and specifically foot health. Differences between groups were evaluated using a t-test for independent samples and were statistically significant (p<0.05) for the dimensions of pain, function, footwear, overall foot health, and social function. Conclusion: Inadequate shoe size has a significant negative impact on quality of life related to foot health. The degree of negative impact seems to be associated with age, sex, and body mass index (BMI).

  2. Sample size in qualitative interview studies

    DEFF Research Database (Denmark)

    Malterud, Kirsti; Siersma, Volkert Dirk; Guassora, Ann Dorrit Kristiane

    2016-01-01

    Sample sizes must be ascertained in qualitative studies like in quantitative studies but not by the same means. The prevailing concept for sample size in qualitative studies is “saturation.” Saturation is closely tied to a specific methodology, and the term is inconsistently applied. We propose...... the concept “information power” to guide adequate sample size for qualitative studies. Information power indicates that the more information the sample holds that is relevant for the actual study, the fewer participants are needed. We suggest that the size of a sample with sufficient information power...... and during data collection of a qualitative study is discussed....

  3. Sample Size for Tablet Compression and Capsule Filling Events During Process Validation.

    Science.gov (United States)

    Charoo, Naseem Ahmad; Durivage, Mark; Rahman, Ziyaur; Ayad, Mohamad Haitham

    2017-12-01

    During solid dosage form manufacturing, the uniformity of dosage units (UDU) is ensured by testing samples at 2 stages, that is, the blend stage and the tablet compression or capsule/powder filling stage. The aim of this work is to propose a sample size selection approach based on quality risk management principles for process performance qualification (PPQ) and continued process verification (CPV) stages by linking UDU to potential formulation and process risk factors. The Bayes success run theorem appeared to be the most appropriate approach among the various methods considered in this work for computing sample size for PPQ. The sample sizes for high-risk (reliability level of 99%), medium-risk (reliability level of 95%), and low-risk factors (reliability level of 90%) were estimated to be 299, 59, and 29, respectively. Risk-based assignment of reliability levels was supported by the fact that at a low defect rate, the confidence to detect out-of-specification units decreases, which must be compensated for by an increase in sample size to enhance the confidence in estimation. Based on the level of knowledge acquired during PPQ and the level of knowledge further required to comprehend the process, the sample size for CPV was calculated using Bayesian statistics to accomplish a reduced sampling design for CPV. Copyright © 2017 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
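
    The sample sizes quoted above are consistent with the success-run relation n = ln(1 - C) / ln(R) evaluated at 95% confidence; a short sketch of that calculation (our reading of the numbers, not the authors' code):

```python
import math

def success_run_sample_size(reliability, confidence=0.95):
    """Success-run sample size: smallest n of consecutive passing units that
    demonstrates reliability >= R at the given confidence,
    n = ln(1 - C) / ln(R), rounded up."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

for r in (0.99, 0.95, 0.90):     # high-, medium-, low-risk reliability levels
    print(r, success_run_sample_size(r))   # -> 299, 59, 29
```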

  4. 42 CFR 402.109 - Statistical sampling.

    Science.gov (United States)

    2010-10-01

    ... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study, if based upon an appropriate sampling and computed by valid statistical methods, constitute prima... § 402.1. (c) Burden of proof. Once CMS or OIG has made a prima facie case, the burden is on the...

  5. How Sample Size Affects a Sampling Distribution

    Science.gov (United States)

    Mulekar, Madhuri S.; Siegel, Murray H.

    2009-01-01

    If students are to understand inferential statistics successfully, they must have a profound understanding of the nature of the sampling distribution. Specifically, they must comprehend the determination of the expected value and standard error of a sampling distribution as well as the meaning of the central limit theorem. Many students in a high…

  6. Calculating Confidence, Uncertainty, and Numbers of Samples When Using Statistical Sampling Approaches to Characterize and Clear Contaminated Areas

    Energy Technology Data Exchange (ETDEWEB)

    Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.; Amidan, Brett G.

    2013-04-27

    for FNR > 0 2. qualitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0 3. quantitative data (e.g., contaminant concentrations expressed as CFU/cm2) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0 4. quantitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0. For Situation 2, the hotspot sampling approach provides for stating with Z% confidence that a hotspot of specified shape and size with detectable contamination will be found. Also for Situation 2, the CJR approach provides for stating with X% confidence that at least Y% of the decision area does not contain detectable contamination. Forms of these statements for the other three situations are discussed in Section 2.2. Statistical methods that account for FNR > 0 currently only exist for the hotspot sampling approach with qualitative data (or quantitative data converted to qualitative data). This report documents the current status of methods and formulas for the hotspot and CJR sampling approaches. Limitations of these methods are identified. Extensions of the methods that are applicable when FNR = 0 to account for FNR > 0, or to address other limitations, will be documented in future revisions of this report if future funding supports the development of such extensions. For quantitative data, this report also presents statistical methods and formulas for 1. quantifying the uncertainty in measured sample results 2. estimating the true surface concentration corresponding to a surface sample 3. quantifying the uncertainty of the estimate of the true surface concentration. All of the methods and formulas discussed in the report were applied to example situations to illustrate application of the methods and interpretation of the results.

  7. Sample size determination for disease prevalence studies with partially validated data.

    Science.gov (United States)

    Qiu, Shi-Fang; Poon, Wai-Yin; Tang, Man-Lai

    2016-02-01

    Disease prevalence is an important topic in medical research, and its study is based on data that are obtained by classifying subjects according to whether a disease has been contracted. Classification can be conducted with high-cost gold standard tests or low-cost screening tests, but the latter are subject to the misclassification of subjects. As a compromise between the two, many research studies use partially validated datasets in which all data points are classified by fallible tests, and some of the data points are validated in the sense that they are also classified by the completely accurate gold-standard test. In this article, we investigate the determination of sample sizes for disease prevalence studies with partially validated data. We use two approaches. The first is to find sample sizes that can achieve a pre-specified power of a statistical test at a chosen significance level, and the second is to find sample sizes that can control the width of a confidence interval with a pre-specified confidence level. Empirical studies have been conducted to demonstrate the performance of various testing procedures with the proposed sample sizes. The applicability of the proposed methods is illustrated by a real-data example. © The Author(s) 2012.

  8. Supporting Students to Develop Concepts Underlying Sampling and to Shuttle Between Contextual and Statistical Spheres

    NARCIS (Netherlands)

    Bakker, A.; Dierdorp, A.; Maanen, J.A. van; Eijkelhof, H.M.C.

    2012-01-01

    To stimulate students’ shuttling between contextual and statistical spheres, we based tasks on professional practices. This article focuses on two tasks to support reasoning about sampling by students aged 16-17. The purpose of the tasks was to find out which smaller sample size would have been

  9. Improved sample size determination for attributes and variables sampling

    International Nuclear Information System (INIS)

    Stirpe, D.; Picard, R.R.

    1985-01-01

    Earlier INMM papers have addressed the attributes/variables problem and, under conservative/limiting approximations, have reported analytical solutions for the attributes and variables sample sizes. Through computer simulation of this problem, we have calculated attributes and variables sample sizes as a function of falsification, measurement uncertainties, and required detection probability without using approximations. Using realistic assumptions for uncertainty parameters of measurement, the simulation results support the conclusions: (1) previously used conservative approximations can be expensive because they lead to larger sample sizes than needed; and (2) the optimal verification strategy, as well as the falsification strategy, are highly dependent on the underlying uncertainty parameters of the measurement instruments. 1 ref., 3 figs

  10. Overestimation of test performance by ROC analysis: Effect of small sample size

    International Nuclear Information System (INIS)

    Seeley, G.W.; Borgstrom, M.C.; Patton, D.D.; Myers, K.J.; Barrett, H.H.

    1984-01-01

    New imaging systems are often observer-rated by ROC techniques. For practical reasons the number of different images, or sample size (SS), is kept small. Any systematic bias due to small SS would bias system evaluation. The authors set about to determine whether the area under the ROC curve (AUC) would be systematically biased by small SS. Monte Carlo techniques were used to simulate observer performance in distinguishing signal (SN) from noise (N) on a 6-point scale; P(SN) = P(N) = .5. Four sample sizes (15, 25, 50 and 100 each of SN and N), three ROC slopes (0.8, 1.0 and 1.25), and three intercepts (0.8, 1.0 and 1.25) were considered. In each of the 36 combinations of SS, slope and intercept, 2000 runs were simulated. Results showed a systematic bias: the observed AUC exceeded the expected AUC in every one of the 36 combinations for all sample sizes, with the smallest sample sizes having the largest bias. This suggests that evaluations of imaging systems using ROC curves based on small sample size systematically overestimate system performance. The effect is consistent but subtle (maximum 10% of AUC standard deviation), and is probably masked by the s.d. in most practical settings. Although there is a statistically significant effect (F = 33.34, P<0.0001) due to sample size, none was found for either the ROC curve slope or intercept. Overestimation of test performance by small SS seems to be an inherent characteristic of the ROC technique that has not previously been described

  11. Statistical issues in reporting quality data: small samples and casemix variation.

    Science.gov (United States)

    Zaslavsky, A M

    2001-12-01

    To present two key statistical issues that arise in analysis and reporting of quality data. Casemix variation is relevant to quality reporting when the units being measured have differing distributions of patient characteristics that also affect the quality outcome. When this is the case, adjustment using stratification or regression may be appropriate. Such adjustments may be controversial when the patient characteristic does not have an obvious relationship to the outcome. Stratified reporting poses problems for sample size and reporting format, but may be useful when casemix effects vary across units. Although there are no absolute standards of reliability, high reliabilities (interunit F ≥ 10 or reliability ≥ 0.9) are desirable for distinguishing above- and below-average units. When small or unequal sample sizes complicate reporting, precision may be improved using indirect estimation techniques that incorporate auxiliary information, and 'shrinkage' estimation can help to summarize the strength of evidence about units with small samples. With broader understanding of casemix adjustment and methods for analyzing small samples, quality data can be analysed and reported more accurately.

  12. Contributions to sampling statistics

    CERN Document Server

    Conti, Pier; Ranalli, Maria

    2014-01-01

    This book contains a selection of the papers presented at the ITACOSM 2013 Conference, held in Milan in June 2013. ITACOSM is the bi-annual meeting of the Survey Sampling Group S2G of the Italian Statistical Society, intended as an international  forum of scientific discussion on the developments of theory and application of survey sampling methodologies and applications in human and natural sciences. The book gathers research papers carefully selected from both invited and contributed sessions of the conference. The whole book appears to be a relevant contribution to various key aspects of sampling methodology and techniques; it deals with some hot topics in sampling theory, such as calibration, quantile-regression and multiple frame surveys, and with innovative methodologies in important topics of both sampling theory and applications. Contributions cut across current sampling methodologies such as interval estimation for complex samples, randomized responses, bootstrap, weighting, modeling, imputati...

  13. Maximum type I error rate inflation from sample size reassessment when investigators are blind to treatment labels.

    Science.gov (United States)

    Żebrowska, Magdalena; Posch, Martin; Magirr, Dominic

    2016-05-30

    Consider a parallel group trial for the comparison of an experimental treatment to a control, where the second-stage sample size may depend on the blinded primary endpoint data as well as on additional blinded data from a secondary endpoint. For the setting of normally distributed endpoints, we demonstrate that this may lead to an inflation of the type I error rate if the null hypothesis holds for the primary but not the secondary endpoint. We derive upper bounds for the inflation of the type I error rate, both for trials that employ random allocation and for those that use block randomization. We illustrate the worst-case sample size reassessment rule in a case study. For both randomization strategies, the maximum type I error rate increases with the effect size in the secondary endpoint and the correlation between endpoints. The maximum inflation increases with smaller block sizes if information on the block size is used in the reassessment rule. Based on our findings, we do not question the well-established use of blinded sample size reassessment methods with nuisance parameter estimates computed from the blinded interim data of the primary endpoint. However, we demonstrate that the type I error rate control of these methods relies on the application of specific, binding, pre-planned and fully algorithmic sample size reassessment rules and does not extend to general or unplanned sample size adjustments based on blinded data. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  14. Statistical literacy and sample survey results

    Science.gov (United States)

    McAlevey, Lynn; Sullivan, Charles

    2010-10-01

    Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.

  15. TRAN-STAT: statistics for environmental studies, Number 22. Comparison of soil-sampling techniques for plutonium at Rocky Flats

    International Nuclear Information System (INIS)

    Gilbert, R.O.; Bernhardt, D.E.; Hahn, P.B.

    1983-01-01

    A summary of a field soil sampling study conducted around the Rocky Flats, Colorado, plant in May 1977 is presented. Several different soil sampling techniques that had been used in the area were applied at four different sites. One objective was to compare the average 239-240Pu concentration values obtained by the various soil sampling techniques used. There was also interest in determining whether there are differences in the reproducibility of the various techniques and how the techniques compared with the proposed EPA technique of sampling to 1 cm depth. Statistically significant differences in average concentrations between the techniques were found. The differences could be largely related to the differences in sampling depth, the primary physical variable between the techniques. The reproducibility of the techniques was evaluated by comparing coefficients of variation. Differences between coefficients of variation were not statistically significant. Average (median) coefficients ranged from 21 to 42 percent for the five sampling techniques. A laboratory study indicated that various sample treatment and particle sizing techniques could increase the concentration of plutonium in the less than 10 micrometer size fraction by up to a factor of about 4 compared to the 2 mm size fraction.

  16. Optimal Sample Size for Probability of Detection Curves

    International Nuclear Information System (INIS)

    Annis, Charles; Gandossi, Luca; Martin, Oliver

    2012-01-01

    The use of Probability of Detection (POD) curves to quantify NDT reliability is common in the aeronautical industry, but relatively less so in the nuclear industry. The European Network for Inspection Qualification's (ENIQ) Inspection Qualification Methodology is based on the concept of Technical Justification, a document assembling all the evidence to assure that the NDT system in focus is indeed capable of finding the flaws for which it was designed. This methodology has become widely used in many countries, but the assurance it provides is usually of a qualitative nature. The need to quantify the output of inspection qualification has become more important, especially as structural reliability modelling and quantitative risk-informed in-service inspection methodologies become more widely used. To credit the inspections in structural reliability evaluations, a measure of the NDT reliability is necessary. A POD curve provides such a metric. In 2010 ENIQ developed a technical report on POD curves, reviewing the statistical models used to quantify inspection reliability. Further work was subsequently carried out to investigate the issue of optimal sample size for deriving a POD curve, so that adequate guidance could be given to the practitioners of inspection reliability. Manufacturing of test pieces with cracks that are representative of real defects found in nuclear power plants (NPP) can be very expensive. Thus there is a tendency to reduce sample sizes and in turn reduce the conservatism associated with the POD curve derived. Not much guidance on the correct sample size can be found in the published literature, where often qualitative statements are given with no further justification. The aim of this paper is to summarise the findings of such work. (author)
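
    A common way to obtain such a POD curve from hit/miss inspection data is a logistic regression on (log) flaw size, as described for example in MIL-HDBK-1823. A minimal sketch with made-up data (not from ENIQ or the JRC study):

```python
import numpy as np
import statsmodels.api as sm

# hypothetical hit/miss inspection data: flaw sizes (mm) and detection outcomes
size = np.array([0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0])
hit = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1])

# hit/miss POD model: logistic regression of detection on log flaw size
X = np.column_stack([np.ones(len(size)), np.log(size)])
fit = sm.Logit(hit, X).fit(disp=False)

def pod(a):
    """Estimated probability of detection at flaw size(s) a."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    return fit.predict(np.column_stack([np.ones(len(a)), np.log(a)]))

print(pod([1.0, 2.0, 4.0]))
```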

  17. The impact of sample size on the reproducibility of voxel-based lesion-deficit mappings.

    Science.gov (United States)

    Lorca-Puls, Diego L; Gajardo-Vidal, Andrea; White, Jitrachote; Seghier, Mohamed L; Leff, Alexander P; Green, David W; Crinion, Jenny T; Ludersdorfer, Philipp; Hope, Thomas M H; Bowman, Howard; Price, Cathy J

    2018-07-01

    This study investigated how sample size affects the reproducibility of findings from univariate voxel-based lesion-deficit analyses (e.g., voxel-based lesion-symptom mapping and voxel-based morphometry). Our effect of interest was the strength of the mapping between brain damage and speech articulation difficulties, as measured in terms of the proportion of variance explained. First, we identified a region of interest by searching on a voxel-by-voxel basis for brain areas where greater lesion load was associated with poorer speech articulation using a large sample of 360 right-handed English-speaking stroke survivors. We then randomly drew thousands of bootstrap samples from this data set that included either 30, 60, 90, 120, 180, or 360 patients. For each resample, we recorded effect size estimates and p values after conducting exactly the same lesion-deficit analysis within the previously identified region of interest and holding all procedures constant. The results show (1) how often small effect sizes in a heterogeneous population fail to be detected; (2) how effect size and its statistical significance varies with sample size; (3) how low-powered studies (due to small sample sizes) can greatly over-estimate as well as under-estimate effect sizes; and (4) how large sample sizes (N ≥ 90) can yield highly significant p values even when effect sizes are so small that they become trivial in practical terms. The implications of these findings for interpreting the results from univariate voxel-based lesion-deficit analyses are discussed. Copyright © 2018 The Author(s). Published by Elsevier Ltd. All rights reserved.
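
    The resampling design described above is straightforward to emulate: repeatedly draw bootstrap samples of a given size from the full cohort, rerun the lesion-deficit regression, and record the effect size (proportion of variance explained) and p value. A toy sketch with simulated data (not the authors' lesion data or code; the "true effect" is an arbitrary assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# hypothetical full cohort: lesion load in a region of interest vs articulation score
N = 360
lesion = rng.uniform(0, 1, N)
score = -0.4 * lesion + rng.normal(0, 1, N)     # a small assumed true effect

def resample_effects(n, n_boot=2000):
    """Bootstrap subsamples of size n; record R^2 and p value of the
    lesion-deficit regression in each resample."""
    r2, p = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, N, n)
        res = stats.linregress(lesion[idx], score[idx])
        r2[b], p[b] = res.rvalue ** 2, res.pvalue
    return r2, p

for n in (30, 90, 360):
    r2, p = resample_effects(n)
    print(n, round(r2.mean(), 3), round((p < 0.05).mean(), 2))
```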

  18. Sample size calculations for case-control studies

    Science.gov (United States)

    This R package can be used to calculate the required sample size for unconditional multivariate analyses of unmatched case-control studies. The sample sizes are for a scalar exposure effect, such as binary, ordinal or continuous exposures. The sample sizes can also be computed for scalar interaction effects. The analyses account for the effects of potential confounder variables that are also included in the multivariate logistic model.

  19. Development of sample size allocation program using hypergeometric distribution

    International Nuclear Information System (INIS)

    Kim, Hyun Tae; Kwack, Eun Ho; Park, Wan Soo; Min, Kyung Soo; Park, Chan Sik

    1996-01-01

    The objective of this research is the development of a sample allocation program using the hypergeometric distribution with an object-oriented method. When the IAEA (International Atomic Energy Agency) performs inspections, it simply applies a standard binomial distribution, which describes sampling with replacement, instead of a hypergeometric distribution, which describes sampling without replacement, in sample allocation to up to three verification methods. The objective of the IAEA inspection is the timely detection of diversion of significant quantities of nuclear material, therefore game theory is applied to its sampling plan. It is necessary to use the hypergeometric distribution directly or an approximate distribution to secure statistical accuracy. The improved binomial approximation developed by Mr. J. L. Jaech and the correctly applied binomial approximation are closer to the hypergeometric distribution in sample size calculation than the simply applied binomial approximation of the IAEA. Object-oriented programs for 1. sample approximate-allocation with the correctly applied standard binomial approximation, 2. sample approximate-allocation with the improved binomial approximation, and 3. sample approximate-allocation with the hypergeometric distribution were developed with Visual C++ and corresponding programs were developed with EXCEL (using Visual Basic for Applications). 8 tabs., 15 refs. (Author)
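
    Working directly with the hypergeometric distribution, the sample size needed to detect at least one falsified item with a required probability can be found by a simple search; a sketch using SciPy (the population and defect counts are illustrative, not IAEA figures):

```python
from scipy.stats import hypergeom

def sample_size_hypergeometric(population, defects, detect_prob=0.95):
    """Smallest n such that P(at least one defective item in the sample)
    >= detect_prob when sampling without replacement from a population
    containing the given number of defective (falsified) items."""
    for n in range(1, population + 1):
        p_miss = hypergeom.pmf(0, population, defects, n)  # P(no defect in sample)
        if 1 - p_miss >= detect_prob:
            return n
    return population

# e.g., 100 items of which 10 are assumed falsified (illustrative numbers)
print(sample_size_hypergeometric(100, 10))
```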

  20. Threshold-dependent sample sizes for selenium assessment with stream fish tissue

    Science.gov (United States)

    Hitt, Nathaniel P.; Smith, David R.

    2015-01-01

    Natural resource managers are developing assessments of selenium (Se) contamination in freshwater ecosystems based on fish tissue concentrations. We evaluated the effects of sample size (i.e., number of fish per site) on the probability of correctly detecting mean whole-body Se values above a range of potential management thresholds. We modeled Se concentrations as gamma distributions with shape and scale parameters fitting an empirical mean-to-variance relationship in data from southwestern West Virginia, USA (63 collections, 382 individuals). We used parametric bootstrapping techniques to calculate statistical power as the probability of detecting true mean concentrations up to 3 mg Se/kg above management thresholds ranging from 4 to 8 mg Se/kg. Sample sizes required to achieve 80% power varied as a function of management thresholds and Type I error tolerance (α). Higher thresholds required more samples than lower thresholds because populations were more heterogeneous at higher mean Se levels. For instance, to assess a management threshold of 4 mg Se/kg, a sample of eight fish could detect an increase of approximately 1 mg Se/kg with 80% power (given α = 0.05), but this sample size would be unable to detect such an increase from a management threshold of 8 mg Se/kg with more than a coin-flip probability. Increasing α decreased sample size requirements to detect above-threshold mean Se concentrations with 80% power. For instance, at an α-level of 0.05, an 8-fish sample could detect an increase of approximately 2 units above a threshold of 8 mg Se/kg with 80% power, but when α was relaxed to 0.2, this sample size was more sensitive to increasing mean Se concentrations, allowing detection of an increase of approximately 1.2 units with equivalent power. Combining individuals into 2- and 4-fish composite samples for laboratory analysis did not decrease power because the reduced number of laboratory samples was compensated for by increased
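
    A stripped-down version of the parametric-bootstrap power calculation described above: simulate n fish from a gamma distribution whose true mean sits above the threshold and count how often a one-sided test declares the mean above the threshold. Here a simple coefficient of variation stands in for the paper's empirical mean-variance relationship, so all numbers are illustrative only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def power_above_threshold(n, true_mean, threshold, cv=0.3,
                          alpha=0.05, n_sim=5000):
    """Parametric-bootstrap power: probability that a one-sided, one-sample
    t-test detects a true mean Se concentration above the threshold, with
    fish tissue modelled as gamma-distributed (illustrative sketch)."""
    shape = 1.0 / cv ** 2                     # gamma shape from assumed CV
    scale = true_mean / shape
    reject = 0
    for _ in range(n_sim):
        x = rng.gamma(shape, scale, n)
        res = stats.ttest_1samp(x, threshold)
        if res.statistic > 0 and res.pvalue / 2 < alpha:  # upper-tailed test
            reject += 1
    return reject / n_sim

# e.g., 8 fish, true mean 5 mg Se/kg against a 4 mg Se/kg threshold
print(power_above_threshold(n=8, true_mean=5.0, threshold=4.0))
```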

  1. A Web-based Simulator for Sample Size and Power Estimation in Animal Carcinogenicity Studies

    Directory of Open Access Journals (Sweden)

    Hojin Moon

    2002-12-01

    Full Text Available A Web-based statistical tool for sample size and power estimation in animal carcinogenicity studies is presented in this paper. It can be used to provide a design with sufficient power for detecting a dose-related trend in the occurrence of a tumor of interest when competing risks are present. The tumors of interest typically are occult tumors for which the time to tumor onset is not directly observable. It is applicable to rodent tumorigenicity assays that have either a single terminal sacrifice or multiple (interval) sacrifices. The design is achieved by varying the sample size per group, the number of sacrifices, the number of sacrificed animals at each interval, if any, and the scheduled time points for sacrifice. Monte Carlo simulation is carried out in this tool to simulate experiments of rodent bioassays because no closed-form solution is available. It takes design parameters for sample size and power estimation as inputs through the World Wide Web. The core program is written in C and executed in the background. It communicates with the Web front end via a Component Object Model interface passing an Extensible Markup Language string. The proposed statistical tool is illustrated with an animal study in lung cancer prevention research.

  2. Representative volume size: A comparison of statistical continuum mechanics and statistical physics

    Energy Technology Data Exchange (ETDEWEB)

    AIDUN,JOHN B.; TRUCANO,TIMOTHY G.; LO,CHI S.; FYE,RICHARD M.

    1999-05-01

    In this combination background and position paper, the authors argue that careful work is needed to develop accurate methods for relating the results of fine-scale numerical simulations of material processes to meaningful values of macroscopic properties for use in constitutive models suitable for finite element solid mechanics simulations. To provide a definite context for this discussion, the problem is couched in terms of the lack of general objective criteria for identifying the size of the representative volume (RV) of a material. The objective of this report is to lay out at least the beginnings of an approach for applying results and methods from statistical physics to develop concepts and tools necessary for determining the RV size, as well as alternatives to RV volume-averaging for situations in which the RV is unmanageably large. The background necessary to understand the pertinent issues and statistical physics concepts is presented.

  3. Statistical sampling approaches for soil monitoring

    NARCIS (Netherlands)

    Brus, D.J.

    2014-01-01

    This paper describes three statistical sampling approaches for regional soil monitoring, a design-based, a model-based and a hybrid approach. In the model-based approach a space-time model is exploited to predict global statistical parameters of interest such as the space-time mean. In the hybrid

  4. Constrained statistical inference : sample-size tables for ANOVA and regression

    NARCIS (Netherlands)

    Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves

    2015-01-01

    Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and

  5. Neuromuscular dose-response studies: determining sample size.

    Science.gov (United States)

    Kopman, A F; Lien, C A; Naguib, M

    2011-02-01

    Investigators planning dose-response studies of neuromuscular blockers have rarely used a priori power analysis to determine the minimal sample size their protocols require. Institutional Review Boards and peer-reviewed journals now generally ask for this information. This study outlines a proposed method for meeting these requirements. The slopes of the dose-response relationships of eight neuromuscular blocking agents were determined using regression analysis. These values were substituted for γ in the Hill equation. When this is done, the coefficient of variation (COV) around the mean value of the ED₅₀ for each drug is easily calculated. Using these values, we performed an a priori one-sample two-tailed t-test of the means to determine the required sample size when the allowable error in the ED₅₀ was varied from ±10% to ±20%. The COV averaged 22% (range 15-27%). We used a COV value of 25% in determining the sample size. If the allowable error in finding the mean ED₅₀ is ±15%, a sample size of 24 is needed to achieve a power of 80%. Increasing 'accuracy' beyond this point requires increasingly greater sample sizes (e.g. an 'n' of 37 for a ±12% error). On the basis of the results of this retrospective analysis, a total sample size of not less than 24 subjects should be adequate for determining a neuromuscular blocking drug's clinical potency with a reasonable degree of assurance.
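    The figures quoted above can be approximated with a standard one-sample t-test power calculation in which the standardized effect size is the allowable error divided by the coefficient of variation. The sketch below uses statsmodels for this; the COV of 25% and the error grid are taken from the abstract, everything else is a generic assumption.

```python
import numpy as np
from statsmodels.stats.power import TTestPower

cov = 0.25          # assumed coefficient of variation of the ED50 (25%)
alpha, power = 0.05, 0.80

for allowable_error in (0.20, 0.15, 0.12, 0.10):
    effect = allowable_error / cov   # standardized effect for a one-sample t-test
    n = TTestPower().solve_power(effect_size=effect, alpha=alpha,
                                 power=power, alternative='two-sided')
    print(f"allowable error ±{allowable_error:.0%} -> n ≈ {int(np.ceil(n))}")
```

    With an effect size of 0.15/0.25 = 0.6 the routine returns roughly the 24 subjects quoted above, and tighter error tolerances give the rapidly growing sample sizes the authors describe.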

  6. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach.

    Directory of Open Access Journals (Sweden)

    Simon Boitard

    2016-03-01

    Full Text Available Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.

  7. Estimating Sample Size for Usability Testing

    Directory of Open Access Journals (Sweden)

    Alex Cazañas

    2017-02-01

    Full Text Available One strategy used to assure that an interface meets user requirements is to conduct usability testing. When conducting such testing one of the unknowns is sample size. Since extensive testing is costly, minimizing the number of participants can contribute greatly to successful resource management of a project. Even though a significant number of models have been proposed to estimate sample size in usability testing, there is still no consensus on the optimal size. Several studies claim that 3 to 5 users suffice to uncover 80% of problems in a software interface. However, many other studies challenge this assertion. This study analyzed data collected from the user testing of a web application to verify the rule of thumb, commonly known as the “magic number 5”. The outcomes of the analysis showed that the 5-user rule significantly underestimates the required sample size to achieve reasonable levels of problem detection.
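    The "3 to 5 users" rule rests on the binomial model P(detected) = 1 - (1 - p)^n, where p is the probability that a single user encounters a given problem. A minimal sketch of that calculation, with illustrative values of p, shows why the rule breaks down when problems are harder to hit.

```python
import math

def users_needed(p_detect, target=0.80):
    """Smallest n with 1 - (1 - p_detect)**n >= target: the number of test
    users needed to uncover `target` of the problems when each user hits a
    given problem independently with probability p_detect."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_detect))

for p in (0.31, 0.20, 0.10, 0.05):
    print(f"per-user detection probability {p:.2f} -> n = {users_needed(p)}")
```

    With the often-cited p ≈ 0.31 the formula does give n = 5, but for problems detected by only 10% or 5% of users the required sample grows to 16 or 32, consistent with the study's conclusion that five users can substantially underestimate the needed sample.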

  8. Statistical aspects of food safety sampling

    NARCIS (Netherlands)

    Jongenburger, I.; Besten, den H.M.W.; Zwietering, M.H.

    2015-01-01

    In food safety management, sampling is an important tool for verifying control. Sampling by nature is a stochastic process. However, uncertainty regarding results is made even greater by the uneven distribution of microorganisms in a batch of food. This article reviews statistical aspects of

  9. Adaptive clinical trial designs with pre-specified rules for modifying the sample size: understanding efficient types of adaptation.

    Science.gov (United States)

    Levin, Gregory P; Emerson, Sarah C; Emerson, Scott S

    2013-04-15

    Adaptive clinical trial design has been proposed as a promising new approach that may improve the drug discovery process. Proponents of adaptive sample size re-estimation promote its ability to avoid 'up-front' commitment of resources, better address the complicated decisions faced by data monitoring committees, and minimize accrual to studies having delayed ascertainment of outcomes. We investigate aspects of adaptation rules, such as timing of the adaptation analysis and magnitude of sample size adjustment, that lead to greater or lesser statistical efficiency. Owing in part to the recent Food and Drug Administration guidance that promotes the use of pre-specified sampling plans, we evaluate alternative approaches in the context of well-defined, pre-specified adaptation. We quantify the relative costs and benefits of fixed sample, group sequential, and pre-specified adaptive designs with respect to standard operating characteristics such as type I error, maximal sample size, power, and expected sample size under a range of alternatives. Our results build on others' prior research by demonstrating in realistic settings that simple and easily implemented pre-specified adaptive designs provide only very small efficiency gains over group sequential designs with the same number of analyses. In addition, we describe optimal rules for modifying the sample size, providing efficient adaptation boundaries on a variety of scales for the interim test statistic for adaptation analyses occurring at several different stages of the trial. We thus provide insight into what are good and bad choices of adaptive sampling plans when the added flexibility of adaptive designs is desired. Copyright © 2012 John Wiley & Sons, Ltd.

  10. Statistical process control charts for attribute data involving very large sample sizes: a review of problems and solutions.

    Science.gov (United States)

    Mohammed, Mohammed A; Panesar, Jagdeep S; Laney, David B; Wilson, Richard

    2013-04-01

    The use of statistical process control (SPC) charts in healthcare is increasing. The primary purpose of SPC is to distinguish between common-cause variation which is attributable to the underlying process, and special-cause variation which is extrinsic to the underlying process. This is important because improvement under common-cause variation requires action on the process, whereas special-cause variation merits an investigation to first find the cause. Nonetheless, when dealing with attribute or count data (eg, number of emergency admissions) involving very large sample sizes, traditional SPC charts often produce tight control limits with most of the data points appearing outside the control limits. This can give a false impression of common and special-cause variation, and potentially misguide the user into taking the wrong actions. Given the growing availability of large datasets from routinely collected databases in healthcare, there is a need to present a review of this problem (which arises because traditional attribute charts only consider within-subgroup variation) and its solutions (which consider within and between-subgroup variation), which involve the use of the well-established measurements chart and the more recently developed attribute charts based on Laney's innovative approach. We close by making some suggestions for practice.
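    A hedged sketch of the Laney-style adjustment is given below. It follows the commonly described recipe (z-transform the subgroup proportions, estimate sigma_z from the average moving range, widen the limits accordingly); the counts are made up and the constant 1.128 is the usual d2 factor for moving ranges of span two. This is an illustration of the idea, not a substitute for a validated SPC implementation.

```python
import numpy as np

def laney_p_prime_limits(counts, sizes):
    """Laney p'-chart limits: the usual within-subgroup sigma is scaled by
    sigma_z, estimated from the moving range of the z-scores, so that
    between-subgroup variation is also reflected in the control limits."""
    counts, sizes = np.asarray(counts, float), np.asarray(sizes, float)
    p = counts / sizes
    p_bar = counts.sum() / sizes.sum()
    sigma_p = np.sqrt(p_bar * (1 - p_bar) / sizes)     # within-subgroup sigma
    z = (p - p_bar) / sigma_p
    sigma_z = np.abs(np.diff(z)).mean() / 1.128        # d2 for moving ranges of span 2
    ucl = p_bar + 3 * sigma_p * sigma_z
    lcl = np.clip(p_bar - 3 * sigma_p * sigma_z, 0.0, None)
    return p, p_bar, lcl, ucl

# illustrative monthly event counts with very large denominators (made up)
events = [520, 498, 540, 610, 575, 560]
admissions = [10500, 10120, 10830, 11900, 11350, 11210]
print(laney_p_prime_limits(events, admissions))
```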

  11. Sample Size Determination for One- and Two-Sample Trimmed Mean Tests

    Science.gov (United States)

    Luh, Wei-Ming; Olejnik, Stephen; Guo, Jiin-Huarng

    2008-01-01

    Formulas to determine the necessary sample sizes for parametric tests of group comparisons are available from several sources and appropriate when population distributions are normal. However, in the context of nonnormal population distributions, researchers recommend Yuen's trimmed mean test, but formulas to determine sample sizes have not been…

  12. Assessing terpene content variability of whitebark pine in order to estimate representative sample size

    Directory of Open Access Journals (Sweden)

    Stefanović Milena

    2013-01-01

    Full Text Available In studies of population variability, particular attention has to be paid to the selection of a representative sample. The aim of this study was to assess the size of the new representative sample on the basis of the variability of chemical content of the initial sample, using a whitebark pine population as the example. Statistical analysis included the content of 19 characteristics (terpene hydrocarbons and their derivates) of the initial sample of 10 elements (trees). It was determined that the new sample should contain 20 trees so that the mean value calculated from it represents the underlying population with a probability higher than 95%. Determination of the lower limit of the representative sample size that guarantees a satisfactory reliability of generalization proved to be very important in order to achieve cost efficiency of the research. [Project of the Ministry of Science of the Republic of Serbia, Nos. OI-173011, TR-37002 and III-43007]
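    A common precision-based shortcut for this kind of calculation is n = (z · CV / E)^2, where CV is the coefficient of variation and E the acceptable relative error of the mean. The sketch below is a generic illustration of that formula, not the exact computation used in the study; the CV and error tolerance are placeholders.

```python
import math

def n_for_relative_precision(cv, rel_error, z=1.96):
    """Sample size so that the sample mean falls within ±rel_error of the
    population mean with roughly 95% confidence, for a trait whose
    coefficient of variation is cv (both given as fractions)."""
    return math.ceil((z * cv / rel_error) ** 2)

print(n_for_relative_precision(cv=0.45, rel_error=0.20))   # illustrative inputs -> 20
```

    A real application would plug in the CV actually estimated from the initial 10-tree sample for each terpene characteristic and take the largest resulting n.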

  13. Generalized procedures for determining inspection sample sizes (related to quantitative measurements). Vol. 1: Detailed explanations

    International Nuclear Information System (INIS)

    Jaech, J.L.; Lemaire, R.J.

    1986-11-01

    Generalized procedures have been developed to determine sample sizes in connection with the planning of inspection activities. These procedures are based on different measurement methods. They are applied mainly to Bulk Handling Facilities and Physical Inventory Verifications. The present report attempts (i) to assign to appropriate statistical testers (viz. testers for gross, partial and small defects) the measurement methods to be used, and (ii) to associate the measurement uncertainties with the sample sizes required for verification. Working papers are also provided to assist in the application of the procedures. This volume contains the detailed explanations concerning the above mentioned procedures

  14. Sample Size Bounding and Context Ranking as Approaches to the Human Error Quantification Problem

    Energy Technology Data Exchange (ETDEWEB)

    Reer, B

    2004-03-01

    The paper describes a technique denoted as Sub-Sample-Size Bounding (SSSB), which is useable for the statistical derivation of context-specific probabilities from data available in existing reports on operating experience. Applications to human reliability analysis (HRA) are emphasised in the presentation of this technique. Exemplified by a sample of 180 abnormal event sequences, the manner in which SSSB can provide viable input for the quantification of errors of commission (EOCs) are outlined. (author)

  15. Sample Size Bounding and Context Ranking as Approaches to the Human Error Quantification Problem

    International Nuclear Information System (INIS)

    Reer, B.

    2004-01-01

    The paper describes a technique denoted as Sub-Sample-Size Bounding (SSSB), which is useable for the statistical derivation of context-specific probabilities from data available in existing reports on operating experience. Applications to human reliability analysis (HRA) are emphasised in the presentation of this technique. Exemplified by a sample of 180 abnormal event sequences, the manner in which SSSB can provide viable input for the quantification of errors of commission (EOCs) are outlined. (author)

  16. Caution regarding the choice of standard deviations to guide sample size calculations in clinical trials.

    Science.gov (United States)

    Chen, Henian; Zhang, Nanhua; Lu, Xiaosun; Chen, Sophie

    2013-08-01

    The method used to determine choice of standard deviation (SD) is inadequately reported in clinical trials. Underestimations of the population SD may result in underpowered clinical trials. This study demonstrates how using the wrong method to determine population SD can lead to inaccurate sample sizes and underpowered studies, and offers recommendations to maximize the likelihood of achieving adequate statistical power. We review the practice of reporting sample size and its effect on the power of trials published in major journals. Simulated clinical trials were used to compare the effects of different methods of determining SD on power and sample size calculations. Prior to 1996, sample size calculations were reported in just 1%-42% of clinical trials. This proportion increased from 38% to 54% after the initial Consolidated Standards of Reporting Trials (CONSORT) was published in 1996, and from 64% to 95% after the revised CONSORT was published in 2001. Nevertheless, underpowered clinical trials are still common. Our simulated data showed that all minimal and 25th-percentile SDs fell below 44 (the population SD), regardless of sample size (from 5 to 50). For sample sizes 5 and 50, the minimum sample SDs underestimated the population SD by 90.7% and 29.3%, respectively. If only one sample was available, there was less than 50% chance that the actual power equaled or exceeded the planned power of 80% for detecting a medium effect size (Cohen's d = 0.5) when using the sample SD to calculate the sample size. The proportions of studies with actual power of at least 80% were about 95%, 90%, 85%, and 80% when we used the larger SD, 80% upper confidence limit (UCL) of SD, 70% UCL of SD, and 60% UCL of SD to calculate the sample size, respectively. When more than one sample was available, the weighted average SD resulted in about 50% of trials being underpowered; the proportion of trials with power of 80% increased from 90% to 100% when the 75th percentile and the
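    The central idea, planning on an upper confidence limit (UCL) of the pilot SD rather than on the pilot SD itself, can be illustrated with a short sketch. The pilot values, the clinically relevant difference and the 80% confidence level below are made-up inputs; the chi-square interval for the SD and the two-sample t-test power routine are standard, but this is not the simulation design used in the article.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

def sd_upper_confidence_limit(s, n, level=0.80):
    """One-sided upper confidence limit for the population SD, based on a
    pilot sample SD s from n observations (chi-square interval)."""
    df = n - 1
    return s * np.sqrt(df / stats.chi2.ppf(1 - level, df))

pilot_sd, pilot_n = 38.0, 20        # made-up pilot estimate of the SD
delta = 22.0                        # made-up clinically relevant difference

for planning_sd in (pilot_sd, sd_upper_confidence_limit(pilot_sd, pilot_n, 0.80)):
    n = TTestIndPower().solve_power(effect_size=delta / planning_sd,
                                    alpha=0.05, power=0.80, ratio=1.0)
    print(f"planning SD {planning_sd:5.1f} -> n per group = {int(np.ceil(n))}")
```

    Because the UCL is larger than the sample SD, the planned trial is bigger, which is how the approach buys insurance against an unluckily small pilot SD.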

  17. Effect of sample size on bias correction performance

    Science.gov (United States)

    Reiter, Philipp; Gutjahr, Oliver; Schefczyk, Lukas; Heinemann, Günther; Casper, Markus C.

    2014-05-01

    The output of climate models often shows a bias when compared to observed data, so that a preprocessing is necessary before using it as climate forcing in impact modeling (e.g. hydrology, species distribution). A common bias correction method is the quantile matching approach, which adapts the cumulative distribution function of the model output to the one of the observed data by means of a transfer function. Especially for precipitation we expect the bias correction performance to strongly depend on sample size, i.e. the length of the period used for calibration of the transfer function. We carry out experiments using the precipitation output of ten regional climate model (RCM) hindcast runs from the EU-ENSEMBLES project and the E-OBS observational dataset for the period 1961 to 2000. The 40 years are split into a 30 year calibration period and a 10 year validation period. In the first step, for each RCM transfer functions are set up cell-by-cell, using the complete 30 year calibration period. The derived transfer functions are applied to the validation period of the respective RCM precipitation output and the mean absolute errors in reference to the observational dataset are calculated. These values are treated as "best fit" for the respective RCM. In the next step, this procedure is redone using subperiods out of the 30 year calibration period. The lengths of these subperiods are reduced from 29 years down to a minimum of 1 year, only considering subperiods of consecutive years. This leads to an increasing number of repetitions for smaller sample sizes (e.g. 2 for a length of 29 years). In the last step, the mean absolute errors are statistically tested against the "best fit" of the respective RCM to compare the performances. In order to analyze if the intensity of the effect of sample size depends on the chosen correction method, four variations of the quantile matching approach (PTF, QUANT/eQM, gQM, GQM) are applied in this study. The experiments are further
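    For readers unfamiliar with quantile matching, the sketch below shows a generic empirical quantile-mapping transfer function built on a calibration period and applied to new model output. It is a simplified stand-in for the PTF/QUANT/gQM/GQM variants compared in the study (closest in spirit to the empirical QUANT/eQM flavour), and the gamma-distributed "precipitation" is synthetic.

```python
import numpy as np

def fit_quantile_mapping(model_cal, obs_cal, n_quantiles=100):
    """Empirical quantile-mapping transfer function: model quantiles in the
    calibration period are mapped onto the corresponding observed quantiles."""
    q = np.linspace(0.01, 0.99, n_quantiles)
    return np.quantile(model_cal, q), np.quantile(obs_cal, q)

def apply_quantile_mapping(model_new, transfer):
    model_q, obs_q = transfer
    # linear interpolation between quantiles; values outside the calibrated
    # range are mapped to the outermost quantiles
    return np.interp(model_new, model_q, obs_q)

rng = np.random.default_rng(0)
obs_cal = rng.gamma(2.0, 3.0, 10_000)       # synthetic "observed" precipitation
model_cal = rng.gamma(2.0, 4.0, 10_000)     # synthetic, biased "model" output
transfer = fit_quantile_mapping(model_cal, obs_cal)
corrected = apply_quantile_mapping(rng.gamma(2.0, 4.0, 1_000), transfer)
print(obs_cal.mean(), model_cal.mean(), corrected.mean())
```

    Shortening the calibration sample used in fit_quantile_mapping is exactly the experiment described above: with fewer calibration years the empirical quantiles become noisier and the corrected output drifts away from the "best fit".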

  18. Sample size of the reference sample in a case-augmented study.

    Science.gov (United States)

    Ghosh, Palash; Dewanji, Anup

    2017-05-01

    The case-augmented study, in which a case sample is augmented with a reference (random) sample from the source population with only covariates information known, is becoming popular in different areas of applied science such as pharmacovigilance, ecology, and econometrics. In general, the case sample is available from some source (for example, hospital database, case registry, etc.); however, the reference sample is required to be drawn from the corresponding source population. The required minimum size of the reference sample is an important issue in this regard. In this work, we address the minimum sample size calculation and discuss related issues. Copyright © 2017 John Wiley & Sons, Ltd.

  19. 40 CFR 80.127 - Sample size guidelines.

    Science.gov (United States)

    2010-07-01

    Title 40 - Protection of Environment, Volume 16 (2010-07-01): Environmental Protection Agency (Continued), Air Programs (Continued), Regulation of Fuels and Fuel Additives, Attest Engagements, § 80.127 Sample size guidelines. In performing the...

  20. Measuring radioactive half-lives via statistical sampling in practice

    Science.gov (United States)

    Lorusso, G.; Collins, S. M.; Jagan, K.; Hitt, G. W.; Sadek, A. M.; Aitken-Smith, P. M.; Bridi, D.; Keightley, J. D.

    2017-10-01

    The statistical sampling method for the measurement of radioactive decay half-lives exhibits intriguing features such as that the half-life is approximately the median of a distribution closely resembling a Cauchy distribution. Whilst initial theoretical considerations suggested that in certain cases the method could have significant advantages, accurate measurements by statistical sampling have proven difficult, for they require an exercise in non-standard statistical analysis. As a consequence, no half-life measurement using this method has yet been reported and no comparison with traditional methods has ever been made. We used a Monte Carlo approach to address these analysis difficulties, and present the first experimental measurement of a radioisotope half-life (211Pb) by statistical sampling in good agreement with the literature recommended value. Our work also focused on the comparison between statistical sampling and exponential regression analysis, and concluded that exponential regression achieves generally the highest accuracy.

  1. Statistics and sampling in transuranic studies

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Gilbert, R.O.

    1980-01-01

    The existing data on transuranics in the environment exhibit a remarkably high variability from sample to sample (coefficients of variation of 100% or greater). This chapter stresses the necessity of adequate sample size and suggests various ways to increase sampling efficiency. Objectives in sampling are regarded as being of great importance in making decisions as to sampling methodology. Four different classes of sampling methods are described: (1) descriptive sampling, (2) sampling for spatial pattern, (3) analytical sampling, and (4) sampling for modeling. A number of research needs are identified in the various sampling categories along with several problems that appear to be common to two or more such areas

  2. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA.

    Science.gov (United States)

    Kelly, Brendan J; Gross, Robert; Bittinger, Kyle; Sherrill-Mix, Scott; Lewis, James D; Collman, Ronald G; Bushman, Frederic D; Li, Hongzhe

    2015-08-01

    The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence-absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-group distance and exposure/intervention effect size must be accurately modeled to estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA. We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed by pairwise distances, which includes: (i) a novel method for distance matrix simulation that permits modeling of within-group pairwise distances according to pre-specified population parameters; (ii) a method to incorporate effects of different sizes within the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements the above. Matrices of pairwise distances can be efficiently simulated to satisfy the triangle inequality and incorporate group-level effects, which are quantified by the adjusted coefficient of determination, omega-squared (ω2). From simulated distance matrices, available PERMANOVA power or necessary sample size can be estimated for a planned microbiome study. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Determination of the optimal sample size for a clinical trial accounting for the population size.

    Science.gov (United States)

    Stallard, Nigel; Miller, Frank; Day, Simon; Hee, Siew Wan; Madan, Jason; Zohar, Sarah; Posch, Martin

    2017-07-01

    The problem of choosing a sample size for a clinical trial is a very common one. In some settings, such as rare diseases or other small populations, the large sample sizes usually associated with the standard frequentist approach may be infeasible, suggesting that the sample size chosen should reflect the size of the population under consideration. Incorporation of the population size is possible in a decision-theoretic approach either explicitly by assuming that the population size is fixed and known, or implicitly through geometric discounting of the gain from future patients reflecting the expected population size. This paper develops such approaches. Building on previous work, an asymptotic expression is derived for the sample size for single and two-arm clinical trials in the general case of a clinical trial with a primary endpoint with a distribution of one parameter exponential family form that optimizes a utility function that quantifies the cost and gain per patient as a continuous function of this parameter. It is shown that as the size of the population, N, or expected size, N∗, in the case of geometric discounting, becomes large, the optimal trial size is O(N^(1/2)) or O(N∗^(1/2)). The sample size obtained from the asymptotic expression is also compared with the exact optimal sample size in examples with responses with Bernoulli and Poisson distributions, showing that the asymptotic approximations can also be reasonable in relatively small sample sizes. © 2016 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size

    Science.gov (United States)

    Kühberger, Anton; Fritz, Astrid; Scherndl, Thomas

    2014-01-01

    Background The p value obtained from a significance test provides no information about the magnitude or importance of the underlying phenomenon. Therefore, additional reporting of effect size is often recommended. Effect sizes are theoretically independent from sample size. Yet this may not hold true empirically: non-independence could indicate publication bias. Methods We investigate whether effect size is independent from sample size in psychological research. We randomly sampled 1,000 psychological articles from all areas of psychological research. We extracted p values, effect sizes, and sample sizes of all empirical papers, and calculated the correlation between effect size and sample size, and investigated the distribution of p values. Results We found a negative correlation of r = −.45 [95% CI: −.53; −.35] between effect size and sample size. In addition, we found an inordinately high number of p values just passing the boundary of significance. Additional data showed that neither implicit nor explicit power analysis could account for this pattern of findings. Conclusion The negative correlation between effect size and sample size, and the biased distribution of p values indicate pervasive publication bias in the entire field of psychology. PMID:25192357

  5. Statistical benchmark for BosonSampling

    International Nuclear Information System (INIS)

    Walschaers, Mattia; Mayer, Klaus; Buchleitner, Andreas; Kuipers, Jack; Urbina, Juan-Diego; Richter, Klaus; Tichy, Malte Christopher

    2016-01-01

    Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church–Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects. (fast track communication)

  6. Geometric statistical inference

    International Nuclear Information System (INIS)

    Periwal, Vipul

    1999-01-01

    A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined

  7. Sample size requirements for studies of treatment effects on beta-cell function in newly diagnosed type 1 diabetes.

    Science.gov (United States)

    Lachin, John M; McGee, Paula L; Greenbaum, Carla J; Palmer, Jerry; Pescovitz, Mark D; Gottlieb, Peter; Skyler, Jay

    2011-01-01

    Preservation of β-cell function as measured by stimulated C-peptide has recently been accepted as a therapeutic target for subjects with newly diagnosed type 1 diabetes. In recently completed studies conducted by the Type 1 Diabetes Trial Network (TrialNet), repeated 2-hour Mixed Meal Tolerance Tests (MMTT) were obtained for up to 24 months from 156 subjects with up to 3 months duration of type 1 diabetes at the time of study enrollment. These data provide the information needed to more accurately determine the sample size needed for future studies of the effects of new agents on the 2-hour area under the curve (AUC) of the C-peptide values. The natural log(x), log(x+1) and square-root (√x) transformations of the AUC were assessed. In general, a transformation of the data is needed to better satisfy the normality assumptions for commonly used statistical tests. Statistical analysis of the raw and transformed data are provided to estimate the mean levels over time and the residual variation in untreated subjects that allow sample size calculations for future studies at either 12 or 24 months of follow-up and among children 8-12 years of age, adolescents (13-17 years) and adults (18+ years). The sample size needed to detect a given relative (percentage) difference with treatment versus control is greater at 24 months than at 12 months of follow-up, and differs among age categories. Owing to greater residual variation among those 13-17 years of age, a larger sample size is required for this age group. Methods are also described for assessment of sample size for mixtures of subjects among the age categories. Statistical expressions are presented for the presentation of analyses of log(x+1) and √x transformed values in terms of the original units of measurement (pmol/ml). Analyses using different transformations are described for the TrialNet study of masked anti-CD20 (rituximab) versus masked placebo. These results provide the information needed to accurately

  8. Sample size requirements for studies of treatment effects on beta-cell function in newly diagnosed type 1 diabetes.

    Directory of Open Access Journals (Sweden)

    John M Lachin

    Full Text Available Preservation of β-cell function as measured by stimulated C-peptide has recently been accepted as a therapeutic target for subjects with newly diagnosed type 1 diabetes. In recently completed studies conducted by the Type 1 Diabetes Trial Network (TrialNet), repeated 2-hour Mixed Meal Tolerance Tests (MMTT) were obtained for up to 24 months from 156 subjects with up to 3 months duration of type 1 diabetes at the time of study enrollment. These data provide the information needed to more accurately determine the sample size needed for future studies of the effects of new agents on the 2-hour area under the curve (AUC) of the C-peptide values. The natural log(x), log(x+1) and square-root (√x) transformations of the AUC were assessed. In general, a transformation of the data is needed to better satisfy the normality assumptions for commonly used statistical tests. Statistical analysis of the raw and transformed data are provided to estimate the mean levels over time and the residual variation in untreated subjects that allow sample size calculations for future studies at either 12 or 24 months of follow-up and among children 8-12 years of age, adolescents (13-17 years) and adults (18+ years). The sample size needed to detect a given relative (percentage) difference with treatment versus control is greater at 24 months than at 12 months of follow-up, and differs among age categories. Owing to greater residual variation among those 13-17 years of age, a larger sample size is required for this age group. Methods are also described for assessment of sample size for mixtures of subjects among the age categories. Statistical expressions are presented for the presentation of analyses of log(x+1) and √x transformed values in terms of the original units of measurement (pmol/ml). Analyses using different transformations are described for the TrialNet study of masked anti-CD20 (rituximab) versus masked placebo. These results provide the information needed to

  9. Multiparametric statistics

    CERN Document Server

    Serdobolskii, Vadim Ivanovich

    2007-01-01

    This monograph presents the mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size or even much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way to the solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of infinite sampling often break down in the solution of real problems, and, depending on the data, can be inefficient, unstable and even not applicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...

  10. Sample size determination in clinical trials with multiple endpoints

    CERN Document Server

    Sozu, Takashi; Hamasaki, Toshimitsu; Evans, Scott R

    2015-01-01

    This book integrates recent methodological developments for calculating the sample size and power in trials with more than one endpoint considered as multiple primary or co-primary, offering an important reference work for statisticians working in this area. The determination of sample size and the evaluation of power are fundamental and critical elements in the design of clinical trials. If the sample size is too small, important effects may go unnoticed; if the sample size is too large, it represents a waste of resources and unethically puts more participants at risk than necessary. Recently many clinical trials have been designed with more than one endpoint considered as multiple primary or co-primary, creating a need for new approaches to the design and analysis of these clinical trials. The book focuses on the evaluation of power and sample size determination when comparing the effects of two interventions in superiority clinical trials with multiple endpoints. Methods for sample size calculation in clin...

  11. Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data.

    Science.gov (United States)

    Li, Johnson Ching-Hong

    2016-12-01

    In psychological science, the "new statistics" refer to the new statistical practices that focus on effect size (ES) evaluation instead of conventional null-hypothesis significance testing (Cumming, Psychological Science, 25, 7-29, 2014). In a two-independent-samples scenario, Cohen's (1988) standardized mean difference (d) is the most popular ES, but its accuracy relies on two assumptions: normality and homogeneity of variances. Five other ESs - the unscaled robust d (d_r*; Hogarty & Kromrey, 2001), scaled robust d (d_r; Algina, Keselman, & Penfield, Psychological Methods, 10, 317-328, 2005), point-biserial correlation (r_pb; McGrath & Meyer, Psychological Methods, 11, 386-401, 2006), common-language ES (CL; Cliff, Psychological Bulletin, 114, 494-509, 1993), and nonparametric estimator for CL (A_w; Ruscio, Psychological Methods, 13, 19-30, 2008) - may be robust to violations of these assumptions, but no study has systematically evaluated their performance. Thus, in this simulation study the performance of these six ESs was examined across five factors: data distribution, sample, base rate, variance ratio, and sample size. The results showed that A_w and d_r were generally robust to these violations, and A_w slightly outperformed d_r. Implications for the use of A_w and d_r in real-world research are discussed.
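    Two of the quantities discussed above are easy to compute from raw data: Cohen's d and a nonparametric probability-of-superiority estimate in the spirit of the CL/A_w family (the sketch is not Ruscio's exact estimator). The skewed log-normal samples below are synthetic and only illustrate how the two measures can diverge when normality and variance homogeneity fail.

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference with a pooled SD (assumes roughly normal,
    equal-variance data)."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                        / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled_sd

def prob_superiority(x, y):
    """Nonparametric common-language effect size: probability that a random
    draw from x exceeds a random draw from y, with ties counted as 1/2."""
    x = np.asarray(x)[:, None]
    y = np.asarray(y)[None, :]
    return (x > y).mean() + 0.5 * (x == y).mean()

rng = np.random.default_rng(3)
a = rng.lognormal(0.4, 0.8, 40)    # skewed, heteroscedastic synthetic data
b = rng.lognormal(0.0, 0.5, 40)
print(cohens_d(a, b), prob_superiority(a, b))
```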

  12. Statistical conditional sampling for variable-resolution video compression.

    Directory of Open Access Journals (Sweden)

    Alexander Wong

    Full Text Available In this study, we investigate a variable-resolution approach to video compression based on Conditional Random Field and statistical conditional sampling in order to further improve compression rate while maintaining high-quality video. In the proposed approach, representative key-frames within a video shot are identified and stored at full resolution. The remaining frames within the video shot are stored and compressed at a reduced resolution. At the decompression stage, a region-based dictionary is constructed from the key-frames and used to restore the reduced resolution frames to the original resolution via statistical conditional sampling. The sampling approach is based on the conditional probability of the CRF modeling by use of the constructed dictionary. Experimental results show that the proposed variable-resolution approach via statistical conditional sampling has potential for improving compression rates when compared to compressing the video at full resolution, while achieving higher video quality when compared to compressing the video at reduced resolution.

  13. Fissure formation in coke. 3: Coke size distribution and statistical analysis

    Energy Technology Data Exchange (ETDEWEB)

    D.R. Jenkins; D.E. Shaw; M.R. Mahoney [CSIRO, North Ryde, NSW (Australia). Mathematical and Information Sciences

    2010-07-15

    A model of coke stabilization, based on a fundamental model of fissuring during carbonisation is used to demonstrate the applicability of the fissuring model to actual coke size distributions. The results indicate that the degree of stabilization is important in determining the size distribution. A modified form of the Weibull distribution is shown to provide a better representation of the whole coke size distribution compared to the Rosin-Rammler distribution, which is generally only fitted to the lump coke. A statistical analysis of a large number of experiments in a pilot scale coke oven shows reasonably good prediction of the coke mean size, based on parameters related to blend rank, amount of low rank coal, fluidity and ash. However, the prediction of measures of the spread of the size distribution is more problematic. The fissuring model, the size distribution representation and the statistical analysis together provide a comprehensive capability for understanding and predicting the mean size and distribution of coke lumps produced during carbonisation. 12 refs., 16 figs., 4 tabs.

  14. A simple nomogram for sample size for estimating sensitivity and specificity of medical tests

    Directory of Open Access Journals (Sweden)

    Malhotra Rajeev

    2010-01-01

    Full Text Available Sensitivity and specificity measure inherent validity of a diagnostic test against a gold standard. Researchers develop new diagnostic methods to reduce the cost, risk, invasiveness, and time. Adequate sample size is a must to precisely estimate the validity of a diagnostic test. In practice, researchers generally decide about the sample size arbitrarily either at their convenience, or from the previous literature. We have devised a simple nomogram that yields statistically valid sample size for anticipated sensitivity or anticipated specificity. MS Excel version 2007 was used to derive the values required to plot the nomogram using varying absolute precision, known prevalence of disease, and 95% confidence level using the formula already available in the literature. The nomogram plot was obtained by suitably arranging the lines and distances to conform to this formula. This nomogram could be easily used to determine the sample size for estimating the sensitivity or specificity of a diagnostic test with required precision and 95% confidence level. Sample size at 90% and 99% confidence level, respectively, can also be obtained by just multiplying 0.70 and 1.75 with the number obtained for the 95% confidence level. A nomogram instantly provides the required number of subjects by just moving the ruler and can be repeatedly used without redoing the calculations. This can also be applied for reverse calculations. This nomogram is not applicable for testing of the hypothesis set-up and is applicable only when both diagnostic test and gold standard results have a dichotomous category.
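    The nomogram encodes a standard precision-based formula (often attributed to Buderer) in which the binomial sample size for the diseased (or non-diseased) subgroup is inflated by the disease prevalence. A minimal sketch, with illustrative inputs rather than values from the article, follows.

```python
import math

def n_for_sensitivity(sens, precision, prevalence, z=1.96):
    """Total subjects needed so the sensitivity estimate has the given
    absolute precision; the binomial requirement for the diseased subgroup
    is inflated by the disease prevalence."""
    n_diseased = z ** 2 * sens * (1 - sens) / precision ** 2
    return math.ceil(n_diseased / prevalence)

def n_for_specificity(spec, precision, prevalence, z=1.96):
    n_nondiseased = z ** 2 * spec * (1 - spec) / precision ** 2
    return math.ceil(n_nondiseased / (1 - prevalence))

# illustrative inputs: anticipated sensitivity 0.90, ±5% precision, 20% prevalence
print(n_for_sensitivity(0.90, 0.05, 0.20))
print(n_for_specificity(0.85, 0.05, 0.20))
```

    The 0.70 and 1.75 multipliers mentioned above are consistent with this formula, since (1.645/1.96)^2 ≈ 0.70 and (2.576/1.96)^2 ≈ 1.73 for the 90% and 99% confidence levels.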

  15. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

    Science.gov (United States)

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.

  16. Predicting sample size required for classification performance

    Directory of Open Access Journals (Sweden)

    Figueroa Rosa L

    2012-02-01

    Full Text Available Abstract Background Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. Methods We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness of fit measures. As control we used an un-weighted fitting method. Results A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 and 560 annotated samples to achieve mean average and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method. Conclusions This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.
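    The core idea, fitting an inverse power law y = a - b·x^(-c) to the early part of a learning curve by weighted nonlinear least squares and extrapolating, can be sketched as follows. The learning-curve points, their standard deviations and the starting values are invented for illustration; the original algorithm's exact weighting and confidence-interval construction are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def inv_power_law(x, a, b, c):
    """Learning curve of the form accuracy = a - b * x**(-c)."""
    return a - b * np.power(x, -c)

# invented early learning curve: (training-set size, mean accuracy, std. dev.)
sizes = np.array([50, 100, 150, 200, 300, 400], dtype=float)
acc = np.array([0.71, 0.76, 0.79, 0.80, 0.82, 0.83])
sd = np.array([0.050, 0.040, 0.035, 0.030, 0.025, 0.020])

# weighted nonlinear least squares: points with smaller spread get more weight
params, _ = curve_fit(inv_power_law, sizes, acc, p0=(0.9, 1.0, 0.5),
                      sigma=sd, absolute_sigma=True, maxfev=10_000)
print("fitted a, b, c:", params)
print("extrapolated accuracy at n = 2000:", inv_power_law(2000.0, *params))
```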

  17. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    Science.gov (United States)

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...

  18. Maximum type 1 error rate inflation in multiarmed clinical trials with adaptive interim sample size modifications.

    Science.gov (United States)

    Graf, Alexandra C; Bauer, Peter; Glimm, Ekkehard; Koenig, Franz

    2014-07-01

    Sample size modifications in the interim analyses of an adaptive design can inflate the type 1 error rate, if test statistics and critical boundaries are used in the final analysis as if no modification had been made. While this is already true for designs with an overall change of the sample size in a balanced treatment-control comparison, the inflation can be much larger if in addition a modification of allocation ratios is allowed as well. In this paper, we investigate adaptive designs with several treatment arms compared to a single common control group. Regarding modifications, we consider treatment arm selection as well as modifications of overall sample size and allocation ratios. The inflation is quantified for two approaches: a naive procedure that ignores not only all modifications, but also the multiplicity issue arising from the many-to-one comparison, and a Dunnett procedure that ignores modifications, but adjusts for the initially started multiple treatments. The maximum inflation of the type 1 error rate for such types of design can be calculated by searching for the "worst case" scenarios, that are sample size adaptation rules in the interim analysis that lead to the largest conditional type 1 error rate in any point of the sample space. To show the most extreme inflation, we initially assume unconstrained second stage sample size modifications leading to a large inflation of the type 1 error rate. Furthermore, we investigate the inflation when putting constraints on the second stage sample sizes. It turns out that, for example fixing the sample size of the control group, leads to designs controlling the type 1 error rate. © 2014 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Optimal sample preparation for nanoparticle metrology (statistical size measurements) using atomic force microscopy

    International Nuclear Information System (INIS)

    Hoo, Christopher M.; Doan, Trang; Starostin, Natasha; West, Paul E.; Mecartney, Martha L.

    2010-01-01

    Optimal deposition procedures are determined for nanoparticle size characterization by atomic force microscopy (AFM). Accurate nanoparticle size distribution analysis with AFM requires non-agglomerated nanoparticles on a flat substrate. The deposition of polystyrene (100 nm), silica (300 and 100 nm), gold (100 nm), and CdSe quantum dot (2-5 nm) nanoparticles by spin coating was optimized for size distribution measurements by AFM. Factors influencing deposition include spin speed, concentration, solvent, and pH. A comparison using spin coating, static evaporation, and a new fluid cell deposition method for depositing nanoparticles is also made. The fluid cell allows for a more uniform and higher density deposition of nanoparticles on a substrate at laminar flow rates, making nanoparticle size analysis via AFM more efficient and also offers the potential for nanoparticle analysis in liquid environments.

  20. Sample size determination for equivalence assessment with multiple endpoints.

    Science.gov (United States)

    Sun, Anna; Dong, Xiaoyu; Tsong, Yi

    2014-01-01

    Equivalence assessment between a reference and test treatment is often conducted by two one-sided tests (TOST). The corresponding power function and sample size determination can be derived from a joint distribution of the sample mean and sample variance. When an equivalence trial is designed with multiple endpoints, it often involves several sets of two one-sided tests. A naive approach for sample size determination in this case would select the largest sample size required for each endpoint. However, such a method ignores the correlation among endpoints. With the objective to reject all endpoints and when the endpoints are uncorrelated, the power function is the production of all power functions for individual endpoints. With correlated endpoints, the sample size and power should be adjusted for such a correlation. In this article, we propose the exact power function for the equivalence test with multiple endpoints adjusted for correlation under both crossover and parallel designs. We further discuss the differences in sample size for the naive method without and with correlation adjusted methods and illustrate with an in vivo bioequivalence crossover study with area under the curve (AUC) and maximum concentration (Cmax) as the two endpoints.
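    As a rough illustration of the single-endpoint building block, the simulation below estimates TOST power for one normally distributed endpoint in a parallel design (two one-sided t-tests against an equivalence margin). The margin, SD and sample sizes are arbitrary, and the joint power for several correlated endpoints, which is the article's subject, would require simulating all endpoints together with their correlation structure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def tost_power(n, true_diff=0.0, sd=1.0, margin=0.5, alpha=0.05, n_sim=4000):
    """Simulated probability that both one-sided t-tests reject, i.e. that a
    two-arm parallel design concludes equivalence within ±margin for a single
    normally distributed endpoint (pooled-df approximation)."""
    wins = 0
    for _ in range(n_sim):
        x = rng.normal(true_diff, sd, n)
        y = rng.normal(0.0, sd, n)
        diff = x.mean() - y.mean()
        se = np.sqrt(x.var(ddof=1) / n + y.var(ddof=1) / n)
        df = 2 * n - 2
        p_lower = 1 - stats.t.cdf((diff + margin) / se, df)   # H0: diff <= -margin
        p_upper = stats.t.cdf((diff - margin) / se, df)       # H0: diff >= +margin
        wins += max(p_lower, p_upper) < alpha
    return wins / n_sim

for n in (40, 60, 80, 100):
    print(n, tost_power(n))
```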

  1. Preeminence and prerequisites of sample size calculations in clinical trials

    OpenAIRE

    Richa Singhal; Rakesh Rana

    2015-01-01

    The key components while planning a clinical study are the study design, study duration, and sample size. These features are an integral part of planning a clinical trial efficiently, ethically, and cost-effectively. This article describes some of the prerequisites for sample size calculation. It also explains that sample size calculation is different for different study designs. The article in detail describes the sample size calculation for a randomized controlled trial when the primary out...

  2. Developing Students' Reasoning about Samples and Sampling Variability as a Path to Expert Statistical Thinking

    Science.gov (United States)

    Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani

    2015-01-01

    This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…

  3. A course in mathematical statistics and large sample theory

    CERN Document Server

    Bhattacharya, Rabi; Patrangenaru, Victor

    2016-01-01

    This graduate-level textbook is primarily aimed at graduate students of statistics, mathematics, science, and engineering who have had an undergraduate course in statistics, an upper division course in analysis, and some acquaintance with measure theoretic probability. It provides a rigorous presentation of the core of mathematical statistics. Part I of this book constitutes a one-semester course on basic parametric mathematical statistics. Part II deals with the large sample theory of statistics, parametric and nonparametric, and its contents may be covered in one semester as well. Part III provides brief accounts of a number of topics of current interest for practitioners and other disciplines whose work involves statistical methods. Features include: large sample theory with many worked examples, numerical calculations, and simulations to illustrate theory; appendices that provide ready access to a number of standard results, with many proofs; solutions given to a number of selected exercises from Part I; Part II exercises with ...

  4. Illustrating Sampling Distribution of a Statistic: Minitab Revisited

    Science.gov (United States)

    Johnson, H. Dean; Evans, Marc A.

    2008-01-01

    Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…

  5. Preeminence and prerequisites of sample size calculations in clinical trials

    Directory of Open Access Journals (Sweden)

    Richa Singhal

    2015-01-01

    Full Text Available The key components while planning a clinical study are the study design, study duration, and sample size. These features are an integral part of planning a clinical trial efficiently, ethically, and cost-effectively. This article describes some of the prerequisites for sample size calculation. It also explains that sample size calculation is different for different study designs. The article in detail describes the sample size calculation for a randomized controlled trial when the primary outcome is a continuous variable and when it is a proportion or a qualitative variable.
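    For the two cases mentioned (a continuous primary outcome and a proportion), the familiar normal-approximation formulas can be sketched as below. These are generic textbook expressions with illustrative inputs, not necessarily the exact formulas or worked examples used in the article.

```python
import math
from scipy.stats import norm

def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Two-sided comparison of two means (normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (z * sd / delta) ** 2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Two-sided comparison of two proportions (pooled normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    p_bar = (p1 + p2) / 2
    return math.ceil(2 * z ** 2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2)

print(n_per_group_means(delta=5, sd=12))           # continuous primary outcome
print(n_per_group_proportions(p1=0.40, p2=0.25))   # binary primary outcome
```

    Both functions return the number per arm; in practice the result is usually inflated for anticipated dropout and, for small samples, refined with an exact or t-based calculation.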

  6. Reporting effect sizes as a supplement to statistical significance ...

    African Journals Online (AJOL)

    The purpose of the article is to review the statistical significance reporting practices in reading instruction studies and to provide guidelines for when to calculate and report effect sizes in educational research. A review of six readily accessible (online) and accredited journals publishing research on reading instruction ...

  7. Determination of the minimum size of a statistical representative volume element from a fibre-reinforced composite based on point pattern statistics

    DEFF Research Database (Denmark)

    Hansen, Jens Zangenberg; Brøndsted, Povl

    2013-01-01

    In a previous study, Trias et al. [1] determined the minimum size of a statistical representative volume element (SRVE) of a unidirectional fibre-reinforced composite primarily based on numerical analyses of the stress/strain field. In continuation of this, the present study determines the minimum size of an SRVE based on a statistical analysis of the spatial statistics of the fibre packing patterns found in genuine laminates, and those generated numerically using a microstructure generator. © 2012 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.

  8. A statistical analysis of North East Atlantic (submicron) aerosol size distributions

    Directory of Open Access Journals (Sweden)

    M. Dall'Osto

    2011-12-01

    Full Text Available The Global Atmospheric Watch research station at Mace Head (Ireland) offers the possibility to sample some of the cleanest air masses being imported into Europe as well as some of the most polluted being exported out of Europe. We present a statistical cluster analysis of the physical characteristics of aerosol size distributions in air ranging from the cleanest to the most polluted for the year 2008. Data coverage achieved was 75% throughout the year. By applying the Hartigan-Wong k-Means method, 12 clusters were identified as systematically occurring. These 12 clusters could be further combined into 4 categories with similar characteristics, namely: coastal nucleation category (occurring 21.3% of the time), open ocean nucleation category (occurring 32.6% of the time), background clean marine category (occurring 26.1% of the time) and anthropogenic category (occurring 20% of the time) aerosol size distributions. The coastal nucleation category is characterised by a clear and dominant nucleation mode at sizes less than 10 nm while the open ocean nucleation category is characterised by a dominant Aitken mode between 15 nm and 50 nm. The background clean marine aerosol exhibited a clear bimodality in the sub-micron size distribution, although it should be noted that either the Aitken mode or the accumulation mode may dominate the number concentration. However, peculiar background clean marine size distributions with coarser accumulation modes are also observed during winter months. By contrast, the continentally-influenced size distributions are generally more monomodal (accumulation), albeit with traces of bimodality. The open ocean category occurs more often during May, June and July, corresponding with the North East (NE) Atlantic high biological period. Combined with the relatively high percentage frequency of occurrence (32.6%), this suggests that the marine biota is an important source of new nano aerosol particles in NE Atlantic air.
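    The clustering step itself is straightforward to reproduce in outline. The sketch below groups synthetic number-size spectra with scikit-learn's KMeans (Lloyd's algorithm, whereas the study used the Hartigan-Wong implementation) after standardising each size bin; the simulated log-normal modes merely stand in for the Mace Head measurements.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(11)
bins = np.logspace(np.log10(10), np.log10(500), 30)    # particle diameter, nm

def lognormal_mode(mu, sigma, amp):
    """One log-normal number-size mode evaluated on the diameter bins."""
    return amp * np.exp(-0.5 * ((np.log(bins) - np.log(mu)) / sigma) ** 2)

# rows = hourly size distributions, columns = concentration per size bin (synthetic)
spectra = np.array([lognormal_mode(rng.uniform(15, 200), 0.5, rng.uniform(200, 2000))
                    + rng.normal(0, 20, bins.size)
                    for _ in range(500)])

X = StandardScaler().fit_transform(spectra)
km = KMeans(n_clusters=12, n_init=20, random_state=0)   # Lloyd's algorithm in scikit-learn
labels = km.fit_predict(X)
print(np.bincount(labels))    # how often each cluster occurs
```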

  9. Effects of sample size and sampling frequency on studies of brown bear home ranges and habitat use

    Science.gov (United States)

    Arthur, Steve M.; Schwartz, Charles C.

    1999-01-01

    We equipped 9 brown bears (Ursus arctos) on the Kenai Peninsula, Alaska, with collars containing both conventional very-high-frequency (VHF) transmitters and global positioning system (GPS) receivers programmed to determine an animal's position at 5.75-hr intervals. We calculated minimum convex polygon (MCP) and fixed and adaptive kernel home ranges for randomly-selected subsets of the GPS data to examine the effects of sample size on accuracy and precision of home range estimates. We also compared results obtained by weekly aerial radiotracking versus more frequent GPS locations to test for biases in conventional radiotracking data. Home ranges based on the MCP were 20-606 km2 (x = 201) for aerial radiotracking data (n = 12-16 locations/bear) and 116-1,505 km2 (x = 522) for the complete GPS data sets (n = 245-466 locations/bear). Fixed kernel home ranges were 34-955 km2 (x = 224) for radiotracking data and 16-130 km2 (x = 60) for the GPS data. Differences between means for radiotracking and GPS data were due primarily to the larger samples provided by the GPS data. Means did not differ between radiotracking data and equivalent-sized subsets of GPS data (P > 0.10). For the MCP, home range area increased and variability decreased asymptotically with number of locations. For the kernel models, both area and variability decreased with increasing sample size. Simulations suggested that the MCP and kernel models required >60 and >80 locations, respectively, for estimates to be both accurate (change in area bears. Our results suggest that the usefulness of conventional radiotracking data may be limited by potential biases and variability due to small samples. Investigators that use home range estimates in statistical tests should consider the effects of variability of those estimates. Use of GPS-equipped collars can facilitate obtaining larger samples of unbiased data and improve accuracy and precision of home range estimates.

  10. Revisiting sample size: are big trials the answer?

    Science.gov (United States)

    Lurati Buse, Giovanna A L; Botto, Fernando; Devereaux, P J

    2012-07-18

    The superiority of the evidence generated in randomized controlled trials over observational data is not only conditional to randomization. Randomized controlled trials require proper design and implementation to provide a reliable effect estimate. Adequate random sequence generation, allocation implementation, analyses based on the intention-to-treat principle, and sufficient power are crucial to the quality of a randomized controlled trial. Power, or the probability of the trial to detect a difference when a real difference between treatments exists, strongly depends on sample size. The quality of orthopaedic randomized controlled trials is frequently threatened by a limited sample size. This paper reviews basic concepts and pitfalls in sample-size estimation and focuses on the importance of large trials in the generation of valid evidence.

  11. Test of a sample container for shipment of small size plutonium samples with PAT-2

    International Nuclear Information System (INIS)

    Kuhn, E.; Aigner, H.; Deron, S.

    1981-11-01

    A light-weight container for the air transport of plutonium, to be designated PAT-2, has been developed in the USA and is presently undergoing licensing. The very limited effective space for bearing plutonium required the design of small size sample canisters to meet the needs of international safeguards for the shipment of plutonium samples. The applicability of a small canister for the sampling of small size powder and solution samples has been tested in an intralaboratory experiment. The results of the experiment, based on the concept of pre-weighed samples, show that the tested canister can successfully be used for the sampling of small size PuO2 powder samples of homogeneous source material, as well as for dried aliquands of plutonium nitrate solutions. (author)

  12. Sensitivity and specificity of normality tests and consequences on reference interval accuracy at small sample size: a computer-simulation study.

    Science.gov (United States)

    Le Boedec, Kevin

    2016-12-01

    According to international guidelines, parametric methods must be chosen for RI construction when the sample size is small and the distribution is Gaussian. However, normality tests may not be accurate at small sample size. The purpose of the study was to evaluate normality test performance to properly identify samples extracted from a Gaussian population at small sample sizes, and assess the consequences on RI accuracy of applying parametric methods to samples that falsely identified the parent population as Gaussian. Samples of n = 60 and n = 30 values were randomly selected 100 times from simulated Gaussian, lognormal, and asymmetric populations of 10,000 values. The sensitivity and specificity of 4 normality tests were compared. Reference intervals were calculated using 6 different statistical methods from samples that falsely identified the parent population as Gaussian, and their accuracy was compared. Shapiro-Wilk and D'Agostino-Pearson tests were the best performing normality tests. However, their specificity was poor at sample size n = 30. Applying robust methods (e.g., after Box-Cox transformation) on all samples regardless of their distribution, or adjusting the significance level of normality tests depending on sample size, would limit the risk of constructing inaccurate RI. © 2016 American Society for Veterinary Clinical Pathology.
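
The following sketch mimics, on a much smaller scale, the kind of simulation reported here: it estimates how often the Shapiro-Wilk test accepts genuinely Gaussian samples (specificity) and rejects lognormal ones (sensitivity) at n = 30. It is an illustration under assumed distribution parameters, not a reproduction of the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 30, 1000, 0.05
p_gaussian  = np.array([stats.shapiro(rng.normal(size=n)).pvalue for _ in range(reps)])
p_lognormal = np.array([stats.shapiro(rng.lognormal(sigma=0.5, size=n)).pvalue for _ in range(reps)])

specificity = np.mean(p_gaussian >= alpha)    # Gaussian samples correctly not rejected
sensitivity = np.mean(p_lognormal < alpha)    # lognormal samples correctly rejected
print(f"specificity = {specificity:.2f}, sensitivity = {sensitivity:.2f}")
```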

  13. Estimating sample size for landscape-scale mark-recapture studies of North American migratory tree bats

    Science.gov (United States)

    Ellison, Laura E.; Lukacs, Paul M.

    2014-01-01

    Concern for migratory tree-roosting bats in North America has grown because of possible population declines from wind energy development. This concern has driven interest in estimating population-level changes. Mark-recapture methodology is one possible analytical framework for assessing bat population changes, but sample size requirements to produce reliable estimates have not been estimated. To illustrate the sample sizes necessary for a mark-recapture-based monitoring program we conducted power analyses using a statistical model that allows reencounters of live and dead marked individuals. We ran 1,000 simulations for each of five broad sample size categories in a Burnham joint model, and then compared the proportion of simulations in which 95% confidence intervals overlapped between and among years for a 4-year study. Additionally, we conducted sensitivity analyses of sample size to various capture probabilities and recovery probabilities. More than 50,000 individuals per year would need to be captured and released to accurately determine 10% and 15% declines in annual survival. To detect more dramatic declines of 33% or 50% survival over four years, then sample sizes of 25,000 or 10,000 per year, respectively, would be sufficient. Sensitivity analyses reveal that increasing recovery of dead marked individuals may be more valuable than increasing capture probability of marked individuals. Because of the extraordinary effort that would be required, we advise caution should such a mark-recapture effort be initiated because of the difficulty in attaining reliable estimates. We make recommendations for what techniques show the most promise for mark-recapture studies of bats because some techniques violate the assumptions of mark-recapture methodology when used to mark bats.

  14. CT dose survey in adults: what sample size for what precision?

    International Nuclear Information System (INIS)

    Taylor, Stephen; Muylem, Alain van; Howarth, Nigel; Gevenois, Pierre Alain; Tack, Denis

    2017-01-01

    To determine variability of volume computed tomographic dose index (CTDIvol) and dose-length product (DLP) data, and propose a minimum sample size to achieve an expected precision. CTDIvol and DLP values of 19,875 consecutive CT acquisitions of abdomen (7268), thorax (3805), lumbar spine (3161), cervical spine (1515) and head (4106) were collected in two centers. Their variabilities were investigated according to sample size (10 to 1000 acquisitions) and patient body weight categories (no weight selection, 67-73 kg and 60-80 kg). The 95 % confidence interval in percentage of their median (CI95/med) value was calculated for increasing sample sizes. We deduced the sample size that set a 95 % CI lower than 10 % of the median (CI95/med ≤ 10 %). Sample size ensuring CI95/med ≤ 10 %, ranged from 15 to 900 depending on the body region and the dose descriptor considered. In sample sizes recommended by regulatory authorities (i.e., from 10-20 patients), mean CTDIvol and DLP of one sample ranged from 0.50 to 2.00 times its actual value extracted from 2000 samples. The sampling error in CTDIvol and DLP means is high in dose surveys based on small samples of patients. Sample size should be increased at least tenfold to decrease this variability. (orig.)
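
A rough sketch of the sampling-variability experiment described above, using a synthetic skewed dose distribution rather than the survey data: many samples of size n are drawn and the spread of the sample mean, relative to the full-data value, is reported.

```python
import numpy as np

rng = np.random.default_rng(2)
dlp = rng.lognormal(mean=6.0, sigma=0.5, size=20_000)      # synthetic DLP values (mGy*cm)
reference = dlp.mean()

for n in (10, 20, 100, 500):
    means = np.array([rng.choice(dlp, n, replace=False).mean() for _ in range(2000)])
    lo, hi = np.percentile(means / reference, [2.5, 97.5])
    print(f"n={n:4d}: sample mean ranges from {lo:.2f} to {hi:.2f} times the full-data mean")
```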

  15. CT dose survey in adults: what sample size for what precision?

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, Stephen [Hopital Ambroise Pare, Department of Radiology, Mons (Belgium); Muylem, Alain van [Hopital Erasme, Department of Pneumology, Brussels (Belgium); Howarth, Nigel [Clinique des Grangettes, Department of Radiology, Chene-Bougeries (Switzerland); Gevenois, Pierre Alain [Hopital Erasme, Department of Radiology, Brussels (Belgium); Tack, Denis [EpiCURA, Clinique Louis Caty, Department of Radiology, Baudour (Belgium)

    2017-01-15

    To determine variability of volume computed tomographic dose index (CTDIvol) and dose-length product (DLP) data, and propose a minimum sample size to achieve an expected precision. CTDIvol and DLP values of 19,875 consecutive CT acquisitions of abdomen (7268), thorax (3805), lumbar spine (3161), cervical spine (1515) and head (4106) were collected in two centers. Their variabilities were investigated according to sample size (10 to 1000 acquisitions) and patient body weight categories (no weight selection, 67-73 kg and 60-80 kg). The 95 % confidence interval in percentage of their median (CI95/med) value was calculated for increasing sample sizes. We deduced the sample size that set a 95 % CI lower than 10 % of the median (CI95/med ≤ 10 %). Sample size ensuring CI95/med ≤ 10 %, ranged from 15 to 900 depending on the body region and the dose descriptor considered. In sample sizes recommended by regulatory authorities (i.e., from 10-20 patients), mean CTDIvol and DLP of one sample ranged from 0.50 to 2.00 times its actual value extracted from 2000 samples. The sampling error in CTDIvol and DLP means is high in dose surveys based on small samples of patients. Sample size should be increased at least tenfold to decrease this variability. (orig.)

  16. Pierre Gy's sampling theory and sampling practice heterogeneity, sampling correctness, and statistical process control

    CERN Document Server

    Pitard, Francis F

    1993-01-01

    Pierre Gy's Sampling Theory and Sampling Practice, Second Edition is a concise, step-by-step guide for process variability management and methods. Updated and expanded, this new edition provides a comprehensive study of heterogeneity, covering the basic principles of sampling theory and its various applications. It presents many practical examples to allow readers to select appropriate sampling protocols and assess the validity of sampling protocols from others. The variability of dynamic process streams using variography is discussed to help bridge sampling theory with statistical process control. Many descriptions of good sampling devices, as well as descriptions of poor ones, are featured to educate readers on what to look for when purchasing sampling systems. The book uses its accessible, tutorial style to focus on professional selection and use of methods. The book will be a valuable guide for mineral processing engineers; metallurgists; geologists; miners; chemists; environmental scientists; and practit...

  17. Sample-size dependence of diversity indices and the determination of sufficient sample size in a high-diversity deep-sea environment

    OpenAIRE

    Soetaert, K.; Heip, C.H.R.

    1990-01-01

    Diversity indices, although designed for comparative purposes, often cannot be used as such, due to their sample-size dependence. It is argued here that this dependence is more pronounced in high diversity than in low diversity assemblages and that indices more sensitive to rarer species require larger sample sizes to estimate diversity with reasonable precision than indices which put more weight on commoner species. This was tested for Hill's diversity numbers N0 to N∞ ...

  18. Sample size calculation for comparing two negative binomial rates.

    Science.gov (United States)

    Zhu, Haiyuan; Lakkis, Hassan

    2014-02-10

    Negative binomial model has been increasingly used to model the count data in recent clinical trials. It is frequently chosen over Poisson model in cases of overdispersed count data that are commonly seen in clinical trials. One of the challenges of applying negative binomial model in clinical trial design is the sample size estimation. In practice, simulation methods have been frequently used for sample size estimation. In this paper, an explicit formula is developed to calculate sample size based on the negative binomial model. Depending on different approaches to estimate the variance under null hypothesis, three variations of the sample size formula are proposed and discussed. Important characteristics of the formula include its accuracy and its ability to explicitly incorporate dispersion parameter and exposure time. The performance of the formula with each variation is assessed using simulations. Copyright © 2013 John Wiley & Sons, Ltd.
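
One common large-sample approximation for this problem, shown below as a hedged sketch rather than the paper's exact formula, treats the variance of each estimated log rate as 1/(t*lambda) + kappa, where kappa is the dispersion parameter (Var(Y) = mu + kappa*mu^2) and t the exposure time; papers on this topic discuss several such variants depending on how the null variance is estimated.

```python
from math import log, ceil
from scipy.stats import norm

def nb_sample_size(lam1, lam2, kappa, t, alpha=0.05, power=0.80):
    """Subjects per group to detect rate ratio lam1/lam2 with a negative binomial outcome."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var_sum = (1 / (t * lam1) + kappa) + (1 / (t * lam2) + kappa)
    return ceil(z ** 2 * var_sum / log(lam1 / lam2) ** 2)

print(nb_sample_size(lam1=0.8, lam2=0.5, kappa=0.4, t=1.0))   # illustrative inputs
```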

  19. [A comparison of convenience sampling and purposive sampling].

    Science.gov (United States)

    Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien

    2014-06-01

    Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and "purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation not by statistical power analysis.

  20. Estimation of sample size and testing power (part 5).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-02-01

    Estimation of sample size and testing power is an important component of research design. This article introduced methods for sample size and testing power estimation of difference test for quantitative and qualitative data with the single-group design, the paired design or the crossover design. To be specific, this article introduced formulas for sample size and testing power estimation of difference test for quantitative and qualitative data with the above three designs, the realization based on the formulas and the POWER procedure of SAS software and elaborated it with examples, which will benefit researchers for implementing the repetition principle.
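
The article works through these calculations with formulas and SAS's POWER procedure; as a hedged cross-check in another tool, the sketch below solves the analogous single-group/paired-design case with statsmodels (the effect size and error rates are assumptions, not the article's examples).

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()                          # one-sample / paired t test
n_pairs = analysis.solve_power(effect_size=0.5,  # mean difference / SD of differences
                               alpha=0.05, power=0.80, alternative='two-sided')
print(round(n_pairs))                            # about 34 pairs for these inputs
```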

  1. Statistical shear lag model - unraveling the size effect in hierarchical composites.

    Science.gov (United States)

    Wei, Xiaoding; Filleter, Tobin; Espinosa, Horacio D

    2015-05-01

    Numerous experimental and computational studies have established that the hierarchical structures encountered in natural materials, such as the brick-and-mortar structure observed in sea shells, are essential for achieving defect tolerance. Due to this hierarchy, the mechanical properties of natural materials have a different size dependence compared to that of typical engineered materials. This study aimed to explore size effects on the strength of bio-inspired staggered hierarchical composites and to define the influence of the geometry of constituents in their outstanding defect tolerance capability. A statistical shear lag model is derived by extending the classical shear lag model to account for the statistics of the constituents' strength. A general solution emerges from rigorous mathematical derivations, unifying the various empirical formulations for the fundamental link length used in previous statistical models. The model shows that the staggered arrangement of constituents grants composites a unique size effect on mechanical strength in contrast to homogenous continuous materials. The model is applied to hierarchical yarns consisting of double-walled carbon nanotube bundles to assess its predictive capabilities for novel synthetic materials. Interestingly, the model predicts that yarn gauge length does not significantly influence the yarn strength, in close agreement with experimental observations. Copyright © 2015 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.

  2. Frictional behaviour of sandstone: A sample-size dependent triaxial investigation

    Science.gov (United States)

    Roshan, Hamid; Masoumi, Hossein; Regenauer-Lieb, Klaus

    2017-01-01

    Frictional behaviour of rocks from the initial stage of loading to final shear displacement along the formed shear plane has been widely investigated in the past. However the effect of sample size on such frictional behaviour has not attracted much attention. This is mainly related to the limitations in rock testing facilities as well as the complex mechanisms involved in sample-size dependent frictional behaviour of rocks. In this study, a suite of advanced triaxial experiments was performed on Gosford sandstone samples at different sizes and confining pressures. The post-peak response of the rock along the formed shear plane has been captured for the analysis with particular interest in sample-size dependency. Several important phenomena have been observed from the results of this study: a) the rate of transition from brittleness to ductility in rock is sample-size dependent where the relatively smaller samples showed faster transition toward ductility at any confining pressure; b) the sample size influences the angle of formed shear band and c) the friction coefficient of the formed shear plane is sample-size dependent where the relatively smaller sample exhibits lower friction coefficient compared to larger samples. We interpret our results in terms of a thermodynamics approach in which the frictional properties for finite deformation are viewed as encompassing a multitude of ephemeral slipping surfaces prior to the formation of the through going fracture. The final fracture itself is seen as a result of the self-organisation of a sufficiently large ensemble of micro-slip surfaces and therefore consistent in terms of the theory of thermodynamics. This assumption vindicates the use of classical rock mechanics experiments to constrain failure of pressure sensitive rocks and the future imaging of these micro-slips opens an exciting path for research in rock failure mechanisms.

  3. Assessing the precision of a time-sampling-based study among GPs: balancing sample size and measurement frequency.

    Science.gov (United States)

    van Hassel, Daniël; van der Velden, Lud; de Bakker, Dinny; van der Hoek, Lucas; Batenburg, Ronald

    2017-12-04

    Our research is based on a technique for time sampling, an innovative method for measuring the working hours of Dutch general practitioners (GPs), which was deployed in an earlier study. In this study, 1051 GPs were questioned about their activities in real time by sending them one SMS text message every 3 h during 1 week. The required sample size for this study is important for health workforce planners to know if they want to apply this method to target groups who are hard to reach or if fewer resources are available. In this time-sampling method, however, standard power analysis is not sufficient for calculating the required sample size, as it accounts only for sample fluctuation and not for the fluctuation of measurements taken from every participant. We investigated the impact of the number of participants and the frequency of measurements per participant upon the confidence intervals (CIs) for the hours worked per week. Statistical analyses of the time-use data we obtained from GPs were performed. Ninety-five percent CIs were calculated, using equations and simulation techniques, for various numbers of GPs included in the dataset and for various frequencies of measurements per participant. Our results showed that the one-tailed CI, including sample and measurement fluctuation, decreased from 21 h to 3 h as the number of GPs increased from one to 50. Beyond that point, the CI formulas show that precision continues to improve, but by less for each additional GP. Likewise, the analyses showed how the number of participants required decreased if more measurements per participant were taken. For example, one measurement per 3-h time slot during the week requires 300 GPs to achieve a CI of 1 h, while one measurement per hour requires 100 GPs to obtain the same result. The sample size needed for time-use research based on a time-sampling technique depends on the design and aim of the study. In this paper, we showed how the precision of the
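
The sketch below is a crude illustration, not the study's code, of why both sources of fluctuation matter: it simulates between-GP variation and within-GP measurement noise, and shows how the CI half-width for mean weekly hours shrinks as GPs or measurements per GP are added. All variance parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

def ci_halfwidth(n_gps, n_measurements, reps=500):
    """Average 95% CI half-width for mean weekly hours (illustrative variance assumptions)."""
    halfwidths = []
    for _ in range(reps):
        true_hours = rng.normal(45, 8, size=n_gps)                  # between-GP variation
        noise = rng.normal(0, 15, size=(n_gps, n_measurements))     # within-GP fluctuation
        per_gp_estimate = (true_hours[:, None] + noise).mean(axis=1)
        halfwidths.append(1.96 * per_gp_estimate.std(ddof=1) / np.sqrt(n_gps))
    return float(np.mean(halfwidths))

print(ci_halfwidth(n_gps=50, n_measurements=56))     # ~one SMS per 3-h slot for a week
print(ci_halfwidth(n_gps=100, n_measurements=168))   # ~one SMS per hour
```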

  4. [An investigation of the statistical power of the effect size in randomized controlled trials for the treatment of patients with type 2 diabetes mellitus using Chinese medicine].

    Science.gov (United States)

    Ma, Li-Xin; Liu, Jian-Ping

    2012-01-01

    To investigate whether the power of the effect size was based on adequate sample size in randomized controlled trials (RCTs) for the treatment of patients with type 2 diabetes mellitus (T2DM) using Chinese medicine. China Knowledge Resource Integrated Database (CNKI), VIP Database for Chinese Technical Periodicals (VIP), Chinese Biomedical Database (CBM), and Wanfang Data were systematically searched using terms like "Xiaoke" or diabetes, Chinese herbal medicine, patent medicine, traditional Chinese medicine, randomized, controlled, blinded, and placebo-controlled. A restriction to intervention courses of ≥ 3 months was applied in order to identify the information on outcome assessment and sample size. Data collection forms were made according to the checklists found in the CONSORT statement. Independent double data extraction was performed on all included trials. The statistical power of the effect size for each RCT was assessed using sample size calculation equations. (1) A total of 207 RCTs were included, comprising 111 superiority trials and 96 non-inferiority trials. (2) Among the 111 superiority trials, fasting plasma glucose (FPG) and glycosylated hemoglobin (HbA1c) outcome measures were reported in 9% and 12% of the RCTs, respectively, with a sample size > 150 in each trial. For the outcome of HbA1c, only 10% of the RCTs had more than 80% power. For FPG, 23% of the RCTs had more than 80% power. (3) In the 96 non-inferiority trials, the outcomes FPG and HbA1c were reported in 31% and 36% respectively. These RCTs had a sample size > 150. For HbA1c, only 36% of the RCTs had more than 80% power. For FPG, only 27% of the studies had more than 80% power. The sample size for statistical analysis was distressingly low and most RCTs did not achieve 80% power. In order to obtain sufficient statistical power, it is recommended that clinical trials should establish a clear research objective and hypothesis first, and choose scientific and evidence
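
As an illustration of the kind of power check the review applies (not its actual computation), the sketch below evaluates the power of a two-arm superiority comparison of HbA1c for an assumed clinically relevant difference, standard deviation and group size.

```python
from statsmodels.stats.power import TTestIndPower

delta, sd = 0.5, 1.5           # assumed HbA1c difference (%) and common SD (%)
power = TTestIndPower().power(effect_size=delta / sd, nobs1=75, ratio=1.0, alpha=0.05)
print(f"power = {power:.2f}")  # well below the usual 0.80 target for these assumptions
```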

  5. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, P; Kleibergen, F

    2003-01-01

    We consider the K-statistic, Kleibergen's (2002, Econometrica 70, 1781-1803) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Whereas Kleibergen (2002) especially analyzes the asymptotic behavior of the statistic, we focus on finite-sample properties in a ...

  6. On the Importance of Accounting for Competing Risks in Pediatric Brain Cancer: II. Regression Modeling and Sample Size

    International Nuclear Information System (INIS)

    Tai, Bee-Choo; Grundy, Richard; Machin, David

    2011-01-01

    Purpose: To accurately model the cumulative need for radiotherapy in trials designed to delay or avoid irradiation among children with malignant brain tumor, it is crucial to account for competing events and evaluate how each contributes to the timing of irradiation. An appropriate choice of statistical model is also important for adequate determination of sample size. Methods and Materials: We describe the statistical modeling of competing events (A, radiotherapy after progression; B, no radiotherapy after progression; and C, elective radiotherapy) using proportional cause-specific and subdistribution hazard functions. The procedures of sample size estimation based on each method are outlined. These are illustrated by use of data comparing children with ependymoma and other malignant brain tumors. The results from these two approaches are compared. Results: The cause-specific hazard analysis showed a reduction in hazards among infants with ependymoma for all event types, including Event A (adjusted cause-specific hazard ratio, 0.76; 95% confidence interval, 0.45-1.28). Conversely, the subdistribution hazard analysis suggested an increase in hazard for Event A (adjusted subdistribution hazard ratio, 1.35; 95% confidence interval, 0.80-2.30), but the reduction in hazards for Events B and C remained. Analysis based on subdistribution hazard requires a larger sample size than the cause-specific hazard approach. Conclusions: Notable differences in effect estimates and anticipated sample size were observed between methods when the main event showed a beneficial effect whereas the competing events showed an adverse effect on the cumulative incidence. The subdistribution hazard is the most appropriate for modeling treatment when its effects on both the main and competing events are of interest.

  7. Computing physical properties with quantum Monte Carlo methods with statistical fluctuations independent of system size.

    Science.gov (United States)

    Assaraf, Roland

    2014-12-01

    We show that the recently proposed correlated sampling without reweighting procedure extends the locality (asymptotic independence of the system size) of a physical property to the statistical fluctuations of its estimator. This makes the approach potentially vastly more efficient for computing space-localized properties in large systems compared with standard correlated methods. A proof is given for a large collection of noninteracting fragments. Calculations on hydrogen chains suggest that this behavior holds not only for systems displaying short-range correlations, but also for systems with long-range correlations.

  8. New Hybrid Monte Carlo methods for efficient sampling. From physics to biology and statistics

    International Nuclear Information System (INIS)

    Akhmatskaya, Elena; Reich, Sebastian

    2011-01-01

    We introduce a class of novel hybrid methods for detailed simulations of large complex systems in physics, biology, materials science and statistics. These generalized shadow Hybrid Monte Carlo (GSHMC) methods combine the advantages of stochastic and deterministic simulation techniques. They utilize a partial momentum update to retain some of the dynamical information, employ modified Hamiltonians to overcome exponential performance degradation with the system’s size and make use of multi-scale nature of complex systems. Variants of GSHMCs were developed for atomistic simulation, particle simulation and statistics: GSHMC (thermodynamically consistent implementation of constant-temperature molecular dynamics), MTS-GSHMC (multiple-time-stepping GSHMC), meso-GSHMC (Metropolis corrected dissipative particle dynamics (DPD) method), and a generalized shadow Hamiltonian Monte Carlo, GSHmMC (a GSHMC for statistical simulations). All of these are compatible with other enhanced sampling techniques and suitable for massively parallel computing allowing for a range of multi-level parallel strategies. A brief description of the GSHMC approach, examples of its application on high performance computers and comparison with other existing techniques are given. Our approach is shown to resolve such problems as resonance instabilities of the MTS methods and non-preservation of thermodynamic equilibrium properties in DPD, and to outperform known methods in sampling efficiency by an order of magnitude. (author)

  9. Effects of sample size on the second magnetization peak in ...

    Indian Academy of Sciences (India)

    the sample size decreases – a result that could be interpreted as a size effect in the order–disorder vortex matter phase transition. However, local magnetic measurements trace this effect to metastable disordered vortex states, revealing the same order–disorder transition induction in samples of different size.

  10. Box-Cox transformation of firm size data in statistical analysis

    Science.gov (United States)

    Chen, Ting Ting; Takaishi, Tetsuya

    2014-03-01

    Firm size data usually do not show the normality that is often assumed in statistical analysis such as regression analysis. In this study we focus on two firm size data: the number of employees and sale. Those data deviate considerably from a normal distribution. To improve the normality of those data we transform them by the Box-Cox transformation with appropriate parameters. The Box-Cox transformation parameters are determined so that the transformed data best show the kurtosis of a normal distribution. It is found that the two firm size data transformed by the Box-Cox transformation show strong linearity. This indicates that the number of employees and sale have the similar property as a firm size indicator. The Box-Cox parameters obtained for the firm size data are found to be very close to zero. In this case the Box-Cox transformations are approximately a log-transformation. This suggests that the firm size data we used are approximately log-normal distributions.
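
A minimal sketch of the transformation step on synthetic, roughly log-normal "firm size" data; note that scipy's boxcox chooses lambda by maximum likelihood rather than by the kurtosis-matching criterion described in the abstract, and the data are simulated, not the authors' dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
employees = rng.lognormal(mean=3.0, sigma=1.2, size=5000)   # synthetic firm sizes, all > 0

transformed, lam = stats.boxcox(employees)                  # lambda fitted by maximum likelihood
print(f"fitted lambda = {lam:.3f}")                         # near 0 => essentially a log-transform
print(f"excess kurtosis after transform = {stats.kurtosis(transformed):.2f}")
```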

  11. Box-Cox transformation of firm size data in statistical analysis

    International Nuclear Information System (INIS)

    Chen, Ting Ting; Takaishi, Tetsuya

    2014-01-01

    Firm size data usually do not show the normality that is often assumed in statistical analysis such as regression analysis. In this study we focus on two firm size data: the number of employees and sale. Those data deviate considerably from a normal distribution. To improve the normality of those data we transform them by the Box-Cox transformation with appropriate parameters. The Box-Cox transformation parameters are determined so that the transformed data best show the kurtosis of a normal distribution. It is found that the two firm size data transformed by the Box-Cox transformation show strong linearity. This indicates that the number of employees and sale have the similar property as a firm size indicator. The Box-Cox parameters obtained for the firm size data are found to be very close to zero. In this case the Box-Cox transformations are approximately a log-transformation. This suggests that the firm size data we used are approximately log-normal distributions

  12. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    aspect distributions to the large dataset. We used 100 different subsets for each sample size. Statistical variations obtained in the complete dataset were also tested on the smaller subsets using the Mann-Whitney or the Kruskal-Wallis test. For each subset size, the number of subsets in which the significance level was reached was counted. For these tests no nominal data scale was assumed. (iii) For the same subsets described above, the distribution of the aspect median was determined. A count was made of how often this distribution was substantially different from the distribution obtained with the complete dataset. Since two valid stability interpretations were available (an objective and a subjective interpretation as described above), the effect of the arbitrary choice of the interpretation on spatial variability results was tested. In over one third of the cases the two interpretations came to different results. The effect of these differences was studied in a similar way to that described in (iii): the distribution of the aspect median was determined for subsets of the complete dataset using both interpretations, compared against each other as well as against the results of the complete dataset. For the complete dataset the two interpretations showed mainly identical results. Therefore the subset size was determined from the point at which the results of the two interpretations converged. A universal result for the optimal subset size cannot be presented since results differed between the different situations contained in the dataset. The optimal subset size is thus dependent on the stability variation in a given situation, which is unknown initially. There are indications that for some situations even the complete dataset might not be large enough. At a subset size of approximately 25, the significant differences between aspect groups (as determined using the whole dataset) were only obtained in one out of five situations. In some situations, up to 20% of the subsets showed a
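
A sketch of the subsampling check described above, on synthetic stability scores rather than the snow profile data: many random subsets of a given size are drawn from two aspect groups, and the fraction of subsets in which a Mann-Whitney test still reaches significance is recorded.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(5)
north = rng.normal(2.0, 1.0, 200)        # synthetic stability scores for two aspect groups
south = rng.normal(2.5, 1.0, 200)

def detection_rate(subset_size, n_subsets=100, alpha=0.05):
    hits = 0
    for _ in range(n_subsets):
        a = rng.choice(north, subset_size, replace=False)
        b = rng.choice(south, subset_size, replace=False)
        hits += mannwhitneyu(a, b).pvalue < alpha
    return hits / n_subsets

print([detection_rate(m) for m in (10, 25, 50)])   # detection rate grows with subset size
```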

  13. On the Structure of Cortical Microcircuits Inferred from Small Sample Sizes.

    Science.gov (United States)

    Vegué, Marina; Perin, Rodrigo; Roxin, Alex

    2017-08-30

    The structure in cortical microcircuits deviates from what would be expected in a purely random network, which has been seen as evidence of clustering. To address this issue, we sought to reproduce the nonrandom features of cortical circuits by considering several distinct classes of network topology, including clustered networks, networks with distance-dependent connectivity, and those with broad degree distributions. To our surprise, we found that all of these qualitatively distinct topologies could account equally well for all reported nonrandom features despite being easily distinguishable from one another at the network level. This apparent paradox was a consequence of estimating network properties given only small sample sizes. In other words, networks that differ markedly in their global structure can look quite similar locally. This makes inferring network structure from small sample sizes, a necessity given the technical difficulty inherent in simultaneous intracellular recordings, problematic. We found that a network statistic called the sample degree correlation (SDC) overcomes this difficulty. The SDC depends only on parameters that can be estimated reliably given small sample sizes and is an accurate fingerprint of every topological family. We applied the SDC criterion to data from rat visual and somatosensory cortex and discovered that the connectivity was not consistent with any of these main topological classes. However, we were able to fit the experimental data with a more general network class, of which all previous topologies were special cases. The resulting network topology could be interpreted as a combination of physical spatial dependence and nonspatial, hierarchical clustering. SIGNIFICANCE STATEMENT The connectivity of cortical microcircuits exhibits features that are inconsistent with a simple random network. Here, we show that several classes of network models can account for this nonrandom structure despite qualitative differences in

  14. Rule-of-thumb adjustment of sample sizes to accommodate dropouts in a two-stage analysis of repeated measurements.

    Science.gov (United States)

    Overall, John E; Tonidandel, Scott; Starbuck, Robert R

    2006-01-01

    Recent contributions to the statistical literature have provided elegant model-based solutions to the problem of estimating sample sizes for testing the significance of differences in mean rates of change across repeated measures in controlled longitudinal studies with differentially correlated error and missing data due to dropouts. However, the mathematical complexity and model specificity of these solutions make them generally inaccessible to most applied researchers who actually design and undertake treatment evaluation research in psychiatry. In contrast, this article relies on a simple two-stage analysis in which dropout-weighted slope coefficients fitted to the available repeated measurements for each subject separately serve as the dependent variable for a familiar ANCOVA test of significance for differences in mean rates of change. This article is about how a sample size that is estimated or calculated to provide desired power for testing that hypothesis without considering dropouts can be adjusted appropriately to take dropouts into account. Empirical results support the conclusion that, whatever reasonable level of power would be provided by a given sample size in the absence of dropouts, essentially the same power can be realized in the presence of dropouts simply by adding to the original dropout-free sample size the number of subjects who would be expected to drop from a sample of that original size under conditions of the proposed study.
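
The adjustment rule stated above is simple arithmetic; the snippet below just makes it explicit with hypothetical numbers (the dropout-free sample size and dropout rate are assumptions, not values from the article).

```python
from math import ceil

n_no_dropout = 120      # size giving the desired power if nobody dropped out (assumed)
dropout_rate = 0.20     # expected proportion of dropouts (assumed)

# add the number of subjects expected to drop from a sample of the original size
n_adjusted = ceil(n_no_dropout + dropout_rate * n_no_dropout)
print(n_adjusted)       # 144
```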

  15. Sample Size in Qualitative Interview Studies: Guided by Information Power.

    Science.gov (United States)

    Malterud, Kirsti; Siersma, Volkert Dirk; Guassora, Ann Dorrit

    2015-11-27

    Sample sizes must be ascertained in qualitative studies like in quantitative studies but not by the same means. The prevailing concept for sample size in qualitative studies is "saturation." Saturation is closely tied to a specific methodology, and the term is inconsistently applied. We propose the concept "information power" to guide adequate sample size for qualitative studies. Information power indicates that the more information the sample holds, relevant for the actual study, the lower amount of participants is needed. We suggest that the size of a sample with sufficient information power depends on (a) the aim of the study, (b) sample specificity, (c) use of established theory, (d) quality of dialogue, and (e) analysis strategy. We present a model where these elements of information and their relevant dimensions are related to information power. Application of this model in the planning and during data collection of a qualitative study is discussed. © The Author(s) 2015.

  16. Effect of dislocation pile-up on size-dependent yield strength in finite single-crystal micro-samples

    Energy Technology Data Exchange (ETDEWEB)

    Pan, Bo; Shibutani, Yoji, E-mail: sibutani@mech.eng.osaka-u.ac.jp [Department of Mechanical Engineering, Osaka University, Suita 565-0871 (Japan); Zhang, Xu [State Key Laboratory for Strength and Vibration of Mechanical Structures, School of Aerospace, Xi' an Jiaotong University, Xi' an 710049 (China); School of Mechanics and Engineering Science, Zhengzhou University, Zhengzhou 450001 (China); Shang, Fulin [State Key Laboratory for Strength and Vibration of Mechanical Structures, School of Aerospace, Xi' an Jiaotong University, Xi' an 710049 (China)

    2015-07-07

    Recent research has explained that the steeply increasing yield strength in metals depends on decreasing sample size. In this work, we derive a statistical physical model of the yield strength of finite single-crystal micro-pillars that depends on single-ended dislocation pile-up inside the micro-pillars. We show that this size effect can be explained almost completely by considering the stochastic lengths of the dislocation source and the dislocation pile-up length in the single-crystal micro-pillars. The Hall–Petch-type relation holds even in a microscale single-crystal, which is characterized by its dislocation source lengths. Our quantitative conclusions suggest that the number of dislocation sources and pile-ups are significant factors for the size effect. They also indicate that starvation of dislocation sources is another reason for the size effect. Moreover, we investigated the explicit relationship between the stacking fault energy and the dislocation “pile-up” effect inside the sample: materials with low stacking fault energy exhibit an obvious dislocation pile-up effect. Our proposed physical model predicts a sample strength that agrees well with experimental data, and our model can give a more precise prediction than the current single arm source model, especially for materials with low stacking fault energy.

  17. Conservative Sample Size Determination for Repeated Measures Analysis of Covariance.

    Science.gov (United States)

    Morgan, Timothy M; Case, L Douglas

    2013-07-05

    In the design of a randomized clinical trial with one pre and multiple post randomized assessments of the outcome variable, one needs to account for the repeated measures in determining the appropriate sample size. Unfortunately, one seldom has a good estimate of the variance of the outcome measure, let alone the correlations among the measurements over time. We show how sample sizes can be calculated by making conservative assumptions regarding the correlations for a variety of covariance structures. The most conservative choice for the correlation depends on the covariance structure and the number of repeated measures. In the absence of good estimates of the correlations, the sample size is often based on a two-sample t-test, making the 'ultra' conservative and unrealistic assumption that there are zero correlations between the baseline and follow-up measures while at the same time assuming there are perfect correlations between the follow-up measures. Compared to the case of taking a single measurement, substantial savings in sample size can be realized by accounting for the repeated measures, even with very conservative assumptions regarding the parameters of the assumed correlation matrix. Assuming compound symmetry, the sample size from the two-sample t-test calculation can be reduced at least 44%, 56%, and 61% for repeated measures analysis of covariance by taking 2, 3, and 4 follow-up measures, respectively. The results offer a rational basis for determining a fairly conservative, yet efficient, sample size for clinical trials with repeated measures and a baseline value.
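
The quoted reductions can be reproduced with a standard design-factor argument, used here as an assumption consistent with the abstract: under compound symmetry with correlation rho, ANCOVA on the mean of k follow-up measures (adjusted for baseline) multiplies the two-sample t-test sample size by f(rho) = (1 + (k-1)*rho)/k - rho^2, and taking the largest f over rho gives the guaranteed, conservative saving.

```python
import numpy as np

for k in (2, 3, 4):
    rho = np.linspace(0, 1, 10001)
    f = (1 + (k - 1) * rho) / k - rho ** 2      # design factor relative to a two-sample t test
    worst = f.max()                             # most conservative (largest) factor over rho
    print(f"k={k}: factor {worst:.3f} -> at least {100 * (1 - worst):.0f}% smaller n")
# prints reductions of about 44%, 56% and 61%, matching the figures quoted above
```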

  18. The Power of Low Back Pain Trials: A Systematic Review of Power, Sample Size, and Reporting of Sample Size Calculations Over Time, in Trials Published Between 1980 and 2012.

    Science.gov (United States)

    Froud, Robert; Rajendran, Dévan; Patel, Shilpa; Bright, Philip; Bjørkli, Tom; Eldridge, Sandra; Buchbinder, Rachelle; Underwood, Martin

    2017-06-01

    A systematic review of nonspecific low back pain trials published between 1980 and 2012. To explore what proportion of trials have been powered to detect different bands of effect size; whether there is evidence that sample size in low back pain trials has been increasing; what proportion of trial reports include a sample size calculation; and whether likelihood of reporting sample size calculations has increased. Clinical trials should have a sample size sufficient to detect a minimally important difference for a given power and type I error rate. An underpowered trial is one in which the probability of type II error is too high. Meta-analyses do not mitigate underpowered trials. Reviewers independently abstracted data on sample size at point of analysis, whether a sample size calculation was reported, and year of publication. Descriptive analyses were used to explore ability to detect effect sizes, and regression analyses to explore the relationship between sample size, or reporting sample size calculations, and time. We included 383 trials. One-third were powered to detect a standardized mean difference of less than 0.5, and 5% were powered to detect less than 0.3. The average sample size was 153 people, which increased only slightly (∼4 people/yr) from 1980 to 2000, and declined slightly (∼4.5 people/yr) from 2005 to 2011. The power of low back pain trials and the reporting of sample size calculations may need to be increased. It may be justifiable to power a trial to detect only large effects in the case of novel interventions. Level of Evidence: 3.

  19. Sample size choices for XRCT scanning of highly unsaturated soil mixtures

    Directory of Open Access Journals (Sweden)

    Smith Jonathan C.

    2016-01-01

    Full Text Available Highly unsaturated soil mixtures (clay, sand and gravel) are used as building materials in many parts of the world, and there is increasing interest in understanding their mechanical and hydraulic behaviour. In the laboratory, x-ray computed tomography (XRCT) is becoming more widely used to investigate the microstructures of soils; however, a crucial issue for such investigations is the choice of sample size, especially concerning the scanning of soil mixtures where there will be a range of particle and void sizes. In this paper we present a discussion (centred around a new set of XRCT scans) on sample sizing for scanning of samples comprising soil mixtures, where a balance has to be made between realistic representation of the soil components and the desire for high resolution scanning. We also comment on the appropriateness of differing sample sizes in comparison to sample sizes used for other geotechnical testing. Void size distributions for the samples are presented and from these some hypotheses are made as to the roles of inter- and intra-aggregate voids in the mechanical behaviour of highly unsaturated soils.

  20. Statistical sampling for holdup measurement

    International Nuclear Information System (INIS)

    Picard, R.R.; Pillay, K.K.S.

    1986-01-01

    Nuclear materials holdup is a serious problem in many operating facilities. Estimating amounts of holdup is important for materials accounting and, sometimes, for process safety. Clearly, measuring holdup in all pieces of equipment is not a viable option in terms of time, money, and radiation exposure to personnel. Furthermore, 100% measurement is not only impractical but unnecessary for developing estimated values. Principles of statistical sampling are valuable in the design of cost effective holdup monitoring plans and in qualifying uncertainties in holdup estimates. The purpose of this paper is to describe those principles and to illustrate their use

  1. An audit of the statistics and the comparison with the parameter in the population

    Science.gov (United States)

    Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad

    2015-10-01

    The sample size that is sufficient to closely estimate the statistics for particular parameters has long been an issue. Although the sample size may have been calculated with reference to the objective of the study, it is difficult to confirm whether the resulting statistics are close to the parameters for a particular population. Meanwhile, a p-value of less than 0.05 is widely used as inferential evidence. Therefore, this study audited results analyzed from various subsamples and statistical analyses and compared the results with the parameters in three different populations. Eight types of statistical analysis, with eight subsamples for each analysis, were examined. The results showed that the statistics were consistent and close to the parameters when the sample covered at least 15% to 35% of the population. Larger sample sizes are needed to estimate parameters involving categorical variables than those involving numerical variables. Sample sizes of 300 to 500 are sufficient to estimate the parameters of a medium-sized population.

  2. Decision Support on Small size Passive Samples

    Directory of Open Access Journals (Sweden)

    Vladimir Popukaylo

    2018-05-01

    Full Text Available A technique was developed for constructing adequate mathematical models from small passive samples, under conditions in which classical probabilistic-statistical methods do not allow valid conclusions to be drawn.

  3. Statistical Significance and Effect Size: Two Sides of a Coin.

    Science.gov (United States)

    Fan, Xitao

    This paper suggests that statistical significance testing and effect size are two sides of the same coin; they complement each other, but do not substitute for one another. Good research practice requires that both should be taken into consideration to make sound quantitative decisions. A Monte Carlo simulation experiment was conducted, and a…

  4. Sampling methods to the statistical control of the production of blood components.

    Science.gov (United States)

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood component specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The reliability of this control depends on the sampling. However, a correct sampling methodology does not seem to be systematically applied. Commonly, the sampling is intended only to comply with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model can be argued not to constitute a consistent sampling technique. This can be a severe limitation for detecting abnormal patterns and for assuring that production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and of the related statistical process control decisions for the purpose for which they are suggested. Copyright © 2017 Elsevier Ltd. All rights reserved.
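
As a hedged illustration of one of the three approaches listed (sampling based on the proportion of a finite population), the sketch below uses the usual finite-population-corrected sample size for estimating a proportion; the lot size, margin and confidence level are example values, not the authors' settings.

```python
from math import ceil
from scipy.stats import norm

def finite_population_n(N, p=0.5, margin=0.05, conf=0.95):
    """Sample size for estimating a proportion in a lot of N units."""
    z = norm.ppf(1 - (1 - conf) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2     # infinite-population sample size
    return ceil(n0 / (1 + (n0 - 1) / N))        # finite-population correction

print(finite_population_n(N=2000))              # about 323 units from a lot of 2000
```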

  5. Simple and multiple linear regression: sample size considerations.

    Science.gov (United States)

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Effects of sample size on estimation of rainfall extremes at high temperatures

    Science.gov (United States)

    Boessenkool, Berry; Bürger, Gerd; Heistermann, Maik

    2017-09-01

    High precipitation quantiles tend to rise with temperature, following the so-called Clausius-Clapeyron (CC) scaling. It is often reported that the CC-scaling relation breaks down and even reverts for very high temperatures. In our study, we investigate this reversal using observational climate data from 142 stations across Germany. One of the suggested meteorological explanations for the breakdown is limited moisture supply. Here we argue that, instead, it could simply originate from undersampling. As rainfall frequency generally decreases with higher temperatures, rainfall intensities as dictated by CC scaling are less likely to be recorded than for moderate temperatures. Empirical quantiles are conventionally estimated from order statistics via various forms of plotting position formulas. They have in common that their largest representable return period is given by the sample size. In small samples, high quantiles are underestimated accordingly. The small-sample effect is weaker, or disappears completely, when using parametric quantile estimates from a generalized Pareto distribution (GPD) fitted with L moments. For those, we obtain quantiles of rainfall intensities that continue to rise with temperature.
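
A rough sketch of the parametric estimate favoured above, on synthetic exceedances rather than the station data: a generalized Pareto distribution is fitted by the method of L-moments (location fixed at the threshold) and a high quantile is read off. The L-moment relations used are the standard ones; the rainfall values are simulated.

```python
import numpy as np

def gpd_lmom_fit(excesses):
    """Fit GPD shape k and scale alpha (Hosking convention, location = 0) by L-moments."""
    x = np.sort(excesses)
    n = len(x)
    b0 = x.mean()
    b1 = np.sum((np.arange(n) / (n - 1)) * x) / n    # probability-weighted moment estimate
    l1, l2 = b0, 2 * b1 - b0                         # first two sample L-moments
    k = l1 / l2 - 2
    alpha = (1 + k) * l1
    return k, alpha

def gpd_quantile(F, k, alpha):
    return (alpha / k) * (1 - (1 - F) ** k)

rng = np.random.default_rng(6)
excesses = rng.gamma(shape=1.2, scale=3.0, size=80)  # synthetic rainfall exceedances (mm)
k, alpha = gpd_lmom_fit(excesses)
print(round(gpd_quantile(0.99, k, alpha), 1))        # 99th percentile above the threshold
```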

  7. Effects of sample size on estimation of rainfall extremes at high temperatures

    Directory of Open Access Journals (Sweden)

    B. Boessenkool

    2017-09-01

    Full Text Available High precipitation quantiles tend to rise with temperature, following the so-called Clausius–Clapeyron (CC) scaling. It is often reported that the CC-scaling relation breaks down and even reverts for very high temperatures. In our study, we investigate this reversal using observational climate data from 142 stations across Germany. One of the suggested meteorological explanations for the breakdown is limited moisture supply. Here we argue that, instead, it could simply originate from undersampling. As rainfall frequency generally decreases with higher temperatures, rainfall intensities as dictated by CC scaling are less likely to be recorded than for moderate temperatures. Empirical quantiles are conventionally estimated from order statistics via various forms of plotting position formulas. They have in common that their largest representable return period is given by the sample size. In small samples, high quantiles are underestimated accordingly. The small-sample effect is weaker, or disappears completely, when using parametric quantile estimates from a generalized Pareto distribution (GPD) fitted with L moments. For those, we obtain quantiles of rainfall intensities that continue to rise with temperature.

  8. The attention-weighted sample-size model of visual short-term memory

    DEFF Research Database (Denmark)

    Smith, Philip L.; Lilburn, Simon D.; Corbett, Elaine A.

    2016-01-01

    exceeded that predicted by the sample-size model for both simultaneously and sequentially presented stimuli. Instead, the set-size effect and the serial position curves with sequential presentation were predicted by an attention-weighted version of the sample-size model, which assumes that one of the items...

  9. Sampling surface and subsurface particle-size distributions in wadable gravel-and cobble-bed streams for analyses in sediment transport, hydraulics, and streambed monitoring

    Science.gov (United States)

    Kristin Bunte; Steven R. Abt

    2001-01-01

    This document provides guidance for sampling surface and subsurface sediment from wadable gravel-and cobble-bed streams. After a short introduction to streams types and classifications in gravel-bed rivers, the document explains the field and laboratory measurement of particle sizes and the statistical analysis of particle-size distributions. Analysis of particle...

  10. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    Science.gov (United States)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    Urban catchments are typically characterised by a more flashy nature of the hydrological response compared to natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km² in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
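    The inter-amount-time idea described in this record can be sketched simply: instead of sampling flow at fixed time steps, one records the time needed to accumulate each fixed increment of flow volume. The following Python fragment is a minimal illustration on a synthetic discharge series; the increment size and the hydrograph are assumptions for demonstration, not data from the study.

    ```python
    import numpy as np

    def inter_amount_times(t, q, increment):
        """Times needed to accumulate successive fixed amounts of flow.

        t         : sample times (e.g. hours), evenly or unevenly spaced
        q         : discharge at those times (volume per unit time)
        increment : fixed accumulated amount that defines one "sample"
        """
        # cumulative volume via trapezoidal integration of the hydrograph
        cum = np.concatenate(([0.0], np.cumsum(np.diff(t) * 0.5 * (q[:-1] + q[1:]))))
        targets = np.arange(increment, cum[-1], increment)
        # time at which each target volume is reached (linear interpolation)
        crossing_times = np.interp(targets, cum, t)
        return np.diff(np.concatenate(([t[0]], crossing_times)))

    # synthetic flashy hydrograph: low base flow with short high-flow bursts
    t = np.arange(0.0, 240.0, 0.25)                       # hours
    q = 1.0 + 20.0 * np.exp(-((t % 48) - 10) ** 2 / 4)    # illustrative only
    iat = inter_amount_times(t, q, increment=50.0)
    print(len(iat), iat.mean(), np.percentile(iat, [10, 50, 90]))
    ```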

  11. Grain size statistics and depositional pattern of the Ecca Group sandstones, Karoo Supergroup in the Eastern Cape Province, South Africa

    Directory of Open Access Journals (Sweden)

    Baiyegunhi Christopher

    2017-11-01

    Full Text Available Grain size analysis is a vital sedimentological tool used to unravel the hydrodynamic conditions, mode of transportation and deposition of detrital sediments. In this study, detailed grain-size analysis was carried out on thirty-five sandstone samples from the Ecca Group in the Eastern Cape Province of South Africa. Grain-size statistical parameters, bivariate analysis, linear discriminant functions, Passega diagrams and log-probability curves were used to reveal the depositional processes, sedimentation mechanisms, hydrodynamic energy conditions and to discriminate different depositional environments. The grain-size parameters show that most of the sandstones are very fine to fine grained, moderately well sorted, mostly near-symmetrical and mesokurtic in nature. The abundance of very fine to fine grained sandstones indicates the dominance of a low-energy environment. The bivariate plots show that the samples are mostly grouped, except for the Prince Albert samples, which show a scattered trend due either to a mixture of two modes in equal proportion in bimodal sediments or to good sorting in unimodal sediments. The linear discriminant function analysis is dominantly indicative of turbidity current deposits under shallow marine environments for samples from the Prince Albert, Collingham and Ripon Formations, while the samples from the Fort Brown Formation are lacustrine or deltaic deposits. The C-M plots indicated that the sediments were deposited mainly by suspension and saltation, and by graded suspension. Visher diagrams show that saltation is the major process of transportation, followed by suspension.
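    The "grain-size statistical parameters" referred to in this record (graphic mean, sorting, skewness and kurtosis) are conventionally computed from phi-scale percentiles of the cumulative grain-size curve using the Folk and Ward (1957) graphic formulas. A minimal Python sketch is given below; the percentile values are invented for illustration only, not taken from the Ecca Group samples.

    ```python
    import numpy as np

    def folk_ward(phi5, phi16, phi25, phi50, phi75, phi84, phi95):
        """Folk & Ward (1957) graphic grain-size parameters from phi percentiles."""
        mean = (phi16 + phi50 + phi84) / 3.0                      # graphic mean
        sorting = (phi84 - phi16) / 4.0 + (phi95 - phi5) / 6.6    # inclusive graphic std. dev.
        skewness = ((phi16 + phi84 - 2 * phi50) / (2 * (phi84 - phi16))
                    + (phi5 + phi95 - 2 * phi50) / (2 * (phi95 - phi5)))
        kurtosis = (phi95 - phi5) / (2.44 * (phi75 - phi25))      # graphic kurtosis
        return mean, sorting, skewness, kurtosis

    # hypothetical percentiles read off one sample's cumulative curve (phi units)
    m, s, sk, k = folk_ward(2.0, 2.4, 2.6, 3.0, 3.4, 3.6, 4.0)
    print(f"mean={m:.2f} phi, sorting={s:.2f}, skewness={sk:.2f}, kurtosis={k:.2f}")
    # ~3.0 phi (fine/very fine sand boundary), moderately well sorted, symmetrical, mesokurtic
    ```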

  12. Grain size statistics and depositional pattern of the Ecca Group sandstones, Karoo Supergroup in the Eastern Cape Province, South Africa

    Science.gov (United States)

    Baiyegunhi, Christopher; Liu, Kuiwu; Gwavava, Oswald

    2017-11-01

    Grain size analysis is a vital sedimentological tool used to unravel the hydrodynamic conditions, mode of transportation and deposition of detrital sediments. In this study, detailed grain-size analysis was carried out on thirty-five sandstone samples from the Ecca Group in the Eastern Cape Province of South Africa. Grain-size statistical parameters, bivariate analysis, linear discriminant functions, Passega diagrams and log-probability curves were used to reveal the depositional processes, sedimentation mechanisms, hydrodynamic energy conditions and to discriminate different depositional environments. The grain-size parameters show that most of the sandstones are very fine to fine grained, moderately well sorted, mostly near-symmetrical and mesokurtic in nature. The abundance of very fine to fine grained sandstones indicates the dominance of a low-energy environment. The bivariate plots show that the samples are mostly grouped, except for the Prince Albert samples, which show a scattered trend due either to a mixture of two modes in equal proportion in bimodal sediments or to good sorting in unimodal sediments. The linear discriminant function analysis is dominantly indicative of turbidity current deposits under shallow marine environments for samples from the Prince Albert, Collingham and Ripon Formations, while the samples from the Fort Brown Formation are lacustrine or deltaic deposits. The C-M plots indicated that the sediments were deposited mainly by suspension and saltation, and by graded suspension. Visher diagrams show that saltation is the major process of transportation, followed by suspension.

  13. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    International Nuclear Information System (INIS)

    Harris, S.P.

    1997-01-01

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. Samples in small 3 ml containers (Inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid because the sample size is smaller and no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock-up Facility using 3 ml inserts and 15 ml peanut vials. A number of the insert samples were analyzed by Cold Chem and compared with full peanut vial samples analyzed by the current methods. The remaining inserts were analyzed by

  14. A Systematic Review of Surgical Randomized Controlled Trials: Part 2. Funding Source, Conflict of Interest, and Sample Size in Plastic Surgery.

    Science.gov (United States)

    Voineskos, Sophocles H; Coroneos, Christopher J; Ziolkowski, Natalia I; Kaur, Manraj N; Banfield, Laura; Meade, Maureen O; Chung, Kevin C; Thoma, Achilleas; Bhandari, Mohit

    2016-02-01

    The authors examined industry support, conflict of interest, and sample size in plastic surgery randomized controlled trials that compared surgical interventions. They hypothesized that industry-funded trials demonstrate statistically significant outcomes more often, and that randomized controlled trials with small sample sizes report statistically significant results more frequently. An electronic search identified randomized controlled trials published between 2000 and 2013. Independent reviewers assessed manuscripts and performed data extraction. Funding source, conflict of interest, primary outcome direction, and sample size were examined. Chi-squared and independent-samples t tests were used in the analysis. The search identified 173 randomized controlled trials, of which 100 (58 percent) did not acknowledge funding status. A relationship between funding source and trial outcome direction was not observed. Both funding status and conflict of interest reporting improved over time. Only 24 percent (six of 25) of industry-funded randomized controlled trials reported authors to have independent control of data and manuscript contents. The mean number of patients randomized was 73 per trial (median, 43; minimum, 3; maximum, 936). Small trials were not found to be positive more often than large trials (p = 0.87). Randomized controlled trials with small sample size were common; however, this provides great opportunity for the field to engage in further collaboration and produce larger, more definitive trials. Reporting of trial funding and conflict of interest is historically poor, but it greatly improved over the study period. Underreporting at author and journal levels remains a limitation when assessing the relationship between funding source and trial outcomes. Improved reporting and manuscript control should be goals that both authors and journals can actively achieve.

  15. Multivariate statistics high-dimensional and large-sample approximations

    CERN Document Server

    Fujikoshi, Yasunori; Shimizu, Ryoichi

    2010-01-01

    A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic

  16. The Role of the Sampling Distribution in Understanding Statistical Inference

    Science.gov (United States)

    Lipson, Kay

    2003-01-01

    Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…

  17. Weighted statistical parameters for irregularly sampled time series

    Science.gov (United States)

    Rimoldini, Lorenzo

    2014-01-01

    Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
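    One simple form of the weighting idea described in this record is to give each observation a weight proportional to the time interval it represents (half the gap to its neighbours) and then compute weighted moments, so that clumps of measurements do not dominate the estimators. The snippet below is a rough Python sketch of such interval weighting; it is not a reproduction of the exact scheme proposed in the paper, and the synthetic signal is an assumption.

    ```python
    import numpy as np

    def interval_weights(t):
        """Weight each sample by half the time gap to its neighbours (trapezoidal rule)."""
        t = np.asarray(t, dtype=float)
        gaps = np.diff(t)
        w = np.empty_like(t)
        w[0], w[-1] = gaps[0] / 2, gaps[-1] / 2
        w[1:-1] = (gaps[:-1] + gaps[1:]) / 2
        return w / w.sum()

    def weighted_mean_var(x, w):
        mean = np.sum(w * x)
        # reliability-weights analogue of the unbiased sample variance
        var = np.sum(w * (x - mean) ** 2) / (1.0 - np.sum(w ** 2))
        return mean, var

    # irregular sampling: a clump of points near t = 5 disrupts the unweighted estimators
    t = np.sort(np.concatenate([np.linspace(0, 10, 20),
                                np.full(30, 5.0) + 0.01 * np.arange(30)]))
    x = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
    w = interval_weights(t)
    print("unweighted mean:", x.mean(), " weighted mean:", weighted_mean_var(x, w)[0])
    ```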

  18. Sample size allocation in multiregional equivalence studies.

    Science.gov (United States)

    Liao, Jason J Z; Yu, Ziji; Li, Yulan

    2018-06-17

    With the increasing globalization of drug development, the multiregional clinical trial (MRCT) has gained extensive use. The data from MRCTs could be accepted by regulatory authorities across regions and countries as the primary sources of evidence to support global marketing drug approval simultaneously. The MRCT can speed up patient enrollment and drug approval, and it makes effective therapies available to patients all over the world simultaneously. However, there are many challenges, both operational and scientific, in conducting drug development globally. One of the many important questions to answer in the design of a multiregional study is how to partition the sample size among the individual regions. In this paper, two systematic approaches are proposed for the sample size allocation in a multiregional equivalence trial. A numerical evaluation and a biosimilar trial are used to illustrate the characteristics of the proposed approaches. Copyright © 2018 John Wiley & Sons, Ltd.

  19. Sampling strategies for estimating brook trout effective population size

    Science.gov (United States)

    Andrew R. Whiteley; Jason A. Coombs; Mark Hudy; Zachary Robinson; Keith H. Nislow; Benjamin H. Letcher

    2012-01-01

    The influence of sampling strategy on estimates of effective population size (Ne) from single-sample genetic methods has not been rigorously examined, though these methods are increasingly used. For headwater salmonids, spatially close kin association among age-0 individuals suggests that sampling strategy (number of individuals and location from...

  20. The Effect of Sterilization on Size and Shape of Fat Globules in Model Processed Cheese Samples

    Directory of Open Access Journals (Sweden)

    B. Tremlová

    2006-01-01

    Full Text Available Model cheese samples from 4 independent productions were heat sterilized (117 °C, 20 minutes) after the melting process and packing, with an aim to prolong their durability. The objective of the study was to assess changes in the size and shape of fat globules due to heat sterilization by using image analysis methods. The study included a selection of suitable methods of preparing mounts, taking microphotographs and making overlays for automatic processing of photographs by an image analyser, ascertaining parameters to determine the size and shape of fat globules, and statistical analysis of the results obtained. The results of the experiment suggest that changes in the shape of fat globules due to heat sterilization are not unequivocal. We found that the size of fat globules was significantly increased (p < 0.01) due to heat sterilization (117 °C, 20 min), and the shares of small fat globules (up to 500 μm², or 100 μm²) in the samples of heat-sterilized processed cheese were decreased. The results imply that the image analysis method is very useful when assessing the effect of the technological process on the quality of processed cheese.

  1. Sample Size Induced Brittle-to-Ductile Transition of Single-Crystal Aluminum Nitride

    Science.gov (United States)

    2015-08-01

    ARL-RP-0528 ● AUG 2015 ● US Army Research Laboratory: Sample Size Induced Brittle-to-Ductile Transition of Single-Crystal Aluminum Nitride.

  2. Sample size determination for logistic regression on a logit-normal distribution.

    Science.gov (United States)

    Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance

    2017-06-01

    Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for the coefficient of determination for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
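    For a back-of-the-envelope comparison point, the snippet below implements the widely cited Hsieh, Bloch and Larsen (1998) approximation for simple logistic regression with a single normally distributed covariate; it is not the logit-normal method proposed in this record, and the event rate and effect size used are arbitrary assumptions.

    ```python
    import math
    from scipy.stats import norm

    def n_simple_logistic(p1, beta_star, alpha=0.05, power=0.80):
        """Hsieh et al. (1998) approximation for simple logistic regression.

        p1        : event probability at the mean of the covariate
        beta_star : log odds ratio for a one-SD increase in the covariate
        """
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(power)
        return (z_a + z_b) ** 2 / (p1 * (1 - p1) * beta_star ** 2)

    # e.g. 20 % baseline event rate, odds ratio 1.5 per SD of the covariate
    print(round(n_simple_logistic(p1=0.20, beta_star=math.log(1.5))))
    ```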

  3. Sample size reassessment for a two-stage design controlling the false discovery rate.

    Science.gov (United States)

    Zehetmayer, Sonja; Graf, Alexandra C; Posch, Martin

    2015-11-01

    Sample size calculations for gene expression microarray and NGS-RNA-Seq experiments are challenging because the overall power depends on unknown quantities as the proportion of true null hypotheses and the distribution of the effect sizes under the alternative. We propose a two-stage design with an adaptive interim analysis where these quantities are estimated from the interim data. The second stage sample size is chosen based on these estimates to achieve a specific overall power. The proposed procedure controls the power in all considered scenarios except for very low first stage sample sizes. The false discovery rate (FDR) is controlled despite of the data dependent choice of sample size. The two-stage design can be a useful tool to determine the sample size of high-dimensional studies if in the planning phase there is high uncertainty regarding the expected effect sizes and variability.
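    One ingredient of such an adaptive interim analysis is an estimate of the proportion of true null hypotheses from the stage-one p-values. A common choice for this building block is Storey's estimator, sketched below in Python; the paper's full procedure (re-estimating effect sizes and solving for the stage-two sample size) is more involved and is not reproduced here, and the simulated p-values are purely illustrative.

    ```python
    import numpy as np

    def storey_pi0(pvalues, lam=0.5):
        """Storey's estimator of the proportion of true null hypotheses."""
        p = np.asarray(pvalues)
        pi0 = np.mean(p > lam) / (1.0 - lam)
        return min(pi0, 1.0)

    # toy interim data: 90 % nulls (uniform p-values), 10 % non-nulls (small p-values)
    rng = np.random.default_rng(1)
    p = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 5.0, size=100)])
    print(storey_pi0(p))   # should be close to 0.9
    ```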

  4. Optimizing the maximum reported cluster size in the spatial scan statistic for ordinal data.

    Science.gov (United States)

    Kim, Sehwi; Jung, Inkyung

    2017-01-01

    The spatial scan statistic is an important tool for spatial cluster detection. There have been numerous studies on scanning window shapes. However, little research has been done on the maximum scanning window size or maximum reported cluster size. Recently, Han et al. proposed to use the Gini coefficient to optimize the maximum reported cluster size. However, the method has been developed and evaluated only for the Poisson model. We adopt the Gini coefficient to be applicable to the spatial scan statistic for ordinal data to determine the optimal maximum reported cluster size. Through a simulation study and application to a real data example, we evaluate the performance of the proposed approach. With some sophisticated modification, the Gini coefficient can be effectively employed for the ordinal model. The Gini coefficient most often picked the optimal maximum reported cluster sizes that were the same as or smaller than the true cluster sizes with very high accuracy. It seems that we can obtain a more refined collection of clusters by using the Gini coefficient. The Gini coefficient developed specifically for the ordinal model can be useful for optimizing the maximum reported cluster size for ordinal data and helpful for properly and informatively discovering cluster patterns.

  5. Contributions to statistics

    CERN Document Server

    Mahalanobis, P C

    1965-01-01

    Contributions to Statistics focuses on the processes, methodologies, and approaches involved in statistics. The book is presented to Professor P. C. Mahalanobis on the occasion of his 70th birthday. The selection first offers information on the recovery of ancillary information and combinatorial properties of partially balanced designs and association schemes. Discussions focus on combinatorial applications of the algebra of association matrices, sample size analogy, association matrices and the algebra of association schemes, and conceptual statistical experiments. The book then examines latt

  6. Statistics 101 for Radiologists.

    Science.gov (United States)

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.
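    Several of the diagnostic-test quantities listed in this review follow directly from a 2x2 table of test result against disease status. The short Python sketch below computes sensitivity, specificity, accuracy and likelihood ratios from such a table; the counts are made up for illustration.

    ```python
    def diagnostic_summary(tp, fp, fn, tn):
        """Basic diagnostic-test statistics from a 2x2 table of counts."""
        sensitivity = tp / (tp + fn)               # P(test positive | disease)
        specificity = tn / (tn + fp)               # P(test negative | no disease)
        accuracy = (tp + tn) / (tp + fp + fn + tn)
        lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
        lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
        return sensitivity, specificity, accuracy, lr_pos, lr_neg

    # hypothetical study: 80 true positives, 30 false positives, 20 false negatives, 870 true negatives
    print(diagnostic_summary(tp=80, fp=30, fn=20, tn=870))
    ```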

  7. Sample size optimization in nuclear material control. 1

    International Nuclear Information System (INIS)

    Gladitz, J.

    1982-01-01

    Equations have been derived and exemplified which allow the determination of the minimum variables sample size for given false alarm and detection probabilities of nuclear material losses and diversions, respectively. (author)

  8. Characterizing the Joint Effect of Diverse Test-Statistic Correlation Structures and Effect Size on False Discovery Rates in a Multiple-Comparison Study of Many Outcome Measures

    Science.gov (United States)

    Feiveson, Alan H.; Ploutz-Snyder, Robert; Fiedler, James

    2011-01-01

    In their 2009 Annals of Statistics paper, Gavrilov, Benjamini, and Sarkar report the results of a simulation assessing the robustness of their adaptive step-down procedure (GBS) for controlling the false discovery rate (FDR) when normally distributed test statistics are serially correlated. In this study we extend the investigation to the case of multiple comparisons involving correlated non-central t-statistics, in particular when several treatments or time periods are being compared to a control in a repeated-measures design with many dependent outcome measures. In addition, we consider several dependence structures other than serial correlation and illustrate how the FDR depends on the interaction between effect size and the type of correlation structure as indexed by Foerstner's distance metric from an identity. The relationship between the correlation matrix R of the original dependent variables and the correlation matrix of the associated t-statistics is also studied. In general, the latter depends not only on R, but also on sample size and the signed effect sizes for the multiple comparisons.

  9. Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics

    Science.gov (United States)

    Dowding, Irene; Haufe, Stefan

    2018-01-01

    Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
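    The core of the sufficient-summary-statistic idea described here is to carry each subject's within-subject variance forward into the group-level test instead of discarding it. A minimal, hedged Python sketch of one such precision-weighted group-level estimate is shown below; it is a generic inverse-variance weighting illustration, not the authors' exact procedure, and it ignores between-subject variance (a fixed-effect style combination).

    ```python
    import numpy as np
    from scipy.stats import norm

    def precision_weighted_group_test(effects, variances):
        """Combine subject-level effects using inverse-variance (precision) weights.

        effects   : per-subject effect estimates (e.g. difference of class means)
        variances : per-subject sampling variances of those estimates
        """
        w = 1.0 / np.asarray(variances)
        group_effect = np.sum(w * effects) / np.sum(w)
        se = np.sqrt(1.0 / np.sum(w))          # ignores between-subject variance
        z = group_effect / se
        p = 2 * norm.sf(abs(z))
        return group_effect, se, z, p

    # toy example: 10 subjects with unequal within-subject precision
    rng = np.random.default_rng(3)
    variances = rng.uniform(0.5, 3.0, size=10)
    effects = rng.normal(0.4, np.sqrt(variances))
    print(precision_weighted_group_test(effects, variances))
    ```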

  10. Comparison of pure and 'Latinized' centroidal Voronoi tessellation against various other statistical sampling methods

    International Nuclear Information System (INIS)

    Romero, Vicente J.; Burkardt, John V.; Gunzburger, Max D.; Peterson, Janet S.

    2006-01-01

    A recently developed centroidal Voronoi tessellation (CVT) sampling method is investigated here to assess its suitability for use in statistical sampling applications. CVT efficiently generates a highly uniform distribution of sample points over arbitrarily shaped M-dimensional parameter spaces. On several 2-D test problems CVT has recently been found to provide exceedingly effective and efficient point distributions for response surface generation. Additionally, for statistical function integration and estimation of response statistics associated with uniformly distributed random-variable inputs (uncorrelated), CVT has been found in initial investigations to provide superior point sets when compared against Latin hypercube and simple random Monte Carlo methods and Halton and Hammersley quasi-random sequence methods. In this paper, the performance of all these sampling methods and a new variant ('Latinized' CVT) are further compared for non-uniform input distributions. Specifically, given uncorrelated normal inputs in a 2-D test problem, statistical sampling efficiencies are compared for resolving various statistics of response: mean, variance, and exceedance probabilities.

  11. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    Science.gov (United States)

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  12. A contemporary decennial global Landsat sample of changing agricultural field sizes

    Science.gov (United States)

    White, Emma; Roy, David

    2014-05-01

    Agriculture has caused significant human induced Land Cover Land Use (LCLU) change, with dramatic cropland expansion in the last century and significant increases in productivity over the past few decades. Satellite data have been used for agricultural applications including cropland distribution mapping, crop condition monitoring, crop production assessment and yield prediction. Satellite based agricultural applications are less reliable when the sensor spatial resolution is small relative to the field size. However, to date, studies of agricultural field size distributions and their change have been limited, even though this information is needed to inform the design of agricultural satellite monitoring systems. Moreover, the size of agricultural fields is a fundamental description of rural landscapes and provides an insight into the drivers of rural LCLU change. In many parts of the world field sizes may have increased. Increasing field sizes cause a subsequent decrease in the number of fields and therefore decreased landscape spatial complexity with impacts on biodiversity, habitat, soil erosion, plant-pollinator interactions, and impacts on the diffusion of herbicides, pesticides, disease pathogens, and pests. The Landsat series of satellites provide the longest record of global land observations, with 30m observations available since 1982. Landsat data are used to examine contemporary field size changes in a period (1980 to 2010) when significant global agricultural changes have occurred. A multi-scale sampling approach is used to locate global hotspots of field size change by examination of a recent global agricultural yield map and literature review. Nine hotspots are selected where significant field size change is apparent and where change has been driven by technological advancements (Argentina and U.S.), abrupt societal changes (Albania and Zimbabwe), government land use and agricultural policy changes (China, Malaysia, Brazil), and/or constrained by

  13. Equivalent statistics and data interpretation.

    Science.gov (United States)

    Francis, Gregory

    2017-08-01

    Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
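    The equivalence claim in this record is easy to verify numerically: with the two sample sizes known, the t statistic, the p value and Cohen's d are deterministic functions of one another. A small Python check is given below (the numbers are arbitrary); the JZS Bayes factor, which requires numerical integration, is omitted from the sketch.

    ```python
    import math
    from scipy.stats import t as t_dist

    def summaries_from_t(t_value, n1, n2):
        """p value and Cohen's d implied by a two-sample t statistic and the sample sizes."""
        df = n1 + n2 - 2
        p = 2 * t_dist.sf(abs(t_value), df)
        d = t_value * math.sqrt(1.0 / n1 + 1.0 / n2)
        return p, d

    def t_from_d(d, n1, n2):
        """Invert the relation: the t statistic implied by Cohen's d and the sample sizes."""
        return d / math.sqrt(1.0 / n1 + 1.0 / n2)

    p, d = summaries_from_t(2.5, n1=20, n2=25)
    print(p, d, t_from_d(d, 20, 25))   # recovers t = 2.5 exactly
    ```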

  14. Detecting spatial structures in throughfall data: The effect of extent, sample size, sampling design, and variogram estimation method

    Science.gov (United States)

    Voss, Sebastian; Zimmermann, Beate; Zimmermann, Alexander

    2016-09-01

    In the last decades, an increasing number of studies analyzed spatial patterns in throughfall by means of variograms. The estimation of the variogram from sample data requires an appropriate sampling scheme: most importantly, a large sample and a layout of sampling locations that often has to serve both variogram estimation and geostatistical prediction. While some recommendations on these aspects exist, they focus on Gaussian data and high ratios of the variogram range to the extent of the study area. However, many hydrological data, and throughfall data in particular, do not follow a Gaussian distribution. In this study, we examined the effect of extent, sample size, sampling design, and calculation method on variogram estimation of throughfall data. For our investigation, we first generated non-Gaussian random fields based on throughfall data with large outliers. Subsequently, we sampled the fields with three extents (plots with edge lengths of 25 m, 50 m, and 100 m), four common sampling designs (two grid-based layouts, transect and random sampling) and five sample sizes (50, 100, 150, 200, 400). We then estimated the variogram parameters by method-of-moments (non-robust and robust estimators) and residual maximum likelihood. Our key findings are threefold. First, the choice of the extent has a substantial influence on the estimation of the variogram. A comparatively small ratio of the extent to the correlation length is beneficial for variogram estimation. Second, a combination of a minimum sample size of 150, a design that ensures the sampling of small distances and variogram estimation by residual maximum likelihood offers a good compromise between accuracy and efficiency. Third, studies relying on method-of-moments based variogram estimation may have to employ at least 200 sampling points for reliable variogram estimates. These suggested sample sizes exceed the number recommended by studies dealing with Gaussian data by up to 100 %. Given that most previous
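    The method-of-moments (Matheron) variogram estimator discussed in this record can be written in a few lines: bin the squared differences of throughfall values by separation distance and halve the bin means. The sketch below is a simplified Python illustration on simulated point data; it omits the robust and residual-maximum-likelihood estimators also examined in the study, and the plot size and sample layout are assumptions.

    ```python
    import numpy as np

    def empirical_variogram(coords, values, bin_edges):
        """Matheron method-of-moments semivariogram:
        gamma(h) = 1/(2 N(h)) * sum of (z_i - z_j)^2 over pairs with distance in bin h."""
        coords = np.asarray(coords, dtype=float)
        values = np.asarray(values, dtype=float)
        i, j = np.triu_indices(len(values), k=1)
        dists = np.linalg.norm(coords[i] - coords[j], axis=1)
        sqdiff = (values[i] - values[j]) ** 2
        gamma, counts = [], []
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            mask = (dists >= lo) & (dists < hi)
            counts.append(mask.sum())
            gamma.append(0.5 * sqdiff[mask].mean() if mask.any() else np.nan)
        return np.array(gamma), np.array(counts)

    # toy throughfall plot: 150 random sampling points on a 50 m x 50 m extent
    rng = np.random.default_rng(7)
    coords = rng.uniform(0, 50, size=(150, 2))
    values = rng.gamma(shape=2.0, scale=3.0, size=150)   # skewed, throughfall-like data
    gamma, n_pairs = empirical_variogram(coords, values, np.arange(0, 30, 5))
    print(gamma, n_pairs)
    ```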

  15. Optimum sample size to estimate mean parasite abundance in fish parasite surveys

    Directory of Open Access Journals (Sweden)

    Shvydka S.

    2018-03-01

    Full Text Available To reach ethically and scientifically valid mean abundance values in parasitological and epidemiological studies, this paper considers analytic and simulation approaches for sample size determination. The sample size estimation was carried out by applying a mathematical formula with a predetermined precision level and the parameter of the negative binomial distribution estimated from the empirical data. A simulation approach to optimum sample size determination, aimed at the estimation of the true value of the mean abundance and its confidence interval (CI), was based on the Bag of Little Bootstraps (BLB). The abundance of two species of monogenean parasites, Ligophorus cephali and L. mediterraneus, from Mugil cephalus across the Azov-Black Seas localities was subjected to the analysis. The dispersion pattern of both helminth species could be characterized as a highly aggregated distribution with the variance being substantially larger than the mean abundance. The holistic approach applied here offers a wide range of appropriate methods in searching for the optimum sample size and the understanding about the expected precision level of the mean. Given the superior performance of the BLB relative to formulae with its few assumptions, the bootstrap procedure is the preferred method. Two important assessments were performed in the present study: (i) based on the width of the CIs, a reasonable precision level for the mean abundance in parasitological surveys of Ligophorus spp. could be chosen between 0.8 and 0.5, corresponding to CI widths of 1.6 and 1 times the mean, and (ii) a sample size of 80 or more host individuals allows accurate and precise estimation of mean abundance. Meanwhile, for host sample sizes in the range of 25 to 40 individuals, the median estimates showed minimal bias but the sampling distribution was skewed to the low values; a sample size of 10 host individuals yielded unreliable estimates.
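    The analytic route mentioned in this abstract (a formula combining a predetermined precision level with the negative binomial parameter) is classically written as follows: for an aggregated count with mean m and dispersion k, the relative standard error of the sample mean is sqrt((1/m + 1/k)/n), so n = (1/D²)(1/m + 1/k) gives the sample size for a target relative precision D. The Python snippet below sketches that textbook formula; it is not necessarily the exact formula used by the authors, and the parameter values are invented.

    ```python
    import math

    def n_for_precision(mean_abundance, k, D):
        """Sample size for a target relative precision D (= SE/mean) under a
        negative binomial with mean `mean_abundance` and dispersion parameter k."""
        return math.ceil((1.0 / D ** 2) * (1.0 / mean_abundance + 1.0 / k))

    # hypothetical parasite abundance: mean 12 per host, strong aggregation (k = 0.4)
    for D in (0.2, 0.1):
        print(D, n_for_precision(12.0, 0.4, D))
    ```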

  16. Sample size for post-marketing safety studies based on historical controls.

    Science.gov (United States)

    Wu, Yu-te; Makuch, Robert W

    2010-08-01

    As part of a drug's entire life cycle, post-marketing studies are an important part in the identification of rare, serious adverse events. Recently, the US Food and Drug Administration (FDA) has begun to implement new post-marketing safety mandates as a consequence of increased emphasis on safety. The purpose of this research is to provide exact sample size formula for the proposed hybrid design, based on a two-group cohort study with incorporation of historical external data. Exact sample size formula based on the Poisson distribution is developed, because the detection of rare events is our outcome of interest. Performance of exact method is compared to its approximate large-sample theory counterpart. The proposed hybrid design requires a smaller sample size compared to the standard, two-group prospective study design. In addition, the exact method reduces the number of subjects required in the treatment group by up to 30% compared to the approximate method for the study scenarios examined. The proposed hybrid design satisfies the advantages and rationale of the two-group design with smaller sample sizes generally required. 2010 John Wiley & Sons, Ltd.
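    As a hedged illustration of why exact Poisson calculations matter for rare events, the fragment below finds the smallest cohort size for which the probability of observing at least one adverse event reaches a desired level. This is a much simpler setting than the two-group hybrid design with historical controls studied in the paper, and the event rate is an arbitrary assumption.

    ```python
    import math

    def n_to_observe_event(rate_per_subject, prob_detect=0.95):
        """Smallest n with P(at least one Poisson event) >= prob_detect.

        With X ~ Poisson(n * rate), P(X >= 1) = 1 - exp(-n * rate).
        """
        return math.ceil(-math.log(1.0 - prob_detect) / rate_per_subject)

    # a rare adverse event occurring in roughly 1 in 2000 treated patients
    print(n_to_observe_event(rate_per_subject=1 / 2000, prob_detect=0.95))   # about 6000
    ```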

  17. Sample size computation for association studies using case–parents ...

    Indian Academy of Sciences (India)

    sample size needed to reach a given power (Knapp 1999; Schaid 1999; Chen and Deng 2001; Brown 2004). In their seminal paper, Risch and Merikangas (1996) showed that for a multiplicative mode of inheritance (MOI) for the susceptibility gene, sample size depends on two parameters: the frequency of the risk allele at the ...

  18. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.P. [Westinghouse Savannah River Company, AIKEN, SC (United States)

    1997-09-18

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Either new DWPF analytical method could result in a two- to threefold improvement in sample analysis time. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. The insert sample is named after the initial trials which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (Inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid because the sample size is smaller and no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock

  19. On the Sampling

    OpenAIRE

    Güleda Doğan

    2017-01-01

    This editorial is on statistical sampling, which is one of the two most important reasons for editorial rejection from our journal Turkish Librarianship. The stages of quantitative research, the stage at which sampling takes place, the importance of sampling for research, deciding on sample size, and sampling methods are summarised briefly.

  20. Enhancing sampling design in mist-net bat surveys by accounting for sample size optimization

    OpenAIRE

    Trevelin, Leonardo Carreira; Novaes, Roberto Leonan Morim; Colas-Rosas, Paul François; Benathar, Thayse Cristhina Melo; Peres, Carlos A.

    2017-01-01

    The advantages of mist-netting, the main technique used in Neotropical bat community studies to date, include logistical implementation, standardization and sampling representativeness. Nonetheless, study designs still have to deal with issues of detectability related to how different species behave and use the environment. Yet there is considerable sampling heterogeneity across available studies in the literature. Here, we approach the problem of sample size optimization. We evaluated the co...

  1. Determining Sample Size for Accurate Estimation of the Squared Multiple Correlation Coefficient.

    Science.gov (United States)

    Algina, James; Olejnik, Stephen

    2000-01-01

    Discusses determining sample size for estimation of the squared multiple correlation coefficient and presents regression equations that permit determination of the sample size for estimating this parameter for up to 20 predictor variables. (SLD)

  2. Right-sizing statistical models for longitudinal data.

    Science.gov (United States)

    Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M

    2015-12-01

    Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved.

  3. A flexible method for multi-level sample size determination

    International Nuclear Information System (INIS)

    Lu, Ming-Shih; Sanborn, J.B.; Teichmann, T.

    1997-01-01

    This paper gives a flexible method to determine sample sizes for both systematic and random error models (this pertains to sampling problems in nuclear safeguard questions). In addition, the method allows different attribute rejection limits. The new method could assist achieving a higher detection probability and enhance inspection effectiveness

  4. An Introduction to Confidence Intervals for Both Statistical Estimates and Effect Sizes.

    Science.gov (United States)

    Capraro, Mary Margaret

    This paper summarizes methods of estimating confidence intervals, including classical intervals and intervals for effect sizes. The recent American Psychological Association (APA) Task Force on Statistical Inference report suggested that confidence intervals should always be reported, and the fifth edition of the APA "Publication Manual"…

  5. Sample Size Calculation for Controlling False Discovery Proportion

    Directory of Open Access Journals (Sweden)

    Shulian Shang

    2012-01-01

    Full Text Available The false discovery proportion (FDP), the proportion of incorrect rejections among all rejections, is a direct measure of the abundance of false-positive findings in multiple testing. Many methods have been proposed to control FDP, but they are too conservative to be useful for power analysis. Study designs for controlling the mean of FDP, which is the false discovery rate, have been commonly used. However, there has been little attempt to design studies with direct FDP control to achieve a certain level of efficiency. We provide a sample size calculation method using the variance formula of the FDP under weak-dependence assumptions to achieve the desired overall power. The relationship between design parameters and sample size is explored. The adequacy of the procedure is assessed by simulation. We illustrate the method using estimated correlations from a prostate cancer dataset.

  6. Statistics Clinic

    Science.gov (United States)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  7. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    Science.gov (United States)

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  8. Improving Statistics Education through Simulations: The Case of the Sampling Distribution.

    Science.gov (United States)

    Earley, Mark A.

    This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…

  9. Rock sampling. [method for controlling particle size distribution

    Science.gov (United States)

    Blum, P. (Inventor)

    1971-01-01

    A method for sampling rock and other brittle materials and for controlling resultant particle sizes is described. The method involves cutting grooves in the rock surface to provide a grouping of parallel ridges and subsequently machining the ridges to provide a powder specimen. The machining step may comprise milling, drilling, lathe cutting or the like; but a planing step is advantageous. Control of the particle size distribution is effected primarily by changing the height and width of these ridges. This control exceeds that obtainable by conventional grinding.

  10. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    Science.gov (United States)

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  11. Effects of sample size on the second magnetization peak in ...

    Indian Academy of Sciences (India)

    8+ crystals are observed at low temperatures, above the temperature where the SMP totally disappears. In particular, the onset of the SMP shifts to lower fields as the sample size decreases - a result that could be interpreted as a size effect in ...

  12. Sample size for estimation of the Pearson correlation coefficient in cherry tomato tests

    Directory of Open Access Journals (Sweden)

    Bruno Giacomini Sari

    2017-09-01

    Full Text Available ABSTRACT: The aim of this study was to determine the required sample size for estimation of the Pearson coefficient of correlation between cherry tomato variables. Two uniformity tests were set up in a protected environment in the spring/summer of 2014. The observed variables in each plant were mean fruit length, mean fruit width, mean fruit weight, number of bunches, number of fruits per bunch, number of fruits, and total weight of fruits, with calculation of the Pearson correlation matrix between them. Sixty-eight sample sizes were planned for one greenhouse and 48 for another, with an initial sample size of 10 plants, and the others were obtained by adding five plants. For each planned sample size, 3000 estimates of the Pearson correlation coefficient were obtained through bootstrap re-samplings with replacement. The sample size for each correlation coefficient was determined when the 95% confidence interval amplitude value was less than or equal to 0.4. Obtaining estimates of the Pearson correlation coefficient with high precision is difficult for parameters with a weak linear relation. Accordingly, a larger sample size is necessary to estimate them. Linear relations involving variables dealing with size and number of fruits per plant have less precision. To estimate the coefficient of correlation between productivity variables of cherry tomato, with a 95% confidence interval width equal to 0.4, it is necessary to sample 275 plants in a 250 m² greenhouse, and 200 plants in a 200 m² greenhouse.
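    The resampling logic described in this record (estimate the Pearson correlation repeatedly by bootstrap and grow the sample until the 95% confidence interval is narrower than 0.4) can be sketched briefly, as below. The bivariate data are simulated rather than taken from the cherry tomato trials, and 1000 rather than 3000 resamples are used to keep the example fast.

    ```python
    import numpy as np

    def bootstrap_ci_width(x, y, n_boot=1000, alpha=0.05, rng=None):
        """Width of the percentile bootstrap CI for the Pearson correlation of (x, y)."""
        rng = rng or np.random.default_rng()
        n = len(x)
        r_boot = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)      # resample plants with replacement
            r_boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]
        lo, hi = np.percentile(r_boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return hi - lo

    rng = np.random.default_rng(11)
    rho = 0.3                                      # a weak linear relation needs more plants
    for n in (50, 100, 200, 400):
        cov = [[1.0, rho], [rho, 1.0]]
        x, y = rng.multivariate_normal([0, 0], cov, size=n).T
        print(n, round(bootstrap_ci_width(x, y, rng=rng), 3))
    ```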

  13. Sampling Errors in Monthly Rainfall Totals for TRMM and SSM/I, Based on Statistics of Retrieved Rain Rates and Simple Models

    Science.gov (United States)

    Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates due to changes in rain statistics caused by 1) evolution of the official algorithms used to process the data, and 2) differences from other remote sensing systems such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.

  14. Experimental uncertainty estimation and statistics for data having interval uncertainty.

    Energy Technology Data Exchange (ETDEWEB)

    Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)

    2007-05-01

    This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.

  15. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    Science.gov (United States)

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.

  16. When one size does not fit all: a simple statistical method to deal with across-individual variations of effects.

    Science.gov (United States)

    Vindras, Philippe; Desmurget, Michel; Baraduc, Pierre

    2012-01-01

    In science, it is a common experience to discover that although the investigated effect is very clear in some individuals, statistical tests are not significant because the effect is null or even opposite in other individuals. Indeed, t-tests, Anovas and linear regressions compare the average effect with respect to its inter-individual variability, so that they can fail to evidence a factor that has a high effect in many individuals (with respect to the intra-individual variability). In such paradoxical situations, statistical tools are at odds with the researcher's aim to uncover any factor that affects individual behavior, and not only those with stereotypical effects. In order to go beyond the reductive and sometimes illusory description of the average behavior, we propose a simple statistical method: applying a Kolmogorov-Smirnov test to assess whether the distribution of p-values provided by individual tests is significantly biased towards zero. Using Monte-Carlo studies, we assess the power of this two-step procedure with respect to RM Anova and multilevel mixed-effect analyses, and probe its robustness when individual data violate the assumption of normality and homoscedasticity. We find that the method is powerful and robust even with small sample sizes for which multilevel methods reach their limits. In contrast to existing methods for combining p-values, the Kolmogorov-Smirnov test has unique resistance to outlier individuals: it cannot yield significance based on a high effect in one or two exceptional individuals, which allows drawing valid population inferences. The simplicity and ease of use of our method facilitates the identification of factors that would otherwise be overlooked because they affect individual behavior in significant but variable ways, and its power and reliability with small sample sizes (<30-50 individuals) suggest it as a tool of choice in exploratory studies.
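
    The two-step procedure lends itself to a very short sketch (assuming each individual's effect has already been tested and its p-value stored): a one-sided Kolmogorov-Smirnov test asks whether those p-values are biased towards zero relative to the Uniform(0,1) distribution expected under the null. The data layout shown is hypothetical.

```python
import numpy as np
from scipy import stats

def pvalues_biased_towards_zero(per_individual_pvals):
    """One-sided KS test of individual p-values against Uniform(0,1)."""
    p = np.asarray(per_individual_pvals, float)
    # alternative="greater": the empirical CDF of the p-values lies above the
    # uniform CDF, i.e. small p-values are over-represented.
    return stats.kstest(p, "uniform", alternative="greater")

# Example: one paired t-test per individual (cond_a[i], cond_b[i] hypothetical)
# pvals = [stats.ttest_rel(cond_a[i], cond_b[i]).pvalue for i in range(n_subjects)]
# print(pvalues_biased_towards_zero(pvals))
```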

  17. When one size does not fit all: a simple statistical method to deal with across-individual variations of effects.

    Directory of Open Access Journals (Sweden)

    Philippe Vindras

    Full Text Available In science, it is a common experience to discover that although the investigated effect is very clear in some individuals, statistical tests are not significant because the effect is null or even opposite in other individuals. Indeed, t-tests, Anovas and linear regressions compare the average effect with respect to its inter-individual variability, so that they can fail to evidence a factor that has a high effect in many individuals (with respect to the intra-individual variability). In such paradoxical situations, statistical tools are at odds with the researcher's aim to uncover any factor that affects individual behavior, and not only those with stereotypical effects. In order to go beyond the reductive and sometimes illusory description of the average behavior, we propose a simple statistical method: applying a Kolmogorov-Smirnov test to assess whether the distribution of p-values provided by individual tests is significantly biased towards zero. Using Monte-Carlo studies, we assess the power of this two-step procedure with respect to RM Anova and multilevel mixed-effect analyses, and probe its robustness when individual data violate the assumption of normality and homoscedasticity. We find that the method is powerful and robust even with small sample sizes for which multilevel methods reach their limits. In contrast to existing methods for combining p-values, the Kolmogorov-Smirnov test has unique resistance to outlier individuals: it cannot yield significance based on a high effect in one or two exceptional individuals, which allows drawing valid population inferences. The simplicity and ease of use of our method facilitates the identification of factors that would otherwise be overlooked because they affect individual behavior in significant but variable ways, and its power and reliability with small sample sizes (<30-50 individuals) suggest it as a tool of choice in exploratory studies.

  18. Test of methods for retrospective activity size distribution determination from filter samples

    International Nuclear Information System (INIS)

    Meisenberg, Oliver; Tschiersch, Jochen

    2015-01-01

    Determining the activity size distribution of radioactive aerosol particles requires sophisticated and heavy equipment, which makes measurements at a large number of sites difficult and expensive. Therefore, three methods for a retrospective determination of size distributions from aerosol filter samples in the laboratory were tested for their applicability. Extraction into a carrier liquid with subsequent nebulisation showed size distributions with a slight but correctable bias towards larger diameters compared with the original size distribution. Yields in the order of magnitude of 1% could be achieved. Sonication-assisted extraction into a carrier liquid caused a coagulation mode to appear in the size distribution. Sonication-assisted extraction into the air did not show acceptable results due to small yields. The method of extraction into a carrier liquid without sonication was applied to aerosol samples from Chernobyl in order to calculate inhalation dose coefficients for 137Cs based on the individual size distribution. The effective dose coefficient is about half of that calculated with a default reference size distribution. - Highlights: • Activity size distributions can be recovered after aerosol sampling on filters. • Extraction into a carrier liquid and subsequent nebulisation is appropriate. • This facilitates the determination of activity size distributions for individuals. • Size distributions from this method can be used for individual dose coefficients. • Dose coefficients were calculated for the workers at the new Chernobyl shelter

  19. Two to five repeated measurements per patient reduced the required sample size considerably in a randomized clinical trial for patients with inflammatory rheumatic diseases

    Directory of Open Access Journals (Sweden)

    Smedslund Geir

    2013-02-01

    Full Text Available Abstract Background Patient reported outcomes are accepted as important outcome measures in rheumatology. The fluctuating symptoms in patients with rheumatic diseases have serious implications for sample size in clinical trials. We estimated the effects of measuring the outcome 1-5 times on the sample size required in a two-armed trial. Findings In a randomized controlled trial that evaluated the effects of a mindfulness-based group intervention for patients with inflammatory arthritis (n=71), the outcome variables Numerical Rating Scales (NRS) for pain, fatigue, disease activity, self-care ability, and emotional wellbeing, and the General Health Questionnaire (GHQ-20) were measured five times before and after the intervention. For each variable we calculated the necessary sample sizes for obtaining 80% power (α=.05) for one up to five measurements. Two, three, and four measures reduced the required sample sizes by 15%, 21%, and 24%, respectively. With three (and five) measures, the required sample size per group was reduced from 56 to 39 (32) for the GHQ-20, from 71 to 60 (55) for pain, 96 to 71 (73) for fatigue, 57 to 51 (48) for disease activity, 59 to 44 (45) for self-care, and 47 to 37 (33) for emotional wellbeing. Conclusions Measuring the outcomes five times rather than once reduced the necessary sample size by an average of 27%. When planning a study, researchers should carefully compare the advantages and disadvantages of increasing sample size versus employing three to five repeated measurements in order to obtain the required statistical power.
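
    A rough illustration of why averaging repeated measurements shrinks the required sample size (not the trial's actual calculation): if k measurements per patient correlate with coefficient rho, the variance of their mean is reduced by the factor (1 + (k - 1)·rho)/k, and the usual two-group normal-approximation formula inherits that factor. The effect size, standard deviation and rho below are made-up inputs.

```python
import math
from scipy import stats

def n_per_group(delta, sd, k=1, rho=0.5, alpha=0.05, power=0.80):
    """Sample size per group when the analyzed outcome is the mean of k measures."""
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    var_factor = (1 + (k - 1) * rho) / k          # variance of the k-measure mean
    n_single = 2 * (z * sd / delta) ** 2          # classic single-measurement formula
    return math.ceil(n_single * var_factor)

for k in (1, 2, 3, 5):
    print(k, n_per_group(delta=1.0, sd=2.0, k=k, rho=0.5))
```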

  20. Statistical concepts a second course

    CERN Document Server

    Lomax, Richard G

    2012-01-01

    Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes

  1. Reliable calculation in probabilistic logic: Accounting for small sample size and model uncertainty

    Energy Technology Data Exchange (ETDEWEB)

    Ferson, S. [Applied Biomathematics, Setauket, NY (United States)

    1996-12-31

    A variety of practical computational problems arise in risk and safety assessments, forensic statistics and decision analyses in which the probability of some event or proposition E is to be estimated from the probabilities of a finite list of related subevents or propositions F,G,H,.... In practice, the analyst's knowledge may be incomplete in two ways. First, the probabilities of the subevents may be imprecisely known from statistical estimations, perhaps based on very small sample sizes. Second, relationships among the subevents may be known imprecisely. For instance, there may be only limited information about their stochastic dependencies. Representing probability estimates as interval ranges has been suggested as a way to address the first source of imprecision. A suite of AND, OR and NOT operators defined with reference to the classical Fréchet inequalities permits these probability intervals to be used in calculations that address the second source of imprecision, in many cases, in a best possible way. Using statistical confidence intervals as inputs, however, unravels the closure properties of this approach, requiring that probability estimates be characterized by a nested stack of intervals for all possible levels of statistical confidence, from a point estimate (0% confidence) to the entire unit interval (100% confidence). The corresponding logical operations implied by convolutive application of the logical operators for every possible pair of confidence intervals reduce by symmetry to a manageably simple level-wise iteration. The resulting calculus can be implemented in software that allows users to compute comprehensive and often level-wise best possible bounds on probabilities for logical functions of events.
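
    The AND/OR operators mentioned above reduce, for a single pair of probability intervals, to the classical Fréchet bounds under unknown dependence; the confidence-interval stacking and the level-wise iteration described in the abstract are not reproduced in this small sketch.

```python
def frechet_and(a, b):
    """Bounds on P(E and F) from P(E) in [a0, a1] and P(F) in [b0, b1], dependence unknown."""
    (a0, a1), (b0, b1) = a, b
    return max(0.0, a0 + b0 - 1.0), min(a1, b1)

def frechet_or(a, b):
    """Bounds on P(E or F) under unknown dependence."""
    (a0, a1), (b0, b1) = a, b
    return max(a0, b0), min(1.0, a1 + b1)

def frechet_not(a):
    (a0, a1) = a
    return 1.0 - a1, 1.0 - a0

print(frechet_and((0.2, 0.4), (0.5, 0.7)))   # (0.0, 0.4)
print(frechet_or((0.2, 0.4), (0.5, 0.7)))    # (0.5, 1.0)
```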

  2. An Analytic Solution to the Computation of Power and Sample Size for Genetic Association Studies under a Pleiotropic Mode of Inheritance.

    Science.gov (United States)

    Gordon, Derek; Londono, Douglas; Patel, Payal; Kim, Wonkuk; Finch, Stephen J; Heiman, Gary A

    2016-01-01

    Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes. © 2017 S. Karger AG, Basel.
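
    For readers who want the shape of such a power calculation, here is a generic sketch for the 2-df genotype (chi-square) test using a noncentral chi-square approximation; the genotype frequencies in cases and controls are hypothetical placeholders, and this is not the paper's analytic MSSN derivation.

```python
import numpy as np
from scipy import stats

def genotype_test_power(n_cases, n_controls, p_case, p_ctrl, alpha=5e-8, df=2):
    """Approximate power of the 2x3 chi-square test of genotype association."""
    p_case, p_ctrl = np.asarray(p_case), np.asarray(p_ctrl)   # genotype frequencies
    n = n_cases + n_controls
    w1, w2 = n_cases / n, n_controls / n
    p_bar = w1 * p_case + w2 * p_ctrl
    ncp = n * w1 * w2 * np.sum((p_case - p_ctrl) ** 2 / p_bar)  # noncentrality
    crit = stats.chi2.ppf(1 - alpha, df)
    return 1 - stats.ncx2.cdf(crit, df, ncp)

# Hypothetical genotype frequencies at a genome-wide significance level
print(genotype_test_power(2000, 2000, [0.36, 0.48, 0.16], [0.42, 0.46, 0.12]))
```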

  3. Influence of Sample Size on Automatic Positional Accuracy Assessment Methods for Urban Areas

    Directory of Open Access Journals (Sweden)

    Francisco J. Ariza-López

    2018-05-01

    Full Text Available In recent years, new approaches aimed to increase the automation level of positional accuracy assessment processes for spatial data have been developed. However, in such cases, an aspect as significant as sample size has not yet been addressed. In this paper, we study the influence of sample size when estimating the planimetric positional accuracy of urban databases by means of an automatic assessment using polygon-based methodology. Our study is based on a simulation process, which extracts pairs of homologous polygons from the assessed and reference data sources and applies two buffer-based methods. The parameter used for determining the different sizes (which range from 5 km up to 100 km) has been the length of the polygons’ perimeter, and for each sample size 1000 simulations were run. After completing the simulation process, the comparisons between the estimated distribution functions for each sample and population distribution function were carried out by means of the Kolmogorov–Smirnov test. Results show a significant reduction in the variability of the estimates when sample size increased from 5 km to 100 km.

  4. Sample size determinations for group-based randomized clinical trials with different levels of data hierarchy between experimental and control arms.

    Science.gov (United States)

    Heo, Moonseong; Litwin, Alain H; Blackstock, Oni; Kim, Namhee; Arnsten, Julia H

    2017-02-01

    We derived sample size formulae for detecting main effects in group-based randomized clinical trials with different levels of data hierarchy between experimental and control arms. Such designs are necessary when experimental interventions need to be administered to groups of subjects whereas control conditions need to be administered to individual subjects. This type of trial, often referred to as a partially nested or partially clustered design, has been implemented for management of chronic diseases such as diabetes and is beginning to emerge more commonly in wider clinical settings. Depending on the research setting, the level of hierarchy of data structure for the experimental arm can be three or two, whereas that for the control arm is two or one. Such different levels of data hierarchy assume correlation structures of outcomes that are different between arms, regardless of whether research settings require two or three level data structure for the experimental arm. Therefore, the different correlations should be taken into account for statistical modeling and for sample size determinations. To this end, we considered mixed-effects linear models with different correlation structures between experimental and control arms to theoretically derive and empirically validate the sample size formulae with simulation studies.
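
    As a rough orientation (not the authors' mixed-model formulae), the key feature of such partially nested designs can be captured with a design-effect argument: the clustered experimental arm carries the usual 1 + (m - 1)·rho variance inflation while the individually treated control arm does not. The inputs below are invented.

```python
import math
from scipy import stats

def partially_nested_n(delta, sd, m, rho, alpha=0.05, power=0.80):
    """Per-arm n (normal approximation) when only the experimental arm is clustered."""
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    de_exp = 1 + (m - 1) * rho          # design effect in the clustered arm
    de_ctl = 1.0                        # controls treated individually
    # variance of the mean difference with n per arm: sd^2 * (de_exp + de_ctl) / n
    return math.ceil((z * sd / delta) ** 2 * (de_exp + de_ctl))

print(partially_nested_n(delta=0.4, sd=1.0, m=8, rho=0.05))   # e.g. 116 per arm
```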

  5. Statistical properties of the normalized ice particle size distribution

    Science.gov (United States)

    Delanoë, Julien; Protat, Alain; Testud, Jacques; Bouniol, Dominique; Heymsfield, A. J.; Bansemer, A.; Brown, P. R. A.; Forbes, R. M.

    2005-05-01

    Testud et al. (2001) have recently developed a formalism, known as the "normalized particle size distribution (PSD)", which consists in scaling the diameter and concentration axes in such a way that the normalized PSDs are independent of water content and mean volume-weighted diameter. In this paper we investigate the statistical properties of the normalized PSD for the particular case of ice clouds, which are known to play a crucial role in the Earth's radiation balance. To do so, an extensive database of airborne in situ microphysical measurements has been constructed. A remarkable stability in shape of the normalized PSD is obtained. The impact of using a single analytical shape to represent all PSDs in the database is estimated through an error analysis on the instrumental (radar reflectivity and attenuation) and cloud (ice water content, effective radius, terminal fall velocity of ice crystals, visible extinction) properties. This resulted in a roughly unbiased estimate of the instrumental and cloud parameters, with small standard deviations ranging from 5 to 12%. This error is found to be roughly independent of the temperature range. This stability in shape and its single analytical approximation implies that two parameters are now sufficient to describe any normalized PSD in ice clouds: the intercept parameter N*0 and the mean volume-weighted diameter Dm. Statistical relationships (parameterizations) between N*0 and Dm have then been evaluated in order to reduce again the number of unknowns. It has been shown that a parameterization of N*0 and Dm by temperature could not be envisaged to retrieve the cloud parameters. Nevertheless, Dm-T and mean maximum dimension diameter -T parameterizations have been derived and compared to the parameterization of Kristjánsson et al. (2000) currently used to characterize particle size in climate models. The new parameterization generally produces larger particle sizes at any temperature than the Kristjánsson et al. (2000

  6. On Using a Pilot Sample Variance for Sample Size Determination in the Detection of Differences between Two Means: Power Consideration

    Science.gov (United States)

    Shieh, Gwowen

    2013-01-01

    The a priori determination of a proper sample size necessary to achieve some specified power is an important problem encountered frequently in practical studies. To establish the needed sample size for a two-sample "t" test, researchers may conduct the power analysis by specifying scientifically important values as the underlying population means…
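
    A minimal sketch of the textbook step the abstract starts from: plug a pilot-sample variance estimate into a two-sample t-test power routine to obtain the per-group n. The pilot data, target difference and the use of statsmodels are illustrative assumptions; the paper's own treatment of the uncertainty in the pilot variance is not shown.

```python
import math
import numpy as np
from statsmodels.stats.power import TTestIndPower

pilot_a = np.array([5.1, 4.8, 6.0, 5.5, 4.9])     # hypothetical pilot group A
pilot_b = np.array([5.9, 6.2, 5.7, 6.5, 6.1])     # hypothetical pilot group B
sd_pooled = np.sqrt((pilot_a.var(ddof=1) + pilot_b.var(ddof=1)) / 2)

delta = 0.3                                        # scientifically important difference
n_per_group = TTestIndPower().solve_power(effect_size=delta / sd_pooled,
                                          alpha=0.05, power=0.80,
                                          alternative="two-sided")
print(math.ceil(n_per_group))
```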

  7. Acceleration statistics of finite-sized particles in turbulent flow: the role of Faxen forces

    OpenAIRE

    Calzavarini, Enrico; Volk, Romain; Bourgoin, Mickael; Leveque, Emmanuel; Pinton, Jean-Francois; Toschi, Federico

    2008-01-01

    The dynamics of particles in turbulence when the particle size is larger than the dissipative scale of the carrier flow are studied. Recent experiments have highlighted signatures of particles' finiteness on their statistical properties, namely a decrease of their acceleration variance, an increase of correlation times (at increasing the particles size) and an independence of the probability density function of the acceleration once normalized to their variance. These ...

  8. What is the optimum sample size for the study of peatland testate amoeba assemblages?

    Science.gov (United States)

    Mazei, Yuri A; Tsyganov, Andrey N; Esaulov, Anton S; Tychkov, Alexander Yu; Payne, Richard J

    2017-10-01

    Testate amoebae are widely used in ecological and palaeoecological studies of peatlands, particularly as indicators of surface wetness. To ensure data are robust and comparable it is important to consider methodological factors which may affect results. One significant question which has not been directly addressed in previous studies is how sample size (expressed here as number of Sphagnum stems) affects data quality. In three contrasting locations in a Russian peatland we extracted samples of differing size, analysed testate amoebae and calculated a number of widely-used indices: species richness, Simpson diversity, compositional dissimilarity from the largest sample and transfer function predictions of water table depth. We found that there was a trend for larger samples to contain more species across the range of commonly-used sample sizes in ecological studies. Smaller samples sometimes failed to produce counts of testate amoebae often considered minimally adequate. It seems likely that analyses based on samples of different sizes may not produce consistent data. Decisions about sample size need to reflect trade-offs between logistics, data quality, spatial resolution and the disturbance involved in sample extraction. For most common ecological applications we suggest that samples of more than eight Sphagnum stems are likely to be desirable. Copyright © 2017 Elsevier GmbH. All rights reserved.

  9. [Sample size calculation in clinical post-marketing evaluation of traditional Chinese medicine].

    Science.gov (United States)

    Fu, Yingkun; Xie, Yanming

    2011-10-01

    In recent years, as the Chinese government and public have paid more attention to the post-marketing research of Chinese medicine, some traditional Chinese medicine products have begun, or are about to begin, post-marketing evaluation studies. In post-marketing evaluation design, sample size calculation plays a decisive role. It not only ensures the accuracy and reliability of the post-marketing evaluation, but also assures that the intended trials will have the desired power for correctly detecting a clinically meaningful difference between the medicines under study if such a difference truly exists. Up to now, there has been no systematic method of sample size calculation tailored to traditional Chinese medicine. In this paper, based on the basic methods of sample size calculation and the characteristics of traditional Chinese medicine clinical evaluation, sample size calculation methods for the efficacy and safety evaluation of Chinese medicine are discussed respectively. We hope the paper will be useful to medical researchers and pharmaceutical scientists engaged in Chinese medicine research.

  10. Effects of sample size on estimates of population growth rates calculated with matrix models.

    Directory of Open Access Journals (Sweden)

    Ian J Fiske

    Full Text Available BACKGROUND: Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (lambda) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of lambda-Jensen's Inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of lambda due to increased sampling variance. We investigated if sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of lambda. METHODOLOGY/PRINCIPAL FINDINGS: Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating lambda for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of lambda with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. CONCLUSIONS/SIGNIFICANCE: We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more-realistic inverse J-shaped population structure exacerbated this bias. However our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high

  11. Effects of sample size on estimates of population growth rates calculated with matrix models.

    Science.gov (United States)

    Fiske, Ian J; Bruna, Emilio M; Bolker, Benjamin M

    2008-08-28

    Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (lambda) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of lambda-Jensen's Inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of lambda due to increased sampling variance. We investigated if sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of lambda. Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating lambda for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of lambda with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more-realistic inverse J-shaped population structure exacerbated this bias. However our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.
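
    A compact simulation in the spirit of the study (hypothetical two-stage matrix and vital rates, not the authors' data): vital rates estimated from samples of size n are plugged into a projection matrix, and the bias of the dominant eigenvalue lambda shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_SURV, TRUE_FEC = 0.5, 1.2            # "population" vital rates (invented)

def lam(surv, fec):
    A = np.array([[0.0, fec],             # stage-2 fecundity into stage 1
                  [surv, 0.6]])           # stage-1 survival; stage-2 persistence 0.6
    return np.max(np.abs(np.linalg.eigvals(A)))

lambda_true = lam(TRUE_SURV, TRUE_FEC)
for n in (25, 100, 400, 1600):            # number of individuals sampled
    est = [lam(rng.binomial(n, TRUE_SURV) / n,      # estimated survival
               rng.poisson(TRUE_FEC * n) / n)       # estimated fecundity
           for _ in range(2000)]
    print(n, round(float(np.mean(est) - lambda_true), 4))   # bias shrinks with n
```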

  12. Determining sample size for assessing species composition in ...

    African Journals Online (AJOL)

    Species composition is measured in grasslands for a variety of reasons. Commonly, observations are made using the wheel-point apparatus, but the problem of determining optimum sample size has not yet been satisfactorily resolved. In this study the wheel-point apparatus was used to record 2 000 observations in each of ...

  13. Sampling the Mouse Hippocampal Dentate Gyrus

    Directory of Open Access Journals (Sweden)

    Lisa Basler

    2017-12-01

    Full Text Available Sampling is a critical step in procedures that generate quantitative morphological data in the neurosciences. Samples need to be representative to allow statistical evaluations, and samples need to deliver a precision that makes statistical evaluations not only possible but also meaningful. Sampling generated variability should, e.g., not be able to hide significant group differences from statistical detection if they are present. Estimators of the coefficient of error (CE have been developed to provide tentative answers to the question if sampling has been “good enough” to provide meaningful statistical outcomes. We tested the performance of the commonly used Gundersen-Jensen CE estimator, using the layers of the mouse hippocampal dentate gyrus as an example (molecular layer, granule cell layer and hilus. We found that this estimator provided useful estimates of the precision that can be expected from samples of different sizes. For all layers, we found that a smoothness factor (m of 0 generally provided better estimates than an m of 1. Only for the combined layers, i.e., the entire dentate gyrus, better CE estimates could be obtained using an m of 1. The orientation of the sections impacted on CE sizes. Frontal (coronal sections are typically most efficient by providing the smallest CEs for a given amount of work. Applying the estimator to 3D-reconstructed layers and using very intense sampling, we observed CE size plots with m = 0 to m = 1 transitions that should also be expected but are not often observed in real section series. The data we present also allows the reader to approximate the sampling intervals in frontal, horizontal or sagittal sections that provide CEs of specified sizes for the layers of the mouse dentate gyrus.

  14. Using Patient Demographics and Statistical Modeling to Predict Knee Tibia Component Sizing in Total Knee Arthroplasty.

    Science.gov (United States)

    Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James

    2018-06-01

    Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size to a very close range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected the data of 1992 primary Persona Knee System TKA procedures. Of them, 199 procedures were randomly selected as testing data and the rest of the data were randomly partitioned between model training data and model evaluation data with a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size within 1 size difference 96% of the time on the validation data, 94% of the time on the testing data, and 92% on a prospective cadaver data set. The study results indicated the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows statistical modeling may be used with radiographs to dramatically enhance the templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
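
    A sketch of how such a proportional-odds (ordinal logistic) model can be fit, assuming a hypothetical CSV with columns gender, age, weight, height and tibia_size; the published model's data handling, coefficients and validation are not reproduced. statsmodels 0.12+ is assumed for OrderedModel.

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("tka_cases.csv")                                   # hypothetical file
size = df["tibia_size"].astype(pd.CategoricalDtype(ordered=True))   # ordered size categories
X = df[["age", "weight", "height"]].assign(male=(df["gender"] == "M").astype(float))

res = OrderedModel(size, X, distr="logit").fit(method="bfgs", disp=False)
print(res.summary())
probs = res.model.predict(res.params, exog=X)     # per-case probability of each size
print(probs[:5].round(3))
```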

  15. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression.

    Science.gov (United States)

    Candel, Math J J M; Van Breukelen, Gerard J P

    2010-06-30

    Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.

  16. The effect of clustering on lot quality assurance sampling: a probabilistic model to calculate sample sizes for quality assessments.

    Science.gov (United States)

    Hedt-Gauthier, Bethany L; Mitsunaga, Tisha; Hund, Lauren; Olives, Casey; Pagano, Marcello

    2013-10-26

    Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs.
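
    A simplified sketch of the underlying risk calculation, with the clustering-induced overdispersion folded into a single beta-binomial (the paper's full C-LQAS machinery, per-cluster modeling and cost considerations are not shown); the thresholds, intracluster correlation and decision rule below are invented inputs.

```python
from scipy.stats import betabinom

def beta_params(p, rho):
    """Beta parameters giving mean p and pairwise (intracluster) correlation rho."""
    return p * (1 - rho) / rho, (1 - p) * (1 - rho) / rho

def classification_risks(n, d, p_hi=0.80, p_lo=0.50, rho=0.05):
    """Decision rule: classify quality as acceptable if at least d of n records are correct."""
    a_hi, b_hi = beta_params(p_hi, rho)
    a_lo, b_lo = beta_params(p_lo, rho)
    alpha = betabinom.cdf(d - 1, n, a_hi, b_hi)       # risk of rejecting a truly good area
    beta = 1 - betabinom.cdf(d - 1, n, a_lo, b_lo)    # risk of accepting a truly poor area
    return alpha, beta

print(classification_risks(n=60, d=42))
```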

  17. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing

  18. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell
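
    The Friedman-Rafsky statistic itself is easy to sketch (a generic implementation, not the FlowMap-FR code): pool two cell populations, build a Euclidean minimum spanning tree, and count the edges that join points from different samples; few cross-sample edges indicate dissimilar populations. The simulated marker data are placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def friedman_rafsky_cross_edges(x, y):
    """x, y: (n_cells, n_markers) arrays; returns the number of cross-sample MST edges."""
    pooled = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x)), np.ones(len(y))]
    dist = squareform(pdist(pooled))                  # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist).tocoo()         # MST over the pooled cells
    return int(np.sum(labels[mst.row] != labels[mst.col]))

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(200, 3))               # hypothetical population A
b = rng.normal(0.3, 1.0, size=(200, 3))               # slightly shifted population B
print(friedman_rafsky_cross_edges(a, b))              # larger counts = more similar
```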

  19. Does increasing the size of bi-weekly samples of records influence results when using the Global Trigger Tool? An observational study of retrospective record reviews of two different sample sizes.

    Science.gov (United States)

    Mevik, Kjersti; Griffin, Frances A; Hansen, Tonje E; Deilkås, Ellen T; Vonen, Barthold

    2016-04-25

    To investigate the impact of increasing the sample of records reviewed bi-weekly with the Global Trigger Tool method to identify adverse events in hospitalised patients. Retrospective observational study. A Norwegian 524-bed general hospital trust. 1920 medical records selected from 1 January to 31 December 2010. Rate, type and severity of adverse events identified in two different sample sizes of records, selected as 10 and 70 records bi-weekly. In the large sample, 1.45 (95% CI 1.07 to 1.97) times more adverse events per 1000 patient days (39.3 adverse events/1000 patient days) were identified than in the small sample (27.2 adverse events/1000 patient days). Hospital-acquired infections were the most common category of adverse events in both the samples, and the distributions of the other categories of adverse events did not differ significantly between the samples. The distribution of severity level of adverse events did not differ between the samples. The findings suggest that while the distribution of categories and severity are not dependent on the sample size, the rate of adverse events is. Further studies are needed to conclude if the optimal sample size may need to be adjusted based on the hospital size in order to detect a more accurate rate of adverse events. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  20. STATISTICAL EVALUATION OF SMALL SCALE MIXING DEMONSTRATION SAMPLING AND BATCH TRANSFER PERFORMANCE - 12093

    Energy Technology Data Exchange (ETDEWEB)

    GREER DA; THIEN MG

    2012-01-12

    The ability to effectively mix, sample, certify, and deliver consistent batches of High Level Waste (HLW) feed from the Hanford Double Shell Tanks (DST) to the Waste Treatment and Immobilization Plant (WTP) presents a significant mission risk with potential to impact mission length and the quantity of HLW glass produced. DOE's Tank Operations Contractor, Washington River Protection Solutions (WRPS) has previously presented the results of mixing performance in two different sizes of small scale DSTs to support scale up estimates of full scale DST mixing performance. Currently, sufficient sampling of DSTs is one of the largest programmatic risks that could prevent timely delivery of high level waste to the WTP. WRPS has performed small scale mixing and sampling demonstrations to study the ability to sufficiently sample the tanks. The statistical evaluation of the demonstration results which lead to the conclusion that the two scales of small DST are behaving similarly and that full scale performance is predictable will be presented. This work is essential to reduce the risk of requiring a new dedicated feed sampling facility and will guide future optimization work to ensure the waste feed delivery mission will be accomplished successfully. This paper will focus on the analytical data collected from mixing, sampling, and batch transfer testing from the small scale mixing demonstration tanks and how those data are being interpreted to begin to understand the relationship between samples taken prior to transfer and samples from the subsequent batches transferred. An overview of the types of data collected and examples of typical raw data will be provided. The paper will then discuss the processing and manipulation of the data which is necessary to begin evaluating sampling and batch transfer performance. This discussion will also include the evaluation of the analytical measurement capability with regard to the simulant material used in the demonstration tests. The

  1. Sample representativeness verification of the FADN CZ farm business sample

    Directory of Open Access Journals (Sweden)

    Marie Prášilová

    2011-01-01

    Full Text Available Sample representativeness verification is one of the key stages of statistical work. After joining the European Union, the Czech Republic also joined the Farm Accountancy Data Network (FADN) system of the Union. This is a sample of bodies and companies doing business in agriculture. Detailed production and economic data on the results of farming business are collected from that sample annually, and results for the entire population of the country's farms are then estimated and assessed. It is important, hence, that the sample be representative. Representativeness is to be assessed as to the number of farms included in the survey and also as to the degree of accordance of the measures and indices as related to the population. The paper deals with the special statistical techniques and methods of the FADN CZ sample representativeness verification, including the necessary sample size statement procedure. The Czech farm population data have been obtained from the Czech Statistical Office data bank.

  2. Statistical sampling strategies

    International Nuclear Information System (INIS)

    Andres, T.H.

    1987-01-01

    Systems assessment codes use mathematical models to simulate natural and engineered systems. Probabilistic systems assessment codes carry out multiple simulations to reveal the uncertainty in values of output variables due to uncertainty in the values of the model parameters. In this paper, methods are described for sampling sets of parameter values to be used in a probabilistic systems assessment code. Three Monte Carlo parameter selection methods are discussed: simple random sampling, Latin hypercube sampling, and sampling using two-level orthogonal arrays. Three post-selection transformations are also described: truncation, importance transformation, and discretization. Advantages and disadvantages of each method are summarized
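
    Two of the parameter-selection schemes mentioned are easy to illustrate side by side (simple random sampling versus Latin hypercube sampling); the three uniform parameter ranges are arbitrary stand-ins for a model's inputs, and scipy 1.7+ is assumed for stats.qmc.

```python
import numpy as np
from scipy.stats import qmc

n = 100
lows = np.array([0.1, 10.0, 1e-6])        # hypothetical lower parameter bounds
highs = np.array([0.9, 50.0, 1e-3])       # hypothetical upper parameter bounds

rng = np.random.default_rng(42)
srs = lows + rng.random((n, 3)) * (highs - lows)        # simple random sampling

lhs_unit = qmc.LatinHypercube(d=3, seed=42).random(n)   # stratified unit-cube design
lhs = qmc.scale(lhs_unit, lows, highs)                  # map onto the parameter ranges

print(srs.min(axis=0))
print(lhs.min(axis=0))                    # LHS covers every stratum of each range
```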

  3. Sample size calculation to externally validate scoring systems based on logistic regression models.

    Directory of Open Access Journals (Sweden)

    Antonio Palazón-Bru

    Full Text Available A sample size containing at least 100 events and 100 non-events has been suggested to validate a predictive model, regardless of the model being validated and the fact that certain factors (discrimination, parameterization and incidence) can influence calibration of the predictive model. Scoring systems based on binary logistic regression models are a specific type of predictive model. The aim of this study was to develop an algorithm to determine the sample size for validating a scoring system based on a binary logistic regression model and to apply it to a case study. The algorithm was based on bootstrap samples in which the area under the ROC curve, the observed event probabilities through smooth curves, and a measure to determine the lack of calibration (estimated calibration index) were calculated. To illustrate its use for interested researchers, the algorithm was applied to a scoring system, based on a binary logistic regression model, to determine mortality in intensive care units. In the case study provided, the algorithm obtained a sample size with 69 events, which is lower than the value suggested in the literature. An algorithm is provided for finding the appropriate sample size to validate scoring systems based on binary logistic regression models. This could be applied to determine the sample size in other similar cases.
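
    A loose sketch in the spirit of the algorithm (the calibration-based criterion with smoothed observed probabilities and the estimated calibration index is not implemented here): for a candidate validation-sample size, bootstrap that many patients and look at how unstable the AUC of the score is. The arrays `score` and `outcome` are assumed to hold the scoring-system values and the observed events.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_interval(score, outcome, n_val, n_boot=500, seed=0):
    """2.5th-97.5th AUC percentiles over bootstrap validation samples of size n_val."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        idx = rng.choice(len(outcome), size=n_val, replace=True)
        if outcome[idx].min() == outcome[idx].max():
            continue                                   # skip one-class resamples
        aucs.append(roc_auc_score(outcome[idx], score[idx]))
    return np.percentile(aucs, [2.5, 97.5])

# A narrow interval at a given n_val suggests that validation sample is large enough.
```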

  4. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    Science.gov (United States)

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  5. Statistical sampling and modelling for cork oak and eucalyptus stands

    NARCIS (Netherlands)

    Paulo, M.J.

    2002-01-01

    This thesis focuses on the use of modern statistical methods to solve problems on sampling, optimal cutting time and agricultural modelling in Portuguese cork oak and eucalyptus stands. The results are contained in five chapters that have been submitted for publication

  6. Size selective isocyanate aerosols personal air sampling using porous plastic foams

    International Nuclear Information System (INIS)

    Cong Khanh Huynh; Trinh Vu Duc

    2009-01-01

    As part of a European project (SMT4-CT96-2137), various European institutions specialized in occupational hygiene (BGIA, HSL, IOM, INRS, IST, Ambiente e Lavoro) have established a program of scientific collaboration to develop one or more prototypes of European personal samplers for the simultaneous collection of three dust fractions: inhalable, thoracic and respirable. These samplers, based on existing sampling heads (IOM, GSP and cassettes), use polyurethane plastic foam (PUF), selected according to its porosity, as the sampling support and particle size separator. In this study, the authors present an original application of size-selective personal air sampling using chemically impregnated PUF to capture and derivatize isocyanate aerosols in industrial spray-painting shops.

  7. An integrated approach for multi-level sample size determination

    International Nuclear Information System (INIS)

    Lu, M.S.; Teichmann, T.; Sanborn, J.B.

    1997-01-01

    Inspection procedures involving the sampling of items in a population often require steps of increasingly sensitive measurements, with correspondingly smaller sample sizes; these are referred to as multilevel sampling schemes. In the case of nuclear safeguards inspections verifying that there has been no diversion of Special Nuclear Material (SNM), these procedures have been examined often and increasingly complex algorithms have been developed to implement them. The aim in this paper is to provide an integrated approach, and, in so doing, to describe a systematic, consistent method that proceeds logically from level to level with increasing accuracy. The authors emphasize that the methods discussed are generally consistent with those presented in the references mentioned, and yield comparable results when the error models are the same. However, because of its systematic, integrated approach the proposed method elucidates the conceptual understanding of what goes on, and, in many cases, simplifies the calculations. In nuclear safeguards inspections, an important aspect of verifying nuclear items to detect any possible diversion of nuclear fissile materials is the sampling of such items at various levels of sensitivity. The first step usually is sampling by ''attributes'' involving measurements of relatively low accuracy, followed by further levels of sampling involving greater accuracy. This process is discussed in some detail in the references given; also, the nomenclature is described. Here, the authors outline a coordinated step-by-step procedure for achieving such multilevel sampling, and they develop the relationships between the accuracy of measurement and the sample size required at each stage, i.e., at the various levels. The logic of the underlying procedures is carefully elucidated; the calculations involved and their implications, are clearly described, and the process is put in a form that allows systematic generalization

  8. Statistical Study to Check the Conformity of Aggregate in Kirkuk City to Requirement of Iraqi Specification

    Directory of Open Access Journals (Sweden)

    Ammar Saleem Khazaal

    2018-01-01

    Full Text Available This research presents a statistical study to check the conformity of the coarse and fine aggregate used in Kirkuk city to the requirements of the Iraqi specifications. Sieve analysis data for 215 aggregate samples, obtained from the National Central Construction Laboratory and the Technical College Construction Laboratory in Kirkuk city, were analysed using the statistical program SAS. The results showed that in 5%, 17%, and 18% of the fine aggregate samples, the percentages passing sieve sizes 10 mm, 4.75 mm, and 2.36 mm, respectively, were below the minimum limit allowed by the Iraqi specifications for each sieve. The percentages passing sieve sizes 1.18 mm, 600 micrometers, and 300 micrometers exceeded the upper limit of the specification in 5%, 20%, and 30% of samples, respectively, while the percentages passing sieve sizes 1.18 mm and 600 micrometers fell below the minimum limit in 17% and 4% of samples, respectively. Deviations above the upper limit at the 150 micrometer sieve occurred in 2% of the total number of samples. For the coarse aggregate, the percentages passing sieve sizes 37.5 mm and 20 mm conformed to the Iraqi specifications in 100% and 83% of samples, respectively; at the 10 mm sieve, 5% of samples exceeded the upper limit of the Iraqi specifications and 27% fell below the minimum limit, whereas at the 5 mm sieve 1% of samples exceeded the upper limit. From the statistical analysis of the fine aggregate data, the percentages passing sieve sizes 10 mm, 2.36 mm, 1.18 mm, and 150 micrometers conformed, from a statistical point of view, to the Iraqi specifications, whereas those passing sieve sizes 4.75 mm, 600 micrometers, and 300 micrometers did not. Statistical analysis of the coarse aggregate results also showed conformity at sieve sizes of 37.5 mm and 20 mm and

  9. Multiple sensitive estimation and optimal sample size allocation in the item sum technique.

    Science.gov (United States)

    Perri, Pier Francesco; Rueda García, María Del Mar; Cobo Rodríguez, Beatriz

    2018-01-01

    For surveys of sensitive issues in life sciences, statistical procedures can be used to reduce nonresponse and social desirability response bias. Both of these phenomena provoke nonsampling errors that are difficult to deal with and can seriously flaw the validity of the analyses. The item sum technique (IST) is a very recent indirect questioning method derived from the item count technique that seeks to procure more reliable responses on quantitative items than direct questioning while preserving respondents' anonymity. This article addresses two important questions concerning the IST: (i) its implementation when two or more sensitive variables are investigated and efficient estimates of their unknown population means are required; (ii) the determination of the optimal sample size to achieve minimum variance estimates. These aspects are of great relevance for survey practitioners engaged in sensitive research and, to the best of our knowledge, were not studied so far. In this article, theoretical results for multiple estimation and optimal allocation are obtained under a generic sampling design and then particularized to simple random sampling and stratified sampling designs. Theoretical considerations are integrated with a number of simulation studies based on data from two real surveys and conducted to ascertain the efficiency gain derived from optimal allocation in different situations. One of the surveys concerns cannabis consumption among university students. Our findings highlight some methodological advances that can be obtained in life sciences IST surveys when optimal allocation is achieved. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
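
    The allocation question has a simple core that can be sketched without the paper's general results: the item sum estimator is the difference between the long-list and short-list sample means, and for a fixed total sample its variance is minimised by splitting the sample in proportion to the two subsample standard deviations. The pilot data implied below are hypothetical, and the multi-variable and stratified extensions studied in the paper are not covered.

```python
import numpy as np

def ist_estimate_and_allocation(long_list_scores, short_list_scores, n_total_next):
    """IST mean estimate plus a variance-minimising split of the next sample."""
    ll = np.asarray(long_list_scores, float)      # long list: innocuous + sensitive items
    sl = np.asarray(short_list_scores, float)     # short list: innocuous items only
    sensitive_mean = ll.mean() - sl.mean()        # difference-of-means IST estimator
    s1, s2 = ll.std(ddof=1), sl.std(ddof=1)
    n1 = round(n_total_next * s1 / (s1 + s2))     # allocate proportionally to the SDs
    return sensitive_mean, (n1, n_total_next - n1)
```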

  10. Interpreting meta-analysis according to the adequacy of sample size. An example using isoniazid chemoprophylaxis for tuberculosis in purified protein derivative negative HIV-infected individuals

    Directory of Open Access Journals (Sweden)

    Kristian Thorlund

    2010-04-01

    Full Text Available Objective: To illustrate the utility of statistical monitoring boundaries in meta-analysis, and provide a framework in which meta-analysis can be interpreted according to the adequacy of sample size. To propose a simple method for determining how many patients need to be randomized in a future trial before a meta-analysis can be deemed conclusive. Study design and setting: Prospective meta-analysis of randomized clinical trials (RCTs) that evaluated the effectiveness of isoniazid chemoprophylaxis versus placebo for preventing the incidence of tuberculosis disease among human immunodeficiency virus (HIV)-positive individuals testing purified protein derivative negative. Assessment of meta-analysis precision using trial sequential analysis (TSA) with Lan-DeMets monitoring boundaries. Sample size determination for a future trial to make the meta-analysis conclusive according to the thresholds set by the monitoring boundaries. Results: The meta-analysis included nine trials comprising 2,911 trial participants and yielded a relative risk of 0.74 (95% CI, 0.53–1.04, P = 0.082, I2 = 0%). To deem the meta-analysis conclusive according to the thresholds set by the monitoring boundaries, a future RCT would need to randomize 3,800 participants. Conclusion: Statistical monitoring boundaries provide a framework for interpreting meta-analysis according to the adequacy of sample size and project the required sample size for a future RCT to make a meta-analysis conclusive

  11. Estimation of sample size and testing power (Part 3).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2011-12-01

    This article introduces the definition and sample size estimation of three special tests (namely, non-inferiority test, equivalence test and superiority test) for qualitative data with the design of one factor with two levels having a binary response variable. Non-inferiority test refers to the research design of which the objective is to verify that the efficacy of the experimental drug is not clinically inferior to that of the positive control drug. Equivalence test refers to the research design of which the objective is to verify that the experimental drug and the control drug have clinically equivalent efficacy. Superiority test refers to the research design of which the objective is to verify that the efficacy of the experimental drug is clinically superior to that of the control drug. By specific examples, this article introduces formulas of sample size estimation for the three special tests, and their SAS realization in detail.
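
    As a rough companion to the formulas discussed above, the sketch below implements the standard normal-approximation sample size calculations for two independent proportions (as given, for example, by Chow, Shao and Wang) for the non-inferiority, equivalence and superiority designs. The margins, rates and alpha conventions are illustrative assumptions; the article's own SAS examples may parameterize these differently.

```python
# Normal-approximation sample sizes per group for two independent proportions,
# equal group sizes. p_t / p_c are the expected treatment and control rates.
from math import ceil
from scipy.stats import norm

def _var(p_t, p_c):
    return p_t * (1 - p_t) + p_c * (1 - p_c)

def n_noninferiority(p_t, p_c, margin, alpha=0.025, power=0.80):
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return ceil(z ** 2 * _var(p_t, p_c) / (p_t - p_c + margin) ** 2)

def n_superiority(p_t, p_c, margin, alpha=0.025, power=0.80):
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return ceil(z ** 2 * _var(p_t, p_c) / (p_t - p_c - margin) ** 2)

def n_equivalence(p_t, p_c, margin, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha) + norm.ppf(1 - (1 - power) / 2)
    return ceil(z ** 2 * _var(p_t, p_c) / (margin - abs(p_t - p_c)) ** 2)

# Example: expected response rates of 0.75 in both arms, 10% margins.
print(n_noninferiority(0.75, 0.75, 0.10),
      n_equivalence(0.75, 0.75, 0.10),
      n_superiority(0.80, 0.75, 0.00))
```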

  12. An Improved Rank Correlation Effect Size Statistic for Single-Case Designs: Baseline Corrected Tau.

    Science.gov (United States)

    Tarlow, Kevin R

    2017-07-01

    Measuring treatment effects when an individual's pretreatment performance is improving poses a challenge for single-case experimental designs. It may be difficult to determine whether improvement is due to the treatment or due to the preexisting baseline trend. Tau-U is a popular single-case effect size statistic that purports to control for baseline trend. However, despite its strengths, Tau-U has substantial limitations: Its values are inflated and not bound between -1 and +1, it cannot be visually graphed, and its relatively weak method of trend control leads to unacceptable levels of Type I error wherein ineffective treatments appear effective. An improved effect size statistic based on rank correlation and robust regression, Baseline Corrected Tau, is proposed and field-tested with both published and simulated single-case time series. A web-based calculator for Baseline Corrected Tau is also introduced for use by single-case investigators.
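
    The general idea can be sketched as a robust (Theil-Sen) baseline detrending step followed by Kendall's tau between the phase indicator and the corrected scores. Tarlow's published procedure adds decision rules (for example, detrending only when the baseline trend is statistically significant) and is distributed as a web calculator, so the code below is only a rough illustration; the data are hypothetical.

```python
# Rough sketch of the idea behind Baseline Corrected Tau: remove a robust
# (Theil-Sen) baseline trend, then correlate the corrected scores with the
# phase indicator using Kendall's tau.
import numpy as np
from scipy.stats import kendalltau, theilslopes

def baseline_corrected_tau(y_baseline, y_treatment):
    y = np.concatenate([y_baseline, y_treatment])
    t = np.arange(len(y))
    phase = np.concatenate([np.zeros(len(y_baseline)), np.ones(len(y_treatment))])

    # Robust trend estimated from the baseline phase only.
    slope, intercept, _, _ = theilslopes(y_baseline, t[: len(y_baseline)])
    corrected = y - (intercept + slope * t)

    tau, p = kendalltau(phase, corrected)
    return tau, p

baseline = [3, 4, 4, 5, 5]          # hypothetical single-case time series
treatment = [6, 7, 7, 8, 9, 9]
print(baseline_corrected_tau(baseline, treatment))
```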

  13. Species richness in soil bacterial communities: a proposed approach to overcome sample size bias.

    Science.gov (United States)

    Youssef, Noha H; Elshahed, Mostafa S

    2008-09-01

    Estimates of species richness based on 16S rRNA gene clone libraries are increasingly utilized to gauge the level of bacterial diversity within various ecosystems. However, previous studies have indicated that regardless of the utilized approach, species richness estimates obtained are dependent on the size of the analyzed clone libraries. We here propose an approach to overcome sample size bias in species richness estimates in complex microbial communities. Parametric (Maximum likelihood-based and rarefaction curve-based) and non-parametric approaches were used to estimate species richness in a library of 13,001 near full-length 16S rRNA clones derived from soil, as well as in multiple subsets of the original library. Species richness estimates obtained increased with the increase in library size. To obtain a sample size-unbiased estimate of species richness, we calculated the theoretical clone library sizes required to encounter the estimated species richness at various clone library sizes, used curve fitting to determine the theoretical clone library size required to encounter the "true" species richness, and subsequently determined the corresponding sample size-unbiased species richness value. Using this approach, sample size-unbiased estimates of 17,230, 15,571, and 33,912 were obtained for the ML-based, rarefaction curve-based, and ACE-1 estimators, respectively, compared to bias-uncorrected values of 15,009, 11,913, and 20,909.
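
    One way to picture the extrapolation step is to fit a saturating curve to richness estimates obtained at increasing library sizes and read off its asymptote as a sample size-unbiased richness value. This is only a generic flavor of the authors' two-step procedure; the intermediate richness values below are invented for illustration (the final point is the bias-uncorrected ML estimate quoted above).

```python
# Fit a saturating curve to richness-versus-library-size points and use its
# asymptote as a sample size-unbiased richness estimate. Numbers are made up
# apart from the final (full-library) ML estimate from the abstract.
import numpy as np
from scipy.optimize import curve_fit

library_sizes = np.array([1000, 2000, 4000, 8000, 13001], float)   # clones analyzed
richness_est = np.array([6200, 8900, 11200, 13500, 15009], float)  # hypothetical ML estimates

def saturating(n, s_max, b):
    """Michaelis-Menten-style saturation: richness approaches s_max as n grows."""
    return s_max * n / (b + n)

(s_max, b), _ = curve_fit(saturating, library_sizes, richness_est, p0=[20000, 5000])
print(f"asymptotic (sample size-unbiased) richness ~ {s_max:.0f} species")
```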

  14. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    Science.gov (United States)

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

    Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications, mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. Under experimental conditions, these parameters must be estimated from small samples. The small sample size induces errors in the estimated HOS parameters, limiting real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from the CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in the presence of small sample sizes. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and log-normal PDFs that mimic the observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to the small-sample-size effect and more accurate in sEMG PDF shape screening applications.

  15. Generating Random Samples of a Given Size Using Social Security Numbers.

    Science.gov (United States)

    Erickson, Richard C.; Brauchle, Paul E.

    1984-01-01

    The purposes of this article are (1) to present a method by which social security numbers may be used to draw cluster samples of a predetermined size and (2) to describe procedures used to validate this method of drawing random samples. (JOW)

  16. Gene coexpression measures in large heterogeneous samples using count statistics.

    Science.gov (United States)

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.

  17. Rational Arithmetic Mathematica Functions to Evaluate the Two-Sided One Sample K-S Cumulative Sampling Distribution

    Directory of Open Access Journals (Sweden)

    J. Randall Brown

    2007-06-01

    Full Text Available One of the most widely used goodness-of-fit tests is the two-sided one sample Kolmogorov-Smirnov (K-S) test, which has been implemented by many computer statistical software packages. To calculate a two-sided p value (evaluate the cumulative sampling distribution), these packages use various methods including recursion formulae, limiting distributions, and approximations of unknown accuracy developed over thirty years ago. Based on an extensive literature search for the two-sided one sample K-S test, this paper identifies an exact formula for sample sizes up to 31, six recursion formulae, and one matrix formula that can be used to calculate a p value. To ensure accurate calculation by avoiding catastrophic cancelation and eliminating rounding error, each of these formulae is implemented in rational arithmetic. For the six recursion formulae and the matrix formula, computational experience for sample sizes up to 500 shows that computational times are increasing functions of both the sample size and the number of digits in the numerator and denominator integers of the rational number test statistic. The computational times of the seven formulae vary immensely, but the Durbin recursion formula is almost always the fastest. Linear search is used to calculate the inverse of the cumulative sampling distribution (find the confidence interval half-width), and tables of calculated half-widths are presented for sample sizes up to 500. Using calculated half-widths as input, computational times for the fastest formula, the Durbin recursion formula, are given for sample sizes up to two thousand.
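
    To illustrate why rational arithmetic is attractive here, the sketch below evaluates the exact two-sided one-sample K-S distribution with the Marsaglia-Tsang-Wang matrix formulation using Python's Fraction type, so no rounding or catastrophic cancellation can occur. This is one of the matrix-style approaches in the literature, not a transcription of the paper's Mathematica functions.

```python
# Exact K-S cumulative distribution via the Marsaglia-Tsang-Wang matrix
# method, evaluated entirely in rational arithmetic.
from fractions import Fraction
from math import factorial, ceil

def ks_cdf_exact(n, d):
    """P(D_n < d) for the two-sided one-sample K-S statistic; d given as a Fraction."""
    k = ceil(n * d)                     # exact because d is a Fraction
    h = Fraction(k) - n * d
    m = 2 * k - 1

    # Build the (2k-1) x (2k-1) matrix H.
    H = [[Fraction(0) if i - j + 1 < 0 else Fraction(1) for j in range(m)]
         for i in range(m)]
    for i in range(m):
        H[i][0] -= h ** (i + 1)
        H[m - 1][i] -= h ** (m - i)
    H[m - 1][0] += (2 * h - 1) ** m if 2 * h - 1 > 0 else Fraction(0)
    for i in range(m):
        for j in range(m):
            if i - j + 1 > 0:
                H[i][j] /= factorial(i - j + 1)

    # Exact matrix power H^n by repeated squaring.
    def matmul(A, B):
        return [[sum(A[i][l] * B[l][j] for l in range(m)) for j in range(m)]
                for i in range(m)]
    P = [[Fraction(1 if i == j else 0) for j in range(m)] for i in range(m)]
    Q, e = H, n
    while e:
        if e & 1:
            P = matmul(P, Q)
        Q = matmul(Q, Q)
        e >>= 1

    return P[k - 1][k - 1] * factorial(n) / Fraction(n) ** n

# Exact p-value for n = 10 and an observed statistic of 0.3.
p_value = 1 - ks_cdf_exact(10, Fraction(3, 10))
print(p_value, float(p_value))
```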

  18. SU-E-I-46: Sample-Size Dependence of Model Observers for Estimating Low-Contrast Detection Performance From CT Images

    International Nuclear Information System (INIS)

    Reiser, I; Lu, Z

    2014-01-01

    Purpose: Recently, task-based assessment of diagnostic CT systems has attracted much attention. Detection task performance can be estimated using human observers, or mathematical observer models. While most models are well established, considerable bias can be introduced when performance is estimated from a limited number of image samples. Thus, the purpose of this work was to assess the effect of sample size on bias and uncertainty of two channelized Hotelling observers and a template-matching observer. Methods: The image data used for this study consisted of 100 signal-present and 100 signal-absent regions-of-interest, which were extracted from CT slices. The experimental conditions included two signal sizes and five different x-ray beam current settings (mAs). Human observer performance for these images was determined in 2-alternative forced choice experiments. These data were provided by the Mayo Clinic in Rochester, MN. Detection performance was estimated from three observer models, including channelized Hotelling observers (CHO) with Gabor or Laguerre-Gauss (LG) channels, and a template-matching observer (TM). Different sample sizes were generated by randomly selecting a subset of image pairs (N = 20, 40, 60, 80). Observer performance was quantified as the proportion of correct responses (PC). Bias was quantified as the relative difference of PC for 20 and 80 image pairs. Results: For N = 100, all observer models predicted human performance across mAs and signal sizes. Bias was 23% for CHO (Gabor), 7% for CHO (LG), and 3% for TM. The relative standard deviation, σ(PC)/PC, at N = 20 was highest for the TM observer (11%) and lowest for the CHO (Gabor) observer (5%). Conclusion: In order to make image quality assessment feasible in clinical practice, a statistically efficient observer model, that can predict performance from few samples, is needed. Our results identified two observer models that may be suited for this task.

  19. Statistical distributions of avalanche size and waiting times in an inter-sandpile cascade model

    Science.gov (United States)

    Batac, Rene; Longjas, Anthony; Monterola, Christopher

    2012-02-01

    Sandpile-based models have successfully shed light on key features of nonlinear relaxational processes in nature, particularly the occurrence of fat-tailed magnitude distributions and exponential return times, from simple local stress redistributions. In this work, we extend the existing sandpile paradigm into an inter-sandpile cascade, wherein the avalanches emanating from a uniformly-driven sandpile (first layer) are used to trigger the next (second layer), and so on, in a successive fashion. Statistical characterizations reveal that avalanche size distributions evolve from a power-law p(S)≈S^(-1.3) for the first layer to gamma distributions p(S)≈S^α exp(-S/S0) for layers far away from the uniformly driven sandpile. The resulting avalanche size statistics is found to be associated with the corresponding waiting time distribution, as explained in an accompanying analytic formulation. Interestingly, both the numerical and analytic models show good agreement with actual inventories of non-uniformly driven events in nature.

  20. Statistical Methods and Tools for Hanford Staged Feed Tank Sampling

    Energy Technology Data Exchange (ETDEWEB)

    Fountain, Matthew S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Brigantic, Robert T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Peterson, Reid A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2013-10-01

    This report summarizes work conducted by Pacific Northwest National Laboratory to technically evaluate the current approach to staged feed sampling of high-level waste (HLW) sludge to meet waste acceptance criteria (WAC) for transfer from tank farms to the Hanford Waste Treatment and Immobilization Plant (WTP). The current sampling and analysis approach is detailed in the document titled Initial Data Quality Objectives for WTP Feed Acceptance Criteria, 24590-WTP-RPT-MGT-11-014, Revision 0 (Arakali et al. 2011). The goal of this current work is to evaluate and provide recommendations to support a defensible, technical and statistical basis for the staged feed sampling approach that meets WAC data quality objectives (DQOs).

  1. Support vector regression to predict porosity and permeability: Effect of sample size

    Science.gov (United States)

    Al-Anazi, A. F.; Gates, I. D.

    2012-02-01

    Porosity and permeability are key petrophysical parameters obtained from laboratory core analysis. Cores, obtained from drilled wells, are often few in number for most oil and gas fields. Porosity and permeability correlations based on conventional techniques such as linear regression or neural networks trained with core and geophysical logs suffer from poor generalization to wells with only geophysical logs. The generalization problem of correlation models often becomes pronounced when the training sample size is small. This is attributed to the underlying assumption that conventional techniques employing the empirical risk minimization (ERM) inductive principle converge asymptotically to the true risk values as the number of samples increases. In small sample size estimation problems, the available training samples must span the complexity of the parameter space so that the model is able both to match the available training samples reasonably well and to generalize to new data. This is achieved using the structural risk minimization (SRM) inductive principle by matching the capability of the model to the available training data. One method that uses SRM is the support vector regression (SVR) network. In this research, the capability of SVR to predict porosity and permeability in a heterogeneous sandstone reservoir under the effect of small sample size is evaluated. Particularly, the impact of Vapnik's ɛ-insensitivity loss function and least-modulus loss function on generalization performance was empirically investigated. The results are compared to the multilayer perceptron (MLP) neural network, a widely used regression method, which operates under the ERM principle. The mean square error and correlation coefficients were used to measure the quality of predictions. The results demonstrate that SVR yields consistently better predictions of the porosity and permeability with small sample size than the MLP method. Also, the performance of SVR depends on both kernel function
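
    The contrast between the SRM-based SVR and an ERM-trained network can be reproduced in miniature with scikit-learn: train both on a deliberately small subset and compare test error. The synthetic log and porosity values below are stand-ins for real core and wireline data, and the hyperparameters are illustrative.

```python
# Small-sample comparison of epsilon-insensitive SVR and an MLP regressor on
# synthetic "log -> porosity" data; only 20 samples are used for training.
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 3))                      # mock gamma-ray/density/sonic logs
y = 0.25 - 0.1 * X[:, 0] + 0.05 * X[:, 1] ** 2 + rng.normal(0, 0.01, 400)  # "porosity"

X_train, y_train = X[:20], y[:20]                   # small "cored well" sample
X_test, y_test = X[20:], y[20:]

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.005))
mlp = make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(16,),
                                                   max_iter=5000, random_state=0))
for name, model in [("SVR", svr), ("MLP", mlp)]:
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.5f}")
```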

  2. Interpreting and Reporting Effect Sizes in Research Investigations.

    Science.gov (United States)

    Tapia, Martha; Marsh, George E., II

    Since 1994, the American Psychological Association (APA) has advocated the inclusion of effect size indices in reporting research to elucidate the statistical significance of studies based on sample size. In 2001, the fifth edition of the APA "Publication Manual" stressed the importance of including an index of effect size to clarify…

  3. The statistical-inference approach to generalized thermodynamics

    International Nuclear Information System (INIS)

    Lavenda, B.H.; Scherer, C.

    1987-01-01

    Limit theorems, such as the central-limit theorem and the weak law of large numbers, are applicable to statistical thermodynamics for sufficiently large sample sizes of independent and identically distributed observations performed on extensive thermodynamic (chance) variables. The estimation of the intensive thermodynamic quantities is a problem in parametric statistical estimation. The normal approximation to the Gibbs' distribution is justified by the analysis of large deviations. Statistical thermodynamics is generalized to include the statistical estimation of variance as well as mean values

  4. The quality of the reported sample size calculations in randomized controlled trials indexed in PubMed.

    Science.gov (United States)

    Lee, Paul H; Tse, Andy C Y

    2017-05-01

    There are limited data on the quality of reporting of information essential for replication of the calculation as well as the accuracy of the sample size calculation. We examined the current quality of reporting of the sample size calculation in randomized controlled trials (RCTs) published in PubMed and the variation in reporting across study design, study characteristics, and journal impact factor. We also reviewed the targeted sample sizes reported in trial registries. We reviewed and analyzed all RCTs published in December 2014 in journals indexed in PubMed. The 2014 Impact Factors for the journals were used as proxies for their quality. Of the 451 analyzed papers, 58.1% reported an a priori sample size calculation. Nearly all papers provided the level of significance (97.7%) and desired power (96.6%), and most of the papers reported the minimum clinically important effect size (73.3%). The median percentage difference between the reported and calculated sample sizes was 0.0% (IQR -4.6% to 3.0%). The accuracy of the reported sample size was better for studies published in journals that endorsed the CONSORT statement and journals with an impact factor. A total of 98 papers provided a targeted sample size in trial registries, and about two-thirds of these papers (n=62) reported a sample size calculation, but only 25 (40.3%) had no discrepancy with the number reported in the trial registries. The reporting of the sample size calculation in RCTs published in PubMed-indexed journals and trial registries was poor. The CONSORT statement should be more widely endorsed. Copyright © 2016 European Federation of Internal Medicine. Published by Elsevier B.V. All rights reserved.

  5. Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.

    Science.gov (United States)

    Kieffer, Kevin M.; Thompson, Bruce

    As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…

  6. Perceived Statistical Knowledge Level and Self-Reported Statistical Practice Among Academic Psychologists

    Directory of Open Access Journals (Sweden)

    Laura Badenes-Ribera

    2018-06-01

    Full Text Available Introduction: Publications arguing against the null hypothesis significance testing (NHST) procedure and in favor of good statistical practices have increased. The most frequently mentioned alternatives to NHST are effect size statistics (ES), confidence intervals (CIs), and meta-analyses. A recent survey conducted in Spain found that academic psychologists have poor knowledge about effect size statistics, confidence intervals, and graphic displays for meta-analyses, which might lead to a misinterpretation of the results. In addition, it also found that, although the use of ES is becoming generalized, the same thing is not true for CIs. Finally, academics with greater knowledge about ES statistics presented a profile closer to good statistical practice and research design. Our main purpose was to analyze the extension of these results to a different geographical area through a replication study. Methods: For this purpose, we elaborated an on-line survey that included the same items as the original research, and we asked academic psychologists to indicate their level of knowledge about ES, their CIs, and meta-analyses, and how they use them. The sample consisted of 159 Italian academic psychologists (54.09% women, mean age of 47.65 years). The mean number of years in the position of professor was 12.90 (SD = 10.21). Results: As in the original research, the results showed that, although the use of effect size estimates is becoming generalized, an under-reporting of CIs for ES persists. The most frequent ES statistics mentioned were Cohen's d and R²/η², which can be affected by outliers, non-normality, or violations of statistical assumptions. In addition, academics showed poor knowledge about meta-analytic displays (e.g., forest plot and funnel plot) and quality checklists for studies. Finally, academics with higher-level knowledge about ES statistics seem to have a profile closer to good statistical practices. Conclusions: Changing statistical practice is not

  7. Bayesian sample size determination for cost-effectiveness studies with censored data.

    Directory of Open Access Journals (Sweden)

    Daniel P Beavers

    Full Text Available Cost-effectiveness models are commonly utilized to determine the combined clinical and economic impact of one treatment compared to another. However, most methods for sample size determination of cost-effectiveness studies assume fully observed costs and effectiveness outcomes, which presents challenges for survival-based studies in which censoring exists. We propose a Bayesian method for the design and analysis of cost-effectiveness data in which costs and effectiveness may be censored, and the sample size is approximated for both power and assurance. We explore two parametric models and demonstrate the flexibility of the approach to accommodate a variety of modifications to study assumptions.

  8. A comparison of confidence/credible interval methods for the area under the ROC curve for continuous diagnostic tests with small sample size.

    Science.gov (United States)

    Feng, Dai; Cortese, Giuliana; Baumgartner, Richard

    2017-12-01

    The receiver operating characteristic (ROC) curve is frequently used as a measure of accuracy of continuous markers in diagnostic tests. The area under the ROC curve (AUC) is arguably the most widely used summary index for the ROC curve. Although the small sample size scenario is common in medical tests, a comprehensive study of small sample size properties of various methods for the construction of the confidence/credible interval (CI) for the AUC has been by and large missing in the literature. In this paper, we describe and compare 29 non-parametric and parametric methods for the construction of the CI for the AUC when the number of available observations is small. The methods considered include not only those that have been widely adopted, but also those that have been less frequently mentioned or, to our knowledge, never applied to the AUC context. To compare different methods, we carried out a simulation study with data generated from binormal models with equal and unequal variances and from exponential models with various parameters and with equal and unequal small sample sizes. We found that the larger the true AUC value and the smaller the sample size, the larger the discrepancy among the results of different approaches. When the model is correctly specified, the parametric approaches tend to outperform the non-parametric ones. Moreover, in the non-parametric domain, we found that a method based on the Mann-Whitney statistic is in general superior to the others. We further elucidate potential issues and provide possible solutions, along with general guidance on CI construction for the AUC when the sample size is small. Finally, we illustrate the utility of different methods through real life examples.
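
    One of the simpler constructions typically included in such comparisons is the Mann-Whitney (trapezoidal) AUC with a Wald interval based on the Hanley-McNeil variance, sketched below on hypothetical marker values. Its well-known fragility at very small sample sizes is precisely what motivates comparing many alternative intervals.

```python
# Nonparametric AUC with a Hanley-McNeil Wald confidence interval.
import numpy as np
from scipy.stats import norm

def auc_hanley_mcneil_ci(x_cases, x_controls, level=0.95):
    x_cases, x_controls = np.asarray(x_cases, float), np.asarray(x_controls, float)
    n1, n0 = len(x_cases), len(x_controls)
    # Mann-Whitney estimate of P(X_case > X_control), ties counted as 1/2.
    diff = x_cases[:, None] - x_controls[None, :]
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (n1 * n0)
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
           + (n0 - 1) * (q2 - auc ** 2)) / (n1 * n0)
    half = norm.ppf(0.5 + level / 2) * np.sqrt(var)
    return auc, max(0.0, auc - half), min(1.0, auc + half)

cases = [2.1, 2.9, 3.4, 3.8, 4.5]        # hypothetical marker values, n = 5
controls = [1.2, 1.9, 2.3, 2.8, 3.1, 3.3]
print(auc_hanley_mcneil_ci(cases, controls))
```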

  9. Novel joint selection methods can reduce sample size for rheumatoid arthritis clinical trials with ultrasound endpoints.

    Science.gov (United States)

    Allen, John C; Thumboo, Julian; Lye, Weng Kit; Conaghan, Philip G; Chew, Li-Ching; Tan, York Kiat

    2018-03-01

    To determine whether novel methods of selecting joints through (i) ultrasonography (individualized-ultrasound [IUS] method), or (ii) ultrasonography and clinical examination (individualized-composite-ultrasound [ICUS] method) translate into smaller rheumatoid arthritis (RA) clinical trial sample sizes when compared to existing methods utilizing predetermined joint sites for ultrasonography. Cohen's effect size (ES) was estimated (ES^) and a 95% CI (ES^L, ES^U) calculated on a mean change in 3-month total inflammatory score for each method. Corresponding 95% CIs [nL(ES^U), nU(ES^L)] were obtained on a post hoc sample size reflecting the uncertainty in ES^. Sample size calculations were based on a one-sample t-test as the patient numbers needed to provide 80% power at α = 0.05 to reject a null hypothesis H0: ES = 0 versus alternative hypotheses H1: ES = ES^, ES = ES^L and ES = ES^U. We aimed to provide point and interval estimates on projected sample sizes for future studies reflecting the uncertainty in our study ES^s. Twenty-four treated RA patients were followed up for 3 months. Utilizing the 12-joint approach and existing methods, the post hoc sample size (95% CI) was 22 (10-245). Corresponding sample sizes using ICUS and IUS were 11 (7-40) and 11 (6-38), respectively. Utilizing a seven-joint approach, the corresponding sample sizes using ICUS and IUS methods were nine (6-24) and 11 (6-35), respectively. Our pilot study suggests that sample size for RA clinical trials with ultrasound endpoints may be reduced using the novel methods, providing justification for larger studies to confirm these observations. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.
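
    The sample-size logic described above can be sketched as follows: for the effect size point estimate and each CI bound, find the smallest n that gives 80% power in a one-sample t-test of H0: ES = 0 at α = 0.05. The effect sizes in the example are placeholders, not the study's estimates.

```python
# Smallest n giving the target power for a one-sample t-test, computed from
# the noncentral t distribution.
from scipy.stats import t, nct

def n_one_sample_t(effect_size, alpha=0.05, power=0.80, n_max=10000):
    for n in range(3, n_max):
        df = n - 1
        t_crit = t.ppf(1 - alpha / 2, df)
        nc = effect_size * n ** 0.5
        achieved = (1 - nct.cdf(t_crit, df, nc)) + nct.cdf(-t_crit, df, nc)
        if achieved >= power:
            return n
    raise ValueError("power not reached within n_max")

es_hat, es_lower, es_upper = 0.65, 0.30, 1.00   # hypothetical ES^ and its 95% CI
print({"n at ES^": n_one_sample_t(es_hat),
       "n at lower bound": n_one_sample_t(es_lower),
       "n at upper bound": n_one_sample_t(es_upper)})
```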

  10. Effects of sample size on robustness and prediction accuracy of a prognostic gene signature

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2009-05-01

    Full Text Available Abstract Background The small overlap between independently developed gene signatures and the poor inter-study applicability of gene signatures are two major concerns raised in the development of microarray-based prognostic gene signatures. One recent study suggested that thousands of samples are needed to generate a robust prognostic gene signature. Results A data set of 1,372 samples was generated by combining eight breast cancer gene expression data sets produced using the same microarray platform and, using the data set, the effects of varying sample sizes on several performance measures of a prognostic gene signature were investigated. The overlap between independently developed gene signatures increased linearly with more samples, attaining an average overlap of 16.56% with 600 samples. The concordance between outcomes predicted by different gene signatures also increased with more samples, up to 94.61% with 300 samples. The accuracy of outcome prediction also increased with more samples. Finally, analysis using only Estrogen Receptor-positive (ER+) patients attained higher prediction accuracy than using all patients, suggesting that sub-type specific analysis can lead to the development of better prognostic gene signatures. Conclusion Increasing sample sizes generated a gene signature with better stability, better concordance in outcome prediction, and better prediction accuracy. However, the degree of performance improvement with increased sample size differed between the degree of overlap and the degree of concordance in outcome prediction, suggesting that the sample size required for a study should be determined according to the specific aims of the study.

  11. Statistical Power in Plant Pathology Research.

    Science.gov (United States)

    Gent, David H; Esker, Paul D; Kriss, Alissa B

    2018-01-01

    In null hypothesis testing, failure to reject a null hypothesis may have two potential interpretations. One interpretation is that the treatments being evaluated do not have a significant effect, and a correct conclusion was reached in the analysis. Alternatively, a treatment effect may have existed but the conclusion of the study was that there was none. This is termed a Type II error, which is most likely to occur when studies lack sufficient statistical power to detect a treatment effect. In basic terms, the power of a study is the ability to identify a true effect through a statistical test. The power of a statistical test is 1 - (the probability of Type II errors), and depends on the size of treatment effect (termed the effect size), variance, sample size, and significance criterion (the probability of a Type I error, α). Low statistical power is prevalent in scientific literature in general, including plant pathology. However, power is rarely reported, creating uncertainty in the interpretation of nonsignificant results and potentially underestimating small, yet biologically significant relationships. The appropriate level of power for a study depends on the impact of Type I versus Type II errors and no single level of power is acceptable for all purposes. Nonetheless, by convention 0.8 is often considered an acceptable threshold and studies with power less than 0.5 generally should not be conducted if the results are to be conclusive. The emphasis on power analysis should be in the planning stages of an experiment. Commonly employed strategies to increase power include increasing sample sizes, selecting a less stringent threshold probability for Type I errors, increasing the hypothesized or detectable effect size, including as few treatment groups as possible, reducing measurement variability, and including relevant covariates in analyses. Power analysis will lead to more efficient use of resources and more precisely structured hypotheses, and may even
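
    The interplay of effect size, sample size and α described above can be made concrete with a small power calculation for a two-sample t-test based on the noncentral t distribution, as sketched below (the numbers are generic, not drawn from any particular plant pathology study).

```python
# Power of a two-sided, two-sample t test with equal group sizes, as a
# function of Cohen's d and the per-group sample size.
import numpy as np
from scipy.stats import t, nct

def power_two_sample_t(d, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    nc = d * np.sqrt(n_per_group / 2)          # noncentrality for equal groups
    t_crit = t.ppf(1 - alpha / 2, df)
    return (1 - nct.cdf(t_crit, df, nc)) + nct.cdf(-t_crit, df, nc)

# A "moderate" treatment effect (d = 0.5) needs roughly 64 replicates per
# group to reach the conventional 0.8 threshold; with 20 per group, power is
# only about 0.33.
for n in (10, 20, 64):
    print(n, round(float(power_two_sample_t(0.5, n)), 3))
```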

  12. Statistical sampling applied to the radiological characterization of historical waste

    Directory of Open Access Journals (Sweden)

    Zaffora Biagio

    2016-01-01

    Full Text Available The evaluation of the activity of radionuclides in radioactive waste is required for its disposal in final repositories. Easy-to-measure nuclides, like γ-emitters and high-energy X-rays, can be measured via non-destructive nuclear techniques from outside a waste package. Some radionuclides are difficult-to-measure (DTM) from outside a package because they are α- or β-emitters. The present article discusses the application of linear regression, scaling factors (SF) and the so-called “mean activity method” to estimate the activity of DTM nuclides on metallic waste produced at the European Organization for Nuclear Research (CERN). Various statistical sampling techniques including simple random sampling, systematic sampling, stratified and authoritative sampling are described and applied to 2 waste populations of activated copper cables. The bootstrap is introduced as a tool to estimate average activities and standard errors in waste characterization. The analysis of the DTM Ni-63 is used as an example. Experimental and theoretical values of SFs are calculated and compared. Guidelines for sampling historical waste using probabilistic and non-probabilistic sampling are finally given.
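
    A minimal sketch of the bootstrap step mentioned above: resample the measured specific activities of the sampled cable pieces to obtain the average activity, its standard error and a percentile interval. The activity values are invented for illustration.

```python
# Bootstrap estimate of the mean activity and its standard error from a small
# sample of measured specific activities (hypothetical values).
import numpy as np

rng = np.random.default_rng(42)
activities = np.array([0.8, 1.1, 0.6, 2.3, 1.7, 0.9, 1.2, 3.0, 0.7, 1.5])  # e.g., Ni-63, Bq/g

boot_means = np.array([rng.choice(activities, size=activities.size, replace=True).mean()
                       for _ in range(10_000)])
print(f"mean activity    : {activities.mean():.2f} Bq/g")
print(f"bootstrap SE     : {boot_means.std(ddof=1):.2f} Bq/g")
print(f"95% percentile CI: {np.percentile(boot_means, [2.5, 97.5]).round(2)}")
```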

  13. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    Science.gov (United States)

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g., a few hundred or thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240,000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  14. Network and adaptive sampling

    CERN Document Server

    Chaudhuri, Arijit

    2014-01-01

    Combining the two statistical techniques of network sampling and adaptive sampling, this book illustrates the advantages of using them in tandem to effectively capture sparsely located elements in unknown pockets. It shows how network sampling is a reliable guide in capturing inaccessible entities through linked auxiliaries. The text also explores how adaptive sampling is strengthened in information content through subsidiary sampling with devices to mitigate unmanageable expanding sample sizes. Empirical data illustrates the applicability of both methods.

  15. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    Energy Technology Data Exchange (ETDEWEB)

    Martinez, Gregory D. [University of California, Physics and Astronomy Department, Los Angeles, CA (United States); McKay, James; Scott, Pat [Imperial College London, Department of Physics, Blackett Laboratory, London (United Kingdom); Farmer, Ben; Conrad, Jan [AlbaNova University Centre, Oskar Klein Centre for Cosmoparticle Physics, Stockholm (Sweden); Stockholm University, Department of Physics, Stockholm (Sweden); Roebber, Elinore [McGill University, Department of Physics, Montreal, QC (Canada); Putze, Antje [LAPTh, Universite de Savoie, CNRS, Annecy-le-Vieux (France); Collaboration: The GAMBIT Scanner Workgroup

    2017-11-15

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics. (orig.)

  16. Volatile and non-volatile elements in grain-size separated samples of Apollo 17 lunar soils

    International Nuclear Information System (INIS)

    Giovanoli, R.; Gunten, H.R. von; Kraehenbuehl, U.; Meyer, G.; Wegmueller, F.; Gruetter, A.; Wyttenbach, A.

    1977-01-01

    Three samples of Apollo 17 lunar soils (75081, 72501 and 72461) were separated into 9 grain-size fractions between 540 and 1 μm mean diameter. In order to detect mineral fractionations caused during the separation procedures, major elements were determined by instrumental neutron activation analyses performed on small aliquots of the separated samples. Twenty elements were measured in each size fraction using instrumental and radiochemical neutron activation techniques. The concentration of the main elements in sample 75081 does not change with the grain-size. Exceptions are Fe and Ti, which decrease slightly, and Al, which increases slightly with decreasing grain-size. These changes in main-element composition suggest a decrease in ilmenite and an increase in anorthite with decreasing grain-size. However, it can be concluded that the mineral composition of the fractions changes by less than a factor of 2. Samples 72501 and 72461 have not yet been analyzed for the main elements. (Auth.)

  17. A modified approach to estimating sample size for simple logistic regression with one continuous covariate.

    Science.gov (United States)

    Novikov, I; Fund, N; Freedman, L S

    2010-01-15

    Different methods for the calculation of sample size for simple logistic regression (LR) with one normally distributed continuous covariate give different results. Sometimes the difference can be large. Furthermore, some methods require the user to specify the prevalence of cases when the covariate equals its population mean, rather than the more natural population prevalence. We focus on two commonly used methods and show through simulations that the power for a given sample size may differ substantially from the nominal value for one method, especially when the covariate effect is large, while the other method performs poorly if the user provides the population prevalence instead of the required parameter. We propose a modification of the method of Hsieh et al. that requires specification of the population prevalence and that employs Schouten's sample size formula for a t-test with unequal variances and group sizes. This approach appears to increase the accuracy of the sample size estimates for LR with one continuous covariate.
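
    For context, the baseline being modified is the widely used Hsieh et al. approximation for simple logistic regression with one standard-normal covariate, sketched below. Note that it is parameterized by the event probability at the covariate mean, which is exactly the less natural input that the proposed modification replaces with the population prevalence.

```python
# Hsieh-style sample size approximation for simple logistic regression with
# one standard-normal covariate; inputs are illustrative.
from math import ceil, log
from scipy.stats import norm

def n_logistic_one_covariate(p1, odds_ratio_per_sd, alpha=0.05, power=0.80):
    """p1 = event probability at the covariate mean; OR is per SD of X."""
    beta_star = log(odds_ratio_per_sd)        # log odds ratio per SD of X
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(z ** 2 / (p1 * (1 - p1) * beta_star ** 2))

# Event probability 0.1 at the covariate mean, OR = 1.5 per SD of the covariate.
print(n_logistic_one_covariate(0.1, 1.5))
```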

  18. Three-year-olds obey the sample size principle of induction: the influence of evidence presentation and sample size disparity on young children's generalizations.

    Science.gov (United States)

    Lawson, Chris A

    2014-07-01

    Three experiments with 81 3-year-olds (M = 3.62 years) examined the conditions that enable young children to use the sample size principle (SSP) of induction - the inductive rule that facilitates generalizations from large rather than small samples of evidence. In Experiment 1, children exhibited the SSP when exemplars were presented sequentially but not when exemplars were presented simultaneously. Results from Experiment 3 suggest that the advantage of sequential presentation is not due to the additional time to process the available input from the two samples but instead may be linked to better memory for specific individuals in the large sample. In addition, findings from Experiments 1 and 2 suggest that adherence to the SSP is mediated by the disparity between presented samples. Overall, these results reveal that the SSP appears early in development and is guided by basic cognitive processes triggered during the acquisition of input. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. RHAPSODY. I. STRUCTURAL PROPERTIES AND FORMATION HISTORY FROM A STATISTICAL SAMPLE OF RE-SIMULATED CLUSTER-SIZE HALOS

    International Nuclear Information System (INIS)

    Wu, Hao-Yi; Hahn, Oliver; Wechsler, Risa H.; Mao, Yao-Yuan; Behroozi, Peter S.

    2013-01-01

    We present the first results from the RHAPSODY cluster re-simulation project: a sample of 96 'zoom-in' simulations of dark matter halos of 10^(14.8±0.05) h^(-1) M☉, selected from a 1 h^(-3) Gpc^3 volume. This simulation suite is the first to resolve this many halos with ∼5 × 10^6 particles per halo in the cluster mass regime, allowing us to statistically characterize the distribution of and correlation between halo properties at fixed mass. We focus on the properties of the main halos and how they are affected by formation history, which we track back to z = 12, over five decades in mass. We give particular attention to the impact of the formation history on the density profiles of the halos. We find that the deviations from the Navarro-Frenk-White (NFW) model and the Einasto model depend on formation time. Late-forming halos tend to have considerable deviations from both models, partly due to the presence of massive subhalos, while early-forming halos deviate less but still significantly from the NFW model and are better described by the Einasto model. We find that the halo shapes depend only moderately on formation time. Departure from spherical symmetry impacts the density profiles through the anisotropic distribution of massive subhalos. Further evidence of the impact of subhalos is provided by analyzing the phase-space structure. A detailed analysis of the properties of the subhalo population in RHAPSODY is presented in a companion paper.

  20. Sample size methods for estimating HIV incidence from cross-sectional surveys.

    Science.gov (United States)

    Konikoff, Jacob; Brookmeyer, Ron

    2015-12-01

    Understanding HIV incidence, the rate at which new infections occur in populations, is critical for tracking and surveillance of the epidemic. In this article, we derive methods for determining sample sizes for cross-sectional surveys to estimate incidence with sufficient precision. We further show how to specify sample sizes for two successive cross-sectional surveys to detect changes in incidence with adequate power. In these surveys biomarkers such as CD4 cell count, viral load, and recently developed serological assays are used to determine which individuals are in an early disease stage of infection. The total number of individuals in this stage, divided by the number of people who are uninfected, is used to approximate the incidence rate. Our methods account for uncertainty in the durations of time spent in the biomarker defined early disease stage. We find that failure to account for this uncertainty when designing surveys can lead to imprecise estimates of incidence and underpowered studies. We evaluated our sample size methods in simulations and found that they performed well in a variety of underlying epidemics. Code for implementing our methods in R is available with this article at the Biometrics website on Wiley Online Library. © 2015, The International Biometric Society.

  1. A Self-Organizing Map-Based Approach to Generating Reduced-Size, Statistically Similar Climate Datasets

    Science.gov (United States)

    Cabell, R.; Delle Monache, L.; Alessandrini, S.; Rodriguez, L.

    2015-12-01

    Climate-based studies require large amounts of data in order to produce accurate and reliable results. Many of these studies have used 30-plus year data sets in order to produce stable and high-quality results, and as a result, many such data sets are available, generally in the form of global reanalyses. While the analysis of these data leads to high-fidelity results, processing them can be very computationally expensive. This computational burden prevents the utilization of these data sets for certain applications, e.g., when rapid response is needed in crisis management and disaster planning scenarios resulting from release of toxic material in the atmosphere. We have developed a methodology to reduce large climate datasets to more manageable sizes while retaining statistically similar results when used to produce ensembles of possible outcomes. We do this by employing a Self-Organizing Map (SOM) algorithm to analyze general patterns of meteorological fields over a regional domain of interest to produce a small set of "typical days" with which to generate the model ensemble. The SOM algorithm takes as input a set of vectors and generates a 2D map of representative vectors deemed most similar to the input set and to each other. Input predictors are selected that are correlated with the model output, which in our case is an Atmospheric Transport and Dispersion (T&D) model that is highly dependent on surface winds and boundary layer depth. To choose a subset of "typical days," each input day is assigned to its closest SOM map node vector and then ranked by distance. Each node vector is treated as a distribution and days are sampled from it by percentile. Using a 30-node SOM, with sampling every 20th percentile, we have been able to reduce 30 years of the Climate Forecast System Reanalysis (CFSR) data for the month of October to 150 "typical days." To estimate the skill of this approach, the "Measure of Effectiveness" (MOE) metric is used to compare area and overlap
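
    A toy version of the "typical days" selection can be written with the MiniSom package (a common third-party SOM implementation, not necessarily the one the authors used): map each day's pattern vector to a node, rank days within each node by distance to the node vector, and keep a handful per node. The feature vectors below are random stand-ins for gridded winds and boundary layer depth.

```python
# Reduce a long record of daily pattern vectors to a few "typical days" per
# SOM node (30 nodes x 5 percentile picks -> about 150 days, as in the text).
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(1)
days = rng.normal(size=(900, 40))        # stand-in: 30 Octobers x 30 days, 40 features

som = MiniSom(6, 5, input_len=days.shape[1], sigma=1.0, learning_rate=0.5,
              random_seed=1)
som.train_random(days, 5000)

# Assign each day to its best-matching node, rank by distance to the node
# vector, and sample every 20th percentile within each node.
nodes = {}
for idx, day in enumerate(days):
    nodes.setdefault(som.winner(day), []).append(idx)

selected = []
for node, members in nodes.items():
    w = som.get_weights()[node]
    members = sorted(members, key=lambda i: np.linalg.norm(days[i] - w))
    for q in (0.0, 0.2, 0.4, 0.6, 0.8):
        selected.append(members[int(q * (len(members) - 1))])

print(f"{len(days)} days reduced to {len(set(selected))} 'typical days'")
```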

  2. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    Science.gov (United States)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
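
    The parallel tempering idea mentioned above is easy to sketch for a one-dimensional bimodal pdf: chains at higher "temperatures" cross the barrier between modes easily and occasionally swap states with colder chains via a Metropolis criterion, so the cold chain ends up visiting both modes.

```python
# Minimal parallel-tempering (replica-exchange) sampler for a bimodal target.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Unnormalized bimodal pdf with well-separated modes at -4 and +4."""
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

temperatures = np.array([1.0, 2.0, 4.0, 8.0])
states = rng.normal(size=temperatures.size)
cold_samples = []

for step in range(20000):
    # Within-chain Metropolis updates at each temperature (target^(1/T)).
    for i, T in enumerate(temperatures):
        proposal = states[i] + rng.normal(scale=1.0)
        if np.log(rng.uniform()) < (log_target(proposal) - log_target(states[i])) / T:
            states[i] = proposal
    # Attempt a swap between a random pair of neighboring temperatures.
    j = rng.integers(temperatures.size - 1)
    delta = (1 / temperatures[j] - 1 / temperatures[j + 1]) * \
            (log_target(states[j + 1]) - log_target(states[j]))
    if np.log(rng.uniform()) < delta:
        states[j], states[j + 1] = states[j + 1], states[j]
    cold_samples.append(states[0])

print("fraction of cold-chain samples in the positive mode:",
      np.mean(np.array(cold_samples) > 0).round(2))
```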

  3. Statistical sampling plans

    International Nuclear Information System (INIS)

    Jaech, J.L.

    1984-01-01

    In auditing and in inspection, one selects a number of items by some set of procedures and performs measurements which are compared with the operator's values. This session considers the problem of how to select the samples to be measured, and what kinds of measurements to make. In the inspection situation, the ultimate aim is to independently verify the operator's material balance. The effectiveness of the sample plan in achieving this objective is briefly considered. The discussion focuses on the model plant

  4. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, W.H.M.; Meijer, R.R.; Sijtsma, K.

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier's person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  5. The Statistics of Radio Astronomical Polarimetry: Disjoint, Superposed, and Composite Samples

    Energy Technology Data Exchange (ETDEWEB)

    Straten, W. van [Centre for Astrophysics and Supercomputing, Swinburne University of Technology, Hawthorn, VIC 3122 (Australia); Tiburzi, C., E-mail: willem.van.straten@aut.ac.nz [Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, D-53121 Bonn (Germany)

    2017-02-01

    A statistical framework is presented for the study of the orthogonally polarized modes of radio pulsar emission via the covariances between the Stokes parameters. To accommodate the typically heavy-tailed distributions of single-pulse radio flux density, the fourth-order joint cumulants of the electric field are used to describe the superposition of modes with arbitrary probability distributions. The framework is used to consider the distinction between superposed and disjoint modes, with particular attention to the effects of integration over finite samples. If the interval over which the polarization state is estimated is longer than the timescale for switching between two or more disjoint modes of emission, then the modes are unresolved by the instrument. The resulting composite sample mean exhibits properties that have been attributed to mode superposition, such as depolarization. Because the distinction between disjoint modes and a composite sample of unresolved disjoint modes depends on the temporal resolution of the observing instrumentation, the arguments in favor of superposed modes of pulsar emission are revisited, and observational evidence for disjoint modes is described. In principle, the four-dimensional covariance matrix that describes the distribution of sample mean Stokes parameters can be used to distinguish between disjoint modes, superposed modes, and a composite sample of unresolved disjoint modes. More comprehensive and conclusive interpretation of the covariance matrix requires more detailed consideration of various relevant phenomena, including temporally correlated subpulse modulation (e.g., jitter), statistical dependence between modes (e.g., covariant intensities and partial coherence), and multipath propagation effects (e.g., scintillation and scattering).

  6. Subcritical crack growth law and its consequences for lifetime statistics and size effect of quasibrittle structures

    International Nuclear Information System (INIS)

    Le, Jia-Liang; Bazant, Zdenek P; Bazant, Martin Z

    2009-01-01

    For brittle failures, the probability distribution of structural strength and lifetime are known to be Weibullian, in which case the knowledge of the mean and standard deviation suffices to determine the loading or time corresponding to a tolerable failure probability such as 10^-6. Unfortunately, this is not so for quasibrittle structures, characterized by material inhomogeneities that are not negligible compared with the structure size (as is typical, e.g. for concrete, fibre composites, tough ceramics, rocks and sea ice). For such structures, the distribution of structural strength was shown to vary from almost Gaussian to Weibullian as a function of structure size (and also shape). Here we predict the size dependence of the distribution type for the lifetime of quasibrittle structures. To derive the lifetime statistics from the strength statistics, the subcritical crack growth law is requisite. This empirical law is shown to be justified by fracture mechanics of random crack jumps in the atomic lattice and the condition of equality of the energy dissipation rates calculated on the nano-scale and the macro-scale. The size effect on the lifetime is found to be much stronger than that on the structural strength. The theory is shown to match the experimentally observed systematic deviations of lifetime histograms from the Weibull distribution.

  7. Sample size calculations for cluster randomised crossover trials in Australian and New Zealand intensive care research.

    Science.gov (United States)

    Arnup, Sarah J; McKenzie, Joanne E; Pilcher, David; Bellomo, Rinaldo; Forbes, Andrew B

    2018-06-01

    The cluster randomised crossover (CRXO) design provides an opportunity to conduct randomised controlled trials to evaluate low risk interventions in the intensive care setting. Our aim is to provide a tutorial on how to perform a sample size calculation for a CRXO trial, focusing on the meaning of the elements required for the calculations, with application to intensive care trials. We use all-cause in-hospital mortality from the Australian and New Zealand Intensive Care Society Adult Patient Database clinical registry to illustrate the sample size calculations. We show sample size calculations for a two-intervention, two 12-month period, cross-sectional CRXO trial. We provide the formulae, and examples of their use, to determine the number of intensive care units required to detect a risk ratio (RR) with a designated level of power between two interventions for trials in which the elements required for sample size calculations remain constant across all ICUs (unstratified design); and in which there are distinct groups (strata) of ICUs that differ importantly in the elements required for sample size calculations (stratified design). The CRXO design markedly reduces the sample size requirement compared with the parallel-group, cluster randomised design for the example cases. The stratified design further reduces the sample size requirement compared with the unstratified design. The CRXO design enables the evaluation of routinely used interventions that can bring about small, but important, improvements in patient care in the intensive care setting.

  8. TRAN-STAT, Issue No. 3, January 1978. Topics discussed: some statistical aspects of compositing field samples

    International Nuclear Information System (INIS)

    Gilbert, R.O.

    1978-01-01

    Some statistical aspects of compositing field samples of soils for determining the content of Pu are discussed. Some of the potential problems involved in pooling samples are reviewed. This is followed by more detailed discussions and examples of compositing designs, adequacy of mixing, statistical models and their role in compositing, and related topics

  9. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, Wilco H.M.; Meijer, R.R.; Sijtsma, Klaas

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier’s person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  10. Evaluation of pump pulsation in respirable size-selective sampling: part II. Changes in sampling efficiency.

    Science.gov (United States)

    Lee, Eun Gyung; Lee, Taekhee; Kim, Seung Won; Lee, Larry; Flemmer, Michael M; Harper, Martin

    2014-01-01

    This second, and concluding, part of this study evaluated changes in sampling efficiency of respirable size-selective samplers due to air pulsations generated by the selected personal sampling pumps characterized in Part I (Lee E, Lee L, Möhlmann C et al. Evaluation of pump pulsation in respirable size-selective sampling: Part I. Pulsation measurements. Ann Occup Hyg 2013). Nine particle sizes of monodisperse ammonium fluorescein (from 1 to 9 μm mass median aerodynamic diameter) were generated individually by a vibrating orifice aerosol generator from dilute solutions of fluorescein in aqueous ammonia and then injected into an environmental chamber. To collect these particles, 10-mm nylon cyclones, also known as Dorr-Oliver (DO) cyclones, were used with five medium volumetric flow rate pumps. Those were the Apex IS, HFS513, GilAir5, Elite5, and Basic5 pumps, which were found in Part I to generate pulsations of 5% (the lowest), 25%, 30%, 56%, and 70% (the highest), respectively. GK2.69 cyclones were used with the Legacy [pump pulsation (PP) = 15%] and Elite12 (PP = 41%) pumps for collection at high flows. The DO cyclone was also used to evaluate changes in sampling efficiency due to pulse shape. The HFS513 pump, which generates a more complex pulse shape, was compared to a single sine wave fluctuation generated by a piston. The luminescent intensity of the fluorescein extracted from each sample was measured with a luminescence spectrometer. Sampling efficiencies were obtained by dividing the intensity of the fluorescein extracted from the filter placed in a cyclone with the intensity obtained from the filter used with a sharp-edged reference sampler. Then, sampling efficiency curves were generated using a sigmoid function with three parameters and each sampling efficiency curve was compared to that of the reference cyclone by constructing bias maps. In general, no change in sampling efficiency (bias under ±10%) was observed until pulsations exceeded 25% for the
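
    The sampling-efficiency curves described above were fitted with a three-parameter sigmoid. A minimal sketch of that kind of fit, using made-up efficiency values rather than the study's measurements, could look like this:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(d, a, b, d50):
    """Three-parameter sigmoid: collection efficiency versus aerodynamic diameter d (micrometres)."""
    return a / (1.0 + np.exp(b * (d - d50)))

# Hypothetical efficiencies for the nine monodisperse sizes (1-9 micrometres)
d = np.arange(1, 10, dtype=float)
eff = np.array([0.97, 0.95, 0.90, 0.78, 0.55, 0.33, 0.17, 0.08, 0.04])

popt, _ = curve_fit(sigmoid, d, eff, p0=[1.0, 1.0, 4.0])
a, b, d50 = popt
print(f"plateau = {a:.2f}, steepness = {b:.2f}, 50% cut-point = {d50:.2f} um")

# Bias-map idea: compare the fitted curve against an assumed reference-cyclone curve at each size
ref = sigmoid(d, 1.0, 1.2, 4.0)
bias = (sigmoid(d, *popt) - ref) / ref
print(np.round(bias, 3))
```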

  11. Sample-size effects in fast-neutron gamma-ray production measurements: solid-cylinder samples

    International Nuclear Information System (INIS)

    Smith, D.L.

    1975-09-01

    The effects of geometry, absorption and multiple scattering in (n,Xγ) reaction measurements with solid-cylinder samples are investigated. Both analytical and Monte-Carlo methods are employed in the analysis. Geometric effects are shown to be relatively insignificant except in definition of the scattering angles. However, absorption and multiple-scattering effects are quite important; accurate microscopic differential cross sections can be extracted from experimental data only after a careful determination of corrections for these processes. The results of measurements performed using several natural iron samples (covering a wide range of sizes) confirm validity of the correction procedures described herein. It is concluded that these procedures are reliable whenever sufficiently accurate neutron and photon cross section and angular distribution information is available for the analysis. (13 figures, 5 tables) (auth)

  12. Particle size of primary kaolin clay in pegmatites of the Junco do Seridó-PB and Equador-RN regions

    International Nuclear Information System (INIS)

    Meyer, M.F.; Sousa, J.B.M.; Sales, L.R.; Silva, P.A.S.; Lima, A.D.D.

    2016-01-01

    Kaolin is a clay composed mainly of kaolinite, resulting from the weathering or hydrothermal alteration of feldspar. This study investigates the mode of occurrence and the particle size of kaolin from pegmatites of the Borborema Pegmatite Province in the Junco do Seridó-PB and Equador-RN regions. These variables were analyzed over granulometric intervals obtained by wet sieving of samples from pegmatite mines in the region. The kaolin was sieved through 200, 325, 400 and 500 mesh sieves, and histograms and statistical parameters were generated from the retained fractions. The kaolin particles are extremely fine and pass in their entirety through the 500 mesh sieve. Characterization of the minerals in the fine fractions by X-ray diffraction showed a relative amount of sericite in the fractions retained on the 400 and 500 mesh sieves, impairing the whiteness and mineralogical texture of the kaolin produced. (author)

  13. Page sample size in web accessibility testing: how many pages is enough?

    NARCIS (Netherlands)

    Velleman, Eric Martin; van der Geest, Thea

    2013-01-01

    Various countries and organizations use a different sampling approach and sample size of web pages in accessibility conformance tests. We are conducting a systematic analysis to determine how many pages is enough for testing whether a website is compliant with standard accessibility guidelines. This

  14. Sensitivity of Mantel Haenszel Model and Rasch Model as Viewed From Sample Size

    OpenAIRE

    ALWI, IDRUS

    2011-01-01

    The aim of this research is to compare the sensitivity of the Mantel-Haenszel and Rasch models for detecting differential item functioning (DIF), viewed in relation to sample size. The two DIF methods were compared using simulated binary item response data sets of varying sample sizes; 200 and 400 examinees were used in the analyses, with DIF detection based on gender difference. These test conditions were replicated 4 times...

  15. Effects of (α,n) contaminants and sample multiplication on statistical neutron correlation measurements

    International Nuclear Information System (INIS)

    Dowdy, E.J.; Hansen, G.E.; Robba, A.A.; Pratt, J.C.

    1980-01-01

    The complete formalism for the use of statistical neutron fluctuation measurements for the nondestructive assay of fissionable materials has been developed. This formalism includes the effect of detector deadtime, neutron multiplicity, random neutron pulse contributions from (α,n) contaminants in the sample, and the sample multiplication of both fission-related and background neutrons.

  16. STATISTICAL LANDMARKS AND PRACTICAL ISSUES REGARDING THE USE OF SIMPLE RANDOM SAMPLING IN MARKET RESEARCHES

    Directory of Open Access Journals (Sweden)

    CODRUŢA DURA

    2010-01-01

    Full Text Available The sample represents a particular segment of the statistical population chosen to represent it as a whole. The representativeness of the sample determines the accuracy for estimations made on the basis of calculating the research indicators and the inferential statistics. The method of random sampling is part of probabilistic methods which can be used within marketing research and it is characterized by the fact that it imposes the requirement that each unit belonging to the statistical population should have an equal chance of being selected for the sampling process. When the simple random sampling is meant to be rigorously put into practice, it is recommended to use the technique of random number tables in order to configure the sample which will provide information that the marketer needs. The paper also details the practical procedure implemented in order to create a sample for a marketing research by generating random numbers using the facilities offered by Microsoft Excel.

  17. Statistical Decision Theory Estimation, Testing, and Selection

    CERN Document Server

    Liese, Friedrich

    2008-01-01

    Suitable for advanced graduate students and researchers in mathematical statistics and decision theory, this title presents an account of the concepts and a treatment of the major results of classical finite sample size decision theory and modern asymptotic decision theory.

  18. Statistical Inference for Data Adaptive Target Parameters.

    Science.gov (United States)

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
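
    A minimal numerical sketch of the sample-split idea (with a toy data-adaptive step, not the estimators or the central limit theory developed in the paper): each parameter-generating sample picks the target, the complementary estimation sample estimates it, and the V estimates are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, V = 500, 10, 5
X = rng.normal(size=(n, p))
y = 0.4 * X[:, 3] + rng.normal(size=n)        # only column 3 is truly associated with y

folds = np.array_split(rng.permutation(n), V)
estimates = []
for v in range(V):
    est_idx = folds[v]                                                  # estimation sample
    gen_idx = np.concatenate([folds[u] for u in range(V) if u != v])    # parameter-generating sample

    # Data-adaptive step: choose the feature most correlated with y on the generating sample
    cors = [abs(np.corrcoef(X[gen_idx, j], y[gen_idx])[0, 1]) for j in range(p)]
    j_star = int(np.argmax(cors))

    # Split-specific target parameter: simple regression slope of y on the chosen feature,
    # estimated on the held-out estimation sample
    xj = X[est_idx, j_star]
    estimates.append(np.cov(xj, y[est_idx])[0, 1] / np.var(xj, ddof=1))

print("split-specific estimates:", np.round(estimates, 3))
print(f"sample-split data adaptive target parameter estimate: {np.mean(estimates):.3f}")
```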

  19. Research Note Pilot survey to assess sample size for herbaceous ...

    African Journals Online (AJOL)

    A pilot survey to determine sub-sample size (number of point observations per plot) for herbaceous species composition assessments, using a wheel-point apparatus applying the nearest-plant method, was conducted. Three plots differing in species composition on the Zululand coastal plain were selected, and on each plot ...

  20. Acceptance sampling using judgmental and randomly selected samples

    Energy Technology Data Exchange (ETDEWEB)

    Sego, Landon H.; Shulman, Stanley A.; Anderson, Kevin K.; Wilson, John E.; Pulsipher, Brent A.; Sieber, W. Karl

    2010-09-01

    We present a Bayesian model for acceptance sampling where the population consists of two groups, each with different levels of risk of containing unacceptable items. Expert opinion, or judgment, may be required to distinguish between the high- and low-risk groups. Hence, high-risk items are likely to be identified (and sampled) using expert judgment, while the remaining low-risk items are sampled randomly. We focus on the situation where all observed samples must be acceptable. Consequently, the objective of the statistical inference is to quantify the probability that a large percentage of the unsampled items in the population are also acceptable. We demonstrate that traditional (frequentist) acceptance sampling and simpler Bayesian formulations of the problem are essentially special cases of the proposed model. We explore the properties of the model in detail, and discuss the conditions necessary to ensure that required sample sizes are a non-decreasing function of the population size. The method is applicable to a variety of acceptance sampling problems, and, in particular, to environmental sampling where the objective is to demonstrate the safety of reoccupying a remediated facility that has been contaminated with a lethal agent.
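
    Not the two-group judgmental model of the paper, but the simpler special case it generalizes can be sketched with a Beta-Binomial posterior: if all n randomly sampled items are acceptable, what is the probability that a large fraction of the unsampled items is acceptable too? All numbers below are invented.

```python
from scipy.stats import beta

a0, b0 = 1.0, 1.0   # uniform Beta(1, 1) prior on the proportion of acceptable items (assumed)
n = 30              # items sampled, all observed to be acceptable
threshold = 0.95    # the "large percentage" of items we want to be acceptable

a_post, b_post = a0 + n, b0          # posterior after n acceptable and 0 unacceptable items
prob = beta.sf(threshold, a_post, b_post)
print(f"P(acceptable fraction > {threshold:.0%} | all {n} samples acceptable) = {prob:.3f}")
```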

  1. Assessing Statistical Competencies in Clinical and Translational Science Education: One Size Does Not Fit All

    Science.gov (United States)

    Lindsell, Christopher J.; Welty, Leah J.; Mazumdar, Madhu; Thurston, Sally W.; Rahbar, Mohammad H.; Carter, Rickey E.; Pollock, Bradley H.; Cucchiara, Andrew J.; Kopras, Elizabeth J.; Jovanovic, Borko D.; Enders, Felicity T.

    2014-01-01

    Introduction: Statistics is an essential training component for a career in clinical and translational science (CTS). Given the increasing complexity of statistics, learners may have difficulty selecting appropriate courses. Our question was: what depth of statistical knowledge do different CTS learners require? Methods: For three types of CTS learners (principal investigator, co-investigator, informed reader of the literature), each with different backgrounds in research (no previous research experience, reader of the research literature, previous research experience), 18 experts in biostatistics, epidemiology, and research design proposed levels for 21 statistical competencies. Results: Statistical competencies were categorized as fundamental, intermediate, or specialized. CTS learners who intend to become independent principal investigators require more specialized training, while those intending to become informed consumers of the medical literature require more fundamental education. For most competencies, less training was proposed for those with more research background. Discussion: When selecting statistical coursework, the learner's research background and career goal should guide the decision. Some statistical competencies are considered to be more important than others. Baseline knowledge assessments may help learners identify appropriate coursework. Conclusion: Rather than one size fits all, tailoring education to baseline knowledge, learner background, and future goals increases learning potential while minimizing classroom time. PMID:25212569

  2. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    Science.gov (United States)

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.

  3. Special nuclear material inventory sampling plans

    International Nuclear Information System (INIS)

    Vaccaro, H.; Goldman, A.

    1987-01-01

    Since their introduction in 1942, sampling inspection procedures have been common quality assurance practice. The U.S. Department of Energy (DOE) supports such sampling of special nuclear materials inventories. DOE Order 5630.7 states, "Operations Offices may develop and use statistically valid sampling plans appropriate for their site-specific needs." The benefits for nuclear facilities operations include reduced worker exposure and reduced work load. Improved procedures have been developed for obtaining statistically valid sampling plans that maximize these benefits. The double sampling concept is described and the resulting sample sizes for double sample plans are compared with other plans. An algorithm is given for finding optimal double sampling plans that assist in choosing the appropriate detection and false alarm probabilities for various sampling plans.
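
    The operating characteristics of a double sampling plan can be sketched directly from the binomial distribution; the plan parameters below are invented and are not the optimal plans produced by the algorithm mentioned in the record.

```python
from scipy.stats import binom

def accept_probability(p, n1, c1, r1, n2, c2):
    """Probability of accepting a lot with fraction defective p under a double sampling plan:
    accept if d1 <= c1, reject if d1 >= r1, otherwise draw n2 more and accept if d1 + d2 <= c2."""
    prob = binom.cdf(c1, n1, p)                      # accepted on the first sample
    for d1 in range(c1 + 1, r1):                     # undecided after the first sample
        prob += binom.pmf(d1, n1, p) * binom.cdf(c2 - d1, n2, p)
    return prob

# Hypothetical plan: n1 = 50 (accept on 0, reject on 3+), n2 = 50 (accept on a combined total <= 1)
for p in (0.005, 0.01, 0.02, 0.05):
    print(f"fraction defective {p:.3f}: P(accept) = {accept_probability(p, 50, 0, 3, 50, 1):.3f}")
```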

  4. Tidal controls on earthquake size-frequency statistics

    Science.gov (United States)

    Ide, S.; Yabe, S.; Tanaka, Y.

    2016-12-01

    The possibility that tidal stresses can trigger earthquakes is a long-standing issue in seismology. Except in some special cases, a causal relationship between seismicity and the phase of tidal stress has been rejected on the basis of studies using many small events. However, recently discovered deep tectonic tremors are highly sensitive to tidal stress levels, with the relationship being governed by a nonlinear law according to which the tremor rate increases exponentially with increasing stress; thus, slow deformation (and the probability of earthquakes) may be enhanced during periods of large tidal stress. Here, we show the influence of tidal stress on seismicity by calculating histories of tidal shear stress during the 2-week period before earthquakes. Very large earthquakes tend to occur near the time of maximum tidal stress, but this tendency is not obvious for small earthquakes. Rather, we found that tidal stress controls the earthquake size-frequency statistics; i.e., the fraction of large events increases (i.e. the b-value of the Gutenberg-Richter relation decreases) as the tidal shear stress increases. This correlation is apparent in data from the global catalog and in relatively homogeneous regional catalogues of earthquakes in Japan. The relationship is also reasonable, considering the well-known relationship between stress and the b-value. Our findings indicate that the probability of a tiny rock failure expanding to a gigantic rupture increases with increasing tidal stress levels. This finding has clear implications for probabilistic earthquake forecasting.
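
    The b-value referred to above is routinely estimated with the Aki/Utsu maximum-likelihood formula; a minimal sketch (on a synthetic catalogue, not the global or Japanese catalogues used in the study) is shown below. Splitting a real catalogue by tidal shear stress level and comparing the resulting b-values would follow the same call.

```python
import numpy as np

def b_value(magnitudes, mc, dm=0.0):
    """Aki/Utsu maximum-likelihood b-value for events with magnitude >= mc.
    dm is the catalogue binning width (use e.g. 0.1 for binned magnitudes; 0 for continuous)."""
    m = np.asarray(magnitudes, dtype=float)
    m = m[m >= mc]
    return np.log10(np.e) / (m.mean() - (mc - dm / 2.0))

# Synthetic catalogue consistent with b = 1.0 above a completeness magnitude of 2.0
rng = np.random.default_rng(1)
mags = 2.0 + rng.exponential(scale=np.log10(np.e) / 1.0, size=5000)
print(f"estimated b-value: {b_value(mags, mc=2.0):.2f}")
```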

  5. Estimating sample size for a small-quadrat method of botanical ...

    African Journals Online (AJOL)

    Reports the results of a study conducted to determine an appropriate sample size for a small-quadrat method of botanical survey for application in the Mixed Bushveld of South Africa. Species density and grass density were measured using a small-quadrat method in eight plant communities in the Nylsvley Nature Reserve.

  6. Norm Block Sample Sizes: A Review of 17 Individually Administered Intelligence Tests

    Science.gov (United States)

    Norfolk, Philip A.; Farmer, Ryan L.; Floyd, Randy G.; Woods, Isaac L.; Hawkins, Haley K.; Irby, Sarah M.

    2015-01-01

    The representativeness, recency, and size of norm samples strongly influence the accuracy of inferences drawn from their scores. Inadequate norm samples may lead to inflated or deflated scores for individuals and poorer prediction of developmental and academic outcomes. The purpose of this study was to apply Kranzler and Floyd's method for…

  7. Statistical distribution of time to crack initiation and initial crack size using service data

    Science.gov (United States)

    Heller, R. A.; Yang, J. N.

    1977-01-01

    Crack growth inspection data gathered during the service life of the C-130 Hercules airplane were used in conjunction with a crack propagation rule to estimate the distribution of crack initiation times and of initial crack sizes. A Bayesian statistical approach was used to calculate the fraction of undetected initiation times as a function of the inspection time and the reliability of the inspection procedure used.

  8. Statistical Sampling For In-Service Inspection Of Liquid Waste Tanks At The Savannah River Site

    International Nuclear Information System (INIS)

    Harris, S.

    2011-01-01

    Savannah River Remediation, LLC (SRR) is implementing a statistical sampling strategy for In-Service Inspection (ISI) of Liquid Waste (LW) Tanks at the United States Department of Energy's Savannah River Site (SRS) in Aiken, South Carolina. As a component of SRS's corrosion control program, the ISI program assesses tank wall structural integrity through the use of ultrasonic testing (UT). The statistical strategy for ISI is based on the random sampling of a number of vertically oriented unit areas, called strips, within each tank. The number of strips to inspect was determined so as to attain, over time, a high probability of observing at least one of the worst 5% in terms of pitting and corrosion across all tanks. The probability estimation to determine the number of strips to inspect was performed using the hypergeometric distribution. Statistical tolerance limits for pit depth and corrosion rates were calculated by fitting the lognormal distribution to the data. In addition to the strip sampling strategy, a single strip within each tank was identified to serve as the baseline for a longitudinal assessment of the tank safe operational life. The statistical sampling strategy enables the ISI program to develop individual profiles of LW tank wall structural integrity that collectively provide a high confidence in their safety and integrity over operational lifetimes.
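
    The strip-count logic described above can be sketched with the hypergeometric distribution: find the smallest number of randomly drawn strips such that at least one of the worst 5% is likely to be observed. The population size and target probability below are invented, not the SRS values.

```python
from scipy.stats import hypergeom

N = 2000            # total vertical strips across all tanks (hypothetical)
K = int(0.05 * N)   # the "worst 5%" of strips in terms of pitting and corrosion
target = 0.95       # desired probability of observing at least one of them

n = 1
while hypergeom.sf(0, N, K, n) < target:   # P(X >= 1) when n strips are drawn at random
    n += 1
print(f"inspect at least {n} strips to catch one of the worst 5% with {target:.0%} probability")
```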

  9. Precision of quantization of the hall conductivity in a finite-size sample: Power law

    International Nuclear Information System (INIS)

    Greshnov, A. A.; Kolesnikova, E. N.; Zegrya, G. G.

    2006-01-01

    A microscopic calculation of the conductivity in the integer quantum Hall effect (IQHE) mode is carried out. The precision of quantization is analyzed for finite-size samples. The precision of quantization shows a power-law dependence on the sample size. A new scaling parameter describing this dependence is introduced. It is also demonstrated that the precision of quantization linearly depends on the ratio between the amplitude of the disorder potential and the cyclotron energy. The data obtained are compared with the results of magnetotransport measurements in mesoscopic samples

  10. Sample size for monitoring sirex populations and their natural enemies

    Directory of Open Access Journals (Sweden)

    Susete do Rocio Chiarello Penteado

    2016-09-01

    Full Text Available The woodwasp Sirex noctilio Fabricius (Hymenoptera: Siricidae) was introduced in Brazil in 1988 and became the main pest in pine plantations. It has spread to about 1,000,000 ha, at different population levels, in the states of Rio Grande do Sul, Santa Catarina, Paraná, São Paulo and Minas Gerais. Control is done mainly by using a nematode, Deladenus siricidicola Bedding (Nematoda: Neothylenchidae). The evaluation of the efficiency of natural enemies has been difficult because there are no appropriate sampling systems. This study tested a hierarchical sampling system to define the sample size to monitor the S. noctilio population and the efficiency of their natural enemies, which was found to be perfectly adequate.

  11. A Total Quality-Control Plan with Right-Sized Statistical Quality-Control.

    Science.gov (United States)

    Westgard, James O

    2017-03-01

    A new Clinical Laboratory Improvement Amendments option for risk-based quality-control (QC) plans became effective in January, 2016. Called an Individualized QC Plan, this option requires the laboratory to perform a risk assessment, develop a QC plan, and implement a QC program to monitor ongoing performance of the QC plan. Difficulties in performing a risk assessment may limit validity of an Individualized QC Plan. A better alternative is to develop a Total QC Plan including a right-sized statistical QC procedure to detect medically important errors. Westgard Sigma Rules provides a simple way to select the right control rules and the right number of control measurements. Copyright © 2016 Elsevier Inc. All rights reserved.
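
    Right-sizing statistical QC in this framework starts from the assay's sigma metric; a minimal calculation with invented performance figures is shown below. The mapping from sigma to control rules is only paraphrased here and should be taken from the Westgard Sigma Rules themselves.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma metric = (allowable total error - |bias|) / CV, all expressed in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct

# Hypothetical assay: allowable total error 10%, bias 1.5%, imprecision (CV) 2%
sigma = sigma_metric(10.0, 1.5, 2.0)
print(f"sigma = {sigma:.1f}")
# Roughly: high-sigma assays get by with a single simple rule and few control measurements,
# while low-sigma assays need multi-rule QC and more controls per run.
```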

  12. Collection of size fractionated particulate matter sample for neutron activation analysis in Japan

    International Nuclear Information System (INIS)

    Otoshi, Tsunehiko; Nakamatsu, Hiroaki; Oura, Yasuji; Ebihara, Mitsuru

    2004-01-01

    According to the decision of the 2001 Workshop on Utilization of Research Reactor (Neutron Activation Analysis (NAA) Section), size fractionated particulate matter collection for NAA was started from 2002 at two sites in Japan. The two monitoring sites, "Tokyo" and "Sakata", were classified into "urban" and "rural". In each site, two size fractions, namely PM2-10 and PM2 particles (aerodynamic particle size between 2 to 10 micrometres and less than 2 micrometres, respectively), were collected every month on polycarbonate membrane filters. Average concentrations of PM10 (sum of PM2-10 and PM2 samples) during the common sampling period of August to November 2002 in each site were 0.031 mg/m³ in Tokyo, and 0.022 mg/m³ in Sakata. (author)

  13. Statistical Analysis of Video Frame Size Distribution Originating from Scalable Video Codec (SVC

    Directory of Open Access Journals (Sweden)

    Sima Ahmadpour

    2017-01-01

    Full Text Available Designing an effective, high-performance network requires accurate characterization and modeling of network traffic. Models of video frame sizes are normally applied in simulation studies, in mathematical analysis, and in generating streams for testing and compliance purposes. Moreover, video traffic is expected to be a major source of multimedia traffic in future heterogeneous networks, so the statistical distribution of video data can be used as an input for network performance modeling. This paper identifies the theoretical distribution that best matches the video trace in terms of its statistical properties, selecting the best-fitting distribution using both graphical methods and hypothesis tests. The data set used in this article consists of layered video traces generated with the Scalable Video Codec (SVC) video compression technique from three different movies.
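
    Finding the best-fitting distribution for a frame-size trace, as described above, typically combines graphical checks with a goodness-of-fit test; a minimal sketch with a synthetic stand-in trace (not the SVC traces used in the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
frame_sizes = rng.gamma(shape=3.0, scale=1500.0, size=5000)   # placeholder for a real trace (bytes)

candidates = {"gamma": stats.gamma, "lognorm": stats.lognorm, "weibull_min": stats.weibull_min}
for name, dist in candidates.items():
    params = dist.fit(frame_sizes)                     # maximum-likelihood fit
    ks_stat, p_value = stats.kstest(frame_sizes, name, args=params)
    # Note: p-values are optimistic when parameters are estimated from the same data
    print(f"{name:12s} KS statistic = {ks_stat:.4f}  p = {p_value:.3f}")
# The candidate with the smallest KS statistic (plus Q-Q plots) is the natural choice.
```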

  14. A two-stage Bayesian design with sample size reestimation and subgroup analysis for phase II binary response trials.

    Science.gov (United States)

    Zhong, Wei; Koopmeiners, Joseph S; Carlin, Bradley P

    2013-11-01

    Frequentist sample size determination for binary outcome data in a two-arm clinical trial requires initial guesses of the event probabilities for the two treatments. Misspecification of these event rates may lead to a poor estimate of the necessary sample size. In contrast, the Bayesian approach that considers the treatment effect to be a random variable having some distribution may offer a better, more flexible approach. The Bayesian sample size proposed by Whitehead et al. (2008) for exploratory studies on efficacy justifies the acceptable minimum sample size by a "conclusiveness" condition. In this work, we introduce a new two-stage Bayesian design with sample size reestimation at the interim stage. Our design inherits the properties of good interpretation and easy implementation from Whitehead et al. (2008), generalizes their method to a two-sample setting, and uses a fully Bayesian predictive approach to reduce an overly large initial sample size when necessary. Moreover, our design can be extended to allow patient level covariates via logistic regression, now adjusting sample size within each subgroup based on interim analyses. We illustrate the benefits of our approach with a design in non-Hodgkin lymphoma with a simple binary covariate (patient gender), offering an initial step toward within-trial personalized medicine. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Statistical considerations for grain-size analyses of tills

    Science.gov (United States)

    Jacobs, A.M.

    1971-01-01

    Relative percentages of sand, silt, and clay from samples of the same till unit are not identical because of different lithologies in the source areas, sorting in transport, random variation, and experimental error. Random variation and experimental error can be isolated from the other two as follows. For each particle-size class of each till unit, a standard population is determined by using a normally distributed, representative group of data. New measurements are compared with the standard population and, if they compare satisfactorily, the experimental error is not significant and random variation is within the expected range for the population. The outcome of the comparison depends on numerical criteria derived from a graphical method rather than on a more commonly used one-way analysis of variance with two treatments. If the number of samples and the standard deviation of the standard population are substituted in a t-test equation, a family of hyperbolas is generated, each of which corresponds to a specific number of subsamples taken from each new sample. The axes of the graphs of the hyperbolas are the standard deviation of new measurements (horizontal axis) and the difference between the means of the new measurements and the standard population (vertical axis). The area between the two branches of each hyperbola corresponds to a satisfactory comparison between the new measurements and the standard population. Measurements from a new sample can be tested by plotting their standard deviation vs. difference in means on axes containing a hyperbola corresponding to the specific number of subsamples used. If the point lies between the branches of the hyperbola, the measurements are considered reliable. But if the point lies outside this region, the measurements are repeated. Because the critical segment of the hyperbola is approximately a straight line parallel to the horizontal axis, the test is simplified to a comparison between the means of the standard

  16. Testing University Rankings Statistically: Why this Perhaps is not such a Good Idea after All. Some Reflections on Statistical Power, Effect Size, Random Sampling and Imaginary Populations

    DEFF Research Database (Denmark)

    Schneider, Jesper Wiborg

    2012-01-01

    In this paper we discuss and question the use of statistical significance tests in relation to university rankings as recently suggested. We outline the assumptions behind and interpretations of statistical significance tests and relate this to examples from the recent SCImago Institutions Rankin...

  17. Multi-actinide analysis with AMS for ultra-trace determination and small sample sizes: advantages and drawbacks

    Energy Technology Data Exchange (ETDEWEB)

    Quinto, Francesca; Lagos, Markus; Plaschke, Markus; Schaefer, Thorsten; Geckeis, Horst [Institute for Nuclear Waste Disposal, Karlsruhe Institute of Technology (Germany); Steier, Peter; Golser, Robin [VERA Laboratory, Faculty of Physics, University of Vienna (Austria)

    2016-07-01

    With the abundance sensitivities of AMS for U-236, Np-237 and Pu-239 relative to U-238 at levels lower than 1E-15, a simultaneous determination of several actinides without previous chemical separation from each other is possible. The actinides are extracted from the matrix elements via an iron hydroxide co-precipitation and the nuclides sequentially measured from the same sputter target. This simplified method allows for the use of non-isotopic tracers and consequently the determination of Np-237 and Am-243 for which isotopic tracers with the degree of purity required by ultra-trace mass-spectrometric analysis are not available. With detection limits of circa 1E+4 atoms in a sample, 1E+8 atoms are determined with circa 1 % relative uncertainty due to counting statistics. This allows for an unprecedented reduction of the sample size down to 100 ml of natural water. However, the use of non-isotopic tracers introduces a dominating uncertainty of up to 30 % related to the reproducibility of the results. The advantages and drawbacks of the novel method will be presented with the aid of recent results from the CFM Project at the Grimsel Test Site and from the investigation of global fallout in environmental samples.

  18. Modified FlowCAM procedure for quantifying size distribution of zooplankton with sample recycling capacity.

    Directory of Open Access Journals (Sweden)

    Esther Wong

    Full Text Available We have developed a modified FlowCAM procedure for efficiently quantifying the size distribution of zooplankton. The modified method offers the following new features: (1) it prevents animals from settling and clogging, with constant bubbling in the sample container; (2) it prevents damage to sample animals and facilitates recycling by replacing the built-in peristaltic pump with an external syringe pump which, in order to generate negative pressure, creates a steady flow by drawing air from the receiving conical flask (i.e. a vacuum pump) and transfers plankton from the sample container toward the main flowcell of the imaging system and finally into the receiving flask; (3) it aligns samples in advance of imaging and prevents clogging with an additional flowcell placed ahead of the main flowcell. These modifications were designed to overcome the difficulties of applying the standard FlowCAM procedure to studies where the number of individuals per sample is small, since the FlowCAM can only image a subset of a sample. Our effective recycling procedure allows users to pass the same sample through the FlowCAM many times (i.e. bootstrapping the sample) in order to generate a good size distribution. Although more advanced FlowCAM models are equipped with syringe pump and Field of View (FOV) flowcells which can image all particles passing through the flow field, we note that these advanced setups are very expensive, offer limited syringe and flowcell sizes, and do not guarantee recycling. In contrast, our modifications are inexpensive and flexible. Finally, we compared the biovolumes estimated by automated FlowCAM image analysis versus conventional manual measurements, and found that the size of an individual zooplankter can be estimated by the FlowCAM image system after ground truthing.

  19. A cost-saving statistically based screening technique for focused sampling of a lead-contaminated site

    International Nuclear Information System (INIS)

    Moscati, A.F. Jr.; Hediger, E.M.; Rupp, M.J.

    1986-01-01

    High concentrations of lead in soils along an abandoned railroad line prompted a remedial investigation to characterize the extent of contamination across a 7-acre site. Contamination was thought to be spotty across the site, reflecting its past use in battery recycling operations at discrete locations. A screening technique was employed to delineate the more highly contaminated areas by testing a statistically determined minimum number of random samples from each of seven discrete site areas. The approach not only quickly identified those site areas which would require more extensive grid sampling, but also provided a statistically defensible basis for excluding other site areas from further consideration, thus saving the cost of additional sample collection and analysis. The reduction in the number of samples collected in ''clean'' areas of the site ranged from 45 to 60%.

  20. Estimation of sample size and testing power (part 6).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-03-01

    The design of one factor with k levels (k ≥ 3) refers to the research that only involves one experimental factor with k levels (k ≥ 3), and there is no arrangement for other important non-experimental factors. This paper introduces the estimation of sample size and testing power for quantitative data and qualitative data having a binary response variable with the design of one factor with k levels (k ≥ 3).
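
    For a one-factor design with k ≥ 3 levels and a quantitative outcome, the sample size question is the usual one-way ANOVA power calculation. A minimal sketch using statsmodels (planning values invented; the paper's own formulae for the binary-response case are not reproduced here):

```python
from statsmodels.stats.power import FTestAnovaPower

effect_size = 0.25   # Cohen's f (hypothetical planning value)
n_total = FTestAnovaPower().solve_power(effect_size=effect_size, nobs=None,
                                        alpha=0.05, power=0.80, k_groups=3)
print(f"total N = {n_total:.0f} (about {n_total / 3:.0f} per group)")
```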

  1. Ecotoxicology statistical sampling

    International Nuclear Information System (INIS)

    Saona, G.

    2012-01-01

    This presentation introduces general concepts of sampling design in ecotoxicology, such as the distribution of organic or inorganic contaminants, microbiological contamination, and the determination of sampling positions in an ecosystem for ecotoxicological bioassays.

  2. Particle Sampling and Real Time Size Distribution Measurement in H2/O2/TEOS Diffusion Flame

    International Nuclear Information System (INIS)

    Ahn, K.H.; Jung, C.H.; Choi, M.; Lee, J.S.

    2001-01-01

    Growth characteristics of silica particles have been studied experimentally using an in situ particle sampling technique in an H2/O2/tetraethylorthosilicate (TEOS) diffusion flame with a carefully devised sampling probe. Particle morphology and size are compared between particles sampled by the local thermophoretic method from inside the flame and by the electrostatic collector method after the dilution sampling probe. The processed Transmission Electron Microscope (TEM) image data from these two sampling techniques are compared with Scanning Mobility Particle Sizer (SMPS) measurements; TEM image analysis of the two sampling methods showed good agreement with the SMPS measurements. The effects of flame conditions and TEOS flow rates on silica particle size distributions are also investigated using the new particle dilution sampling probe. The particle size distribution characteristics and morphology are found to be governed mostly by the coagulation and sintering processes in the flame. As the flame temperature increases, coalescence or sintering becomes an important particle growth mechanism that reduces the coagulation process; however, if the flame temperature is not high enough to sinter the aggregated particles, coagulation is the dominant growth mechanism. In certain flame conditions a secondary particle formation is observed, resulting in a bimodal particle size distribution.

  3. The Sample Size Influence in the Accuracy of the Image Classification of the Remote Sensing

    Directory of Open Access Journals (Sweden)

    Thomaz C. e C. da Costa

    2004-12-01

    Full Text Available Land-use/land-cover maps produced by classification of remote sensing images incorporate uncertainty. This uncertainty is measured by accuracy indices using reference samples. The size of the reference sample is commonly defined by a binomial approximation without the use of a pilot sample; in this way the accuracy is not estimated but fixed a priori. In case of divergence between the estimated and the a priori accuracy, the sampling error will deviate from the expected error. Defining the sample size from a pilot sample (the theoretically correct procedure) is justified when no accuracy estimate is available for the work area, considering the intended use of the remote sensing product.
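
    The binomial approximation mentioned above gives the familiar closed form for the number of reference samples; a minimal sketch with an assumed a priori accuracy and tolerable error:

```python
import math

def reference_sample_size(p, d, z=1.96):
    """Binomial approximation: reference samples needed to assess a map accuracy of p
    within +/- d at ~95% confidence, with no pilot sample."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# A priori accuracy fixed at 85%, tolerable error 5% (hypothetical values)
print(reference_sample_size(p=0.85, d=0.05))   # -> 196 reference samples
```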

  4. Double asymptotics for the chi-square statistic.

    Science.gov (United States)

    Rempała, Grzegorz A; Wesołowski, Jacek

    2016-12-01

    Consider the distributional limit of the Pearson chi-square statistic when the number of classes m_n increases with the sample size n and [Formula: see text]. Under mild moment conditions, the limit is Gaussian for λ = ∞, Poisson for finite λ > 0, and degenerate for λ = 0.

  5. Survey of statistical and sampling needs for environmental monitoring of commercial low-level radioactive waste disposal facilities

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Thomas, J.M.

    1986-07-01

    This project was designed to develop guidance for implementing 10 CFR Part 61 and to determine the overall needs for sampling and statistical work in characterizing, surveying, monitoring, and closing commercial low-level waste sites. When cost-effectiveness and statistical reliability are of prime importance, then double sampling, compositing, and stratification (with optimal allocation) are identified as key issues. If the principal concern is avoiding questionable statistical practice, then the applicability of kriging (for assessing spatial pattern), methods for routine monitoring, and use of standard textbook formulae in reporting monitoring results should be reevaluated. Other important issues identified include sampling for estimating model parameters and the use of data from left-censored (less than detectable limits) distributions.

  6. Unbiased tensor-based morphometry: improved robustness and sample size estimates for Alzheimer's disease clinical trials.

    Science.gov (United States)

    Hua, Xue; Hibar, Derrek P; Ching, Christopher R K; Boyle, Christina P; Rajagopalan, Priya; Gutman, Boris A; Leow, Alex D; Toga, Arthur W; Jack, Clifford R; Harvey, Danielle; Weiner, Michael W; Thompson, Paul M

    2013-02-01

    Various neuroimaging measures are being evaluated for tracking Alzheimer's disease (AD) progression in therapeutic trials, including measures of structural brain change based on repeated scanning of patients with magnetic resonance imaging (MRI). Methods to compute brain change must be robust to scan quality. Biases may arise if any scans are thrown out, as this can lead to the true changes being overestimated or underestimated. Here we analyzed the full MRI dataset from the first phase of Alzheimer's Disease Neuroimaging Initiative (ADNI-1) and assessed several sources of bias that can arise when tracking brain changes with structural brain imaging methods, as part of a pipeline for tensor-based morphometry (TBM). In all healthy subjects who completed MRI scanning at screening, 6, 12, and 24 months, brain atrophy was essentially linear with no detectable bias in longitudinal measures. In power analyses for clinical trials based on these change measures, only 39 AD patients and 95 mild cognitive impairment (MCI) subjects were needed for a 24-month trial to detect a 25% reduction in the average rate of change using a two-sided test (α = 0.05, power = 80%). Further sample size reductions were achieved by stratifying the data into Apolipoprotein E (ApoE) ε4 carriers versus non-carriers. We show how selective data exclusion affects sample size estimates, motivating an objective comparison of different analysis techniques based on statistical power and robustness. TBM is an unbiased, robust, high-throughput imaging surrogate marker for large, multi-site neuroimaging studies and clinical trials of AD and MCI. Copyright © 2012 Elsevier Inc. All rights reserved.
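
    The power analysis quoted above is the standard two-arm calculation for detecting a fractional slowing of the mean rate of change; a minimal sketch is shown below, with invented summary statistics rather than the ADNI-1 TBM values, so the resulting numbers will not reproduce the 39/95 figures.

```python
from scipy.stats import norm

def n_per_arm(mean_change, sd_change, reduction=0.25, alpha=0.05, power=0.80):
    """Subjects per arm to detect a given fractional slowing of the mean change,
    two-sided test, assuming approximately normal 24-month change scores."""
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    delta = reduction * abs(mean_change)
    return 2 * (sd_change * (za + zb) / delta) ** 2

# Hypothetical 24-month atrophy summaries (percent change) for AD-like and MCI-like groups
print(f"AD-like:  n = {n_per_arm(-2.0, 0.9):.0f} per arm")
print(f"MCI-like: n = {n_per_arm(-1.2, 0.9):.0f} per arm")
```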

  7. Big Data, Small Sample.

    Science.gov (United States)

    Gerlovina, Inna; van der Laan, Mark J; Hubbard, Alan

    2017-05-20

    Multiple comparisons and small sample size, common characteristics of many types of "Big Data" including those that are produced by genomic studies, present specific challenges that affect reliability of inference. Use of multiple testing procedures necessitates calculation of very small tail probabilities of a test statistic distribution. Results based on large deviation theory provide a formal condition that is necessary to guarantee error rate control given practical sample sizes, linking the number of tests and the sample size; this condition, however, is rarely satisfied. Using methods that are based on Edgeworth expansions (relying especially on the work of Peter Hall), we explore the impact of departures of sampling distributions from typical assumptions on actual error rates. Our investigation illustrates how far the actual error rates can be from the declared nominal levels, suggesting potentially wide-spread problems with error rate control, specifically excessive false positives. This is an important factor that contributes to "reproducibility crisis". We also review some other commonly used methods (such as permutation and methods based on finite sampling inequalities) in their application to multiple testing/small sample data. We point out that Edgeworth expansions, providing higher order approximations to the sampling distribution, offer a promising direction for data analysis that could improve reliability of studies relying on large numbers of comparisons with modest sample sizes.

  8. Parts of the Whole: Hands On Statistics

    Directory of Open Access Journals (Sweden)

    Dorothy Wallace

    2018-01-01

    Full Text Available In this column we describe a hands-on data collection lab for an introductory statistics course. The exercise elicits issues of normality, sampling, and sample mean comparisons. Based on volcanology models of tephra dispersion, this lab leads students to question the accuracy of some assumptions made in the model, particularly regarding the normality of the dispersal of tephra of identical size in a given atmospheric layer.

  9. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    Science.gov (United States)

    Ozturk, Elif

    2012-01-01

    The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…

  10. The relation between statistical power and inference in fMRI.

    Directory of Open Access Journals (Sweden)

    Henk R Cremers

    Full Text Available Statistically underpowered studies can result in experimental failure even when all other experimental considerations have been addressed impeccably. In fMRI the combination of a large number of dependent variables, a relatively small number of observations (subjects), and a need to correct for multiple comparisons can decrease statistical power dramatically. This problem has been clearly addressed yet remains controversial, especially in regard to the expected effect sizes in fMRI, and especially for between-subjects effects such as group comparisons and brain-behavior correlations. We aimed to clarify the power problem by considering and contrasting two simulated scenarios of such possible brain-behavior correlations: weak diffuse effects and strong localized effects. Sampling from these scenarios shows that, particularly in the weak diffuse scenario, common sample sizes (n = 20-30) display extremely low statistical power, poorly represent the actual effects in the full sample, and show large variation on subsequent replications. Empirical data from the Human Connectome Project resemble the weak diffuse scenario much more than the localized strong scenario, which underscores the extent of the power problem for many studies. Possible solutions to the power problem include increasing the sample size, using less stringent thresholds, or focusing on a region-of-interest. However, these approaches are not always feasible and some have major drawbacks. The most prominent solutions that may help address the power problem include model-based (multivariate) prediction methods and meta-analyses with related synthesis-oriented approaches.

  11. Probability sampling design in ethnobotanical surveys of medicinal plants

    Directory of Open Access Journals (Sweden)

    Mariano Martinez Espinosa

    2012-07-01

    Full Text Available Non-probability sampling design can be used in ethnobotanical surveys of medicinal plants. However, this method does not allow statistical inferences to be made from the data generated. The aim of this paper is to present a probability sampling design that is applicable in ethnobotanical studies of medicinal plants. The sampling design employed in the research titled "Ethnobotanical knowledge of medicinal plants used by traditional communities of Nossa Senhora Aparecida do Chumbo district (NSACD), Poconé, Mato Grosso, Brazil" was used as a case study. Probability sampling methods (simple random and stratified sampling) were used in this study. In order to determine the sample size, the following data were considered: population size (N) of 1179 families; confidence coefficient, 95%; sample error (d), 0.05; and a proportion (p), 0.5. The application of this sampling method resulted in a sample size (n) of at least 290 families in the district. The present study concludes that probability sampling methods necessarily have to be employed in ethnobotanical studies of medicinal plants, particularly where statistical inferences have to be made using data obtained. This can be achieved by applying different existing probability sampling methods, or better still, a combination of such methods.
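
    The sample size quoted above follows from the usual formula for estimating a proportion with a finite population correction; the sketch below reproduces the figure of roughly 290 families from the stated inputs.

```python
import math

def sample_size_finite(N, p=0.5, d=0.05, z=1.96):
    """Sample size for estimating a proportion with a finite population correction."""
    numerator = N * z ** 2 * p * (1 - p)
    denominator = d ** 2 * (N - 1) + z ** 2 * p * (1 - p)
    return math.ceil(numerator / denominator)

print(sample_size_finite(N=1179))   # -> about 290 families, matching the study
```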

  12. Optimal sampling strategy for data mining

    International Nuclear Information System (INIS)

    Ghaffar, A.; Shahbaz, M.; Mahmood, W.

    2013-01-01

    Modern technologies like the Internet, corporate intranets, data warehouses, ERPs, satellites, digital sensors, embedded systems and mobile networks are generating such massive amounts of data that it is becoming very difficult to analyze and understand all of it, even using data mining tools. Huge datasets are becoming a difficult challenge for classification algorithms. With increasing amounts of data, data mining algorithms are getting slower and analysis is getting less interactive. Sampling can be a solution. Using a fraction of computing resources, sampling can often provide the same level of accuracy. The process of sampling requires much care because there are many factors involved in the determination of the correct sample size. The approach proposed in this paper tries to find a solution to this problem. Based on a statistical formula, after setting some parameters, it returns a sample size called the 'sufficient sample size', which is then selected through probability sampling. Results indicate the usefulness of this technique in coping with the problem of huge datasets. (author)

  13. The matchmaking paradox: a statistical explanation

    International Nuclear Information System (INIS)

    Eliazar, Iddo I; Sokolov, Igor M

    2010-01-01

    Medical surveys regarding the number of heterosexual partners per person yield different female and male averages, a result which, from a physical standpoint, is impossible. In this paper we term this puzzle the 'matchmaking paradox', and establish a statistical model explaining it. We consider a bipartite graph with N male and N female nodes (N >> 1), and B bonds connecting them (B >> 1). Each node is assigned a random 'attractiveness level', and the bonds connect to the nodes randomly, with probabilities proportional to the nodes' attractiveness levels. The population's average bonds-per-node B/N is estimated via a sample average calculated from a survey of size n (n >> 1). A comprehensive statistical analysis of this model is carried out, asserting that (i) the sample average well estimates the population average if and only if the attractiveness levels possess a finite mean; (ii) if the attractiveness levels are governed by a 'fat-tailed' probability law then the sample average displays wild fluctuations and strong skew, thus providing a statistical explanation of the matchmaking paradox.

  14. Mean stress sensitivity of ductile iron with respect to technological and statistical size effect considering defects

    Directory of Open Access Journals (Sweden)

    Kainzinger Paul

    2014-06-01

    Full Text Available Specimens of two sizes have been taken from two sampling locations within a wind turbine hub made of nodular cast iron (EN-GJS-400-18-LT) for constant amplitude fatigue testing. The sampling positions exhibit varying cooling conditions, resulting in different microstructures. Fatigue tests have been carried out at R-ratios of R = −1 and R = 0. The coarse microstructure as well as the larger specimens yielded lower fatigue strengths. No effect of the microstructure or the specimen size on the mean stress sensitivity has been found. Fractographic analysis of the specimens' fracture surfaces revealed micro-shrinkages to be the source of crack initiation for all specimens. Micro-shrinkage size increases from fine to coarse microstructure and with increasing specimen size. The El-Haddad equation using the √area parameter was used to describe the fatigue limit. The results were in good agreement with the experiments.
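
    One common form of the El-Haddad correction, with Murakami's √area as the defect size, is sketched below; the exact parametrisation used by the authors may differ, and all numbers are invented.

```python
import math

def el_haddad_fatigue_limit(sqrt_area_um, sigma_w0, sqrt_area0_um):
    """Fatigue limit reduced by a defect of size sqrt(area):
    sigma_w = sigma_w0 * sqrt(a0 / (a + a0)), with a taken as sqrt(area)."""
    return sigma_w0 * math.sqrt(sqrt_area0_um / (sqrt_area_um + sqrt_area0_um))

sigma_w0 = 200.0      # MPa, fatigue limit of defect-free material (hypothetical)
sqrt_area0 = 350.0    # micrometres, El-Haddad intrinsic defect size (hypothetical)

for defect in (100.0, 300.0, 600.0):   # micro-shrinkage sizes observed on fracture surfaces
    limit = el_haddad_fatigue_limit(defect, sigma_w0, sqrt_area0)
    print(f"sqrt(area) = {defect:5.0f} um -> fatigue limit ~ {limit:.0f} MPa")
```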

  15. Methodology for sample preparation and size measurement of commercial ZnO nanoparticles

    Directory of Open Access Journals (Sweden)

    Pei-Jia Lu

    2018-04-01

    Full Text Available This study discusses strategies for sample preparation to acquire images of sufficient quality for size characterization by scanning electron microscope (SEM), using two commercial ZnO nanoparticles of different surface properties as a demonstration. The central idea is that micrometre-sized aggregates of ZnO in powdered form first need to be broken down to nanosized particles through an appropriate process to generate a nanoparticle dispersion before being deposited on a flat surface for SEM observation. Analytical tools such as contact angle, dynamic light scattering and zeta potential have been utilized to optimize the procedure for sample preparation and to check the quality of the results. Meanwhile, measurements of zeta potential values on flat surfaces also provide critical information and save considerable time and effort in selecting a suitable substrate on which particles of different properties can be attracted and kept without further aggregation. This simple, low-cost methodology can be generally applied to size characterization of commercial ZnO nanoparticles with limited information from vendors. Keywords: Zinc oxide, Nanoparticles, Methodology

  16. The extended statistical analysis of toxicity tests using standardised effect sizes (SESs): a comparison of nine published papers.

    Science.gov (United States)

    Festing, Michael F W

    2014-01-01

    The safety of chemicals, drugs, novel foods and genetically modified crops is often tested using repeat-dose sub-acute toxicity tests in rats or mice. It is important to avoid misinterpretations of the results as these tests are used to help determine safe exposure levels in humans. Treated and control groups are compared for a range of haematological, biochemical and other biomarkers which may indicate tissue damage or other adverse effects. However, the statistical analysis and presentation of such data poses problems due to the large number of statistical tests which are involved. Often, it is not clear whether a "statistically significant" effect is real or a false positive (type I error) due to sampling variation. The authors' conclusions appear to be reached somewhat subjectively by the pattern of statistical significances, discounting those which they judge to be type I errors and ignoring any biomarker where the p-value is greater than p = 0.05. However, by using standardised effect sizes (SESs) a range of graphical methods and an over-all assessment of the mean absolute response can be made. The approach is an extension, not a replacement of existing methods. It is intended to assist toxicologists and regulators in the interpretation of the results. Here, the SES analysis has been applied to data from nine published sub-acute toxicity tests in order to compare the findings with those of the authors. Line plots, box plots and bar plots show the pattern of response. Dose-response relationships are easily seen. A "bootstrap" test compares the mean absolute differences across dose groups. In four out of seven papers where the no observed adverse effect level (NOAEL) was estimated by the authors, it was set too high according to the bootstrap test, suggesting that possible toxicity is under-estimated.

  17. The extended statistical analysis of toxicity tests using standardised effect sizes (SESs): a comparison of nine published papers.

    Directory of Open Access Journals (Sweden)

    Michael F W Festing

    Full Text Available The safety of chemicals, drugs, novel foods and genetically modified crops is often tested using repeat-dose sub-acute toxicity tests in rats or mice. It is important to avoid misinterpretations of the results as these tests are used to help determine safe exposure levels in humans. Treated and control groups are compared for a range of haematological, biochemical and other biomarkers which may indicate tissue damage or other adverse effects. However, the statistical analysis and presentation of such data poses problems due to the large number of statistical tests which are involved. Often, it is not clear whether a "statistically significant" effect is real or a false positive (type I error) due to sampling variation. The authors' conclusions appear to be reached somewhat subjectively by the pattern of statistical significances, discounting those which they judge to be type I errors and ignoring any biomarker where the p-value is greater than p = 0.05. However, by using standardised effect sizes (SESs) a range of graphical methods and an over-all assessment of the mean absolute response can be made. The approach is an extension, not a replacement of existing methods. It is intended to assist toxicologists and regulators in the interpretation of the results. Here, the SES analysis has been applied to data from nine published sub-acute toxicity tests in order to compare the findings with those of the authors. Line plots, box plots and bar plots show the pattern of response. Dose-response relationships are easily seen. A "bootstrap" test compares the mean absolute differences across dose groups. In four out of seven papers where the no observed adverse effect level (NOAEL) was estimated by the authors, it was set too high according to the bootstrap test, suggesting that possible toxicity is under-estimated.
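
    The standardised effect size at the heart of both records above is the treated-versus-control mean difference divided by a pooled standard deviation; a minimal sketch with invented biomarker values:

```python
import numpy as np

def standardised_effect_size(treated, control):
    """SES = (treated mean - control mean) / pooled standard deviation."""
    t, c = np.asarray(treated, float), np.asarray(control, float)
    nt, nc = len(t), len(c)
    pooled_sd = np.sqrt(((nt - 1) * t.var(ddof=1) + (nc - 1) * c.var(ddof=1)) / (nt + nc - 2))
    return (t.mean() - c.mean()) / pooled_sd

control = [7.1, 6.8, 7.4, 7.0, 6.9, 7.2]      # hypothetical biomarker values, control group
high_dose = [7.9, 8.1, 7.6, 8.4, 7.8, 8.0]    # hypothetical biomarker values, top-dose group
print(f"SES = {standardised_effect_size(high_dose, control):.2f}")
# Computing the SES for every biomarker puts them on one scale, so line, box and bar plots
# can show dose-response patterns across the whole battery, as described above.
```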

  18. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    Directory of Open Access Journals (Sweden)

    M.M. Mohie El-Din

    2011-10-01

    Full Text Available In this paper, two-sample Bayesian prediction intervals for order statistics (OS) are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two-sample case. The class of the inverse exponential-type distributions includes several important distributions such as the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.

  19. Size and shape characteristics of drumlins, derived from a large sample, and associated scaling laws

    Science.gov (United States)

    Clark, Chris D.; Hughes, Anna L. C.; Greenwood, Sarah L.; Spagnolo, Matteo; Ng, Felix S. L.

    2009-04-01

    Ice sheets flowing across a sedimentary bed usually produce a landscape of blister-like landforms streamlined in the direction of the ice flow and with each bump of the order of 10² to 10³ m in length and 10¹ m in relief. Such landforms, known as drumlins, have mystified investigators for over a hundred years. A satisfactory explanation for their formation, and thus an appreciation of their glaciological significance, has remained elusive. A recent advance has been in numerical modelling of the land-forming process. In anticipation of future modelling endeavours, this paper is motivated by the requirement for robust data on drumlin size and shape for model testing. From a systematic programme of drumlin mapping from digital elevation models and satellite images of Britain and Ireland, we used a geographic information system to compile a range of statistics on length L, width W, and elongation ratio E (where E = L/W) for a large sample. Mean L is found to be 629 m (n = 58,983), mean W is 209 m and mean E is 2.9 (n = 37,043). Most drumlins are between 250 and 1000 metres in length; between 120 and 300 metres in width; and between 1.7 and 4.1 times as long as they are wide. Analysis of such data and plots of drumlin width against length reveals some new insights. All frequency distributions are unimodal, from which we infer that the geomorphological label of 'drumlin' is fair in that this is a true single population of landforms, rather than an amalgam of different landform types. Drumlin size shows a clear minimum bound of around 100 m (horizontal). Maybe drumlins are generated at many scales and this is the minimum, or this value may be an indication of the fundamental scale of bump generation ('proto-drumlins') prior to them growing and elongating. A relationship between drumlin width and length is found (with r² = 0.48), which is approximately W = 7L^(1/2) when measured in metres. A surprising and sharply-defined line bounds the data cloud plotted in E-W

  20. Choice of Sample Split in Out-of-Sample Forecast Evaluation

    DEFF Research Database (Denmark)

    Hansen, Peter Reinhard; Timmermann, Allan

    Out-of-sample tests of forecast performance depend on how a given data set is split into estimation and evaluation periods, yet no guidance exists on how to choose the split point. Empirical forecast evaluation results can therefore be difficult to interpret, particularly when several values ... while conversely the power of forecast evaluation tests is strongest with long out-of-sample periods. To deal with size distortions, we propose a test statistic that is robust to the effect of considering multiple sample split points. Empirical applications to predictability of stock returns and inflation demonstrate that out-of-sample forecast evaluation results can critically depend on how the sample split is determined.

  1. Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure

    Directory of Open Access Journals (Sweden)

    Shaukat S. Shahid

    2016-06-01

    Full Text Available In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix pertaining to water quality variables (p = 22) of a small data set comprising 55 samples (stations) from where water samples were collected. Because in ecology and environmental sciences the data sets are invariably small owing to the high cost of collection and analysis of samples, we restricted our study to relatively small sample sizes. We focused attention on comparison of the first 6 eigenvectors and first 10 eigenvalues. Data sets were compared using agglomerative cluster analysis using Ward's method, which does not require any stringent distributional assumptions.
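    The following sketch illustrates the general bootstrap-PCA procedure described above (100 resamples at each of N = 20, 30, 40 and 50), using random placeholder data in place of the original 55 x 22 water-quality matrix and tracking only the leading eigenvalue rather than the full eigenstructure comparison by cluster analysis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the 55-station x 22-variable water-quality matrix
data = rng.normal(size=(55, 22))

def pca_eigen(x):
    """Eigen-decomposition of the correlation matrix (PCA on standardised variables)."""
    corr = np.corrcoef(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]          # sort eigenvalues in decreasing order
    return eigval[order], eigvec[:, order]

for n in (20, 30, 40, 50):                    # sample sizes examined in the study
    first_eigvals = []
    for _ in range(100):                      # 100 bootstrap samples per size
        idx = rng.integers(0, len(data), n)
        eigval, _ = pca_eigen(data[idx])
        first_eigvals.append(eigval[0])
    print(n, np.mean(first_eigvals), np.std(first_eigvals, ddof=1))
```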

  2. Statistical analysis of support thickness and particle size effects in HRTEM imaging of metal nanoparticles

    International Nuclear Information System (INIS)

    House, Stephen D.; Bonifacio, Cecile S.; Grieshaber, Ross V.; Li, Long; Zhang, Zhongfan; Ciston, Jim; Stach, Eric A.; Yang, Judith C.

    2016-01-01

    High-resolution transmission electron microscopy (HRTEM) examination of nanoparticles requires their placement on some manner of support – either TEM grid membranes or part of the material itself, as in many heterogeneous catalyst systems – but a systematic quantification of the practical imaging limits of this approach has been lacking. Here we address this issue through a statistical evaluation of how nanoparticle size and substrate thickness affect the ability to resolve structural features of interest in HRTEM images of metallic nanoparticles on common support membranes. The visibility of lattice fringes from crystalline Au nanoparticles on amorphous carbon and silicon supports of varying thickness was investigated with both conventional and aberration-corrected TEM. Over the 1–4 nm nanoparticle size range examined, the probability of successfully resolving lattice fringes differed significantly as a function of both nanoparticle size and support thickness. Statistical analysis was used to formulate guidelines for the selection of supports and to quantify the impact a given support would have on HRTEM imaging of crystalline structure. For nanoparticles ≥1 nm, aberration-correction was found to provide limited benefit for the purpose of visualizing lattice fringes; electron dose is more predictive of lattice fringe visibility than aberration correction. These results confirm that the ability to visualize lattice fringes is ultimately dependent on the signal-to-noise ratio of the HRTEM images, rather than the point-to-point resolving power of the microscope. This study provides a benchmark for HRTEM imaging of crystalline supported metal nanoparticles and is extensible to a wide variety of supports and nanostructures. - Highlights: • The impact of supports on imaging nanoparticle lattice structure is quantified. • Visualization probabilities given particle size and support thickness are estimated. • Aberration-correction provided limited benefit

  3. Statistical analysis of support thickness and particle size effects in HRTEM imaging of metal nanoparticles

    Energy Technology Data Exchange (ETDEWEB)

    House, Stephen D., E-mail: sdh46@pitt.edu [Chemical and Petroleum Engineering, and Physics, University of Pittsburgh, Pittsburgh, PA 15261 (United States); Bonifacio, Cecile S.; Grieshaber, Ross V.; Li, Long; Zhang, Zhongfan [Chemical and Petroleum Engineering, and Physics, University of Pittsburgh, Pittsburgh, PA 15261 (United States); Ciston, Jim [National Center of Electron Microscopy, Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States); Stach, Eric A. [Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, NY 11973 (United States); Yang, Judith C. [Chemical and Petroleum Engineering, and Physics, University of Pittsburgh, Pittsburgh, PA 15261 (United States)

    2016-10-15

    High-resolution transmission electron microscopy (HRTEM) examination of nanoparticles requires their placement on some manner of support – either TEM grid membranes or part of the material itself, as in many heterogeneous catalyst systems – but a systematic quantification of the practical imaging limits of this approach has been lacking. Here we address this issue through a statistical evaluation of how nanoparticle size and substrate thickness affect the ability to resolve structural features of interest in HRTEM images of metallic nanoparticles on common support membranes. The visibility of lattice fringes from crystalline Au nanoparticles on amorphous carbon and silicon supports of varying thickness was investigated with both conventional and aberration-corrected TEM. Over the 1–4 nm nanoparticle size range examined, the probability of successfully resolving lattice fringes differed significantly as a function of both nanoparticle size and support thickness. Statistical analysis was used to formulate guidelines for the selection of supports and to quantify the impact a given support would have on HRTEM imaging of crystalline structure. For nanoparticles ≥1 nm, aberration-correction was found to provide limited benefit for the purpose of visualizing lattice fringes; electron dose is more predictive of lattice fringe visibility than aberration correction. These results confirm that the ability to visualize lattice fringes is ultimately dependent on the signal-to-noise ratio of the HRTEM images, rather than the point-to-point resolving power of the microscope. This study provides a benchmark for HRTEM imaging of crystalline supported metal nanoparticles and is extensible to a wide variety of supports and nanostructures. - Highlights: • The impact of supports on imaging nanoparticle lattice structure is quantified. • Visualization probabilities given particle size and support thickness are estimated. • Aberration-correction provided limited benefit

  4. B-graph sampling to estimate the size of a hidden population

    NARCIS (Netherlands)

    Spreen, M.; Bogaerts, S.

    2015-01-01

    Link-tracing designs are often used to estimate the size of hidden populations by utilizing the relational links between their members. A major problem in studies of hidden populations is the lack of a convenient sampling frame. The most frequently applied design in studies of hidden populations is

  5. Estimating HIES Data through Ratio and Regression Methods for Different Sampling Designs

    Directory of Open Access Journals (Sweden)

    Faqir Muhammad

    2007-01-01

    Full Text Available In this study, a comparison has been made of different sampling designs, using the HIES data of North West Frontier Province (NWFP) for 2001-02 and 1998-99 collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators has also been considered using the bootstrap and jackknife. A two-stage stratified random sample design is adopted by HIES. In the first stage, enumeration blocks and villages are treated as the first-stage Primary Sampling Units (PSUs). The sample PSUs are selected with probability proportional to size. Secondary Sampling Units (SSUs), i.e. households, are selected by systematic sampling with a random start. They have used a single study variable. We have compared the HIES technique with some other designs, which are: Stratified Simple Random Sampling, Stratified Systematic Sampling, Stratified Ranked Set Sampling, and Stratified Two-Phase Sampling. Ratio and regression methods were applied with two study variables: income (y) and household size (x). Jackknife and bootstrap are used for variance estimation by replication. Simple Random Sampling with sample sizes of 462 to 561 gave moderate variances both by jackknife and bootstrap. By applying Systematic Sampling, we obtained a moderate variance with a sample size of 467. In the jackknife with Systematic Sampling, the variance of the regression estimator is greater than that of the ratio estimator for sample sizes of 467 to 631. At a sample size of 952, the variance of the ratio estimator becomes greater than that of the regression estimator. The most efficient design turns out to be Ranked Set Sampling compared with the other designs. Ranked Set Sampling with jackknife and bootstrap gives minimum variance even with the smallest sample size (467). Two-Phase Sampling gave poor performance. The multi-stage sampling applied by HIES gave large variances, especially if used with a single study variable.
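    For reference, the textbook forms of the ratio and regression estimators of mean income, with household size as the auxiliary variable whose population mean X̄ is known, are:

    $$\hat{\bar{Y}}_{R} = \frac{\bar{y}}{\bar{x}}\,\bar{X}, \qquad \hat{\bar{Y}}_{lr} = \bar{y} + b\,(\bar{X} - \bar{x}), \qquad b = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^{2}}$$

    where ȳ and x̄ are the sample means of income and household size in the selected design.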

  6. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    Science.gov (United States)

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.

  7. A Sorting Statistic with Application in Neurological Magnetic Resonance Imaging of Autism.

    Science.gov (United States)

    Levman, Jacob; Takahashi, Emi; Forgeron, Cynthia; MacDonald, Patrick; Stewart, Natalie; Lim, Ashley; Martel, Anne

    2018-01-01

    Effect size refers to the assessment of the extent of differences between two groups of samples on a single measurement. Assessing effect size in medical research is typically accomplished with Cohen's d statistic. Cohen's d statistic assumes that average values are good estimators of the position of a distribution of numbers and also assumes Gaussian (or bell-shaped) underlying data distributions. In this paper, we present an alternative evaluative statistic that can quantify differences between two data distributions in a manner that is similar to traditional effect size calculations; however, the proposed approach avoids making assumptions regarding the shape of the underlying data distribution. The proposed sorting statistic is compared with Cohen's d statistic and is demonstrated to be capable of identifying feature measurements of potential interest for which Cohen's d statistic implies the measurement would be of little use. This proposed sorting statistic has been evaluated on a large clinical autism dataset from Boston Children's Hospital, Harvard Medical School, demonstrating that it can potentially play a constructive role in future healthcare technologies.
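    For context, the Cohen's d statistic referred to above is the standardised mean difference computed with a pooled standard deviation (the proposed sorting statistic itself is not specified in this record):

    $$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$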

  8. Designs and Methods for Association Studies and Population Size Inference in Statistical Genetics

    DEFF Research Database (Denmark)

    Waltoft, Berit Lindum

    ... The OR is interpreted as the effect of an exposure on the probability of being diseased at the end of follow-up, while the interpretation of the IRR is the effect of an exposure on the probability of becoming diseased. Through a simulation study, the OR from a classical case-control study is shown to be an inconsistent ... method provides a simple goodness-of-fit test by comparing the observed SFS with the expected SFS under a given model of population size changes. By the use of Monte Carlo estimation the expected time between coalescent events can be estimated and the expected SFS can thereby be evaluated. Using the classical chi-square statistics we are able to infer single-parameter models. Multiple-parameter models, e.g. multiple epochs, are harder to identify. By introducing the inference of population size back in time as an inverse problem, the second procedure applies the theory of smoothing splines to infer...

  9. Normal Approximations to the Distributions of the Wilcoxon Statistics: Accurate to What "N"? Graphical Insights

    Science.gov (United States)

    Bellera, Carine A.; Julien, Marilyse; Hanley, James A.

    2010-01-01

    The Wilcoxon statistics are usually taught as nonparametric alternatives for the 1- and 2-sample Student-"t" statistics in situations where the data appear to arise from non-normal distributions, or where sample sizes are so small that we cannot check whether they do. In the past, critical values, based on exact tail areas, were…

  10. Sample sizing of biological materials analyzed by energy dispersion X-ray fluorescence

    International Nuclear Information System (INIS)

    Paiva, Jose D.S.; Franca, Elvis J.; Magalhaes, Marcelo R.L.; Almeida, Marcio E.S.; Hazin, Clovis A.

    2013-01-01

    Analytical portions used in chemical analyses are usually less than 1 g. Errors resulting from the sampling are barely evaluated, since this type of study is a time-consuming procedure, with high costs for the chemical analysis of a large number of samples. Energy dispersion X-ray fluorescence (EDXRF) is a non-destructive and fast analytical technique with the possibility of determining several chemical elements. Therefore, the aim of this study was to provide information on the minimum analytical portion for quantification of chemical elements in biological matrices using EDXRF. Three species were sampled in mangroves from Pernambuco, Brazil. Tree leaves were washed with distilled water, oven-dried at 60 °C and milled to a 0.5 mm particle size. Ten test-portions of approximately 500 mg for each species were transferred to vials sealed with polypropylene film. The quality of the analytical procedure was evaluated using the reference materials IAEA V10 Hay Powder and SRM 2976 Apple Leaves. After energy calibration, all samples were analyzed under vacuum for 100 seconds for each group of chemical elements. The voltage used was 15 kV for chemical elements with atomic number lower than 22 and 50 kV for the others. For the best analytical conditions, EDXRF was capable of estimating the sample size uncertainty for further determination of chemical elements in leaves. (author)

  11. Sample sizing of biological materials analyzed by energy dispersion X-ray fluorescence

    Energy Technology Data Exchange (ETDEWEB)

    Paiva, Jose D.S.; Franca, Elvis J.; Magalhaes, Marcelo R.L.; Almeida, Marcio E.S.; Hazin, Clovis A., E-mail: dan-paiva@hotmail.com, E-mail: ejfranca@cnen.gov.br, E-mail: marcelo_rlm@hotmail.com, E-mail: maensoal@yahoo.com.br, E-mail: chazin@cnen.gov.b [Centro Regional de Ciencias Nucleares do Nordeste (CRCN-NE/CNEN-PE), Recife, PE (Brazil)

    2013-07-01

    Analytical portions used in chemical analyses are usually less than 1 g. Errors resulting from the sampling are barely evaluated, since this type of study is a time-consuming procedure, with high costs for the chemical analysis of a large number of samples. Energy dispersion X-ray fluorescence (EDXRF) is a non-destructive and fast analytical technique with the possibility of determining several chemical elements. Therefore, the aim of this study was to provide information on the minimum analytical portion for quantification of chemical elements in biological matrices using EDXRF. Three species were sampled in mangroves from Pernambuco, Brazil. Tree leaves were washed with distilled water, oven-dried at 60 °C and milled to a 0.5 mm particle size. Ten test-portions of approximately 500 mg for each species were transferred to vials sealed with polypropylene film. The quality of the analytical procedure was evaluated using the reference materials IAEA V10 Hay Powder and SRM 2976 Apple Leaves. After energy calibration, all samples were analyzed under vacuum for 100 seconds for each group of chemical elements. The voltage used was 15 kV for chemical elements with atomic number lower than 22 and 50 kV for the others. For the best analytical conditions, EDXRF was capable of estimating the sample size uncertainty for further determination of chemical elements in leaves. (author)

  12. Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments.

    Science.gov (United States)

    Bi, Ran; Liu, Peng

    2016-03-31

    RNA-Sequencing (RNA-seq) experiments have been popularly applied to transcriptome studies in recent years. Such experiments are still relatively costly. As a result, RNA-seq experiments often employ a small number of replicates. Power analysis and sample size calculation are challenging in the context of differential expression analysis with RNA-seq data. One challenge is that there are no closed-form formulae to calculate power for the popularly applied tests for differential expression analysis. In addition, false discovery rate (FDR), instead of family-wise type I error rate, is controlled for the multiple testing error in RNA-seq data analysis. So far, there are very few proposals on sample size calculation for RNA-seq experiments. In this paper, we propose a procedure for sample size calculation while controlling FDR for RNA-seq experimental design. Our procedure is based on the weighted linear model analysis facilitated by the voom method which has been shown to have competitive performance in terms of power and FDR control for RNA-seq differential expression analysis. We derive a method that approximates the average power across the differentially expressed genes, and then calculate the sample size to achieve a desired average power while controlling FDR. Simulation results demonstrate that the actual power of several popularly applied tests for differential expression is achieved and is close to the desired power for RNA-seq data with sample size calculated based on our method. Our proposed method provides an efficient algorithm to calculate sample size while controlling FDR for RNA-seq experimental design. We also provide an R package ssizeRNA that implements our proposed method and can be downloaded from the Comprehensive R Archive Network (http://cran.r-project.org).
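    A back-of-the-envelope sketch of the average-power idea, using the approximate relation between the FDR level and the per-test significance threshold (alpha ≈ m1 * power * FDR / (m0 * (1 - FDR))) and a normal approximation with a single common effect size; this is only illustrative and is not the voom-based procedure or the ssizeRNA package:

```python
import numpy as np
from scipy.stats import norm

def sample_size_fdr(m=10000, m1=500, effect=1.0, sigma=1.0,
                    fdr=0.05, avg_power=0.8):
    """Smallest n per group whose average power across the m1 DE genes reaches
    avg_power, linking the per-test alpha to the FDR target by
    alpha = m1 * power * fdr / (m0 * (1 - fdr)).  Rough two-sample z-test sketch."""
    m0 = m - m1
    for n in range(2, 1000):
        power = 0.5
        for _ in range(50):                       # fixed-point: alpha and power depend on each other
            alpha = m1 * power * fdr / (m0 * (1 - fdr))
            z_alpha = norm.ppf(1 - alpha / 2)
            ncp = effect / (sigma * np.sqrt(2.0 / n))   # standardised mean shift
            power = norm.cdf(ncp - z_alpha) + norm.cdf(-ncp - z_alpha)
        if power >= avg_power:
            return n, power
    return None

print(sample_size_fdr())   # e.g. roughly 30 samples per group under these assumptions
```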

  13. Texture classification by texton: statistical versus binary.

    Directory of Open Access Journals (Sweden)

    Zhenhua Guo

    Full Text Available Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, there are two limitations when using these methods. First, a training stage is needed to build a texton library, so the recognition accuracy depends heavily on the training samples; second, during feature extraction, a local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library size is big and the feature dimension is high. To address the above two issues, in this paper, three binary texton counterpart methods were proposed: Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local features into a binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary textons can achieve sound results with fast feature extraction, especially when the image size is not big and the quality of the image is not poor.

  14. Sample size determination for a three-arm equivalence trial of Poisson and negative binomial responses.

    Science.gov (United States)

    Chang, Yu-Wei; Tsong, Yi; Zhao, Zhigen

    2017-01-01

    Assessing equivalence or similarity has drawn much attention recently as many drug products have lost or will lose their patents in the next few years, especially certain best-selling biologics. To claim equivalence between the test treatment and the reference treatment when assay sensitivity is well established from historical data, one has to demonstrate both superiority of the test treatment over placebo and equivalence between the test treatment and the reference treatment. Thus, there is urgency for practitioners to derive a practical way to calculate sample size for a three-arm equivalence trial. The primary endpoints of a clinical trial may not always be continuous, but may be discrete. In this paper, the authors derive the power function and discuss the sample size requirement for a three-arm equivalence trial with Poisson and negative binomial clinical endpoints. In addition, the authors examine the effect of the dispersion parameter on the power and the sample size by varying its coefficient from small to large. In extensive numerical studies, the authors demonstrate that the required sample size depends heavily on the dispersion parameter. Therefore, misusing a Poisson model for negative binomial data may easily lose up to 20% power, depending on the value of the dispersion parameter.

  15. Statistical methodology for discrete fracture model - including fracture size, orientation uncertainty together with intensity uncertainty and variability

    International Nuclear Information System (INIS)

    Darcel, C.; Davy, P.; Le Goc, R.; Dreuzy, J.R. de; Bour, O.

    2009-11-01

    Investigations conducted over several years at Laxemar and Forsmark reveal the large heterogeneity of geological formations and the associated fracturing. This project aims at reinforcing the statistical DFN modeling framework adapted to a site scale. It therefore leads to the development of quantitative methods of characterization adapted to the nature of the fracturing and to data availability. We start with the hypothesis that the maximum likelihood DFN model is a power-law model with a density term depending on orientations. This is supported both by the literature and, specifically here, by former analyses of the SKB data. This assumption is nevertheless thoroughly tested by analyzing the fracture trace and lineament maps. Fracture traces range roughly between 0.5 m and 10 m, i.e. the usual extension of the sample outcrops. Between the raw data and the final data used to compute the fracture size distribution, from which the size distribution model will arise, several steps are necessary in order to correct the data for finite-size, topographical and sampling effects. More precisely, particular attention is paid to fracture segmentation status and fracture linkage consistent with the expected DFN model. The fracture scaling trend observed over both sites finally displays a shape parameter k_t close to 1.2 with a density term (α_2d) between 1.4 and 1.8. Only two outcrops clearly display a different trend, with k_t close to 3 and a density term (α_2d) between 2 and 3.5. The fracture lineaments spread over the range between 100 meters and a few kilometers. When compared with fracture trace maps, these datasets are already interpreted and the linkage process developed previously does not have to be repeated. Except for the subregional lineament map from Forsmark, lineaments display a clear power-law trend with a shape parameter k_t equal to 3 and a density term between 2 and 4.5. The apparent variation in scaling exponent, from the outcrop scale (k_t = 1.2) on one side, to the lineament scale (k_t = 2) on

  16. The impact of sample size and marker selection on the study of haplotype structures

    Directory of Open Access Journals (Sweden)

    Sun Xiao

    2004-03-01

    Full Text Available Abstract Several studies of haplotype structures in the human genome in various populations have found that the human chromosomes are structured such that each chromosome can be divided into many blocks, within which there is limited haplotype diversity. In addition, only a few genetic markers in a putative block are needed to capture most of the diversity within a block. There has been no systematic empirical study of the effects of sample size and marker set on the identified block structures and representative marker sets, however. The purpose of this study was to conduct a detailed empirical study to examine such impacts. Towards this goal, we have analysed three representative autosomal regions from a large genome-wide study of haplotypes with samples consisting of African-Americans and samples consisting of Japanese and Chinese individuals. For both populations, we have found that the sample size and marker set have significant impact on the number of blocks and the total number of representative markers identified. The marker set in particular has very strong impacts, and our results indicate that the marker density in the original datasets may not be adequate to allow a meaningful characterisation of haplotype structures. In general, we conclude that we need a relatively large sample size and a very dense marker panel in the study of haplotype structures in human populations.

  17. Statistical evaluation of the data obtained from the K East Basin Sandfilter Backwash Pit samples

    International Nuclear Information System (INIS)

    Welsh, T.L.

    1994-01-01

    Samples were obtained from different locations in the K East Basin Sandfilter Backwash Pit to characterize the sludge material. These samples were analyzed chemically for elements, radionuclides, and residual compounds. The analytical results were statistically analyzed to determine the mean analyte content and the associated variability for each mean value.

  18. Random-effects linear modeling and sample size tables for two special crossover designs of average bioequivalence studies: the four-period, two-sequence, two-formulation and six-period, three-sequence, three-formulation designs.

    Science.gov (United States)

    Diaz, Francisco J; Berg, Michel J; Krebill, Ron; Welty, Timothy; Gidal, Barry E; Alloway, Rita; Privitera, Michael

    2013-12-01

    Due to concern and debate in the epilepsy medical community and to the current interest of the US Food and Drug Administration (FDA) in revising approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of antiepileptic drugs, the EQUIGEN studies. During the design of these crossover studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA requirement of using random-effects linear models for the analyses of bioequivalence studies. This article presents tables for sample-size evaluations of average bioequivalence studies based on the two crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, and the six-period, three-sequence, three-formulation design. Sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs. Random-effects linear models have been traditionally viewed by many pharmacologists and clinical researchers as just mathematical devices to analyze repeated-measures data. In contrast, a modern view of these models attributes an important mathematical role in theoretical formulations in personalized medicine to them, because these models not only have parameters that represent average patients, but also have parameters that represent individual patients. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations.

  19. Crystallite size variation of TiO_2 samples depending time heat treatment

    International Nuclear Information System (INIS)

    Galante, A.G.M.; Paula, F.R. de; Montanhera, M.A.; Pereira, E.A.; Spada, E.R.

    2016-01-01

    Titanium dioxide (TiO_2) is an oxide semiconductor that may be found in mixed phase or in distinct phases: brookite, anatase and rutile. In this work, the influence of the residence time at a given temperature on the physical properties of TiO_2 powder was studied. After the powder synthesis, the samples were divided and heat treated at 650 °C with a ramp of up to 3 °C/min and a residence time ranging from 0 to 20 hours, and were subsequently characterized by X-ray diffraction. Analyzing the obtained diffraction patterns, it was observed that, from a residence time of 5 hours onwards, two distinct phases coexist: anatase and rutile. The average crystallite size of each sample was also calculated. The results showed an increase in average crystallite size with increasing residence time of the heat treatment. (author)

  20. Discrimination of handlebar grip samples by fourier transform infrared microspectroscopy analysis and statistics

    Directory of Open Access Journals (Sweden)

    Zeyu Lin

    2017-01-01

    Full Text Available In this paper, the authors present a study on the discrimination of handlebar grip samples, to provide an effective forensic science service for hit-and-run traffic cases. 50 bicycle handlebar grip samples, 49 electric bike handlebar grip samples, and 96 motorcycle handlebar grip samples were randomly collected by the local police in Beijing (China). Fourier transform infrared microspectroscopy (FTIR) was used as the analytical technique. Target absorption selection, data pretreatment, and discrimination of linked and unlinked samples were then chosen as three steps to improve the discrimination of the FTIR spectra collected from different handlebar grip samples. Principal component analysis and receiver operating characteristic curves were used to evaluate the different data selection methods and data pretreatment methods, respectively. It is possible to explore the evidential value of handlebar grip residue evidence through instrumental analysis and statistical treatment. This will provide a universal discrimination method for other forensic science samples as well.

  1. Unbiased tensor-based morphometry: Improved robustness and sample size estimates for Alzheimer’s disease clinical trials

    Science.gov (United States)

    Hua, Xue; Hibar, Derrek P.; Ching, Christopher R.K.; Boyle, Christina P.; Rajagopalan, Priya; Gutman, Boris A.; Leow, Alex D.; Toga, Arthur W.; Jack, Clifford R.; Harvey, Danielle; Weiner, Michael W.; Thompson, Paul M.

    2013-01-01

    Various neuroimaging measures are being evaluated for tracking Alzheimer's disease (AD) progression in therapeutic trials, including measures of structural brain change based on repeated scanning of patients with magnetic resonance imaging (MRI). Methods to compute brain change must be robust to scan quality. Biases may arise if any scans are thrown out, as this can lead to the true changes being overestimated or underestimated. Here we analyzed the full MRI dataset from the first phase of Alzheimer's Disease Neuroimaging Initiative (ADNI-1) and assessed several sources of bias that can arise when tracking brain changes with structural brain imaging methods, as part of a pipeline for tensor-based morphometry (TBM). In all healthy subjects who completed MRI scanning at screening, 6, 12, and 24 months, brain atrophy was essentially linear with no detectable bias in longitudinal measures. In power analyses for clinical trials based on these change measures, only 39 AD patients and 95 mild cognitive impairment (MCI) subjects were needed for a 24-month trial to detect a 25% reduction in the average rate of change using a two-sided test (α=0.05, power=80%). Further sample size reductions were achieved by stratifying the data into Apolipoprotein E (ApoE) ε4 carriers versus non-carriers. We show how selective data exclusion affects sample size estimates, motivating an objective comparison of different analysis techniques based on statistical power and robustness. TBM is an unbiased, robust, high-throughput imaging surrogate marker for large, multi-site neuroimaging studies and clinical trials of AD and MCI. PMID:23153970
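    The record does not give the formula behind these estimates, but per-arm sample sizes of this kind are commonly computed from the mean annual change μ and its standard deviation σ as

    $$n = \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{(0.25\,\mu)^{2}}, \qquad \alpha = 0.05,\ 1-\beta = 0.80,$$

    where 0.25 μ is the targeted 25% reduction in the average rate of change.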

  2. PIXE–PIGE analysis of size-segregated aerosol samples from remote areas

    Energy Technology Data Exchange (ETDEWEB)

    Calzolai, G., E-mail: calzolai@fi.infn.it [Department of Physics and Astronomy, University of Florence and National Institute of Nuclear Physics (INFN), Via G. Sansone 1, 50019 Sesto Fiorentino (Italy); Chiari, M.; Lucarelli, F.; Nava, S.; Taccetti, F. [Department of Physics and Astronomy, University of Florence and National Institute of Nuclear Physics (INFN), Via G. Sansone 1, 50019 Sesto Fiorentino (Italy); Becagli, S.; Frosini, D.; Traversi, R.; Udisti, R. [Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino (Italy)

    2014-01-01

    The chemical characterization of size-segregated samples is helpful to study the aerosol effects on both human health and environment. The sampling with multi-stage cascade impactors (e.g., Small Deposit area Impactor, SDI) produces inhomogeneous samples, with a multi-spot geometry and a non-negligible particle stratification. At LABEC (Laboratory of nuclear techniques for the Environment and the Cultural Heritage), an external beam line is fully dedicated to PIXE–PIGE analysis of aerosol samples. PIGE is routinely used as a sidekick of PIXE to correct the underestimation of PIXE in quantifying the concentration of the lightest detectable elements, like Na or Al, due to X-ray absorption inside the individual aerosol particles. In this work PIGE has been used to study proper attenuation correction factors for SDI samples: relevant attenuation effects have been observed also for stages collecting smaller particles, and consequent implications on the retrieved aerosol modal structure have been evidenced.

  3. The one-sample PARAFAC approach reveals molecular size distributions of fluorescent components in dissolved organic matter

    DEFF Research Database (Denmark)

    Wünsch, Urban; Murphy, Kathleen R.; Stedmon, Colin

    2017-01-01

    Molecular size plays an important role in dissolved organic matter (DOM) biogeochemistry, but its relationship with the fluorescent fraction of DOM (FDOM) remains poorly resolved. Here high-performance size exclusion chromatography (HPSEC) was coupled to fluorescence emission-excitation (EEM) ... but not their spectral properties. Thus, in contrast to absorption measurements, bulk fluorescence is unlikely to reliably indicate the average molecular size of DOM. The one-sample approach enables robust and independent cross-site comparisons without large-scale sampling efforts and introduces new analytical opportunities for elucidating the origins and biogeochemical properties of FDOM...

  4. Algorithm for computing significance levels using the Kolmogorov-Smirnov statistic and valid for both large and small samples

    Energy Technology Data Exchange (ETDEWEB)

    Kurtz, S.E.; Fields, D.E.

    1983-10-01

    The KSTEST code presented here is designed to perform the Kolmogorov-Smirnov one-sample test. The code may be used as a stand-alone program or the principal subroutines may be excerpted and used to service other programs. The Kolmogorov-Smirnov one-sample test is a nonparametric goodness-of-fit test. A number of codes to perform this test are in existence, but they suffer from the inability to provide meaningful results in the case of small sample sizes (number of values less than or equal to 80). The KSTEST code overcomes this inadequacy by using two distinct algorithms. If the sample size is greater than 80, an asymptotic series developed by Smirnov is evaluated. If the sample size is 80 or less, a table of values generated by Birnbaum is referenced. Valid results can be obtained from KSTEST when the sample contains from 3 to 300 data points. The program was developed on a Digital Equipment Corporation PDP-10 computer using the FORTRAN-10 language. The code size is approximately 450 card images and the typical CPU execution time is 0.19 s.
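    A small Python sketch of the same two-branch logic, assuming the asymptotic branch uses the Smirnov series with λ = √n·D; the small-sample branch here simply defers to SciPy's exact method rather than Birnbaum's tables:

```python
import numpy as np
from scipy import stats

def ks_one_sample(data, cdf):
    """One-sample Kolmogorov-Smirnov test: D statistic plus a p-value from the
    Smirnov asymptotic series for large n.  For n <= 80 the KSTEST code consults
    Birnbaum's exact tables; here we defer to SciPy's small-sample handling."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    f = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    d = max(d_plus, d_minus)
    if n > 80:
        lam = np.sqrt(n) * d
        k = np.arange(1, 101)
        p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * (k * lam) ** 2))
        return d, float(min(max(p, 0.0), 1.0))
    return d, stats.kstest(x, cdf).pvalue

# Example: test 200 uniform variates against the uniform CDF
rng = np.random.default_rng(42)
print(ks_one_sample(rng.uniform(size=200), lambda t: t))
```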

  5. A NEW STATISTICAL PERSPECTIVE TO THE COSMIC VOID DISTRIBUTION

    International Nuclear Information System (INIS)

    Pycke, J-R; Russell, E.

    2016-01-01

    In this study, we obtain the size distribution of voids as a three-parameter redshift-independent log-normal void probability function (VPF) directly from the Cosmic Void Catalog (CVC). Although many statistical models of void distributions are based on the counts in randomly placed cells, the log-normal VPF that we obtain here is independent of the shape of the voids due to the parameter-free void finder of the CVC. We use three void populations drawn from the CVC generated by the Halo Occupation Distribution (HOD) Mocks, which are tuned to three mock SDSS samples to investigate the void distribution statistically and to investigate the effects of the environments on the size distribution. As a result, it is shown that void size distributions obtained from the HOD Mock samples are satisfied by the three-parameter log-normal distribution. In addition, we find that there may be a relation between the hierarchical formation, skewness, and kurtosis of the log-normal distribution for each catalog. We also show that the shape of the three-parameter distribution from the samples is strikingly similar to the galaxy log-normal mass distribution obtained from numerical studies. This similarity between void size and galaxy mass distributions may possibly indicate evidence of nonlinear mechanisms affecting both voids and galaxies, such as large-scale accretion and tidal effects. Considering the fact that in this study, all voids are generated by galaxy mocks and show hierarchical structures in different levels, it may be possible that the same nonlinear mechanisms of mass distribution affect the void size distribution.

  6. 14CO2 analysis of soil gas: Evaluation of sample size limits and sampling devices

    Science.gov (United States)

    Wotte, Anja; Wischhöfer, Philipp; Wacker, Lukas; Rethemeyer, Janet

    2017-12-01

    Radiocarbon (14C) analysis of CO2 respired from soils or sediments is a valuable tool to identify different carbon sources. The collection and processing of the CO2, however, is challenging and prone to contamination. We thus continuously improve our handling procedures and present a refined method for the collection of even small amounts of CO2 in molecular sieve cartridges (MSCs) for accelerator mass spectrometry 14C analysis. Using a modified vacuum rig and an improved desorption procedure, we were able to increase the CO2 recovery from the MSC (95%) as well as the sample throughput compared to our previous study. By processing series of different sample sizes, we show that our MSCs can be used for CO2 samples as small as 50 μg C. The contamination by exogenous carbon determined in these laboratory tests was less than 2.0 μg C from fossil and less than 3.0 μg C from modern sources. Additionally, we tested two sampling devices for the collection of CO2 samples released from soils or sediments, including a respiration chamber and a depth sampler, which are connected to the MSC. We obtained a very promising, low process blank for the entire CO2 sampling and purification procedure of ∼0.004 F14C (equal to 44,000 yrs BP) and ∼0.003 F14C (equal to 47,000 yrs BP). In contrast to previous studies, we observed no isotopic fractionation towards lighter δ13C values during the passive sampling with the depth samplers.

  7. Statistical Inference on Stochastic Dominance Efficiency. Do Omitted Risk Factors Explain the Size and Book-to-Market Effects?

    NARCIS (Netherlands)

    G.T. Post (Thierry)

    2003-01-01

    textabstractThis paper discusses statistical inference on the second-order stochastic dominance (SSD) efficiency of a given portfolio relative to all portfolios formed from a set of assets. We derive the asymptotic sampling distribution of the Post test statistic for SSD efficiency. Unfortunately, a

  8. The attention-weighted sample-size model of visual short-term memory: Attention capture predicts resource allocation and memory load.

    Science.gov (United States)

    Smith, Philip L; Lilburn, Simon D; Corbett, Elaine A; Sewell, David K; Kyllingsbæk, Søren

    2016-09-01

    We investigated the capacity of visual short-term memory (VSTM) in a phase discrimination task that required judgments about the configural relations between pairs of black and white features. Sewell et al. (2014) previously showed that VSTM capacity in an orientation discrimination task was well described by a sample-size model, which views VSTM as a resource comprised of a finite number of noisy stimulus samples. The model predicts the invariance of the sum of squared sensitivities across items for displays of different sizes. For phase discrimination, the set-size effect significantly exceeded that predicted by the sample-size model for both simultaneously and sequentially presented stimuli. Instead, the set-size effect and the serial position curves with sequential presentation were predicted by an attention-weighted version of the sample-size model, which assumes that one of the items in the display captures attention and receives a disproportionate share of resources. The choice probabilities and response time distributions from the task were well described by a diffusion decision model in which the drift rates embodied the assumptions of the attention-weighted sample-size model. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
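    Stated compactly, the sample-size model's prediction described above amounts to the following (with n_i the number of samples allocated to item i out of a fixed pool of N; writing the allocation step explicitly is an assumption made here for illustration):

    $$d_i'^{\,2} \propto n_i, \qquad \sum_{i=1}^{m} n_i = N \;\Rightarrow\; \sum_{i=1}^{m} d_i'^{\,2}\ \text{is constant across display sizes } m.$$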

  9. Statistical sampling plan for the TRU waste assay facility

    International Nuclear Information System (INIS)

    Beauchamp, J.J.; Wright, T.; Schultz, F.J.; Haff, K.; Monroe, R.J.

    1983-08-01

    Due to limited space, there is a need to dispose appropriately of the Oak Ridge National Laboratory transuranic waste which is presently stored below ground in 55-gal (208-l) drums within weather-resistant structures. Waste containing less than 100 nCi/g transuranics can be removed from the present storage and be buried, while waste containing greater than 100 nCi/g transuranics must continue to be retrievably stored. To make the measurements needed to determine which drums can be buried, a transuranic Neutron Interrogation Assay System (NIAS) has been developed at Los Alamos National Laboratory; it can make the needed measurements much faster than previous techniques, which involved γ-ray spectroscopy. The previous techniques are reliable but time consuming. Therefore, a validation study has been planned to determine the ability of the NIAS to make adequate measurements. The validation of the NIAS will be based on a paired comparison of a sample of measurements made by the previous techniques and the NIAS. The purpose of this report is to describe the proposed sampling plan and the statistical analyses needed to validate the NIAS. 5 references, 4 figures, 5 tables

  10. Statistical methods for detecting differentially abundant features in clinical metagenomic samples.

    Directory of Open Access Journals (Sweden)

    James Robert White

    2009-04-01

    Full Text Available Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software
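    An illustrative sketch in the spirit of the method described above: sparsely sampled features are handled with Fisher's exact test on pooled counts, the remaining features with a rank-based two-sample test (the published method uses a permutation-based t statistic instead), and discoveries are declared with a Benjamini-Hochberg FDR step:

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of discoveries at FDR level q (Benjamini-Hochberg step-up)."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(len(p), dtype=bool)
    mask[order[:k]] = True
    return mask

def compare_features(counts_a, counts_b, sparse_cutoff=1):
    """Per-feature p-values for two groups of metagenomic count data.
    Rows = samples, columns = features.  Sparse features (few total counts)
    go to Fisher's exact test on pooled counts; the rest to a rank-sum test."""
    pvals = []
    for j in range(counts_a.shape[1]):
        a, b = counts_a[:, j], counts_b[:, j]
        if a.sum() <= sparse_cutoff or b.sum() <= sparse_cutoff:
            table = [[a.sum(), counts_a.sum() - a.sum()],
                     [b.sum(), counts_b.sum() - b.sum()]]
            pvals.append(stats.fisher_exact(table)[1])
        else:
            pvals.append(stats.ranksums(a, b).pvalue)
    return np.array(pvals)

# Toy example: 8 samples per group, 5 features (counts)
rng = np.random.default_rng(11)
a = rng.poisson([50, 5, 0.2, 30, 10], size=(8, 5))
b = rng.poisson([50, 15, 0.2, 30, 40], size=(8, 5))
p = compare_features(a, b)
print(p, benjamini_hochberg(p, q=0.05))
```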

  11. Exact association test for small size sequencing data.

    Science.gov (United States)

    Lee, Joowon; Lee, Seungyeoun; Jang, Jin-Young; Park, Taesung

    2018-04-20

    Recent statistical methods for next generation sequencing (NGS) data have been successfully applied to identifying rare genetic variants associated with certain diseases. However, most commonly used methods (e.g., burden tests and variance-component tests) rely on large sample sizes. Notwithstanding, due to its still high cost, NGS data is generally restricted to small sample sizes that cannot be analyzed by most existing methods. In this work, we propose a new exact association test for sequencing data that does not require a large sample approximation and that is applicable to both common and rare variants. Our method, based on the Generalized Cochran-Mantel-Haenszel (GCMH) statistic, was applied to NGS datasets from intraductal papillary mucinous neoplasm (IPMN) patients. IPMN is a unique pancreatic cancer subtype that can turn into an invasive and hard-to-treat metastatic disease. Application of our method to the IPMN data successfully identified susceptible genes associated with progression of IPMN to pancreatic cancer. Our method is expected to identify disease-associated genetic variants and the corresponding signaling pathways more successfully, improving our understanding of the etiology and prognosis of specific diseases.

  12. A basic introduction to statistics for the orthopaedic surgeon.

    Science.gov (United States)

    Bertrand, Catherine; Van Riet, Roger; Verstreken, Frederik; Michielsen, Jef

    2012-02-01

    Orthopaedic surgeons should review the orthopaedic literature in order to keep pace with the latest insights and practices. A good understanding of basic statistical principles is of crucial importance to the ability to read articles critically, to interpret results and to arrive at correct conclusions. This paper explains some of the key concepts in statistics, including hypothesis testing, Type I and Type II errors, testing of normality, sample size and p values.

  13. Statistical and quantitative research

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

    Environmental impacts may escape detection if the statistical tests used to analyze data from field studies are inadequate or the field design is not appropriate. To alleviate this problem, PNL scientists are doing theoretical research which will provide the basis for new sampling schemes or better methods to analyze and present data. Such efforts have resulted in recommendations about the optimal size of study plots, sampling intensity, field replication, and program duration. Costs associated with any of these factors can be substantial if, for example, attention is not paid to the adequacy of a sampling scheme. In the study of dynamics of large-mammal populations, the findings are sometimes surprising. For example, the survival of a grizzly bear population may hinge on the loss of one or two adult females per year

  14. A note on power and sample size calculations for the Kruskal-Wallis test for ordered categorical data.

    Science.gov (United States)

    Fan, Chunpeng; Zhang, Donghui

    2012-01-01

    Although the Kruskal-Wallis test has been widely used to analyze ordered categorical data, power and sample size methods for this test have been investigated to a much lesser extent when the underlying multinomial distributions are unknown. This article generalizes the power and sample size procedures proposed by Fan et al. (2011) for continuous data to ordered categorical data, when estimates from a pilot study are used in the place of knowledge of the true underlying distribution. Simulations show that the proposed power and sample size formulas perform well. A myelin oligodendrocyte glycoprotein (MOG)-induced experimental autoimmune encephalomyelitis (EAE) mouse study is used to demonstrate the application of the methods.

  15. Gridsampler – A Simulation Tool to Determine the Required Sample Size for Repertory Grid Studies

    Directory of Open Access Journals (Sweden)

    Mark Heckmann

    2017-01-01

    Full Text Available The repertory grid is a psychological data collection technique that is used to elicit qualitative data in the form of attributes as well as quantitative ratings. A common approach for evaluating multiple repertory grid data is sorting the elicited bipolar attributes (so-called constructs) into mutually exclusive categories by means of content analysis. An important question when planning this type of study is determining the sample size needed to (a) discover all attribute categories relevant to the field and (b) yield a predefined minimal number of attributes per category. For most applied researchers who collect multiple repertory grid data, programming a numeric simulation to answer these questions is not feasible. The gridsampler software facilitates determining the required sample size by providing a GUI for conducting the necessary numerical simulations. Researchers can supply a set of parameters suitable for the specific research situation, determine the required sample size, and easily explore the effects of changes in the parameter set.

  16. Anomalies in the detection of change: When changes in sample size are mistaken for changes in proportions.

    Science.gov (United States)

    Fiedler, Klaus; Kareev, Yaakov; Avrahami, Judith; Beier, Susanne; Kutzner, Florian; Hütter, Mandy

    2016-01-01

    Detecting changes, in performance, sales, markets, risks, social relations, or public opinions, constitutes an important adaptive function. In a sequential paradigm devised to investigate detection of change, every trial provides a sample of binary outcomes (e.g., correct vs. incorrect student responses). Participants have to decide whether the proportion of a focal feature (e.g., correct responses) in the population from which the sample is drawn has decreased, remained constant, or increased. Strong and persistent anomalies in change detection arise when changes in proportional quantities vary orthogonally to changes in absolute sample size. Proportional increases are readily detected and nonchanges are erroneously perceived as increases when absolute sample size increases. Conversely, decreasing sample size facilitates the correct detection of proportional decreases and the erroneous perception of nonchanges as decreases. These anomalies are however confined to experienced samples of elementary raw events from which proportions have to be inferred inductively. They disappear when sample proportions are described as percentages in a normalized probability format. To explain these challenging findings, it is essential to understand the inductive-learning constraints imposed on decisions from experience.

  17. On sample size of the Kruskal-Wallis test with application to a mouse peritoneal cavity study.

    Science.gov (United States)

    Fan, Chunpeng; Zhang, Donghui; Zhang, Cun-Hui

    2011-03-01

    As the nonparametric generalization of the one-way analysis of variance model, the Kruskal-Wallis test applies when the goal is to test the difference between multiple samples and the underlying population distributions are nonnormal or unknown. Although the Kruskal-Wallis test has been widely used for data analysis, power and sample size methods for this test have been investigated to a much lesser extent. This article proposes new power and sample size calculation methods for the Kruskal-Wallis test based on the pilot study in either a completely nonparametric model or a semiparametric location model. No assumption is made on the shape of the underlying population distributions. Simulation results show that, in terms of sample size calculation for the Kruskal-Wallis test, the proposed methods are more reliable and preferable to some more traditional methods. A mouse peritoneal cavity study is used to demonstrate the application of the methods. © 2010, The International Biometric Society.

  18. A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data.

    Science.gov (United States)

    Tang, Liansheng Larry; Balakrishnan, N

    2011-01-01

    The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, while in the latter the violation is due to some subjects not having observable measurements. For this reason, we propose here a random-sum Wilcoxon statistic for comparing two groups in the presence of ties, and derive its variance as well as its asymptotic distribution for large sample sizes. The proposed statistic includes the regular Wilcoxon rank-sum statistic. Finally, we apply the proposed statistic for summarizing location response operating characteristic data from a liver computed tomography study, and also for summarizing diagnostic accuracy of biomarker data.
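
    The random-sum generalization is specific to the article, but the fixed-sample-size Wilcoxon-Mann-Whitney statistic it extends, and its familiar reading as the area under the ROC curve, can be illustrated directly. The marker distributions and group sizes below are made up; the sketch assumes scipy is available.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)
diseased = rng.normal(1.0, 1.0, size=60)   # marker values, diseased group
healthy = rng.normal(0.0, 1.0, size=80)    # marker values, healthy group

u, pval = mannwhitneyu(diseased, healthy, alternative="two-sided")
auc = u / (len(diseased) * len(healthy))   # U/(n1*n2) estimates P(marker_diseased > marker_healthy)
print(f"U = {u:.1f}, AUC estimate = {auc:.3f}, p = {pval:.4g}")
```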

  19. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.

    Science.gov (United States)

    Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin

    2017-06-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as a common approach, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https
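
    The T2GA package is the authoritative implementation; the following is only a generic sketch of a Hotelling-type statistic in which the covariance matrix is supplied from external knowledge (for instance, built from interaction confidence scores) instead of being estimated from a handful of replicates. All names, the toy covariance and the chi-square reference distribution are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np
from scipy.stats import chi2

def knowledge_based_t2(log_ratios, sigma):
    """Test a pathway of p proteins under a self-contained null (mean log-ratio = 0).

    log_ratios: (n_samples, p) matrix of observed log expression ratios.
    sigma:      (p, p) positive-definite covariance built from external knowledge,
                not from the sample itself.
    """
    x = np.asarray(log_ratios, dtype=float)
    n, p = x.shape
    xbar = x.mean(axis=0)
    t2 = n * xbar @ np.linalg.solve(sigma, xbar)   # n * xbar' Sigma^-1 xbar
    pval = chi2.sf(t2, df=p)                       # Sigma treated as known
    return t2, pval

# toy example: 3 replicates, 4 proteins, mildly correlated "knowledge" covariance
sigma = 0.5 * np.eye(4) + np.full((4, 4), 0.1)
data = np.array([[0.8, 0.6, 0.1, 0.7],
                 [1.1, 0.4, 0.0, 0.9],
                 [0.9, 0.7, 0.2, 0.6]])
print(knowledge_based_t2(data, sigma))
```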

  20. Statistical assessment of fish behavior from split-beam hydro-acoustic sampling

    International Nuclear Information System (INIS)

    McKinstry, Craig A.; Simmons, Mary Ann; Simmons, Carver S.; Johnson, Robert L.

    2005-01-01

    Statistical methods are presented for using echo-traces from split-beam hydro-acoustic sampling to assess fish behavior in response to a stimulus. The data presented are from a study designed to assess the response of free-ranging, lake-resident fish, primarily kokanee (Oncorhynchus nerka) and rainbow trout (Oncorhynchus mykiss) to high intensity strobe lights, and was conducted at Grand Coulee Dam on the Columbia River in Northern Washington State. The lights were deployed immediately upstream from the turbine intakes, in a region exposed to daily alternating periods of high and low flows. The study design included five down-looking split-beam transducers positioned in a line at incremental distances upstream from the strobe lights, and treatments applied in randomized pseudo-replicate blocks. Statistical methods included the use of odds-ratios from fitted loglinear models. Fish-track velocity vectors were modeled using circular probability distributions. Both analyses are depicted graphically. Study results suggest large increases of fish activity in the presence of the strobe lights, most notably at night and during periods of low flow. The lights also induced notable bimodality in the angular distributions of the fish track velocity vectors. Statistical summaries are presented along with interpretations on fish behavior

  1. Statistical inference involving binomial and negative binomial parameters.

    Science.gov (United States)

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2009-05-01

    Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.

  2. Scalability on LHS (Latin Hypercube Sampling) samples for use in uncertainty analysis of large numerical models

    International Nuclear Information System (INIS)

    Baron, Jorge H.; Nunez Mac Leod, J.E.

    2000-01-01

    The present paper deals with the utilization of advanced sampling statistical methods to perform uncertainty and sensitivity analysis on numerical models. Such models may represent physical phenomena, logical structures (such as boolean expressions) or other systems, and several of their intrinsic parameters and/or input variables are usually treated as random variables simultaneously. In the present paper a simple method to scale up Latin Hypercube Sampling (LHS) samples is presented, starting with a small sample and duplicating its size at each step, making it possible to use the already run numerical model results with the smaller sample. The method does not distort the statistical properties of the random variables and does not add any bias to the samples. The result is that a significant reduction in numerical model running time can be achieved (by re-using the previously run samples), keeping all the advantages of LHS, until an acceptable representation level is achieved in the output variables. (author)
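
    A minimal sketch of this kind of scale-up, assuming the design lives in the unit hypercube and is later mapped onto the actual input distributions. The doubling rule below (fill the strata left empty by the existing points) is one straightforward way to realize the idea described in the abstract; the function names are illustrative and only numpy is required.

```python
import numpy as np

def lhs(n, d, rng):
    """Plain Latin Hypercube Sample of n points in [0, 1]^d."""
    u = rng.random((n, d))
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + u) / n

def double_lhs(points, rng):
    """Scale an n-point LHS up to 2n points, re-using the existing points."""
    n, d = points.shape
    new = np.empty((n, d))
    for j in range(d):
        occupied = np.floor(points[:, j] * 2 * n).astype(int)   # strata already holding a point
        empty = rng.permutation(np.setdiff1d(np.arange(2 * n), occupied))
        new[:, j] = (empty + rng.random(n)) / (2 * n)            # one new point per empty stratum
    return np.vstack([points, new])

rng = np.random.default_rng(0)
design = lhs(8, 3, rng)           # small initial design: 8 model runs
design = double_lhs(design, rng)  # 16 runs, the first 8 results can be re-used
design = double_lhs(design, rng)  # 32 runs, and so on until the output stabilizes
```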

  3. An easy and low cost option for economic statistical process control ...

    African Journals Online (AJOL)

    a large number of nonconforming products are manufactured. ... size, n, sampling interval, h, and control limit parameter, k, that minimize the ...... [11] Montgomery DC, 2001, Introduction to statistical quality control, 4th Edition, John Wiley, New.

  4. Selecting the optimum plot size for a California design-based stream and wetland mapping program.

    Science.gov (United States)

    Lackey, Leila G; Stein, Eric D

    2014-04-01

    Accurate estimates of the extent and distribution of wetlands and streams are the foundation of wetland monitoring, management, restoration, and regulatory programs. Traditionally, these estimates have relied on comprehensive mapping. However, this approach is prohibitively resource-intensive over large areas, making it both impractical and statistically unreliable. Probabilistic (design-based) approaches to evaluating status and trends provide a more cost-effective alternative because, compared with comprehensive mapping, overall extent is inferred from mapping a statistically representative, randomly selected subset of the target area. In this type of design, the size of sample plots has a significant impact on program costs and on statistical precision and accuracy; however, no consensus exists on the appropriate plot size for remote monitoring of stream and wetland extent. This study utilized simulated sampling to assess the performance of four plot sizes (1, 4, 9, and 16 km²) for three geographic regions of California. Simulation results showed smaller plot sizes (1 and 4 km²) were most efficient for achieving desired levels of statistical accuracy and precision. However, larger plot sizes were more likely to contain rare and spatially limited wetland subtypes. Balancing these considerations led to selection of 4 km² for the California status and trends program.

  5. Use of Statistics for Data Evaluation in Environmental Radioactivity Measurements

    International Nuclear Information System (INIS)

    Sutarman

    2001-01-01

    Counting statistics provide a correction to environmental radioactivity measurement results. Statistics provides formulas to determine the standard deviation (S_B) and minimum detectable concentration (MDC) according to the Poisson distribution. Both formulas depend on the background count rate, counting time, counting efficiency, gamma intensity, and sample size. A long background counting time results in relatively low S_B and MDC, which can give relatively accurate measurement results. (author)
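
    The abstract does not spell the formulas out. The sketch below uses one widely cited convention (Currie's detection limit, L_D = 2.71 + 4.65*sqrt(B)) together with the conversion factors named above; treat the exact expression and the example numbers as assumptions, since laboratories differ in the convention they adopt.

```python
import math

def mdc_bq_per_kg(background_counts, count_time_s, efficiency,
                  gamma_intensity, sample_mass_kg):
    """Minimum detectable concentration (Bq/kg), Currie-style detection limit in counts."""
    ld_counts = 2.71 + 4.65 * math.sqrt(background_counts)
    return ld_counts / (count_time_s * efficiency * gamma_intensity * sample_mass_kg)

def net_rate_sd(gross_counts, background_counts, count_time_s):
    """Standard deviation of the net count rate when sample and background times are equal."""
    return math.sqrt(gross_counts + background_counts) / count_time_s

# example: 80000 s count, 3% full-energy peak efficiency, gamma intensity 0.85, 0.5 kg sample
print(mdc_bq_per_kg(background_counts=1200, count_time_s=80000,
                    efficiency=0.03, gamma_intensity=0.85, sample_mass_kg=0.5))
```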

  6. Some challenges with statistical inference in adaptive designs.

    Science.gov (United States)

    Hung, H M James; Wang, Sue-Jane; Yang, Peiling

    2014-01-01

    Adaptive designs have generated a great deal of attention to clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.

  7. Statistical methodology for discrete fracture model - including fracture size, orientation uncertainty together with intensity uncertainty and variability

    Energy Technology Data Exchange (ETDEWEB)

    Darcel, C. (Itasca Consultants SAS (France)); Davy, P.; Le Goc, R.; Dreuzy, J.R. de; Bour, O. (Geosciences Rennes, UMR 6118 CNRS, Univ. de Rennes, Rennes (France))

    2009-11-15

    Investigations led for several years at Laxemar and Forsmark reveal the large heterogeneity of geological formations and associated fracturing. This project aims at reinforcing the statistical DFN modeling framework adapted to a site scale. This therefore leads to the development of quantitative methods of characterization adapted to the nature of fracturing and data availability. We start with the hypothesis that the maximum likelihood DFN model is a power-law model with a density term depending on orientations. This is supported both by literature and specifically here by former analyses of the SKB data. This assumption is nevertheless thoroughly tested by analyzing the fracture trace and lineament maps. Fracture traces range roughly between 0.5 m and 10 m, i.e. the usual extension of the sample outcrops. Between the raw data and final data used to compute the fracture size distribution from which the size distribution model will arise, several steps are necessary, in order to correct data from finite-size, topographical and sampling effects. More precisely, particular attention is paid to fracture segmentation status and fracture linkage consistent with the DFN model expected. The fracture scaling trend observed over both sites finally displays a shape parameter k_t close to 1.2 with a density term (alpha_2d) between 1.4 and 1.8. Only two outcrops clearly display a different trend with k_t close to 3 and a density term (alpha_2d) between 2 and 3.5. The fracture lineaments spread over the range between 100 meters and a few kilometers. When compared with fracture trace maps, these datasets are already interpreted and the linkage process developed previously does not need to be repeated. Except for the subregional lineament map from Forsmark, lineaments display a clear power-law trend with a shape parameter k_t equal to 3 and a density term between 2 and 4.5. The apparent variation in scaling exponent, from the outcrop scale (k_t = 1.2) on one side, to

  8. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Directory of Open Access Journals (Sweden)

    Alexander P. Kartun-Giles

    2018-04-01

    Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.

  9. Atmospheric aerosol sampling campaign in Budapest and K-puszta. Part 1. Elemental concentrations and size distributions

    International Nuclear Information System (INIS)

    Dobos, E.; Borbely-Kiss, I.; Kertesz, Zs.; Szabo, Gy.; Salma, I.

    2004-01-01

    Complete text of publication follows. Atmospheric aerosol samples were collected in a sampling campaign from 24 July to 1 August, 2003 in Hungary. The sampling was performed at two places simultaneously: in Budapest (urban site) and K-puszta (remote area). Two PIXE International 7-stage cascade impactors were used for aerosol sampling with 24-hour duration. These impactors separate the aerosol into 7 size ranges. The elemental concentrations of the samples were obtained by proton-induced X-ray emission (PIXE) analysis. Size distributions of S, Si, Ca, W, Zn, Pb and Fe elements were investigated in K-puszta and in Budapest. Average rates (shown in Table 1) of the elemental concentrations were calculated for each stage (in %) from the obtained distributions. The elements can be grouped into two parts on the basis of these data. The majority of the particles containing Fe, Si, Ca and (Ti) are in the 2-8 μm size range (first group). These soil-origin elements were usually found in higher concentrations in Budapest than in K-puszta (Fig.1.). The second group consisted of S, Pb and (W). The majority of these elements were found in the 0.25-1 μm size range and concentrations were much higher in Budapest than in K-puszta. W was measured only in samples collected in Budapest. Zn has a uniform distribution in Budapest and does not belong to the above-mentioned groups. This work was supported by the National Research and Development Program (NRDP 3/005/2001). (author)

  10. Size Distributions and Characterization of Native and Ground Samples for Toxicology Studies

    Science.gov (United States)

    McKay, David S.; Cooper, Bonnie L.; Taylor, Larry A.

    2010-01-01

    This slide presentation shows charts and graphs that review the particle size distribution and characterization of natural and ground samples for toxicology studies. There are graphs which show the volume distribution versus the number distribution for natural occurring dust, jet mill ground dust, and ball mill ground dust.

  11. Size Matters: Assessing Optimum Soil Sample Size for Fungal and Bacterial Community Structure Analyses Using High Throughput Sequencing of rRNA Gene Amplicons

    Directory of Open Access Journals (Sweden)

    Christopher Ryan Penton

    2016-06-01

    Full Text Available We examined the effect of different soil sample sizes obtained from an agricultural field, under a single cropping system uniform in soil properties and aboveground crop responses, on bacterial and fungal community structure and microbial diversity indices. DNA extracted from soil sample sizes of 0.25, 1, 5 and 10 g using MoBIO kits and from 10 and 100 g sizes using a bead-beating method (SARDI) were used as templates for high-throughput sequencing of 16S and 28S rRNA gene amplicons for bacteria and fungi, respectively, on the Illumina MiSeq and Roche 454 platforms. Sample size significantly affected overall bacterial and fungal community structure, replicate dispersion and the number of operational taxonomic units (OTUs) retrieved. Richness, evenness and diversity were also significantly affected. The largest diversity estimates were always associated with the 10 g MoBIO extractions with a corresponding reduction in replicate dispersion. For the fungal data, smaller MoBIO extractions identified more unclassified Eukaryota incertae sedis and unclassified Glomeromycota while the SARDI method retrieved more abundant OTUs containing unclassified Pleosporales and the fungal genera Alternaria and Cercophora. Overall, these findings indicate that a 10 g soil DNA extraction is most suitable for both soil bacterial and fungal communities for retrieving optimal diversity while still capturing rarer taxa in concert with decreasing replicate variation.

  12. Experimental toxicology: Issues of statistics, experimental design, and replication.

    Science.gov (United States)

    Briner, Wayne; Kirwan, Jeral

    2017-01-01

    The difficulty of replicating experiments has drawn considerable attention. Issues with replication occur for a variety of reasons ranging from experimental design to laboratory errors to inappropriate statistical analysis. Here we review a variety of guidelines for statistical analysis, design, and execution of experiments in toxicology. In general, replication can be improved by using hypothesis driven experiments with adequate sample sizes, randomization, and blind data collection techniques. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Empirical Statistical Power for Testing Multilocus Genotypic Effects under Unbalanced Designs Using a Gibbs Sampler

    Directory of Open Access Journals (Sweden)

    Chaeyoung Lee

    2012-11-01

    Full Text Available Epistasis that may explain a large portion of the phenotypic variation for complex economic traits of animals has been ignored in many genetic association studies. A Bayesian method was introduced to draw inferences about multilocus genotypic effects based on their marginal posterior distributions by a Gibbs sampler. A simulation study was conducted to provide statistical powers under various unbalanced designs by using this method. Data were simulated by combined designs of number of loci, within-genotype variance, and sample size in unbalanced designs with or without null combined genotype cells. Mean empirical statistical power was estimated for testing the posterior mean estimate of the combined genotype effect. A practical example for obtaining empirical statistical power estimates with a given sample size was provided under unbalanced designs. The empirical statistical powers would be useful for determining an optimal design when interactive associations of multiple loci with complex phenotypes are examined.

  14. Evaluating sampling strategy for DNA barcoding study of coastal and inland halo-tolerant Poaceae and Chenopodiaceae: A case study for increased sample size.

    Directory of Open Access Journals (Sweden)

    Peng-Cheng Yao

    Full Text Available Environmental conditions in coastal salt marsh habitats have led to the development of specialist genetic adaptations. We evaluated six DNA barcode loci of the 53 species of Poaceae and 15 species of Chenopodiaceae from China's coastal salt marsh area and inland area. Our results indicate that the optimum DNA barcode was ITS for coastal salt-tolerant Poaceae and matK for the Chenopodiaceae. Sampling strategies for ten common species of Poaceae and Chenopodiaceae were analyzed according to the optimum barcode. We found that by increasing the number of samples collected from the coastal salt marsh area on the basis of inland samples, the number of haplotypes of Arundinella hirta, Digitaria ciliaris, Eleusine indica, Imperata cylindrica, Setaria viridis, and Chenopodium glaucum increased, with a principal coordinate plot clearly showing increased distribution points. The results of a Mann-Whitney test showed that for Digitaria ciliaris, Eleusine indica, Imperata cylindrica, and Setaria viridis, the distribution of intraspecific genetic distances was significantly different when samples from the coastal salt marsh area were included (P < 0.01). These results suggest that increasing the sample size in specialist habitats can improve measurements of intraspecific genetic diversity, and will have a positive effect on the application of the DNA barcodes in widely distributed species. The results of random sampling showed that when sample size reached 11 for Chloris virgata, Chenopodium glaucum, and Dysphania ambrosioides, 13 for Setaria viridis, and 15 for Eleusine indica, Imperata cylindrica and Chenopodium album, average intraspecific distance tended to reach stability. These results indicate that the sample size for DNA barcoding of globally distributed species should be increased to 11-15.

  15. Sample size matters in dietary gene expression studies—A case study in the gilthead sea bream (Sparus aurata L.)

    Directory of Open Access Journals (Sweden)

    Fotini Kokou

    2016-05-01

    Full Text Available One of the main concerns in gene expression studies is the calculation of statistical significance, which in most cases remains low due to limited sample size. Increasing biological replicates translates into more effective gains in power which, especially in nutritional experiments, is of great importance as individual variation of growth performance parameters and feed conversion is high. The present study investigates this issue in the gilthead sea bream Sparus aurata, one of the most important Mediterranean aquaculture species. For 24 gilthead sea bream individuals (biological replicates) the effects of gradual substitution of fish meal by plant ingredients (0% (control), 25%, 50% and 75%) in the diets were studied by looking at expression levels of four immune- and stress-related genes in intestine, head kidney and liver. The present results showed that only the lowest substitution percentage is tolerated and that liver is the most sensitive tissue to detect gene expression variations in relation to fish meal substituted diets. Additionally, the use of three independent biological replicates was evaluated by calculating the averages of all possible triplets in order to assess the suitability of selected genes for stress indication as well as the impact of the experimental set up, thus in the present work the impact of FM substitution. Gene expression was altered depending on the selected biological triplicate. Only for two genes in liver (hsp70 and tgf) was significant differential expression assured independently of the triplicates used. These results underlined the importance of choosing an adequate sample number, especially when significant but minor differences in gene expression levels are observed. Keywords: Sample size, Gene expression, Fish meal replacement, Immune response, Gilthead sea bream

  16. Determining Sample Size with a Given Range of Mean Effects in One-Way Heteroscedastic Analysis of Variance

    Science.gov (United States)

    Shieh, Gwowen; Jan, Show-Li

    2013-01-01

    The authors examined 2 approaches for determining the required sample size of Welch's test for detecting equality of means when the greatest difference between any 2 group means is given. It is shown that the actual power obtained with the sample size of the suggested approach is consistently at least as great as the nominal power. However, the…

  17. Finite-size fluctuations and photon statistics near the polariton condensation transition in a single-mode microcavity

    International Nuclear Information System (INIS)

    Eastham, P. R.; Littlewood, P. B.

    2006-01-01

    We consider polariton condensation in a generalized Dicke model, describing a single-mode cavity containing quantum dots, and extend our previous mean-field theory to allow for finite-size fluctuations. Within the fluctuation-dominated regime the correlation functions differ from their (trivial) mean-field values. We argue that the low-energy physics of the model, which determines the photon statistics in this fluctuation-dominated crossover regime, is that of the (quantum) anharmonic oscillator. The photon statistics at the crossover are different in the high-temperature and low-temperature limits. When the temperature is high enough for quantum effects to be neglected we recover behavior similar to that of a conventional laser. At low enough temperatures, however, we find qualitatively different behavior due to quantum effects

  18. Statistical Methods in Assembly Quality Management of Multi-Element Products on Automatic Rotor Lines

    Science.gov (United States)

    Pries, V. V.; Proskuriakov, N. E.

    2018-04-01

    To control the assembly quality of multi-element mass-produced products on automatic rotor lines, control methods with operational feedback are required. However, due to possible failures in the operation of the devices and systems of an automatic rotor line, there is always a real probability of defective (incomplete) products getting into the output process stream. Therefore, continuous sampling control of product completeness, based on the use of statistical methods, remains an important element in managing the quality of assembly of multi-element mass products on automatic rotor lines. A feature of continuous sampling control of multi-element product completeness in the assembly process is that the inspection is destructive, which excludes the possibility of returning component parts to the process stream after sampling control and leads to a decrease in the actual productivity of the assembly equipment. Therefore, the use of statistical procedures for continuous sampling control of multi-element product completeness when assembled on automatic rotor lines requires sampling plans that ensure a minimum control sample size. Comparison of the values of the limit of the average output defect level for the continuous sampling plan (CSP) and for the automated continuous sampling plan (ACSP) shows that lower limit values for the average output defect level can be provided using the ACSP-1. Also, the average sample size when using the ACSP-1 plan is smaller than when using the CSP-1 plan. Thus, the application of statistical methods in the assembly quality management of multi-element products on automatic rotor lines, involving the use of the proposed plans and methods for continuous selective control, will make it possible to automate sampling control procedures and to ensure the required level of quality of assembled products while minimizing sample size.

  19. In Situ Sampling of Relative Dust Devil Particle Loads and Their Vertical Grain Size Distributions.

    Science.gov (United States)

    Raack, Jan; Reiss, Dennis; Balme, Matthew R; Taj-Eddine, Kamal; Ori, Gian Gabriele

    2017-04-19

    During a field campaign in the Sahara Desert in southern Morocco, spring 2012, we sampled the vertical grain size distribution of two active dust devils that exhibited different dimensions and intensities. With these in situ samples of grains in the vortices, it was possible to derive detailed vertical grain size distributions and measurements of the lifted relative particle load. Measurements of the two dust devils show that the majority of all lifted particles were only lifted within the first meter (∼46.5% and ∼61% of all particles; ∼76.5 wt % and ∼89 wt % of the relative particle load). Furthermore, ∼69% and ∼82% of all lifted sand grains occurred in the first meter of the dust devils, indicating the occurrence of "sand skirts." Both sampled dust devils were relatively small (∼15 m and ∼4-5 m in diameter) compared to dust devils in surrounding regions; nevertheless, measurements show that ∼58.5% to 73.5% of all lifted particles were small enough to go into suspension (grain size classification). This relatively high amount represents only ∼0.05 to 0.15 wt % of the lifted particle load. Larger dust devils probably entrain larger amounts of fine-grained material into the atmosphere, which can have an influence on the climate. Furthermore, our results indicate that the composition of the surface, on which the dust devils evolved, also had an influence on the particle load composition of the dust devil vortices. The internal particle load structure of both sampled dust devils was comparable with respect to their vertical grain size distribution and relative particle load, although both dust devils differed in their dimensions and intensities. A general trend of decreasing grain sizes with height was also detected. Key Words: Mars-Dust devils-Planetary science-Desert soils-Atmosphere-Grain sizes. Astrobiology 17, xxx-xxx.

  1. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Science.gov (United States)

    Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J

    2017-01-01

    Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical

  3. Regional statistical and economic analysis of small and medium-sized businesses development in Zhytomyr region

    Directory of Open Access Journals (Sweden)

    S.I. Pavlova

    2017-08-01

    Full Text Available Small and medium-sized businesses play an important role in the development of the regional economic system in particular and in solving a number of the following local problems: developing competition, developing the market for goods and services, providing jobs for the able-bodied population, raising living standards and improving the social environment in society. The purpose of this paper is to analyze the state and development of small and medium-sized businesses in the Zhytomyr region, to analyze their contribution to the economic development of the region, and to identify the main problems existing in the region. According to the indicators of state statistics, the author presents the general characteristics of enterprises in the Zhytomyr region from 2012 to 2016 in the context of indicators of the number of enterprises, the number of employed workers and the volume of the products sold, highlighting the activities of small enterprises and assessing their share in general levels. In addition, the paper provides a description of the activities of individual entrepreneurs. A structural comparison for the above-listed indicators of the distribution of influence on the economic system of the Zhytomyr region, in terms of enterprises by size, is presented. In terms of quantity, 93.5% are small enterprises, which provide jobs for 31.4% of the total number of employees and make up 23.1% of the total volume of sales. Medium-sized enterprises account for 6.4%, 62.0% and 54.8% of these indicators, respectively. The statistical and economic analysis of the structure of small enterprises by types of economic activity, by indicators of the number of registered enterprises, and by the volumes of sold products is carried out. The uniformity of the distribution is estimated using the index of the concentration coefficient. The indicators of revenues to budgets of different levels from small and medium-sized businesses are determined. The paper presents and summarizes the

  4. Samples in applied psychology: over a decade of research in review.

    Science.gov (United States)

    Shen, Winny; Kiger, Thomas B; Davies, Stacy E; Rasch, Rena L; Simon, Kara M; Ones, Deniz S

    2011-09-01

    This study examines sample characteristics of articles published in Journal of Applied Psychology (JAP) from 1995 to 2008. At the individual level, the overall median sample size over the period examined was approximately 173, which is generally adequate for detecting the average magnitude of effects of primary interest to researchers who publish in JAP. Samples using higher units of analyses (e.g., teams, departments/work units, and organizations) had lower median sample sizes (Mdn ≈ 65), yet were arguably robust given typical multilevel design choices of JAP authors despite the practical constraints of collecting data at higher units of analysis. A substantial proportion of studies used student samples (~40%); surprisingly, median sample sizes for student samples were smaller than working adult samples. Samples were more commonly occupationally homogeneous (~70%) than occupationally heterogeneous. U.S. and English-speaking participants made up the vast majority of samples, whereas Middle Eastern, African, and Latin American samples were largely unrepresented. On the basis of study results, recommendations are provided for authors, editors, and readers, which converge on 3 themes: (a) appropriateness and match between sample characteristics and research questions, (b) careful consideration of statistical power, and (c) the increased popularity of quantitative synthesis. Implications are discussed in terms of theory building, generalizability of research findings, and statistical power to detect effects. PsycINFO Database Record (c) 2011 APA, all rights reserved

  5. Evaluating the performance of species richness estimators: sensitivity to sample grain size

    DEFF Research Database (Denmark)

    Hortal, Joaquín; Borges, Paulo A. V.; Gaspar, Clara

    2006-01-01

    and several recent estimators [proposed by Rosenzweig et al. (Conservation Biology, 2003, 17, 864-874) and Ugland et al. (Journal of Animal Ecology, 2003, 72, 888-897)] performed poorly. 3. Estimations developed using the smaller grain sizes (pair of traps, traps, records and individuals) presented similar… Data obtained with standardized sampling of 78 transects in natural forest remnants of five islands were aggregated in seven different grains (i.e. ways of defining a single sample): islands, natural areas, transects, pairs of traps, traps, database records and individuals to assess the effect of using

  6. Considerations for Sample Preparation Using Size-Exclusion Chromatography for Home and Synchrotron Sources.

    Science.gov (United States)

    Rambo, Robert P

    2017-01-01

    The success of a SAXS experiment for structural investigations depends on two precise measurements, the sample and the buffer background. Buffer matching between the sample and background can be achieved using dialysis methods but in biological SAXS of monodisperse systems, sample preparation is routinely being performed with size exclusion chromatography (SEC). SEC is the most reliable method for SAXS sample preparation as the method not only purifies the sample for SAXS but also almost guarantees ideal buffer matching. Here, I will highlight the use of SEC for SAXS sample preparation and demonstrate using example proteins that SEC purification does not always provide for ideal samples. Scrutiny of the SEC elution peak using quasi-elastic and multi-angle light scattering techniques can reveal hidden features (heterogeneity) of the sample that should be considered during SAXS data analysis. In some cases, sample heterogeneity can be controlled using a small molecule additive and I outline a simple additive screening method for sample preparation.

  7. Quality of statistical reporting in developmental disability journals.

    Science.gov (United States)

    Namasivayam, Aravind K; Yan, Tina; Wong, Wing Yiu Stephanie; van Lieshout, Pascal

    2015-12-01

    Null hypothesis significance testing (NHST) dominates quantitative data analysis, but its use is controversial and has been heavily criticized. The American Psychological Association has advocated the reporting of effect sizes (ES), confidence intervals (CIs), and statistical power analysis to complement NHST results to provide a more comprehensive understanding of research findings. The aim of this paper is to carry out a sample survey of statistical reporting practices in two journals with the highest h5-index scores in the areas of developmental disability and rehabilitation. Using a checklist that includes critical recommendations by American Psychological Association, we examined 100 randomly selected articles out of 456 articles reporting inferential statistics in the year 2013 in the Journal of Autism and Developmental Disorders (JADD) and Research in Developmental Disabilities (RDD). The results showed that for both journals, ES were reported only half the time (JADD 59.3%; RDD 55.87%). These findings are similar to psychology journals, but are in stark contrast to ES reporting in educational journals (73%). Furthermore, a priori power and sample size determination (JADD 10%; RDD 6%), along with reporting and interpreting precision measures (CI: JADD 13.33%; RDD 16.67%), were the least reported metrics in these journals, but not dissimilar to journals in other disciplines. To advance the science in developmental disability and rehabilitation and to bridge the research-to-practice divide, reforms in statistical reporting, such as providing supplemental measures to NHST, are clearly needed.

  8. The study of the sample size on the transverse magnetoresistance of bismuth nanowires

    International Nuclear Information System (INIS)

    Zare, M.; Layeghnejad, R.; Sadeghi, E.

    2012-01-01

    The effects of sample size on the galvanomagnetic properties of semimetal nanowires are theoretically investigated. Transverse magnetoresistance (TMR) ratios have been calculated within a Boltzmann Transport Equation (BTE) approach in the specular reflection approximation. The temperature and radius dependence of the transverse magnetoresistance of cylindrical bismuth nanowires is given. The obtained values are in good agreement with the experimental results reported by Heremans et al. - Highlights: ► In this study the effects of sample size on the galvanomagnetic properties of Bi nanowires were explained by the Parrott theorem by solving the Boltzmann Transport Equation. ► Transverse magnetoresistance (TMR) ratios have been calculated in the specular reflection approximation. ► The temperature and radius dependence of the transverse magnetoresistance of cylindrical bismuth nanowires is given. ► The obtained values are in good agreement with the experimental results reported by Heremans et al.

  9. Basic statistical tools in research and data analysis

    Directory of Open Access Journals (Sweden)

    Zulfiqar Ali

    2016-01-01

    Full Text Available Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
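
    As a concrete instance of the sample size estimation and power analysis mentioned above, the calculation for a two-sample t test takes one call in statsmodels (the effect size, alpha and power below are the conventional textbook values, chosen here only for illustration):

```python
from statsmodels.stats.power import TTestIndPower

# sample size per group to detect a medium effect (Cohen's d = 0.5)
# with 80% power at a two-sided alpha of 0.05
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.80, alternative="two-sided")
print(round(n_per_group))   # roughly 64 participants per group
```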

  10. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Scheid Anika

    2012-07-01

    Full Text Available Abstract. Background: Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results: In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst

  11. Structure of Small and Medium-Sized Business: Results of Total Statistic Observations in Russia

    Directory of Open Access Journals (Sweden)

    Iuliia S. Pinkovetskaia

    2018-03-01

    Full Text Available The aim of the research is the estimation of regularities and tendencies characteristic of the modern sectoral structure of small and medium-sized business in Russia. The subject of the research is the set of processes of structural changes in the types of economic activities of such enterprises, as well as the differentiation of the number of employees in enterprises. The research methodology included consideration of aggregates of subjects of small and medium-sized business, formed according to sectoral and territorial features. The official statistical information obtained in the course of the total observation of the activities of small and medium-sized businesses in 2010 and 2015 was used as the initial data. The study was conducted on indicators characterizing the full range of legal entities and individual entrepreneurs in the country. The materiality of structural changes was assessed on the basis of the Ryabtsev index. Modeling the differentiation of the values of the number of employees per enterprise was based on the development of normal distribution density functions. According to the hypothesis, it is assumed that the differentiation of the number of employees working in enterprises depends on six main types of economic activity and on the subjects of Russia. Based on the results of the study, it was shown that there are no significant structural changes for the period from 2010 to 2015, both in terms of the number of enterprises and the number of their employees. Based on the results of the simulation, the average values of the number of employees for the six main types of activity were established, as well as the intervals for changing these indicators for the aggregates of small and medium-sized enterprises located in the majority of the country's subjects. The results of the research can be used in scientific works related to the justification of the expected number of enterprises and the number of their employees, the formation of

  12. (I Can't Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research.

    Science.gov (United States)

    van Rijnsoever, Frank J

    2017-01-01

    I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached after all the codes in the population have been observed once in the sample. I delineate three different scenarios to sample information sources: "random chance," which is based on probability sampling, "minimal information," which yields at least one new code per sampling step, and "maximum information," which yields the largest number of new codes per sampling step. Next, I use simulations to assess the minimum sample size for each scenario for systematically varying hypothetical populations. I show that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, the minimal and maximal information scenarios are significantly more efficient than random chance, but yield fewer repetitions per code to validate the findings. I formulate guidelines for purposive sampling and recommend that researchers follow a minimum information scenario.
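
    A stripped-down version of the "random chance" scenario can be simulated directly. The code probabilities below are invented for illustration and the stopping rule is plain saturation (every code seen at least once); only numpy is required.

```python
import numpy as np

def sample_size_to_saturation(code_probs, n_sim=1000, max_n=5000, seed=3):
    """Average number of randomly sampled sources needed before every code is observed once.

    code_probs[i] = probability that a single information source yields code i."""
    rng = np.random.default_rng(seed)
    p = np.asarray(code_probs, dtype=float)
    needed = []
    for _ in range(n_sim):
        seen = np.zeros(len(p), dtype=bool)
        for n in range(1, max_n + 1):
            seen |= rng.random(len(p)) < p   # codes this source happens to contain
            if seen.all():
                needed.append(n)
                break
    return float(np.mean(needed))

# ten codes: a few common ones and a few rare ones
print(sample_size_to_saturation([0.6, 0.5, 0.4, 0.4, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05]))
```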

  13. Determination of a representative volume element based on the variability of mechanical properties with sample size in bread.

    Science.gov (United States)

    Ramírez, Cristian; Young, Ashley; James, Bryony; Aguilera, José M

    2010-10-01

    Quantitative analysis of food structure is commonly obtained by image analysis of a small portion of the material that may not be representative of the whole sample. In order to quantify structural parameters (air cells) of 2 types of bread (bread and bagel) the concept of representative volume element (RVE) was employed. The RVE for bread, bagel, and gelatin-gel (used as control) was obtained from the relationship between sample size and the coefficient of variation, calculated from the apparent Young's modulus measured on 25 replicates. The RVE was obtained when the coefficient of variation for different sample sizes converged to a constant value. In the 2 types of bread tested, the tendency of the coefficient of variation was to decrease as the sample size increased, while in the homogeneous gelatin-gel, it always remained constant at around 2.3% to 2.4%. The RVE turned out to be cubes with sides of 45 mm for bread, 20 mm for bagels, and 10 mm for gelatin-gel (smallest sample tested). The quantitative image analysis as well as visual observation demonstrated that bread presented the largest dispersion of air-cell sizes. Moreover, both the ratio of maximum air-cell area/image area and maximum air-cell height/image height were greater for bread (values of 0.05 and 0.30, respectively) than for bagels (0.03 and 0.20, respectively). Therefore, the size and the size variation of air cells present in the structure determined the size of the RVE. It was concluded that RVE is highly dependent on the heterogeneity of the structure of the types of baked products.
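
    A sketch of the convergence criterion described above. The modulus values are synthetic (their spread is made to shrink with specimen size purely so the snippet runs); the 10% stabilization threshold is likewise an assumption of this illustration, not the authors' cut-off.

```python
import numpy as np

def coefficient_of_variation(measurements):
    m = np.asarray(measurements, dtype=float)
    return m.std(ddof=1) / m.mean()

rng = np.random.default_rng(11)
sample_sides_mm = [5, 10, 20, 30, 45, 60]
cv_by_size = {}
for side in sample_sides_mm:
    # synthetic apparent Young's modulus for 25 replicates; spread shrinks as the
    # specimen becomes large enough to average over the air-cell structure
    spread = 30.0 / side + 2.0
    cv_by_size[side] = coefficient_of_variation(rng.normal(100.0, spread, size=25))

# smallest size whose CV is within 10% of the CV of the largest specimen tested
reference = cv_by_size[sample_sides_mm[-1]]
rve_side = min(s for s in sample_sides_mm
               if abs(cv_by_size[s] - reference) / reference < 0.10)
print("RVE side length (mm):", rve_side)
```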

  14. Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples

    Energy Technology Data Exchange (ETDEWEB)

    Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J

    2007-10-24

    Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharides, pure proteins and protein mixture samples, all of LDA, PLSDA, and SIMCA were found to produce excellent classification given a sufficient number of compound variables calculated. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data that can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples
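
    A generic sketch of the "PCA first, then a supervised classifier" workflow described above, written with scikit-learn. The data here are random numbers standing in for ToF-SIMS peak tables, so the printed accuracy is meaningless; the point is only the shape of the pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# stand-in for ToF-SIMS spectra: 40 samples x 200 mass channels, two classes
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 1.5            # class 1 gets slightly elevated intensities in 10 channels

# reduce to a handful of principal components, then discriminate in that space
model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```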

  15. Analysis of femtogram-sized plutonium samples by thermal ionization mass spectrometry

    International Nuclear Information System (INIS)

    Smith, D.H.; Duckworth, D.C.; Bostick, D.T.; Coleman, R.M.; McPherson, R.L.; McKown, H.S.

    1994-01-01

    The goal of this investigation was to extend the ability to perform isotopic analysis of plutonium to samples as small as possible. Plutonium ionizes thermally with quite good efficiency (first ionization potential 5.7 eV). Sub-nanogram-sized samples can be analyzed on a near-routine basis given the necessary instrumentation. Efforts in this laboratory have been directed at rhenium-carbon systems; solutions of carbon in rhenium provide surfaces with work functions higher than pure rhenium (5.8 vs. ∼ 5.4 eV). Using a single resin bead as a sample loading medium both concentrates the sample nearly to a point and, due to its interaction with rhenium, produces the desired composite surface. Earlier work in this area showed that a layer of rhenium powder slurried in a solution containing carbon substantially enhanced the precision of isotopic measurements for uranium. Isotopic fractionation was virtually eliminated, and ionization efficiencies 2-5 times better than previously measured were attained for both Pu and U (1.7 and 0.5%, respectively). The other side of this coin should be the ability to analyze smaller samples, which is the subject of this report.

  16. Examination of statistical noise in SPECT image and sampling pitch

    International Nuclear Information System (INIS)

    Takaki, Akihiro; Soma, Tsutomu; Murase, Kenya; Watanabe, Hiroyuki; Murakami, Tomonori; Kawakami, Kazunori; Teraoka, Satomi; Kojima, Akihiro; Matsumoto, Masanori

    2008-01-01

    Statistical noise in single photon emission computed tomography (SPECT) images was examined for its relation to total count and to sampling pitch by simulation and by a phantom experiment, obtaining projection data under defined conditions. The SPECT simulation assumed a virtual, homogeneous water column (20 cm diameter) as the absorbing mass. The phantom experiment used a 3D-Hoffman brain phantom (Data Spectrum Corp.) filled with 370 MBq of 99mTc-pertechnetate solution and a facing 2-detector SPECT machine with a low-energy/high-resolution collimator, E-CAM (Siemens). The projection data obtained by the two methods were reconstructed by filtered back projection to produce transaxial images. The noise was evaluated visually, by the root-mean-square uncertainty calculated from the average count and standard deviation (SD) in a region of interest (ROI) defined in the reconstructed images, and by the normalized mean squares calculated from the difference between a reference image obtained with the common sampling pitch and all of the obtained slices, for both the simulation and the phantom. In conclusion, the sampling pitch is recommended to be set in the machine so as to approximate the value calculated by the sampling theorem, even though the projection counts per angular direction become smaller for the same total data acquisition time. (R.T.)

  17. Sample Size and Robustness of Inferences from Logistic Regression in the Presence of Nonlinearity and Multicollinearity

    OpenAIRE

    Bergtold, Jason S.; Yeager, Elizabeth A.; Featherstone, Allen M.

    2011-01-01

    The logistic regression model has been widely used in the social and natural sciences, and results from studies using this model can have significant impact. Thus, confidence in the reliability of inferences drawn from these models is essential. The robustness of such inferences is dependent on sample size. The purpose of this study is to examine the impact of sample size on the mean estimated bias and efficiency of parameter estimation and inference for the logistic regression model. A numbe...

  18. Factors to consider in monitoring programs suggested by statistical analysis of available data

    International Nuclear Information System (INIS)

    Thomas, J.M.

    1977-01-01

    Based on experience gained in the statistical analysis of data collected during monitoring programs at three nuclear power plants, as well as on other studies in the area of impact assessment, I have attempted to outline what has been done and what I believe can be done in assessing environmental changes. Procedural changes that I suggest include the implementation of a stopping rule so field studies are terminated after a negotiated period of time and the commitment of all resources to studies of one or two species. Simulation models are suggested as a useful tool in an iterative process where results of field studies are routinely incorporated until a negotiated stopping time is reached or until acceptable results are attained. Finally, I describe the statistical analyses we have used and their limitations, and I give some sample-size estimates needed to detect changes of specified sizes in population numbers. To detect changes in population numbers of the size we have encountered, calculated sample sizes are found to be much larger than in current use

  19. Bias in segmented gamma scans arising from size differences between calibration standards and assay samples

    International Nuclear Information System (INIS)

    Sampson, T.E.

    1991-01-01

    Recent advances in segmented gamma scanning have emphasized software corrections for gamma-ray self-absorption in particulates or lumps of special nuclear material in the sample. Another feature of this software is an attenuation correction factor formalism that explicitly accounts for differences in sample container size and composition between the calibration standards and the individual items being measured. Software without this container-size correction produces biases when the unknowns are not packaged in the same containers as the calibration standards. This new software allows the use of different size and composition containers for standards and unknowns, an enormous savings considering the expense of multiple calibration standard sets otherwise needed. This paper presents calculations of the bias resulting from not using this new formalism. These calculations may be used to estimate bias corrections for segmented gamma scanners that do not incorporate these advanced concepts.

  20. Sample Size Estimation for Negative Binomial Regression Comparing Rates of Recurrent Events with Unequal Follow-Up Time.

    Science.gov (United States)

    Tang, Yongqiang

    2015-01-01

    A sample size formula is derived for negative binomial regression for the analysis of recurrent events, in which subjects can have unequal follow-up time. We obtain sharp lower and upper bounds on the required size, which are easy to compute. The upper bound is generally only slightly larger than the required size and hence can be used to approximate the sample size. The lower and upper size bounds can be decomposed into two terms. The first term relies on the mean number of events in each group, and the second term depends on two factors that measure, respectively, the extent of between-subject variability in event rates and in follow-up time. Simulation studies are conducted to assess the performance of the proposed method. An application of our formulae to a multiple sclerosis trial is provided.

  1. Sampling methods for amphibians in streams in the Pacific Northwest.

    Science.gov (United States)

    R. Bruce Bury; Paul Stephen. Corn

    1991-01-01

    Methods describing how to sample aquatic and semiaquatic amphibians in small streams and headwater habitats in the Pacific Northwest are presented. We developed a technique that samples 10-meter stretches of selected streams, which was adequate to detect presence or absence of amphibian species and provided sample sizes statistically sufficient to compare abundance of...

  2. Extreme value statistics and finite-size scaling at the ecological extinction/laminar-turbulence transition

    Science.gov (United States)

    Shih, Hong-Yan; Goldenfeld, Nigel

    Experiments on transitional turbulence in pipe flow seem to show that turbulence is a transient metastable state since the measured mean lifetime of turbulence puffs does not diverge asymptotically at a critical Reynolds number. Yet measurements reveal that the lifetime scales with Reynolds number in a super-exponential way reminiscent of extreme value statistics, and simulations and experiments in Couette and channel flow exhibit directed percolation type scaling phenomena near a well-defined transition. This universality class arises from the interplay between small-scale turbulence and a large-scale collective zonal flow, which exhibit predator-prey behavior. Why is asymptotically divergent behavior not observed? Using directed percolation and a stochastic individual level model of predator-prey dynamics related to transitional turbulence, we investigate the relation between extreme value statistics and power law critical behavior, and show that the paradox is resolved by carefully defining what is measured in the experiments. We theoretically derive the super-exponential scaling law, and using finite-size scaling, show how the same data can give both super-exponential behavior and power-law critical scaling.

  3. Uncertainty budget in internal monostandard NAA for small and large size samples analysis

    International Nuclear Information System (INIS)

    Dasari, K.B.; Acharya, R.

    2014-01-01

    Evaluation of the total uncertainty budget on a determined concentration value is important under a quality assurance programme. Concentration calculation in NAA is carried out by relative NAA or by the k0-based internal monostandard NAA (IM-NAA) method. The IM-NAA method has been used for small and large sample analysis of clay potteries. An attempt was made to identify the uncertainty components in IM-NAA, and the uncertainty budget for La in both small and large size samples was evaluated and compared. (author)

  4. Addressing small sample size bias in multiple-biomarker trials: Inclusion of biomarker-negative patients and Firth correction.

    Science.gov (United States)

    Habermehl, Christina; Benner, Axel; Kopp-Schneider, Annette

    2018-03-01

    In recent years, numerous approaches for biomarker-based clinical trials have been developed. One of these developments is multiple-biomarker trials, which aim to investigate multiple biomarkers simultaneously in independent subtrials. For low-prevalence biomarkers, small sample sizes within the subtrials have to be expected, as well as many biomarker-negative patients at the screening stage. The small sample sizes may make it unfeasible to analyze the subtrials individually. This imposes the need to develop new approaches for the analysis of such trials. With an expected large group of biomarker-negative patients, it seems reasonable to explore options to benefit from including them in such trials. We consider advantages and disadvantages of the inclusion of biomarker-negative patients in a multiple-biomarker trial with a survival endpoint. We discuss design options that include biomarker-negative patients in the study and address the issue of small sample size bias in such trials. We carry out a simulation study for a design where biomarker-negative patients are kept in the study and are treated with standard of care. We compare three different analysis approaches based on the Cox model to examine if the inclusion of biomarker-negative patients can provide a benefit with respect to bias and variance of the treatment effect estimates. We apply the Firth correction to reduce the small sample size bias. The results of the simulation study suggest that for small sample situations, the Firth correction should be applied to adjust for the small sample size bias. In addition to the Firth penalty, the inclusion of biomarker-negative patients in the analysis can lead to further but small improvements in bias and standard deviation of the estimates. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Statistical energy as a tool for binning-free, multivariate goodness-of-fit tests, two-sample comparison and unfolding

    International Nuclear Information System (INIS)

    Aslan, B.; Zech, G.

    2005-01-01

    We introduce the novel concept of statistical energy as a statistical tool. We define statistical energy of statistical distributions in a similar way as for electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications
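
    The abstract does not give the exact distance function; Aslan and Zech typically use a logarithmic kernel, while the closely related Szekely-Rizzo energy distance uses the plain distance. The sketch below uses the latter with a permutation test as a simplified, binning-free two-sample comparison; it illustrates the idea rather than the authors' exact statistic, and the univariate example generalizes to multivariate data by replacing absolute differences with Euclidean norms.

```python
import numpy as np

def energy_distance(x, y):
    """Two-sample energy statistic E = 2*mean|x-y| - mean|x-x'| - mean|y-y'|."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dxy = np.abs(x[:, None] - y[None, :]).mean()
    dxx = np.abs(x[:, None] - x[None, :]).mean()
    dyy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * dxy - dxx - dyy

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Binning-free two-sample test: permute pooled labels and recompute."""
    rng = np.random.default_rng(seed)
    pooled, n = np.concatenate([x, y]), len(x)
    observed = energy_distance(x, y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_distance(pooled[:n], pooled[n:]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 80)   # illustrative samples with a shifted mean
y = rng.normal(0.4, 1.0, 80)
print("permutation p-value:", permutation_pvalue(x, y))
```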

  6. Autoregressive Prediction with Rolling Mechanism for Time Series Forecasting with Small Sample Size

    Directory of Open Access Journals (Sweden)

    Zhihua Wang

    2014-01-01

    Reasonable prediction makes significant practical sense for stochastic and unstable time series analysis with small or limited sample size. Motivated by the rolling idea in grey theory and the practical relevance of very short-term forecasting or 1-step-ahead prediction, a novel autoregressive (AR) prediction approach with a rolling mechanism is proposed. In the modeling procedure, a newly developed AR equation, which can be used to model nonstationary time series, is constructed in each prediction step. Meanwhile, the data window, for the next step-ahead forecast, rolls on by adding the most recent derived prediction result while deleting the first value of the previously used sample data set. This rolling mechanism is an efficient technique owing to its advantages of improved forecasting accuracy, applicability in the case of limited and unstable data situations, and the requirement of little computational effort. The general performance, influence of sample size, nonlinearity dynamic mechanism, and significance of the observed trends, as well as innovation variance, are illustrated and verified with Monte Carlo simulations. The proposed methodology is then applied to several practical data sets, including multiple building settlement sequences and two economic series.
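
    A minimal sketch of the rolling mechanism described above: refit an AR model on a fixed-length data window, forecast one step ahead, append the forecast to the window and drop its oldest value. The AR order, window length and the synthetic settlement-like series are assumptions for illustration; the paper's specific AR formulation for nonstationary series is not reproduced here.

```python
import numpy as np

def fit_ar(series, p):
    """Ordinary least-squares fit of an AR(p) model with intercept."""
    s = np.asarray(series, float)
    y = s[p:]
    X = np.column_stack([np.ones(len(y))] +
                        [s[p - k: len(s) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                        # [intercept, phi_1, ..., phi_p]

def predict_next(window, coef, p):
    """One-step-ahead forecast from the last p values of the window."""
    lags = np.asarray(window)[::-1][:p]   # most recent value first
    return coef[0] + np.dot(coef[1:], lags)

def rolling_forecast(series, p=2, window_len=30, horizon=5):
    """Rolling mechanism: forecast, append the forecast, drop the oldest value."""
    window = list(series[-window_len:])
    forecasts = []
    for _ in range(horizon):
        coef = fit_ar(window, p)
        yhat = predict_next(window, coef, p)
        forecasts.append(yhat)
        window = window[1:] + [yhat]      # roll the data window forward
    return forecasts

rng = np.random.default_rng(2)
t = np.arange(120)
series = 0.02 * t + np.sin(t / 6.0) + rng.normal(0, 0.1, t.size)  # illustrative series
print(np.round(rolling_forecast(series), 3))
```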

  7. The importance of plot size and the number of sampling seasons on capturing macrofungal species richness.

    Science.gov (United States)

    Li, Huili; Ostermann, Anne; Karunarathna, Samantha C; Xu, Jianchu; Hyde, Kevin D; Mortimer, Peter E

    2018-07-01

    The species-area relationship is an important factor in the study of species diversity, conservation biology, and landscape ecology. A deeper understanding of this relationship is necessary in order to provide recommendations on how to improve the quality of data collection on macrofungal diversity in different land use systems in future studies; this requires a systematic assessment of methodological parameters, in particular optimal plot sizes. The species-area relationship of macrofungi in tropical and temperate climatic zones and four different land use systems was investigated by determining the macrofungal species richness in plot sizes ranging from 100 m² to 10,000 m² over two sampling seasons. We found that the effect of plot size on recorded species richness differed significantly between land use systems, with the exception of monoculture systems. For both climate zones, the land use system needs to be considered when determining optimal plot size. Using an optimal plot size was more important than temporal replication (over two sampling seasons) for accurately recording species richness. Copyright © 2018 British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  8. Multiple category-lot quality assurance sampling: a new classification system with application to schistosomiasis control.

    Directory of Open Access Journals (Sweden)

    Casey Olives

    Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10% and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalence of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
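
    The decision thresholds actually used in the cited designs are not given in this record, so the sketch below only illustrates how operating characteristic curves for a three-category LQAS design can be computed from the binomial distribution; the sample size n and the two thresholds are assumed values.

```python
import numpy as np
from scipy.stats import binom

def mc_lqas_oc(n, d1, d2, prevalences):
    """Probability of classifying a school as low (<= d1 positives), medium,
    or high (> d2 positives) as a function of the true prevalence."""
    rows = []
    for p in prevalences:
        p_low = binom.cdf(d1, n, p)
        p_high = 1.0 - binom.cdf(d2, n, p)
        rows.append((p_low, 1.0 - p_low - p_high, p_high))
    return np.array(rows)

# Illustrative design: n = 15 children, decision thresholds d1 = 2 and d2 = 7.
prev = np.array([0.05, 0.10, 0.30, 0.50, 0.70])
oc = mc_lqas_oc(15, 2, 7, prev)
for p, row in zip(prev, oc):
    print(f"prevalence {p:.2f}: P(low)={row[0]:.2f}  P(medium)={row[1]:.2f}  P(high)={row[2]:.2f}")
```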

  9. Pupil size in Jewish theological seminary students.

    Science.gov (United States)

    Shemesh, G; Kesler, A; Lazar, M; Rothkoff, L

    2004-01-01

    To investigate the authors' clinical impression that pupil size among myopic Jewish theological seminary students is different from the pupil size of similar secular subjects. This cross-sectional study was conducted on 28 male Jewish theological seminary students and 28 secular students or workers who were matched for age and refraction. All participants were consecutively enrolled. Scotopic and photopic pupil size was measured by means of a Colvard pupillometer. Comparisons of various parameters between the groups were performed using the two-sample t-test, Fisher exact test, a paired-sample t-test, a two-way analysis of variance, and Pearson correlation coefficients, as appropriate. The two groups were statistically matched for age, refraction, and visual acuity. The seminary students were undercorrected by an average of 2.35 diopters (D), while the secular subjects were undercorrected by only 0.65 D. Whether this difference is an effect of the seminary students' work or of an apparently characteristic undercorrection of the myopia is undetermined.

  10. Re-estimating sample size in cluster randomized trials with active recruitment within clusters

    NARCIS (Netherlands)

    van Schie, Sander; Moerbeek, Mirjam

    2014-01-01

    Often only a limited number of clusters can be obtained in cluster randomised trials, although many potential participants can be recruited within each cluster. Thus, active recruitment is feasible within the clusters. To obtain an efficient sample size in a cluster randomised trial, the cluster

  11. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    Science.gov (United States)

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
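
    A hedged sketch of the equivalent two-sample idea for logistic regression with a single continuous covariate: the two group logits are separated by the slope times twice the covariate SD, the centring is chosen so the overall event probability is unchanged, and the usual two-proportion power formula is then applied. The centring step and the default alpha and power values are implementation assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_sample_size(beta, sd_x, p_overall, alpha=0.05, power=0.80):
    """Approximate total sample size for testing slope beta in logistic regression
    via the equivalent two-sample problem: group logits differ by beta * 2 * sd_x,
    centred so the overall event probability stays at p_overall."""
    delta = 2.0 * beta * sd_x
    c = brentq(lambda z: 0.5 * (expit(z - delta / 2) + expit(z + delta / 2)) - p_overall,
               -20, 20)
    p1, p2 = expit(c - delta / 2), expit(c + delta / 2)
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    n_per_group = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return int(np.ceil(2 * n_per_group))

# e.g. detecting a log odds ratio of 0.4 per SD of x with a 30% overall event rate
print(logistic_sample_size(beta=0.4, sd_x=1.0, p_overall=0.30))
```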

  12. Analysis of Statistical Methods Currently used in Toxicology Journals.

    Science.gov (United States)

    Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

    2014-09-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample sizes less than 10, with the median and the mode being 6 and 3 & 6, respectively. The mean (105/113, 93%) was dominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justification for why those methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either normality or equal variance tests. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.

  13. Pedagogical Simulation of Sampling Distributions and the Central Limit Theorem

    Science.gov (United States)

    Hagtvedt, Reidar; Jones, Gregory Todd; Jones, Kari

    2007-01-01

    Students often find the fact that a sample statistic is a random variable very hard to grasp. Even more mysterious is why a sample mean should become ever more Normal as the sample size increases. This simulation tool is meant to illustrate the process, thereby giving students some intuitive grasp of the relationship between a parent population…
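
    A short simulation in the spirit of the tool described above: draw repeated samples of size n from a skewed parent population and watch the sampling distribution of the mean tighten (SD roughly sigma/sqrt(n)) and lose its skewness as n grows. The exponential parent and the sample sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def sampling_distribution_of_mean(draw_parent, n, n_reps=10_000):
    """Simulate the sampling distribution of the sample mean for samples of size n."""
    return np.array([draw_parent(n).mean() for _ in range(n_reps)])

def draw_parent(n):
    # A clearly skewed parent population: exponential with mean 1 and SD 1.
    return rng.exponential(1.0, n)

for n in (2, 10, 50):
    means = sampling_distribution_of_mean(draw_parent, n)
    skew = ((means - means.mean()) ** 3).mean() / means.std() ** 3
    # The spread shrinks like 1/sqrt(n) and the skewness fades as n grows.
    print(f"n={n:3d}  SD of mean={means.std():.3f}  (theory {1/np.sqrt(n):.3f})  skewness={skew:.2f}")
```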

  14. Distributional Properties of Order Statistics and Record Statistics

    Directory of Open Access Journals (Sweden)

    Abdul Hamid Khan

    2012-07-01

    Distributional properties of the order statistics and of upper and lower records have been utilized to characterize distributions of interest. Further, one-sided random dilation and contraction are utilized to obtain the distribution of non-adjacent order statistics, and their important deductions are also discussed.

  15. PET/CT in cancer: moderate sample sizes may suffice to justify replacement of a regional gold standard

    DEFF Research Database (Denmark)

    Gerke, Oke; Poulsen, Mads Hvid; Bouchelouche, Kirsten

    2009-01-01

    PURPOSE: For certain cancer indications, the current patient evaluation strategy is a perfect but locally restricted gold standard procedure. If positron emission tomography/computed tomography (PET/CT) can be shown to be reliable within the gold standard region and if it can be argued that PET/CT also performs well in adjacent areas, then sample sizes in accuracy studies can be reduced. PROCEDURES: Traditional standard power calculations for demonstrating sensitivities of both 80% and 90% are shown. The argument is then described in general terms and demonstrated by an ongoing study of metastasized prostate cancer. RESULTS: An added value in accuracy of PET/CT in adjacent areas can outweigh a downsized target level of accuracy in the gold standard region, justifying smaller sample sizes. CONCLUSIONS: If PET/CT provides an accuracy benefit in adjacent regions, then sample sizes can be reduced.

  16. (I Can’t Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research

    Science.gov (United States)

    2017-01-01

    I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached after all the codes in the population have been observed once in the sample. I delineate three different scenarios to sample information sources: “random chance,” which is based on probability sampling, “minimal information,” which yields at least one new code per sampling step, and “maximum information,” which yields the largest number of new codes per sampling step. Next, I use simulations to assess the minimum sample size for each scenario for systematically varying hypothetical populations. I show that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, the minimal and maximal information scenarios are significantly more efficient than random chance, but yield fewer repetitions per code to validate the findings. I formulate guidelines for purposive sampling and recommend that researchers follow a minimum information scenario. PMID:28746358
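
    A minimal sketch of the "random chance" scenario described above: each sampled information source reveals each code independently with some mean probability, and saturation is declared once every code has been observed at least once. The number of codes and the observation probability are assumed values, chosen only to illustrate that the mean probability of observing codes drives the required sample size.

```python
import numpy as np

def sources_to_saturation(n_codes=30, p_observe=0.15, max_sources=1000, seed=0):
    """Random-chance scenario: each sampled source reveals each code independently
    with probability p_observe; saturation = every code seen at least once."""
    rng = np.random.default_rng(seed)
    seen = np.zeros(n_codes, dtype=bool)
    for k in range(1, max_sources + 1):
        seen |= rng.random(n_codes) < p_observe
        if seen.all():
            return k
    return max_sources

sizes = [sources_to_saturation(seed=s) for s in range(500)]
print("median sample size to saturation:", int(np.median(sizes)),
      "| 95th percentile:", int(np.percentile(sizes, 95)))
```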

  17. Neutral dynamics with environmental noise: Age-size statistics and species lifetimes

    Science.gov (United States)

    Kessler, David; Suweis, Samir; Formentin, Marco; Shnerb, Nadav M.

    2015-08-01

    Neutral dynamics, where taxa are assumed to be demographically equivalent and their abundance is governed solely by the stochasticity of the underlying birth-death process, has proved itself as an important minimal model that accounts for many empirical datasets in genetics and ecology. However, the restriction of the model to demographic [O(√N)] noise yields relatively slow dynamics that appears to be in conflict with both short-term and long-term characteristics of the observed systems. Here we analyze two of these problems—age-size relationships and species extinction time—in the framework of a neutral theory with both demographic and environmental stochasticity. It turns out that environmentally induced variations of the demographic rates control the long-term dynamics and modify dramatically the predictions of the neutral theory with demographic noise only, yielding much better agreement with empirical data. We consider two prototypes of "zero mean" environmental noise, one which is balanced with regard to the arithmetic abundance, another balanced in the logarithmic (fitness) space, study their species lifetime statistics, and discuss their relevance to realistic models of community dynamics.

  18. Validation Of Intermediate Large Sample Analysis (With Sizes Up to 100 G) and Associated Facility Improvement

    International Nuclear Information System (INIS)

    Bode, P.; Koster-Ammerlaan, M.J.J.

    2018-01-01

    Pragmatic rather than physical correction factors for neutron and gamma-ray shielding were studied for samples of intermediate size, i.e. up to the 10-100 gram range. It was found that for most biological and geological materials, the neutron self-shielding is less than 5 % and the gamma-ray self-attenuation can easily be estimated. A trueness control material of 1 kg size was made based on the use of left-overs of materials used in laboratory intercomparisons. A design study for a large sample pool-side facility, handling plate-type volumes, had to be stopped because of a reduction in human resources available for this CRP. The large sample NAA facilities were made available to guest scientists from Greece and Brazil. The laboratory for neutron activation analysis participated in the world's first laboratory intercomparison utilizing large samples. (author)

  19. Heavy metal speciation in various grain sizes of industrially contaminated street dust using multivariate statistical analysis.

    Science.gov (United States)

    Yıldırım, Gülşen; Tokalıoğlu, Şerife

    2016-02-01

    A total of 36 street dust samples were collected from the streets of the Organised Industrial District in Kayseri, Turkey. This region includes a total of 818 work places in various industrial areas. The modified BCR (the European Community Bureau of Reference) sequential extraction procedure was applied to evaluate the mobility and bioavailability of trace elements (Cd, Co, Cr, Cu, Fe, Mn, Ni, Pb and Zn) in street dusts of the study area. The BCR was classified into three steps: water/acid soluble fraction, reducible and oxidisable fraction. The remaining residue was dissolved by using aqua regia. The concentrations of the metals in street dust samples were determined by flame atomic absorption spectrometry. Also, the effect of the different grain sizes of the dust samples on the mobility of the metals was investigated using the modified BCR procedure. The mobility sequence based on the sum of the first three phases (for grain size) was: Cd (71.3)>Cu (48.9)>Pb (42.8)=Cr (42.1)>Ni (41.4)>Zn (40.9)>Co (36.6)=Mn (36.3)>Fe (3.1). No significant difference was observed in metal partitioning among the three particle sizes. Correlation, principal component and cluster analysis were applied to identify probable natural and anthropogenic sources in the region. The principal component analysis results showed that this industrial district was influenced by traffic, industrial activities, air-borne emissions and natural sources. The accuracy of the results was checked by analysis of both the BCR-701 certified reference material and by recovery studies in street dust samples. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Sample Reuse in Statistical Remodeling.

    Science.gov (United States)

    1987-08-01

    as the jackknife and bootstrap, is an expansion of the functional, T(Fn), or of its distribution function or both. Frangos and Schucany (1987a) used...accelerated bootstrap. In the same report Frangos and Schucany demonstrated the small sample superiority of that approach over the proposals that take...higher order terms of an Edgeworth expansion into account. In a second report Frangos and Schucany (1987b) examined the small sample performance of

  1. Statistical properties of four effect-size measures for mediation models.

    Science.gov (United States)

    Miočević, Milica; O'Rourke, Holly P; MacKinnon, David P; Brown, Hendricks C

    2018-02-01

    This project examined the performance of classical and Bayesian estimators of four effect size measures for the indirect effect in a single-mediator model and a two-mediator model. Compared to the proportion and ratio mediation effect sizes, standardized mediation effect-size measures were relatively unbiased and efficient in the single-mediator model and the two-mediator model. Percentile and bias-corrected bootstrap interval estimates of ab/s_Y and ab(s_X)/s_Y in the single-mediator model outperformed interval estimates of the proportion and ratio effect sizes in terms of power, Type I error rate, coverage, imbalance, and interval width. For the two-mediator model, standardized effect-size measures were superior to the proportion and ratio effect-size measures. Furthermore, it was found that Bayesian point and interval summaries of posterior distributions of standardized effect-size measures reduced excessive relative bias for certain parameter combinations. The standardized effect-size measures are the best effect-size measures for quantifying mediated effects.
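
    A hedged sketch of one of the standardized measures discussed above, ab/s_Y (the indirect effect from the two mediation regressions divided by the outcome SD), together with a percentile bootstrap interval. The data-generating values and bootstrap settings are illustrative; the bias-corrected variant and the two-mediator case are not shown.

```python
import numpy as np

def ab_over_sy(x, m, y):
    """Indirect effect a*b from the two mediation regressions, divided by SD(Y)."""
    a = np.polyfit(x, m, 1)[0]                         # slope of M on X
    design = np.column_stack([np.ones_like(x), m, x])  # Y regressed on M and X
    b = np.linalg.lstsq(design, y, rcond=None)[0][1]
    return a * b / y.std(ddof=1)

def percentile_bootstrap_ci(x, m, y, n_boot=2000, seed=0):
    """Percentile bootstrap interval for ab/s_Y."""
    rng = np.random.default_rng(seed)
    n, stats = len(x), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        stats.append(ab_over_sy(x[idx], m[idx], y[idx]))
    return np.percentile(stats, [2.5, 97.5])

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)              # true a = 0.5
y = 0.4 * m + 0.2 * x + rng.normal(size=n)    # true b = 0.4
print("ab/s_Y =", round(ab_over_sy(x, m, y), 3),
      "  95% bootstrap CI:", np.round(percentile_bootstrap_ci(x, m, y), 3))
```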

  2. Optimal design of sampling and mapping schemes in the radiometric exploration of Chipilapa, El Salvador (Geo-statistics)

    International Nuclear Information System (INIS)

    Balcazar G, M.; Flores R, J.H.

    1992-01-01

    As part of the radiometric surface exploration carried out in the geothermal field of Chipilapa, El Salvador, the geo-statistical parameters were derived from the variogram calculated from the field data. The maximum correlation distance of the 'radon' samples in the different observation directions (N-S, E-W, NW-SE, NE-SW) was 121 m, which defines the monitoring grid for future prospecting in the same area. From this, an optimization (minimum cost) of the spacing of the field samples was derived by means of geo-statistical techniques, without losing detection of the anomaly. (Author)

  3. Critical analysis of consecutive unilateral cleft lip repairs: determining ideal sample size.

    Science.gov (United States)

    Power, Stephanie M; Matic, Damir B

    2013-03-01

    Objective: Cleft surgeons often show 10 consecutive lip repairs to reduce presentation bias; however, the validity of this practice remains unknown. The purpose of this study is to determine the number of consecutive cases that represent average outcomes. Secondary objectives are to determine if outcomes correlate with cleft severity and to calculate interrater reliability. Design: Consecutive preoperative and 2-year postoperative photographs of the unilateral cleft lip-nose complex were randomized and evaluated by cleft surgeons. Parametric analysis was performed according to chronologic, consecutive order. The mean standard deviation over all raters enabled calculation of expected 95% confidence intervals around a mean tested for various sample sizes. Setting: Meeting of the American Cleft Palate-Craniofacial Association in 2009. Patients, Participants: Ten senior cleft surgeons evaluated 39 consecutive lip repairs. Main Outcome Measures: Preoperative severity and postoperative outcomes were evaluated using descriptive and quantitative scales. Results: Intraclass correlation coefficients for cleft severity and postoperative evaluations were 0.65 and 0.21, respectively. Outcomes did not correlate with cleft severity (P = .28). Calculations for 10 consecutive cases demonstrated wide 95% confidence intervals, spanning two points on both postoperative grading scales. Ninety-five percent confidence intervals narrowed to within one qualitative grade (±0.30) and one point (±0.50) on the 10-point scale for 27 consecutive cases. Conclusions: Larger numbers of consecutive cases (n > 27) are increasingly representative of average results, but less practical in presentation format. Ten consecutive cases lack statistical support. Cleft surgeons showed low interrater reliability for postoperative assessments, which may reflect personal bias when evaluating another surgeon's results.

  4. Statistical modeling of static strengths of nuclear graphites with relevance to structural design

    International Nuclear Information System (INIS)

    Arai, Taketoshi

    1992-02-01

    Use of graphite materials for structural members poses the problem of how to take into account the statistical properties of static strength, especially tensile fracture stresses, in component structural design. The present study concerns comprehensive examinations of the statistical data base and modelling for nuclear graphites. First, the report provides individual samples and their analyses of the strengths of IG-110 and PGX graphites for HTTR components. Statistical characteristics of other HTGR graphites are also exemplified from the literature. Most of the statistical distributions of individual samples are found to be approximately normal. The goodness of fit to normal distributions is more satisfactory with larger sample sizes. Molded and extruded graphites, however, possess a variety of statistical properties depending on samples from different within-log locations and/or different orientations. Second, the previous statistical models, including the Weibull theory, are assessed from the viewpoint of applicability to design procedures. This leads to the conclusion that the Weibull theory and its modified versions are satisfactory only for limited parts of tensile fracture behavior; they are not consistent with the whole set of observations. Only normal statistics are justifiable as practical approaches to discuss specified minimum ultimate strengths as statistical confidence limits for individual samples. Third, the assessment of various statistical models emphasizes the need to develop advanced analytical ones which should involve modeling of microstructural features of actual graphite materials. Improvements of other structural design methodologies are also presented. (author)

  5. Size-Resolved Penetration Through High-Efficiency Filter Media Typically Used for Aerosol Sampling

    Czech Academy of Sciences Publication Activity Database

    Zíková, Naděžda; Ondráček, Jakub; Ždímal, Vladimír

    2015-01-01

    Roč. 49, č. 4 (2015), s. 239-249 ISSN 0278-6826 R&D Projects: GA ČR(CZ) GBP503/12/G147 Institutional support: RVO:67985858 Keywords : filters * size-resolved penetration * atmospheric aerosol sampling Subject RIV: CF - Physical ; Theoretical Chemistry Impact factor: 1.953, year: 2015

  6. A simple sample size formula for analysis of covariance in cluster randomized trials.

    NARCIS (Netherlands)

    Teerenstra, S.; Eldridge, S.; Graff, M.J.; Hoop, E. de; Borm, G.F.

    2012-01-01

    For cluster randomized trials with a continuous outcome, the sample size is often calculated as if an analysis of the outcomes at the end of the treatment period (follow-up scores) would be performed. However, often a baseline measurement of the outcome is available or feasible to obtain. An

  7. Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method.

    Science.gov (United States)

    Dwivedi, Alok Kumar; Mallawaarachchi, Indika; Alvarado, Luis A

    2017-06-30

    Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has limitations with small sample sizes. We used a pooled resampling method in the nonparametric bootstrap test that may overcome the problems related to small samples in hypothesis testing. The present study compared the nonparametric bootstrap test with pooled resampling method with the corresponding parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with the unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test, while maintaining the type I error probability for all conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. The nonparametric bootstrap paired t-test also provided better performance than the other alternatives. The nonparametric bootstrap test also provided a benefit over the exact Kruskal-Wallis test. We suggest using the nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating one-way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd.
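
    A sketch of one common way to implement a pooled-resampling bootstrap t-test: pool the two samples (which imposes the null of equal means), repeatedly draw bootstrap samples of the original sizes from the pooled data, and compare the observed Welch t statistic with its bootstrap distribution. Whether this matches the authors' algorithm in every detail is an assumption, and the small illustrative samples are invented.

```python
import numpy as np

def welch_t(x, y):
    """Welch t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    return (x.mean() - y.mean()) / np.sqrt(x.var(ddof=1) / nx + y.var(ddof=1) / ny)

def pooled_bootstrap_t_test(x, y, n_boot=5000, seed=0):
    """Bootstrap t-test with pooled resampling under the null of equal means."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    t_obs = welch_t(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_boot):
        bx = rng.choice(pooled, size=len(x), replace=True)
        by = rng.choice(pooled, size=len(y), replace=True)
        if abs(welch_t(bx, by)) >= abs(t_obs):
            count += 1
    return t_obs, (count + 1) / (n_boot + 1)

x = np.array([4.1, 5.0, 6.2, 5.5, 4.8, 7.1])   # illustrative small samples
y = np.array([6.3, 7.4, 8.0, 6.9, 7.7, 8.5])
t, p = pooled_bootstrap_t_test(x, y)
print(f"t = {t:.2f}, bootstrap p = {p:.3f}")
```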

  8. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Cabalin, L.M.; Gonzalez, A. [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain); Ruiz, J. [Department of Applied Physics I, University of Malaga, E-29071 Malaga (Spain); Laserna, J.J., E-mail: laserna@uma.e [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain)

    2010-08-15

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  9. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Science.gov (United States)

    Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.

    2010-08-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  10. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Cabalin, L.M.; Gonzalez, A.; Ruiz, J.; Laserna, J.J.

    2010-01-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  11. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    Energy Technology Data Exchange (ETDEWEB)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung [Hanyang Univ., Seoul (Korea, Republic of); Noh, Jae Man [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2013-10-15

    The uncertainty evaluation with the statistical method is performed by repetition of transport calculations with sampling of the directly perturbed nuclear data. Hence, a reliable uncertainty result can be obtained by analyzing the results of the numerous transport calculations. One of the problems in uncertainty analysis with the statistical approach is that cross section sampling from the normal (Gaussian) distribution with relatively large standard deviation leads to sampling errors in the cross sections, such as the sampling of negative cross sections. Some correction methods have been noted; however, these methods can distort the distribution of the sampled cross sections. In this study, a sampling method for the nuclear data using a lognormal distribution is proposed. After that, criticality calculations with the sampled nuclear data are performed and the results are compared with those from the normal distribution which is conventionally used in previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling error. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was pursued with the normal and lognormal distributions. The uncertainties, which are caused by the covariance of (n,γ) cross sections, were evaluated by solving the GODIVA problem. The results show that the sampling method with the lognormal distribution can efficiently solve the negative sampling problem referred to in previous studies. It is expected that this study will contribute to increasing the accuracy of sampling-based uncertainty analysis.

  12. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    International Nuclear Information System (INIS)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung; Noh, Jae Man

    2013-01-01

    The uncertainty evaluation with the statistical method is performed by repetition of transport calculations with sampling of the directly perturbed nuclear data. Hence, a reliable uncertainty result can be obtained by analyzing the results of the numerous transport calculations. One of the problems in uncertainty analysis with the statistical approach is that cross section sampling from the normal (Gaussian) distribution with relatively large standard deviation leads to sampling errors in the cross sections, such as the sampling of negative cross sections. Some correction methods have been noted; however, these methods can distort the distribution of the sampled cross sections. In this study, a sampling method for the nuclear data using a lognormal distribution is proposed. After that, criticality calculations with the sampled nuclear data are performed and the results are compared with those from the normal distribution which is conventionally used in previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling error. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was pursued with the normal and lognormal distributions. The uncertainties, which are caused by the covariance of (n,γ) cross sections, were evaluated by solving the GODIVA problem. The results show that the sampling method with the lognormal distribution can efficiently solve the negative sampling problem referred to in previous studies. It is expected that this study will contribute to increasing the accuracy of sampling-based uncertainty analysis.
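
    A minimal sketch of the moment-matched lognormal sampling idea: choose the lognormal parameters so that the sampled cross sections reproduce the evaluated mean and its relative standard deviation, which by construction cannot produce negative values. The sketch treats a single uncorrelated cross section; sampling correlated values from a covariance matrix, as in the study, is not shown, and the numerical values are illustrative.

```python
import numpy as np

def sample_cross_sections(mean_xs, rel_std, n_samples, seed=0):
    """Sample perturbed cross sections from a lognormal distribution whose mean
    and standard deviation match the evaluated value and its uncertainty.
    Unlike normal sampling, negative cross sections cannot occur."""
    rng = np.random.default_rng(seed)
    sigma2 = np.log(1.0 + rel_std ** 2)      # moment matching for the lognormal
    mu = np.log(mean_xs) - 0.5 * sigma2
    return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n_samples)

xs = sample_cross_sections(mean_xs=2.0, rel_std=0.6, n_samples=100_000)  # 60% relative SD
print("sample mean:", xs.mean().round(3),
      " sample rel. SD:", (xs.std() / xs.mean()).round(3),
      " negative samples:", int((xs < 0).sum()))
```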

  13. Parameter sampling capabilities of sequential and simultaneous data assimilation: II. Statistical analysis of numerical results

    International Nuclear Information System (INIS)

    Fossum, Kristian; Mannseth, Trond

    2014-01-01

    We assess and compare parameter sampling capabilities of one sequential and one simultaneous Bayesian, ensemble-based, joint state-parameter (JS) estimation method. In the companion paper, part I (Fossum and Mannseth 2014 Inverse Problems 30 114002), analytical investigations lead us to propose three claims, essentially stating that the sequential method can be expected to outperform the simultaneous method for weakly nonlinear forward models. Here, we assess the reliability and robustness of these claims through statistical analysis of results from a range of numerical experiments. Samples generated by the two approximate JS methods are compared to samples from the posterior distribution generated by a Markov chain Monte Carlo method, using four approximate measures of distance between probability distributions. Forward-model nonlinearity is assessed from a stochastic nonlinearity measure allowing for sufficiently large model dimensions. Both toy models (with low computational complexity, and where the nonlinearity is fairly easy to control) and two-phase porous-media flow models (corresponding to down-scaled versions of problems to which the JS methods have been frequently applied recently) are considered in the numerical experiments. Results from the statistical analysis show strong support of all three claims stated in part I. (paper)

  14. Sample sizes to control error estimates in determining soil bulk density in California forest soils

    Science.gov (United States)

    Youzhi Han; Jianwei Zhang; Kim G. Mattson; Weidong Zhang; Thomas A. Weber

    2016-01-01

    Characterizing forest soil properties with high variability is challenging, sometimes requiring large numbers of soil samples. Soil bulk density is a standard variable needed along with element concentrations to calculate nutrient pools. This study aimed to determine the optimal sample size, the number of observation (n), for predicting the soil bulk density with a...
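
    The optimal n in studies like this is usually obtained from the classical relation between the confidence-interval half-width, the variability of the property and the sample size. Below is a hedged sketch assuming the familiar n ≈ (t·CV/E)² rule with iteration on the t quantile; the coefficient of variation and the target relative error are illustrative values, not figures from the study.

```python
import math
from scipy.stats import t as t_dist

def sample_size_for_mean(cv, rel_error, alpha=0.05, max_iter=50):
    """Observations needed so the half-width of the 100(1-alpha)% CI for the mean
    is within rel_error of the mean, given a coefficient of variation cv.
    Iterates because the t quantile itself depends on n."""
    n = 2
    for _ in range(max_iter):
        t_q = t_dist.ppf(1 - alpha / 2, df=n - 1)
        n_new = max(math.ceil((t_q * cv / rel_error) ** 2), 2)
        if n_new == n:
            break
        n = n_new
    return n

# e.g. bulk density with a 25% coefficient of variation, target +/- 10% of the mean
print(sample_size_for_mean(cv=0.25, rel_error=0.10))
```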

  15. The thresholds for statistical and clinical significance

    DEFF Research Database (Denmark)

    Jakobsen, Janus Christian; Gluud, Christian; Winkel, Per

    2014-01-01

    BACKGROUND: Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore... of the probability that a given trial result is compatible with a 'null' effect (corresponding to the P-value) divided by the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance...

  16. Size-segregated urban aerosol characterization by electron microscopy and dynamic light scattering and influence of sample preparation

    Science.gov (United States)

    Marvanová, Soňa; Kulich, Pavel; Skoupý, Radim; Hubatka, František; Ciganek, Miroslav; Bendl, Jan; Hovorka, Jan; Machala, Miroslav

    2018-04-01

    Size-segregated particulate matter (PM) is frequently used in chemical and toxicological studies. Nevertheless, toxicological in vitro studies working with the whole particles often lack a proper evaluation of the real PM size distribution and characterization of agglomeration under the experimental conditions. In this study, changes in particle size distributions during PM sample manipulation and also the semiquantitative elemental composition of single particles were evaluated. Coarse (1-10 μm), upper accumulation (0.5-1 μm), lower accumulation (0.17-0.5 μm), and ultrafine (<0.17 μm) fractions were examined in water and in cell culture media. The PM suspension of the lower accumulation fraction in water agglomerated after freezing/thawing the sample, and the agglomerates were disrupted by subsequent sonication. The ultrafine fraction did not agglomerate after freezing/thawing the sample. Both the lower accumulation and ultrafine fractions were stable in cell culture media with fetal bovine serum, while high agglomeration occurred in media without fetal bovine serum, as measured during 24 h.

  17. Clustering for high-dimension, low-sample size data using distance vectors

    OpenAIRE

    Terada, Yoshikazu

    2013-01-01

    In high-dimension, low-sample size (HDLSS) data, it is not always true that closeness of two objects reflects a hidden cluster structure. We point out the important fact that it is not the closeness, but the "values" of distance that contain information of the cluster structure in high-dimensional space. Based on this fact, we propose an efficient and simple clustering approach, called distance vector clustering, for HDLSS data. Under the assumptions given in the work of Hall et al. (2005), w...

  18. Adsorption of diclofenac and nimesulide on activated carbon: Statistical physics modeling and effect of adsorbate size

    Science.gov (United States)

    Sellaoui, Lotfi; Mechi, Nesrine; Lima, Éder Cláudio; Dotto, Guilherme Luiz; Ben Lamine, Abdelmottaleb

    2017-10-01

    Based on statistical physics elements, the equilibrium adsorption of diclofenac (DFC) and nimesulide (NM) on activated carbon was analyzed by a multilayer model with saturation. The paper aimed to describe the adsorption process experimentally and theoretically and to study the effect of adsorbate size using the model parameters. From numerical simulation, the number of molecules per site showed that the adsorbate molecules (DFC and NM) were mostly anchored on both sides of the pore walls. The increase in receptor site density suggested that additional sites appeared during the process to participate in DFC and NM adsorption. The description of the adsorption energy behavior indicated that the process was physisorption. Finally, through a correlation of the model parameters, the size effect of the adsorbate was deduced, indicating that the molecule dimension has a negligible effect on DFC and NM adsorption.

  19. The outlier sample effects on multivariate statistical data processing geochemical stream sediment survey (Moghangegh region, North West of Iran)

    International Nuclear Information System (INIS)

    Ghanbari, Y.; Habibnia, A.; Memar, A.

    2009-01-01

    In geochemical stream sediment surveys of the Moghangegh Region in north-west Iran (sheet 1:50,000), 152 samples were collected, and after analysis and processing of the data it was revealed that the Yb, Sc, Ni, Li, Eu, Cd, Co, and As contents in one sample are far higher than in the other samples. After identifying this sample as an outlier, the effect of this sample on multivariate statistical data processing was investigated to assess the destructive effects of outlier samples in geochemical exploration. Pearson and Spearman correlation coefficient methods and cluster analysis were used for the multivariate studies; scatter plots of some elements together with the regression lines are given for the cases of 152 and 151 samples, and the results are compared. After examination of the multivariate statistical data processing results, it was realized that the existence of outlier samples may produce the following relations between elements: - a true relation between two elements, neither of which has an anomalous content in the outlier sample; - a false relation between two elements, one of which has an anomalous content in the outlier sample; - a completely false relation between two elements, both of which have anomalous contents in the outlier sample.
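
    The distorting effect described above is easy to reproduce: a single sample with extreme contents of two otherwise unrelated elements can create a spurious, near-perfect Pearson correlation, while the rank-based Spearman coefficient barely moves. The synthetic concentrations below are purely illustrative and unrelated to the Moghangegh data.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(9)
n = 151
# Two unrelated element concentrations across 151 synthetic "stream sediment samples".
a = rng.lognormal(mean=1.0, sigma=0.3, size=n)
b = rng.lognormal(mean=0.5, sigma=0.3, size=n)

print("without outlier:  Pearson r = %.2f, Spearman rho = %.2f"
      % (pearsonr(a, b)[0], spearmanr(a, b)[0]))

# Add one sample with extreme contents of both elements (the 152nd sample).
a_out = np.append(a, a.max() * 20)
b_out = np.append(b, b.max() * 20)
print("with outlier:     Pearson r = %.2f, Spearman rho = %.2f"
      % (pearsonr(a_out, b_out)[0], spearmanr(a_out, b_out)[0]))
```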

  20. Will Outer Tropical Cyclone Size Change due to Anthropogenic Warming?

    Science.gov (United States)

    Schenkel, B. A.; Lin, N.; Chavas, D. R.; Vecchi, G. A.; Knutson, T. R.; Oppenheimer, M.

    2017-12-01

    Prior research has shown significant interbasin and intrabasin variability in outer tropical cyclone (TC) size. Moreover, outer TC size has even been shown to vary substantially over the lifetime of the majority of TCs. However, the factors responsible for both setting initial outer TC size and determining its evolution throughout the TC lifetime remain uncertain. Given these gaps in our physical understanding, there remains uncertainty in how outer TC size will change, if at all, due to anthropogenic warming. The present study seeks to quantify whether outer TC size will change significantly in response to anthropogenic warming using data from a high-resolution global climate model and a regional hurricane model. Similar to prior work, the outer TC size metric used in this study is the radius at which the azimuthal-mean surface azimuthal wind equals 8 m/s. The initial results from the high-resolution global climate model data suggest that the distribution of outer TC size shifts significantly towards larger values in each global TC basin during future climates, as revealed by 1) a statistically significant increase of the median outer TC size by 5-10% (p<0.05) according to a 1,000-sample bootstrap resampling approach with replacement and 2) statistically significant differences between the distributions of outer TC size from current and future climate simulations as shown using two-sample Kolmogorov-Smirnov testing (p<<0.01). Additional analysis of the high-resolution global climate model data reveals that outer TC size does not uniformly increase within each basin in future climates, but rather shows substantial locational dependence. Future work will incorporate the regional mesoscale hurricane model data to help identify the source of the spatial variability in outer TC size increases within each basin during future climates and, more importantly, why outer TC size changes in response to anthropogenic warming.
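
    The resampling comparison described above can be illustrated with a short sketch: a 1,000-sample bootstrap (with replacement) of the shift in median outer TC size, plus a two-sample Kolmogorov-Smirnov test on the full distributions. The arrays below are placeholder data, not model output, and Python with NumPy/SciPy is assumed.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)

      # Placeholder samples of outer TC size (km) from current and future climate runs.
      current = rng.gamma(shape=4.0, scale=60.0, size=500)
      future = rng.gamma(shape=4.0, scale=66.0, size=500)   # ~10% larger scale

      # Bootstrap (with replacement) the difference in median outer TC size.
      n_boot = 1000
      diffs = np.array([
          np.median(rng.choice(future, future.size, replace=True))
          - np.median(rng.choice(current, current.size, replace=True))
          for _ in range(n_boot)
      ])
      ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])

      # Two-sample Kolmogorov-Smirnov test on the full distributions.
      ks_stat, ks_p = stats.ks_2samp(current, future)

      print(f"median shift 95% CI: [{ci_low:.1f}, {ci_high:.1f}] km")
      print(f"KS statistic = {ks_stat:.3f}, p = {ks_p:.3g}")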

  1. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    Science.gov (United States)

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…

  2. Sample Size Calculation for Estimating or Testing a Nonzero Squared Multiple Correlation Coefficient

    Science.gov (United States)

    Krishnamoorthy, K.; Xia, Yanping

    2008-01-01

    The problems of hypothesis testing and interval estimation of the squared multiple correlation coefficient of a multivariate normal distribution are considered. It is shown that available one-sided tests are uniformly most powerful, and the one-sided confidence intervals are uniformly most accurate. An exact method of calculating sample size to…

  3. Determining the sample size required to establish whether a medical device is non-inferior to an external benchmark.

    Science.gov (United States)

    Sayers, Adrian; Crowther, Michael J; Judge, Andrew; Whitehouse, Michael R; Blom, Ashley W

    2017-08-28

    The use of benchmarks to assess the performance of implants such as those used in arthroplasty surgery is a widespread practice. It provides surgeons, patients and regulatory authorities with the reassurance that implants used are safe and effective. However, it is not currently clear how, or how many, implants should be statistically compared with a benchmark to assess whether or not that implant is superior, equivalent, non-inferior or inferior to the performance benchmark of interest. We aim to describe the methods and sample size required to conduct a one-sample non-inferiority study of a medical device for the purposes of benchmarking. Simulation study. Simulation study of a national register of medical devices. We simulated data, with and without a non-informative competing risk, to represent an arthroplasty population and describe three methods of analysis (z-test, 1-Kaplan-Meier and competing risks) commonly used in surgical research. We evaluate the performance of each method using power, bias, root-mean-square error, coverage and CI width. 1-Kaplan-Meier provides an unbiased estimate of implant net failure, which can be used to assess if a surgical device is non-inferior to an external benchmark. Small non-inferiority margins require significantly more individuals to be at risk compared with current benchmarking standards. A non-inferiority testing paradigm provides a useful framework for determining if an implant meets the required performance defined by an external benchmark. Current contemporary benchmarking standards have limited power to detect non-inferiority, and substantially larger sample sizes, in excess of 3200 procedures, are required to achieve a power greater than 60%. It is clear that, when benchmarking implant performance, net failure estimated using 1-Kaplan-Meier is preferable to crude failure estimated by competing risks models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved.

  4. Type-II generalized family-wise error rate formulas with application to sample size determination.

    Science.gov (United States)

    Delorme, Phillipe; de Micheaux, Pierre Lafaye; Liquet, Benoit; Riou, Jérémie

    2016-07-20

    Multiple endpoints are increasingly used in clinical trials. The significance of some of these clinical trials is established if at least r null hypotheses are rejected among m that are simultaneously tested. The usual approach in multiple hypothesis testing is to control the family-wise error rate, which is defined as the probability that at least one type-I error is made. More recently, the q-generalized family-wise error rate has been introduced to control the probability of making at least q false rejections. For procedures controlling this global type-I error rate, we define a type-II r-generalized family-wise error rate, which is directly related to the r-power defined as the probability of rejecting at least r false null hypotheses. We obtain very general power formulas that can be used to compute the sample size for single-step and step-wise procedures. These are implemented in our R package rPowerSampleSize available on the CRAN, making them directly available to end users. Complexities of the formulas are presented to gain insight into computation time issues. Comparison with Monte Carlo strategy is also presented. We compute sample sizes for two clinical trials involving multiple endpoints: one designed to investigate the effectiveness of a drug against acute heart failure and the other for the immunogenicity of a vaccine strategy against pneumococcus. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
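
    The exact power formulas described above are implemented in the rPowerSampleSize R package; as a hedged illustration only, the r-power (the probability of rejecting at least r false null hypotheses) can also be approximated by Monte Carlo under assumed standardized effects and a single-step Bonferroni rule, as in the Python sketch below. The effect sizes, the z-test setup and the per-arm sample size are assumptions for illustration, not values from the paper.

      import numpy as np
      from scipy import stats

      def r_power_mc(effects, n, r, alpha=0.05, n_sim=20_000, seed=1):
          """Monte Carlo r-power: P(reject at least r of the false nulls).

          effects : per-endpoint standardized mean differences (all assumed false nulls)
          n       : subjects per arm; one-sided z-tests with Bonferroni correction
          """
          rng = np.random.default_rng(seed)
          m = len(effects)
          z_crit = stats.norm.ppf(1 - alpha / m)          # single-step Bonferroni cutoff
          # Test statistics are ~ N(effect * sqrt(n/2), 1) under the alternative.
          ncp = np.asarray(effects) * np.sqrt(n / 2)
          z = rng.normal(loc=ncp, scale=1.0, size=(n_sim, m))
          rejections = (z > z_crit).sum(axis=1)
          return (rejections >= r).mean()

      print(r_power_mc(effects=[0.4, 0.35, 0.3], n=150, r=2))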

  5. Statistics in experimental design, preprocessing, and analysis of proteomics data.

    Science.gov (United States)

    Jung, Klaus

    2011-01-01

    High-throughput experiments in proteomics, such as 2-dimensional gel electrophoresis (2-DE) and mass spectrometry (MS), usually yield high-dimensional data sets of expression values for hundreds or thousands of proteins which are, however, observed on only a relatively small number of biological samples. Statistical methods for the planning and analysis of experiments are important to avoid false conclusions and to obtain tenable results. In this chapter, the most frequent experimental designs for proteomics experiments are illustrated. In particular, focus is put on studies for the detection of differentially regulated proteins. Furthermore, issues of sample size planning, statistical analysis of expression levels as well as methods for data preprocessing are covered.

  6. Sample Size Calculation: Inaccurate A Priori Assumptions for Nuisance Parameters Can Greatly Affect the Power of a Randomized Controlled Trial.

    Directory of Open Access Journals (Sweden)

    Elsa Tavernier

    Full Text Available We aimed to examine the extent to which inaccurate assumptions for nuisance parameters used to calculate sample size can affect the power of a randomized controlled trial (RCT). In a simulation study, we separately considered an RCT with continuous, dichotomous or time-to-event outcomes, with associated nuisance parameters of standard deviation, success rate in the control group and survival rate in the control group at some time point, respectively. For each type of outcome, we calculated a required sample size N for a hypothesized treatment effect, an assumed nuisance parameter and a nominal power of 80%. We then assumed a nuisance parameter associated with a relative error at the design stage. For each type of outcome, we randomly drew 10,000 relative errors of the associated nuisance parameter (from empirical distributions derived from a previously published review). Then, retro-fitting the sample size formula, we derived, for the pre-calculated sample size N, the real power of the RCT, taking into account the relative error for the nuisance parameter. In total, 23%, 0% and 18% of RCTs with continuous, binary and time-to-event outcomes, respectively, were underpowered (i.e., the real power fell below the nominal level), while others were overpowered (i.e., the real power was above 90%). Even with proper calculation of sample size, a substantial number of trials are underpowered or overpowered because of imprecise knowledge of nuisance parameters. Such findings raise questions about how sample size for RCTs should be determined.
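
    A minimal sketch of the "retro-fitting" idea for a continuous outcome, assuming a normal-approximation two-sample design: compute N from an assumed standard deviation, then recompute the power actually achieved if the true SD differs by a relative error. The numbers are placeholders, not the simulation settings of the paper.

      import numpy as np
      from scipy import stats

      def n_per_arm(delta, sd, alpha=0.05, power=0.80):
          """Normal-approximation sample size per arm for a two-sample comparison."""
          z_a = stats.norm.ppf(1 - alpha / 2)
          z_b = stats.norm.ppf(power)
          return int(np.ceil(2 * ((z_a + z_b) * sd / delta) ** 2))

      def real_power(delta, sd_true, n, alpha=0.05):
          """Power actually achieved with n per arm when the true SD is sd_true."""
          z_a = stats.norm.ppf(1 - alpha / 2)
          return stats.norm.cdf(delta / (sd_true * np.sqrt(2 / n)) - z_a)

      delta, sd_assumed = 5.0, 12.0
      n = n_per_arm(delta, sd_assumed)          # planned with the assumed SD
      for rel_err in (-0.2, 0.0, 0.2):          # relative error on the nuisance SD
          sd_true = sd_assumed * (1 + rel_err)
          print(n, rel_err, round(real_power(delta, sd_true, n), 3))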

  7. Exact distributions of two-sample rank statistics and block rank statistics using computer algebra

    NARCIS (Netherlands)

    Wiel, van de M.A.

    1998-01-01

    We derive generating functions for various rank statistics and we use computer algebra to compute the exact null distribution of these statistics. We present various techniques for reducing time and memory space used by the computations. We use the results to write Mathematica notebooks for

  8. Inverse Statistics and Asset Allocation Efficiency

    Science.gov (United States)

    Bolgorian, Meysam

    In this paper, using inverse statistics analysis, the effect of investment horizon on the efficiency of portfolio selection is examined. Inverse statistics analysis is a general tool, also known as the probability distribution of exit time, used to detect the distribution of the time at which a stochastic process exits a given zone. This analysis was used in Refs. 1 and 2 for studying financial returns time series. This distribution provides an optimal investment horizon, which determines the most likely horizon for gaining a specific return. Using samples of stocks from the Tehran Stock Exchange (TSE) as an emerging market and the S&P 500 as a developed market, the effect of the optimal investment horizon on asset allocation is assessed. It is found that taking the optimal investment horizon into account leads to greater efficiency for large portfolios in the TSE, whereas for stocks selected from the S&P 500, regardless of portfolio size, this strategy does not produce more efficient portfolios; instead, longer investment horizons provide greater efficiency.

  9. In vivo Comet assay--statistical analysis and power calculations of mice testicular cells.

    Science.gov (United States)

    Hansen, Merete Kjær; Sharma, Anoop Kumar; Dybdahl, Marianne; Boberg, Julie; Kulahci, Murat

    2014-11-01

    The in vivo Comet assay is a sensitive method for evaluating DNA damage. A recurrent concern is how to analyze the data appropriately and efficiently. A popular approach is to summarize the raw data into a summary statistic prior to the statistical analysis. However, consensus on which summary statistic to use has yet to be reached. Another important consideration concerns the assessment of proper sample sizes in the design of Comet assay studies. This study aims to identify a statistic suitably summarizing the % tail DNA of mice testicular samples in Comet assay studies. A second aim is to provide curves for this statistic outlining the number of animals and gels to use. The current study was based on 11 compounds administered via oral gavage in three doses to male mice: CAS no. 110-26-9, CAS no. 512-56-1, CAS no. 111873-33-7, CAS no. 79-94-7, CAS no. 115-96-8, CAS no. 598-55-0, CAS no. 636-97-5, CAS no. 85-28-9, CAS no. 13674-87-8, CAS no. 43100-38-5 and CAS no. 60965-26-6. Testicular cells were examined using the alkaline version of the Comet assay and the DNA damage was quantified as % tail DNA using a fully automatic scoring system. From the raw data 23 summary statistics were examined. A linear mixed-effects model was fitted to the summarized data and the estimated variance components were used to generate power curves as a function of sample size. The statistic that most appropriately summarized the within-sample distributions was the median of the log-transformed data, as it most consistently conformed to the assumptions of the statistical model. Power curves for 1.5-, 2-, and 2.5-fold changes of the highest dose group compared to the control group when 50 and 100 cells were scored per gel are provided to aid in the design of future Comet assay studies on testicular cells. Copyright © 2014 Elsevier B.V. All rights reserved.

  10. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data.

    Science.gov (United States)

    Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S

    2015-02-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press.

  11. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of Poisson statistics.

    Science.gov (United States)

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
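
    The Poisson reading of replicate PCR data referred to above can be sketched as follows: with templates assumed Poisson-distributed across reactions, P(negative) = exp(-λ), so λ = -ln(fraction negative). The function below is a generic illustration with a binomial-based confidence interval, not the authors' analysis pipeline; the 42 replicates and 18 positive wells are placeholders.

      import math

      def poisson_copies_per_reaction(n_replicates, n_positive):
          """Estimate mean template copies per reaction from positive/negative calls.

          Assumes templates are Poisson-distributed across replicates, so
          P(negative) = exp(-lambda) and lambda = -ln(fraction negative).
          """
          n_negative = n_replicates - n_positive
          if n_negative == 0:
              raise ValueError("all replicates positive: lambda is not identifiable")
          frac_neg = n_negative / n_replicates
          lam = -math.log(frac_neg)
          # Approximate 95% CI via the binomial standard error on frac_neg.
          se_frac = math.sqrt(frac_neg * (1 - frac_neg) / n_replicates)
          lo = -math.log(min(frac_neg + 1.96 * se_frac, 1 - 1e-12))
          hi = -math.log(max(frac_neg - 1.96 * se_frac, 1e-12))
          return lam, (lo, hi)

      # Example: 42 replicates (as in the assay described), 18 positive wells.
      print(poisson_copies_per_reaction(42, 18))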

  12. Multiple category-lot quality assurance sampling: a new classification system with application to schistosomiasis control.

    Science.gov (United States)

    Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello

    2012-01-01

    Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), but the properties of multiple category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalence of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
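
    Operating characteristic curves for a multi-category LQAS rule follow directly from binomial probabilities. The sketch below assumes a simple three-category rule with placeholder thresholds d1 and d2 (classify "low" if at most d1 positives are observed, "high" if more than d2); it illustrates the idea only and is not the published design.

      from scipy import stats

      def mc_lqas_oc(n, d1, d2, prevalences):
          """P(classify low / moderate / high) for a three-category LQAS rule.

          Classify 'low' if the number of positives X <= d1, 'high' if X > d2,
          otherwise 'moderate'.  X ~ Binomial(n, p) at true prevalence p.
          """
          rows = []
          for p in prevalences:
              p_low = stats.binom.cdf(d1, n, p)
              p_high = 1 - stats.binom.cdf(d2, n, p)
              rows.append((p, p_low, 1 - p_low - p_high, p_high))
          return rows

      # Hypothetical design: n = 15 children per school, thresholds d1 = 1, d2 = 7.
      for p, lo, mid, hi in mc_lqas_oc(15, 1, 7, [0.05, 0.10, 0.30, 0.50, 0.70]):
          print(f"p={p:.2f}  P(low)={lo:.2f}  P(moderate)={mid:.2f}  P(high)={hi:.2f}")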

  13. What big size you have! Using effect sizes to determine the impact of public health nursing interventions.

    Science.gov (United States)

    Johnson, K E; McMorris, B J; Raynor, L A; Monsen, K A

    2013-01-01

    The Omaha System is a standardized interface terminology that is used extensively by public health nurses in community settings to document interventions and client outcomes. Researchers using Omaha System data to analyze the effectiveness of interventions have typically calculated p-values to determine whether significant client changes occurred between admission and discharge. However, p-values are highly dependent on sample size, making it difficult to distinguish statistically significant changes from clinically meaningful changes. Effect sizes can help identify practical differences but have not yet been applied to Omaha System data. We compared p-values and effect sizes (Cohen's d) for mean differences between admission and discharge for 13 client problems documented in the electronic health records of 1,016 young low-income parents. Client problems were documented anywhere from 6 (Health Care Supervision) to 906 (Caretaking/parenting) times. On a scale from 1 to 5, the mean change needed to yield a large effect size (Cohen's d ≥ 0.80) was approximately 0.60 (range = 0.50 - 1.03) regardless of p-value or sample size (i.e., the number of times a client problem was documented in the electronic health record). Researchers using the Omaha System should report effect sizes to help readers determine which differences are practical and meaningful. Such disclosures will allow for increased recognition of effective interventions.
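
    Cohen's d for an admission-to-discharge change is simply the mean change divided by a standard deviation; which SD to standardize by (paired differences versus pooled) is a design choice. The sketch below uses the SD of the paired differences and invented ratings on a 1-5 scale purely for illustration.

      import numpy as np

      def cohens_d_paired(admission, discharge):
          """Effect size for paired admission/discharge ratings (1-5 Omaha-style scale)."""
          diff = np.asarray(discharge, float) - np.asarray(admission, float)
          return diff.mean() / diff.std(ddof=1)

      admission = [2, 3, 2, 3, 2, 4, 3, 2, 3, 2]   # placeholder ratings
      discharge = [3, 4, 3, 3, 3, 4, 4, 3, 4, 3]
      d = cohens_d_paired(admission, discharge)
      print(f"Cohen's d = {d:.2f}  ({'large' if abs(d) >= 0.8 else 'small/medium'})")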

  14. Economic Statistical Design of Variable Sampling Interval X̄ Control Chart Based on Surrogate Variable Using Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Lee Tae-Hoon

    2016-12-01

    Full Text Available In many cases, an X̄ control chart based on a performance variable is used in industrial fields. Typically, the control chart monitors the measurements of the performance variable itself. However, if the performance variable is too costly or impossible to measure, and a less expensive surrogate variable is available, the process may be more efficiently controlled using surrogate variables. In this paper, we present a model for the economic statistical design of a VSI (variable sampling interval) X̄ control chart using a surrogate variable that is linearly correlated with the performance variable. We derive the total average profit model from an economic viewpoint, apply the model to a Very High Temperature Reactor (VHTR) nuclear fuel measurement system, and derive the optimal result using genetic algorithms. Compared with the control chart based on a performance variable, the proposed model gives a larger expected net income per unit of time in the long run if the correlation between the performance variable and the surrogate variable is relatively high. The proposed model was confined to the sample mean control chart under the assumption that a single assignable cause occurs according to a Poisson process. However, the model may also be extended to other types of control charts using single or multiple assignable cause assumptions, such as the VSS (variable sample size) X̄ control chart, EWMA and CUSUM charts, and so on.

  15. Finite-sample instrumental variables Inference using an Asymptotically Pivotal Statistic

    NARCIS (Netherlands)

    Bekker, P.; Kleibergen, F.R.

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  16. Effects of growth rate, size, and light availability on tree survival across life stages: a demographic analysis accounting for missing values and small sample sizes.

    Science.gov (United States)

    Moustakas, Aristides; Evans, Matthew R

    2015-02-28

    Plant survival is a key factor in forest dynamics and survival probabilities often vary across life stages. Studies specifically aimed at assessing tree survival are unusual and so data initially designed for other purposes often need to be used; such data are more likely to contain errors than data collected for this specific purpose. We investigate the survival rates of ten tree species in a dataset designed to monitor growth rates. As some individuals were not included in the census at some time points, we use capture-mark-recapture methods both to account for missing individuals and to estimate relocation probabilities. Growth rates, size, and light availability were included as covariates in the model predicting survival rates. The study demonstrates that, for most species of UK hardwood, tree mortality is best described as constant between years, size-dependent at early life stages and size-independent at later life stages. We have demonstrated that even with a twenty-year dataset it is possible to discern variability both between individuals and between species. Our work illustrates the potential utility of the method applied here for calculating plant population dynamics parameters in time-replicated datasets with small sample sizes and missing individuals, without any loss of sample size and while including explanatory covariates.

  17. Development of a sampling strategy and sample size calculation to estimate the distribution of mammographic breast density in Korean women.

    Science.gov (United States)

    Jun, Jae Kwan; Kim, Mi Jin; Choi, Kui Son; Suh, Mina; Jung, Kyu-Won

    2012-01-01

    Mammographic breast density is a known risk factor for breast cancer. To conduct a survey to estimate the distribution of mammographic breast density in Korean women, appropriate sampling strategies for representative and efficient sampling design were evaluated through simulation. Using the target population from the National Cancer Screening Programme (NCSP) for breast cancer in 2009, we verified the distribution estimate by repeating the simulation 1,000 times using stratified random sampling to investigate the distribution of breast density of 1,340,362 women. According to the simulation results, using a sampling design stratifying the nation into three groups (metropolitan, urban, and rural), with a total sample size of 4,000, we estimated the distribution of breast density in Korean women at a level of 0.01% tolerance. Based on the results of our study, a nationwide survey for estimating the distribution of mammographic breast density among Korean women can be conducted efficiently.
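
    The kind of simulation described can be sketched by repeatedly drawing a proportionally allocated stratified random sample from a synthetic screening population and recording the estimated density distribution each time. The strata sizes, the four density categories and their probabilities below are placeholders (scaled down for speed), not the study's data.

      import numpy as np

      rng = np.random.default_rng(42)

      # Placeholder population: stratum label and BI-RADS-style density category (1-4).
      strata = {"metropolitan": 70_000, "urban": 45_000, "rural": 19_000}
      density_probs = [0.08, 0.32, 0.42, 0.18]           # assumed population mix
      population = {
          s: rng.choice([1, 2, 3, 4], size=n, p=density_probs) for s, n in strata.items()
      }

      def stratified_estimate(total_n=4000):
          """One stratified random sample, allocated proportionally to stratum size."""
          n_total = sum(strata.values())
          draws = []
          for s, vals in population.items():
              k = round(total_n * strata[s] / n_total)
              draws.append(rng.choice(vals, size=k, replace=False))
          sample = np.concatenate(draws)
          return np.bincount(sample, minlength=5)[1:] / sample.size

      estimates = np.array([stratified_estimate() for _ in range(1000)])
      print("mean estimated distribution:", estimates.mean(axis=0).round(4))
      print("simulation SD per category :", estimates.std(axis=0).round(4))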

  18. R2 effect-size measures for mediation analysis.

    Science.gov (United States)

    Fairchild, Amanda J; Mackinnon, David P; Taborga, Marcia P; Taylor, Aaron B

    2009-05-01

    R(2) effect-size measures are presented to assess variance accounted for in mediation models. The measures offer a means to evaluate both component paths and the overall mediated effect in mediation models. Statistical simulation results indicate acceptable bias across varying parameter and sample-size combinations. The measures are applied to a real-world example using data from a team-based health promotion program to improve the nutrition and exercise habits of firefighters. SAS and SPSS computer code are also provided for researchers to compute the measures in their own data.

  19. R2 effect-size measures for mediation analysis

    Science.gov (United States)

    Fairchild, Amanda J.; MacKinnon, David P.; Taborga, Marcia P.; Taylor, Aaron B.

    2010-01-01

    R2 effect-size measures are presented to assess variance accounted for in mediation models. The measures offer a means to evaluate both component paths and the overall mediated effect in mediation models. Statistical simulation results indicate acceptable bias across varying parameter and sample-size combinations. The measures are applied to a real-world example using data from a team-based health promotion program to improve the nutrition and exercise habits of firefighters. SAS and SPSS computer code are also provided for researchers to compute the measures in their own data. PMID:19363189

  20. Sample size calculations based on a difference in medians for positively skewed outcomes in health care studies

    Directory of Open Access Journals (Sweden)

    Aidan G. O’Keeffe

    2017-12-01

    Full Text Available Abstract Background In healthcare research, outcomes with skewed probability distributions are common. Sample size calculations for such outcomes are typically based on estimates on a transformed scale (e.g. log), which may sometimes be difficult to obtain. In contrast, estimates of median and variance on the untransformed scale are generally easier to pre-specify. The aim of this paper is to describe how to calculate a sample size for a two group comparison of interest based on median and untransformed variance estimates for log-normal outcome data. Methods A log-normal distribution for outcome data is assumed, and a sample size calculation approach for a two-sample t-test that compares log-transformed outcome data is demonstrated, where the change of interest is specified as a difference in median values on the untransformed scale. A simulation study is used to compare the method with a non-parametric alternative (Mann-Whitney U test) in a variety of scenarios, and the method is applied to a real example in neurosurgery. Results The method attained a nominal power value in simulation studies and was favourable in comparison to a Mann-Whitney U test and a two-sample t-test of untransformed outcomes. In addition, the method can be adjusted and used in some situations where the outcome distribution is not strictly log-normal. Conclusions We recommend the use of this sample size calculation approach for outcome data that are expected to be positively skewed and where a two group comparison on a log-transformed scale is planned. An advantage of this method over usual calculations based on estimates on the log-transformed scale is that it allows clinical efficacy to be specified as a difference in medians and requires a variance estimate on the untransformed scale. Such estimates are often easier to obtain and more interpretable than those for log-transformed outcomes.
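
    One way to read the approach: convert the specified medians and the untransformed-scale variance into log-scale parameters of a log-normal distribution, then apply the usual two-sample formula on the log scale. The sketch below is a hedged rendering of that idea, not a reproduction of the paper's formula; the planning values are invented.

      import math
      from scipy import stats

      def log_sd_from_median_var(median, var):
          """Log-scale SD of a log-normal given its median and untransformed variance.

          For X ~ LogNormal(mu, sigma): median = exp(mu) and
          Var(X) = (exp(sigma^2) - 1) * exp(2*mu + sigma^2).
          """
          x = (1 + math.sqrt(1 + 4 * var / median**2)) / 2    # x = exp(sigma^2)
          return math.sqrt(math.log(x))

      def n_per_group(median_ctrl, median_trt, var_ctrl, alpha=0.05, power=0.9):
          """Per-group n for a t-test on log-transformed data, effect given as medians."""
          sigma_log = log_sd_from_median_var(median_ctrl, var_ctrl)
          delta_log = math.log(median_trt) - math.log(median_ctrl)
          z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
          return math.ceil(2 * (z * sigma_log / delta_log) ** 2)

      # Hypothetical planning values: control median 10, treated median 7, variance 60.
      print(n_per_group(10.0, 7.0, 60.0))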

  1. A preliminary study on identification of Thai rice samples by INAA and statistical analysis

    Science.gov (United States)

    Kongsri, S.; Kukusamude, C.

    2017-09-01

    This study aims to investigate the elemental compositions of 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify rice according to type and cultivar using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified with SRM 1568a Rice Flour. All elements were found to be in good agreement with the certified values. The precisions in terms of %RSD were lower than 7%. The LODs were in the range of 0.01 to 29 mg kg-1. The concentrations of the 9 elements in the Thai rice samples were evaluated and used as chemical indicators to identify the type of rice sample. The results showed that the Mg, Cl, As, Br, Mn, K, Rb and Zn concentrations differ significantly between Thai jasmine and Sung Yod rice samples, but there was no evidence that the Al concentrations differ, at the 95% confidence level. Our results may provide preliminary information for discrimination of rice samples and may serve as a useful database of Thai rice.

  2. Statistical Study to Check the Conformity of Aggregate in Kirkuk City to Requirement of Iraqi Specification

    OpenAIRE

    Ammar Saleem Khazaal; Nizar N Ismeel; Abdel fattah K. Hussein

    2018-01-01

    This research reviews a statistical study to check the conformity of the aggregate (coarse and fine) used in Kirkuk city to the requirements of the Iraqi specifications. The sieve analysis data (215 samples) of aggregates, obtained from the National Central Construction Laboratory and the Technical College Construction Laboratory in Kirkuk city, were analyzed using the statistical program SAS. The results showed that 5%, 17%, and 18% of fine aggregate samples are passing sieve sizes 10 mm,...

  3. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, Paul A.; Kleibergen, Frank

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  4. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range.

    Science.gov (United States)

    Wan, Xiang; Wang, Wenqian; Liu, Jiming; Tong, Tiejun

    2014-12-19

    In systematic reviews and meta-analysis, researchers often pool the results of the sample mean and standard deviation from a set of similar clinical trials. A number of the trials, however, reported the study using the median, the minimum and maximum values, and/or the first and third quartiles. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we propose to improve the existing literature in several directions. First, we show that the sample standard deviation estimation in Hozo et al.'s method (BMC Med Res Methodol 5:13, 2005) has some serious limitations and is always less satisfactory in practice. Inspired by this, we propose a new estimation method by incorporating the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under several other interesting settings where the interquartile range is also available for the trials. We demonstrate the performance of the proposed methods through simulation studies for the three frequently encountered scenarios, respectively. For the first two scenarios, our method greatly improves existing methods and provides a nearly unbiased estimate of the true sample standard deviation for normal data and a slightly biased estimate for skewed data. For the third scenario, our method still performs very well for both normal data and skewed data. Furthermore, we compare the estimators of the sample mean and standard deviation under all three scenarios and present some suggestions on which scenario is preferred in real-world applications. In this paper, we discuss different approximation methods in the estimation of the sample mean and standard deviation and propose some new estimation methods to improve the existing literature. We conclude our work with a summary table (an Excel spread sheet including all formulas) that serves as a comprehensive guidance for performing meta-analysis in different
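
    A hedged sketch of the type of estimator discussed, for two of the reporting scenarios (minimum/median/maximum and quartiles, each with the sample size n). The quantile constants in the denominators follow my recollection of the recommendations in Wan et al. (2014) and should be verified against the paper before use.

      from scipy import stats

      def est_from_min_med_max(a, m, b, n):
          """Mean/SD from minimum, median, maximum and sample size.

          The 2*Phi^{-1}((n - 0.375)/(n + 0.25)) denominator is a recollection of the
          constant recommended by Wan et al. (2014); verify against the paper.
          """
          mean = (a + 2 * m + b) / 4
          sd = (b - a) / (2 * stats.norm.ppf((n - 0.375) / (n + 0.25)))
          return mean, sd

      def est_from_quartiles(q1, m, q3, n):
          """Mean/SD from the first quartile, median, third quartile and sample size."""
          mean = (q1 + m + q3) / 3
          sd = (q3 - q1) / (2 * stats.norm.ppf((0.75 * n - 0.125) / (n + 0.25)))
          return mean, sd

      print(est_from_min_med_max(a=2.0, m=9.0, b=30.0, n=50))
      print(est_from_quartiles(q1=6.0, m=9.0, q3=14.0, n=50))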

  5. In vitro rumen feed degradability assessed with DaisyII and batch culture: effect of sample size

    Directory of Open Access Journals (Sweden)

    Stefano Schiavon

    2010-01-01

    Full Text Available In vitro degradability with the DaisyII (D) equipment is commonly performed with 0.5 g of feed sample in each filter bag. The literature reports that reducing the ratio of sample size to bag surface could facilitate the release of soluble or fine particulate matter. A reduction of sample size to 0.25 g could improve the correlation between the measurements provided by D and the conventional batch culture (BC). This hypothesis was screened by analysing the results of 2 trials. In trial 1, 7 feeds were incubated for 48 h with rumen fluid (3 runs x 4 replications) both with D (0.5 g/bag) and BC; the regressions between the mean values provided for the various feeds in each run by the 2 methods, for both NDF degradability (NDFd) and in vitro true DM degradability (IVTDMD), had R2 of 0.75 and 0.92 and RSD of 10.9 and 4.8%, respectively. In trial 2, 4 feeds were incubated (2 runs x 8 replications) with D (0.25 g/bag) and BC; the corresponding regressions for NDFd and IVTDMD showed R2 of 0.94 and 0.98 and RSD of 3.0 and 1.3%, respectively. A sample size of 0.25 g improved the precision of the measurements obtained with D.

  6. Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

    Science.gov (United States)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of the binary probit regression model are commonly estimated using the Maximum Likelihood Estimation (MLE) method. However, the MLE method has a limitation if the binary data contain separation. Separation is the condition where one or several independent variables exactly separate the categories of the binary response. As a result, the MLE estimators fail to converge, so they cannot be used in modeling. One effort to resolve the separation is to use Firth's approach instead. This research has two aims. First, to identify the chance of separation occurring in the binary probit regression model under the MLE method and under Firth's approach. Second, to compare the performance of the binary probit regression estimators obtained by the MLE method and by Firth's approach using the RMSE criterion. These comparisons are performed by simulation under different sample sizes. The results showed that the chance of separation occurring with the MLE method for small sample sizes is higher than with Firth's approach. On the other hand, for larger sample sizes, the probability decreases and is relatively similar between the MLE method and Firth's approach. Meanwhile, Firth's estimators have smaller RMSE than the MLE estimators, especially for smaller sample sizes, but for larger sample sizes the RMSEs are not much different. This means that Firth's estimators outperform the MLE estimators.

  7. CONFIDENCE LEVELS AND/VS. STATISTICAL HYPOTHESIS TESTING IN STATISTICAL ANALYSIS. CASE STUDY

    Directory of Open Access Journals (Sweden)

    ILEANA BRUDIU

    2009-05-01

    Full Text Available Parameter estimation with confidence intervals and statistical hypothesis testing are both used in statistical analysis to draw conclusions about a population from an extracted sample. The case study presented in this paper aims to highlight the importance of the sample size used in a study and how it is reflected in the results obtained when using confidence intervals and hypothesis tests. Whereas statistical hypothesis tests only give a "yes" or "no" answer to certain questions, statistical estimation using confidence intervals provides more information than a test statistic: it shows the high degree of uncertainty arising from small samples and from findings that fall in the "marginally significant" or "almost significant" range (p very close to 0.05).

  8. Sample size estimation to substantiate freedom from disease for clustered binary data with a specific risk profile

    DEFF Research Database (Denmark)

    Kostoulas, P.; Nielsen, Søren Saxmose; Browne, W. J.

    2013-01-01

    and power when applied to these groups. We propose the use of the variance partition coefficient (VPC), which measures the clustering of infection/disease for individuals with a common risk profile. Sample size estimates are obtained separately for those groups that exhibit markedly different heterogeneity......, thus, optimizing resource allocation. A VPC-based predictive simulation method for sample size estimation to substantiate freedom from disease is presented. To illustrate the benefits of the proposed approach we give two examples with the analysis of data from a risk factor study on Mycobacterium avium...

  9. Adaptive sampling rate control for networked systems based on statistical characteristics of packet disordering.

    Science.gov (United States)

    Li, Jin-Na; Er, Meng-Joo; Tan, Yen-Kheng; Yu, Hai-Bin; Zeng, Peng

    2015-09-01

    This paper investigates an adaptive sampling rate control scheme for networked control systems (NCSs) subject to packet disordering. The main objectives of the proposed scheme are (a) to avoid heavy packet disordering existing in communication networks and (b) to stabilize NCSs with packet disordering, transmission delay and packet loss. First, a novel sampling rate control algorithm based on statistical characteristics of disordering entropy is proposed; secondly, an augmented closed-loop NCS that consists of a plant, a sampler and a state-feedback controller is transformed into an uncertain and stochastic system, which facilitates the controller design. Then, a sufficient condition for stochastic stability in terms of Linear Matrix Inequalities (LMIs) is given. Moreover, an adaptive tracking controller is designed such that the sampling period tracks a desired sampling period, which represents a significant contribution. Finally, experimental results are given to illustrate the effectiveness and advantages of the proposed scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  10. Analysis of time series and size of equivalent sample

    International Nuclear Information System (INIS)

    Bernal, Nestor; Molina, Alicia; Pabon, Daniel; Martinez, Jorge

    2004-01-01

    In a meteorological context, a first approach to the modeling of time series is to use models of autoregressive type. This allows one to take into account the meteorological persistence or temporal behavior, thereby identifying the memory of the analyzed process. This article seeks to present the concept of the size of an equivalent sample, which helps to identify in the data series subperiods with a similar structure. Moreover, in this article we examine the alternative of adjusting the variance of the series, keeping in mind its temporal structure, as well as an adjustment to the covariance of two time series. This article presents two examples, the first one corresponding to seven simulated series with autoregressive structure of first order, and the second corresponding to seven meteorological series of anomalies of the air temperature at the surface in two Colombian regions.

  11. Statistical Analysis of Clinical Data on a Pocket Calculator, Part 2 Statistics on a Pocket Calculator, Part 2

    CERN Document Server

    Cleophas, Ton J

    2012-01-01

    The first part of this title contained all statistical tests relevant to starting clinical investigations, and included tests for continuous and binary data, power, sample size, multiple testing, variability, confounding, interaction, and reliability. The current part 2 of this title reviews methods for handling missing data, manipulated data, multiple confounders, predictions beyond observation, uncertainty of diagnostic tests, and the problems of outliers. Also robust tests, non-linear modeling, goodness of fit testing, Bhattacharya models, item response modeling, superiority testing, variab

  12. On some common practices of systematic sampling

    OpenAIRE

    Zhang, Li-Chun

    2006-01-01

    With permission from Statistics Sweden. The original publication is available at http://www.jos.nu/Contents/jos_online.asp The article is also published in Statistics Norway's series Særtrykk/Reprints no. 325. Systematic sampling is a widely used technique in survey sampling. It is easy to execute, whether the units are to be selected with equal probability or with probabilities proportional to auxiliary sizes. It can be very efficient if one manages to achieve a favourable stratification effect...

  13. Sample size for comparing negative binomial rates in noninferiority and equivalence trials with unequal follow-up times.

    Science.gov (United States)

    Tang, Yongqiang

    2017-05-25

    We derive the sample size formulae for comparing two negative binomial rates based on both the relative and absolute rate difference metrics in noninferiority and equivalence trials with unequal follow-up times, and establish an approximate relationship between the sample sizes required for the treatment comparison based on the two treatment effect metrics. The proposed method allows the dispersion parameter to vary by treatment groups. The accuracy of these methods is assessed by simulations. It is demonstrated that ignoring the between-subject variation in the follow-up time by setting the follow-up time for all individuals to be the mean follow-up time may greatly underestimate the required size, resulting in underpowered studies. Methods are provided for back-calculating the dispersion parameter based on the published summary results.
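
    The paper derives closed-form sample size formulas; a generic, hedged alternative for checking a candidate size is to simulate negative binomial counts with subject-specific follow-up times and apply a Wald-type non-inferiority test to the log rate ratio. Everything below (rates, dispersion, margin, follow-up model) is an assumption for illustration, and statsmodels is used for the negative binomial fit.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(7)

      def simulate_power(n_per_arm, rate0, rate_ratio, alpha_disp, ni_margin,
                         mean_fu=1.0, n_sim=300):
          """Approximate power of a negative binomial non-inferiority test by simulation.

          Rejects H0 (inferiority) when the upper one-sided 97.5% limit of the
          log rate ratio falls below log(ni_margin).  Follow-up times vary by subject.
          """
          size = 1.0 / alpha_disp                       # NB 'number of successes' parameter
          hits = 0
          for _ in range(n_sim):
              t = rng.exponential(mean_fu, 2 * n_per_arm)          # unequal follow-up
              arm = np.repeat([0, 1], n_per_arm)
              mu = rate0 * rate_ratio**arm * t
              y = rng.negative_binomial(size, size / (size + mu))
              X = sm.add_constant(arm)
              fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=alpha_disp),
                           offset=np.log(t)).fit()
              upper = fit.params[1] + 1.96 * fit.bse[1]
              hits += upper < np.log(ni_margin)
          return hits / n_sim

      # Hypothetical planning values: control rate 0.8/yr, true rate ratio 1.0, margin 1.25.
      print(simulate_power(n_per_arm=300, rate0=0.8, rate_ratio=1.0,
                           alpha_disp=0.5, ni_margin=1.25))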

  14. Determination of Sr-90 in milk samples from the study of statistical results

    Directory of Open Access Journals (Sweden)

    Otero-Pazos Alberto

    2017-01-01

    Full Text Available The determination of 90Sr in milk samples is a main objective of radiation monitoring laboratories because of its environmental importance. In this paper, the activity concentration of 39 milk samples was obtained through radiochemical separation based on selective retention of Sr in a cationic resin (Dowex 50WX8, 50-100 mesh) and subsequent determination by a low-level proportional gas counter. The results were checked by measuring the Sr concentration with the flame atomic absorption spectroscopy technique, to finally obtain the mass of 90Sr. A statistical treatment of the data obtained was performed using linear regressions. A reliable estimate of the mass of 90Sr was obtained based, first, on the gravimetric technique and, second, on the counts per minute of the third measurement at 90Sr/90Y equilibrium, without having to perform the full analysis. These estimates have been verified with 19 milk samples, obtaining overlapping results. The novelty of the manuscript is the possibility of determining the concentration of 90Sr in milk samples without the need to perform the third measurement at equilibrium.

  15. Volatility measurement with directional change in Chinese stock market: Statistical property and investment strategy

    Science.gov (United States)

    Ma, Junjun; Xiong, Xiong; He, Feng; Zhang, Wei

    2017-04-01

    Stock price fluctuations are studied in this paper from an intrinsic time perspective. Events, directional changes (DC) or overshoots, are taken as the time scale of the price series. Using this directional change law, the corresponding statistical properties and parameter estimation are tested in the Chinese stock market. Furthermore, a directional change trading strategy is proposed for investing in the market portfolio in the Chinese stock market, and both in-sample and out-of-sample performance are compared among the different methods of model parameter estimation. We conclude that the DC method can capture important fluctuations in the Chinese stock market and gain profit owing to the statistical property that the average upturn overshoot size is bigger than the average downturn directional change size. The optimal parameter of the DC method is not fixed, and we obtained a 1.8% annual excess return with this DC-based trading strategy.
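
    A standard directional-change detector with threshold θ confirms an event when the price reverses by θ from the last running extreme; the overshoot is the continuation beyond the confirmation point. The sketch below is a generic implementation of that rule, not the authors' code, and θ = 0.018 is only a placeholder threshold.

      def directional_changes(prices, theta=0.018):
          """Segment a price series into directional-change (DC) events.

          A downturn (upturn) event is confirmed when the price falls (rises) by a
          fraction `theta` from the last observed maximum (minimum).  Returns the
          indices where events are confirmed, with their direction.
          """
          events = []
          mode = "up"                      # current trend being tracked
          extreme = prices[0]              # running max (in 'up' mode) or min ('down')
          for i, p in enumerate(prices):
              if mode == "up":
                  if p > extreme:
                      extreme = p
                  elif p <= extreme * (1 - theta):
                      events.append((i, "downturn"))
                      mode, extreme = "down", p
              else:
                  if p < extreme:
                      extreme = p
                  elif p >= extreme * (1 + theta):
                      events.append((i, "upturn"))
                      mode, extreme = "up", p
          return events

      prices = [100, 101.5, 103, 101.0, 100.8, 99.2, 100.1, 102.3, 104.0, 101.9]
      print(directional_changes(prices, theta=0.018))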

  16. Linear models for airborne-laser-scanning-based operational forest inventory with small field sample size and highly correlated LiDAR data

    Science.gov (United States)

    Junttila, Virpi; Kauranne, Tuomo; Finley, Andrew O.; Bradford, John B.

    2015-01-01

    Modern operational forest inventory often uses remotely sensed data that cover the whole inventory area to produce spatially explicit estimates of forest properties through statistical models. The data obtained by airborne light detection and ranging (LiDAR) correlate well with many forest inventory variables, such as the tree height, the timber volume, and the biomass. To construct an accurate model over thousands of hectares, LiDAR data must be supplemented with several hundred field sample measurements of forest inventory variables. This can be costly and time consuming. Different LiDAR-data-based and spatial-data-based sampling designs can reduce the number of field sample plots needed. However, problems arising from the features of the LiDAR data, such as a large number of predictors compared with the sample size (overfitting) or a strong correlation among predictors (multicollinearity), may decrease the accuracy and precision of the estimates and predictions. To overcome these problems, a Bayesian linear model with the singular value decomposition of predictors, combined with regularization, is proposed. The model performance in predicting different forest inventory variables is verified in ten inventory areas from two continents, where the number of field sample plots is reduced using different sampling designs. The results show that, with an appropriate field plot selection strategy and the proposed linear model, the total relative error of the predicted forest inventory variables is only 5%–15% larger using 50 field sample plots than the error of a linear model estimated with several hundred field sample plots when we sum up the error due to both the model noise variance and the model’s lack of fit.
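
    A frequentist analogue of the model described (SVD of the predictor matrix plus regularization) can be sketched with plain ridge regression on the leading singular components; the Bayesian treatment in the paper places priors on the same quantities instead. The data, truncation rank and ridge penalty below are placeholders.

      import numpy as np

      rng = np.random.default_rng(3)

      # Placeholder data: 50 field plots, 80 correlated LiDAR height/density metrics.
      n_plots, n_metrics = 50, 80
      latent = rng.normal(size=(n_plots, 5))
      X = latent @ rng.normal(size=(5, n_metrics)) + 0.1 * rng.normal(size=(n_plots, n_metrics))
      y = latent @ np.array([30.0, 10.0, 5.0, 0.0, 0.0]) + rng.normal(scale=5.0, size=n_plots)

      # Centre, take the SVD of the predictors and keep the leading components.
      Xc = X - X.mean(axis=0)
      U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
      k = 5                                       # truncation rank (placeholder choice)
      Z = U[:, :k] * s[:k]                        # plot scores on the leading components

      # Ridge-regularised fit in the reduced space (lambda is a placeholder).
      lam = 1.0
      beta = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ (y - y.mean()))
      y_hat = y.mean() + Z @ beta
      rmse = np.sqrt(np.mean((y - y_hat) ** 2))
      print(f"in-sample RMSE with {k} SVD components: {rmse:.2f}")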

  17. Statistical flaw strength distributions for glass fibres: Correlation between bundle test and AFM-derived flaw size density functions

    International Nuclear Information System (INIS)

    Foray, G.; Descamps-Mandine, A.; R’Mili, M.; Lamon, J.

    2012-01-01

    The present paper investigates glass fibre flaw size distributions. Two commercial fibre grades (HP and HD) mainly used in cement-based composite reinforcement were studied. Glass fibre fractography is a difficult and time consuming exercise, and thus is seldom carried out. An approach based on tensile tests on multifilament bundles and examination of the fibre surface by atomic force microscopy (AFM) was used. Bundles of more than 500 single filaments each were tested. Thus a statistically significant database of failure data was built up for the HP and HD glass fibres. Gaussian flaw distributions were derived from the filament tensile strength data or extracted from the AFM images. The two distributions were compared. Defect sizes computed from raw AFM images agreed reasonably well with those derived from tensile strength data. Finally, the pertinence of a Gaussian distribution was discussed. The alternative Pareto distribution provided a fair approximation when dealing with AFM flaw size.

  18. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    Science.gov (United States)

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  19. Sampling of illicit drugs for quantitative analysis--part II. Study of particle size and its influence on mass reduction.

    Science.gov (United States)

    Bovens, M; Csesztregi, T; Franc, A; Nagy, J; Dujourdy, L

    2014-01-01

    The basic goal in sampling for the quantitative analysis of illicit drugs is to maintain the average concentration of the drug in the material from its original seized state (the primary sample) all the way through to the analytical sample, where the effect of particle size is most critical. The size of the largest particles of different authentic illicit drug materials, in their original state and after homogenisation, using manual or mechanical procedures, was measured using a microscope with a camera attachment. The comminution methods employed included pestle and mortar (manual) and various ball and knife mills (mechanical). The drugs investigated were amphetamine, heroin, cocaine and herbal cannabis. It was shown that comminution of illicit drug materials using these techniques reduces the nominal particle size from approximately 600 μm down to between 200 and 300 μm. It was demonstrated that the choice of 1 g increments for the primary samples of powdered drugs and cannabis resin, which were used in the heterogeneity part of our study (Part I), was correct for the routine quantitative analysis of illicit seized drugs. For herbal cannabis we found that the appropriate increment size was larger. Based on the results of this study we can generally state that: An analytical sample weight of between 20 and 35 mg of an illicit powdered drug, with an assumed purity of 5% or higher, would be considered appropriate and would generate a sampling RSD in the same region as the analytical RSD for a typical quantitative method of analysis for the most common, powdered, illicit drugs. For herbal cannabis, with an assumed purity of 1% THC (tetrahydrocannabinol) or higher, an analytical sample weight of approximately 200 mg would be appropriate. In Part III we will pull together our homogeneity studies and particle size investigations and use them to devise sampling plans and sample preparations suitable for the quantitative instrumental analysis of the most common illicit

  20. Evaluation of species richness estimators based on quantitative performance measures and sensitivity to patchiness and sample grain size

    Science.gov (United States)

    Willie, Jacob; Petre, Charles-Albert; Tagg, Nikki; Lens, Luc

    2012-11-01

    Data from forest herbaceous plants in a site of known species richness in Cameroon were used to test the performance of rarefaction and eight species richness estimators (ACE, ICE, Chao1, Chao2, Jack1, Jack2, Bootstrap and MM). Bias, accuracy, precision and sensitivity to patchiness and sample grain size were the evaluation criteria. An evaluation of the effects of sampling effort and patchiness on diversity estimation is also provided. Stems were identified and counted in linear series of 1-m2 contiguous square plots distributed in six habitat types. Initially, 500 plots were sampled in each habitat type. The sampling process was monitored using rarefaction and a set of richness estimator curves. Curves from the first dataset suggested adequate sampling in riparian forest only. Additional plots ranging from 523 to 2143 were subsequently added in the undersampled habitats until most of the curves stabilized. Jack1 and ICE, the non-parametric richness estimators, performed better, being more accurate and less sensitive to patchiness and sample grain size, and significantly reducing biases that could not be detected by rarefaction and other estimators. This study confirms the usefulness of non-parametric incidence-based estimators, and recommends Jack1 or ICE alongside rarefaction while describing taxon richness and comparing results across areas sampled using similar or different grain sizes. As patchiness varied across habitat types, accurate estimations of diversity did not require the same number of plots. The number of samples needed to fully capture diversity is not necessarily the same across habitats, and can only be known when taxon sampling curves have indicated adequate sampling. Differences in observed species richness between habitats were generally due to differences in patchiness, except between two habitats where they resulted from differences in abundance. We suggest that communities should first be sampled thoroughly using appropriate taxon sampling