WorldWideScience

Sample records for statistical sampling

  1. Sampling, Probability Models and Statistical Reasoning: Statistical Inference

    Indian Academy of Sciences (India)

    Sampling, Probability Models and Statistical Reasoning: Statistical Inference. Mohan Delampady; V R Padmawar. General Article, Resonance – Journal of Science Education, Volume 1, Issue 5, May 1996, pp 49-58 ...

  2. Statistical distribution sampling

    Science.gov (United States)

    Johnson, E. S.

    1975-01-01

    Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.

  3. Statistical Symbolic Execution with Informed Sampling

    Science.gov (United States)

    Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco

    2014-01-01

    Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that informed sampling obtains more precise results and converges faster than a purely statistical analysis, and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
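
    To make the estimation step concrete, the following minimal sketch (not taken from Symbolic PathFinder; the path sampler and its target probability are hypothetical stand-ins) combines Monte Carlo sampling of paths with a Beta-Binomial posterior for the probability of reaching a target event.

        # Minimal sketch: Monte Carlo path sampling with a Beta-Binomial posterior.
        # sample_path_hits_target() is a hypothetical stand-in for running one
        # sampled symbolic path and checking whether it reaches the target event.
        import random

        def sample_path_hits_target(p_true=0.03):
            return random.random() < p_true

        def estimate_reach_probability(n_samples=10_000, prior=(1.0, 1.0)):
            hits = sum(sample_path_hits_target() for _ in range(n_samples))
            alpha = prior[0] + hits
            beta = prior[1] + n_samples - hits
            return hits, alpha / (alpha + beta)   # posterior mean of the reach probability

        hits, post_mean = estimate_reach_probability()
        print(f"hits={hits}, posterior mean reach probability={post_mean:.4f}")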

  4. 42 CFR 402.109 - Statistical sampling.

    Science.gov (United States)

    2010-10-01

    ... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study, if based upon an appropriate sampling and computed by valid statistical methods, constitute prima... § 402.1. (c) Burden of proof. Once CMS or OIG has made a prima facie case, the burden is on the...

  5. Contributions to sampling statistics

    CERN Document Server

    Conti, Pier; Ranalli, Maria

    2014-01-01

    This book contains a selection of the papers presented at the ITACOSM 2013 Conference, held in Milan in June 2013. ITACOSM is the biennial meeting of the Survey Sampling Group S2G of the Italian Statistical Society, intended as an international forum of scientific discussion on developments in the theory and application of survey sampling methodologies in human and natural sciences. The book gathers research papers carefully selected from both invited and contributed sessions of the conference. The book is a relevant contribution to various key aspects of sampling methodology and techniques; it deals with hot topics in sampling theory, such as calibration, quantile regression and multiple-frame surveys, and with innovative methodologies in important areas of both sampling theory and applications. Contributions cut across current sampling methodologies such as interval estimation for complex samples, randomized responses, bootstrap, weighting, modeling, imputati...

  6. Statistical literacy and sample survey results

    Science.gov (United States)

    McAlevey, Lynn; Sullivan, Charles

    2010-10-01

    Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.

  7. Statistical sampling approaches for soil monitoring

    NARCIS (Netherlands)

    Brus, D.J.

    2014-01-01

    This paper describes three statistical sampling approaches for regional soil monitoring, a design-based, a model-based and a hybrid approach. In the model-based approach a space-time model is exploited to predict global statistical parameters of interest such as the space-time mean. In the hybrid

  8. Statistical aspects of food safety sampling

    NARCIS (Netherlands)

    Jongenburger, I.; Besten, den H.M.W.; Zwietering, M.H.

    2015-01-01

    In food safety management, sampling is an important tool for verifying control. Sampling by nature is a stochastic process. However, uncertainty regarding results is made even greater by the uneven distribution of microorganisms in a batch of food. This article reviews statistical aspects of

  9. Audit sampling: A qualitative study on the role of statistical and non-statistical sampling approaches on audit practices in Sweden

    OpenAIRE

    Ayam, Rufus Tekoh

    2011-01-01

    PURPOSE: The two approaches to audit sampling, statistical and non-statistical, are examined in this study. The overall purpose of the study is to explore the extent to which statistical and non-statistical sampling approaches are utilized by independent auditors in their auditing practice. Moreover, the study also seeks to achieve two additional purposes; the first is to find out whether auditors utilize different sampling techniques when auditing SMEs (Small and Medium-Sized Ente...

  10. Measuring radioactive half-lives via statistical sampling in practice

    Science.gov (United States)

    Lorusso, G.; Collins, S. M.; Jagan, K.; Hitt, G. W.; Sadek, A. M.; Aitken-Smith, P. M.; Bridi, D.; Keightley, J. D.

    2017-10-01

    The statistical sampling method for the measurement of radioactive decay half-lives exhibits intriguing features, such as the fact that the half-life is approximately the median of a distribution closely resembling a Cauchy distribution. Whilst initial theoretical considerations suggested that in certain cases the method could have significant advantages, accurate measurements by statistical sampling have proven difficult, for they require an exercise in non-standard statistical analysis. As a consequence, no half-life measurement using this method has yet been reported and no comparison with traditional methods has ever been made. We used a Monte Carlo approach to address these analysis difficulties, and present the first experimental measurement of a radioisotope half-life (211Pb) by statistical sampling, in good agreement with the literature-recommended value. Our work also focused on the comparison between statistical sampling and exponential regression analysis, and concluded that exponential regression generally achieves the highest accuracy.
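
    For comparison, the exponential-regression alternative mentioned above amounts to a least-squares fit of log counts against time; the sketch below uses synthetic Poisson counts with a true half-life of roughly 36 minutes (the accepted 211Pb value) and is not the paper's analysis code.

        import numpy as np

        def half_life_by_regression(times, counts):
            # ln N(t) = ln N0 - lambda * t, so the slope of a linear fit gives -lambda.
            slope, _ = np.polyfit(times, np.log(counts), 1)
            return np.log(2.0) / -slope

        rng = np.random.default_rng(0)
        t = np.linspace(0.0, 120.0, 25)                       # minutes
        true_half_life = 36.1                                 # approximate 211Pb value
        expected = 1e5 * np.exp(-np.log(2.0) * t / true_half_life)
        counts = rng.poisson(expected)                        # synthetic decay counts
        print(f"estimated half-life: {half_life_by_regression(t, counts):.1f} min")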

  11. Statistical benchmark for BosonSampling

    International Nuclear Information System (INIS)

    Walschaers, Mattia; Mayer, Klaus; Buchleitner, Andreas; Kuipers, Jack; Urbina, Juan-Diego; Richter, Klaus; Tichy, Malte Christopher

    2016-01-01

    Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church–Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows us to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go well beyond mere bunching or anti-bunching effects. (fast track communication)

  12. Statistical conditional sampling for variable-resolution video compression.

    Directory of Open Access Journals (Sweden)

    Alexander Wong

    In this study, we investigate a variable-resolution approach to video compression based on conditional random fields (CRFs) and statistical conditional sampling in order to further improve the compression rate while maintaining high-quality video. In the proposed approach, representative key-frames within a video shot are identified and stored at full resolution. The remaining frames within the video shot are stored and compressed at a reduced resolution. At the decompression stage, a region-based dictionary is constructed from the key-frames and used to restore the reduced-resolution frames to the original resolution via statistical conditional sampling. The sampling approach is based on the conditional probability of the CRF model, making use of the constructed dictionary. Experimental results show that the proposed variable-resolution approach via statistical conditional sampling has potential for improving compression rates when compared to compressing the video at full resolution, while achieving higher video quality when compared to compressing the video at reduced resolution.

  13. Statistical sampling techniques as applied to OSE inspections

    International Nuclear Information System (INIS)

    Davis, J.J.; Cote, R.W.

    1987-01-01

    The need has been recognized for statistically valid methods for gathering information during OSE inspections and for interpreting results, both from performance testing and from records reviews, interviews, etc. Battelle Columbus Division, under contract to DOE OSE, has performed and is continuing to perform work in the area of statistical methodology for OSE inspections. This paper presents some of the sampling methodology currently being developed for use during OSE inspections. Topics include population definition, sample size requirements, level of confidence and practical logistical constraints associated with the conduct of an inspection based on random sampling. Sequential sampling schemes and sampling from finite populations are also discussed. The methods described are applicable to various data-gathering activities, ranging from the sampling and examination of classified documents to the sampling of Protective Force security inspectors for skill testing.

  14. Statistical Analysis Of Tank 19F Floor Sample Results

    International Nuclear Information System (INIS)

    Harris, S.

    2010-01-01

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor, and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by a UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
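
    The UCL95% described here is a function of the number of samples, their average, and their standard deviation; a standard one-sided t-based form consistent with that description is sketched below (an assumed implementation with hypothetical concentrations, not the report's calculation).

        import numpy as np
        from scipy import stats

        def ucl95(results):
            # One-sided upper 95% confidence limit on the mean: mean + t * s / sqrt(n).
            x = np.asarray(results, dtype=float)
            n = x.size
            t = stats.t.ppf(0.95, df=n - 1)
            return x.mean() + t * x.std(ddof=1) / np.sqrt(n)

        # Hypothetical analyte concentrations from six scrape samples (arbitrary units).
        print(ucl95([1.2, 1.5, 1.1, 1.4, 1.3, 1.6]))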

  15. STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor, and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL95%) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).

  16. Developing Students' Reasoning about Samples and Sampling Variability as a Path to Expert Statistical Thinking

    Science.gov (United States)

    Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani

    2015-01-01

    This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…

  17. A course in mathematical statistics and large sample theory

    CERN Document Server

    Bhattacharya, Rabi; Patrangenaru, Victor

    2016-01-01

    This graduate-level textbook is primarily aimed at graduate students of statistics, mathematics, science, and engineering who have had an undergraduate course in statistics, an upper division course in analysis, and some acquaintance with measure theoretic probability. It provides a rigorous presentation of the core of mathematical statistics. Part I of this book constitutes a one-semester course on basic parametric mathematical statistics. Part II deals with the large sample theory of statistics, parametric and nonparametric, and its contents may be covered in one semester as well. Part III provides brief accounts of a number of topics of current interest for practitioners and other disciplines whose work involves statistical methods. Features include: large sample theory with many worked examples, numerical calculations, and simulations to illustrate the theory; appendices providing ready access to a number of standard results, with many proofs; solutions to a number of selected exercises from Part I; Part II exercises with ...

  18. Illustrating Sampling Distribution of a Statistic: Minitab Revisited

    Science.gov (United States)

    Johnson, H. Dean; Evans, Marc A.

    2008-01-01

    Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…

  19. Calculating Confidence, Uncertainty, and Numbers of Samples When Using Statistical Sampling Approaches to Characterize and Clear Contaminated Areas

    Energy Technology Data Exchange (ETDEWEB)

    Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.; Amidan, Brett G.

    2013-04-27

    This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating (1) the number of samples required to achieve a specified confidence in characterization and clearance decisions, and (2) the confidence in making characterization and clearance decisions for a specified number of samples, for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that a decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically, if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account
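
    For the simplest of these situations (qualitative detect/non-detect data with FNR = 0), the confidence provided by n all-negative random samples follows textbook acceptance-sampling arithmetic. The sketch below illustrates only that idea and is not drawn from the report's hotspot or CJR formulas.

        import math

        def n_for_clearance(confidence, p_hit):
            # Samples needed so that, if every result is negative, we are `confidence`
            # sure that contamination hit by a random sample with probability `p_hit`
            # would have been detected (FNR assumed to be 0).
            return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hit))

        def confidence_given_n(n, p_hit):
            # Confidence achieved by n all-negative random samples.
            return 1.0 - (1.0 - p_hit) ** n

        print(n_for_clearance(0.95, 0.01))    # about 299 samples
        print(confidence_given_n(100, 0.01))  # about 0.63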

  20. Pierre Gy's sampling theory and sampling practice heterogeneity, sampling correctness, and statistical process control

    CERN Document Server

    Pitard, Francis F

    1993-01-01

    Pierre Gy's Sampling Theory and Sampling Practice, Second Edition is a concise, step-by-step guide for process variability management and methods. Updated and expanded, this new edition provides a comprehensive study of heterogeneity, covering the basic principles of sampling theory and its various applications. It presents many practical examples to allow readers to select appropriate sampling protocols and assess the validity of sampling protocols from others. The variability of dynamic process streams using variography is discussed to help bridge sampling theory with statistical process control. Many descriptions of good sampling devices, as well as descriptions of poor ones, are featured to educate readers on what to look for when purchasing sampling systems. The book uses its accessible, tutorial style to focus on professional selection and use of methods. The book will be a valuable guide for mineral processing engineers; metallurgists; geologists; miners; chemists; environmental scientists; and practit...

  1. The application of statistical and/or non-statistical sampling techniques by internal audit functions in the South African banking industry

    Directory of Open Access Journals (Sweden)

    D.P. van der Nest

    2015-03-01

    This article explores the use of audit sampling techniques by internal audit functions in order to test the effectiveness of controls in the banking sector. The article focuses specifically on the use of statistical and/or non-statistical sampling techniques by internal auditors. The focus of the research for this article was internal audit functions in the banking sector of South Africa. The results discussed in the article indicate that audit sampling is still used frequently as an audit evidence-gathering technique. Non-statistical sampling techniques are used more frequently than statistical sampling techniques for the evaluation of the sample. In addition, both techniques are regarded as important for the determination of the sample size and the selection of the sample items.

  2. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, P; Kleibergen, F

    2003-01-01

    We consider the K-statistic, Kleibergen's (2002, Econometrica 70, 1781-1803) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Whereas Kleibergen (2002) especially analyzes the asymptotic behavior of the statistic, we focus on finite-sample properties in a

  3. Statistical sampling method for releasing decontaminated vehicles

    International Nuclear Information System (INIS)

    Lively, J.W.; Ware, J.A.

    1996-01-01

    Earth-moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method (MIL-STD-105E, "Sampling Procedures and Tables for Inspection by Attributes") for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates for and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been successfully used for 1 year at the former uranium mill site in Monticello, Utah (a CERCLA-regulated clean-up site). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on the Monticello Project has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site.
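
    The attributes-plan logic rests on simple binomial arithmetic; the sketch below shows an operating-characteristic calculation with placeholder plan parameters (the actual sample size and acceptance number come from the MIL-STD-105E tables, not from this code).

        from math import comb

        def prob_accept(n, c, p):
            # Probability that at most c of n surveyed vehicles exceed the release
            # limit when a fraction p of the batch is actually nonconforming.
            return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

        # Placeholder plan: survey 13 vehicles, accept the batch if none fail.
        print(prob_accept(n=13, c=0, p=0.05))   # ~0.51
        print(prob_accept(n=13, c=0, p=0.01))   # ~0.88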

  4. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    Science.gov (United States)

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra and Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra and Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third-moment-adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas under either open- or closed-book conditions were used to illustrate the real-world performance of this statistic.

  5. Statistical sampling for holdup measurement

    International Nuclear Information System (INIS)

    Picard, R.R.; Pillay, K.K.S.

    1986-01-01

    Nuclear materials holdup is a serious problem in many operating facilities. Estimating amounts of holdup is important for materials accounting and, sometimes, for process safety. Clearly, measuring holdup in all pieces of equipment is not a viable option in terms of time, money, and radiation exposure to personnel. Furthermore, 100% measurement is not only impractical but unnecessary for developing estimated values. Principles of statistical sampling are valuable in the design of cost-effective holdup monitoring plans and in quantifying uncertainties in holdup estimates. The purpose of this paper is to describe those principles and to illustrate their use.

  6. Sampling methods to the statistical control of the production of blood components.

    Science.gov (United States)

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood component specifications is a requirement established across Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The reliability of the control depends on the sampling. However, a correct sampling methodology does not seem to be systematically applied. Commonly, the sampling is intended only to comply with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model is arguably not a consistent sampling technique. This is a severe limitation for detecting abnormal patterns and for assuring that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and of the related statistical process control decisions for the purpose for which they are suggested.
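
    As an illustration of the second proposed methodology (sampling based on the proportion of a finite population), the sketch below applies the standard finite-population sample-size formula for a proportion; the inputs are placeholders and the article's actual parameter choices may differ.

        import math

        def sample_size(N, p=0.01, margin=0.01, z=1.96):
            # z-based size for estimating a proportion, then finite-population correction.
            n0 = z**2 * p * (1 - p) / margin**2
            return math.ceil(n0 / (1 + (n0 - 1) / N))

        print(sample_size(N=500))    # batch of 500 components -> about 217
        print(sample_size(N=5000))   # larger batch -> about 354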

  7. Multivariate statistics high-dimensional and large-sample approximations

    CERN Document Server

    Fujikoshi, Yasunori; Shimizu, Ryoichi

    2010-01-01

    A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic

  8. The Role of the Sampling Distribution in Understanding Statistical Inference

    Science.gov (United States)

    Lipson, Kay

    2003-01-01

    Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…

  9. Weighted statistical parameters for irregularly sampled time series

    Science.gov (United States)

    Rimoldini, Lorenzo

    2014-01-01

    Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
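
    A deliberately simplified sketch of the idea, with weights derived only from the local sampling density (the paper's scheme also adapts to the noise level, so this is not its actual estimator):

        import numpy as np

        def gap_weights(t):
            # Each point is weighted by half the span between its neighbours
            # (trapezoid-style), so clumps of measurements are down-weighted.
            dt = np.diff(np.asarray(t, dtype=float))
            w = np.empty(len(t))
            w[0], w[-1] = dt[0] / 2, dt[-1] / 2
            w[1:-1] = (dt[:-1] + dt[1:]) / 2
            return w / w.sum()

        def weighted_mean_var(t, y):
            w = gap_weights(t)
            mean = np.sum(w * y)
            var = np.sum(w * (y - mean) ** 2) / (1 - np.sum(w ** 2))
            return mean, var

        t = np.array([0.0, 0.1, 0.15, 0.2, 5.0, 9.8, 10.0])   # clumped sampling times
        print(weighted_mean_var(t, np.sin(t)))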

  10. [Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

    Science.gov (United States)

    Suzukawa, Yumi; Toyoda, Hideki

    2012-04-01

    This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in fields like perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.

  11. Comparison of pure and 'Latinized' centroidal Voronoi tessellation against various other statistical sampling methods

    International Nuclear Information System (INIS)

    Romero, Vicente J.; Burkardt, John V.; Gunzburger, Max D.; Peterson, Janet S.

    2006-01-01

    A recently developed centroidal Voronoi tessellation (CVT) sampling method is investigated here to assess its suitability for use in statistical sampling applications. CVT efficiently generates a highly uniform distribution of sample points over arbitrarily shaped M-dimensional parameter spaces. On several 2-D test problems, CVT has recently been found to provide exceedingly effective and efficient point distributions for response surface generation. Additionally, for statistical function integration and estimation of response statistics associated with uniformly distributed random-variable inputs (uncorrelated), CVT has been found in initial investigations to provide superior point sets when compared against Latin hypercube and simple random Monte Carlo methods and Halton and Hammersley quasi-random sequence methods. In this paper, the performance of all these sampling methods and a new variant ('Latinized' CVT) is further compared for non-uniform input distributions. Specifically, given uncorrelated normal inputs in a 2-D test problem, statistical sampling efficiencies are compared for resolving various statistics of response: mean, variance, and exceedance probabilities.

  12. Effect of model choice and sample size on statistical tolerance limits

    International Nuclear Information System (INIS)

    Duran, B.S.; Campbell, K.

    1980-03-01

    Statistical tolerance limits are estimates of large (or small) quantiles of a distribution, quantities which are very sensitive to the shape of the tail of the distribution. The exact nature of this tail behavior cannot be ascertained from small samples, so statistical tolerance limits are frequently computed using a statistical model chosen on the basis of theoretical considerations or prior experience with similar populations. This report illustrates the effects of such choices on the computations.

  13. Speeding Up Non-Parametric Bootstrap Computations for Statistics Based on Sample Moments in Small/Moderate Sample Size Applications.

    Directory of Open Access Journals (Sweden)

    Elias Chaibub Neto

    In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson's sample correlation coefficient, and compare its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling.
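
    The core trick can be sketched in a few lines of NumPy. This is an illustration of the multinomial-weighting idea applied to Pearson's correlation, not the authors' R implementation, and the data are simulated.

        import numpy as np

        def vectorized_bootstrap_corr(x, y, n_boot=10_000, seed=0):
            rng = np.random.default_rng(seed)
            n = len(x)
            # B x n multinomial counts, normalised so each row of weights sums to 1.
            W = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot) / n
            mx, my = W @ x, W @ y
            sxy = W @ (x * y) - mx * my
            sxx = W @ (x * x) - mx**2
            syy = W @ (y * y) - my**2
            return sxy / np.sqrt(sxx * syy)        # one correlation replicate per row

        rng = np.random.default_rng(1)
        x = rng.normal(size=50)
        y = 0.6 * x + rng.normal(size=50)
        reps = vectorized_bootstrap_corr(x, y)
        print(np.percentile(reps, [2.5, 97.5]))    # percentile bootstrap interval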

  14. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    Science.gov (United States)

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  15. Improving Statistics Education through Simulations: The Case of the Sampling Distribution.

    Science.gov (United States)

    Earley, Mark A.

    This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…

  16. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    Science.gov (United States)

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  17. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    Science.gov (United States)

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of the absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated via power, false-positive rate, and receiver operating characteristic curves for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
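
    A hedged sketch of a gene-sampling test built on the absolute gene statistic: the set score is the mean |t| over member genes, and the null distribution comes from randomly sampled gene sets of the same size. The function and data below are made up, and the paper's exact procedure may differ.

        import numpy as np

        def gene_sampling_pvalue(gene_stats, set_idx, n_perm=1000, seed=0):
            rng = np.random.default_rng(seed)
            abs_stats = np.abs(gene_stats)                    # the absolute gene statistic
            observed = abs_stats[set_idx].mean()
            null = np.array([abs_stats[rng.choice(len(abs_stats), len(set_idx),
                                                  replace=False)].mean()
                             for _ in range(n_perm)])
            return (1 + np.sum(null >= observed)) / (1 + n_perm)

        rng = np.random.default_rng(2)
        t_stats = rng.normal(size=5000)                       # per-gene t-like statistics
        t_stats[:10] += 2.0                                   # bidirectional signal in the set:
        t_stats[10:20] -= 2.0                                 # half up-, half down-regulated
        print(gene_sampling_pvalue(t_stats, np.arange(20)))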

  18. Nomogram for sample size calculation on a straightforward basis for the kappa statistic.

    Science.gov (United States)

    Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo

    2014-09-01

    Kappa is a widely used measure of agreement. However, it may not be straightforward in some situations, such as sample size calculation, due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable, in sample size calculation, to consider the level of agreement under a certain marginal prevalence in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. Sample size formulae using a simple proportion of agreement instead of a kappa statistic were produced, along with nomograms that eliminate the inconvenience of using a mathematical formula. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures.

  19. Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size

    Directory of Open Access Journals (Sweden)

    R. Eric Heidel

    2016-01-01

    Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
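
    A worked illustration of such an a priori calculation for one common configuration (two independent groups, a continuous outcome, a standardized effect size, and a normal approximation) is sketched below; other designs and scales of measurement call for different formulas.

        from math import ceil
        from statistics import NormalDist

        def n_per_group(effect_size_d, alpha=0.05, power=0.80):
            # Normal-approximation sample size for comparing two independent means.
            z_a = NormalDist().inv_cdf(1 - alpha / 2)
            z_b = NormalDist().inv_cdf(power)
            return ceil(2 * (z_a + z_b) ** 2 / effect_size_d ** 2)

        print(n_per_group(0.5))   # medium standardized effect -> about 63 per group
        print(n_per_group(0.2))   # small standardized effect -> about 393 per group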

  20. Statistical analyses to support guidelines for marine avian sampling. Final report

    Science.gov (United States)

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger than the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical

  1. Statistical characterization of a large geochemical database and effect of sample size

    Science.gov (United States)

    Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

    2005-01-01

    The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompassed 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reason relates to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively

  2. Statistical sampling strategies

    International Nuclear Information System (INIS)

    Andres, T.H.

    1987-01-01

    Systems assessment codes use mathematical models to simulate natural and engineered systems. Probabilistic systems assessment codes carry out multiple simulations to reveal the uncertainty in values of output variables due to uncertainty in the values of the model parameters. In this paper, methods are described for sampling sets of parameter values to be used in a probabilistic systems assessment code. Three Monte Carlo parameter selection methods are discussed: simple random sampling, Latin hypercube sampling, and sampling using two-level orthogonal arrays. Three post-selection transformations are also described: truncation, importance transformation, and discretization. Advantages and disadvantages of each method are summarized
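
    A minimal sketch of two of the Monte Carlo parameter-selection methods described above (simple random sampling and Latin hypercube sampling), drawing n parameter sets of m parameters on the unit hypercube; this is illustrative code, not taken from the paper.

        import numpy as np

        def simple_random(n, m, rng):
            return rng.random((n, m))                      # n points in the m-dim unit cube

        def latin_hypercube(n, m, rng):
            # One point per equal-probability stratum for each parameter, with the
            # strata paired at random across parameters.
            strata = np.column_stack([rng.permutation(n) for _ in range(m)])
            return (strata + rng.random((n, m))) / n

        rng = np.random.default_rng(0)
        print(simple_random(5, 2, rng))
        print(latin_hypercube(5, 2, rng))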

  3. Statistical sampling and modelling for cork oak and eucalyptus stands

    NARCIS (Netherlands)

    Paulo, M.J.

    2002-01-01

    This thesis focuses on the use of modern statistical methods to solve problems on sampling, optimal cutting time and agricultural modelling in Portuguese cork oak and eucalyptus stands. The results are contained in five chapters that have been submitted for publication

  4. Gene coexpression measures in large heterogeneous samples using count statistics.

    Science.gov (United States)

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.

  5. Statistical Methods and Tools for Hanford Staged Feed Tank Sampling

    Energy Technology Data Exchange (ETDEWEB)

    Fountain, Matthew S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Brigantic, Robert T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Peterson, Reid A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2013-10-01

    This report summarizes work conducted by Pacific Northwest National Laboratory to technically evaluate the current approach to staged feed sampling of high-level waste (HLW) sludge to meet waste acceptance criteria (WAC) for transfer from tank farms to the Hanford Waste Treatment and Immobilization Plant (WTP). The current sampling and analysis approach is detailed in the document titled Initial Data Quality Objectives for WTP Feed Acceptance Criteria, 24590-WTP-RPT-MGT-11-014, Revision 0 (Arakali et al. 2011). The goal of this current work is to evaluate and provide recommendations to support a defensible, technical and statistical basis for the staged feed sampling approach that meets WAC data quality objectives (DQOs).

  6. Statistical sampling applied to the radiological characterization of historical waste

    Directory of Open Access Journals (Sweden)

    Zaffora Biagio

    2016-01-01

    The evaluation of the activity of radionuclides in radioactive waste is required for its disposal in final repositories. Easy-to-measure nuclides, like γ- and high-energy X-ray emitters, can be measured via non-destructive nuclear techniques from outside a waste package. Some radionuclides are difficult to measure (DTM) from outside a package because they are α- or β-emitters. The present article discusses the application of linear regression, scaling factors (SF) and the so-called “mean activity method” to estimate the activity of DTM nuclides on metallic waste produced at the European Organization for Nuclear Research (CERN). Various statistical sampling techniques including simple random sampling, systematic sampling, stratified and authoritative sampling are described and applied to two waste populations of activated copper cables. The bootstrap is introduced as a tool to estimate average activities and standard errors in waste characterization. The analysis of the DTM nuclide Ni-63 is used as an example. Experimental and theoretical values of SFs are calculated and compared. Guidelines for sampling historical waste using probabilistic and non-probabilistic sampling are finally given.
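
    A simplified illustration of two of the ingredients above: a scaling factor taken as the mean ratio of a DTM nuclide to a key γ-emitting nuclide, and a bootstrap standard error of a mean activity. The activities and the choice of key nuclide are hypothetical, and the actual characterization procedure is more elaborate.

        import numpy as np

        def scaling_factor(dtm_activity, key_activity):
            # e.g. Ni-63 (hard to measure) scaled to a gamma emitter such as Co-60.
            return np.mean(np.asarray(dtm_activity) / np.asarray(key_activity))

        def bootstrap_se_of_mean(x, n_boot=5000, seed=0):
            rng = np.random.default_rng(seed)
            x = np.asarray(x, dtype=float)
            means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                              for _ in range(n_boot)])
            return means.std(ddof=1)

        activities = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7]      # hypothetical Bq/g values
        print(bootstrap_se_of_mean(activities))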

  7. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    Energy Technology Data Exchange (ETDEWEB)

    Martinez, Gregory D. [University of California, Physics and Astronomy Department, Los Angeles, CA (United States); McKay, James; Scott, Pat [Imperial College London, Department of Physics, Blackett Laboratory, London (United Kingdom); Farmer, Ben; Conrad, Jan [AlbaNova University Centre, Oskar Klein Centre for Cosmoparticle Physics, Stockholm (Sweden); Stockholm University, Department of Physics, Stockholm (Sweden); Roebber, Elinore [McGill University, Department of Physics, Montreal, QC (Canada); Putze, Antje [LAPTh, Universite de Savoie, CNRS, Annecy-le-Vieux (France); Collaboration: The GAMBIT Scanner Workgroup

    2017-11-15

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics. (orig.)

  8. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    Science.gov (United States)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well-separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
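
    As a toy illustration of the parallel tempering technique mentioned above, the sketch below runs coupled random walks at several temperatures on a made-up bimodal target and occasionally swaps neighbouring walkers using the Metropolis criterion; the temperatures and step sizes are arbitrary choices, not taken from any of the cited methods.

        import numpy as np

        def log_target(x):
            # Two well-separated modes that defeat a single plain random walk.
            return np.logaddexp(-0.5 * (x - 5.0) ** 2, -0.5 * (x + 5.0) ** 2)

        def parallel_tempering(n_steps=20_000, temps=(1.0, 3.0, 10.0, 30.0), seed=0):
            rng = np.random.default_rng(seed)
            x = np.zeros(len(temps))                     # one walker per temperature
            cold_chain = []
            for step in range(n_steps):
                for i, T in enumerate(temps):            # Metropolis move at each temperature
                    prop = x[i] + rng.normal(scale=1.0)
                    if np.log(rng.random()) < (log_target(prop) - log_target(x[i])) / T:
                        x[i] = prop
                if step % 10 == 0:                       # propose swapping neighbouring walkers
                    i = rng.integers(len(temps) - 1)
                    delta = (1.0 / temps[i] - 1.0 / temps[i + 1]) * \
                            (log_target(x[i + 1]) - log_target(x[i]))
                    if np.log(rng.random()) < delta:
                        x[i], x[i + 1] = x[i + 1], x[i]
                cold_chain.append(x[0])
            return np.array(cold_chain)

        samples = parallel_tempering()
        print(np.mean(samples > 0))   # both modes visited, so roughly 0.5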

  9. Statistical sampling plans

    International Nuclear Information System (INIS)

    Jaech, J.L.

    1984-01-01

    In auditing and in inspection, one selects a number of items by some set of procedures and performs measurements which are compared with the operator's values. This session considers the problem of how to select the samples to be measured, and what kinds of measurements to make. In the inspection situation, the ultimate aim is to independently verify the operator's material balance. The effectiveness of the sample plan in achieving this objective is briefly considered. The discussion focuses on the model plant

  10. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, W.H.M.; Meijer, R.R.; Sijtsma, K.

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier's person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  11. The Statistics of Radio Astronomical Polarimetry: Disjoint, Superposed, and Composite Samples

    Energy Technology Data Exchange (ETDEWEB)

    Straten, W. van [Centre for Astrophysics and Supercomputing, Swinburne University of Technology, Hawthorn, VIC 3122 (Australia); Tiburzi, C., E-mail: willem.van.straten@aut.ac.nz [Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, D-53121 Bonn (Germany)

    2017-02-01

    A statistical framework is presented for the study of the orthogonally polarized modes of radio pulsar emission via the covariances between the Stokes parameters. To accommodate the typically heavy-tailed distributions of single-pulse radio flux density, the fourth-order joint cumulants of the electric field are used to describe the superposition of modes with arbitrary probability distributions. The framework is used to consider the distinction between superposed and disjoint modes, with particular attention to the effects of integration over finite samples. If the interval over which the polarization state is estimated is longer than the timescale for switching between two or more disjoint modes of emission, then the modes are unresolved by the instrument. The resulting composite sample mean exhibits properties that have been attributed to mode superposition, such as depolarization. Because the distinction between disjoint modes and a composite sample of unresolved disjoint modes depends on the temporal resolution of the observing instrumentation, the arguments in favor of superposed modes of pulsar emission are revisited, and observational evidence for disjoint modes is described. In principle, the four-dimensional covariance matrix that describes the distribution of sample mean Stokes parameters can be used to distinguish between disjoint modes, superposed modes, and a composite sample of unresolved disjoint modes. More comprehensive and conclusive interpretation of the covariance matrix requires more detailed consideration of various relevant phenomena, including temporally correlated subpulse modulation (e.g., jitter), statistical dependence between modes (e.g., covariant intensities and partial coherence), and multipath propagation effects (e.g., scintillation and scattering).

  12. TRAN-STAT, Issue No. 3, January 1978. Topics discussed: some statistical aspects of compositing field samples

    International Nuclear Information System (INIS)

    Gilbert, R.O.

    1978-01-01

    Some statistical aspects of compositing field samples of soils for determining the content of Pu are discussed. Some of the potential problems involved in pooling samples are reviewed. This is followed by more detailed discussions and examples of compositing designs, adequacy of mixing, statistical models and their role in compositing, and related topics

  13. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, Wilco H.M.; Meijer, R.R.; Sijtsma, Klaas

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier’s person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  14. Effects of (α,n) contaminants and sample multiplication on statistical neutron correlation measurements

    International Nuclear Information System (INIS)

    Dowdy, E.J.; Hansen, G.E.; Robba, A.A.; Pratt, J.C.

    1980-01-01

    The complete formalism for the use of statistical neutron fluctuation measurements for the nondestructive assay of fissionable materials has been developed. This formalism includes the effect of detector deadtime, neutron multiplicity, random neutron pulse contributions from (α,n) contaminants in the sample, and the sample multiplication of both fission-related and background neutrons

  15. STATISTICAL LANDMARKS AND PRACTICAL ISSUES REGARDING THE USE OF SIMPLE RANDOM SAMPLING IN MARKET RESEARCHES

    Directory of Open Access Journals (Sweden)

    CODRUŢA DURA

    2010-01-01

    Full Text Available The sample represents a particular segment of the statistical population chosen to represent it as a whole. The representativeness of the sample determines the accuracy for estimations made on the basis of calculating the research indicators and the inferential statistics. The method of random sampling is part of probabilistic methods which can be used within marketing research and it is characterized by the fact that it imposes the requirement that each unit belonging to the statistical population should have an equal chance of being selected for the sampling process. When the simple random sampling is meant to be rigorously put into practice, it is recommended to use the technique of random number tables in order to configure the sample which will provide information that the marketer needs. The paper also details the practical procedure implemented in order to create a sample for a marketing research by generating random numbers using the facilities offered by Microsoft Excel.
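
    The random-number-table procedure described above has a direct programmatic analogue: simple random sampling without replacement, in which every frame unit has the same chance of selection. In the sketch below the frame size, sample size and seed are illustrative.

      import random

      random.seed(42)                    # reproducible selection
      frame = list(range(1, 10001))      # sampling frame: population units 1..10000 (illustrative)
      n = 400                            # required sample size (illustrative)
      sample = random.sample(frame, n)   # every unit has an equal chance of selection
      print(sorted(sample)[:10])         # first few selected unit labels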

  16. Subclinical delusional ideation and appreciation of sample size and heterogeneity in statistical judgment.

    Science.gov (United States)

    Galbraith, Niall D; Manktelow, Ken I; Morris, Neil G

    2010-11-01

    Previous studies demonstrate that people high in delusional ideation exhibit a data-gathering bias on inductive reasoning tasks. The current study set out to investigate the factors that may underpin such a bias by examining healthy individuals, classified as either high or low scorers on the Peters et al. Delusions Inventory (PDI). More specifically, whether high PDI scorers have a relatively poor appreciation of sample size and heterogeneity when making statistical judgments. In Expt 1, high PDI scorers made higher probability estimates when generalizing from a sample of 1 with regard to the heterogeneous human property of obesity. In Expt 2, this effect was replicated and was also observed in relation to the heterogeneous property of aggression. The findings suggest that delusion-prone individuals are less appreciative of the importance of sample size when making statistical judgments about heterogeneous properties; this may underpin the data gathering bias observed in previous studies. There was some support for the hypothesis that threatening material would exacerbate high PDI scorers' indifference to sample size.

  17. Sample Size Requirements for Assessing Statistical Moments of Simulated Crop Yield Distributions

    NARCIS (Netherlands)

    Lehmann, N.; Finger, R.; Klein, T.; Calanca, P.

    2013-01-01

    Mechanistic crop growth models are becoming increasingly important in agricultural research and are extensively used in climate change impact assessments. In such studies, statistics of crop yields are usually evaluated without the explicit consideration of sample size requirements. The purpose of

  18. Statistical Sampling For In-Service Inspection Of Liquid Waste Tanks At The Savannah River Site

    International Nuclear Information System (INIS)

    Harris, S.

    2011-01-01

    Savannah River Remediation, LLC (SRR) is implementing a statistical sampling strategy for In-Service Inspection (ISI) of Liquid Waste (LW) Tanks at the United States Department of Energy's Savannah River Site (SRS) in Aiken, South Carolina. As a component of SRS's corrosion control program, the ISI program assesses tank wall structural integrity through the use of ultrasonic testing (UT). The statistical strategy for ISI is based on the random sampling of a number of vertically oriented unit areas, called strips, within each tank. The number of strips to inspect was determined so as to attain, over time, a high probability of observing at least one of the worst 5% in terms of pitting and corrosion across all tanks. The probability estimation to determine the number of strips to inspect was performed using the hypergeometric distribution. Statistical tolerance limits for pit depth and corrosion rates were calculated by fitting the lognormal distribution to the data. In addition to the strip sampling strategy, a single strip within each tank was identified to serve as the baseline for a longitudinal assessment of the tank safe operational life. The statistical sampling strategy enables the ISI program to develop individual profiles of LW tank wall structural integrity that collectively provide a high confidence in their safety and integrity over operational lifetimes.
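
    The hypergeometric calculation described above can be sketched as follows: for a hypothetical tank wall divided into N strips, of which the worst 5% are of interest, the probability that a random sample of n strips contains at least one of them is one minus the hypergeometric probability of drawing none. The strip counts below are illustrative, not the actual SRS values.

      from scipy.stats import hypergeom

      N = 200                     # total strips on a tank wall (illustrative)
      K = int(0.05 * N)           # number of "worst 5%" strips
      for n in (10, 20, 30, 40):  # candidate numbers of strips to inspect
          p_miss = hypergeom.pmf(0, N, K, n)   # P(sample contains none of the worst strips)
          print(f"n = {n:2d}: P(at least one of worst 5%) = {1 - p_miss:.3f}")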

  19. A cost-saving statistically based screening technique for focused sampling of a lead-contaminated site

    International Nuclear Information System (INIS)

    Moscati, A.F. Jr.; Hediger, E.M.; Rupp, M.J.

    1986-01-01

    High concentrations of lead in soils along an abandoned railroad line prompted a remedial investigation to characterize the extent of contamination across a 7-acre site. Contamination was thought to be spotty across the site reflecting its past use in battery recycling operations at discrete locations. A screening technique was employed to delineate the more highly contaminated areas by testing a statistically determined minimum number of random samples from each of seven discrete site areas. The approach not only quickly identified those site areas which would require more extensive grid sampling, but also provided a statistically defensible basis for excluding other site areas from further consideration, thus saving the cost of additional sample collection and analysis. The reduction in the number of samples collected in "clean" areas of the site ranged from 45 to 60%

  20. Ecotoxicology statistical sampling

    International Nuclear Information System (INIS)

    Saona, G.

    2012-01-01

    This presentation introduces general concepts in ecotoxicological sampling design, such as the distribution of organic or inorganic contaminants, microbiological contamination, and the positioning of ecotoxicological bioassays within an ecosystem.

  1. Survey of statistical and sampling needs for environmental monitoring of commercial low-level radioactive waste disposal facilities

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Thomas, J.M.

    1986-07-01

    This project was designed to develop guidance for implementing 10 CFR Part 61 and to determine the overall needs for sampling and statistical work in characterizing, surveying, monitoring, and closing commercial low-level waste sites. When cost-effectiveness and statistical reliability are of prime importance, then double sampling, compositing, and stratification (with optimal allocation) are identified as key issues. If the principal concern is avoiding questionable statistical practice, then the applicability of kriging (for assessing spatial pattern), methods for routine monitoring, and use of standard textbook formulae in reporting monitoring results should be reevaluated. Other important issues identified include sampling for estimating model parameters and the use of data from left-censored (less than detectable limits) distributions
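
    Stratification with optimal allocation, one of the key issues identified above, follows the standard Neyman rule: stratum sample sizes proportional to stratum size times within-stratum standard deviation. The sketch below uses illustrative stratum sizes and standard deviations.

      import numpy as np

      N_h = np.array([500, 300, 200])    # stratum sizes, e.g. site sub-areas (illustrative)
      S_h = np.array([2.0, 5.0, 12.0])   # anticipated within-stratum standard deviations (illustrative)
      n_total = 60                       # total number of field samples affordable

      weights = N_h * S_h                # Neyman allocation: n_h proportional to N_h * S_h
      n_h = np.round(n_total * weights / weights.sum()).astype(int)
      print("samples per stratum:", n_h) # more samples go to the large, highly variable strata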

  2. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    Directory of Open Access Journals (Sweden)

    M.M. Mohie El-Din

    2011-10-01

    Full Text Available In this paper, two sample Bayesian prediction intervals for order statistics (OS) are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two-sample case. The class of the inverse exponential-type distributions includes several important distributions such as the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.

  3. Constrained statistical inference: sample-size tables for ANOVA and regression

    Directory of Open Access Journals (Sweden)

    Leonard eVanbrabant

    2015-01-01

    Full Text Available Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient beta1 is larger than beta2 and beta3. The corresponding hypothesis is H: beta1 > {beta2, beta3} and this is known as an (order) constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction, when an increasing number of constraints is included into the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a prespecified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30% to 50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., beta1 > beta2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., beta1 > 0).
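
    The power gain from constraining parameters can be illustrated with a small Monte Carlo of the same flavour as the simulations described above. The sketch below uses a one-sided versus two-sided two-group comparison as a stand-in for the more general order-constrained ANOVA and regression settings treated in the article; the effect size, group size and number of replications are arbitrary.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)
      n, effect, reps = 30, 0.5, 5000    # group size, true mean shift, Monte Carlo replications
      hits_two = hits_one = 0
      for _ in range(reps):
          a = rng.normal(effect, 1.0, n)
          b = rng.normal(0.0, 1.0, n)
          t, p_two = stats.ttest_ind(a, b)
          p_one = p_two / 2 if t > 0 else 1 - p_two / 2   # direction-constrained test
          hits_two += p_two < 0.05
          hits_one += p_one < 0.05
      print(f"power, unconstrained: {hits_two/reps:.2f}; "
            f"power, direction-constrained: {hits_one/reps:.2f}")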

  4. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    Science.gov (United States)

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
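
    The inferential step described in the article, estimating a population quantity from a sample together with a confidence interval, can be sketched as follows. The data below are simulated for illustration and are not taken from the article.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(3)
      sample = rng.normal(loc=120.0, scale=15.0, size=40)   # simulated measurements (illustrative)

      mean = sample.mean()
      sem = stats.sem(sample)                               # standard error of the mean
      lo, hi = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
      print(f"sample mean = {mean:.1f}, 95% confidence interval = ({lo:.1f}, {hi:.1f})")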

  5. Supporting Students to Develop Concepts Underlying Sampling and to Shuttle Between Contextual and Statistical Spheres

    NARCIS (Netherlands)

    Bakker, A.; Dierdorp, A.; Maanen, J.A. van; Eijkelhof, H.M.C.

    2012-01-01

    To stimulate students’ shuttling between contextual and statistical spheres, we based tasks on professional practices. This article focuses on two tasks to support reasoning about sampling by students aged 16-17. The purpose of the tasks was to find out which smaller sample size would have been

  6. Statistical issues in reporting quality data: small samples and casemix variation.

    Science.gov (United States)

    Zaslavsky, A M

    2001-12-01

    To present two key statistical issues that arise in analysis and reporting of quality data. Casemix variation is relevant to quality reporting when the units being measured have differing distributions of patient characteristics that also affect the quality outcome. When this is the case, adjustment using stratification or regression may be appropriate. Such adjustments may be controversial when the patient characteristic does not have an obvious relationship to the outcome. Stratified reporting poses problems for sample size and reporting format, but may be useful when casemix effects vary across units. Although there are no absolute standards of reliability, high reliabilities (interunit F ≥ 10 or reliability ≥ 0.9) are desirable for distinguishing above- and below-average units. When small or unequal sample sizes complicate reporting, precision may be improved using indirect estimation techniques that incorporate auxiliary information, and 'shrinkage' estimation can help to summarize the strength of evidence about units with small samples. With broader understanding of casemix adjustment and methods for analyzing small samples, quality data can be analysed and reported more accurately.

  7. Statistical evaluation of the data obtained from the K East Basin Sandfilter Backwash Pit samples

    International Nuclear Information System (INIS)

    Welsh, T.L.

    1994-01-01

    Samples were obtained from different locations from the K East Sandfilter Backwash Pit to characterize the sludge material. These samples were analyzed chemically for elements, radionuclides, and residual compounds. The analytical results were statistically analyzed to determine the mean analyte content and the associated variability for each mean value

  8. Discrimination of handlebar grip samples by fourier transform infrared microspectroscopy analysis and statistics

    Directory of Open Access Journals (Sweden)

    Zeyu Lin

    2017-01-01

    Full Text Available In this paper, the authors presented a study on the discrimination of handlebar grip samples, to provide effective forensic science service for hit and run traffic cases. 50 bicycle handlebar grip samples, 49 electric bike handlebar grip samples, and 96 motorcycle handlebar grip samples have been randomly collected by the local police in Beijing (China). Fourier transform infrared microspectroscopy (FTIR) was utilized as the analytical technology. Then, target absorption selection, data pretreatment, and discrimination of linked samples and unlinked samples were chosen as three steps to improve the discrimination of FTIR spectra collected from different handlebar grip samples. Principal component analysis and receiver operating characteristic curves were utilized to evaluate different data selection methods and different data pretreatment methods, respectively. It is possible to explore the evidential value of handlebar grip residue evidence through instrumental analysis and statistical treatments. It will provide a universal discrimination method for other forensic science samples as well.

  9. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    International Nuclear Information System (INIS)

    Harris, S.P.

    1997-01-01

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. Samples in small 3 ml containers (inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and the fact that no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock-up Facility using 3 ml inserts and 15 ml peanut vials. A number of the insert samples were analyzed by Cold Chem and compared with full peanut vial samples analyzed by the current methods. The remaining inserts were analyzed by

  10. Statistical sampling plan for the TRU waste assay facility

    International Nuclear Information System (INIS)

    Beauchamp, J.J.; Wright, T.; Schultz, F.J.; Haff, K.; Monroe, R.J.

    1983-08-01

    Due to limited space, there is a need to dispose appropriately of the Oak Ridge National Laboratory transuranic waste which is presently stored below ground in 55-gal (208-l) drums within weather-resistant structures. Waste containing less than 100 nCi/g transuranics can be removed from the present storage and be buried, while waste containing greater than 100 nCi/g transuranics must continue to be retrievably stored. To make the necessary measurements needed to determine the drums that can be buried, a transuranic Neutron Interrogation Assay System (NIAS) has been developed at Los Alamos National Laboratory and can make the needed measurements much faster than previous techniques which involved γ-ray spectroscopy. The previous techniques are reliable but time consuming. Therefore, a validation study has been planned to determine the ability of the NIAS to make adequate measurements. The validation of the NIAS will be based on a paired comparison of a sample of measurements made by the previous techniques and the NIAS. The purpose of this report is to describe the proposed sampling plan and the statistical analyses needed to validate the NIAS. 5 references, 4 figures, 5 tables

  11. Statistics and sampling in transuranic studies

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Gilbert, R.O.

    1980-01-01

    The existing data on transuranics in the environment exhibit a remarkably high variability from sample to sample (coefficients of variation of 100% or greater). This chapter stresses the necessity of adequate sample size and suggests various ways to increase sampling efficiency. Objectives in sampling are regarded as being of great importance in making decisions as to sampling methodology. Four different classes of sampling methods are described: (1) descriptive sampling, (2) sampling for spatial pattern, (3) analytical sampling, and (4) sampling for modeling. A number of research needs are identified in the various sampling categories along with several problems that appear to be common to two or more such areas

  12. Statistical methods for detecting differentially abundant features in clinical metagenomic samples.

    Directory of Open Access Journals (Sweden)

    James Robert White

    2009-04-01

    Full Text Available Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g., as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software
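
    Two of the ingredients named above, Fisher's exact test for sparsely sampled features and false discovery rate control across many features, can be sketched in a few lines. This is not the Metastats implementation; the count tables and the Benjamini-Hochberg adjustment below are an illustrative stand-in.

      import numpy as np
      from scipy.stats import fisher_exact

      # each row: feature vs. remaining counts in group 1 and group 2 (illustrative)
      tables = [
          (3, 997, 15, 985),
          (0, 1000, 6, 994),
          (8, 992, 9, 991),
      ]
      pvals = np.array([fisher_exact([[a, b], [c, d]])[1] for a, b, c, d in tables])

      # Benjamini-Hochberg adjusted p-values (step-up false discovery rate control)
      m = len(pvals)
      order = np.argsort(pvals)
      raw = pvals[order] * m / np.arange(1, m + 1)
      adj = np.minimum.accumulate(raw[::-1])[::-1]
      qvals = np.empty(m)
      qvals[order] = np.clip(adj, 0.0, 1.0)
      print("raw p-values:", np.round(pvals, 4))
      print("BH q-values: ", np.round(qvals, 4))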

  13. Statistical assessment of fish behavior from split-beam hydro-acoustic sampling

    International Nuclear Information System (INIS)

    McKinstry, Craig A.; Simmons, Mary Ann; Simmons, Carver S.; Johnson, Robert L.

    2005-01-01

    Statistical methods are presented for using echo-traces from split-beam hydro-acoustic sampling to assess fish behavior in response to a stimulus. The data presented are from a study designed to assess the response of free-ranging, lake-resident fish, primarily kokanee (Oncorhynchus nerka) and rainbow trout (Oncorhynchus mykiss), to high intensity strobe lights, and was conducted at Grand Coulee Dam on the Columbia River in Northern Washington State. The lights were deployed immediately upstream from the turbine intakes, in a region exposed to daily alternating periods of high and low flows. The study design included five down-looking split-beam transducers positioned in a line at incremental distances upstream from the strobe lights, and treatments applied in randomized pseudo-replicate blocks. Statistical methods included the use of odds-ratios from fitted loglinear models. Fish-track velocity vectors were modeled using circular probability distributions. Both analyses are depicted graphically. Study results suggest large increases of fish activity in the presence of the strobe lights, most notably at night and during periods of low flow. The lights also induced notable bimodality in the angular distributions of the fish track velocity vectors. Statistical summaries are presented along with interpretations on fish behavior

  14. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Directory of Open Access Journals (Sweden)

    Alexander P. Kartun-Giles

    2018-04-01

    Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.

  15. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.P. [Westinghouse Savannah River Company, AIKEN, SC (United States)

    1997-09-18

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Either new DWPF analytical method could result in a two to three fold improvement in sample analysis time. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. The insert sample is named after the initial trials which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and the fact that no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock

  16. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Science.gov (United States)

    Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J

    2017-01-01

    Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical

  17. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Directory of Open Access Journals (Sweden)

    David C Pavlacky

    Full Text Available Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous

  18. New Hybrid Monte Carlo methods for efficient sampling. From physics to biology and statistics

    International Nuclear Information System (INIS)

    Akhmatskaya, Elena; Reich, Sebastian

    2011-01-01

    We introduce a class of novel hybrid methods for detailed simulations of large complex systems in physics, biology, materials science and statistics. These generalized shadow Hybrid Monte Carlo (GSHMC) methods combine the advantages of stochastic and deterministic simulation techniques. They utilize a partial momentum update to retain some of the dynamical information, employ modified Hamiltonians to overcome exponential performance degradation with the system’s size and make use of multi-scale nature of complex systems. Variants of GSHMCs were developed for atomistic simulation, particle simulation and statistics: GSHMC (thermodynamically consistent implementation of constant-temperature molecular dynamics), MTS-GSHMC (multiple-time-stepping GSHMC), meso-GSHMC (Metropolis corrected dissipative particle dynamics (DPD) method), and a generalized shadow Hamiltonian Monte Carlo, GSHmMC (a GSHMC for statistical simulations). All of these are compatible with other enhanced sampling techniques and suitable for massively parallel computing allowing for a range of multi-level parallel strategies. A brief description of the GSHMC approach, examples of its application on high performance computers and comparison with other existing techniques are given. Our approach is shown to resolve such problems as resonance instabilities of the MTS methods and non-preservation of thermodynamic equilibrium properties in DPD, and to outperform known methods in sampling efficiency by an order of magnitude. (author)

  19. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Scheid Anika

    2012-07-01

    Full Text Available Abstract Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst

  20. Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples

    Energy Technology Data Exchange (ETDEWEB)

    Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J

    2007-10-24

    Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharides, pure proteins and protein mixture samples, all of LDA, PLSDA, and SIMCA were found to produce excellent classification given a sufficient number of compound variables calculated. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data that can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples

  1. Examination of statistical noise in SPECT image and sampling pitch

    International Nuclear Information System (INIS)

    Takaki, Akihiro; Soma, Tsutomu; Murase, Kenya; Watanabe, Hiroyuki; Murakami, Tomonori; Kawakami, Kazunori; Teraoka, Satomi; Kojima, Akihiro; Matsumoto, Masanori

    2008-01-01

    Statistical noise in single photon emission computed tomography (SPECT) images was examined for its relation with total count and with sampling pitch, by simulation and by phantom experiment, obtaining projection data under defined conditions. The SPECT simulation assumed a virtual, homogeneous water column (20 cm diameter) as an absorbing mass. The phantom experiment used a 3D-Hoffman brain phantom (Data Spectrum Corp.) filled with 370 MBq of 99mTc-pertechnetate solution and a facing 2-detector SPECT machine with a low-energy/high-resolution collimator, E-CAM (Siemens). Projection data from the two methods were reconstructed through filtered back projection to make each transaxial image. The noise was evaluated visually, by the root mean square uncertainty calculated from the average count and standard deviation (SD) in a region of interest (ROI) defined in the reconstructed images, and by normalized mean squares calculated from the difference between the reference image, obtained with the common sampling pitch, and all of the obtained slices of the simulation and phantom. In conclusion, the pitch is recommended to be set in the machine so as to approximate the value calculated by the sampling theorem, even though the projection counts per angular direction are then smaller for the same total data acquisition time. (R.T.)

  2. Statistical energy as a tool for binning-free, multivariate goodness-of-fit tests, two-sample comparison and unfolding

    International Nuclear Information System (INIS)

    Aslan, B.; Zech, G.

    2005-01-01

    We introduce the novel concept of statistical energy as a statistical tool. We define statistical energy of statistical distributions in a similar way as for electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications
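
    A minimal version of a binning-free two-sample energy statistic with a permutation p-value is sketched below. Aslan and Zech favour a logarithmic distance function; for brevity this sketch uses the Euclidean-distance (energy-distance) form, and the toy samples are illustrative.

      import numpy as np
      from scipy.spatial.distance import cdist

      def energy_stat(x, y):
          # two-sample energy statistic with Euclidean distances
          n, m = len(x), len(y)
          return n * m / (n + m) * (2 * cdist(x, y).mean()
                                    - cdist(x, x).mean() - cdist(y, y).mean())

      rng = np.random.default_rng(5)
      x = rng.normal(0.0, 1.0, size=(200, 2))
      y = rng.normal(0.2, 1.0, size=(200, 2))        # slightly shifted second sample

      obs = energy_stat(x, y)
      pooled = np.vstack([x, y])
      perm = [energy_stat(*np.split(rng.permutation(pooled), [len(x)]))
              for _ in range(200)]                   # permutation null distribution
      p_value = np.mean([s >= obs for s in perm])
      print(f"energy statistic = {obs:.3f}, permutation p-value = {p_value:.3f}")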

  3. Sample Reuse in Statistical Remodeling.

    Science.gov (United States)

    1987-08-01

    as the jackknife and bootstrap, is an expansion of the functional, T(Fn), or of its distribution function or both. Frangos and Schucany (1987a) used...accelerated bootstrap. In the same report Frangos and Schucany demonstrated the small sample superiority of that approach over the proposals that take...higher order terms of an Edgeworth expansion into account. In a second report Frangos and Schucany (1987b) examined the small sample performance of

  4. Optimal design of sampling and mapping schemes in the radiometric exploration of Chipilapa, El Salvador (Geo-statistics)

    International Nuclear Information System (INIS)

    Balcazar G, M.; Flores R, J.H.

    1992-01-01

    As part of the radiometric surface exploration carried out in the Chipilapa geothermal field, El Salvador, the geo-statistical parameters were derived from the variogram calculated from the field data. The maximum correlation distance of the 'radon' samples in the different observation directions (N-S, E-W, NW-SE, NE-SW) was 121 m, which defines the monitoring grid for future prospecting in the same area. From this, an optimization (minimum cost) of the spacing of the field samples was derived by means of geo-statistical techniques, without losing detection of the anomaly. (Author)

  5. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Cabalin, L.M.; Gonzalez, A. [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain); Ruiz, J. [Department of Applied Physics I, University of Malaga, E-29071 Malaga (Spain); Laserna, J.J., E-mail: laserna@uma.e [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain)

    2010-08-15

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  6. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Science.gov (United States)

    Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.

    2010-08-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  7. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Cabalin, L.M.; Gonzalez, A.; Ruiz, J.; Laserna, J.J.

    2010-01-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  8. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    Energy Technology Data Exchange (ETDEWEB)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung [Hanyang Univ., Seoul (Korea, Republic of); Noh, Jae Man [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2013-10-15

    The uncertainty evaluation with statistical method is performed by repetition of transport calculation with sampling the directly perturbed nuclear data. Hence, the reliable uncertainty result can be obtained by analyzing the results of the numerous transport calculations. One of the problems in the uncertainty analysis with the statistical approach is that the cross section sampling from the normal (Gaussian) distribution with relatively large standard deviation leads to the sampling error of the cross sections such as the sampling of the negative cross section. Some collection methods are noted; however, the methods can distort the distribution of the sampled cross sections. In this study, a sampling method of the nuclear data is proposed by using lognormal distribution. After that, the criticality calculations with sampled nuclear data are performed and the results are compared with that from the normal distribution which is conventionally used in the previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling error. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was pursued with the normal and lognormal distribution. The uncertainties, which are caused by covariance of (n,.) cross sections, were evaluated by solving GODIVA problem. The results show that the sampling method with lognormal distribution can efficiently solve the negative sampling problem referred in the previous studies. It is expected that this study will contribute to increase the accuracy of the sampling-based uncertainty analysis.
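
    The lognormal sampling idea can be sketched as follows: choose lognormal parameters that reproduce the evaluated cross section and its absolute uncertainty, so that no negative values can be drawn. The nominal value and 30% relative uncertainty below are illustrative, and this is not the authors' sampling program.

      import numpy as np

      rng = np.random.default_rng(11)

      mean, rel_unc = 2.5, 0.30            # nominal cross section and relative uncertainty (illustrative)
      sd = rel_unc * mean

      # lognormal parameters reproducing the requested mean and standard deviation
      sigma2 = np.log(1.0 + (sd / mean) ** 2)
      mu = np.log(mean) - 0.5 * sigma2

      normal_draws = rng.normal(mean, sd, 100000)
      lognormal_draws = rng.lognormal(mu, np.sqrt(sigma2), 100000)

      print("negative samples, normal   :", int(np.sum(normal_draws < 0.0)))
      print("negative samples, lognormal:", int(np.sum(lognormal_draws < 0.0)))
      print("lognormal mean and std     :", lognormal_draws.mean(), lognormal_draws.std())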

  9. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    International Nuclear Information System (INIS)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung; Noh, Jae Man

    2013-01-01

    The uncertainty evaluation with statistical method is performed by repetition of transport calculation with sampling the directly perturbed nuclear data. Hence, the reliable uncertainty result can be obtained by analyzing the results of the numerous transport calculations. One of the problems in the uncertainty analysis with the statistical approach is that the cross section sampling from the normal (Gaussian) distribution with relatively large standard deviation leads to the sampling error of the cross sections such as the sampling of the negative cross section. Some collection methods are noted; however, the methods can distort the distribution of the sampled cross sections. In this study, a sampling method of the nuclear data is proposed by using lognormal distribution. After that, the criticality calculations with sampled nuclear data are performed and the results are compared with that from the normal distribution which is conventionally used in the previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling error. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was pursued with the normal and lognormal distribution. The uncertainties, which are caused by covariance of (n,.) cross sections, were evaluated by solving GODIVA problem. The results show that the sampling method with lognormal distribution can efficiently solve the negative sampling problem referred in the previous studies. It is expected that this study will contribute to increase the accuracy of the sampling-based uncertainty analysis

  10. Parameter sampling capabilities of sequential and simultaneous data assimilation: II. Statistical analysis of numerical results

    International Nuclear Information System (INIS)

    Fossum, Kristian; Mannseth, Trond

    2014-01-01

    We assess and compare parameter sampling capabilities of one sequential and one simultaneous Bayesian, ensemble-based, joint state-parameter (JS) estimation method. In the companion paper, part I (Fossum and Mannseth 2014 Inverse Problems 30 114002), analytical investigations lead us to propose three claims, essentially stating that the sequential method can be expected to outperform the simultaneous method for weakly nonlinear forward models. Here, we assess the reliability and robustness of these claims through statistical analysis of results from a range of numerical experiments. Samples generated by the two approximate JS methods are compared to samples from the posterior distribution generated by a Markov chain Monte Carlo method, using four approximate measures of distance between probability distributions. Forward-model nonlinearity is assessed from a stochastic nonlinearity measure allowing for sufficiently large model dimensions. Both toy models (with low computational complexity, and where the nonlinearity is fairly easy to control) and two-phase porous-media flow models (corresponding to down-scaled versions of problems to which the JS methods have been frequently applied recently) are considered in the numerical experiments. Results from the statistical analysis show strong support of all three claims stated in part I. (paper)

  11. The outlier sample effects on multivariate statistical data processing geochemical stream sediment survey (Moghangegh region, North West of Iran)

    International Nuclear Information System (INIS)

    Ghanbari, Y.; Habibnia, A.; Memar, A.

    2009-01-01

    In the geochemical stream sediment survey of the Moghangegh Region in north-west Iran, sheet 1:50,000, 152 samples were collected, and after analysis and processing of the data it was revealed that the Yb, Sc, Ni, Li, Eu, Cd, Co and As contents in one sample are far higher than in the other samples. After detecting this sample as an outlier, the effect of this sample on multivariate statistical data processing, and the destructive effects of outlier samples in geochemical exploration, were investigated. Pearson and Spearman correlation coefficient methods and cluster analysis were used for the multivariate studies, and the scatter plots of some elements together with the regression profiles are given for the cases of 152 and 151 samples and the results are compared. After investigation of the multivariate statistical data processing results, it was realized that the existence of outlier samples may produce the following relations between elements: a true relation between two elements, neither of which has an outlier value in the outlier sample; a false relation between two elements, one of which has an outlier value in the outlier sample; a completely false relation between two elements, both of which have outlier values in the outlier sample
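
    The destructive effect of a single outlier sample on correlation-based processing can be illustrated with synthetic data: one extreme sample can create a spurious Pearson correlation between two otherwise unrelated elements, while the rank-based Spearman coefficient is much less affected. The values below are synthetic, not the Moghangegh survey data.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      a = rng.normal(50.0, 5.0, 151)       # element A in 151 ordinary samples (synthetic)
      b = rng.normal(20.0, 2.0, 151)       # element B, uncorrelated with A (synthetic)

      a_out = np.append(a, 500.0)          # one sample with extreme contents of both elements
      b_out = np.append(b, 200.0)

      for label, x, y in [("without outlier", a, b), ("with outlier", a_out, b_out)]:
          r_p = stats.pearsonr(x, y)[0]
          r_s = stats.spearmanr(x, y)[0]
          print(f"{label:16s} Pearson r = {r_p:5.2f}, Spearman rho = {r_s:5.2f}")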

  12. Exact distributions of two-sample rank statistics and block rank statistics using computer algebra

    NARCIS (Netherlands)

    Wiel, van de M.A.

    1998-01-01

    We derive generating functions for various rank statistics and we use computer algebra to compute the exact null distribution of these statistics. We present various techniques for reducing time and memory space used by the computations. We use the results to write Mathematica notebooks for
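
    The generating-function idea can also be sketched outside Mathematica. The Python fragment below (an illustration of the technique, not the notebooks referred to above) builds the exact null distribution of the two-sample Wilcoxon rank-sum statistic by expanding the rank generating function with dynamic programming.

```python
from math import comb

def exact_ranksum_null(m, n):
    """Exact null distribution of the rank-sum statistic W (sum of the ranks
    of the first sample, group sizes m and n): counts[j][s] is the number of
    j-element subsets of the ranks processed so far whose ranks sum to s."""
    N = m + n
    max_sum = sum(range(N - m + 1, N + 1))            # largest possible rank sum
    counts = [[0] * (max_sum + 1) for _ in range(m + 1)]
    counts[0][0] = 1
    for rank in range(1, N + 1):                      # add one rank at a time
        for j in range(min(rank, m), 0, -1):
            prev, cur = counts[j - 1], counts[j]
            for s in range(max_sum, rank - 1, -1):
                if prev[s - rank]:
                    cur[s] += prev[s - rank]
    total = comb(N, m)                                # equally likely subsets under H0
    return {w: c / total for w, c in enumerate(counts[m]) if c}

# Example: exact upper-tail probability P(W >= 17) for group sizes 3 and 4
dist = exact_ranksum_null(3, 4)
print(sum(p for w, p in dist.items() if w >= 17))     # 2/35, about 0.057
```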

  13. TRAN-STAT: statistics for environmental studies, Number 22. Comparison of soil-sampling techniques for plutonium at Rocky Flats

    International Nuclear Information System (INIS)

    Gilbert, R.O.; Bernhardt, D.E.; Hahn, P.B.

    1983-01-01

    A summary of a field soil sampling study conducted around the Rocky Flats, Colorado, plant in May 1977 is presented. Several different soil sampling techniques that had been used in the area were applied at four different sites. One objective was to compare the average 239-240Pu concentration values obtained by the various soil sampling techniques used. There was also interest in determining whether there are differences in the reproducibility of the various techniques and how the techniques compared with the proposed EPA technique of sampling to a depth of 1 cm. Statistically significant differences in average concentrations between the techniques were found. The differences could largely be related to differences in sampling depth, the primary physical variable distinguishing the techniques. The reproducibility of the techniques was evaluated by comparing coefficients of variation. Differences between coefficients of variation were not statistically significant. Average (median) coefficients ranged from 21 to 42 percent for the five sampling techniques. A laboratory study indicated that various sample treatment and particle sizing techniques could increase the concentration of plutonium in the less-than-10-micrometer size fraction by up to a factor of about 4 compared to the 2 mm size fraction.

  14. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of Poisson statistics.

    Science.gov (United States)

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
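
    A minimal sketch of the Poisson-based quantification described above (not the published assay's analysis code, and the replicate counts are hypothetical): with the sample split over 42 replicate reactions, the fraction of negative reactions estimates exp(-lambda), and an exact (Clopper-Pearson) binomial confidence interval on that fraction translates directly into a confidence interval for the mean number of amplifiable events per reaction.

```python
import math
from scipy import stats

def poisson_quantify(n_negative, n_replicates=42, conf=0.95):
    """Estimate the mean number of amplifiable events per reaction from the
    count of negative replicates, assuming a Poisson distribution of events
    across replicates so that P(negative) = exp(-lam)."""
    alpha = 1.0 - conf
    k, n = n_negative, n_replicates
    # Clopper-Pearson bounds for the true negative fraction
    p_lo = stats.beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    p_hi = stats.beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    lam_hat = -math.log(k / n) if k > 0 else math.inf
    lam_hi = -math.log(p_lo) if p_lo > 0 else math.inf
    lam_lo = -math.log(p_hi) if p_hi < 1 else 0.0
    return lam_hat, (lam_lo, lam_hi)

# Hypothetical run: 15 of 42 replicate Alu-HIV PCRs are negative
lam, (lo, hi) = poisson_quantify(15)
print(f"events per reaction: {lam:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```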

  15. Finite-sample instrumental variables Inference using an Asymptotically Pivotal Statistic

    NARCIS (Netherlands)

    Bekker, P.; Kleibergen, F.R.

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic, this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  16. A preliminary study on identification of Thai rice samples by INAA and statistical analysis

    Science.gov (United States)

    Kongsri, S.; Kukusamude, C.

    2017-09-01

    This study aims to investigate the elemental compositions of 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify the rice according to type and cultivar using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified with SRM 1568a Rice Flour; all elements were found to be in good agreement with the certified values, and the precisions in terms of %RSD were lower than 7%. The LODs were in the range of 0.01 to 29 mg kg-1. The concentrations of the 9 elements distributed in the Thai rice samples were evaluated and used as chemical indicators to identify the type of rice. The results showed that the Mg, Cl, As, Br, Mn, K, Rb and Zn concentrations in Thai jasmine rice samples differ significantly from those in Sung Yod rice samples, whereas there was no evidence of a significant difference for Al, at the 95% confidence level. Our results may provide preliminary information for the discrimination of rice samples and may serve as a useful database for Thai rice.

  17. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, Paul A.; Kleibergen, Frank

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic, this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  18. CONFIDENCE LEVELS AND/VS. STATISTICAL HYPOTHESIS TESTING IN STATISTICAL ANALYSIS. CASE STUDY

    Directory of Open Access Journals (Sweden)

    ILEANA BRUDIU

    2009-05-01

    Full Text Available Parameter estimation with confidence intervals and statistical hypothesis testing are used in statistical analysis to draw conclusions about a population from a sample extracted from it. The case study presented in this paper aims to highlight the importance of the sample size used in a study and how it is reflected in the results obtained when using confidence intervals and hypothesis tests. Whereas statistical hypothesis testing gives only a "yes" or "no" answer to the question posed, estimation using confidence intervals provides more information than a test statistic: it shows the high degree of uncertainty arising from small samples and from findings that are only "marginally significant" or "almost significant" (p very close to 0.05).
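
    The point about small samples can be illustrated with a toy calculation (generic, not taken from the case study): for the same underlying effect, the small sample yields a much wider confidence interval, and hence a far less precise conclusion, than the large one, whatever the verdict of the hypothesis test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
population_mean = 100.0                      # value under the null hypothesis

for n in (10, 100):
    sample = rng.normal(103.0, 10.0, n)      # true effect of +3 units
    t, p = stats.ttest_1samp(sample, population_mean)
    ci = stats.t.interval(0.95, n - 1,
                          loc=sample.mean(), scale=stats.sem(sample))
    print(f"n={n:4d}  p={p:.3f}  95% CI for the mean: {ci[0]:6.1f} to {ci[1]:6.1f}")
```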

  19. Adaptive sampling rate control for networked systems based on statistical characteristics of packet disordering.

    Science.gov (United States)

    Li, Jin-Na; Er, Meng-Joo; Tan, Yen-Kheng; Yu, Hai-Bin; Zeng, Peng

    2015-09-01

    This paper investigates an adaptive sampling rate control scheme for networked control systems (NCSs) subject to packet disordering. The main objectives of the proposed scheme are (a) to avoid heavy packet disordering existing in communication networks and (b) to stabilize NCSs with packet disordering, transmission delay and packet loss. First, a novel sampling rate control algorithm based on statistical characteristics of disordering entropy is proposed; secondly, an augmented closed-loop NCS that consists of a plant, a sampler and a state-feedback controller is transformed into an uncertain and stochastic system, which facilitates the controller design. Then, a sufficient condition for stochastic stability in terms of Linear Matrix Inequalities (LMIs) is given. Moreover, an adaptive tracking controller is designed such that the sampling period tracks a desired sampling period, which represents a significant contribution. Finally, experimental results are given to illustrate the effectiveness and advantages of the proposed scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  20. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing

  1. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell
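
    A bare-bones sketch of the Friedman-Rafsky idea used by FlowMap-FR follows (this is not the FlowMap-FR implementation, and the two "populations" are synthetic Gaussian clouds): pool the two samples, build the Euclidean minimum spanning tree of the pooled points, and count the edges that join points from different samples; unusually few cross-sample edges indicate that the two multivariate distributions differ.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def fr_cross_edges(x, y):
    """Count MST edges connecting points of sample x to points of sample y."""
    pooled = np.vstack([x, y])
    labels = np.concatenate([np.zeros(len(x)), np.ones(len(y))])
    dist = squareform(pdist(pooled))                  # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist).tocoo()
    cross = int(np.sum(labels[mst.row] != labels[mst.col]))
    return cross, mst.nnz                             # cross edges, total MST edges

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 3))               # "population" A
b = rng.normal(0.5, 1.0, size=(200, 3))               # shifted "population" B
print(fr_cross_edges(a, b))
```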

  2. Determination of Sr-90 in milk samples from the study of statistical results

    Directory of Open Access Journals (Sweden)

    Otero-Pazos Alberto

    2017-01-01

    Full Text Available The determination of 90Sr in milk samples is one of the main objectives of radiation monitoring laboratories because of its environmental importance. In this paper the activity concentration of 39 milk samples was obtained through radiochemical separation based on selective retention of Sr on a cationic resin (Dowex 50WX8, 50-100 mesh) and subsequent determination with a low-level gas proportional counter. The results were checked by measuring the Sr concentration with the flame atomic absorption spectroscopy technique, finally obtaining the mass of 90Sr. A statistical treatment of the data was then performed using linear regressions. This yielded a reliable estimate of the mass of 90Sr based on the gravimetric technique and, secondly, of the counts per minute of the third measurement at 90Sr-90Y equilibrium, without having to perform the analysis. These estimates have been verified with 19 milk samples, obtaining overlapping results. The novelty of the manuscript is the possibility of determining the concentration of 90Sr in milk samples without the need to perform the third measurement at equilibrium.

  3. Effect of the Target Motion Sampling Temperature Treatment Method on the Statistics and Performance

    Science.gov (United States)

    Viitanen, Tuomas; Leppänen, Jaakko

    2014-06-01

    Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of the TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviances of all estimators, including the flux estimator, by tens of percents in the vicinity of very strong resonances. This effect is actually not related to the usage of sampled responses, but is instead an inherent property of the TMS tracking method and concerns both EBT and 0 K calculations.

  4. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    International Nuclear Information System (INIS)

    Edjabou, Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn; Petersen, Claus; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2015-01-01

    Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single

  5. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Edjabou, Maklawe Essonanawe, E-mail: vine@env.dtu.dk [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Petersen, Claus [Econet AS, Omøgade 8, 2.sal, 2100 Copenhagen (Denmark); Scheutz, Charlotte; Astrup, Thomas Fruergaard [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark)

    2015-02-15

    Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single

  6. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (Rumm) Program in Health Outcome Measurement.

    Science.gov (United States)

    Hagell, Peter; Westergren, Albert

    Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics and the effects of algebraic sample size adjustments. Data simulated to fit the Rasch model, based on 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

  7. Multiparametric statistics

    CERN Document Server

    Serdobolskii, Vadim Ivanovich

    2007-01-01

    This monograph presents the mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size but possibly much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way to solving central problems of multivariate statistics that have until now remained unsolved. Traditional statistical methods based on the idea of infinite sampling often break down in the solution of real problems and, depending on the data, can be inefficient, unstable and even inapplicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...

  8. Statistical sampling methods for soils monitoring

    Science.gov (United States)

    Ann M. Abbott

    2010-01-01

    Development of the best sampling design to answer a research question should be an interactive venture between the land manager or researcher and statisticians, and is the result of answering various questions. A series of questions that can be asked to guide the researcher in making decisions that will arrive at an effective sampling plan are described, and a case...

  9. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  10. MULTI-LEVEL SAMPLING APPROACH FOR CONTINOUS LOSS DETECTION USING ITERATIVE WINDOW AND STATISTICAL MODEL

    OpenAIRE

    Mohd Fo'ad Rohani; Mohd Aizaini Maarof; Ali Selamat; Houssain Kettani

    2010-01-01

    This paper proposes a Multi-Level Sampling (MLS) approach for continuous Loss of Self-Similarity (LoSS) detection using iterative window. The method defines LoSS based on Second Order Self-Similarity (SOSS) statistical model. The Optimization Method (OM) is used to estimate self-similarity parameter since it is fast and more accurate in comparison with other estimation methods known in the literature. Probability of LoSS detection is introduced to measure continuous LoSS detection performance...

  11. STATISTICAL EVALUATION OF SMALL SCALE MIXING DEMONSTRATION SAMPLING AND BATCH TRANSFER PERFORMANCE - 12093

    Energy Technology Data Exchange (ETDEWEB)

    GREER DA; THIEN MG

    2012-01-12

    The ability to effectively mix, sample, certify, and deliver consistent batches of High Level Waste (HLW) feed from the Hanford Double Shell Tanks (DST) to the Waste Treatment and Immobilization Plant (WTP) presents a significant mission risk with potential to impact mission length and the quantity of HLW glass produced. DOE's Tank Operations Contractor, Washington River Protection Solutions (WRPS) has previously presented the results of mixing performance in two different sizes of small scale DSTs to support scale up estimates of full scale DST mixing performance. Currently, sufficient sampling of DSTs is one of the largest programmatic risks that could prevent timely delivery of high level waste to the WTP. WRPS has performed small scale mixing and sampling demonstrations to study the ability to sufficiently sample the tanks. The statistical evaluation of the demonstration results which lead to the conclusion that the two scales of small DST are behaving similarly and that full scale performance is predictable will be presented. This work is essential to reduce the risk of requiring a new dedicated feed sampling facility and will guide future optimization work to ensure the waste feed delivery mission will be accomplished successfully. This paper will focus on the analytical data collected from mixing, sampling, and batch transfer testing from the small scale mixing demonstration tanks and how those data are being interpreted to begin to understand the relationship between samples taken prior to transfer and samples from the subsequent batches transferred. An overview of the types of data collected and examples of typical raw data will be provided. The paper will then discuss the processing and manipulation of the data which is necessary to begin evaluating sampling and batch transfer performance. This discussion will also include the evaluation of the analytical measurement capability with regard to the simulant material used in the demonstration tests. The

  12. Effect of the Target Motion Sampling temperature treatment method on the statistics and performance

    International Nuclear Information System (INIS)

    Viitanen, Tuomas; Leppänen, Jaakko

    2015-01-01

    Highlights: • Use of the Target Motion Sampling (TMS) method with collision estimators is studied. • The expected values of the estimators agree with NJOY-based reference. • In most practical cases also the variances of the estimators are unaffected by TMS. • Transport calculation slow-down due to TMS dominates the impact on figures-of-merit. - Abstract: Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of the TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviances of all estimators, including the flux estimator, by tens of percents in the vicinity of very

  13. Sampling Errors in Monthly Rainfall Totals for TRMM and SSM/I, Based on Statistics of Retrieved Rain Rates and Simple Models

    Science.gov (United States)

    Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM Microwave Imager (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in the sampling error estimates caused by changes in the rain statistics due 1) to evolution of the official algorithms used to process the data and 2) to differences from other remote sensing systems such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I) are analyzed.

  14. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    Energy Technology Data Exchange (ETDEWEB)

    Kyle, Jennifer E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Casey, Cameron P. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Stratton, Kelly G. [National Security Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zink, Erika M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Kim, Young-Mo [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zheng, Xueyun [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Monroe, Matthew E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Weitz, Karl K. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Bloodsworth, Kent J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Orton, Daniel J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Ibrahim, Yehia M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Moore, Ronald J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Lee, Christine G. [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Research Service, Portland Veterans Affairs Medical Center, Portland OR USA; Pedersen, Catherine [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Orwoll, Eric [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Smith, Richard D. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Burnum-Johnson, Kristin E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Baker, Erin S. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA

    2017-02-05

    The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples, such as the smaller blood volume required, storage at room temperature, and the ability to sample in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating that degradation occurring in the DBS samples affects quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types, indicating that lipid quantitation was more stable across the sample types.

  15. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    Science.gov (United States)

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...
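
    The binomial part of such a design can be sketched as follows (an illustration only; the tracer rate and confidence levels are made-up assumptions): if tracers are mixed uniformly, each kernel drawn is an independent Bernoulli trial, so the sample size needed to capture at least one tracer with a chosen probability follows directly from the binomial model.

```python
import math

def sample_size_for_detection(tracer_rate, confidence=0.95):
    """Smallest number of kernels such that a sample contains at least one
    tracer with the requested probability: 1 - (1 - p)^n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - tracer_rate))

# e.g. one tracer per 10,000 kernels, at 95% and 99% detection probability
for conf in (0.95, 0.99):
    print(conf, sample_size_for_detection(1e-4, conf))
```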

  16. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    DEFF Research Database (Denmark)

    Edjabou, Vincent Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona

    2015-01-01

    Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both...... comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub......-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10-50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste...

  17. A Statistical Primer: Understanding Descriptive and Inferential Statistics

    OpenAIRE

    Gillian Byrne

    2007-01-01

    As libraries and librarians move more towards evidence‐based decision making, the data being generated in libraries is growing. Understanding the basics of statistical analysis is crucial for evidence‐based practice (EBP), in order to correctly design and analyze research as well as to evaluate the research of others. This article covers the fundamentals of descriptive and inferential statistics, from hypothesis construction to sampling to common statistical techniques including chi‐square, co...

  18. Statistical properties of the surface velocity field in the northern Gulf of Mexico sampled by GLAD drifters

    OpenAIRE

    Mariano, A.J.; Ryan, E.H.; Huntley, H.S.; Laurindo, L.C.; Coelho, E.; Ozgokmen, TM; Berta, M.; Bogucki, D; Chen, S.S.; Curcic, M.; Drouin, K.L.; Gough, M; Haus, BK; Haza, A.C.; Hogan, P

    2016-01-01

    The Grand LAgrangian Deployment (GLAD) used multiscale sampling and GPS technology to observe time series of drifter positions with initial drifter separation of O(100 m) to O(10 km), and nominal 5 min sampling, during the summer and fall of 2012 in the northern Gulf of Mexico. Histograms of the velocity field and its statistical parameters are non-Gaussian; most are multimodal. The dominant periods for the surface velocity field are 1–2 days due to inertial oscillations, tides, and the sea b...

  19. Relationship between accuracy and number of samples on statistical quantity and contour map of environmental gamma-ray dose rate. Example of random sampling

    International Nuclear Information System (INIS)

    Matsuda, Hideharu; Minato, Susumu

    2002-01-01

    The accuracy of statistical quantities such as the mean value, and of the contour map, obtained by measurement of the environmental gamma-ray dose rate was evaluated by random sampling of 5 different model distribution maps constructed with the mean slope, -1.3, of power spectra calculated from the actually measured values. The values were derived from 58 natural gamma dose rate data sets reported worldwide, with means ranging over 10-100 Gy/h and areas over 10^-3 to 10^7 km^2. The accuracy of the mean value was found to be around ±7% even for 60 or 80 samplings (the most frequent number), and the standard deviation had an accuracy of less than 1/4-1/3 of the means. The correlation coefficient of the frequency distribution was found to be 0.860 or more for 200-400 samplings (the most frequent number), but that of the contour map was 0.502-0.770. (K.H.)
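
    The dependence of the mean-value accuracy on the number of random samplings can be mimicked on a synthetic field (this reproduces only the procedure, not the model distribution maps used in the study; the field parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-in for a dose-rate map: a lognormal field on a 200 x 200 grid
field = rng.lognormal(mean=np.log(60.0), sigma=0.4, size=(200, 200))
true_mean = field.mean()

for n in (20, 60, 80, 200, 400):
    rel_errors = []
    for _ in range(2000):                              # repeated random samplings
        idx = rng.integers(0, field.size, size=n)
        rel_errors.append(field.ravel()[idx].mean() / true_mean - 1.0)
    print(f"n={n:4d}   spread of relative error (1 sd): {np.std(rel_errors):.1%}")
```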

  20. Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.

    Science.gov (United States)

    Breunig, Nancy A.

    Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…
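
    The logic is easiest to see by simulation. The generic sketch below (not tied to the article) builds an empirical sampling distribution of the mean from a skewed population and compares its spread with the standard error predicted by theory.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # a skewed population

n, reps = 30, 10_000
sample_means = np.array([rng.choice(population, size=n).mean()
                         for _ in range(reps)])

print("population mean:               ", round(population.mean(), 3))
print("mean of the sample means:      ", round(sample_means.mean(), 3))
print("theoretical SE, sigma/sqrt(n): ", round(population.std() / np.sqrt(n), 3))
print("SD of the sample means:        ", round(sample_means.std(), 3))
```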

  1. Understanding Computational Bayesian Statistics

    CERN Document Server

    Bolstad, William M

    2011-01-01

    A hands-on introduction to computational statistics from a Bayesian point of view Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic
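
    The core idea, drawing samples from a posterior when only a function proportional to its density is known, can be sketched with a few lines of random-walk Metropolis (a generic illustration, not code from the book; the data and tuning constants are arbitrary).

```python
import numpy as np

def metropolis(log_post, x0, n_draws=5000, step=0.5, seed=0):
    """Random-walk Metropolis: sample from a density known only up to a
    normalising constant, via its log."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        prop = x + rng.normal(0.0, step)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:       # accept/reject step
            x, lp = prop, lp_prop
        draws[i] = x
    return draws

# Posterior for a normal mean with unit-variance data and a flat prior
data = np.array([4.9, 5.4, 4.7, 5.1, 5.3])
log_post = lambda mu: -0.5 * np.sum((data - mu) ** 2)
samples = metropolis(log_post, x0=0.0)
print(samples[1000:].mean(), data.mean())              # posterior mean ~ sample mean
```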

  2. [A comparison of convenience sampling and purposive sampling].

    Science.gov (United States)

    Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien

    2014-06-01

    Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and "purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation, not by statistical power analysis.

  3. Computational statistics handbook with Matlab

    CERN Document Server

    Martinez, Wendy L

    2007-01-01

    Prefaces Introduction What Is Computational Statistics? An Overview of the Book Probability Concepts Introduction Probability Conditional Probability and Independence Expectation Common Distributions Sampling Concepts Introduction Sampling Terminology and Concepts Sampling Distributions Parameter Estimation Empirical Distribution Function Generating Random Variables Introduction General Techniques for Generating Random Variables Generating Continuous Random Variables Generating Discrete Random Variables Exploratory Data Analysis Introduction Exploring Univariate Data Exploring Bivariate and Trivariate Data Exploring Multidimensional Data Finding Structure Introduction Projecting Data Principal Component Analysis Projection Pursuit EDA Independent Component Analysis Grand Tour Nonlinear Dimensionality Reduction Monte Carlo Methods for Inferential Statistics Introduction Classical Inferential Statistics Monte Carlo Methods for Inferential Statist...

  4. Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly.

    Science.gov (United States)

    Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz

    2015-03-01

    FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. Statistical techniques for sampling and monitoring natural resources

    Science.gov (United States)

    Hans T. Schreuder; Richard Ernst; Hugo Ramirez-Maldonado

    2004-01-01

    We present the statistical theory of inventory and monitoring from a probabilistic point of view. We start with the basics and show the interrelationships between designs and estimators illustrating the methods with a small artificial population as well as with a mapped realistic population. For such applications, useful open source software is given in Appendix 4....

  6. Analysis of statistical misconception in terms of statistical reasoning

    Science.gov (United States)

    Maryati, I.; Priatna, N.

    2018-05-01

    Reasoning skill is needed by everyone in the globalization era, because every person has to be able to manage and use information from all over the world, which can now be obtained easily. Statistical reasoning skill is the ability to collect, group, process, and interpret information and to draw conclusions from it. This skill can be developed at various levels of education. However, the skill remains low because many people, students included, assume that statistics is merely counting and applying formulas, and students still have a negative attitude toward courses related to research. The purpose of this research is to analyze students' misconceptions in a descriptive statistics course in relation to statistical reasoning skill. The observation was done by analyzing the results of a misconception test and a statistical reasoning skill test, and by observing the effect of students' misconceptions on their statistical reasoning skill. The sample consisted of 32 students of a mathematics education department who had taken the descriptive statistics course. The mean score on the misconception test was 49.7 with a standard deviation of 10.6, whereas the mean score on the statistical reasoning skill test was 51.8 with a standard deviation of 8.5. If a minimum score of 65 is taken as the standard of course competence, the students' mean scores are below that standard. The study of students' misconceptions emphasized which subtopics should receive attention. Based on the assessment results, misconceptions occurred in: 1) writing mathematical sentences and symbols correctly, 2) understanding basic definitions, and 3) determining the concept to be used in solving a problem. For statistical reasoning skill, the assessment measured reasoning about: 1) data, 2) representation, 3) statistical format, 4) probability, 5) samples, and 6) association.

  7. THE SLOAN DIGITAL SKY SURVEY QUASAR LENS SEARCH. IV. STATISTICAL LENS SAMPLE FROM THE FIFTH DATA RELEASE

    International Nuclear Information System (INIS)

    Inada, Naohisa; Oguri, Masamune; Shin, Min-Su; Kayo, Issha; Fukugita, Masataka; Strauss, Michael A.; Gott, J. Richard; Hennawi, Joseph F.; Morokuma, Tomoki; Becker, Robert H.; Gregg, Michael D.; White, Richard L.; Kochanek, Christopher S.; Chiu, Kuenley; Johnston, David E.; Clocchiatti, Alejandro; Richards, Gordon T.; Schneider, Donald P.; Frieman, Joshua A.

    2010-01-01

    We present the second report of our systematic search for strongly lensed quasars from the data of the Sloan Digital Sky Survey (SDSS). From extensive follow-up observations of 136 candidate objects, we find 36 lenses in the full sample of 77,429 spectroscopically confirmed quasars in the SDSS Data Release 5. We then define a complete sample of 19 lenses, including 11 from our previous search in the SDSS Data Release 3, from the sample of 36,287 quasars with i Λ = 0.84 +0.06 -0.08 (stat.) +0.09 -0.07 (syst.) assuming a flat universe, which is in good agreement with other cosmological observations. We also report the discoveries of seven binary quasars with separations ranging from 1.''1 to 16.''6, which are identified in the course of our lens survey. This study concludes the construction of our statistical lens sample in the full SDSS-I data set.

  8. Adaptive sampling based on the cumulative distribution function of order statistics to delineate heavy-metal contaminated soils using kriging

    International Nuclear Information System (INIS)

    Juang, K.-W.; Lee, D.-Y.; Teng, Y.-L.

    2005-01-01

    Correctly classifying 'contaminated' areas in soils, based on the threshold for a contaminated site, is important for determining effective clean-up actions. Pollutant mapping by means of kriging is increasingly being used for the delineation of contaminated soils. However, those areas where the kriged pollutant concentrations are close to the threshold have a high possibility for being misclassified. In order to reduce the misclassification due to the over- or under-estimation from kriging, an adaptive sampling using the cumulative distribution function of order statistics (CDFOS) was developed to draw additional samples for delineating contaminated soils, while kriging. A heavy-metal contaminated site in Hsinchu, Taiwan was used to illustrate this approach. The results showed that compared with random sampling, adaptive sampling using CDFOS reduced the kriging estimation errors and misclassification rates, and thus would appear to be a better choice than random sampling, as additional sampling is required for delineating the 'contaminated' areas. - A sampling approach was derived for drawing additional samples while kriging

  9. [The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].

    Science.gov (United States)

    Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel

    2017-01-01

    The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.

  10. The research protocol VI: How to choose the appropriate statistical test. Inferential statistics

    Directory of Open Access Journals (Sweden)

    Eric Flores-Ruiz

    2017-10-01

    Full Text Available The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
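
    A small worked example of the decision process described above (a generic scipy-based sketch, not part of the article; the data are simulated): check the normality assumption in each group first, then use the parametric test if it holds and a nonparametric alternative if it does not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
group_a = rng.normal(10.0, 2.0, 25)
group_b = rng.lognormal(mean=2.3, sigma=0.5, size=25)   # skewed group

# Shapiro-Wilk normality check in each group guides the choice of test
normal = all(stats.shapiro(g)[1] > 0.05 for g in (group_a, group_b))

if normal:
    stat, p = stats.ttest_ind(group_a, group_b)                    # parametric
    test = "independent-samples t test"
else:
    stat, p = stats.mannwhitneyu(group_a, group_b,
                                 alternative="two-sided")          # nonparametric
    test = "Mann-Whitney U test"

print(f"{test}: statistic = {stat:.2f}, p = {p:.4f}")
```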

  11. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    Science.gov (United States)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    Urban catchments are typically characterised by a more flashy nature of the hydrological response compared to natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km2 in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
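
    A minimal sketch of the inter-amount-time idea (not the study's code; the hydrograph is synthetic): fix a flow amount, record the time needed to accumulate each successive increment, and note how the sampling automatically becomes denser during high-flow periods.

```python
import numpy as np

def inter_amount_times(flow, dt, amount):
    """Time needed to accumulate each successive fixed amount of flow volume.

    flow: discharge per time step, dt: step length, amount: volume increment.
    High flows give short inter-amount times, low flows give long ones.
    """
    cum = np.cumsum(flow) * dt                         # cumulative volume
    targets = np.arange(amount, cum[-1], amount)       # successive thresholds
    crossing_idx = np.searchsorted(cum, targets)       # first step reaching each one
    crossing_times = (crossing_idx + 1) * dt
    return np.diff(np.concatenate([[0.0], crossing_times]))

# Synthetic flashy hydrograph: low base flow interrupted by a short storm peak
flow = np.concatenate([np.full(100, 1.0), np.full(10, 30.0), np.full(100, 1.0)])
iat = inter_amount_times(flow, dt=1.0, amount=50.0)
print(np.round(iat, 1))        # short inter-amount times during the peak
```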

  12. Some statistical and sampling needs for detecting spills or migration at commercial low-level radioactive waste disposal sites

    International Nuclear Information System (INIS)

    Thomas, J.M.; Eberhardt, L.L.; Skalski, J.R.; Simmons, M.A.

    1984-05-01

    As part of a larger study funded by the US Nuclear Regulatory Commission we have been investigating field sampling strategies and compositing as a means of detecting spills or migration at commercial low-level radioactive waste disposal sites. The overall project is designed to produce information for developing guidance on implementing 10 CFR part 61. Compositing (pooling samples) for detection is discussed first, followed by our development of a statistical test to allow a decision as to whether any component of a composite exceeds a prescribed maximum acceptable level. The question of optimal field sampling designs and an Apple computer program designed to show the difficulties in constructing efficient field designs and using compositing schemes are considered. 6 references, 3 figures, 3 tables
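
    The role of compositing for detection can be illustrated with a deliberately simple worst-case screening rule (an illustration only, not the statistical test developed in the study; the limit, aliquot count, and measurement error are hypothetical): with n equal aliquots, no individual concentration can exceed n times the composite concentration.

```python
def composite_screen(measured, n_aliquots, limit, rel_meas_error=0.10):
    """Conservative screen of a composite sample: clear it outright only if
    even the worst possible individual aliquot (n times the composite value,
    inflated for measurement error) stays below the prescribed limit."""
    worst_case = n_aliquots * measured * (1.0 + rel_meas_error)
    return "clear" if worst_case < limit else "resample individual aliquots"

# Composite of 8 soil aliquots measured at 0.4 units against a limit of 10 units
print(composite_screen(measured=0.4, n_aliquots=8, limit=10.0))
```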

  13. Explorations in statistics: the log transformation.

    Science.gov (United States)

    Curran-Everett, Douglas

    2018-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability-the standard deviation-varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.
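
    A short sketch of the two checks mentioned above, on made-up skewed data (a generic illustration, not the installment's own examples): a Box-Cox estimate of lambda near zero supports a log transformation, and bootstrapping the sample mean shows that the transformed observations give a more symmetric, more nearly normal, distribution of the mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.8, size=40)        # skewed observations

# Box-Cox: an estimated lambda near 0 indicates a log transform is reasonable
_, lam = stats.boxcox(y)
print(f"Box-Cox lambda: {lam:.2f}")

def boot_means(x, reps=5000):
    """Bootstrap distribution of the sample mean."""
    return np.array([rng.choice(x, size=x.size, replace=True).mean()
                     for _ in range(reps)])

for label, data in (("raw", y), ("log-transformed", np.log(y))):
    bm = boot_means(data)
    print(f"{label:16s} skewness of bootstrapped means: {stats.skew(bm):+.2f}")
```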

  14. The Statistics of Emission and Detection of Neutrons and Photons from Fissile Samples for Safeguard Applications

    International Nuclear Information System (INIS)

    Enqvist, Andreas

    2008-03-01

    One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identification and quantification of unknown material, to prevent accidental and clandestine transports and uses of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguard area. These are low order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived, and compared to Monte Carlo simulations. An experimental approach is also addressed in which cross correlation measurements were made using liquid scintillation detectors. First the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed, and it is discussed whether they can be used in applications to distinguish possible sample

  15. The Statistics of Emission and Detection of Neutrons and Photons from Fissile Samples for Safeguard Applications

    Energy Technology Data Exchange (ETDEWEB)

    Enqvist, Andreas

    2008-03-15

    One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identification and quantification of unknown material, to prevent accidental and clandestine transports and uses of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguard area. These are low order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived, and compared to Monte Carlo simulations. An experimental approach is also addressed in which cross correlation measurements were made using liquid scintillation detectors. First the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed, and it is discussed whether they can be used in applications to distinguish possible

  16. Advances in statistics

    Science.gov (United States)

    Howard Stauffer; Nadav Nur

    2005-01-01

    The papers included in the Advances in Statistics section of the Partners in Flight (PIF) 2002 Proceedings represent a small sample of statistical topics of current importance to Partners In Flight research scientists: hierarchical modeling, estimation of detection probabilities, and Bayesian applications. Sauer et al. (this volume) examines a hierarchical model...

  17. Geometric statistical inference

    International Nuclear Information System (INIS)

    Periwal, Vipul

    1999-01-01

    A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined

  18. Frontiers in statistical quality control 11

    CERN Document Server

    Schmid, Wolfgang

    2015-01-01

    The main focus of this edited volume is on three major areas of statistical quality control: statistical process control (SPC), acceptance sampling and design of experiments. The majority of the papers deal with statistical process control, while acceptance sampling and design of experiments are also treated to a lesser extent. The book is organized into four thematic parts, with Part I addressing statistical process control. Part II is devoted to acceptance sampling. Part III covers the design of experiments, while Part IV discusses related fields. The twenty-three papers in this volume stem from The 11th International Workshop on Intelligent Statistical Quality Control, which was held in Sydney, Australia from August 20 to August 23, 2013. The event was hosted by Professor Ross Sparks, CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia and was jointly organized by Professors S. Knoth, W. Schmid and Ross Sparks. The papers presented here were carefully selected and reviewed by the scientifi...

  19. Classroom Research: Assessment of Student Understanding of Sampling Distributions of Means and the Central Limit Theorem in Post-Calculus Probability and Statistics Classes

    Science.gov (United States)

    Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy

    2006-01-01

    We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…

  20. Statistical Inference for Data Adaptive Target Parameters.

    Science.gov (United States)

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Consider the setting in which one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equal-size sub-samples, and use this partitioning to define V splits into an estimation sample (one of the V sub-samples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
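
    A toy sketch of the sample-splitting idea is given below. The "algorithm" that defines the target on each parameter-generating sample is an illustrative assumption (pick the covariate most correlated with the outcome, then take the regression slope of the outcome on that covariate, estimated on the held-out estimation sample); the V estimates are then averaged. It is not the authors' estimator or inference procedure.

```python
import numpy as np

def data_adaptive_target(X, y, V=5, seed=0):
    """Average of V sample-split data-adaptive target parameters (toy version)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), V)
    estimates = []
    for v in range(V):
        est_idx = folds[v]                                   # estimation sample
        gen_idx = np.concatenate([folds[j] for j in range(V) if j != v])
        # step 1: data-adaptive choice of target on the parameter-generating sample
        corrs = [abs(np.corrcoef(X[gen_idx, j], y[gen_idx])[0, 1])
                 for j in range(X.shape[1])]
        j_star = int(np.argmax(corrs))
        # step 2: estimate that target on the estimation sample
        slope = np.polyfit(X[est_idx, j_star], y[est_idx], 1)[0]
        estimates.append(slope)
    return np.mean(estimates), estimates

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = 0.8 * X[:, 3] + rng.normal(size=300)
print(data_adaptive_target(X, y))
```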

  1. Mathematical background and attitudes toward statistics in a sample of Spanish college students.

    Science.gov (United States)

    Carmona, José; Martínez, Rafael J; Sánchez, Manuel

    2005-08-01

    To examine the relation of mathematical background and initial attitudes toward statistics of Spanish college students in social sciences, the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. Analysis suggested grades in previous courses are more related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related to students' affective responses to statistics but not to their valuing of statistics. Implications for possible research are discussed.

  2. A statistical rationale for establishing process quality control limits using fixed sample size, for critical current verification of SSC superconducting wire

    International Nuclear Information System (INIS)

    Pollock, D.A.; Brown, G.; Capone, D.W. II; Christopherson, D.; Seuntjens, J.M.; Woltz, J.

    1992-01-01

    This work has demonstrated the statistical concepts behind the XBAR R method for determining sample limits to verify billet I_c performance and process uniformity. Using a preliminary population estimate for μ and σ from a stable production lot of only 5 billets, we have shown that reasonable sensitivity to systematic process drift and random within-billet variation may be achieved by using per-billet subgroup sizes of moderate proportions. The effects of subgroup size (n) and sampling risk (α and β) on the calculated control limits have been shown to be important factors that need to be carefully considered when selecting an actual number of measurements to be used per billet, for each supplier process. Given the present method of testing in which individual wire samples are ramped to I_c only once, with measurement uncertainty due to repeatability and reproducibility (typically > 1.4%), large subgroups (i.e. >30 per billet) appear to be unnecessary, except as an inspection tool to confirm wire process history for each spool. The introduction of the XBAR R method or a similar Statistical Quality Control procedure is recommended for use in the superconducting wire production program, particularly when the program transitions from requiring tests for all pieces of wire to sampling each production unit
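
    A minimal sketch of the Xbar-R control-limit calculation that underlies this kind of method is shown below, using the standard tabulated Shewhart constants for subgroup size n = 5; the simulated "billet" data and the function name are placeholders, not the paper's actual measurements or procedure.

```python
import numpy as np

# standard Shewhart constants for subgroup size n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Control limits from per-billet subgroups of I_c measurements."""
    subgroups = np.asarray(subgroups)                   # shape (billets, n per billet)
    xbar = subgroups.mean(axis=1)                       # subgroup means
    r = subgroups.max(axis=1) - subgroups.min(axis=1)   # subgroup ranges
    xbarbar, rbar = xbar.mean(), r.mean()
    return {
        "Xbar chart": (xbarbar - A2 * rbar, xbarbar + A2 * rbar),
        "R chart": (D3 * rbar, D4 * rbar),
    }

# simulated I_c measurements (A) for 5 billets, 5 wire samples each
rng = np.random.default_rng(2)
ic = rng.normal(loc=300.0, scale=4.0, size=(5, 5))
print(xbar_r_limits(ic))
```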

  3. Sample representativeness verification of the FADN CZ farm business sample

    Directory of Open Access Journals (Sweden)

    Marie Prášilová

    2011-01-01

    Sample representativeness verification is one of the key stages of statistical work. After having joined the European Union, the Czech Republic also joined the Farm Accountancy Data Network system of the Union. This is a sample of bodies and companies doing business in agriculture. Detailed production and economic data on the results of farming business are collected from that sample annually, and results for the entire population of the country's farms are then estimated and assessed. It is important, hence, that the sample be representative. Representativeness is to be assessed with respect to the number of farms included in the survey and also with respect to how closely the measures and indices accord with those of the population. The paper deals with the special statistical techniques and methods of the FADN CZ sample representativeness verification, including the necessary sample size statement procedure. The Czech farm population data have been obtained from the Czech Statistical Office data bank.

  4. Contributions to statistics

    CERN Document Server

    Mahalanobis, P C

    1965-01-01

    Contributions to Statistics focuses on the processes, methodologies, and approaches involved in statistics. The book is presented to Professor P. C. Mahalanobis on the occasion of his 70th birthday. The selection first offers information on the recovery of ancillary information and combinatorial properties of partially balanced designs and association schemes. Discussions focus on combinatorial applications of the algebra of association matrices, sample size analogy, association matrices and the algebra of association schemes, and conceptual statistical experiments. The book then examines latt

  5. Statistics Clinic

    Science.gov (United States)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  6. Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.

    Science.gov (United States)

    Ojeda, Mario Miguel; Sahai, Hardeo

    2002-01-01

    Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…

  7. Statistical data fusion for cross-tabulation

    NARCIS (Netherlands)

    Kamakura, W.A.; Wedel, M.

    The authors address the situation in which a researcher wants to cross-tabulate two sets of discrete variables collected in independent samples, but a subset of the variables is common to both samples. The authors propose a statistical data-fusion model that allows for statistical tests of

  8. Business statistics I essentials

    CERN Document Server

    Clark, Louise

    2014-01-01

    REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As its name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams, doing homework and will remain a lasting reference source for students, teachers, and professionals. Business Statistics I includes descriptive statistics, introduction to probability, probability distributions, sampling and sampling distributions, interval estimation, and hypothesis t

  9. Statistics 101 for Radiologists.

    Science.gov (United States)

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.
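
    The diagnostic-accuracy quantities reviewed above can be summarised from a 2x2 table of counts. The sketch below is a minimal illustration with made-up counts; the function name is a placeholder.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Basic diagnostic-test statistics from a 2x2 table of counts."""
    sens = tp / (tp + fn)                    # sensitivity (true positive rate)
    spec = tn / (tn + fp)                    # specificity (true negative rate)
    acc = (tp + tn) / (tp + fp + fn + tn)    # overall accuracy
    lr_pos = sens / (1 - spec)               # positive likelihood ratio
    lr_neg = (1 - sens) / spec               # negative likelihood ratio
    return dict(sensitivity=sens, specificity=spec,
                accuracy=acc, LR_plus=lr_pos, LR_minus=lr_neg)

# hypothetical counts: 90 diseased (80 detected), 210 healthy (189 ruled out)
print(diagnostic_summary(tp=80, fp=21, fn=10, tn=189))
```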

  10. Seasonal rationalization of river water quality sampling locations: a comparative study of the modified Sanders and multivariate statistical approaches.

    Science.gov (United States)

    Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar

    2016-02-01

    The design of surface water quality sampling location is a crucial decision-making process for rationalization of monitoring network. The quantity, quality, and types of available dataset (watershed characteristics and water quality data) may affect the selection of appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for design of sampling locations. However, their performance may vary significantly with quantity, quality, and types of available dataset. In this paper, an attempt has been made to evaluate performance of these techniques by accounting the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data is usually difficult to obtain, whereas watershed information may be made available through application of geospatial techniques. A case study of Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at different sampling sites were estimated and accounted using modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The designed optimum number of sampling locations for monsoon and non-monsoon seasons by modified Sanders approach are eight and seven while that for FA/PCA are eleven and nine, respectively. Less variation in the number and locations of designed sampling sites were obtained by both techniques, which shows stability of results. A geospatial analysis has also been carried out to check the significance of designed sampling location with respect to river basin characteristics and land use of the study area. Both methods are equally efficient; however, modified Sanders

  11. Sampling, Probability Models and Statistical Reasoning -RE ...

    Indian Academy of Sciences (India)

    random sampling allows data to be modelled with the help of probability ... g based on different trials to get an estimate of the experimental error. ... research interests lie in the .... if e is indeed the true value of the proportion of defectives in the.

  12. Statistical inference

    CERN Document Server

    Rohatgi, Vijay K

    2003-01-01

    Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth

  13. Balanced sampling

    NARCIS (Netherlands)

    Brus, D.J.

    2015-01-01

    In balanced sampling a linear relation between the soil property of interest and one or more covariates with known means is exploited in selecting the sampling locations. Recent developments make this sampling design attractive for statistical soil surveys. This paper introduces balanced sampling

  14. Combining censored and uncensored data in a U-statistic: design and sample size implications for cell therapy research.

    Science.gov (United States)

    Moyé, Lemuel A; Lai, Dejian; Jing, Kaiyan; Baraniuk, Mary Sarah; Kwak, Minjung; Penn, Marc S; Wu, Colon O

    2011-01-01

    The assumptions that anchor large clinical trials are rooted in smaller, Phase II studies. In addition to specifying the target population, intervention delivery, and patient follow-up duration, physician-scientists who design these Phase II studies must select the appropriate response variables (endpoints). However, endpoint measures can be problematic. If the endpoint assesses the change in a continuous measure over time, then the occurrence of an intervening significant clinical event (SCE), such as death, can preclude the follow-up measurement. Finally, the ideal continuous endpoint measurement may be contraindicated in a fraction of the study patients, a change that requires a less precise substitution in this subset of participants. A score function that is based on the U-statistic can address these issues of 1) intercurrent SCEs and 2) response variable ascertainments that use different measurements of different precision. The scoring statistic is easy to apply, clinically relevant, and provides flexibility for the investigators' prospective design decisions. Sample size and power formulations for this statistic are provided as functions of clinical event rates and effect size estimates that are easy for investigators to identify and discuss. Examples are provided from current cardiovascular cell therapy research.
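
    A toy sketch of the kind of pairwise scoring rule on which such a U-statistic endpoint can be built is given below: each treated-control pair is compared first on the significant clinical event and, only when that does not decide, on the continuous change. The tie-breaking order, the dictionary fields and the example values are illustrative assumptions, not the authors' exact score function or inference.

```python
import itertools

def pairwise_score(a, b):
    """Compare two patients: +1 if a 'wins', -1 if b 'wins', 0 if tied.

    Each patient is a dict with 'sce' (True if the significant clinical event,
    e.g. death, occurred) and 'change' (continuous endpoint change, or None if
    the measurement was precluded)."""
    # hierarchy level 1: the significant clinical event
    if a["sce"] != b["sce"]:
        return -1 if a["sce"] else 1
    # hierarchy level 2: the continuous endpoint, when both are available
    if a["change"] is not None and b["change"] is not None:
        if a["change"] != b["change"]:
            return 1 if a["change"] > b["change"] else -1
    return 0

def u_statistic(treated, control):
    """Average pairwise score over all treated-vs-control pairs."""
    scores = [pairwise_score(a, b) for a, b in itertools.product(treated, control)]
    return sum(scores) / len(scores)

treated = [{"sce": False, "change": 5.0}, {"sce": False, "change": None},
           {"sce": True,  "change": None}]
control = [{"sce": False, "change": 1.0}, {"sce": True, "change": None}]
print(u_statistic(treated, control))
```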

  15. The Sloan Digital Sky Survey Quasar Lens Search. IV. Statistical Lens Sample from the Fifth Data Release

    Energy Technology Data Exchange (ETDEWEB)

    Inada, Naohisa; /Wako, RIKEN /Tokyo U., ICEPP; Oguri, Masamune; /Natl. Astron. Observ. of Japan /Stanford U., Phys. Dept.; Shin, Min-Su; /Michigan U. /Princeton U. Observ.; Kayo, Issha; /Tokyo U., ICRR; Strauss, Michael A.; /Princeton U. Observ.; Hennawi, Joseph F.; /UC, Berkeley /Heidelberg, Max Planck Inst. Astron.; Morokuma, Tomoki; /Natl. Astron. Observ. of Japan; Becker, Robert H.; /LLNL, Livermore /UC, Davis; White, Richard L.; /Baltimore, Space Telescope Sci.; Kochanek, Christopher S.; /Ohio State U.; Gregg, Michael D.; /LLNL, Livermore /UC, Davis /Exeter U.

    2010-05-01

    We present the second report of our systematic search for strongly lensed quasars from the data of the Sloan Digital Sky Survey (SDSS). From extensive follow-up observations of 136 candidate objects, we find 36 lenses in the full sample of 77,429 spectroscopically confirmed quasars in the SDSS Data Release 5. We then define a complete sample of 19 lenses, including 11 from our previous search in the SDSS Data Release 3, from the sample of 36,287 quasars with i < 19.1 in the redshift range 0.6 < z < 2.2, where we require the lenses to have image separations of 1'' < θ < 20'' and i-band magnitude differences between the two images smaller than 1.25 mag. Among the 19 lensed quasars, 3 have quadruple-image configurations, while the remaining 16 show double images. This lens sample constrains the cosmological constant to be Ω_Λ = 0.84 +0.06/-0.08 (stat.) +0.09/-0.07 (syst.) assuming a flat universe, which is in good agreement with other cosmological observations. We also report the discoveries of 7 binary quasars with separations ranging from 1.1'' to 16.6'', which are identified in the course of our lens survey. This study concludes the construction of our statistical lens sample in the full SDSS-I data set.

  16. Testing statistical hypotheses

    CERN Document Server

    Lehmann, E L

    2005-01-01

    The third edition of Testing Statistical Hypotheses updates and expands upon the classic graduate text, emphasizing optimality theory for hypothesis testing and confidence sets. The principal additions include a rigorous treatment of large sample optimality, together with the requisite tools. In addition, an introduction to the theory of resampling methods such as the bootstrap is developed. The sections on multiple testing and goodness of fit testing are expanded. The text is suitable for Ph.D. students in statistics and includes over 300 new problems out of a total of more than 760. E.L. Lehmann is Professor of Statistics Emeritus at the University of California, Berkeley. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences, and the recipient of honorary degrees from the University of Leiden, The Netherlands and the University of Chicago. He is the author of Elements of Large-Sample Theory and (with George Casella) he is also the author of Theory of Point Estimat...

  17. Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics

    Science.gov (United States)

    Dowding, Irene; Haufe, Stefan

    2018-01-01

    Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
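
    The sketch below contrasts the naive group-level t-test on unweighted subject means with a precision-weighted (inverse-variance) combination of subject-level effects, in the spirit of the sufficient-summary-statistic idea. For simplicity the weighting assumes the within-subject variance dominates and ignores between-subject variance, which is an assumption of this illustration rather than the full method reviewed in the paper; the simulated data are placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# simulate nested data: per-subject trials with unequal counts and noise levels
n_subjects = 12
subject_effects = rng.normal(0.3, 0.1, n_subjects)   # true per-subject effects
n_trials = rng.integers(20, 200, n_subjects)
noise_sd = rng.uniform(0.5, 3.0, n_subjects)

means, variances = [], []
for mu, n, sd in zip(subject_effects, n_trials, noise_sd):
    trials = rng.normal(mu, sd, n)
    means.append(trials.mean())
    variances.append(trials.var(ddof=1) / n)          # variance of the subject mean
means, variances = np.array(means), np.array(variances)

# naive approach: one-sample t-test on the unweighted subject means
t_naive, p_naive = stats.ttest_1samp(means, 0.0)

# inverse-variance weighted combination of the subject-level effects
w = 1.0 / variances
z = (w * means).sum() / np.sqrt(w.sum())   # assumes within-subject variance dominates
p_weighted = 2 * stats.norm.sf(abs(z))

print(f"naive t-test p = {p_naive:.4f}, weighted z-test p = {p_weighted:.4f}")
```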

  18. MANAGERIAL DECISION IN INNOVATIVE EDUCATION SYSTEMS STATISTICAL SURVEY BASED ON SAMPLE THEORY

    Directory of Open Access Journals (Sweden)

    Gheorghe SĂVOIU

    2012-12-01

    Before formulating the statistical hypotheses and the econometric testing itself, a breakdown of some of the technical issues is required. These issues relate to managerial decision-making in innovative educational systems: the educational managerial phenomenon is tested through statistical and mathematical methods, namely the significant difference in the perception of current qualities, knowledge, experience, behaviour and desirable health, obtained through a questionnaire applied to a stratified population in the educational environment, comprising respondents engaged either in educational activities or in simultaneously managerial and educational activities. The details of the research, which is focused on survey theory and turns the questionnaires and the statistical data processed from them into a working tool, are summarized below.

  19. Statistical concepts a second course

    CERN Document Server

    Lomax, Richard G

    2012-01-01

    Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes

  20. An introduction to statistical computing a simulation-based approach

    CERN Document Server

    Voss, Jochen

    2014-01-01

    A comprehensive introduction to sampling-based methods in statistical computing The use of computers in mathematics and statistics has opened up a wide range of techniques for studying otherwise intractable problems.  Sampling-based simulation techniques are now an invaluable tool for exploring statistical models.  This book gives a comprehensive introduction to the exciting area of sampling-based methods. An Introduction to Statistical Computing introduces the classical topics of random number generation and Monte Carlo methods.  It also includes some advanced met

  1. A statistical evaluation of asbestos air concentrations

    International Nuclear Information System (INIS)

    Lange, J.H.

    1999-01-01

    Both area and personal air samples collected during an asbestos abatement project were matched and statistically analysed. Among the many parameters studied were fibre concentrations and their variability. Mean values for area and personal samples were 0.005 and 0.024 f cm⁻³ of air, respectively. Summary values for area and personal samples suggest that exposures are low with no single exposure value exceeding the current OSHA TWA value of 0.1 f cm⁻³ of air. Within- and between-worker analysis suggests that these data are homogeneous. Comparison of within- and between-worker values suggests that the exposure source and variability for abatement are more related to the process than individual practices. This supports the importance of control measures for abatement. Study results also suggest that area and personal samples are not statistically related, that is, there is no association observed for these two sampling methods when data are analysed by correlation or regression analysis. Personal samples were statistically higher in concentration than area samples. Area sampling cannot be used as a surrogate exposure for asbestos abatement workers. (author)

  2. Sampling the Mouse Hippocampal Dentate Gyrus

    OpenAIRE

    Lisa Basler; Stephan Gerdes; David P. Wolfer; Lutz Slomianka

    2017-01-01

    Sampling is a critical step in procedures that generate quantitative morphological data in the neurosciences. Samples need to be representative to allow statistical evaluations, and samples need to deliver a precision that makes statistical evaluations not only possible but also meaningful. Sampling generated variability should, e.g., not be able to hide significant group differences from statistical detection if they are present. Estimators of the coefficient of error (CE) have been develope...

  3. Time Series Analysis Based on Running Mann Whitney Z Statistics

    Science.gov (United States)

    A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
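
    One way such a running Mann-Whitney Z can be implemented is sketched below: within each moving window, the first half is compared with the second half using the U statistic and its large-sample normal approximation. The half-versus-half pairing, the window placement of the Z value and the step-change example are assumptions of this illustration, not necessarily the pairing used in the abstract.

```python
import numpy as np
from scipy import stats

def running_mw_z(series, window):
    """Mann-Whitney U of first vs second half of each moving window,
    normalized to a Z statistic with the large-sample approximation (no tie
    correction)."""
    series = np.asarray(series, dtype=float)
    half = window // 2
    z = np.full(len(series), np.nan)
    for i in range(len(series) - window + 1):
        a = series[i:i + half]
        b = series[i + half:i + window]
        u = stats.mannwhitneyu(a, b, alternative="two-sided").statistic
        mu = len(a) * len(b) / 2.0
        sigma = np.sqrt(len(a) * len(b) * (len(a) + len(b) + 1) / 12.0)
        z[i + half] = (u - mu) / sigma          # place Z at the window centre
    return z

# example: a step change part-way through a noisy series shows up as large |Z|
rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(0, 1, 200), rng.normal(1.2, 1, 200)])
print(np.nanmax(np.abs(running_mw_z(y, window=60))))
```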

  4. Monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm on permanent plots: sampling methods and statistical properties of data

    Science.gov (United States)

    A.R. Mason; H.G. Paul

    1994-01-01

    Procedures for monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm are recommended based on many years' experience in sampling these species in eastern Oregon and Washington. It is shown that statistically reliable estimates of larval density can be made for a population by sampling host trees in a series of permanent plots in a...

  5. FUNSTAT and statistical image representations

    Science.gov (United States)

    Parzen, E.

    1983-01-01

    General ideas of functional statistical inference for the analysis of one and two samples, univariate and bivariate, are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.

  6. Applied statistical designs for the researcher

    CERN Document Server

    Paulson, Daryl S

    2003-01-01

    Research and Statistics Basic Review of Parametric Statistics Exploratory Data Analysis Two Sample Tests Completely Randomized One-Factor Analysis of Variance One and Two Restrictions on Randomization Completely Randomized Two-Factor Factorial Designs Two-Factor Factorial Completely Randomized Blocked Designs Useful Small Scale Pilot Designs Nested Statistical Designs Linear Regression Nonparametric Statistics Introduction to Research Synthesis and "Meta-Analysis" and Conclusory Remarks References Index.

  7. Constrained statistical inference : sample-size tables for ANOVA and regression

    NARCIS (Netherlands)

    Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves

    2015-01-01

    Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and

  8. The quantitative LOD score: test statistic and sample size for exclusion and linkage of quantitative traits in human sibships.

    Science.gov (United States)

    Page, G P; Amos, C I; Boerwinkle, E

    1998-04-01

    We present a test statistic, the quantitative LOD (QLOD) score, for the testing of both linkage and exclusion of quantitative-trait loci in randomly selected human sibships. As with the traditional LOD score, the boundary values of 3, for linkage, and -2, for exclusion, can be used for the QLOD score. We investigated the sample sizes required for inferring exclusion and linkage, for various combinations of linked genetic variance, total heritability, recombination distance, and sibship size, using fixed-size sampling. The sample sizes required for both linkage and exclusion were not qualitatively different and depended on the percentage of variance being linked or excluded and on the total genetic variance. Information regarding linkage and exclusion in sibships larger than size 2 increased approximately as the number of all possible pairs, n(n-1)/2, up to sibships of size 6. Increasing the recombination distance (θ) between the marker and the trait loci empirically reduced the power for both linkage and exclusion, approximately as a function of (1-2θ)^4.
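
    A quick numerical illustration of how fast the reported (1-2θ)^4 factor falls off with recombination distance; the specific θ values are chosen only for illustration.

```python
# approximate information-reduction factor (1 - 2*theta)**4 reported above
for theta in (0.0, 0.05, 0.1, 0.2):
    print(f"theta = {theta:.2f}  ->  factor ~ {(1 - 2 * theta) ** 4:.2f}")
# theta = 0.00 -> 1.00, 0.05 -> 0.66, 0.10 -> 0.41, 0.20 -> 0.13
```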

  9. Frontiers in statistical quality control

    CERN Document Server

    Wilrich, Peter-Theodor

    2004-01-01

    This volume treats the four main categories of Statistical Quality Control: General SQC Methodology, On-line Control including Sampling Inspection and Statistical Process Control, Off-line Control with Data Analysis and Experimental Design, and, fields related to Reliability. Experts with international reputation present their newest contributions.

  10. A statistical evaluation of asbestos air concentrations

    Energy Technology Data Exchange (ETDEWEB)

    Lange, J.H. [Envirosafe Training and Consultants, Pittsburgh, PA (United States)

    1999-07-01

    Both area and personal air samples collected during an asbestos abatement project were matched and statistically analysed. Among the many parameters studied were fibre concentrations and their variability. Mean values for area and personal samples were 0.005 and 0.024 f cm⁻³ of air, respectively. Summary values for area and personal samples suggest that exposures are low with no single exposure value exceeding the current OSHA TWA value of 0.1 f cm⁻³ of air. Within- and between-worker analysis suggests that these data are homogeneous. Comparison of within- and between-worker values suggests that the exposure source and variability for abatement are more related to the process than individual practices. This supports the importance of control measures for abatement. Study results also suggest that area and personal samples are not statistically related, that is, there is no association observed for these two sampling methods when data are analysed by correlation or regression analysis. Personal samples were statistically higher in concentration than area samples. Area sampling cannot be used as a surrogate exposure for asbestos abatement workers. (author)

  11. Fundamentals of statistics

    CERN Document Server

    Mulholland, Henry

    1968-01-01

    Fundamentals of Statistics covers topics on the introduction, fundamentals, and science of statistics. The book discusses the collection, organization and representation of numerical data; elementary probability; the binomial and Poisson distributions; and the measures of central tendency. The text describes measures of dispersion for measuring the spread of a distribution; continuous distributions for measuring on a continuous scale; the properties and use of the normal distribution; and tests involving the normal or Student's t distributions. The use of control charts for sample means; the ranges

  12. How Sample Size Affects a Sampling Distribution

    Science.gov (United States)

    Mulekar, Madhuri S.; Siegel, Murray H.

    2009-01-01

    If students are to understand inferential statistics successfully, they must have a profound understanding of the nature of the sampling distribution. Specifically, they must comprehend the determination of the expected value and standard error of a sampling distribution as well as the meaning of the central limit theorem. Many students in a high…

  13. Direct Learning of Systematics-Aware Summary Statistics

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Complex machine learning tools, such as deep neural networks and gradient boosting algorithms, are increasingly being used to construct powerful discriminative features for High Energy Physics analyses. These methods are typically trained with simulated or auxiliary data samples by optimising some classification or regression surrogate objective. The learned feature representations are then used to build a sample-based statistical model to perform inference (e.g. interval estimation or hypothesis testing) over a set of parameters of interest. However, the effectiveness of the mentioned approach can be reduced by the presence of known uncertainties that cause differences between training and experimental data, included in the statistical model via nuisance parameters. This work presents an end-to-end algorithm, which leverages existing deep learning technologies but directly aims to produce inference-optimal sample-summary statistics. By including the statistical model and a differentiable approximation of ...

  14. On the Sampling

    OpenAIRE

    Güleda Doğan

    2017-01-01

    This editorial is on statistical sampling, which is one of the two most important reasons for editorial rejection from our journal, Turkish Librarianship. The stages of quantitative research, the stage at which sampling takes place, the importance of sampling for research, deciding on sample size, and sampling methods are summarised briefly.

  15. Preferential sampling in veterinary parasitological surveillance

    Directory of Open Access Journals (Sweden)

    Lorenzo Cecconi

    2016-04-01

    In parasitological surveillance of livestock, prevalence surveys are conducted on a sample of farms using several sampling designs. For example, opportunistic surveys or informative sampling designs are very common. Preferential sampling refers to any situation in which the spatial process and the sampling locations are not independent. Most examples of preferential sampling in the spatial statistics literature are in environmental statistics with a focus on pollutant monitors, and it has been shown that, if preferential sampling is present and is not accounted for in the statistical modelling and data analysis, statistical inference can be misleading. In this paper, working in the context of veterinary parasitology, we propose and use geostatistical models to predict the continuous and spatially-varying risk of a parasite infection. Specifically, breaking with the common practice in veterinary parasitological surveillance to ignore preferential sampling even though informative or opportunistic samples are very common, we specify a two-stage hierarchical Bayesian model that adjusts for preferential sampling and we apply it to data on Fasciola hepatica infection in sheep farms in the Campania region (Southern Italy) in the years 2013-2014.

  16. Statistics II essentials

    CERN Document Server

    Milewski, Emil G

    2012-01-01

    REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As its name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams, doing homework and will remain a lasting reference source for students, teachers, and professionals. Statistics II discusses sampling theory, statistical inference, independent and dependent variables, correlation theory, experimental design, count data, chi-square test, and time se

  17. Statistical quality management using miniTAB 14

    International Nuclear Information System (INIS)

    An, Seong Jin

    2007-01-01

    This book explains statistical quality management, giving descriptions of the definition of quality, quality management, quality cost, basic methods of quality management, principles of control charts, control charts for variables, control charts for attributes, capability analysis, other issues of statistical process control, acceptance sampling, sampling for variable acceptance, design and analysis of experiments, Taguchi quality engineering, response surface methodology, and reliability analysis.

  18. Spatial statistics of hydrography and water chemistry in a eutrophic boreal lake based on sounding and water samples.

    Science.gov (United States)

    Leppäranta, Matti; Lewis, John E; Heini, Anniina; Arvola, Lauri

    2018-06-04

    Spatial variability, an essential characteristic of lake ecosystems, has often been neglected in field research and monitoring. In this study, we apply spatial statistical methods for the key physics and chemistry variables and chlorophyll a over eight sampling dates in two consecutive years in a large (area 103 km²) eutrophic boreal lake in southern Finland. On the four summer sampling dates the water body was vertically and horizontally heterogeneous, except for color and DOC; on the two winter ice-covered dates DO was vertically stratified; and on the two autumn dates no significant spatial differences in any of the measured variables were found. Chlorophyll a concentration was one order of magnitude lower under the ice cover than in open water. The Moran statistic for spatial correlation was significant for chlorophyll a and NO2+NO3-N in all summer situations and for dissolved oxygen and pH in three cases. In summer, the mass centers of the chemicals were within 1.5 km of the geometric center of the lake, and the second-moment radius ranged from 3.7 to 4.1 km, compared with 3.9 km for the homogeneous situation. The lateral length scales of the studied variables were 1.5-2.5 km, about 1 km longer in the surface layer. The detected spatial "noise" strongly suggests that, besides vertical variation, horizontal variation in eutrophic lakes in particular should also be considered when the ecosystems are monitored.
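
    A minimal sketch of the Moran statistic for spatial correlation mentioned above is given below; the inverse-distance weight matrix, the function name and the toy station values are illustrative assumptions, not the weighting or data used in the study.

```python
import numpy as np

def morans_i(values, coords):
    """Moran's I with inverse-distance weights (illustrative choice)."""
    x = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(x)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(d)
    off = ~np.eye(n, dtype=bool)
    w[off] = 1.0 / d[off]                      # inverse-distance weights, zero diagonal
    z = x - x.mean()
    return (n / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()

# toy example: chlorophyll-like values at station coordinates (km)
coords = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
values = [2.0, 2.2, 2.4, 1.0, 1.1, 1.3]        # smooth gradient -> positive I
print(round(morans_i(values, coords), 3))
```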

  19. Small sample approach, and statistical and epidemiological aspects

    NARCIS (Netherlands)

    Offringa, Martin; van der Lee, Hanneke

    2011-01-01

    In this chapter, the design of pharmacokinetic studies and phase III trials in children is discussed. Classical approaches and relatively novel approaches, which may be more useful in the context of drug research in children, are discussed. The burden of repeated blood sampling in pediatric

  20. Strong laws for L- and U-statistics

    NARCIS (Netherlands)

    Aaronson, J; Burton, R; Dehling, H; Gilat, D; Hill, T; Weiss, B

    Strong laws of large numbers are given for L-statistics (linear combinations of order statistics) and for U-statistics (averages of kernels of random samples) for ergodic stationary processes, extending classical theorems of Hoeffding and of Helmers for iid sequences. Examples are given to show

  1. Statistical Computing

    Indian Academy of Sciences (India)

    inference and finite population sampling. Sudhakar Kunte. Elements of statistical computing are discussed in this series. ... which captain gets an option to decide whether to field first or bat first ... may of course not be fair, in the sense that the team which wins ... describe two methods of drawing a random number between 0.

  2. Sampling

    CERN Document Server

    Thompson, Steven K

    2012-01-01

    Praise for the Second Edition "This book has never had a competitor. It is the only book that takes a broad approach to sampling . . . any good personal statistics library should include a copy of this book." —Technometrics "Well-written . . . an excellent book on an important subject. Highly recommended." —Choice "An ideal reference for scientific researchers and other professionals who use sampling." —Zentralblatt Math Features new developments in the field combined with all aspects of obtaining, interpreting, and using sample data Sampling provides an up-to-date treat

  3. Statistical Power in Meta-Analysis

    Science.gov (United States)

    Liu, Jin

    2015-01-01

    Statistical power is important in a meta-analysis study, although few studies have examined the performance of simulated power in meta-analysis. The purpose of this study is to inform researchers about statistical power estimation on two sample mean difference test under different situations: (1) the discrepancy between the analytical power and…

  4. Frontiers in statistical quality control

    CERN Document Server

    Wilrich, Peter-Theodor

    2001-01-01

    The book is a collection of papers presented at the 5th International Workshop on Intelligent Statistical Quality Control in Würzburg, Germany. Contributions deal with methodology and successful industrial applications. They can be grouped into four categories: Sampling Inspection, Statistical Process Control, Data Analysis and Process Capability Studies, and Experimental Design.

  5. The Math Problem: Advertising Students' Attitudes toward Statistics

    Science.gov (United States)

    Fullerton, Jami A.; Kendrick, Alice

    2013-01-01

    This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…

  6. Some connections between importance sampling and enhanced sampling methods in molecular dynamics.

    Science.gov (United States)

    Lie, H C; Quer, J

    2017-11-21

    In molecular dynamics, enhanced sampling methods enable the collection of better statistics of rare events from a reference or target distribution. We show that a large class of these methods is based on the idea of importance sampling from mathematical statistics. We illustrate this connection by comparing the Hartmann-Schütte method for rare event simulation (J. Stat. Mech. Theor. Exp. 2012, P11004) and the Valsson-Parrinello method of variationally enhanced sampling [Phys. Rev. Lett. 113, 090601 (2014)]. We use this connection in order to discuss how recent results from the Monte Carlo methods literature can guide the development of enhanced sampling methods.
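
    A minimal illustration of the importance-sampling identity underlying the connection described above: expectations under a target density p are estimated from samples of a proposal q by reweighting with p/q. The choice of a standard-normal target, a shifted-normal proposal and a tail probability as the observable is an assumption of this sketch, not an example from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 100_000
threshold = 4.0                     # rare event under the standard normal target

# naive Monte Carlo from the target p = N(0, 1): almost no samples hit the event
x_p = rng.normal(0.0, 1.0, n)
naive = np.mean(x_p > threshold)

# importance sampling from a proposal q = N(4, 1), reweighting by p/q
x_q = rng.normal(threshold, 1.0, n)
w = stats.norm.pdf(x_q, 0, 1) / stats.norm.pdf(x_q, threshold, 1)
is_est = np.mean((x_q > threshold) * w)

print(f"true      : {stats.norm.sf(threshold):.2e}")
print(f"naive MC  : {naive:.2e}")
print(f"importance: {is_est:.2e}")
```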

  7. Evaluation of statistical methods for quantifying fractal scaling in water-quality time series with irregular sampling

    Science.gov (United States)

    Zhang, Qian; Harman, Ciaran J.; Kirchner, James W.

    2018-02-01

    River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling - in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) - are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β = 0) to Brown noise (β = 2) and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb-Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among all methods for a wide range of
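
    A sketch of the Lomb-Scargle spectral-slope estimate discussed above, applied to irregularly sampled data, is shown below. The synthetic series is a random walk (prescribed β ≈ 2) with randomly thinned sampling times, and the logarithmic frequency grid and fitting range are arbitrary choices; the example illustrates the estimation procedure only and is not intended to reproduce the paper's bias results.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(6)

# synthetic "Brown noise" (random walk, beta ~ 2) sampled at irregular times
t_full = np.arange(5000.0)
x_full = np.cumsum(rng.normal(size=t_full.size))
keep = np.sort(rng.choice(t_full.size, size=1200, replace=False))  # irregular gaps
t, x = t_full[keep], x_full[keep] - x_full[keep].mean()

# Lomb-Scargle periodogram over a logarithmic frequency grid
freqs = np.logspace(-3, -1, 200)               # cycles per time unit
power = lombscargle(t, x, 2 * np.pi * freqs)   # lombscargle expects angular freqs

# spectral slope beta from a log-log least-squares fit: power ~ freq**(-beta)
beta = -np.polyfit(np.log10(freqs), np.log10(power), 1)[0]
print(f"estimated beta ~ {beta:.2f} (prescribed ~ 2)")
```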

  8. Equivalent statistics and data interpretation.

    Science.gov (United States)

    Francis, Gregory

    2017-08-01

    Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
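
    A small sketch of the two-sample equivalence noted above: with known group sizes, the pooled-variance t statistic, its p value and Cohen's d are deterministic transformations of one another (the JZS Bayes factor is likewise a function of t and the sample sizes, though it is not computed here). The function names and example numbers are illustrative.

```python
import numpy as np
from scipy import stats

def equivalent_summaries(t, n1, n2):
    """Convert a two-sample (pooled-variance) t statistic into the p value and
    Cohen's d implied by the same data, given the group sizes."""
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)                  # two-sided p value
    d = t * np.sqrt(1.0 / n1 + 1.0 / n2)            # Cohen's d from t and n
    return p, d

def t_from_d(d, n1, n2):
    """Inverse mapping: the t statistic implied by a reported Cohen's d."""
    return d / np.sqrt(1.0 / n1 + 1.0 / n2)

t, n1, n2 = 2.4, 30, 30
p, d = equivalent_summaries(t, n1, n2)
print(f"t = {t}, p = {p:.4f}, d = {d:.3f}, t recovered from d = {t_from_d(d, n1, n2):.3f}")
```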

  9. Sample size determination and power

    CERN Document Server

    Ryan, Thomas P, Jr

    2013-01-01

    THOMAS P. RYAN, PhD, teaches online advanced statistics courses for Northwestern University and The Institute for Statistics Education in sample size determination, design of experiments, engineering statistics, and regression analysis.

  10. Statistical sampling strategies for survey of soil contamination

    NARCIS (Netherlands)

    Brus, D.J.

    2011-01-01

    This chapter reviews methods for selecting sampling locations in contaminated soils for three situations. In the first situation a global estimate of the soil contamination in an area is required. The result of the survey is a number or a series of numbers per contaminant, e.g. the estimated mean

  11. Basics of modern mathematical statistics

    CERN Document Server

    Spokoiny, Vladimir

    2015-01-01

    This textbook provides a unified and self-contained presentation of the main approaches to and ideas of mathematical statistics. It collects the basic mathematical ideas and tools needed as a basis for more serious studies or even independent research in statistics. The majority of existing textbooks in mathematical statistics follow the classical asymptotic framework. Yet, as modern statistics has changed rapidly in recent years, new methods and approaches have appeared. The emphasis is on finite sample behavior, large parameter dimensions, and model misspecifications. The present book provides a fully self-contained introduction to the world of modern mathematical statistics, collecting the basic knowledge, concepts and findings needed for doing further research in the modern theoretical and applied statistics. This textbook is primarily intended for graduate and postdoc students and young researchers who are interested in modern statistical methods.

  12. Statistical intervals a guide for practitioners

    CERN Document Server

    Hahn, Gerald J

    2011-01-01

    Presents a detailed exposition of statistical intervals and emphasizes applications in industry. The discussion differentiates at an elementary level among different kinds of statistical intervals and gives instruction with numerous examples and simple math on how to construct such intervals from sample data. This includes confidence intervals to contain a population percentile, confidence intervals on the probability of meeting a specified threshold value, and prediction intervals to include an observation in a future sample. Also has an appendix containing computer subroutines for nonparametric stati

  13. Probability an introduction with statistical applications

    CERN Document Server

    Kinney, John J

    2014-01-01

    Praise for the First Edition""This is a well-written and impressively presented introduction to probability and statistics. The text throughout is highly readable, and the author makes liberal use of graphs and diagrams to clarify the theory.""  - The StatisticianThoroughly updated, Probability: An Introduction with Statistical Applications, Second Edition features a comprehensive exploration of statistical data analysis as an application of probability. The new edition provides an introduction to statistics with accessible coverage of reliability, acceptance sampling, confidence intervals, h

  14. Statistical analysis of archeomagnetic samples of Teotihuacan, Mexico

    Science.gov (United States)

    Soler-Arechalde, A. M.

    2012-12-01

Teotihuacan was one of the most important metropolises of Mesoamerica during the Classic Period (1 to 600 AD). The city grew continuously in stages that usually concluded with a ritual. Fire was an important element: natives would burn entire structures. An example of this is the Quetzalcoatl pyramid in La Ciudadela (350 AD), which was burned and had a new structure built over it; another is the Big Fire at 570 AD, which marks the city's end. These events are suitable for archaeomagnetic dating. The inclusion of ash in the stucco enhances the detrital-type magnetic signal, which also allows dating. This increases the number of samples to be processed as well as the number of dates. The samples have been analyzed according to their type (floor, wall, talud and painting) and whether or not they were exposed to fire. Sequences of directions obtained in excavations under strict stratigraphic control will be shown. A sequence of images was used to analyze the improvement of the Teotihuacan secular variation curve through more than a decade of continuous work in the area.

  15. Nonparametric statistics for social and behavioral sciences

    CERN Document Server

    Kraska-MIller, M

    2013-01-01

Introduction to Research in Social and Behavioral Sciences; Basic Principles of Research; Planning for Research; Types of Research Designs; Sampling Procedures; Validity and Reliability of Measurement Instruments; Steps of the Research Process; Introduction to Nonparametric Statistics; Data Analysis; Overview of Nonparametric Statistics and Parametric Statistics; Overview of Parametric Statistics; Overview of Nonparametric Statistics; Importance of Nonparametric Methods; Measurement Instruments; Analysis of Data to Determine Association and Agreement; Pearson Chi-Square Test of Association and Independence; Contingency

  16. Statistical prediction of Late Miocene climate

    Digital Repository Service at National Institute of Oceanography (India)

    Fernandes, A.A; Gupta, S.M.

by making certain simplifying assumptions; for example, in modelling ocean currents the geostrophic approximation is made. In the case of statistical prediction no such a priori assumption need be made. Statistical prediction comprises using observed data... the number of equations. In this case the equations are overdetermined, and therefore one has to look for a solution that best fits the sample data in a least squares sense. To this end we express the sample data as follows: ...

  17. The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.

    Science.gov (United States)

    Shen, Dan; Shen, Haipeng; Zhu, Hongtu; Marron, J S

    2016-10-01

    The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.

  18. Understanding the Sampling Distribution and the Central Limit Theorem.

    Science.gov (United States)

    Lewis, Charla P.

    The sampling distribution is a common source of misuse and misunderstanding in the study of statistics. The sampling distribution, underlying distribution, and the Central Limit Theorem are all interconnected in defining and explaining the proper use of the sampling distribution of various statistics. The sampling distribution of a statistic is…

  19. Research design and statistical analysis

    CERN Document Server

    Myers, Jerome L; Lorch Jr, Robert F

    2013-01-01

    Research Design and Statistical Analysis provides comprehensive coverage of the design principles and statistical concepts necessary to make sense of real data.  The book's goal is to provide a strong conceptual foundation to enable readers to generalize concepts to new research situations.  Emphasis is placed on the underlying logic and assumptions of the analysis and what it tells the researcher, the limitations of the analysis, and the consequences of violating assumptions.  Sampling, design efficiency, and statistical models are emphasized throughout. As per APA recommendations

  20. USING STATISTICAL SURVEY IN ECONOMICS

    Directory of Open Access Journals (Sweden)

    Delia TESELIOS

    2012-01-01

Full Text Available Statistical survey is an effective method of statistical investigation that involves gathering quantitative data, and it is often preferred in statistical reports because of the information that can be obtained about an entire population by observing only a part of it. Because of the information they provide, polls are used in many research areas. In economics, statistics are used in the decision-making process, in choosing competitive strategies, in the analysis of certain economic phenomena, and in the formulation of forecasts. The economic study presented in this paper illustrates how simple random sampling is used to analyze the existing parking-space situation in a given locality.

  1. Use of Statistical Heuristics in Everyday Inductive Reasoning.

    Science.gov (United States)

    Nisbett, Richard E.; And Others

    1983-01-01

    In everyday reasoning, people use statistical heuristics (judgmental tools that are rough intuitive equivalents of statistical principles). Use of statistical heuristics is more likely when (1) sampling is clear, (2) the role of chance is clear, (3) statistical reasoning is normative for the event, or (4) the subject has had training in…

  2. Pseudo-populations a basic concept in statistical surveys

    CERN Document Server

    Quatember, Andreas

    2015-01-01

    This book emphasizes that artificial or pseudo-populations play an important role in statistical surveys from finite universes in two manners: firstly, the concept of pseudo-populations may substantially improve users’ understanding of various aspects in the sampling theory and survey methodology; an example of this scenario is the Horvitz-Thompson estimator. Secondly, statistical procedures exist in which pseudo-populations actually have to be generated. An example of such a scenario can be found in simulation studies in the field of survey sampling, where close-to-reality pseudo-populations are generated from known sample and population data to form the basis for the simulation process. The chapters focus on estimation methods, sampling techniques, nonresponse, questioning designs and statistical disclosure control.This book is a valuable reference in understanding the importance of the pseudo-population concept and applying it in teaching and research.

  3. Final Sampling and Analysis Plan for Background Sampling, Fort Sheridan, Illinois

    National Research Council Canada - National Science Library

    1995-01-01

    .... This Background Sampling and Analysis Plan (BSAP) is designed to address this issue through the collection of additional background samples at Fort Sheridan to support the statistical analysis and the Baseline Risk Assessment (BRA...

  4. Writing to Learn Statistics in an Advanced Placement Statistics Course

    Science.gov (United States)

    Northrup, Christian Glenn

    2012-01-01

    This study investigated the use of writing in a statistics classroom to learn if writing provided a rich description of problem-solving processes of students as they solved problems. Through analysis of 329 written samples provided by students, it was determined that writing provided a rich description of problem-solving processes and enabled…

  5. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    Science.gov (United States)

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power for identifying these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  6. Evaluation of statistical methods for quantifying fractal scaling in water-quality time series with irregular sampling

    Directory of Open Access Journals (Sweden)

    Q. Zhang

    2018-02-01

Full Text Available River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling – in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) – are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β = 0) to Brown noise (β = 2), and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb–Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among
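
    As a rough, hedged illustration of spectral-slope (β) estimation for an irregularly sampled series (this is not the evaluation code used in the study), the sketch below computes a Lomb-Scargle periodogram with SciPy and takes β as the negative slope of log power versus log frequency; the synthetic Brown-noise-like series, the frequency grid and the fitting range are all arbitrary assumptions.

        import numpy as np
        from scipy.signal import lombscargle

        rng = np.random.default_rng(0)

        # Irregularly sampled record: random sampling times, red-noise-like signal
        t = np.sort(rng.uniform(0, 1000, size=400))
        y = np.cumsum(rng.normal(size=t.size))          # crude Brown-noise stand-in
        y = y - y.mean()                                # Lomb-Scargle assumes zero mean

        # Evaluate the periodogram on a log-spaced frequency grid (assumed range)
        freqs = np.logspace(-3, -1, 200)                # cycles per time unit
        power = lombscargle(t, y, 2 * np.pi * freqs)    # lombscargle takes angular freqs

        # Spectral slope: power ~ f^(-beta), so beta is minus the log-log slope
        slope, intercept = np.polyfit(np.log10(freqs), np.log10(power), 1)
        print(f"estimated spectral slope beta = {-slope:.2f}")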

  7. Notes on the MUF-D statistic

    International Nuclear Information System (INIS)

    Picard, R.R.

    1987-01-01

    Verification of an inventory or of a reported material unaccounted for (MUF) calls for the remeasurement of a sample of items by an inspector followed by comparison of the inspector's data to the facility's reported values. Such comparison is intended to protect against falsification of accounting data that could conceal material loss. In the international arena, the observed discrepancies between the inspector's data and the reported data are quantified using the D statistic. If data have been falsified by the facility, the standard deviations of the D and MUF-D statistics are inflated owing to the sampling distribution. Moreover, under certain conditions the distributions of those statistics can depart markedly from normality, complicating evaluation of an inspection plan's performance. Detection probabilities estimated using standard deviations appropriate for the no-falsification case in conjunction with assumed normality can be far too optimistic. Under very general conditions regarding the facility's and/or the inspector's measurement error procedures and the inspector's sampling regime, the variance of the MUF-D statistic can be broken into three components. The inspection's sensitivity against various falsification scenarios can be traced to one or more of these components. Obvious implications exist for the planning of effective inspections, particularly in the area of resource optimization

  8. The large sample size fallacy.

    Science.gov (United States)

    Lantz, Björn

    2013-06-01

    Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.
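
    A hedged toy demonstration of the fallacy described above: with an essentially trivial true difference and very large groups, a two-sample t test returns an extreme p value while the standardized effect size stays negligible (the group sizes and effect are invented for illustration).

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(42)
        n = 500_000                                          # very large groups (illustrative)
        group_a = rng.normal(loc=0.00, scale=1.0, size=n)
        group_b = rng.normal(loc=0.02, scale=1.0, size=n)    # tiny true difference

        t_stat, p_value = stats.ttest_ind(group_a, group_b)

        # Observed Cohen's d: the practical size of the effect
        pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
        cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

        print(f"p = {p_value:.2e}, Cohen's d = {cohens_d:.3f}")
        # Typically p is far below 0.05 while d stays around 0.02: statistically
        # significant but practically negligible.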

  9. Statistical Estimators Using Jointly Administrative and Survey Data to Produce French Structural Business Statistics

    Directory of Open Access Journals (Sweden)

    Brion Philippe

    2015-12-01

    Full Text Available Using as much administrative data as possible is a general trend among most national statistical institutes. Different kinds of administrative sources, from tax authorities or other administrative bodies, are very helpful material in the production of business statistics. However, these sources often have to be completed by information collected through statistical surveys. This article describes the way Insee has implemented such a strategy in order to produce French structural business statistics. The originality of the French procedure is that administrative and survey variables are used jointly for the same enterprises, unlike the majority of multisource systems, in which the two kinds of sources generally complement each other for different categories of units. The idea is to use, as much as possible, the richness of the administrative sources combined with the timeliness of a survey, even if the latter is conducted only on a sample of enterprises. One main issue is the classification of enterprises within the NACE nomenclature, which is a cornerstone variable in producing the breakdown of the results by industry. At a given date, two values of the corresponding code may coexist: the value of the register, not necessarily up to date, and the value resulting from the data collected via the survey, but only from a sample of enterprises. Using all this information together requires the implementation of specific statistical estimators combining some properties of the difference estimators with calibration techniques. This article presents these estimators, as well as their statistical properties, and compares them with those of other methods.
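
    The estimators themselves are not reproduced in the abstract; as a loosely related textbook sketch (an assumption, not Insee's method), the code below shows a plain difference estimator in which an administrative variable known for every unit corrects a mean computed from a survey sample. All values are simulated.

        import numpy as np

        rng = np.random.default_rng(1)

        # Administrative value x_i known for every enterprise; survey value y_i
        # observed only on a sample (values are simulated for illustration).
        N = 10_000
        x = rng.gamma(shape=2.0, scale=50.0, size=N)          # admin variable, e.g. turnover
        y = x * rng.normal(1.05, 0.10, size=N)                # survey variable

        sample = rng.choice(N, size=500, replace=False)

        # Difference estimator: population mean of x plus the sample mean of (y - x)
        y_bar_diff = x.mean() + (y[sample] - x[sample]).mean()

        print(f"plain sample mean    : {y[sample].mean():10.2f}")
        print(f"difference estimator : {y_bar_diff:10.2f}")
        print(f"true population mean : {y.mean():10.2f}")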

  10. Use Of R in Statistics Lithuania

    Directory of Open Access Journals (Sweden)

    Tomas Rudys

    2016-06-01

Full Text Available Recently R has become more and more popular among official statistics offices. It can be used not only for research purposes, but also for the production of official statistics. Statistics Lithuania recently started an analysis of where R can be used and whether it could replace some other statistical programming languages or systems. For this reason a work group was arranged. In the paper we present an overview of the current situation regarding the implementation of R in Statistics Lithuania, some problems we are dealing with, and some future plans. At present R is used mainly for research purposes. Looking forward, a short course on basic R was prepared, and at the moment we are starting to use R for data analysis, data manipulation from Oracle databases, preparation of some reports, data editing and survey estimation. On the other hand, we found some problems working with big data sets, and also with survey sampling, as there are surveys with complex sampling designs. We are also analysing the running of R on our servers in order to be able to use more random access memory (RAM). Despite the problems, we are trying to use R in more fields in the production of official statistics.

  11. DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT

    Science.gov (United States)

    Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...

  12. Identification of mine waters by statistical multivariate methods

    Energy Technology Data Exchange (ETDEWEB)

    Mali, N [IGGG, Ljubljana (Slovenia)

    1992-01-01

Three water-bearing aquifers are present in the Velenje lignite mine. The aquifer waters have differing chemical composition; a geochemical water analysis can therefore determine the source of mine water influx. Mine water samples from different locations in the mine were analyzed, and the results for chemical content and electrical conductivity of the mine water were statistically processed by means of the MICROGAS, SPSS-X and IN STATPAC computer programs, which apply three multivariate statistical methods (discriminant, cluster and factor analysis). The reliability of the calculated values was determined with the Kolmogorov-Smirnov test. It is concluded that laboratory analysis of single water samples can produce measurement errors, but statistical processing of water sample data can identify the origin and movement of mine water. 15 refs.
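
    As a generic illustration of the three multivariate methods named above (not the MICROGAS/SPSS-X workflow used in the study), the sketch below applies discriminant, cluster and factor analysis to a simulated table of water-chemistry measurements; the variables and values are invented.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.cluster import KMeans
        from sklearn.decomposition import FactorAnalysis

        rng = np.random.default_rng(8)

        # Simulated chemistry (Ca, Mg, HCO3, conductivity) for three known aquifers
        centers = np.array([[80, 30, 250, 600], [40, 10, 120, 300], [120, 60, 400, 900]])
        labels = np.repeat([0, 1, 2], 30)
        X = centers[labels] + rng.normal(0, [8, 3, 25, 60], size=(90, 4))

        # Discriminant analysis: classify a new water sample to its source aquifer
        lda = LinearDiscriminantAnalysis().fit(X, labels)
        print("predicted source aquifer:", lda.predict([[75, 28, 240, 580]])[0])

        # Cluster analysis: recover groups without using the aquifer labels
        clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
        print("cluster sizes:", np.bincount(clusters))

        # Factor analysis: summarize the correlated measurements with two latent factors
        fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
        print("factor loadings:\n", fa.components_.round(2))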

  13. Statistical searches for microlensing events in large, non-uniformly sampled time-domain surveys: A test using palomar transient factory data

    Energy Technology Data Exchange (ETDEWEB)

    Price-Whelan, Adrian M.; Agüeros, Marcel A. [Department of Astronomy, Columbia University, 550 W 120th Street, New York, NY 10027 (United States); Fournier, Amanda P. [Department of Physics, Broida Hall, University of California, Santa Barbara, CA 93106 (United States); Street, Rachel [Las Cumbres Observatory Global Telescope Network, Inc., 6740 Cortona Drive, Suite 102, Santa Barbara, CA 93117 (United States); Ofek, Eran O. [Benoziyo Center for Astrophysics, Weizmann Institute of Science, 76100 Rehovot (Israel); Covey, Kevin R. [Lowell Observatory, 1400 West Mars Hill Road, Flagstaff, AZ 86001 (United States); Levitan, David; Sesar, Branimir [Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA 91125 (United States); Laher, Russ R.; Surace, Jason, E-mail: adrn@astro.columbia.edu [Spitzer Science Center, California Institute of Technology, Mail Stop 314-6, Pasadena, CA 91125 (United States)

    2014-01-20

Many photometric time-domain surveys are driven by specific goals, such as searches for supernovae or transiting exoplanets, which set the cadence with which fields are re-imaged. In the case of the Palomar Transient Factory (PTF), several sub-surveys are conducted in parallel, leading to non-uniform sampling over its ∼20,000 deg² footprint. While the median 7.26 deg² PTF field has been imaged ∼40 times in the R band, ∼2300 deg² have been observed >100 times. We use PTF data to study the trade-off between searching for microlensing events in a survey whose footprint is much larger than that of typical microlensing searches, but with far-from-optimal time sampling. To examine the probability that microlensing events can be recovered in these data, we test statistics used on uniformly sampled data to identify variables and transients. We find that the von Neumann ratio performs best for identifying simulated microlensing events in our data. We develop a selection method using this statistic and apply it to data from fields with >10 R-band observations, 1.1 × 10⁹ light curves, uncovering three candidate microlensing events. We lack simultaneous, multi-color photometry to confirm these as microlensing events. However, their number is consistent with predictions for the event rate in the PTF footprint over the survey's three years of operations, as estimated from near-field microlensing models. This work can help constrain all-sky event rate predictions and tests microlensing signal recovery in large data sets, which will be useful to future time-domain surveys, such as that planned with the Large Synoptic Survey Telescope.
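
    For readers unfamiliar with the statistic singled out above, the sketch below computes the von Neumann ratio (mean squared successive difference divided by the sample variance) for a flat light curve and for one containing a smooth bump; values well below 2 flag correlated, non-random variation. The simulated light curves are assumptions, not PTF data.

        import numpy as np

        def von_neumann_ratio(mag):
            """Mean squared successive difference divided by the sample variance."""
            d2 = np.mean(np.diff(mag) ** 2)
            return d2 / np.var(mag, ddof=1)

        rng = np.random.default_rng(3)
        n = 200
        noise = rng.normal(0, 0.05, n)

        flat = 18.0 + noise                                   # constant source
        bump = 18.0 - np.exp(-0.5 * ((np.arange(n) - 100) / 15.0) ** 2) + noise

        print(f"flat light curve : eta = {von_neumann_ratio(flat):.2f}")   # ~2 for pure noise
        print(f"with smooth bump : eta = {von_neumann_ratio(bump):.2f}")   # well below 2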

  14. Statistical Model of Extreme Shear

    DEFF Research Database (Denmark)

    Larsen, Gunner Chr.; Hansen, Kurt Schaldemose

    2004-01-01

In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements... are consistent, given the inevitable uncertainties associated with the model as well as with the extreme value data analysis. Keywords: statistical model, extreme wind conditions, statistical analysis, turbulence, wind loading, wind shear, wind turbines.

  15. A simulative comparison of respondent driven sampling with incentivized snowball sampling--the "strudel effect".

    Science.gov (United States)

    Gyarmathy, V Anna; Johnston, Lisa G; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A

    2014-02-01

    Respondent driven sampling (RDS) and incentivized snowball sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania ("original sample") to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1-12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called "strudel effect" is discussed in the paper. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  16. The matchmaking paradox: a statistical explanation

    International Nuclear Information System (INIS)

    Eliazar, Iddo I; Sokolov, Igor M

    2010-01-01

Medical surveys regarding the number of heterosexual partners per person yield different female and male averages, a result which, from a physical standpoint, is impossible. In this paper we term this puzzle the 'matchmaking paradox', and establish a statistical model explaining it. We consider a bipartite graph with N male and N female nodes (N >> 1), and B bonds connecting them (B >> 1). Each node is assigned a random 'attractiveness level', and the bonds connect to the nodes randomly, with probabilities proportional to the nodes' attractiveness levels. The population's average bonds-per-node B/N is estimated via a sample average calculated from a survey of size n (n >> 1). A comprehensive statistical analysis of this model is carried out, asserting that (i) the sample average estimates the population average well if and only if the attractiveness levels possess a finite mean; (ii) if the attractiveness levels are governed by a 'fat-tailed' probability law then the sample average displays wild fluctuations and strong skew, thus providing a statistical explanation of the matchmaking paradox.
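
    A small simulation in the spirit of the model described above (parameter values are invented, and this is not the authors' analysis): bonds attach to nodes with probability proportional to attractiveness, and the survey average of bonds per node is compared with the population average B/N under light-tailed and fat-tailed attractiveness laws.

        import numpy as np

        rng = np.random.default_rng(7)
        N, B, n = 100_000, 300_000, 1_000          # nodes per side, bonds, survey size

        def survey_average(attractiveness):
            # Each bond attaches to a node with probability proportional
            # to that node's attractiveness level.
            p = attractiveness / attractiveness.sum()
            partners = np.bincount(rng.choice(N, size=B, p=p), minlength=N)
            sample = rng.choice(N, size=n, replace=False)
            return partners[sample].mean()          # survey estimate of B / N

        finite_mean = rng.exponential(1.0, N)       # light-tailed attractiveness
        fat_tailed = rng.pareto(0.8, N) + 1.0       # infinite-mean attractiveness

        print("population average B/N =", B / N)
        print("survey, light tails    =", survey_average(finite_mean))
        print("survey, fat tails      =", survey_average(fat_tailed))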

  17. Bias expansion of spatial statistics and approximation of differenced ...

    Indian Academy of Sciences (India)

    Investigations of spatial statistics, computed from lattice data in the plane, can lead to a special lattice point counting problem. The statistical goal is to expand the asymptotic expectation or large-sample bias of certain spatial covariance estimators, where this bias typically depends on the shape of a spatial sampling region.

  18. Statistical analysis of biomechanical properties of the adult skull and age-related structural changes by sex in a Japanese forensic sample.

    Science.gov (United States)

    Torimitsu, Suguru; Nishida, Yoshifumi; Takano, Tachio; Koizumi, Yoshinori; Makino, Yohsuke; Yajima, Daisuke; Hayakawa, Mutsumi; Inokuchi, Go; Motomura, Ayumi; Chiba, Fumiko; Otsuka, Katsura; Kobayashi, Kazuhiro; Odo, Yuriko; Iwase, Hirotaro

    2014-01-01

    The purpose of this research was to investigate the biomechanical properties of the adult human skull and the structural changes that occur with age in both sexes. The heads of 94 Japanese cadavers (54 male cadavers, 40 female cadavers) autopsied in our department were used in this research. A total of 376 cranial samples, four from each skull, were collected. Sample fracture load was measured by a bending test. A statistically significant negative correlation between the sample fracture load and cadaver age was found. This indicates that the stiffness of cranial bones in Japanese individuals decreases with age, and the risk of skull fracture thus probably increases with age. Prior to the bending test, the sample mass, the sample thickness, the ratio of the sample thickness to cadaver stature (ST/CS), and the sample density were measured and calculated. Significant negative correlations between cadaver age and sample thickness, ST/CS, and the sample density were observed only among the female samples. Computerized tomographic (CT) images of 358 cranial samples were available. The computed tomography value (CT value) of cancellous bone which refers to a quantitative scale for describing radiodensity, cancellous bone thickness and cortical bone thickness were measured and calculated. Significant negative correlation between cadaver age and the CT value or cortical bone thickness was observed only among the female samples. These findings suggest that the skull is substantially affected by decreased bone metabolism resulting from osteoporosis. Therefore, osteoporosis prevention and treatment may increase cranial stiffness and reinforce the skull structure, leading to a decrease in the risk of skull fractures. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  19. Mathematical Anxiety among Business Statistics Students.

    Science.gov (United States)

    High, Robert V.

    A survey instrument was developed to identify sources of mathematics anxiety among undergraduate business students in a statistics class. A number of statistics classes were selected at two colleges in Long Island, New York. A final sample of n=102 respondents indicated that there was a relationship between the mathematics grade in prior…

  20. Triacylglycerol Analysis in Human Milk and Other Mammalian Species: Small-Scale Sample Preparation, Characterization, and Statistical Classification Using HPLC-ELSD Profiles.

    Science.gov (United States)

    Ten-Doménech, Isabel; Beltrán-Iturat, Eduardo; Herrero-Martínez, José Manuel; Sancho-Llopis, Juan Vicente; Simó-Alfonso, Ernesto Francisco

    2015-06-24

    In this work, a method for the separation of triacylglycerols (TAGs) present in human milk and from other mammalian species by reversed-phase high-performance liquid chromatography using a core-shell particle packed column with UV and evaporative light-scattering detectors is described. Under optimal conditions, a mobile phase containing acetonitrile/n-pentanol at 10 °C gave an excellent resolution among more than 50 TAG peaks. A small-scale method for fat extraction in these milks (particularly of interest for human milk samples) using minimal amounts of sample and reagents was also developed. The proposed extraction protocol and the traditional method were compared, giving similar results, with respect to the total fat and relative TAG contents. Finally, a statistical study based on linear discriminant analysis on the TAG composition of different types of milks (human, cow, sheep, and goat) was carried out to differentiate the samples according to their mammalian origin.

  1. Automated statistical modeling of analytical measurement systems

    International Nuclear Information System (INIS)

    Jacobson, J.J.

    1992-01-01

The statistical modeling of analytical measurement systems at the Idaho Chemical Processing Plant (ICPP) has been completely automated through computer software. The statistical modeling of analytical measurement systems is one part of a complete quality control program used by the Remote Analytical Laboratory (RAL) at the ICPP. The quality control program is an integration of automated data input, measurement system calibration, database management, and statistical process control. The quality control program and statistical modeling program meet the guidelines set forth by the American Society for Testing and Materials and the American National Standards Institute. A statistical model is a set of mathematical equations describing any systematic bias inherent in a measurement system and the precision of a measurement system. A statistical model is developed from data generated from the analysis of control standards. Control standards are samples which are made up at precise known levels by an independent laboratory and submitted to the RAL. The RAL analysts who process control standards do not know the values of those control standards. The object behind statistical modeling is to describe real process samples in terms of their bias and precision and to verify that a measurement system is operating satisfactorily
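
    As a generic sketch of the kind of statistical model described (the RAL's actual model and software are not shown here), the code below fits a systematic-bias line and a precision estimate from simulated control-standard results.

        import numpy as np

        rng = np.random.default_rng(5)

        # Control standards: known concentrations submitted blind to the laboratory,
        # with measured values returned by the measurement system (simulated here).
        known = np.tile(np.array([1.0, 5.0, 10.0, 20.0, 50.0]), 8)
        measured = 0.15 + 1.02 * known + rng.normal(0, 0.3, size=known.size)

        # Systematic bias: straight-line relationship between measured and known values
        slope, intercept = np.polyfit(known, measured, 1)

        # Precision: scatter of the residuals about the fitted bias model
        residuals = measured - (intercept + slope * known)
        precision = residuals.std(ddof=2)

        print(f"bias model: measured = {intercept:.3f} + {slope:.3f} * known")
        print(f"precision (residual std): {precision:.3f}")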

  2. Estimating statistical uncertainty of Monte Carlo efficiency-gain in the context of a correlated sampling Monte Carlo code for brachytherapy treatment planning with non-normal dose distribution.

    Science.gov (United States)

    Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr

    2012-01-01

Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task, especially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap-based algorithm was used to simulate the probability distribution of the efficiency gain estimates, and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however, its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.
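
    A hedged sketch of the bootstrap idea mentioned above (not the authors' algorithm or geometry): resample per-history scores from a "conventional" and a "correlated-sampling" run, recompute the efficiency gain as a variance-times-time ratio, and report the shortest interval containing 95% of the bootstrap replicates. All scores and run times are simulated assumptions.

        import numpy as np

        rng = np.random.default_rng(11)

        def shortest_interval(samples, level=0.95):
            """Shortest interval containing `level` of the sampled values."""
            s = np.sort(samples)
            k = int(np.ceil(level * len(s)))
            widths = s[k - 1:] - s[:len(s) - k + 1]
            i = np.argmin(widths)
            return s[i], s[i + k - 1]

        # Per-history scores from the two runs (a mildly heavy-tailed term mimics
        # the large statistical weights discussed in the abstract).
        conv = rng.normal(1.0, 0.30, 10_000)
        corr = rng.normal(1.0, 0.10, 10_000) + rng.pareto(3.0, 10_000) * 0.01
        t_conv, t_corr = 120.0, 45.0               # assumed CPU seconds per run

        def gain(c, r):
            return (c.var(ddof=1) * t_conv) / (r.var(ddof=1) * t_corr)

        boot = np.array([
            gain(rng.choice(conv, conv.size), rng.choice(corr, corr.size))
            for _ in range(1000)
        ])
        print("efficiency gain:", gain(conv, corr))
        print("shortest 95% interval:", shortest_interval(boot))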

  3. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.

    Science.gov (United States)

    Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.
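
    A minimal, hypothetical version of this kind of consistency check (the authors used a dedicated automated procedure, which this is not): recompute the two-tailed p value from a reported t statistic and degrees of freedom, and flag reported p values that disagree, including "gross" inconsistencies that would change the significance decision.

        from scipy import stats

        def check_reported_t(t_reported, df, p_reported, alpha=0.05, tol=0.005):
            """Flag a reported two-tailed p value that conflicts with t and df."""
            p_recomputed = 2 * stats.t.sf(abs(t_reported), df)
            inconsistent = abs(p_recomputed - p_reported) > tol
            # "Gross" inconsistency: the decision about significance would change
            decision_error = inconsistent and (
                (p_recomputed < alpha) != (p_reported < alpha)
            )
            return p_recomputed, inconsistent, decision_error

        # "t(28) = 2.20, p = .04" -- recomputed p is about .036, consistent
        print(check_reported_t(2.20, 28, 0.04))
        # "t(28) = 1.50, p = .03" -- recomputed p is about .145, gross inconsistency
        print(check_reported_t(1.50, 28, 0.03))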

  4. Statistical evaluations of current sampling procedures and incomplete core recovery

    International Nuclear Information System (INIS)

    Heasler, P.G.; Jensen, L.

    1994-03-01

This document develops two formulas that describe the effects of incomplete recovery on core sampling results for the Hanford waste tanks. The formulas evaluate incomplete core recovery from a worst-case (i.e., biased) and best-case (i.e., unbiased) perspective. A core sampler is unbiased if the sample material recovered is a random sample of the material in the tank, while any sampler that preferentially recovers a particular type of waste over others is a biased sampler. There is strong evidence to indicate that the push-mode sampler presently used at the Hanford site is a biased one. The formulas presented here show the effects of incomplete core recovery on the accuracy of composition measurements, as functions of the vertical variability in the waste. These equations are evaluated using vertical variability estimates from previously sampled tanks (B110, U110, C109). Assuming that the values of vertical variability used in this study adequately describe the Hanford tank farm, one can use the formulas to compute the effect of incomplete recovery on the accuracy of an average constituent estimate. To determine acceptable recovery limits, we have assumed that the relative error of such an estimate should be no more than 20%

  5. Statistical distributions applications and parameter estimates

    CERN Document Server

    Thomopoulos, Nick T

    2017-01-01

    This book gives a description of the group of statistical distributions that have ample application to studies in statistics and probability.  Understanding statistical distributions is fundamental for researchers in almost all disciplines.  The informed researcher will select the statistical distribution that best fits the data in the study at hand.  Some of the distributions are well known to the general researcher and are in use in a wide variety of ways.  Other useful distributions are less understood and are not in common use.  The book describes when and how to apply each of the distributions in research studies, with a goal to identify the distribution that best applies to the study.  The distributions are for continuous, discrete, and bivariate random variables.  In most studies, the parameter values are not known a priori, and sample data is needed to estimate parameter values.  In other scenarios, no sample data is available, and the researcher seeks some insight that allows the estimate of ...

  6. Application of nonparametric statistics to material strength/reliability assessment

    International Nuclear Information System (INIS)

    Arai, Taketoshi

    1992-01-01

    An advanced material technology requires data base on a wide variety of material behavior which need to be established experimentally. It may often happen that experiments are practically limited in terms of reproducibility or a range of test parameters. Statistical methods can be applied to understanding uncertainties in such a quantitative manner as required from the reliability point of view. Statistical assessment involves determinations of a most probable value and the maximum and/or minimum value as one-sided or two-sided confidence limit. A scatter of test data can be approximated by a theoretical distribution only if the goodness of fit satisfies a test criterion. Alternatively, nonparametric statistics (NPS) or distribution-free statistics can be applied. Mathematical procedures by NPS are well established for dealing with most reliability problems. They handle only order statistics of a sample. Mathematical formulas and some applications to engineering assessments are described. They include confidence limits of median, population coverage of sample, required minimum number of a sample, and confidence limits of fracture probability. These applications demonstrate that a nonparametric statistical estimation is useful in logical decision making in the case a large uncertainty exists. (author)
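
    One of the applications listed above, confidence limits for a median, can be sketched directly from order statistics; the snippet below is a generic distribution-free construction based on the binomial distribution, not the author's formulas, and the data are simulated.

        import numpy as np
        from scipy import stats

        def median_ci(x, conf=0.95):
            """Distribution-free confidence interval for the median from order statistics."""
            x = np.sort(np.asarray(x, dtype=float))
            n = x.size
            best = None
            # Move symmetric order statistics inward while the exact binomial
            # coverage of the interval (X_(d+1), X_(n-d)) still reaches conf.
            for d in range(n // 2):
                coverage = stats.binom.cdf(n - d - 1, n, 0.5) - stats.binom.cdf(d, n, 0.5)
                if coverage >= conf:
                    best = (x[d], x[n - d - 1], coverage)
                else:
                    break
            return best

        rng = np.random.default_rng(2)
        data = rng.lognormal(mean=1.0, sigma=0.8, size=40)   # skewed, non-normal data
        print(median_ci(data))   # (lower, upper, exact coverage), no distribution assumed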

  7. The statistical-inference approach to generalized thermodynamics

    International Nuclear Information System (INIS)

    Lavenda, B.H.; Scherer, C.

    1987-01-01

Limit theorems, such as the central-limit theorem and the weak law of large numbers, are applicable to statistical thermodynamics for sufficiently large sample size of independent and identically distributed observations performed on extensive thermodynamic (chance) variables. The estimation of the intensive thermodynamic quantities is a problem in parametric statistical estimation. The normal approximation to the Gibbs distribution is justified by the analysis of large deviations. Statistical thermodynamics is generalized to include the statistical estimation of variance as well as mean values

  8. Environmental restoration and statistics: Issues and needs

    International Nuclear Information System (INIS)

    Gilbert, R.O.

    1991-10-01

Statisticians have a vital role to play in environmental restoration (ER) activities. One facet of that role is to point out where additional work is needed to develop statistical sampling plans and data analyses that meet the needs of ER. This paper is an attempt to show where statistics fits into the ER process. The statistician, as member of the ER planning team, works collaboratively with the team to develop the site characterization sampling design, so that data of the quality and quantity required by the specified data quality objectives (DQOs) are obtained. At the same time, the statistician works with the rest of the planning team to design and implement, when appropriate, the observational approach to streamline the ER process and reduce costs. The statistician will also provide the expertise needed to select or develop appropriate tools for statistical analysis that are suited for problems that are common to waste-site data. These data problems include highly heterogeneous waste forms, large variability in concentrations over space, correlated data, data that do not have a normal (Gaussian) distribution, and measurements below detection limits. Other problems include environmental transport and risk models that yield highly uncertain predictions, and the need to effectively communicate to the public highly technical information, such as sampling plans, site characterization data, statistical analysis results, and risk estimates. Even though some statistical analysis methods are available "off the shelf" for use in ER, these problems require the development of additional statistical tools, as discussed in this paper. 29 refs

  9. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    Science.gov (United States)

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high throughput tool for proteomics based biomarker discovery. Until now, multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification, indexing; and high dimensional peak differential analysis with the concurrent statistical tests based false discovery rate (FDR). "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets to identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. Presented web application supplies large scale MS data online uploading and analysis with a simple user interface. This bioinformatic tool will facilitate the discovery of the potential protein biomarkers using MS.
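
    The concurrent-testing FDR control mentioned above is not spelled out in the abstract; as a generic stand-in, the sketch below applies the Benjamini-Hochberg step-up procedure to a vector of per-peak p values (simulated data; BH is only one of several FDR procedures such a pipeline might use).

        import numpy as np

        def benjamini_hochberg(p_values, q=0.05):
            """Return a boolean mask of p values declared significant at FDR level q."""
            p = np.asarray(p_values)
            m = p.size
            order = np.argsort(p)
            thresholds = q * (np.arange(1, m + 1) / m)
            passed = p[order] <= thresholds
            significant = np.zeros(m, dtype=bool)
            if passed.any():
                k = np.max(np.nonzero(passed)[0])      # largest i with p_(i) <= q*i/m
                significant[order[: k + 1]] = True
            return significant

        rng = np.random.default_rng(4)
        # Simulated per-peak p values: 950 null peaks, 50 truly differential peaks
        p_vals = np.concatenate([rng.uniform(0, 1, 950), rng.beta(0.1, 10, 50)])
        mask = benjamini_hochberg(p_vals, q=0.05)
        print(f"{mask.sum()} peaks declared significant at FDR 0.05")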

  10. Special nuclear material inventory sampling plans

    International Nuclear Information System (INIS)

    Vaccaro, H.; Goldman, A.

    1987-01-01

Since their introduction in 1942, sampling inspection procedures have been common quality assurance practice. The U.S. Department of Energy (DOE) supports such sampling of special nuclear materials inventories. DOE Order 5630.7 states, "Operations Offices may develop and use statistically valid sampling plans appropriate for their site-specific needs." The benefits for nuclear facilities operations include reduced worker exposure and reduced work load. Improved procedures have been developed for obtaining statistically valid sampling plans that maximize these benefits. The double sampling concept is described and the resulting sample sizes for double sample plans are compared with other plans. An algorithm is given for finding optimal double sampling plans that assist in choosing the appropriate detection and false alarm probabilities for various sampling plans
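
    To make the double-sampling idea concrete (the plan parameters below are invented, not DOE values), the sketch computes the probability of acceptance under a classic double sampling plan: accept on a first sample of n1 items if at most c1 defectives are found, reject if r1 or more are found, and otherwise draw n2 more items and accept if the combined count is at most c2.

        from scipy import stats

        def double_plan_accept_prob(p, n1, c1, r1, n2, c2):
            """Acceptance probability of a double sampling plan at defect rate p."""
            d1 = stats.binom(n1, p)
            d2 = stats.binom(n2, p)
            accept = d1.cdf(c1)                          # accepted on the first sample
            for k in range(c1 + 1, r1):                  # undecided after the first sample
                accept += d1.pmf(k) * d2.cdf(c2 - k)     # accept if total defectives <= c2
            return accept

        # Illustrative plan: n1=50, c1=1, r1=4, n2=100, c2=4
        for p in (0.01, 0.02, 0.05, 0.10):
            print(f"defect rate {p:.2f}: P(accept) = "
                  f"{double_plan_accept_prob(p, 50, 1, 4, 100, 4):.3f}")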

  11. Bayesian statistics an introduction

    CERN Document Server

    Lee, Peter M

    2012-01-01

    Bayesian Statistics is the school of thought that combines prior beliefs with the likelihood of a hypothesis to arrive at posterior beliefs. The first edition of Peter Lee’s book appeared in 1989, but the subject has moved ever onwards, with increasing emphasis on Monte Carlo based techniques. This new fourth edition looks at recent techniques such as variational methods, Bayesian importance sampling, approximate Bayesian computation and Reversible Jump Markov Chain Monte Carlo (RJMCMC), providing a concise account of the way in which the Bayesian approach to statistics develops as wel

  12. Statistical evaluation of cleanup: How should it be done?

    International Nuclear Information System (INIS)

    Gilbert, R.O.

    1993-02-01

This paper discusses statistical issues that must be addressed when conducting statistical tests for the purpose of evaluating if a site has been remediated to guideline values or standards. The importance of using the Data Quality Objectives (DQO) process to plan and design the sampling plan is emphasized. Other topics discussed are: (1) accounting for the uncertainty of cleanup standards when conducting statistical tests, (2) determining the number of samples and measurements needed to attain specified DQOs, (3) considering whether the appropriate testing philosophy in a given situation is "guilty until proven innocent" or "innocent until proven guilty" when selecting a statistical test for evaluating the attainment of standards, (4) conducting tests using data sets that contain measurements that have been reported by the laboratory as less than the minimum detectable activity, and (5) selecting statistical tests that are appropriate for risk-based or background-based standards. A recent draft report by Berger that provides guidance on sampling plans and data analyses for final status surveys at US Nuclear Regulatory Commission licensed facilities serves as a focal point for discussion

  13. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    Science.gov (United States)

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  14. Statistics for Engineers

    International Nuclear Information System (INIS)

    Kim, Jin Gyeong; Park, Jin Ho; Park, Hyeon Jin; Lee, Jae Jun; Jun, Whong Seok; Whang, Jin Su

    2009-08-01

    This book explains statistics for engineers using MATLAB, which includes arrangement and summary of data, probability, probability distribution, sampling distribution, assumption, check, variance analysis, regression analysis, categorical data analysis, quality assurance such as conception of control chart, consecutive control chart, breakthrough strategy and analysis using Matlab, reliability analysis like measurement of reliability and analysis with Maltab, and Markov chain.

  15. Two-Sample Statistics for Testing the Equality of Survival Functions Against Improper Semi-parametric Accelerated Failure Time Alternatives: An Application to the Analysis of a Breast Cancer Clinical Trial

    Science.gov (United States)

    BROËT, PHILIPPE; TSODIKOV, ALEXANDER; DE RYCKE, YANN; MOREAU, THIERRY

    2010-01-01

    This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests. PMID:15293627

  16. Two-sample statistics for testing the equality of survival functions against improper semi-parametric accelerated failure time alternatives: an application to the analysis of a breast cancer clinical trial.

    Science.gov (United States)

    Broët, Philippe; Tsodikov, Alexander; De Rycke, Yann; Moreau, Thierry

    2004-06-01

    This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests.

  17. Comparative statistical analysis of carcinogenic and non-carcinogenic effects of uranium in groundwater samples from different regions of Punjab, India.

    Science.gov (United States)

    Saini, Komal; Singh, Parminder; Bajwa, Bikramjit Singh

    2016-12-01

An LED fluorimeter has been used for microanalysis of uranium concentration in groundwater samples collected from six districts of South West (SW), West (W) and North East (NE) Punjab, India. The average uranium content in water samples from SW Punjab is observed to be higher than the WHO and USEPA recommended safe limit of 30 µg/l as well as the AERB proposed limit of 60 µg/l, whereas for the W and NE regions of Punjab the average uranium concentration was within the AERB recommended limit of 60 µg/l. The average value observed in SW Punjab is around 3-4 times that observed in W Punjab, and more than 17 times the average value observed in the NE region of Punjab. Statistical analyses of the carcinogenic as well as non-carcinogenic risks due to uranium have been evaluated for each studied district. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. Statistics for environmental science and management

    National Research Council Canada - National Science Library

    Manly, B.F.J

    2009-01-01

    .... Additional topics covered include environmental monitoring, impact assessment, censored data, environmental sampling, the role of statistics in environmental science, assessing site reclamation...

  19. Autonomous spatially adaptive sampling in experiments based on curvature, statistical error and sample spacing with applications in LDA measurements

    Science.gov (United States)

    Theunissen, Raf; Kadosh, Jesse S.; Allen, Christian B.

    2015-06-01

    Spatially varying signals are typically sampled by collecting uniformly spaced samples irrespective of the signal content. For signals with inhomogeneous information content, this leads to unnecessarily dense sampling in regions of low interest or insufficient sample density at important features, or both. A new adaptive sampling technique is presented directing sample collection in proportion to local information content, capturing adequately the short-period features while sparsely sampling less dynamic regions. The proposed method incorporates a data-adapted sampling strategy on the basis of signal curvature, sample space-filling, variable experimental uncertainty and iterative improvement. Numerical assessment has indicated a reduction in the number of samples required to achieve a predefined uncertainty level overall while improving local accuracy for important features. The potential of the proposed method has been further demonstrated on the basis of Laser Doppler Anemometry experiments examining the wake behind a NACA0012 airfoil and the boundary layer characterisation of a flat plate.
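
    A much-simplified sketch of sampling in proportion to local information content (not the authors' method, which also balances space-filling and experimental uncertainty): estimate local curvature from a coarse pilot signal, build a sampling density from it, and draw new sample locations by inverse-CDF sampling. The signal and tuning constants are invented.

        import numpy as np

        rng = np.random.default_rng(6)

        def curvature_weighted_samples(x, y, n_new, floor=0.05):
            """Place new sample locations with density proportional to |curvature|."""
            curvature = np.abs(np.gradient(np.gradient(y, x), x))
            weight = curvature + floor * curvature.max()      # keep some density everywhere
            cdf = np.cumsum(weight)
            cdf /= cdf[-1]
            return np.interp(rng.uniform(0, 1, n_new), cdf, x)   # inverse-CDF sampling

        # Coarse pilot survey of a signal with one sharp feature
        x_pilot = np.linspace(0.0, 1.0, 60)
        y_pilot = np.tanh(40 * (x_pilot - 0.6))               # step-like feature near x=0.6

        x_new = curvature_weighted_samples(x_pilot, y_pilot, n_new=40)
        print("fraction of new samples near the feature:",
              np.mean(np.abs(x_new - 0.6) < 0.1))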

  20. Statistical Methods for Environmental Pollution Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, Richard O. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    1987-01-01

The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedures and techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.

  1. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    Science.gov (United States)

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  2. BrightStat.com: free statistics online.

    Science.gov (United States)

    Stricker, Daniel

    2008-10-01

    Powerful software for statistical analysis is expensive. Here I present BrightStat, a statistical software application running on the Internet which is free of charge. BrightStat's goals, its main capabilities and functionalities are outlined. Three different sample runs, a Friedman test, a chi-square test, and a step-wise multiple regression are presented. The results obtained by BrightStat are compared with results computed by SPSS, one of the global leaders in providing statistical software, and VassarStats, a collection of scripts for data analysis running on the Internet. Elementary statistics is an inherent part of academic education and BrightStat is an alternative to commercial products.

  3. Statistical inference for discrete-time samples from affine stochastic delay differential equations

    DEFF Research Database (Denmark)

    Küchler, Uwe; Sørensen, Michael

    2013-01-01

    Statistical inference for discrete time observations of an affine stochastic delay differential equation is considered. The main focus is on maximum pseudo-likelihood estimators, which are easy to calculate in practice. A more general class of prediction-based estimating functions is investigated...

  4. 7 CFR 52.38a - Definitions of terms applicable to statistical sampling.

    Science.gov (United States)

    2010-01-01

    ... the number of defects (or defectives), which exceed the sample unit tolerance (“T”), in a series of... accumulation of defects (or defectives) allowed to exceed the sample unit tolerance (“T”) in any sample unit or consecutive group of sample units. (ii) CuSum value. The accumulated number of defects (or defectives) that...

  5. An audit of the statistics and the comparison with the parameter in the population

    Science.gov (United States)

    Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad

    2015-10-01

    The sample size needed to closely estimate the statistics for particular parameters tends to be an issue. Although the sample size may have been calculated with reference to the objective of the study, it is difficult to confirm whether the resulting statistics are close to the parameters of a particular population. Until now, a guideline based on a p-value of less than 0.05 has been widely used as inferential evidence. Therefore, this study audited results obtained from various sub-samples and statistical analyses and compared them with the parameters in three different populations. Eight types of statistical analysis and eight sub-samples for each statistical analysis were analysed. The results showed that the statistics were consistent and close to the parameters when the study sample covered at least 15% to 35% of the population. A larger sample size is needed to estimate parameters involving categorical variables compared with numerical variables. Sample sizes of 300 to 500 are sufficient to estimate the parameters for a medium-sized population.

  6. The N-Pact Factor: Evaluating the Quality of Empirical Journals with Respect to Sample Size and Statistical Power

    Science.gov (United States)

    Fraley, R. Chris; Vazire, Simine

    2014-01-01

    The authors evaluate the quality of research reported in major journals in social-personality psychology by ranking those journals with respect to their N-pact Factors (NF)—the statistical power of the empirical studies they publish to detect typical effect sizes. Power is a particularly important attribute for evaluating research quality because, relative to studies that have low power, studies that have high power are more likely to (a) provide accurate estimates of effects, (b) produce literatures with low false positive rates, and (c) lead to replicable findings. The authors show that the average sample size in social-personality research is 104 and that the power to detect the typical effect size in the field is approximately 50%. Moreover, they show that there is considerable variation among journals in sample sizes and power of the studies they publish, with some journals consistently publishing higher power studies than others. The authors hope that these rankings will be of use to authors who are choosing where to submit their best work, provide hiring and promotion committees with a superior way of quantifying journal quality, and encourage competition among journals to improve their NF rankings. PMID:25296159
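
    To see how a power figure near 50% arises for the reported average sample size of 104, the sketch below computes approximate power for a correlation-type effect via the Fisher z transformation; the assumed typical effect of r ≈ 0.20 is an illustrative value, not taken from the article.

    ```python
    import numpy as np
    from scipy import stats

    def power_correlation(r, n, alpha=0.05):
        """Two-sided power to detect a correlation r with n subjects,
        using the Fisher z approximation."""
        z_effect = np.arctanh(r) * np.sqrt(n - 3)
        z_crit = stats.norm.ppf(1 - alpha / 2)
        return stats.norm.sf(z_crit - z_effect) + stats.norm.cdf(-z_crit - z_effect)

    # An average N of 104 and an assumed typical effect of r = 0.20 give power
    # in the neighbourhood of the ~50% figure reported in the article.
    print(round(power_correlation(0.20, 104), 2))
    ```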

  7. Statistical Processing Algorithms for Human Population Databases

    Directory of Open Access Journals (Sweden)

    Camelia COLESCU

    2012-01-01

    Full Text Available The article describes some algorithms for statistical functions applied to a human population database. The samples are specific to the most interesting periods, when the evolution of the statistical data shows spectacular values. The article also describes the most useful forms of graphical presentation of the results.

  8. Types of non-probabilistic sampling used in marketing research. „Snowball” sampling

    OpenAIRE

    Manuela Rozalia Gabor

    2007-01-01

    A significant way of investigating a firm's market is statistical sampling. Sampling typology provides probabilistic and non-probabilistic models of gathering information, and this paper describes in detail network sampling, known as "snowball" sampling. This type of sampling enables the survey of occurrence forms concerning the decision power within an organisation and of the interpersonal relation network governing a certain collectivity, such as a consumer panel. The snowball s...

  9. Statistical Analysis and validation

    NARCIS (Netherlands)

    Hoefsloot, H.C.J.; Horvatovich, P.; Bischoff, R.

    2013-01-01

    In this chapter guidelines are given for the selection of a few biomarker candidates from a large number of compounds with a relatively low number of samples. The main concepts concerning the statistical validation of the search for biomarkers are discussed. These complicated methods and concepts are

  10. 1861-1981: Statistics teaching in Italian universities

    Directory of Open Access Journals (Sweden)

    Donata Marasini

    2013-05-01

    Full Text Available This paper aims to outline the development of Statistics from 1861 to 1981 with respect to its contents. The paper pays particular attention to some statistical topics which have been covered by basic introductory courses in Italian universities since the beginning of Italian unification. The review takes as its starting point the well-known book “Filosofia della Statistica” by Melchiorre Gioja. This volume was published 35 years before Italian unification but already contains the fundamental topics of exploratory and inductive Statistics. These topics give the opportunity to mention the Italian statisticians who are considered the pioneers of the discipline. In particular, attention is focused on four statisticians: Corrado Gini, well known for his modern insights; Marcello Boldrini, a man of great culture, also in the epistemological field; Bruno de Finetti, founder of the subjectivist school and of Bayesian reasoning; and Giuseppe Pompilj, a precursor in the study of random variables and sampling theory. The paper browses the indexes of three well-known Italian handbooks that, although published after the period 1861-1981, deal with topics covered in the basic teaching of exploratory statistics, statistical inference and sampling theory from finite populations.

  11. The Statistics of wood assays for preservative retention

    Science.gov (United States)

    Patricia K. Lebow; Scott W. Conklin

    2011-01-01

    This paper covers general statistical concepts that apply to interpreting wood assay retention values. In particular, since wood assays are typically obtained from a single composited sample, the statistical aspects, including advantages and disadvantages, of simple compositing are covered.

  12. Statistical Survey of Non-Formal Education

    Directory of Open Access Journals (Sweden)

    Ondřej Nývlt

    2012-12-01

    Full Text Available Formal education is focused on a programme within a regular education system. Labour market flexibility and new requirements on employees create a new domain of education called non-formal education. Is there a reliable statistical source with a good methodological definition for the Czech Republic? The Labour Force Survey (LFS) has been the basic statistical source for time comparisons of non-formal education for the last ten years. Furthermore, a special Adult Education Survey (AES) in 2011 focused on the individual components of non-formal education in a detailed way. In general, the goal of the EU is to use data from both internationally comparable surveys for analyses of particular fields of lifelong learning, in such a way that annual LFS data can be enlarged by detailed information from the AES at five-year intervals. This article describes the reliability of statistical data about non-formal education. This analysis is usually connected with sampling and non-sampling errors.

  13. Studies in Theoretical and Applied Statistics

    CERN Document Server

    Pratesi, Monica; Ruiz-Gazen, Anne

    2018-01-01

    This book includes a wide selection of the papers presented at the 48th Scientific Meeting of the Italian Statistical Society (SIS2016), held in Salerno on 8-10 June 2016. Covering a wide variety of topics ranging from modern data sources and survey design issues to measuring sustainable development, it provides a comprehensive overview of the current Italian scientific research in the fields of open data and big data in public administration and official statistics, survey sampling, ordinal and symbolic data, statistical models and methods for network data, time series forecasting, spatial analysis, environmental statistics, economic and financial data analysis, statistics in the education system, and sustainable development. Intended for researchers interested in theoretical and empirical issues, this volume provides interesting starting points for further research.

  14. A simulative comparison of respondent driven sampling with incentivized snowball sampling – the “strudel effect”

    Science.gov (United States)

    Gyarmathy, V. Anna; Johnston, Lisa G.; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A.

    2014-01-01

    Background Respondent driven sampling (RDS) and Incentivized Snowball Sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). Methods We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania (“original sample”) to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. Results The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1 to 12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. Conclusions When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called “strudel effect” is discussed in the paper. PMID:24360650

  15. Independent assessment of matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) sample preparation quality: A novel statistical approach for quality scoring.

    Science.gov (United States)

    Kooijman, Pieter C; Kok, Sander J; Weusten, Jos J A M; Honing, Maarten

    2016-05-05

    Preparation of samples according to an optimized method is crucial for accurate determination of polymer sample characteristics by Matrix-Assisted Laser Desorption Ionization (MALDI) analysis. Sample preparation conditions such as matrix choice, cationization agent, deposition technique or even the deposition volume should be chosen to suit the sample of interest. Many sample preparation protocols have been developed and employed, yet finding the optimal sample preparation protocol remains a challenge. Because an objective comparison between the results of diverse protocols is not possible, "gut-feeling" or "good enough" is often decisive in the search for an optimum. This implies that sub-optimal protocols are used, leading to a loss of mass spectral information quality. To address this problem a novel analytical strategy based on MALDI imaging and statistical data processing was developed in which eight parameters were formulated to objectively quantify the quality of sample deposition and optimal MALDI matrix composition and finally sum up to an overall quality score of the sample deposition. These parameters can be established in a fully automated way using commercially available mass spectrometry imaging instruments without any hardware adjustments. With the newly developed analytical strategy the highest quality MALDI spots were selected, resulting in more reproducible and more valuable spectra for PEG in a variety of matrices. Moreover, our method enables an objective comparison of sample preparation protocols for any analyte and opens up new fields of investigation by presenting MALDI performance data in a clear and concise way. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. New complete sample of identified radio sources. Part 2. Statistical study

    International Nuclear Information System (INIS)

    Soltan, A.

    1978-01-01

    The complete sample of radio sources with known redshifts selected in Paper I is studied. Source counts in the sample and the luminosity-volume test show that both quasars and galaxies are subject to evolution. Luminosity functions for different ranges of redshift are obtained. Due to many uncertainties, only simplified models of the evolution are tested. An exponential decline with time of the luminosity of all the bright sources is in good agreement both with the luminosity-volume test and with the N(S) relation over the entire range of observed flux densities. It is shown that sources in the sample are randomly distributed on scales greater than about 17 Mpc. (author)

  17. Network and adaptive sampling

    CERN Document Server

    Chaudhuri, Arijit

    2014-01-01

    Combining the two statistical techniques of network sampling and adaptive sampling, this book illustrates the advantages of using them in tandem to effectively capture sparsely located elements in unknown pockets. It shows how network sampling is a reliable guide in capturing inaccessible entities through linked auxiliaries. The text also explores how adaptive sampling is strengthened in information content through subsidiary sampling with devices to mitigate unmanageable expanding sample sizes. Empirical data illustrates the applicability of both methods.

  18. Computerized statistical analysis with bootstrap method in nuclear medicine

    International Nuclear Information System (INIS)

    Zoccarato, O.; Sardina, M.; Zatta, G.; De Agostini, A.; Barbesti, S.; Mana, O.; Tarolo, G.L.

    1988-01-01

    Statistical analysis of data samples involves some hypotheses about the features of the data themselves. The accuracy of these hypotheses can influence the results of statistical inference. Among the new methods of computer-aided statistical analysis, the bootstrap method appears to be one of the most powerful, thanks to its ability to reproduce many artificial samples starting from a single original sample and because it works without hypotheses about the data distribution. The authors applied the bootstrap method to two typical situations in a Nuclear Medicine department. The determination of the normal range of serum ferritin, as assessed by radioimmunoassay and defined by the mean value ±2 standard deviations, starting from an experimental sample of small size, gives an unacceptable lower limit (plasma ferritin levels below zero). By contrast, the results obtained from 5000 bootstrap samples give an interval of values (10.95 ng/ml - 72.87 ng/ml) corresponding to the normal ranges commonly reported. Moreover, the authors applied the bootstrap method to evaluate the possible error associated with the correlation coefficient determined between left ventricular ejection fraction (LVEF) values obtained by first-pass radionuclide angiocardiography with 99mTc and 195mAu. The results obtained indicate a high degree of statistical correlation and give the range of r2 values to be considered acceptable for this type of study.
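
    The following sketch illustrates the percentile-bootstrap idea behind the ferritin example with synthetic data (not the study's measurements): the parametric mean ± 2 SD limit can fall below zero for skewed data, whereas bootstrap percentile limits stay positive.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    # Synthetic small serum-ferritin sample in ng/ml (not the study's data).
    ferritin = rng.lognormal(mean=3.4, sigma=0.6, size=20)

    # Parametric normal range: mean +/- 2 SD can dip below zero for skewed data.
    lo_p = ferritin.mean() - 2 * ferritin.std(ddof=1)
    hi_p = ferritin.mean() + 2 * ferritin.std(ddof=1)

    # Percentile bootstrap of the 2.5th/97.5th percentiles stays positive.
    B = 5000
    boot = rng.choice(ferritin, size=(B, ferritin.size), replace=True)
    lo_b = np.percentile(boot, 2.5, axis=1).mean()
    hi_b = np.percentile(boot, 97.5, axis=1).mean()

    print(f"mean +/- 2 SD : ({lo_p:.1f}, {hi_p:.1f}) ng/ml")
    print(f"bootstrap     : ({lo_b:.1f}, {hi_b:.1f}) ng/ml")
    ```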

  19. Statistical interpretation of geochemical data

    International Nuclear Information System (INIS)

    Carambula, M.

    1990-01-01

    Statistical results have been obtained from a geochemical survey covering the following four aerial photograph sheets: Zapican, Carape, Las Canias and Alferez. A total of 3020 samples were analysed for 22 chemical elements using plasma emission spectrometry methods.

  20. Perceived Statistical Knowledge Level and Self-Reported Statistical Practice Among Academic Psychologists

    Directory of Open Access Journals (Sweden)

    Laura Badenes-Ribera

    2018-06-01

    Full Text Available Introduction: Publications arguing against the null hypothesis significance testing (NHST) procedure and in favor of good statistical practices have increased. The most frequently mentioned alternatives to NHST are effect size statistics (ES), confidence intervals (CIs), and meta-analyses. A recent survey conducted in Spain found that academic psychologists have poor knowledge about effect size statistics, confidence intervals, and graphic displays for meta-analyses, which might lead to a misinterpretation of the results. In addition, it also found that, although the use of ES is becoming generalized, the same thing is not true for CIs. Finally, academics with greater knowledge about ES statistics presented a profile closer to good statistical practice and research design. Our main purpose was to analyze the extension of these results to a different geographical area through a replication study. Methods: For this purpose, we elaborated an on-line survey that included the same items as the original research, and we asked academic psychologists to indicate their level of knowledge about ES, their CIs, and meta-analyses, and how they use them. The sample consisted of 159 Italian academic psychologists (54.09% women, mean age of 47.65 years). The mean number of years in the position of professor was 12.90 (SD = 10.21). Results: As in the original research, the results showed that, although the use of effect size estimates is becoming generalized, an under-reporting of CIs for ES persists. The most frequent ES statistics mentioned were Cohen's d and R2/η2, which can have outliers or show non-normality or violate statistical assumptions. In addition, academics showed poor knowledge about meta-analytic displays (e.g., forest plot and funnel plot) and quality checklists for studies. Finally, academics with higher-level knowledge about ES statistics seem to have a profile closer to good statistical practices. Conclusions: Changing statistical practice is not

  1. Applied statistics in ecology: common pitfalls and simple solutions

    Science.gov (United States)

    E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick

    2013-01-01

    The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...

  2. Systematic sampling for suspended sediment

    Science.gov (United States)

    Robert B. Thomas

    1991-01-01

    Abstract - Because of high costs or complex logistics, scientific populations cannot be measured entirely and must be sampled. Accepted scientific practice holds that sample selection be based on statistical principles to assure objectivity when estimating totals and variances. Probability sampling--obtaining samples with known probabilities--is the only method that...

  3. Nonparametric statistics a step-by-step approach

    CERN Document Server

    Corder, Gregory W

    2014-01-01

    "…a very useful resource for courses in nonparametric statistics in which the emphasis is on applications rather than on theory.  It also deserves a place in libraries of all institutions where introductory statistics courses are taught."" -CHOICE This Second Edition presents a practical and understandable approach that enhances and expands the statistical toolset for readers. This book includes: New coverage of the sign test and the Kolmogorov-Smirnov two-sample test in an effort to offer a logical and natural progression to statistical powerSPSS® (Version 21) software and updated screen ca

  4. ASURV: Astronomical SURVival Statistics

    Science.gov (United States)

    Feigelson, E. D.; Nelson, P. I.; Isobe, T.; LaValley, M.

    2014-06-01

    ASURV (Astronomical SURVival Statistics) provides astronomy survival analysis for right- and left-censored data including the maximum-likelihood Kaplan-Meier estimator and several univariate two-sample tests, bivariate correlation measures, and linear regressions. ASURV is written in FORTRAN 77, and is stand-alone and does not call any specialized libraries.

  5. Estimation of Peaking Factor Uncertainty due to Manufacturing Tolerance using Statistical Sampling Method

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Kyung Hoon; Park, Ho Jin; Lee, Chung Chan; Cho, Jin Young [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2015-10-15

    The purpose of this paper is to study the effect on output parameters of the lattice physics calculation of input uncertainties such as manufacturing deviations from nominal values for material composition and geometric dimensions. In nuclear design and analysis, lattice physics calculations are usually employed to generate lattice parameters for the nodal core simulation and pin power reconstruction. These lattice parameters, which consist of homogenized few-group cross-sections, assembly discontinuity factors, and form-functions, can be affected by input uncertainties which arise from three different sources: 1) multi-group cross-section uncertainties, 2) uncertainties associated with methods and modeling approximations utilized in lattice physics codes, and 3) fuel/assembly manufacturing uncertainties. In this paper, data provided by the light water reactor (LWR) uncertainty analysis in modeling (UAM) benchmark have been used as the manufacturing uncertainties. First, the effect of each input parameter has been investigated through sensitivity calculations at the fuel assembly level. Then, the uncertainty in the prediction of the peaking factor due to the most sensitive input parameter has been estimated using the statistical sampling method, often called the brute force method. For our analysis, the two-dimensional transport lattice code DeCART2D and its ENDF/B-VII.1 based 47-group library were used to perform the lattice physics calculation. Sensitivity calculations have been performed in order to study the influence of manufacturing tolerances on the lattice parameters. The manufacturing tolerance that has the largest influence on the k-inf is the fuel density. The second most sensitive parameter is the outer clad diameter.
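
    The brute-force sampling step can be pictured as below: manufacturing parameters are drawn from assumed tolerance distributions and propagated through a model. The tolerances and the surrogate model here are purely illustrative stand-ins for the DeCART2D calculation.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    N = 500  # number of random samples ("brute force" statistical sampling)

    # Hypothetical relative manufacturing tolerances (1-sigma) -- illustrative only.
    fuel_density  = rng.normal(1.0, 0.005, N)
    clad_diameter = rng.normal(1.0, 0.002, N)
    enrichment    = rng.normal(1.0, 0.003, N)

    def peaking_factor(rho, d_clad, enr):
        """Toy surrogate standing in for the DeCART2D lattice calculation."""
        return 1.45 * rho**0.8 * enr**0.5 / d_clad**0.2

    fq = peaking_factor(fuel_density, clad_diameter, enrichment)
    print(f"peaking factor: mean = {fq.mean():.4f}, "
          f"relative std = {fq.std(ddof=1) / fq.mean():.4%}")
    ```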

  6. Statistical modeling of static strengths of nuclear graphites with relevance to structural design

    International Nuclear Information System (INIS)

    Arai, Taketoshi

    1992-02-01

    Use of graphite materials for structural members poses the problem of how to take into account the statistical properties of static strength, especially tensile fracture stresses, in component structural design. The present study comprises comprehensive examinations of the statistical data base and of modelling for nuclear graphites. First, the report provides individual samples and their analyses of the strengths of IG-110 and PGX graphites for HTTR components. Statistical characteristics of other HTGR graphites are also exemplified from the literature. Most of the statistical distributions of individual samples are found to be approximately normal. The goodness of fit to normal distributions is more satisfactory with larger sample sizes. Molded and extruded graphites, however, possess a variety of statistical properties depending on samples from different within-log locations and/or different orientations. Second, the previous statistical models, including the Weibull theory, are assessed from the viewpoint of applicability to design procedures. This leads to the conclusion that the Weibull theory and its modified versions are satisfactory only for limited parts of tensile fracture behavior; they are not consistent with the whole set of observations. Only normal statistics are justifiable as practical approaches for discussing specified minimum ultimate strengths as statistical confidence limits for individual samples. Third, the assessment of the various statistical models emphasizes the need to develop advanced analytical models that involve modeling of the microstructural features of actual graphite materials. Improvements of other structural design methodologies are also presented. (author)

  7. Spatial scan statistics to assess sampling strategy of antimicrobial resistance monitoring programme

    DEFF Research Database (Denmark)

    Vieira, Antonio; Houe, Hans; Wegener, Henrik Caspar

    2009-01-01

    The collection and analysis of data on antimicrobial resistance in human and animal populations are important for establishing a baseline of the occurrence of resistance and for determining trends over time. In animals, targeted monitoring with a stratified sampling plan is normally used. However...... sampled by the Danish Integrated Antimicrobial Resistance Monitoring and Research Programme (DANMAP), by identifying spatial clusters of samples and detecting areas with significantly high or low sampling rates. These analyses were performed for each year and for the total 5-year study period for all...... by an antimicrobial monitoring program.

  8. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations.

    Directory of Open Access Journals (Sweden)

    Emma Lightfoot

    Full Text Available Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals' homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately, however it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific
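
    A minimal sketch of the two recommended outlier rules (IQR and median absolute deviation from the median), applied to hypothetical δ18O values, is given below.

    ```python
    import numpy as np

    def outliers_iqr(x, k=1.5):
        q1, q3 = np.percentile(x, [25, 75])
        iqr = q3 - q1
        return (x < q1 - k * iqr) | (x > q3 + k * iqr)

    def outliers_mad(x, k=3.0):
        med = np.median(x)
        mad = np.median(np.abs(x - med))
        return np.abs(x - med) > k * mad

    # Hypothetical tooth-enamel d18O values (permil); the last two mimic non-locals.
    d18o = np.array([26.1, 26.4, 25.9, 26.8, 26.2, 25.7, 26.5, 26.0, 24.1, 28.9])
    print("IQR rule flags:", np.where(outliers_iqr(d18o))[0])
    print("MAD rule flags:", np.where(outliers_mad(d18o))[0])
    ```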

  9. Particle System Based Adaptive Sampling on Spherical Parameter Space to Improve the MDL Method for Construction of Statistical Shape Models

    Directory of Open Access Journals (Sweden)

    Rui Xu

    2013-01-01

    Full Text Available Minimum description length (MDL) based group-wise registration was a state-of-the-art method to determine the corresponding points of 3D shapes for the construction of statistical shape models (SSMs). However, it suffered from the problem that determined corresponding points did not uniformly spread on original shapes, since corresponding points were obtained by uniformly sampling the aligned shape on the parameterized space of the unit sphere. We proposed a particle-system based method to obtain adaptive sampling positions on the unit sphere to resolve this problem. Here, a set of particles was placed on the unit sphere to construct a particle system whose energy was related to the distortions of parameterized meshes. By minimizing this energy, each particle was moved on the unit sphere. When the system became steady, particles were treated as vertices to build a spherical mesh, which was then relaxed to slightly adjust vertices to obtain optimal sampling-positions. We used 47 cases of (left and right) lungs and 50 cases of livers, (left and right) kidneys, and spleens for evaluations. Experiments showed that the proposed method was able to resolve the problem of the original MDL method, and the proposed method performed better in the generalization and specificity tests.

  10. Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

    Science.gov (United States)

    Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

    2014-05-01

    Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an "optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.

  11. Statistical methods for evaluating the attainment of cleanup standards

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, R.O.; Simpson, J.C.

    1992-12-01

    This document is the third volume in a series of volumes sponsored by the US Environmental Protection Agency (EPA), Statistical Policy Branch, that provide statistical methods for evaluating the attainment of cleanup standards at Superfund sites. Volume 1 (USEPA 1989a) provides sampling designs and tests for evaluating attainment of risk-based standards for soils and solid media. Volume 2 (USEPA 1992) provides designs and tests for evaluating attainment of risk-based standards for groundwater. The purpose of this third volume is to provide statistical procedures for designing sampling programs and conducting statistical tests to determine whether pollution parameters in remediated soils and solid media at Superfund sites attain site-specific reference-based standards. This document is written for individuals who may not have extensive training or experience with statistical methods. The intended audience includes EPA regional remedial project managers, Superfund-site potentially responsible parties, state environmental protection agencies, and contractors for these groups.

  12. Micro-organism distribution sampling for bioassays

    Science.gov (United States)

    Nelson, B. A.

    1975-01-01

    The purpose of the sampling distribution is to characterize sample-to-sample variation so that statistical tests may be applied, to estimate the error due to sampling (confidence limits), and to evaluate observed differences between samples. The distribution could be used for bioassays taken in hospitals, breweries, food-processing plants, and pharmaceutical plants.

  13. Statistical Rules-of-Thumb.

    Science.gov (United States)

    Brewer, James K.

    1988-01-01

    Six best-selling introductory behavioral statistics textbooks that were published in 1982 and two well-known sampling theory textbooks were reviewed to determine the presence of rules-of-thumb--useful principles with wide application that are not intended to be strictly accurate. The relative frequency and type of rules are reported along with a…

  14. Statistical Model of Extreme Shear

    DEFF Research Database (Denmark)

    Hansen, Kurt Schaldemose; Larsen, Gunner Chr.

    2005-01-01

    In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of full-scale measurements recorded with a high sampling rate...

  15. The rise of survey sampling

    NARCIS (Netherlands)

    Bethlehem, J.

    2009-01-01

    This paper is about the history of survey sampling. It describes how sampling became an accepted scientific method. From the first ideas in 1895 it took some 50 years before the principles of probability sampling were widely accepted. This paper has a focus on developments in official statistics in

  16. Experimental uncertainty estimation and statistics for data having interval uncertainty.

    Energy Technology Data Exchange (ETDEWEB)

    Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)

    2007-05-01

    This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
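
    As a toy illustration of descriptive statistics on interval-valued measurements, the sketch below bounds the sample mean exactly and brackets the variance by sampling point values inside the intervals; it is not the report's algorithm.

    ```python
    import numpy as np

    # Each measurement is an interval [lo, hi] rather than a point value.
    intervals = np.array([[1.9, 2.3],
                          [2.1, 2.2],
                          [1.7, 2.6],
                          [2.0, 2.4]])
    lo, hi = intervals[:, 0], intervals[:, 1]

    # The sample mean of interval data is itself an interval.
    print("mean lies in", (lo.mean(), hi.mean()))

    # Exact variance bounds need optimization in general; sampling point values
    # inside each interval gives a cheap inner approximation of that range.
    rng = np.random.default_rng(0)
    points = rng.uniform(lo, hi, size=(10_000, lo.size))
    v = points.var(axis=1, ddof=1)
    print("variance spans at least", (v.min(), v.max()))
    ```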

  17. Statistical analysis of environmental data

    International Nuclear Information System (INIS)

    Beauchamp, J.J.; Bowman, K.O.; Miller, F.L. Jr.

    1975-10-01

    This report summarizes the analyses of data obtained by the Radiological Hygiene Branch of the Tennessee Valley Authority from samples taken around the Browns Ferry Nuclear Plant located in Northern Alabama. The data collection was begun in 1968 and a wide variety of types of samples have been gathered on a regular basis. The statistical analysis of environmental data involving very low levels of radioactivity is discussed. Applications of computer calculations for data processing are described

  18. Statistical inference involving binomial and negative binomial parameters.

    Science.gov (United States)

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2009-05-01

    Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.

  19. The (mis)reporting of statistical results in psychology journals

    OpenAIRE

    Bakker, Marjan; Wicherts, Jelte M.

    2011-01-01

    In order to study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature...

  20. Statistical Model-Based Face Pose Estimation

    Institute of Scientific and Technical Information of China (English)

    GE Xinliang; YANG Jie; LI Feng; WANG Huahua

    2007-01-01

    A robust face pose estimation approach is proposed that uses a statistical face shape model, with pose parameters represented by trigonometric functions. The statistical face shape model is first built by analyzing face shapes from different people under varying poses. Shape alignment is vital in the process of building the statistical model. Then, six trigonometric functions are employed to represent the face pose parameters. Lastly, a mapping function between face image and face pose is constructed by linearly relating the different parameters. The proposed approach is able to estimate different face poses using a few face training samples. Experimental results are provided to demonstrate its efficiency and accuracy.

  1. Sampling the Mouse Hippocampal Dentate Gyrus

    Directory of Open Access Journals (Sweden)

    Lisa Basler

    2017-12-01

    Full Text Available Sampling is a critical step in procedures that generate quantitative morphological data in the neurosciences. Samples need to be representative to allow statistical evaluations, and samples need to deliver a precision that makes statistical evaluations not only possible but also meaningful. Sampling-generated variability should, e.g., not be able to hide significant group differences from statistical detection if they are present. Estimators of the coefficient of error (CE) have been developed to provide tentative answers to the question if sampling has been “good enough” to provide meaningful statistical outcomes. We tested the performance of the commonly used Gundersen-Jensen CE estimator, using the layers of the mouse hippocampal dentate gyrus as an example (molecular layer, granule cell layer and hilus). We found that this estimator provided useful estimates of the precision that can be expected from samples of different sizes. For all layers, we found that a smoothness factor (m) of 0 generally provided better estimates than an m of 1. Only for the combined layers, i.e., the entire dentate gyrus, better CE estimates could be obtained using an m of 1. The orientation of the sections impacted on CE sizes. Frontal (coronal) sections are typically most efficient by providing the smallest CEs for a given amount of work. Applying the estimator to 3D-reconstructed layers and using very intense sampling, we observed CE size plots with m = 0 to m = 1 transitions that should also be expected but are not often observed in real section series. The data we present also allows the reader to approximate the sampling intervals in frontal, horizontal or sagittal sections that provide CEs of specified sizes for the layers of the mouse dentate gyrus.

  2. Economic Statistical Design of Variable Sampling Interval $\overline{X}$ Control Chart Based on Surrogate Variable Using Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Lee Tae-Hoon

    2016-12-01

    Full Text Available In many cases, an $\overline{X}$ control chart based on a performance variable is used in industrial fields. Typically, the control chart monitors the measurements of the performance variable itself. However, if the performance variable is too costly or impossible to measure, and a less expensive surrogate variable is available, the process may be more efficiently controlled using surrogate variables. In this paper, we present a model for the economic statistical design of a VSI (Variable Sampling Interval) $\overline{X}$ control chart using a surrogate variable that is linearly correlated with the performance variable. We derive the total average profit model from an economic viewpoint and apply the model to a Very High Temperature Reactor (VHTR) nuclear fuel measurement system and derive the optimal result using genetic algorithms. Compared with the control chart based on a performance variable, the proposed model gives a larger expected net income per unit of time in the long-run if the correlation between the performance variable and the surrogate variable is relatively high. The proposed model was confined to the sample mean control chart under the assumption that a single assignable cause occurs according to the Poisson process. However, the model may also be extended to other types of control charts using single or multiple assignable cause assumptions, such as the VSS (Variable Sample Size) $\overline{X}$ control chart, EWMA, CUSUM charts and so on.

  3. Scalability on LHS (Latin Hypercube Sampling) samples for use in uncertainty analysis of large numerical models

    International Nuclear Information System (INIS)

    Baron, Jorge H.; Nunez Mac Leod, J.E.

    2000-01-01

    The present paper deals with the utilization of advanced statistical sampling methods to perform uncertainty and sensitivity analysis on numerical models. Such models may represent physical phenomena, logical structures (such as boolean expressions) or other systems, and several of their intrinsic parameters and/or input variables are usually treated simultaneously as random variables. In the present paper a simple method to scale up Latin Hypercube Sampling (LHS) samples is presented, starting with a small sample and doubling its size at each step, which makes it possible to re-use the numerical model results already obtained with the smaller sample. The method does not distort the statistical properties of the random variables and does not add any bias to the samples. The result is that a significant reduction in numerical model running time can be achieved (by re-using the previously run samples), keeping all the advantages of LHS, until an acceptable level of representation is achieved in the output variables. (author)
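
    A basic LHS draw can be generated with SciPy as sketched below; the paper's scale-up-by-doubling bookkeeping itself is not reproduced, and the input ranges and model are hypothetical.

    ```python
    import numpy as np
    from scipy.stats import qmc

    # Draw a small Latin Hypercube sample over three model inputs.
    sampler = qmc.LatinHypercube(d=3, seed=123)
    unit_sample = sampler.random(n=8)                     # 8 points in [0, 1)^3
    lower, upper = [0.5, 10.0, 1e-3], [1.5, 50.0, 1e-2]   # hypothetical ranges
    X = qmc.scale(unit_sample, lower, upper)

    def model(x):
        """Stand-in for the expensive numerical model."""
        return x[:, 0] * np.log(x[:, 1]) + 100.0 * x[:, 2]

    y = model(X)
    print(y.mean(), y.std(ddof=1))
    # The paper's scheme would now double the sample while re-using these runs;
    # that bookkeeping is specific to the reference and is not reproduced here.
    ```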

  4. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I2 statistic.

    Science.gov (United States)

    Bowden, Jack; Del Greco M, Fabiola; Minelli, Cosetta; Davey Smith, George; Sheehan, Nuala A; Thompson, John R

    2016-12-01

    MR-Egger regression has recently been proposed as a method for Mendelian randomization (MR) analyses incorporating summary data estimates of causal effect from multiple individual variants, which is robust to invalid instruments. It can be used to test for directional pleiotropy and provides an estimate of the causal effect adjusted for its presence. MR-Egger regression provides a useful additional sensitivity analysis to the standard inverse variance weighted (IVW) approach that assumes all variants are valid instruments. Both methods use weights that consider the single nucleotide polymorphism (SNP)-exposure associations to be known, rather than estimated. We call this the 'NO Measurement Error' (NOME) assumption. Causal effect estimates from the IVW approach exhibit weak instrument bias whenever the genetic variants utilized violate the NOME assumption, which can be reliably measured using the F-statistic. The effect of NOME violation on MR-Egger regression has yet to be studied. An adaptation of the $I^2$ statistic from the field of meta-analysis is proposed to quantify the strength of NOME violation for MR-Egger. It lies between 0 and 1, and indicates the expected relative bias (or dilution) of the MR-Egger causal estimate in the two-sample MR context. We call it $I_{GX}^2$. The method of simulation extrapolation is also explored to counteract the dilution. Their joint utility is evaluated using simulated data and applied to a real MR example. In simulated two-sample MR analyses we show that, when a causal effect exists, the MR-Egger estimate of causal effect is biased towards the null when NOME is violated, and the stronger the violation (as indicated by lower values of $I_{GX}^2$), the stronger the dilution. When additionally all genetic variants are valid instruments, the type I error rate of the MR-Egger test for pleiotropy is inflated and the causal effect underestimated. Simulation extrapolation is shown to substantially mitigate these adverse effects. We
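
    As a rough approximation of the paper's $I_{GX}^2$, the sketch below applies the usual Q-based meta-analysis definition of $I^2$ to hypothetical SNP-exposure estimates and standard errors; consult the reference for the exact weighting used there.

    ```python
    import numpy as np

    def i_squared(beta, se):
        """Q-based meta-analysis heterogeneity statistic:
        Q = sum w_i (b_i - b_w)^2 with w_i = 1/se_i^2, I^2 = max(0, (Q - df)/Q)."""
        w = 1.0 / np.asarray(se) ** 2
        b = np.asarray(beta)
        b_w = np.sum(w * b) / np.sum(w)
        q = np.sum(w * (b - b_w) ** 2)
        return max(0.0, (q - (b.size - 1)) / q)

    # Hypothetical SNP-exposure association estimates and standard errors.
    beta_gx = np.array([0.10, 0.12, 0.08, 0.15, 0.11, 0.09])
    se_gx = np.full(6, 0.005)
    # Values near 1 suggest little dilution; values well below 1 signal stronger
    # NOME violation and hence more dilution of the MR-Egger estimate.
    print(f"I2_GX ~ {i_squared(beta_gx, se_gx):.2f}")
    ```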

  5. Computer Graphics Simulations of Sampling Distributions.

    Science.gov (United States)

    Gordon, Florence S.; Gordon, Sheldon P.

    1989-01-01

    Describes the use of computer graphics simulations to enhance student understanding of sampling distributions that arise in introductory statistics. Highlights include the distribution of sample proportions, the distribution of the difference of sample means, the distribution of the difference of sample proportions, and the distribution of sample…
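
    In the same spirit (minus the graphics), a short simulation of the sampling distribution of a sample proportion might look like this; the parameter values are arbitrary.

    ```python
    import numpy as np

    rng = np.random.default_rng(2024)
    p, n, reps = 0.3, 50, 10_000

    # Simulate `reps` samples of size n and record each sample proportion.
    props = rng.binomial(n, p, size=reps) / n

    print(f"simulated mean = {props.mean():.4f}  (theory: {p})")
    print(f"simulated sd   = {props.std(ddof=1):.4f}  "
          f"(theory: {np.sqrt(p * (1 - p) / n):.4f})")
    ```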

  6. Choice of Sample Split in Out-of-Sample Forecast Evaluation

    DEFF Research Database (Denmark)

    Hansen, Peter Reinhard; Timmermann, Allan

    Out-of-sample tests of forecast performance depend on how a given data set is split into estimation and evaluation periods, yet no guidance exists on how to choose the split point. Empirical forecast evaluation results can therefore be difficult to interpret, particularly when several values......, while conversely the power of forecast evaluation tests is strongest with long out-of-sample periods. To deal with size distortions, we propose a test statistic that is robust to the effect of considering multiple sample split points. Empirical applications to predictability of stock returns and inflation demonstrate that out-of-sample forecast evaluation results can critically depend on how the sample split is determined....

  7. Statistical Analysis of Research Data | Center for Cancer Research

    Science.gov (United States)

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview on the general principles of statistical analysis of research data.  The first day will feature univariate data analysis, including descriptive statistics, probability distributions, one- and two-sample inferential statistics.

  8. Improving Instruction Using Statistical Process Control.

    Science.gov (United States)

    Higgins, Ronald C.; Messer, George H.

    1990-01-01

    Two applications of statistical process control to the process of education are described. Discussed are the use of prompt feedback to teachers and prompt feedback to students. A sample feedback form is provided. (CW)

  9. Multivariate statistical analysis a high-dimensional approach

    CERN Document Server

    Serdobolskii, V

    2000-01-01

    In the last few decades the accumulation of large amounts of information in numerous applications has stimulated an increased interest in multivariate analysis. Computer technologies allow one to use multi-dimensional and multi-parametric models successfully. At the same time, an interest arose in statistical analysis with a deficiency of sample data. Nevertheless, it is difficult to describe the recent state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except for a few specific cases. The simplest problem of estimating the mean vector with minimum quadratic risk is unsolved, even for normal distributions. Commonly used standard linear multivariate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution depending on the data. Programs included in standard statistical packages cannot process 'multi-collinear data' and there are no theoretical recommen...

  10. Evaluation of observables in statistical multifragmentation theories

    International Nuclear Information System (INIS)

    Cole, A.J.

    1989-01-01

    The canonical formulation of equilibrium statistical multifragmentation is examined. It is shown that the explicit construction of observables (average values) by sampling the partition probabilities is unnecessary insofar as closed expressions in the form of recursion relations can be obtained quite easily. Such expressions may conversely be used to verify the sampling algorithms

  11. Statistical inference for the additive hazards model under outcome-dependent sampling.

    Science.gov (United States)

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo

    2015-09-01

    Cost-effective study design and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against the simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.

  12. Special nuclear material inventory sampling plans

    International Nuclear Information System (INIS)

    Vaccaro, H.S.; Goldman, A.S.

    1987-01-01

    This paper presents improved procedures for obtaining statistically valid sampling plans for nuclear facilities. The double sampling concept and methods for developing optimal double sampling plans are described. An algorithm is described that is satisfactory for finding optimal double sampling plans and choosing appropriate detection and false alarm probabilities
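
    To make the double sampling concept concrete, the sketch below evaluates the acceptance probability of a generic attributes double sampling plan under a binomial model; the plan parameters are hypothetical and the report's optimization is not reproduced.

    ```python
    from scipy import stats

    def double_plan_accept(p, n1, c1, n2, c2):
        """Acceptance probability of an attributes double sampling plan:
        accept if d1 <= c1; if c1 < d1 <= c2, draw a second sample and accept
        when d1 + d2 <= c2; reject otherwise."""
        d1 = stats.binom(n1, p)
        d2 = stats.binom(n2, p)
        prob = d1.cdf(c1)
        for k in range(c1 + 1, c2 + 1):
            prob += d1.pmf(k) * d2.cdf(c2 - k)
        return prob

    # Hypothetical plan (n1=32, c1=1, n2=32, c2=3) evaluated over an OC grid.
    for p in (0.01, 0.03, 0.05, 0.10):
        print(f"defect rate {p:.2f}: "
              f"P(accept) = {double_plan_accept(p, 32, 1, 32, 3):.3f}")
    ```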

  13. Analysis of Statistical Methods Currently used in Toxicology Journals.

    Science.gov (United States)

    Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

    2014-09-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are applied consistently and on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median and the mode being 6 and 3 & 6, respectively. The mean (105/113, 93%) was dominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and the standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justification for why those methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either normality or equal variance tests. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.

  14. Prior knowledge regularization in statistical medical image tasks

    DEFF Research Database (Denmark)

    Crimi, Alessandro; Sporring, Jon; de Bruijne, Marleen

    2009-01-01

    The estimation of the covariance matrix is a pivotal step in several statistical tasks. In particular, the estimation becomes challenging for high dimensional representations of data when few samples are available. Using the standard Maximum Likelihood estimation (MLE) when the number of samples ar
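
    One widely used remedy for covariance estimation with few samples (not necessarily the chapter's approach) is shrinkage; a minimal comparison with the sample covariance, using scikit-learn's Ledoit-Wolf estimator on synthetic data, is sketched below.

    ```python
    import numpy as np
    from sklearn.covariance import EmpiricalCovariance, LedoitWolf

    rng = np.random.default_rng(0)
    p, n = 50, 20                       # many dimensions, few samples
    X = rng.standard_normal((n, p))     # true covariance is the identity

    emp = EmpiricalCovariance().fit(X).covariance_
    lw = LedoitWolf().fit(X).covariance_

    frob_err = lambda S: np.linalg.norm(S - np.eye(p))
    print(f"sample covariance error (Frobenius): {frob_err(emp):.2f}")
    print(f"Ledoit-Wolf shrinkage error:         {frob_err(lw):.2f}")
    ```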

  15. Uncertainty and sampling issues in tank characterization

    International Nuclear Information System (INIS)

    Liebetrau, A.M.; Pulsipher, B.A.; Kashporenko, D.M.

    1997-06-01

    A defensible characterization strategy must recognize that uncertainties are inherent in any measurement or estimate of interest and must employ statistical methods for quantifying and managing those uncertainties. Estimates of risk and therefore key decisions must incorporate knowledge about uncertainty. This report focuses on statistical methods that should be employed to ensure confident decision making and appropriate management of uncertainty. Sampling is a major source of uncertainty that deserves special consideration in the tank characterization strategy. The question of whether sampling will ever provide the reliable information needed to resolve safety issues is explored. The issue of sample representativeness must be resolved before sample information is reliable. Representativeness is a relative term but can be defined in terms of bias and precision. Currently, precision can be quantified and managed through an effective sampling and statistical analysis program. Quantifying bias is more difficult and is not being addressed under the current sampling strategies. Bias could be bounded by (1) employing new sampling methods that can obtain samples from other areas in the tanks, (2) putting in new risers on some worst case tanks and comparing the results from existing risers with new risers, or (3) sampling tanks through risers under which no disturbance or activity has previously occurred. With some bound on bias and estimates of precision, various sampling strategies could be determined and shown to be either cost-effective or infeasible

  16. Comparative statistical analysis of carcinogenic and non-carcinogenic effects of uranium in groundwater samples from different regions of Punjab, India

    International Nuclear Information System (INIS)

    Saini, Komal; Singh, Parminder; Bajwa, Bikramjit Singh

    2016-01-01

    An LED fluorimeter has been used for microanalysis of uranium concentration in groundwater samples collected from six districts of South West (SW), West (W) and North East (NE) Punjab, India. The average uranium content in water samples of SW Punjab is observed to be higher than the WHO and USEPA recommended safe limit of 30 µg l⁻¹ as well as the AERB proposed limit of 60 µg l⁻¹, whereas for the W and NE regions of Punjab the average uranium concentration was within the AERB recommended limit of 60 µg l⁻¹. The average value observed in SW Punjab is around 3–4 times the value observed in W Punjab and more than 17 times the average value observed in the NE region of Punjab. Carcinogenic as well as non-carcinogenic risks due to uranium have been statistically evaluated for each studied district. - Highlights: • Uranium levels in groundwater samples have been assessed in different regions of Punjab. • A comparative study of carcinogenic and non-carcinogenic effects of uranium has been done. • Wide variation has been found for different geological regions. • South West Punjab has been found to be worst affected by uranium contamination in its water. • For the west and north east regions of Punjab, uranium levels in groundwater lay within recommended safe limits.

  17. Quantitative and statistical approaches to geography a practical manual

    CERN Document Server

    Matthews, John A

    2013-01-01

    Quantitative and Statistical Approaches to Geography: A Practical Manual is a practical introduction to some quantitative and statistical techniques of use to geographers and related scientists. This book is composed of 15 chapters, each of which begins with an outline of the purpose and necessary mechanics of a technique or group of techniques and concludes with exercises on the particular approach adopted. These exercises aim to enhance students' ability to use the techniques as part of the process by which sound judgments are made according to scientific standards while tackling complex problems. After a brief introduction to the principles of quantitative and statistical geography, this book goes on to deal with the topics of measures of central tendency; probability statements and maps; the problem of time-dependence, time-series analysis, non-normality, and data transformations; and the elements of sampling methodology. Other chapters cover confidence intervals and estimation from samples, statistical hy...

  18. Using student models to generate feedback in a university course on statistical sampling

    NARCIS (Netherlands)

    Tacoma, S.G.|info:eu-repo/dai/nl/411923080; Drijvers, P.H.M.|info:eu-repo/dai/nl/074302922; Boon, P.B.J.|info:eu-repo/dai/nl/203374207

    2017-01-01

    Due to the complexity of the topic and a lack of individual guidance, introductory statistics courses at university are often challenging. Automated feedback might help to address this issue. In this study, we explore the use of student models to provide feedback. The research question is how

  19. Some challenges with statistical inference in adaptive designs.

    Science.gov (United States)

    Hung, H M James; Wang, Sue-Jane; Yang, Peiling

    2014-01-01

    Adaptive designs have generated a great deal of attention in clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.

  20. Induction of micronuclei in hemocytes of Mytilus edulis and statistical analysis

    DEFF Research Database (Denmark)

    Wrisberg, M. N.; Bilbo, Carl M.; Spliid, Henrik

    1992-01-01

    biological variation, emphasizing the importance of application of a correct statistical method. A systematic approach to the statistical evaluation of the mussel MN test is outlined. The statistical model includes three different situations: (a) estimation of parameters of a single sample, (b) estimation...

  1. Efficient statistical tests to compare Youden index: accounting for contingency correlation.

    Science.gov (United States)

    Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan

    2015-04-30

    The Youden index is widely utilized in studies evaluating the accuracy of diagnostic tests and the performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on the Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. In addition, a paired sample test on the Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on the Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by the delta method, and the statistical inference is based on the central limit theorem; these are then verified by bootstrap estimates. For the paired sample test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of the kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than the original Youden approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
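
    To make the quantity concrete, the sketch below computes the Youden index J = sensitivity + specificity − 1 for two tests applied to the same patients and a simple paired-bootstrap interval for their difference. This is not the delta-method/kappa-based test developed in the article, and all data are simulated.

```python
import numpy as np

def youden(pred, truth):
    """Youden index J = sensitivity + specificity - 1 for binary test results."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    sens = (pred & truth).sum() / truth.sum()
    spec = (~pred & ~truth).sum() / (~truth).sum()
    return sens + spec - 1

def paired_diff_ci(pred_a, pred_b, truth, n_boot=2000, seed=0):
    """Bootstrap 95% CI for J_A - J_B when both tests are run on the same patients."""
    rng = np.random.default_rng(seed)
    n = len(truth)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)        # resample patients, keeping the pairing
        diffs.append(youden(pred_a[idx], truth[idx]) - youden(pred_b[idx], truth[idx]))
    return np.percentile(diffs, [2.5, 97.5])

# Hypothetical paired data: two diagnostic tests on the same 200 patients
rng = np.random.default_rng(1)
truth = rng.random(200) < 0.3
test_a = truth & (rng.random(200) < 0.90) | ~truth & (rng.random(200) < 0.10)
test_b = truth & (rng.random(200) < 0.80) | ~truth & (rng.random(200) < 0.15)
print("J_A - J_B =", youden(test_a, truth) - youden(test_b, truth))
print("bootstrap 95% CI:", paired_diff_ci(test_a, test_b, truth))
```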

  2. Statistics of Monte Carlo methods used in radiation transport calculation

    International Nuclear Information System (INIS)

    Datta, D.

    2009-01-01

    Radiation transport calculations can be carried out using either deterministic or statistical methods. Radiation transport calculation based on statistical methods is the basic theme of the Monte Carlo method. The aim of this lecture is to describe the fundamental statistics required to build the foundations of the Monte Carlo technique for radiation transport calculation. The lecture note is organized as follows. Section (1) describes basic Monte Carlo and its classification in the respective field. Section (2) describes random sampling methods, a key component of Monte Carlo radiation transport calculation. Section (3) provides the statistical uncertainty of Monte Carlo estimates, and Section (4) describes in brief the importance of variance reduction techniques while sampling particles such as photons or neutrons in the process of radiation transport
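
    As a minimal illustration of the random sampling and statistical uncertainty discussed in Sections (2) and (3), the sketch below samples exponential free paths and reports a Monte Carlo estimate with its standard error; the attenuation coefficient and slab thickness are arbitrary assumptions.

```python
import numpy as np

def transmission_estimate(n, mu=0.2, thickness=5.0, seed=0):
    """Analog Monte Carlo estimate of uncollided transmission through a slab.

    Free path lengths are sampled from an exponential with attenuation
    coefficient mu (1/cm); a particle scores 1 if its path exceeds the slab
    thickness (cm).  Returns the estimate and its statistical uncertainty.
    """
    rng = np.random.default_rng(seed)
    path = rng.exponential(scale=1.0 / mu, size=n)
    score = (path > thickness).astype(float)
    mean = score.mean()
    stderr = score.std(ddof=1) / np.sqrt(n)      # uncertainty shrinks like 1/sqrt(N)
    return mean, stderr

exact = np.exp(-0.2 * 5.0)
for n in (10**3, 10**4, 10**5):
    est, err = transmission_estimate(n)
    print(f"N={n:>6}: {est:.4f} +/- {err:.4f}   (exact {exact:.4f})")
```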

  3. The statistics of galaxies: beyond correlation functions

    International Nuclear Information System (INIS)

    Lachieze-Rey, M.

    1988-01-01

    I mention some normalization problems encountered when estimating the 2-point correlation functions in samples of galaxies of different average densities. I present some aspects of the void probability function as a statistical indicator, free of such normalization problems. Finally I suggest a new statistical approach to give an account in a synthetic way of those aspects of the galaxy distribution that a conventional method is unable to characterize

  4. Towards Cost-efficient Sampling Methods

    OpenAIRE

    Peng, Luo; Yongli, Li; Chong, Wu

    2014-01-01

    Sampling methods have received much attention in the field of complex networks in general and statistical physics in particular. This paper presents two new sampling methods based on the perspective that a small fraction of vertices with high node degree can possess most of the structural information of a network. The two proposed sampling methods are efficient in sampling the nodes with high degree. The first new sampling method is improved on the basis of the stratified random sampling method and...
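
    The paper's two proposed methods are not reproduced here; the sketch below only illustrates the premise that a small set of high-degree vertices carries much of a network's structure, using a scale-free test graph generated with the networkx library. The graph size, sample size and edge-coverage measure are assumptions made for illustration.

```python
import random
import networkx as nx

def high_degree_sample(G, k):
    """Return the k highest-degree nodes (the 'hubs' the premise refers to)."""
    return [n for n, _ in sorted(G.degree, key=lambda t: t[1], reverse=True)[:k]]

def edge_coverage(G, nodes):
    """Fraction of edges with at least one endpoint in the sampled node set."""
    s = set(nodes)
    return sum(1 for u, v in G.edges if u in s or v in s) / G.number_of_edges()

random.seed(0)
G = nx.barabasi_albert_graph(2000, 3, seed=0)    # scale-free test network
k = 100                                          # sample 5% of the vertices
uniform = random.sample(list(G.nodes), k)
hubs = high_degree_sample(G, k)
print("edge coverage, uniform sample   :", round(edge_coverage(G, uniform), 3))
print("edge coverage, high-degree nodes:", round(edge_coverage(G, hubs), 3))
```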

  5. The price of electricity from private power producers: Stage 2, Expansion of sample and preliminary statistical analysis

    Energy Technology Data Exchange (ETDEWEB)

    Comnes, G.A.; Belden, T.N.; Kahn, E.P.

    1995-02-01

    The market for long-term bulk power is becoming increasingly competitive and mature. Given that many privately developed power projects have been or are being developed in the US, it is possible to begin to evaluate the performance of the market by analyzing its revealed prices. Using a consistent method, this paper presents levelized contract prices for a sample of privately developed US generation properties. The sample includes 26 projects with a total capacity of 6,354 MW. Contracts are described in terms of their choice of technology, choice of fuel, treatment of fuel price risk, geographic location, dispatchability, expected dispatch niche, and size. The contract price analysis shows that gas technologies clearly stand out as the most attractive. At an 80% capacity factor, coal projects have an average 20-year levelized price of $0.092/kWh, whereas natural gas combined cycle and/or cogeneration projects have an average price of $0.069/kWh. Within each technology type subsample, however, there is considerable variation. Prices for natural gas combustion turbines and one wind project are also presented. A preliminary statistical analysis is conducted to understand the relationship between price and four categories of explanatory factors including product heterogeneity, geographic heterogeneity, economic and technological change, and other buyer attributes (including avoided costs). Because of residual price variation, we are unable to accept the hypothesis that electricity is a homogeneous product. Instead, the analysis indicates that buyer value still plays an important role in the determination of price for competitively-acquired electricity.

  6. Statistical and quantitative research

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

    Environmental impacts may escape detection if the statistical tests used to analyze data from field studies are inadequate or the field design is not appropriate. To alleviate this problem, PNL scientists are doing theoretical research which will provide the basis for new sampling schemes or better methods to analyze and present data. Such efforts have resulted in recommendations about the optimal size of study plots, sampling intensity, field replication, and program duration. Costs associated with any of these factors can be substantial if, for example, attention is not paid to the adequacy of a sampling scheme. In the study of dynamics of large-mammal populations, the findings are sometimes surprising. For example, the survival of a grizzly bear population may hinge on the loss of one or two adult females per year

  7. Robust statistical methods with R

    CERN Document Server

    Jureckova, Jana

    2005-01-01

    Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application.The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance of measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of real parameter, large sample properties, and goodness-of-fit tests. It...

  8. Applied multivariate statistics with R

    CERN Document Server

    Zelterman, Daniel

    2015-01-01

    This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...

  9. Some statistical aspects of the cleanup of Enewetak Atoll

    International Nuclear Information System (INIS)

    Barnes, M.G.; Giacomini, J.J.; Friesen, H.N.

    1979-01-01

    Cleaning up the radionuclide contamination at Enewetak Atoll has involved a number of statistical design problems. Theoretical considerations led to choosing a grid sampling pattern; practical problems sometimes lead to resampling on a finer grid. Other problems associated with using grids have been both physical and statistical. The standard sampling system is an in situ intrinsic gamma detector which measures americium concentration. The cleanup guidelines include plutonium concentration, so additional sampling of soil is required to establish Pu/Am ratios. The soil sampling design included both guidelines for location of the samples and also a special pattern of subsamples making up composite samples. The large variance of the soil sample results makes comparison between the two types difficult anyway, but this is compounded by vegetation attenuation of the in situ readings, soil disturbance influences, and differences in devegetation methods. The constraints inherent in doing what amounts to a research and development project, on a limited budget of time and money, in a field engineering environment are also considered

  10. Renyi statistics in equilibrium statistical mechanics

    International Nuclear Information System (INIS)

    Parvan, A.S.; Biro, T.S.

    2010-01-01

    The Renyi statistics in the canonical and microcanonical ensembles is examined both in general and in particular for the ideal gas. In the microcanonical ensemble the Renyi statistics is equivalent to the Boltzmann-Gibbs statistics. Using exact analytical results for the ideal gas, it is shown that in the canonical ensemble, in the thermodynamic limit, the Renyi statistics is also equivalent to the Boltzmann-Gibbs statistics. Furthermore, it satisfies the requirements of equilibrium thermodynamics, i.e. the thermodynamical potential of the statistical ensemble is a homogeneous function of first degree of its extensive variables of state. We conclude that the Renyi statistics arrives at the same thermodynamical relations as those stemming from the Boltzmann-Gibbs statistics in this limit.
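
    For reference, the textbook definition of the Renyi entropy and its Boltzmann-Gibbs limit as q → 1 are recalled below (standard material, not the paper's ensemble-specific derivation):

```latex
S_q^{\mathrm{R}} = \frac{k_B}{1-q}\,\ln\!\Big(\sum_i p_i^{\,q}\Big),
\qquad
\lim_{q \to 1} S_q^{\mathrm{R}} = -k_B \sum_i p_i \ln p_i = S^{\mathrm{BG}} .
```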

  11. Including Below Detection Limit Samples in Post Decommissioning Soil Sample Analyses

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jung Hwan; Yim, Man Sung [KAIST, Daejeon (Korea, Republic of)

    2016-05-15

    To meet the required standards, the site owner has to show that the soil at the facility has been sufficiently cleaned up. To do this one must know the contamination of the soil at the site prior to cleanup. This involves sampling that soil to identify the degree of contamination. However, there is a technical difficulty in determining how much decontamination should be done. The problem arises when measured samples are below the detection limit. Regulatory guidelines for site reuse after decommissioning are commonly challenged because the majority of the activity in the soil is at or below the limit of detection. Using additional statistical analyses of contaminated soil after decommissioning is expected to have the following advantages: a better and more reliable probabilistic exposure assessment, better economics (lower project costs) and improved communication with the public. This research will develop an approach that defines an acceptable method for demonstrating compliance of decommissioned NPP sites and validates that compliance. Soil samples from NPPs often contain censored data. Conventional methods for dealing with censored data sets are statistically biased and limited in their usefulness.
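
    One common alternative to substituting an arbitrary value for below-detection-limit results is a maximum likelihood fit in which censored observations enter through the CDF. The sketch below is a generic illustration of that idea rather than the approach developed in this research; the lognormal model, detection limit and data are all assumptions.

```python
import numpy as np
from scipy import stats, optimize

def fit_lognormal_censored(detected, n_below, det_limit):
    """ML fit of a lognormal to activity data with left-censored observations.

    'detected' holds measured activities above the detection limit; 'n_below'
    results were reported only as '< det_limit' and contribute to the
    likelihood through the CDF instead of the density.
    """
    logs = np.log(detected)

    def negloglik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)
        ll = stats.norm.logpdf(logs, mu, sigma).sum()
        ll += n_below * stats.norm.logcdf(np.log(det_limit), mu, sigma)
        return -ll

    res = optimize.minimize(negloglik, x0=[logs.mean(), 0.0])
    return res.x[0], np.exp(res.x[1])            # mu and sigma on the log scale

# Hypothetical soil activities (Bq/g) with a detection limit of 0.05
rng = np.random.default_rng(2)
true = rng.lognormal(mean=-3.0, sigma=0.8, size=60)
det_limit = 0.05
detected = true[true >= det_limit]
mu, sigma = fit_lognormal_censored(detected, (true < det_limit).sum(), det_limit)
print("fitted geometric mean:", np.exp(mu), "  naive mean of detects:", detected.mean())
```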

  12. Prospective elementary and secondary school mathematics teachers’ statistical reasoning

    Directory of Open Access Journals (Sweden)

    Rabia KARATOPRAK

    2015-04-01

    Full Text Available This study investigated prospective elementary (PEMTs) and secondary (PSMTs) school mathematics teachers’ statistical reasoning. The study began with the adaptation of the Statistical Reasoning Assessment (Garfield, 2003) test. Then, the test was administered to 82 PEMTs and 91 PSMTs in a metropolitan city of Turkey. Results showed that both groups were equally successful in understanding independence, and understanding importance of large samples. However, results from selecting appropriate measures of center together with the misconceptions assessing the same subscales showed that both groups selected mode rather than mean as an appropriate average. This suggested their lack of attention to the categorical and interval/ratio variables while examining data. Similarly, both groups were successful in interpreting and computing probability; however, they had equiprobability bias, law of small numbers and representativeness misconceptions. The results imply a change in some questions in the Statistical Reasoning Assessment test and that teacher training programs should include statistics courses focusing on studying characteristics of samples.

  13. Statistical Optics

    Science.gov (United States)

    Goodman, Joseph W.

    2000-07-01

    The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Robert G. Bartle The Elements of Integration and Lebesgue Measure George E. P. Box & Norman R. Draper Evolutionary Operation: A Statistical Method for Process Improvement George E. P. Box & George C. Tiao Bayesian Inference in Statistical Analysis R. W. Carter Finite Groups of Lie Type: Conjugacy Classes and Complex Characters R. W. Carter Simple Groups of Lie Type William G. Cochran & Gertrude M. Cox Experimental Designs, Second Edition Richard Courant Differential and Integral Calculus, Volume I RIchard Courant Differential and Integral Calculus, Volume II Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume I Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume II D. R. Cox Planning of Experiments Harold S. M. Coxeter Introduction to Geometry, Second Edition Charles W. Curtis & Irving Reiner Representation Theory of Finite Groups and Associative Algebras Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II Cuthbert Daniel Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition Bruno de Finetti Theory of Probability, Volume I Bruno de Finetti Theory of Probability, Volume 2 W. Edwards Deming Sample Design in Business Research

  14. Contribution to the study of the dependence in the limit between various statistics of a sample; Contribution a l'etude des liaisons limites entre differentes statistiques d'un echantillon

    Energy Technology Data Exchange (ETDEWEB)

    Rosengard, A [Commissariat a l' Energie Atomique, Saclay (France). Centre d' Etudes Nucleaires

    1966-02-01

    With a view to investigating the limiting dependence between certain statistics of a sample, we give general definitions of independence in the limit and investigate some of their properties. We then give necessary and sufficient conditions for the independence in the limit of order statistics, and we study the limiting dependence between the sample mean and the order statistics, in particular quantiles and extreme values. For example, we show that, for laws with finite variance, the sample mean tends to become independent of the extreme values, whereas it is asymptotically dependent on the quantiles, with a positive correlation coefficient for which an expression is given. We also investigate the case in which the variance is infinite. (author) [French] Afin d'etudier la liaison limite entre certaines statistiques d'un echantillon, on est amene a donner des definitions generales de l'independance limite, et a en etudier quelques proprietes. On donne ensuite les conditions necessaires et suffisantes pour l'independance limite des statistiques d'ordre, et on etudie la liaison limite entre la moyenne et les statistiques d'ordre, en particulier quantites et valeurs extremes. On montre par exemple, que pour des lois dont la variance est finie, la moyenne tend a devenir independante des valeurs extremes, alors qu'elle est asymptotiquement liee aux quantites, avec un coefficient de correlation positif, dont on donne l'expression. On etudie egalement le cas ou la variance est infinie. (auteur)

  15. Acceptance sampling using judgmental and randomly selected samples

    Energy Technology Data Exchange (ETDEWEB)

    Sego, Landon H.; Shulman, Stanley A.; Anderson, Kevin K.; Wilson, John E.; Pulsipher, Brent A.; Sieber, W. Karl

    2010-09-01

    We present a Bayesian model for acceptance sampling where the population consists of two groups, each with different levels of risk of containing unacceptable items. Expert opinion, or judgment, may be required to distinguish between the high and low-risk groups. Hence, high-risk items are likely to be identified (and sampled) using expert judgment, while the remaining low-risk items are sampled randomly. We focus on the situation where all observed samples must be acceptable. Consequently, the objective of the statistical inference is to quantify the probability that a large percentage of the unsampled items in the population are also acceptable. We demonstrate that traditional (frequentist) acceptance sampling and simpler Bayesian formulations of the problem are essentially special cases of the proposed model. We explore the properties of the model in detail, and discuss the conditions necessary to ensure that required sample sizes are a non-decreasing function of the population size. The method is applicable to a variety of acceptance sampling problems, and, in particular, to environmental sampling where the objective is to demonstrate the safety of reoccupying a remediated facility that has been contaminated with a lethal agent.
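
    A much simplified, single-group version of this idea can be written with a beta-binomial posterior. The sketch below (uniform prior, hypothetical population of 1000 items, all sampled items clean) is only meant to convey the flavour of the calculation, not the two-group judgmental/random model of the paper.

```python
from math import floor
from scipy.stats import betabinom

def prob_clean_fraction(n_sampled, pop_size, frac=0.99, a=1.0, b=1.0):
    """Posterior probability that at least `frac` of the unsampled items are
    acceptable, given that all n_sampled items observed so far were acceptable.

    A Beta(a, b) prior on the per-item probability of being unacceptable is
    updated to Beta(a, b + n_sampled); the number of unacceptable items among
    the m unsampled ones is then beta-binomial.
    """
    m = pop_size - n_sampled
    allowed = floor((1.0 - frac) * m)
    return betabinom.cdf(allowed, m, a, b + n_sampled)

# How many clean samples before we are ~95% sure that 99% of a 1000-item
# population is acceptable?  (uniform prior; purely illustrative)
for n in (59, 100, 200, 300):
    print(n, round(prob_clean_fraction(n, 1000), 3))
```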

  16. Random sampling or geostatistical modelling? Choosing between design-based and model-based sampling strategies for soil (with discussion)

    NARCIS (Netherlands)

    Brus, D.J.; Gruijter, de J.J.

    1997-01-01

    Classical sampling theory has been repeatedly identified with classical statistics which assumes that data are identically and independently distributed. This explains the switch of many soil scientists from design-based sampling strategies, based on classical sampling theory, to the model-based

  17. Statistical characterization report for Single-Shell Tank 241-T-107

    International Nuclear Information System (INIS)

    Cromar, R.D.; Wilmarth, S.R.; Jensen, L.

    1994-01-01

    This report contains the results of the statistical analysis of data from three core samples obtained from single-shell tank 241-T-107 (T-107). Four specific topics are addressed. They are summarized below. Section 3.0 contains mean concentration estimates of analytes found in T-107. The estimates of "error" associated with the concentration estimates are given as 95% confidence intervals (CI) on the mean. The results given are based on three types of samples: core composite samples, core segment samples, and drainable liquid samples. Section 4.0 contains estimates of the spatial variability (variability between cores and between segments) and the analytical variability (variability between the primary and the duplicate analysis). Statistical tests were performed to test the hypothesis that the between cores and the between segments spatial variability is zero. The results of the tests are as follows. Based on the core composite data, the between cores variance is significantly different from zero for 35 out of 74 analytes; i.e., for 53% of the analytes there is no statistically significant difference between the concentration means for two cores. Based on core segment data, the between segments variance is significantly different from zero for 22 out of 24 analytes and the between cores variance is significantly different from zero for 4 out of 24 analytes; i.e., for 8% of the analytes there is no statistically significant difference between segment means and for 83% of the analytes there is no difference between the means from the three cores. Section 5.0 contains the results of the application of multiple comparison methods to the core composite data, the core segment data, and the drainable liquid data. Section 6.0 contains the results of a statistical test conducted to determine the 222-S Analytical Laboratory's ability to homogenize solid core segments

  18. 'Intelligent' approach to radioimmunoassay sample counting employing a microprocessor controlled sample counter

    International Nuclear Information System (INIS)

    Ekins, R.P.; Sufi, S.; Malan, P.G.

    1977-01-01

    The enormous impact on medical science in the last two decades of microanalytical techniques employing radioisotopic labels has, in turn, generated a large demand for automatic radioisotopic sample counters. Such instruments frequently comprise the most important item of capital equipment required in the use of radioimmunoassay and related techniques and often form a principal bottleneck in the flow of samples through a busy laboratory. It is therefore particularly imperative that such instruments should be used 'intelligently' and in an optimal fashion to avoid both the very large capital expenditure involved in the unnecessary proliferation of instruments and the time delays arising from their sub-optimal use. The majority of the current generation of radioactive sample counters nevertheless rely on primitive control mechanisms based on a simplistic statistical theory of radioactive sample counting which preclude their efficient and rational use. The fundamental principle upon which this approach is based is that it is useless to continue counting a radioactive sample for a time longer than that required to yield a significant increase in precision of the measurement. Thus, since substantial experimental errors occur during sample preparation, these errors should be assessed and must be related to the counting errors for that sample. It is the objective of this presentation to demonstrate that the combination of a realistic statistical assessment of radioactive sample measurement, together with the more sophisticated control mechanisms that modern microprocessor technology make possible, may often enable savings in counter usage of the order of 5-10 fold to be made. (orig.) [de
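
    The principle can be made concrete with a few lines of arithmetic: stop counting once the Poisson counting error is small relative to the assessed sample-preparation error. The sketch below uses hypothetical numbers (500 counts per second, a 3% preparation error, and a 1/3 error ratio) purely for illustration.

```python
import math

def counts_needed(prep_rel_error, ratio=1/3):
    """Counts after which the relative Poisson error (1/sqrt(N)) is no more than
    `ratio` times the sample-preparation error, so further counting adds little."""
    return math.ceil(1.0 / (ratio * prep_rel_error) ** 2)

def counting_time(count_rate_cps, prep_rel_error, ratio=1/3):
    return counts_needed(prep_rel_error, ratio) / count_rate_cps

# Hypothetical RIA tube: 500 counts per second, 3% pipetting/preparation error
n = counts_needed(0.03)
print(n, "counts, i.e. about", round(counting_time(500, 0.03), 1), "s per tube")
# A fixed 60 s preset would accumulate 30000 counts whose extra precision is
# swamped by the 3% preparation error -- roughly a threefold waste of counter time.
```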

  19. Applying Statistical Process Control to Clinical Data: An Illustration.

    Science.gov (United States)

    Pfadt, Al; And Others

    1992-01-01

    Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
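
    A minimal example of the chart logic described here is an individuals (X) chart with 3-sigma limits estimated from the moving range; the baseline observations below are hypothetical daily counts of a target behaviour.

```python
import numpy as np

def individuals_chart_limits(x):
    """Centre line and 3-sigma limits for an individuals (X) chart, with sigma
    estimated from the average moving range (MR-bar / 1.128, the d2 constant
    for subgroups of size 2)."""
    x = np.asarray(x, dtype=float)
    centre = x.mean()
    sigma = np.abs(np.diff(x)).mean() / 1.128
    return centre, centre - 3 * sigma, centre + 3 * sigma

baseline = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
centre, lcl, ucl = individuals_chart_limits(baseline)
print(f"centre={centre:.1f}, LCL={lcl:.1f}, UCL={ucl:.1f}")
# New observations falling outside (LCL, UCL) signal a change worth
# investigating as part of treatment planning.
```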

  20. Developing Sampling Frame for Case Study: Challenges and Conditions

    Science.gov (United States)

    Ishak, Noriah Mohd; Abu Bakar, Abu Yazid

    2014-01-01

    Because of the reliance on statistical analysis, the issue of random sampling is pertinent to any quantitative study. Unlike in quantitative studies, the elimination of inferential statistical analysis allows qualitative researchers to be more creative in dealing with sampling issues. Since results from a qualitative study cannot be generalized to the bigger population,…

  1. A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

    Science.gov (United States)

    Kosinski, Andrzej S

    2013-03-15

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.

  2. Testing University Rankings Statistically: Why this Perhaps is not such a Good Idea after All. Some Reflections on Statistical Power, Effect Size, Random Sampling and Imaginary Populations

    DEFF Research Database (Denmark)

    Schneider, Jesper Wiborg

    2012-01-01

    In this paper we discuss and question the use of statistical significance tests in relation to university rankings as recently suggested. We outline the assumptions behind and interpretations of statistical significance tests and relate this to examples from the recent SCImago Institutions Rankin...

  3. 9th Symposium on Computational Statistics

    CERN Document Server

    Mildner, Vesna

    1990-01-01

    Although no-one is, probably, too enthused about the idea, it is a fact that the development of most empirical sciences to a great extent depends on the development of data analysis methods and techniques, which, due to the necessity of application of computers for that purpose, actually means that it practically depends on the advancement and orientation of computer statistics. Every other year the International Association for Statistical Computing sponsors the organization of meetings of individuals professionally involved in computational statistics. Since these meetings attract professionals from all over the world, they are a good sample for the estimation of trends in this area, which some believe is statistics proper while others claim it is computer science. It seems, though, that an increasing number of colleagues treat it as an independent scientific or at least technical discipline. This volume contains six invited papers, 41 contributed papers and, finally, two papers which are, formally, softwa...

  4. Understanding Statistics - Cancer Statistics

    Science.gov (United States)

    Annual reports of U.S. cancer statistics including new cases, deaths, trends, survival, prevalence, lifetime risk, and progress toward Healthy People targets, plus statistical summaries for a number of common cancer types.

  5. Probability sampling in legal cases: Kansas cellphone users

    Science.gov (United States)

    Kadane, Joseph B.

    2012-10-01

    Probability sampling is a standard statistical technique. This article introduces the basic ideas of probability sampling, and shows in detail how probability sampling was used in a particular legal case.

  6. Speech emotion recognition based on statistical pitch model

    Institute of Scientific and Technical Information of China (English)

    WANG Zhiping; ZHAO Li; ZOU Cairong

    2006-01-01

    A modified Parzen-window method, which keeps high resolution at low frequencies and smoothness at high frequencies, is proposed to obtain the statistical model. Then, a gender classification method utilizing the statistical model is proposed, which achieves 98% accuracy of gender classification when long sentences are dealt with. By separating the male and female voices, the mean and standard deviation of speech training samples with different emotions are used to create the corresponding emotion models. Then the Bhattacharyya distances between the test sample and the statistical pitch models are utilized for emotion recognition in speech. The normalization of pitch for the male and female voices is also considered, in order to map them into a uniform space. Finally, the speech emotion recognition experiment based on K Nearest Neighbor shows that a correct rate of 81% is achieved, whereas it is only 73.85% if the traditional parameters are utilized.
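
    For readers unfamiliar with the distance used here, the sketch below implements the Bhattacharyya distance between two univariate Gaussian models, a simplification of the statistical pitch models described above; the means and variances are made-up values for illustration only.

```python
import math

def bhattacharyya_gauss(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussian distributions."""
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))

# Hypothetical normalized-pitch statistics (mean, variance) for two emotion models
models = {"neutral": (0.0, 1.0), "anger": (0.8, 1.5)}
test = (0.7, 1.4)                      # statistics of an unknown test sample
for name, (mu, var) in models.items():
    print(name, round(bhattacharyya_gauss(test[0], test[1], mu, var), 4))
# The test sample would be assigned to the emotion with the smallest distance.
```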

  7. An introduction to automatic radioactive sample counters

    International Nuclear Information System (INIS)

    1980-01-01

    The subject is covered in chapters entitled: the detection of radiation in sample counters; nucleonic equipment; liquid scintillation counting; basic features of automatic sample counters; statistics of counting; data analysis; purchase, installation, calibration and maintenance of automatic sample counters. (U.K.)

  8. Concepts in sample size determination

    Directory of Open Access Journals (Sweden)

    Umadevi K Rao

    2012-01-01

    Full Text Available Investigators involved in clinical, epidemiological or translational research have the drive to publish their results so that they can extrapolate their findings to the population. This begins with the preliminary step of deciding the topic to be studied, the subjects and the type of study design. In this context, the researcher must determine how many subjects would be required for the proposed study. Thus, the number of individuals to be included in the study, i.e., the sample size, is an important consideration in the design of many clinical studies. The sample size determination should be based on the difference in the outcome between the two groups studied as in an analytical study, as well as on the accepted p value for statistical significance and the required statistical power to test a hypothesis. The accepted risk of type I error or alpha value, which by convention is set at the 0.05 level in biomedical research, defines the cutoff point at which the p value obtained in the study is judged as significant or not. The power in clinical research is the likelihood of finding a statistically significant result when it exists and is typically set to >80%. This is necessary since the most rigorously executed studies may fail to answer the research question if the sample size is too small. Alternatively, a study with too large a sample size will be difficult and will result in waste of time and resources. Thus, the goal of sample size planning is to estimate an appropriate number of subjects for a given study design. This article describes the concepts in estimating the sample size.
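
    The usual closed-form approximation for comparing two means, with the conventional alpha of 0.05 and 80% power mentioned above, can be sketched as follows; the effect size and standard deviation in the example are hypothetical.

```python
import math
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2,
    where delta is the clinically meaningful difference between groups."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# Example: detect a 5-unit difference when the SD is 10 (alpha 0.05, power 80%)
print(n_per_group(delta=5, sd=10))   # about 63 subjects per group
```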

  9. Evaluation of Respondent-Driven Sampling

    Science.gov (United States)

    McCreesh, Nicky; Frost, Simon; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda Ndagire; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G

    2012-01-01

    Background Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex-workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total-population data. Methods Total-population data on age, tribe, religion, socioeconomic status, sexual activity and HIV status were available on a population of 2402 male household-heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, employing current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). Results We recruited 927 household-heads. Full and small RDS samples were largely representative of the total population, but both samples under-represented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven-sampling statistical-inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven-sampling bootstrap 95% confidence intervals included the population proportion. Conclusions Respondent-driven sampling produced a generally representative sample of this well-connected non-hidden population. However, current respondent-driven-sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience-sampling

  10. Evaluation of respondent-driven sampling.

    Science.gov (United States)

    McCreesh, Nicky; Frost, Simon D W; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda N; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G

    2012-01-01

    Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total population data. Total population data on age, tribe, religion, socioeconomic status, sexual activity, and HIV status were available on a population of 2402 male household heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, using current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). We recruited 927 household heads. Full and small RDS samples were largely representative of the total population, but both samples underrepresented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven sampling statistical inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven sampling bootstrap 95% confidence intervals included the population proportion. Respondent-driven sampling produced a generally representative sample of this well-connected nonhidden population. However, current respondent-driven sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience sampling method, and caution is required

  11. Statistical estimation Monte Carlo for unreliability evaluation of highly reliable system

    International Nuclear Information System (INIS)

    Xiao Gang; Su Guanghui; Jia Dounan; Li Tianduo

    2000-01-01

    Based on analog Monte Carlo simulation, statistical estimation Monte Carlo methods for unreliability evaluation of highly reliable systems are constructed, including the direct statistical estimation Monte Carlo method and the weighted statistical estimation Monte Carlo method. The basal element is given, and the statistical estimation Monte Carlo estimators are derived. The direct Monte Carlo simulation method, the bounding-sampling method, the forced transitions Monte Carlo method, the direct statistical estimation Monte Carlo method and the weighted statistical estimation Monte Carlo method are used to evaluate the unreliability of the same system. By comparison, the weighted statistical estimation Monte Carlo estimator has the smallest variance and the highest computational efficiency

  12. Control cards as a statistical quality control resource

    Directory of Open Access Journals (Sweden)

    Aleksandar Živan Drenovac

    2013-02-01

    Full Text Available This paper demonstrates that applying statistical methods can significantly contribute to increasing the quality of products and services, as well as to increasing an institution's rating. The determination of optimal, that is, anticipated and limit values is based on statistical analysis of a sample. Control charts (control cards) are a very reliable instrument, simple to use and efficient for process control, by which a process is kept within set limits. Thus, control charts can be applied in quality control of processes of weapons and military equipment production and maintenance of technical systems, as well as for setting standards and increasing the quality level of many other activities.

  13. Towards the harmonization between National Forest Inventory and Forest Condition Monitoring. Consistency of plot allocation and effect of tree selection methods on sample statistics in Italy.

    Science.gov (United States)

    Gasparini, Patrizia; Di Cosmo, Lucio; Cenni, Enrico; Pompei, Enrico; Ferretti, Marco

    2013-07-01

    In the frame of a process aiming at harmonizing National Forest Inventory (NFI) and ICP Forests Level I Forest Condition Monitoring (FCM) in Italy, we investigated (a) the long-term consistency between FCM sample points (a subsample of the first NFI, 1985, NFI_1) and recent forest area estimates (after the second NFI, 2005, NFI_2) and (b) the effect of tree selection method (tree-based or plot-based) on sample composition and defoliation statistics. The two investigations were carried out on 261 and 252 FCM sites, respectively. Results show that some individual forest categories (larch and stone pine, Norway spruce, other coniferous, beech, temperate oaks and cork oak forests) are over-represented and others (hornbeam and hophornbeam, other deciduous broadleaved and holm oak forests) are under-represented in the FCM sample. This is probably due to a change in forest cover, which has increased by 1,559,200 ha from 1985 to 2005. In case of a shift from a tree-based to a plot-based selection method, 3,130 (46.7%) of the original 6,703 sample trees will be abandoned, and 1,473 new trees will be selected. The balance between exclusion of former sample trees and inclusion of new ones will be particularly unfavourable for conifers (with only 16.4% of excluded trees replaced by new ones) and less so for deciduous broadleaves (with 63.5% of excluded trees replaced). The total number of tree species surveyed will not be impacted, while the number of trees per species will, and the resulting (plot-based) sample composition will have a much larger frequency of deciduous broadleaved trees. The newly selected trees have, in general, smaller diameters at breast height (DBH) and defoliation scores. Given the larger rate of turnover, the deciduous broadleaved part of the sample will be more impacted. Our results suggest that both a revision of FCM network to account for forest area change and a plot-based approach to permit statistical inference and avoid bias in the tree sample

  14. Learning to reason from samples

    NARCIS (Netherlands)

    Ben-Zvi, Dani; Bakker, Arthur; Makar, Katie

    2015-01-01

    The goal of this article is to introduce the topic of learning to reason from samples, which is the focus of this special issue of Educational Studies in Mathematics on statistical reasoning. Samples are data sets, taken from some wider universe (e.g., a population or a process) using a particular

  15. Elementary statistics for effective library and information service management

    CERN Document Server

    Egghe, Leo

    2001-01-01

    This title describes how best to use statistical data to produce professional reports on library activities. The authors cover data gathering, sampling, graphical representation of data and summary statistics from data, and also include a section on trend analysis. A full bibliography and a subject index make this a key title for any information professional.

  16. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    Full Text Available The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: the one sample rank test and the Wilcoxon signed rank test for two independent groups of samples. The results obtained with the one sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because only slight differences in the error (presented through the relative residuals) were obtained. Estimation of the significance of the differences in the relative residuals was achieved using the Wilcoxon signed rank test, where linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.
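
    As a generic illustration of weighted curve fitting over a wide, heteroscedastic concentration range (not the nonparametric model-selection procedure of the paper), one can compare relative residuals from unweighted and 1/x² weighted fits; the calibration data below are simulated.

```python
import numpy as np

def weighted_fit(x, y, weights, degree=1):
    """Weighted least-squares fit (linear or quadratic) and percent relative residuals."""
    coeffs = np.polyfit(x, y, deg=degree, w=np.sqrt(weights))   # polyfit expects sqrt-weights
    fitted = np.polyval(coeffs, x)
    return coeffs, (y - fitted) / fitted * 100.0

# Hypothetical calibration data with concentration-proportional noise
rng = np.random.default_rng(3)
x = np.array([1, 2, 5, 10, 50, 100, 500, 1000], dtype=float)
y = 2.0 * x + rng.normal(scale=0.05 * x)
for w, label in [(np.ones_like(x), "unweighted"), (1 / x**2, "1/x^2 weighted")]:
    _, rr = weighted_fit(x, y, w)
    print(f"{label:>15}: mean |relative residual| = {np.abs(rr).mean():.2f}%")
```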

  17. Small Sample Statistics for Incomplete Nonnormal Data: Extensions of Complete Data Formulae and a Monte Carlo Comparison

    Science.gov (United States)

    Savalei, Victoria

    2010-01-01

    Incomplete nonnormal data are common occurrences in applied research. Although these 2 problems are often dealt with separately by methodologists, they often cooccur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to…

  18. Estimation of measurement variance in the context of environment statistics

    Science.gov (United States)

    Maiti, Pulakesh

    2015-02-01

    The objective of environment statistics is to provide information on the environment and its most important changes over time and across locations, and to identify the main factors that influence them. Ultimately, environment statistics would be required to produce higher quality statistical information. For this, timely, reliable and comparable data are needed. The lack of proper and uniform definitions and of unambiguous classifications poses serious problems in procuring qualitative data, and these problems cause measurement errors. We consider the problem of estimating measurement variance so that some measures may be adopted to improve the quality of data on environmental goods and services and on value statements in economic terms. The measurement technique considered here is that of employing personal interviewers, and the sampling considered here is two-stage sampling.

  19. Limitations of Poisson statistics in describing radioactive decay.

    Science.gov (United States)

    Sitek, Arkadiusz; Celler, Anna M

    2015-12-01

    The assumption that nuclear decays are governed by Poisson statistics is an approximation. This approximation becomes unjustified when data acquisition times longer than or even comparable with the half-lives of the radioisotope in the sample are considered. In this work, the limits of the Poisson-statistics approximation are investigated. The formalism for the statistics of radioactive decay based on the binomial distribution is derived. The theoretical factor describing the deviation of the variance of the number of decays predicted by the Poisson distribution from the true variance is defined and investigated for several commonly used radiotracers such as (18)F, (15)O, (82)Rb, (13)N, (99m)Tc, (123)I, and (201)Tl. The variance of the number of decays estimated using the Poisson distribution is significantly different from the true variance for a 5-minute observation time of (11)C, (15)O, (13)N, and (82)Rb. Durations of nuclear medicine studies often are relatively long; they may be even a few times longer than the half-lives of some short-lived radiotracers. Our study shows that in such situations the Poisson statistics is unsuitable and should not be applied to describe the statistics of the number of decays in radioactive samples. However, the above statement does not directly apply to counting statistics at the level of event detection. Low sensitivities of detectors which are used in imaging studies make the Poisson approximation near perfect. Copyright © 2015 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
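
    The size of the deviation is easy to compute from first principles: over an observation time t the number of decays is binomial with p = 1 − exp(−λt), so the ratio of the true (binomial) to the Poisson variance is 1 − p. The sketch below evaluates this for a 5-minute acquisition using commonly tabulated half-lives, listed here as assumptions rather than values taken from the paper.

```python
import math

# Half-lives in minutes (commonly tabulated values; treated as assumptions here)
half_life = {"C-11": 20.4, "O-15": 2.04, "N-13": 9.97, "Rb-82": 1.26, "F-18": 109.8}

def variance_ratio(t_min, t_half):
    """Binomial variance of the number of decays divided by the Poisson variance
    for an observation time t: Var_bin / Var_Pois = 1 - p, p = 1 - exp(-lambda*t)."""
    p = 1.0 - math.exp(-math.log(2.0) / t_half * t_min)
    return 1.0 - p

for iso, th in half_life.items():
    print(f"{iso:>6}: binomial/Poisson variance ratio over 5 min = {variance_ratio(5.0, th):.3f}")
# Ratios far below 1 (short-lived isotopes) mark where the Poisson approximation
# breaks down; F-18 stays close to 1.
```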

  20. An 'intelligent' approach to radioimmunoassay sample counting employing a microprocessor-controlled sample counter

    International Nuclear Information System (INIS)

    Ekins, R.P.; Sufi, S.; Malan, P.G.

    1978-01-01

    The enormous impact on medical science in the last two decades of microanalytical techniques employing radioisotopic labels has, in turn, generated a large demand for automatic radioisotopic sample counters. Such instruments frequently comprise the most important item of capital equipment required in the use of radioimmunoassay and related techniques and often form a principal bottleneck in the flow of samples through a busy laboratory. It is therefore imperative that such instruments should be used 'intelligently' and in an optimal fashion to avoid both the very large capital expenditure involved in the unnecessary proliferation of instruments and the time delays arising from their sub-optimal use. Most of the current generation of radioactive sample counters nevertheless rely on primitive control mechanisms based on a simplistic statistical theory of radioactive sample counting which preclude their efficient and rational use. The fundamental principle upon which this approach is based is that it is useless to continue counting a radioactive sample for a time longer than that required to yield a significant increase in precision of the measurement. Thus, since substantial experimental errors occur during sample preparation, these errors should be assessed and must be related to the counting errors for that sample. The objective of the paper is to demonstrate that the combination of a realistic statistical assessment of radioactive sample measurement, together with the more sophisticated control mechanisms that modern microprocessor technology make possible, may often enable savings in counter usage of the order of 5- to 10-fold to be made. (author)

  1. Statistical scaling of pore-scale Lagrangian velocities in natural porous media.

    Science.gov (United States)

    Siena, M; Guadagnini, A; Riva, M; Bijeljic, B; Pereira Nunes, J P; Blunt, M J

    2014-08-01

    We investigate the scaling behavior of sample statistics of pore-scale Lagrangian velocities in two different rock samples, Bentheimer sandstone and Estaillades limestone. The samples are imaged using x-ray computer tomography with micron-scale resolution. The scaling analysis relies on the study of the way qth-order sample structure functions (statistical moments of order q of absolute increments) of Lagrangian velocities depend on separation distances, or lags, traveled along the mean flow direction. In the sandstone block, sample structure functions of all orders exhibit a power-law scaling within a clearly identifiable intermediate range of lags. Sample structure functions associated with the limestone block display two diverse power-law regimes, which we infer to be related to two overlapping spatially correlated structures. In both rocks and for all orders q, we observe linear relationships between logarithmic structure functions of successive orders at all lags (a phenomenon that is typically known as extended power scaling, or extended self-similarity). The scaling behavior of Lagrangian velocities is compared with the one exhibited by porosity and specific surface area, which constitute two key pore-scale geometric observables. The statistical scaling of the local velocity field reflects the behavior of these geometric observables, with the occurrence of power-law-scaling regimes within the same range of lags for sample structure functions of Lagrangian velocity, porosity, and specific surface area.
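
    A minimal sketch of the qth-order sample structure functions and the extended self-similarity check described above, applied to a synthetic one-dimensional signal; the data, lag range and fitting choices are assumptions for illustration and do not reproduce the pore-scale velocity fields analysed in the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    v = np.cumsum(rng.standard_normal(20_000))   # synthetic 1-D "velocity" signal

    def structure_function(v, lag, q):
        """q-th order sample structure function: mean of |v(x + lag) - v(x)|**q."""
        incr = np.abs(v[lag:] - v[:-lag])
        return np.mean(incr**q)

    lags = np.unique(np.logspace(0, 3, 20).astype(int))
    orders = [1, 2, 3, 4]
    S = {q: np.array([structure_function(v, l, q) for l in lags]) for q in orders}

    # Power-law scaling: slope of log S_q versus log lag.
    for q in orders:
        zeta_q, _ = np.polyfit(np.log(lags), np.log(S[q]), 1)
        print(f"q={q}: scaling exponent ~ {zeta_q:.2f}")

    # Extended self-similarity: log S_q versus log S_3 should be close to linear.
    slope, _ = np.polyfit(np.log(S[3]), np.log(S[4]), 1)
    print(f"ESS slope of S_4 against S_3 ~ {slope:.2f}")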

  2. The interprocess NIR sampling as an alternative approach to multivariate statistical process control for identifying sources of product-quality variability.

    Science.gov (United States)

    Marković, Snežana; Kerč, Janez; Horvat, Matej

    2017-03-01

    We present a new approach to identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of an enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields results comparable to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during the manufacturing process, especially in cases when historical process data are not straightforwardly available. In the presented case, changes in lactose characteristics influence the performance of the extrusion/spheronization process step. The pellet cores produced using one (considered less suitable) lactose source were on average larger and more fragile, leading to consequent breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process leading to compromised film coating.

  3. Using statistics to understand the environment

    CERN Document Server

    Cook, Penny A

    2000-01-01

    Using Statistics to Understand the Environment covers all the basic tests required for environmental practicals and projects and points the way to the more advanced techniques that may be needed in more complex research designs. Following an introduction to project design, the book covers methods to describe data, to examine differences between samples, and to identify relationships and associations between variables.Featuring: worked examples covering a wide range of environmental topics, drawings and icons, chapter summaries, a glossary of statistical terms and a further reading section, this book focuses on the needs of the researcher rather than on the mathematics behind the tests.

  4. Parts of the Whole: Hands On Statistics

    Directory of Open Access Journals (Sweden)

    Dorothy Wallace

    2018-01-01

    Full Text Available In this column we describe a hands-on data collection lab for an introductory statistics course. The exercise elicits issues of normality, sampling, and sample mean comparisons. Based on volcanology models of tephra dispersion, this lab leads students to question the accuracy of some assumptions made in the model, particularly regarding the normality of the dispersal of tephra of identical size in a given atmospheric layer.

  5. Statistical analysis of radioactivity in the environment

    International Nuclear Information System (INIS)

    Barnes, M.G.; Giacomini, J.J.

    1980-05-01

    The pattern of radioactivity in surface soils of Area 5 of the Nevada Test Site is analyzed statistically by means of kriging. The 1962 event code-named Smallboy affected the greatest proportion of the area sampled, but some of the area was also affected by a number of other events. The data for this study were collected on a regular grid to take advantage of the efficiency of grid sampling.

  6. Statistical inference for the lifetime performance index based on generalised order statistics from exponential distribution

    Science.gov (United States)

    Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar

    2015-04-01

    In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered to be satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contains several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data-sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
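
    As an illustration of the connection between the conforming rate and the lifetime performance index, the sketch below assumes a complete exponential sample (an assumption made here for simplicity; the paper treats general generalised order statistics). For an Exp(λ) lifetime, mean and standard deviation both equal 1/λ, so Montgomery's index reduces to CL = 1 - λL and the conforming rate to ηL = exp(-λL) = exp(CL - 1). The function name, data and limit L are hypothetical:

    import numpy as np

    def lifetime_performance_exponential(x, L):
        """Point estimates of C_L = (mu - L)/sigma and eta_L = P(X >= L) under
        an exponential lifetime model fitted by maximum likelihood."""
        x = np.asarray(x, dtype=float)
        lam_hat = 1.0 / x.mean()           # ML estimate of the rate
        c_l = 1.0 - lam_hat * L            # C_L = 1 - lambda * L
        eta_l = np.exp(c_l - 1.0)          # eta_L = exp(C_L - 1)
        return c_l, eta_l

    rng = np.random.default_rng(1)
    sample = rng.exponential(scale=2.0, size=40)   # simulated lifetimes, mean 2
    print(lifetime_performance_exponential(sample, L=0.5))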

  7. Introduction to statistics using interactive MM*Stat elements

    CERN Document Server

    Härdle, Wolfgang Karl; Rönz, Bernd

    2015-01-01

    MM*Stat, together with its enhanced online version with interactive examples, offers a flexible tool that facilitates the teaching of basic statistics. It covers all the topics found in introductory descriptive statistics courses, including simple linear regression and time series analysis, the fundamentals of inferential statistics (probability theory, random sampling and estimation theory), and inferential statistics itself (confidence intervals, testing). MM*Stat is also designed to help students rework class material independently and to promote comprehension with the help of additional examples. Each chapter starts with the necessary theoretical background, which is followed by a variety of examples. The core examples are based on the content of the respective chapter, while the advanced examples, designed to deepen students’ knowledge, also draw on information and material from previous chapters. The enhanced online version helps students grasp the complexity and the practical relevance of statistical...

  8. National Statistical Commission and Indian Official Statistics*

    Indian Academy of Sciences (India)

    IAS Admin

    a good collection of official statistics of that time. With more .... statistical agencies and institutions to provide details of statistical activities .... ing several training programmes. .... ful completion of Indian Statistical Service examinations, the.

  9. Parametric analysis of the statistical model of the stick-slip process

    Science.gov (United States)

    Lima, Roberta; Sampaio, Rubens

    2017-06-01

    In this paper, a parametric analysis of the statistical model of the response of a dry-friction oscillator is performed. The oscillator is a spring-mass system which moves over a base with a rough surface. Due to this roughness, the mass is subject to a dry-frictional force modeled as a Coulomb friction. The system is stochastically excited by an imposed bang-bang base motion. The base velocity is modeled by a Poisson process for which a probabilistic model is fully specified. The excitation induces in the system stochastic stick-slip oscillations. The system response is composed of a random sequence alternating stick- and slip-modes. With realizations of the system, a statistical model is constructed for this sequence. In this statistical model, the variables of interest of the sequence are modeled as random variables, for example the number of time intervals in which stick or slip occurs, the instants at which they begin, and their durations. Samples of the system response are computed by integration of the dynamic equation of the system using independent samples of the base motion. Statistics and histograms of the random variables which characterize the stick-slip process are estimated from the generated samples. The objective of the paper is to analyze how these estimated statistics and histograms vary with the system parameters, i.e., to make a parametric analysis of the statistical model of the stick-slip process.

  10. Official Statistics and Statistics Education: Bridging the Gap

    Directory of Open Access Journals (Sweden)

    Gal Iddo

    2017-03-01

    Full Text Available This article aims to challenge official statistics providers and statistics educators to ponder on how to help non-specialist adult users of statistics develop those aspects of statistical literacy that pertain to official statistics. We first document the gap in the literature in terms of the conceptual basis and educational materials needed for such an undertaking. We then review skills and competencies that may help adults to make sense of statistical information in areas of importance to society. Based on this review, we identify six elements related to official statistics about which non-specialist adult users should possess knowledge in order to be considered literate in official statistics: (1) the system of official statistics and its work principles; (2) the nature of statistics about society; (3) indicators; (4) statistical techniques and big ideas; (5) research methods and data sources; and (6) awareness and skills for citizens’ access to statistical reports. Based on this ad hoc typology, we discuss directions that official statistics providers, in cooperation with statistics educators, could take in order to (1) advance the conceptualization of skills needed to understand official statistics, and (2) expand educational activities and services, specifically by developing a collaborative digital textbook and a modular online course, to improve public capacity for understanding of official statistics.

  11. Statistical theory and inference

    CERN Document Server

    Olive, David J

    2014-01-01

    This text is for  a one semester graduate course in statistical theory and covers minimal and complete sufficient statistics, maximum likelihood estimators, method of moments, bias and mean square error, uniform minimum variance estimators and the Cramer-Rao lower bound, an introduction to large sample theory, likelihood ratio tests and uniformly most powerful  tests and the Neyman Pearson Lemma. A major goal of this text is to make these topics much more accessible to students by using the theory of exponential families. Exponential families, indicator functions and the support of the distribution are used throughout the text to simplify the theory. More than 50 ``brand name" distributions are used to illustrate the theory with many examples of exponential families, maximum likelihood estimators and uniformly minimum variance unbiased estimators. There are many homework problems with over 30 pages of solutions.

  12. Overdispersion in nuclear statistics

    International Nuclear Information System (INIS)

    Semkow, Thomas M.

    1999-01-01

    The modern statistical distribution theory is applied to the development of the overdispersion theory in ionizing-radiation statistics for the first time. The physical nuclear system is treated as a sequence of binomial processes, each depending on a characteristic probability, such as probability of decay, detection, etc. The probabilities fluctuate in the course of a measurement, and the physical reasons for that are discussed. If the average values of the probabilities change from measurement to measurement, which originates from the random Lexis binomial sampling scheme, then the resulting distribution is overdispersed. The generating functions and probability distribution functions are derived, followed by a moment analysis. The Poisson and Gaussian limits are also given. The distribution functions belong to a family of generalized hypergeometric factorial moment distributions by Kemp and Kemp, and can serve as likelihood functions for the statistical estimations. An application to radioactive decay with detection is described and working formulae are given, including a procedure for testing the counting data for overdispersion. More complex experiments in nuclear physics (such as solar neutrino) can be handled by this model, as well as distinguishing between the source and background
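
    A standard, simple check for extra-Poisson variability in repeated counts of the same source is the index-of-dispersion (variance-to-mean) test sketched below; it is a common textbook procedure offered as an illustration, not necessarily the working formulae derived in the paper, and the synthetic data are assumed:

    import numpy as np
    from scipy import stats

    def dispersion_test(counts):
        """Index-of-dispersion test of the Poisson hypothesis for repeated counts.

        Under Poisson counting, (n - 1) * s^2 / mean is approximately chi-square
        with n - 1 degrees of freedom; large values indicate overdispersion.
        """
        counts = np.asarray(counts, dtype=float)
        n = counts.size
        stat = (n - 1) * counts.var(ddof=1) / counts.mean()
        p_value = stats.chi2.sf(stat, df=n - 1)
        return stat, p_value

    # Example: repeated measurements of one source (synthetic, overdispersed
    # because the mean rate fluctuates between measurements).
    rng = np.random.default_rng(2)
    rates = rng.gamma(shape=50, scale=2.0, size=30)
    counts = rng.poisson(rates)
    print(dispersion_test(counts))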

  13. Basic statistical tools in research and data analysis

    Directory of Open Access Journals (Sweden)

    Zulfiqar Ali

    2016-01-01

    Full Text Available Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

  14. STATLIB, Interactive Statistics Program Library of Tutorial System

    International Nuclear Information System (INIS)

    Anderson, H.E.

    1986-01-01

    1 - Description of program or function: STATLIB is a conversational statistical program library developed in conjunction with a Sandia National Laboratories applied statistics course intended for practicing engineers and scientists. STATLIB is a group of 15 interactive, argument-free, statistical routines. Included are analysis of sensitivity tests; sample statistics for the normal, exponential, hypergeometric, Weibull, and extreme value distributions; three models of multiple regression analysis; x-y data plots; exact probabilities for RxC tables; n sets of m permuted integers in the range 1 to m; simple linear regression and correlation; K different random integers in the range m to n; and Fisher's exact test of independence for a 2 by 2 contingency table. Forty-five other subroutines in the library support the basic 15

  15. Estimation of plant sampling uncertainty: an example based on chemical analysis of moss samples.

    Science.gov (United States)

    Dołęgowska, Sabina

    2016-11-01

    In order to estimate the level of uncertainty arising from sampling, 54 samples (primary and duplicate) of the moss species Pleurozium schreberi (Brid.) Mitt. were collected within three forested areas (Wierna Rzeka, Piaski, Posłowice Range) in the Holy Cross Mountains (south-central Poland). During the fieldwork, each primary sample composed of 8 to 10 increments (subsamples) was taken over an area of 10 m2, whereas duplicate samples were collected in the same way at a distance of 1-2 m. Subsequently, all samples were triple rinsed with deionized water, dried, milled, and digested (8 mL HNO3 (1:1) + 1 mL 30% H2O2) in a closed microwave system Multiwave 3000. The prepared solutions were analyzed twice for Cu, Fe, Mn, and Zn using FAAS and GFAAS techniques. All data sets were checked for normality, and for the normally distributed elements (Cu from Piaski; Zn from Posłowice; Fe and Zn from Wierna Rzeka) the sampling uncertainty was computed with (i) classical ANOVA, (ii) classical RANOVA, (iii) modified RANOVA, and (iv) range statistics. For the remaining elements, the sampling uncertainty was calculated with traditional and/or modified RANOVA (if the amount of outliers did not exceed 10%) or classical ANOVA after Box-Cox transformation (if the amount of outliers exceeded 10%). The highest concentrations of all elements were found in moss samples from Piaski, whereas the sampling uncertainty calculated with the different statistical methods ranged from 4.1 to 22%.
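
    A minimal sketch of how a duplicate-based uncertainty estimate of this kind can be computed, using the classical ANOVA within-pair mean square and the range-statistics estimate from primary/duplicate pairs; the concentrations below are synthetic and the treatment ignores the separate analytical-duplicate level used in full RANOVA designs:

    import numpy as np

    def duplicate_uncertainty(primary, duplicate):
        """Estimate the sampling (plus measurement) standard deviation from
        field duplicate pairs, as a relative standard deviation in percent.

        Two classical estimates from paired differences d_i:
          * one-way ANOVA within-pair mean square: s^2 = mean(d_i^2) / 2
          * range statistics: s = mean(|d_i|) / 1.128  (d2 factor for n = 2)
        """
        primary = np.asarray(primary, dtype=float)
        duplicate = np.asarray(duplicate, dtype=float)
        d = primary - duplicate
        grand_mean = np.concatenate([primary, duplicate]).mean()
        s_anova = np.sqrt(np.mean(d**2) / 2.0)
        s_range = np.mean(np.abs(d)) / 1.128
        return 100 * s_anova / grand_mean, 100 * s_range / grand_mean

    # Synthetic Zn concentrations (mg/kg) for 9 primary/duplicate pairs.
    rng = np.random.default_rng(3)
    true = rng.normal(35.0, 5.0, size=9)
    prim = true + rng.normal(0.0, 2.0, size=9)
    dup = true + rng.normal(0.0, 2.0, size=9)
    print(duplicate_uncertainty(prim, dup))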

  16. A Bayesian Method for Weighted Sampling

    OpenAIRE

    Lo, Albert Y.

    1993-01-01

    Bayesian statistical inference for sampling from weighted distribution models is studied. Small-sample Bayesian bootstrap clone (BBC) approximations to the posterior distribution are discussed. A second-order property for the BBC in unweighted i.i.d. sampling is given. A consequence is that BBC approximations to a posterior distribution of the mean and to the sampling distribution of the sample average, can be made asymptotically accurate by a proper choice of the random variables that genera...
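
    The sketch below shows only the basic (unweighted i.i.d.) Bayesian bootstrap idea that the BBC approximation builds on: Dirichlet(1, ..., 1) reweighting of the observed data to approximate the posterior of the mean. It is a simplified illustration with assumed data, not Lo's construction for weighted distribution models:

    import numpy as np

    def bayesian_bootstrap_mean(x, n_draws=5000, seed=0):
        """Bayesian bootstrap approximation to the posterior of the mean: each
        draw reweights the observed data with Dirichlet(1, ..., 1) weights and
        records the corresponding weighted mean."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x, dtype=float)
        w = rng.dirichlet(np.ones(x.size), size=n_draws)   # posterior weights
        return w @ x                                       # one weighted mean per draw

    x = np.random.default_rng(4).normal(10.0, 2.0, size=50)
    post = bayesian_bootstrap_mean(x)
    print(post.mean(), np.percentile(post, [2.5, 97.5]))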

  17. Performance modeling, loss networks, and statistical multiplexing

    CERN Document Server

    Mazumdar, Ravi

    2009-01-01

    This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I

  18. Statistics for Petroleum Engineers and Geoscientists

    International Nuclear Information System (INIS)

    Jensen, J.L.; Lake, L.W.; Corbett, P.W.M.; Goggin, D.J.

    2000-01-01

    Geostatistics is a common tool in reservoir characterisation. Several texts discuss the subject, however this book differs in its approach and audience from currently available material. Written from the basics of statistics it covers only those topics that are needed for the two goals of the text: to exhibit the diagnostic potential of statistics and to introduce the important features of statistical modeling. This revised edition contains expanded discussions of some materials, in particular conditional probabilities, Bayes Theorem, correlation, and Kriging. The coverage of estimation, variability, and modeling applications have been updated. Seventy examples illustrate concepts and show the role of geology for providing important information for data analysis and model building. Four reservoir case studies conclude the presentation, illustrating the application and importance of the earlier material. This book can help petroleum professionals develop more accurate models, leading to lower sampling costs

  19. ‘Sampling the reference set’ revisited

    NARCIS (Netherlands)

    Berkum, van E.E.M.; Linssen, H.N.; Overdijk, D.A.

    1998-01-01

    The confidence level of an inference table is defined as a weighted truth probability of the inference when sampling the reference set. The reference set is recognized by conditioning on the values of maximal partially ancillary statistics. In the sampling experiment values of incidental parameters

  20. Statistics anxiety among undergraduate students in the faculty of ...

    African Journals Online (AJOL)

    The purpose of the study was to determine the level of statistics anxiety among undergraduate students, and whether this level is influenced by factors such as gender and age. A sample of 100 third year students who enrolled for basic statistics in the University of Calabar was used for the study. A series of t-tests revealed that the ...

  1. Statistical Analysis Of Reconnaissance Geochemical Data From ...

    African Journals Online (AJOL)

    , Co, Mo, Hg, Sb, Tl, Sc, Cr, Ni, La, W, V, U, Th, Bi, Sr and Ga in 56 stream sediment samples collected from Orle drainage system were subjected to univariate and multivariate statistical analyses. The univariate methods used include ...

  2. Statistical Decision Theory Estimation, Testing, and Selection

    CERN Document Server

    Liese, Friedrich

    2008-01-01

    Suitable for advanced graduate students and researchers in mathematical statistics and decision theory, this title presents an account of the concepts and a treatment of the major results of classical finite sample size decision theory and modern asymptotic decision theory

  3. Statistical evaluation of fatty acid profile and cholesterol content in fish (common carp) lipids obtained by different sample preparation procedures.

    Science.gov (United States)

    Spiric, Aurelija; Trbovic, Dejana; Vranic, Danijela; Djinovic, Jasna; Petronijevic, Radivoj; Matekalo-Sverak, Vesna

    2010-07-05

    the second principal component (PC2) is recorded by C18:3 n-3, and C20:3 n-6, being present in a higher amount in the samples treated by the modified Soxhlet extraction, while C22:5 n-3, C20:3 n-3, C22:1 and C20:4, C16 and C18 negatively influence the score values of the PC2, showing significantly increased level in the samples treated by ASE method. Hotelling's paired T-square test used on the first three principal components for confirmation of differences in individual fatty acid content obtained by ASE and Soxhlet method in carp muscle showed statistically significant difference between these two data sets (T(2)=161.308, p<0.001). Copyright 2010 Elsevier B.V. All rights reserved.

  4. Statistical comparison of the geometry of second-phase particles

    Energy Technology Data Exchange (ETDEWEB)

    Benes, Viktor, E-mail: benesv@karlin.mff.cuni.cz [Charles University in Prague, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Sokolovska 83, 186 75 Prague 8-Karlin (Czech Republic); Lechnerova, Radka, E-mail: radka.lech@seznam.cz [Private College on Economical Studies, Ltd., Lindnerova 575/1, 180 00 Prague 8-Liben (Czech Republic); Klebanov, Lev [Charles University in Prague, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Sokolovska 83, 186 75 Prague 8-Karlin (Czech Republic); Slamova, Margarita, E-mail: slamova@vyzkum-kovu.cz [Research Institute for Metals, Ltd., Panenske Brezany 50, 250 70 Odolena Voda (Czech Republic); Slama, Peter [Research Institute for Metals, Ltd., Panenske Brezany 50, 250 70 Odolena Voda (Czech Republic)

    2009-10-15

    In microscopic studies of materials, there is often a need to provide a statistical test as to whether two microstructures are different or not. Typically, there are some random objects (particles, grains, pores) and the comparison concerns their density, individual geometrical parameters and their spatial distribution. The problem is that neighbouring objects observed in a single window cannot be assumed to be stochastically independent, therefore classical statistical testing based on random sampling is not applicable. The aim of the present paper is to develop a test based on N-distances in probability theory. Using the measurements from a few independent windows, we consider a two-sample test, which involves a large amount of information collected from each window. An application is presented consisting in a comparison of metallographic samples of aluminium alloys, and the results are interpreted.
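
    As an illustration of the N-distance idea, the sketch below runs a permutation two-sample test with the energy distance (one member of the N-distance family) on synthetic particle-size data. It ignores the within-window spatial dependence that the paper's construction is designed to handle, so it is a simplified stand-in rather than the authors' test:

    import numpy as np

    def energy_distance(x, y):
        """Energy distance between two univariate samples (an N-distance):
        2*E|X-Y| - E|X-X'| - E|Y-Y'| estimated by sample averages."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        a = np.abs(x[:, None] - y[None, :]).mean()
        b = np.abs(x[:, None] - x[None, :]).mean()
        c = np.abs(y[:, None] - y[None, :]).mean()
        return 2 * a - b - c

    def permutation_test(x, y, n_perm=2000, seed=0):
        """Two-sample permutation test using the energy distance as statistic."""
        rng = np.random.default_rng(seed)
        observed = energy_distance(x, y)
        pooled = np.concatenate([x, y])
        count = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)
            if energy_distance(pooled[:len(x)], pooled[len(x):]) >= observed:
                count += 1
        return observed, (count + 1) / (n_perm + 1)

    rng = np.random.default_rng(5)
    sizes_a = rng.lognormal(mean=1.0, sigma=0.4, size=80)   # e.g. particle sizes
    sizes_b = rng.lognormal(mean=1.2, sigma=0.4, size=80)
    print(permutation_test(sizes_a, sizes_b))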

  5. Statistical comparison of the geometry of second-phase particles

    International Nuclear Information System (INIS)

    Benes, Viktor; Lechnerova, Radka; Klebanov, Lev; Slamova, Margarita; Slama, Peter

    2009-01-01

    In microscopic studies of materials, there is often a need to provide a statistical test as to whether two microstructures are different or not. Typically, there are some random objects (particles, grains, pores) and the comparison concerns their density, individual geometrical parameters and their spatial distribution. The problem is that neighbouring objects observed in a single window cannot be assumed to be stochastically independent, therefore classical statistical testing based on random sampling is not applicable. The aim of the present paper is to develop a test based on N-distances in probability theory. Using the measurements from a few independent windows, we consider a two-sample test, which involves a large amount of information collected from each window. An application is presented consisting in a comparison of metallographic samples of aluminium alloys, and the results are interpreted.

  6. Sample size calculation in metabolic phenotyping studies.

    Science.gov (United States)

    Billoir, Elise; Navratil, Vincent; Blaise, Benjamin J

    2015-09-01

    The number of samples needed to identify significant effects is a key question in biomedical studies, with consequences on experimental designs, costs and potential discoveries. In metabolic phenotyping studies, sample size determination remains a complex step. This is due particularly to the multiple hypothesis-testing framework and the top-down hypothesis-free approach, with no a priori known metabolic target. Until now, there was no standard procedure available to address this purpose. In this review, we discuss sample size estimation procedures for metabolic phenotyping studies. We release an automated implementation of the Data-driven Sample size Determination (DSD) algorithm for MATLAB and GNU Octave. Original research concerning DSD was published elsewhere. DSD allows the determination of an optimized sample size in metabolic phenotyping studies. The procedure uses analytical data only from a small pilot cohort to generate an expanded data set. The statistical recoupling of variables procedure is used to identify metabolic variables, and their intensity distributions are estimated by Kernel smoothing or log-normal density fitting. Statistically significant metabolic variations are evaluated using the Benjamini-Yekutieli correction and processed for data sets of various sizes. Optimal sample size determination is achieved in a context of biomarker discovery (at least one statistically significant variation) or metabolic exploration (a maximum of statistically significant variations). DSD toolbox is encoded in MATLAB R2008A (Mathworks, Natick, MA) for Kernel and log-normal estimates, and in GNU Octave for log-normal estimates (Kernel density estimates are not robust enough in GNU octave). It is available at http://www.prabi.fr/redmine/projects/dsd/repository, with a tutorial at http://www.prabi.fr/redmine/projects/dsd/wiki. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  7. [Practical aspects regarding sample size in clinical research].

    Science.gov (United States)

    Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S

    1996-01-01

    Knowledge of the right sample size lets us judge whether the results published in medical papers are based on a suitable design and whether their conclusions are supported by the statistical analysis. To estimate the sample size we must consider the type I error, type II error, variance, the size of the effect, and the significance and power of the test. To decide which formula will be used, we must define what kind of study we have: a prevalence study, a study of means, or a comparative one. In this paper we explain some basic topics of statistics and we describe four simple examples of sample size estimation.
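
    As an example of the kind of calculation described, here is a sketch of the standard normal-approximation formula for the sample size needed to compare two means; the effect size, standard deviation, significance level and power used in the example are illustrative assumptions:

    import math
    from scipy import stats

    def n_per_group_two_means(delta, sigma, alpha=0.05, power=0.80):
        """Sample size per group to detect a mean difference `delta` between two
        groups with common standard deviation `sigma` (two-sided test):
            n = 2 * (z_{1-alpha/2} + z_{power})**2 * (sigma / delta)**2
        """
        z_alpha = stats.norm.ppf(1.0 - alpha / 2.0)
        z_beta = stats.norm.ppf(power)
        return math.ceil(2.0 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)

    # Example: detect a 5-unit difference, SD = 10, alpha = 0.05, power = 80 %.
    print(n_per_group_two_means(delta=5.0, sigma=10.0))   # roughly 63 per group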

  8. Heterogeneous Rock Simulation Using DIP-Micromechanics-Statistical Methods

    Directory of Open Access Journals (Sweden)

    H. Molladavoodi

    2018-01-01

    Full Text Available Rock as a natural material is heterogeneous. Rock material consists of minerals, crystals, cement, grains, and microcracks. Each component of rock has a different mechanical behavior under applied loading condition. Therefore, rock component distribution has an important effect on rock mechanical behavior, especially in the postpeak region. In this paper, the rock sample was studied by digital image processing (DIP, micromechanics, and statistical methods. Using image processing, volume fractions of the rock minerals composing the rock sample were evaluated precisely. The mechanical properties of the rock matrix were determined based on upscaling micromechanics. In order to consider the rock heterogeneities effect on mechanical behavior, the heterogeneity index was calculated in a framework of statistical method. A Weibull distribution function was fitted to the Young modulus distribution of minerals. Finally, statistical and Mohr–Coulomb strain-softening models were used simultaneously as a constitutive model in DEM code. The acoustic emission, strain energy release, and the effect of rock heterogeneities on the postpeak behavior process were investigated. The numerical results are in good agreement with experimental data.
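
    A minimal sketch of the Weibull-fitting step mentioned in the abstract, applied to simulated Young's-modulus values; the parameter values, and the use of the Weibull shape parameter as a simple heterogeneity measure, are assumptions for illustration rather than the paper's calibrated model:

    import numpy as np
    from scipy import stats

    # Simulated Young's moduli (GPa) of mineral grains in a heterogeneous rock.
    rng = np.random.default_rng(6)
    E = rng.weibull(4.0, size=500) * 60.0   # shape 4, scale 60 GPa

    # Fit a two-parameter Weibull (location fixed at zero) to the data.
    shape, loc, scale = stats.weibull_min.fit(E, floc=0)
    print(f"Weibull shape m = {shape:.2f}, scale E0 = {scale:.1f} GPa")

    # A lower shape parameter means a wider modulus spread, i.e. a more
    # heterogeneous material (one possible heterogeneity index).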

  9. Sample Based Unit Liter Dose Estimates

    International Nuclear Information System (INIS)

    JENSEN, L.

    1999-01-01

    The Tank Waste Characterization Program has taken many core samples, grab samples, and auger samples from the single-shell and double-shell tanks during the past 10 years. Consequently, the amount of sample data available has increased, both in terms of quantity of sample results and the number of tanks characterized. More and better data is available than when the current radiological and toxicological source terms used in the Basis for Interim Operation (BIO) (FDH 1999) and the Final Safety Analysis Report (FSAR) (FDH 1999) were developed. The Nuclear Safety and Licensing (NS and L) organization wants to use the new data to upgrade the radiological and toxicological source terms used in the BIO and FSAR. The NS and L organization requested assistance in developing a statistically based process for developing the source terms. This report describes the statistical techniques used and the assumptions made to support the development of a new radiological source term for liquid and solid wastes stored in single-shell and double-shell tanks

  10. Visual Sample Plan Version 7.0 User's Guide

    Energy Technology Data Exchange (ETDEWEB)

    Matzke, Brett D. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Newburn, Lisa LN [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hathaway, John E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bramer, Lisa M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wilson, John E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Dowson, Scott T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Sego, Landon H. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Pulsipher, Brent A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2014-03-01

    User's guide for VSP 7.0 This user's guide describes Visual Sample Plan (VSP) Version 7.0 and provides instructions for using the software. VSP selects the appropriate number and location of environmental samples to ensure that the results of statistical tests performed to provide input to risk decisions have the required confidence and performance. VSP Version 7.0 provides sample-size equations or algorithms needed by specific statistical tests appropriate for specific environmental sampling objectives. It also provides data quality assessment and statistical analysis functions to support evaluation of the data and determine whether the data support decisions regarding sites suspected of contamination. The easy-to-use program is highly visual and graphic. VSP runs on personal computers with Microsoft Windows operating systems (XP, Vista, Windows 7, and Windows 8). Designed primarily for project managers and users without expertise in statistics, VSP is applicable to two- and three-dimensional populations to be sampled (e.g., rooms and buildings, surface soil, a defined layer of subsurface soil, water bodies, and other similar applications) for studies of environmental quality. VSP is also applicable for designing sampling plans for assessing chem/rad/bio threat and hazard identification within rooms and buildings, and for designing geophysical surveys for unexploded ordnance (UXO) identification.

  11. Ash contents of foodstuff samples in environmental radioactivity analysis

    International Nuclear Information System (INIS)

    Oikawa, Shinji; Ohta, Hiroshi; Hayano, Kazuhiko; Nonaka, Nobuhiro

    2004-01-01

    Statistical data on the ash content of various environmental samples, obtained from an environmental radioactivity survey project commissioned by the Japanese government's Science and Technology Agency (at present the Ministry of Education, Culture, Sports, Science and Technology) during the past 10 years, are presented with a view to establishing a standard ash content for environmental samples in radioactivity analysis. The ash content of several kinds of environmental samples, such as dietary food, milk, Japanese radish, spinach, fish, green tea and potato, was reviewed from statistical and stochastic viewpoints. For all of the samples reviewed in this paper, the coefficient of variation varied from 4.7% for milk to 36.3% for cabbage. More than 1900 dietary food samples and more than 1400 milk samples were reviewed. In particular, the ash content of dietary food depended mainly on the dietary culture of the period; nevertheless, it showed an almost invariant distribution, with a coefficient of variation within 18.7%, over the past 10 years. Pretreatment of environmental samples, especially the ashing process, is important in environmental radioactivity analysis, which is a special field of analytical chemistry. The statistically reviewed data obtained in this paper may be useful for sample preparation. (author)

  12. A Validity Study: Attitudes towards Statistics among Japanese College Students

    Science.gov (United States)

    Satake, Eike

    2015-01-01

    This cross-cultural study investigated the relationship between attitudes toward statistics (ATS) and course achievement (CA) among Japanese college students. The sample consisted of 135 male and 134 female students from the first two-year liberal arts program of a four-year college in Tokyo, Japan. Attitudes about statistics were measured using…

  13. Log-concave Probability Distributions: Theory and Statistical Testing

    DEFF Research Database (Denmark)

    An, Mark Yuing

    1996-01-01

    This paper studies the broad class of log-concave probability distributions that arise in economics of uncertainty and information. For univariate, continuous, and log-concave random variables we prove useful properties without imposing the differentiability of density functions. Discrete and multivariate distributions are also discussed. We propose simple non-parametric testing procedures for log-concavity. The test statistics are constructed to test one of the two implications of log-concavity: increasing hazard rates and the new-is-better-than-used (NBU) property. The tests for increasing hazard rates are based on normalized spacings of the sample order statistics. The tests for the NBU property fall into the category of Hoeffding's U-statistics...

  14. THE SAMPLING PROCESS IN THE FINANCIAL AUDIT .TECHNICAL PRACTICE APPROACH

    Directory of Open Access Journals (Sweden)

    Cardos Vasile-Daniel

    2014-12-01

    “Audit sampling” involves applying audit procedures to less than 100% of the elements within an account balance or a transaction class, such that all sampling units have a chance of selection. This allows the auditor to obtain and evaluate audit evidence about some characteristics of the selected elements, in order to form, or assist in forming, a conclusion concerning the population from which the sample was drawn. Sampling in audit can use either a statistical or a non-statistical approach. (INTERNATIONAL STANDARD ON AUDITING 530 – AUDIT SAMPLING AND OTHER SELECTIVE TESTING PROCEDURES)

  15. THE SAMPLING PROCESS IN THE FINANCIAL AUDIT .TECHNICAL PRACTICE APPROACH

    Directory of Open Access Journals (Sweden)

    GRIGORE MARIAN

    2014-07-01

    “Audit sampling” involves applying audit procedures to less than 100% of the elements within an account balance or a transaction class, such that all sampling units have a chance of selection. This allows the auditor to obtain and evaluate audit evidence about some characteristics of the selected elements, in order to form, or assist in forming, a conclusion concerning the population from which the sample was drawn. Sampling in audit can use either a statistical or a non-statistical approach. (INTERNATIONAL STANDARD ON AUDITING 530 – AUDIT SAMPLING AND OTHER SELECTIVE TESTING PROCEDURES)

  16. Statistical Sampling Handbook for Student Aid Programs: A Reference for Non-Statisticians. Winter 1984.

    Science.gov (United States)

    Office of Student Financial Assistance (ED), Washington, DC.

    A manual on sampling is presented to assist audit and program reviewers, project officers, managers, and program specialists of the U.S. Office of Student Financial Assistance (OSFA). For each of the following types of samples, definitions and examples are provided, along with information on advantages and disadvantages: simple random sampling,…

  17. The (mis)reporting of statistical results in psychology journals.

    Science.gov (United States)

    Bakker, Marjan; Wicherts, Jelte M

    2011-09-01

    In order to study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers' expectations. We classified the most common errors and contacted authors to shed light on the origins of the errors.
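
    A consistency check of the kind used in such studies can be reproduced in a few lines: recompute the p-value implied by a reported test statistic and its degrees of freedom and compare it with the reported p-value. The function name, tolerance and example figures below are illustrative assumptions, not values from the article:

    from scipy import stats

    def check_t_report(t_value, df, reported_p, alpha=0.05, tol=0.01):
        """Recompute the two-sided p-value implied by a reported t statistic and
        its degrees of freedom, and compare it with the reported p-value."""
        p = 2.0 * stats.t.sf(abs(t_value), df)
        inconsistent = abs(p - reported_p) > tol              # beyond rounding error
        decision_error = (p < alpha) != (reported_p < alpha)  # significance flips
        return round(p, 4), inconsistent, decision_error

    # Example: a report of t(28) = 2.20, p = .02 actually implies p ~ .036.
    print(check_t_report(2.20, 28, 0.02))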

  18. Statistical and physical evolution of QSO's

    International Nuclear Information System (INIS)

    Caditz, D.; Petrosian, V.

    1989-09-01

    The relationship between the physical evolution of discrete extragalactic sources, the statistical evolution of the observed population of sources, and the cosmological model is discussed. Three simple forms of statistical evolution: pure luminosity evolution (PLE), pure density evolution (PDE), and generalized luminosity evolution (GLE), are considered in detail together with what these forms imply about the physical evolution of individual sources. Two methods are used to analyze the statistical evolution of the observed distribution of QSO's (quasars) from combined flux limited samples. It is shown that both PLE and PDE are inconsistent with the data over the redshift range 0 less than z less than 2.2, and that a more complicated form of evolution such as GLE is required, independent of the cosmological model. This result is important for physical models of AGN, and in particular, for the accretion disk model which recent results show may be inconsistent with PLE

  19. 222Rn in water: A comparison of two sample collection methods and two sample transport methods, and the determination of temporal variation in North Carolina ground water

    International Nuclear Information System (INIS)

    Hightower, J.H. III

    1994-01-01

    Objectives of this field experiment were: (1) determine whether there was a statistically significant difference between the radon concentrations of samples collected by EPA's standard method, using a syringe, and an alternative, slow-flow method; (2) determine whether there was a statistically significant difference between the measured radon concentrations of samples mailed vs samples not mailed; and (3) determine whether there was a temporal variation of water radon concentration over a 7-month period. The field experiment was conducted at 9 sites, 5 private wells, and 4 public wells, at various locations in North Carolina. Results showed that a syringe is not necessary for sample collection, there was generally no significant radon loss due to mailing samples, and there was statistically significant evidence of temporal variations in water radon concentrations

  20. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    Science.gov (United States)

    Ozturk, Elif

    2012-01-01

    The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…

  1. Two sampling techniques for game meat

    OpenAIRE

    van der Merwe, Maretha; Jooste, Piet J.; Hoffman, Louw C.; Calitz, Frikkie J.

    2013-01-01

    A study was conducted to compare the excision sampling technique used by the export market and the sampling technique preferred by European countries, namely the biotrace cattle and swine test. The measuring unit for the excision sampling was grams (g) and square centimetres (cm2) for the swabbing technique. The two techniques were compared after a pilot test was conducted on spiked approved beef carcasses (n = 12) that statistically proved the two measuring units correlated. The two sampling...

  2. Statistical thermodynamics

    International Nuclear Information System (INIS)

    Lim, Gyeong Hui

    2008-03-01

    This book consists of 15 chapters, which are basic conception and meaning of statistical thermodynamics, Maxwell-Boltzmann's statistics, ensemble, thermodynamics function and fluctuation, statistical dynamics with independent particle system, ideal molecular system, chemical equilibrium and chemical reaction rate in ideal gas mixture, classical statistical thermodynamics, ideal lattice model, lattice statistics and nonideal lattice model, imperfect gas theory on liquid, theory on solution, statistical thermodynamics of interface, statistical thermodynamics of a high molecule system and quantum statistics

  3. A statistical procedure for the qualification of indoor dust

    International Nuclear Information System (INIS)

    Scapin, Valdirene O.; Scapin, Marcos A.; Ribeiro, Andreza P.; Sato, Ivone M.

    2009-01-01

    Advances in materials science have contributed to humanity; notwithstanding, serious environmental and human health problems are often observed. Many researchers worldwide have therefore focused their work on diagnosing, assessing and monitoring several environmental systems. In this work, a statistical procedure (at the 0.05 significance level) that allows verifying whether indoor dust samples have the characteristics of soil/sediment is presented. Dust samples were collected from 69 residences using a domestic vacuum cleaner in four neighborhoods of the Sao Paulo metropolitan region, Brazil, between 2006 and 2008. The samples were sieved into the fractions of 150-75 (C), 75-63 (M) and <63 μm (F). The elemental concentrations were determined by X-ray fluorescence (WDXRF). Afterwards, the indoor sample results (group A) were compared to a group of 109 certified reference materials, which included different kinds of geological matrices such as clay, sediment, sand and sludge (group B), and to continental crust values (group C). Initially, the Al/Si ratio was calculated for each group (A, B, C). Analysis of variance (ANOVA), followed by the Tukey test, was used to find out whether there was a significant difference between the concentration means of the considered groups. According to the statistical tests, group B presented results that are considered different from the others. The interquartile range (IQR) was used to detect outlier values. ANOVA was applied again and the results (p ≥ 0.05) showed equality between the ratio means of the three groups. Accordingly, the results suggest that the indoor dust samples have the characteristics of soil/sediment. The statistical procedure may be used as a tool to clarify information about contaminants in dust samples, since these have the characteristics of soil and may be compared with values reported by environmental control organisms. (author)
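
    A minimal sketch of the ANOVA-plus-Tukey step described above, run on synthetic Al/Si ratios for three groups; the numbers are assumed for illustration and the Tukey HSD call requires SciPy 1.8 or later:

    import numpy as np
    from scipy import stats

    # Synthetic Al/Si ratios for indoor dust (A), reference soils/sediments (B)
    # and continental crust values (C).
    rng = np.random.default_rng(7)
    group_a = rng.normal(0.30, 0.05, size=25)
    group_b = rng.normal(0.36, 0.05, size=25)
    group_c = rng.normal(0.31, 0.05, size=25)

    # One-way ANOVA on the three groups of ratios.
    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

    # Tukey's HSD post-hoc comparison identifies which group differs.
    print(stats.tukey_hsd(group_a, group_b, group_c))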

  4. Statistical Characterization of Dispersed Single-Wall Carbon Nanotube Quantum Dots

    International Nuclear Information System (INIS)

    Shimizu, M; Moriyama, S; Suzuki, M; Fuse, T; Homma, Y; Ishibashi, K

    2006-01-01

    Quantum dots have been fabricated in single-wall carbon nanotubes (SWCNTs) simply by depositing metallic contacts on top of them. The fabricated quantum dots show different characteristics from sample to sample, which are even different in samples fabricated in the same chip. In this report, we study the statistical variations of the quantum dots fabricated with our method, and suggest their possible origin

  5. [Statistical study of the incidence of agenesis in a sample of 1529 subjects].

    Science.gov (United States)

    Lo Muzio, L; Mignogna, M D; Bucci, P; Sorrentino, F

    1989-09-01

    Following a short review of the main aetiopathogenetic theories on dental agenesis, a personal statistical study of this pathology is reported. 1529 orthopantomographs of juveniles aged between 7 and 14 were examined. 79 cases of hypodontia were observed (5.2%), 32 in males (4.05%) and 47 in females (6.78%). The most frequently affected tooth was the second premolar, with an incidence of 58.9%, followed by the lateral incisor, with an incidence of 26.38%. This is in agreement with the international literature.

  6. Statistics for Learning Genetics

    Science.gov (United States)

    Charles, Abigail Sheena

    This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and more directly, students' perceptions of, and performance in, doing statistically-based genetics problems. This issue is at the emerging edge of modern college-level genetics instruction, and this study attempts to identify key theoretical components for creating a specialized biological statistics curriculum. The goal of this curriculum will be to prepare biology students with the skills for assimilating quantitatively-based genetic processes, increasingly at the forefront of modern genetics. To fulfill this, two college level classes at two universities were surveyed. One university was located in the northeastern US and the other in the West Indies. There was a sample size of 42 students and a supplementary interview was administered to a select 9 students. Interviews were also administered to professors in the field in order to gain insight into the teaching of statistics in genetics. Key findings indicated that students had very little to no background in statistics (55%). Although students did perform well on exams with 60% of the population receiving an A or B grade, 77% of them did not offer good explanations on a probability question associated with the normal distribution provided in the survey. The scope and presentation of the applicable statistics/mathematics in some of the most used textbooks in genetics teaching, as well as genetics syllabi used by instructors do not help the issue. It was found that the text books, often times, either did not give effective explanations for students, or completely left out certain topics. The omission of certain statistical/mathematical oriented topics was seen to be also true with the genetics syllabi reviewed for this study. Nonetheless

  7. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    Science.gov (United States)

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

    Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.

  8. [Statistics for statistics?--Thoughts about psychological tools].

    Science.gov (United States)

    Berger, Uwe; Stöbel-Richter, Yve

    2007-12-01

    Statistical methods occupy a prominent place in psychologists' educational programs. Because these contents are known to be difficult to understand and hard to learn, students fear them. Those who do not aspire to a research career at a university will quickly forget the drilled contents. Furthermore, because at first glance it does not apply to work with patients and other target groups, methodological education as a whole has often been questioned. For many psychological practitioners, statistical education makes sense only as a way of commanding respect from other professions, namely physicians. For their own practice, statistics is rarely taken seriously as a professional tool. The reason seems clear: statistics treats numbers, while psychotherapy treats subjects. So, is statistics an end in itself? With this article, we try to answer the question of whether and how statistical methods are represented within psychotherapeutic and psychological research. We therefore analyzed 46 Originals of a complete volume of the journal Psychotherapy, Psychosomatics, Psychological Medicine (PPmP). Within the volume, 28 different analysis methods were applied, of which 89 per cent were directly based upon statistics. Being able to write and critically read Originals, as a backbone of research, presumes a high degree of statistical education. To ignore statistics means to ignore research and, not least, to expose one's own professional work to arbitrariness.

  9. Notes on the MUF-D statistic

    International Nuclear Information System (INIS)

    Picard, R.R.

    1987-01-01

    Many aspects of the MUF-D statistic, used for verification of accountability data, have been examined in the safeguards literature. In this paper, basic MUF-D results are extended to more general environments than are usually considered. These environments include arbitrary measurement error structures, various sampling regimes that could be imposed by the inspectorate, and the attributes/variables framework

  10. The thresholds for statistical and clinical significance

    DEFF Research Database (Denmark)

    Jakobsen, Janus Christian; Gluud, Christian; Winkel, Per

    2014-01-01

    BACKGROUND: Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore ... of the probability that a given trial result is compatible with a 'null' effect (corresponding to the P-value) divided by the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance ...

  11. Statistical analysis of temperature data sampled at Station-M in the Norwegian Sea

    Science.gov (United States)

    Lorentzen, Torbjørn

    2014-02-01

    The paper analyzes sea temperature data sampled at Station-M in the Norwegian Sea. The data cover the period 1948-2010. The following questions are addressed: What type of stochastic process characterizes the temperature series? Are there any changes or patterns which indicate climate change? Are there any characteristics in the data which can be linked to the shrinking sea-ice in the Arctic area? Can the series be modeled consistently and applied in forecasting of the future sea temperature? The paper applies the following methods: Augmented Dickey-Fuller tests for testing of unit-root and stationarity, ARIMA-models in univariate modeling, cointegration and error-correcting models are applied in estimating short- and long-term dynamics of non-stationary series, Granger-causality tests in analyzing the interaction pattern between the deep and upper layer temperatures, and simultaneous equation systems are applied in forecasting future temperature. The paper shows that temperature at 2000 m Granger-causes temperature at 150 m, and that the 2000 m series can represent an important information carrier of the long-term development of the sea temperature in the geographical area. Descriptive statistics shows that the temperature level has been on a positive trend since the beginning of the 1980s which is also measured in most of the oceans in the North Atlantic. The analysis shows that the temperature series are cointegrated which means they share the same long-term stochastic trend and they do not diverge too far from each other. The measured long-term temperature increase is one of the factors that can explain the shrinking summer sea-ice in the Arctic region. The analysis shows that there is a significant negative correlation between the shrinking sea ice and the sea temperature at Station-M. The paper shows that the temperature forecasts are conditioned on the properties of the stochastic processes, causality pattern between the variables and specification of model
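
    Two of the steps described above, the Augmented Dickey-Fuller unit-root test and the Granger-causality test, can be sketched with statsmodels on synthetic stand-in series; the data below are simulated so that the deep series leads the upper one, and the lag choice and series construction are assumptions for illustration only:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller, grangercausalitytests

    # Synthetic stand-ins for annual temperatures at 150 m and 2000 m depth.
    rng = np.random.default_rng(8)
    n = 63                                      # e.g. 1948-2010
    deep = np.cumsum(rng.normal(0.0, 0.05, n)) + 0.005 * np.arange(n)
    upper = 0.6 * np.roll(deep, 2) + rng.normal(0.0, 0.05, n)
    upper[:2] = upper[2]                        # patch the roll artefact

    df = pd.DataFrame({"t150": upper, "t2000": deep})

    # Augmented Dickey-Fuller test: a large p-value is consistent with a unit root.
    for col in df:
        stat, pval = adfuller(df[col])[:2]
        print(f"ADF {col}: stat = {stat:.2f}, p = {pval:.3f}")

    # Does t2000 Granger-cause t150? Columns are ordered (effect, cause); the
    # call prints per-lag test output and returns a dictionary of results.
    res = grangercausalitytests(df[["t150", "t2000"]], maxlag=3)
    print({lag: round(r[0]["ssr_ftest"][1], 4) for lag, r in res.items()})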

  12. Public and patient involvement in quantitative health research: A statistical perspective.

    Science.gov (United States)

    Hannigan, Ailish

    2018-06-19

    The majority of studies included in recent reviews of impact for public and patient involvement (PPI) in health research had a qualitative design. PPI in solely quantitative designs is underexplored, particularly its impact on statistical analysis. Statisticians in practice have a long history of working in both consultative (indirect) and collaborative (direct) roles in health research, yet their perspective on PPI in quantitative health research has never been explicitly examined. To explore the potential and challenges of PPI from a statistical perspective at distinct stages of quantitative research, that is sampling, measurement and statistical analysis, distinguishing between indirect and direct PPI. Statistical analysis is underpinned by having a representative sample, and a collaborative or direct approach to PPI may help achieve that by supporting access to and increasing participation of under-represented groups in the population. Acknowledging and valuing the role of lay knowledge of the context in statistical analysis and in deciding what variables to measure may support collective learning and advance scientific understanding, as evidenced by the use of participatory modelling in other disciplines. A recurring issue for quantitative researchers, which reflects quantitative sampling methods, is the selection and required number of PPI contributors, and this requires further methodological development. Direct approaches to PPI in quantitative health research may potentially increase its impact, but the facilitation and partnership skills required may require further training for all stakeholders, including statisticians. © 2018 The Authors Health Expectations published by John Wiley & Sons Ltd.

  13. Double asymptotics for the chi-square statistic.

    Science.gov (United States)

    Rempała, Grzegorz A; Wesołowski, Jacek

    2016-12-01

    Consider the distributional limit of the Pearson chi-square statistic when the number of classes mₙ increases with the sample size n and [Formula: see text]. Under mild moment conditions, the limit is Gaussian for λ = ∞, Poisson for finite λ > 0, and degenerate for λ = 0.
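
    The two qualitative regimes stated in this record can be explored by simulation. The sketch below uses arbitrary choices of n and the number of classes (the paper's exact growth condition is elided in the abstract): a dense table whose standardized statistic looks Gaussian, and a very sparse table whose statistic is governed by a small, Poisson-like number of colliding observations.

        # Illustration of dense vs. very sparse regimes for the Pearson statistic
        # (equiprobable classes, synthetic draws; n and m chosen arbitrarily).
        import numpy as np

        rng = np.random.default_rng(1)

        # Dense regime: many observations per class; standardized statistic is near-Gaussian.
        n, m, reps = 10_000, 50, 4000
        z = np.empty(reps)
        for r in range(reps):
            counts = np.bincount(rng.integers(0, m, size=n), minlength=m)
            stat = np.sum((counts - n / m) ** 2 / (n / m))
            z[r] = (stat - (m - 1)) / np.sqrt(2 * (m - 1))
        skew = float(np.mean((z - z.mean()) ** 3) / z.std() ** 3)
        print(f"dense regime: mean {z.mean():.2f}, std {z.std():.2f}, skewness {skew:.2f}")

        # Very sparse regime: far more classes than observations; behaviour is driven by
        # the few classes that receive two or more observations (Poisson-like counts).
        n, m = 100, 20_000
        collisions = np.empty(reps)
        for r in range(reps):
            counts = np.bincount(rng.integers(0, m, size=n), minlength=m)
            collisions[r] = np.sum(counts >= 2)
        print(f"sparse regime: collision mean {collisions.mean():.3f}, "
              f"variance {collisions.var():.3f} (approximately equal, as for a Poisson law)")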

  14. Correlated statistical uncertainties in coded-aperture imaging

    International Nuclear Information System (INIS)

    Fleenor, Matthew C.; Blackston, Matthew A.; Ziock, Klaus P.

    2015-01-01

    In nuclear security applications, coded-aperture imagers can provide a wealth of information regarding the attributes of both the radioactive and nonradioactive components of the objects being imaged. However, for optimum benefit to the community, spatial attributes need to be determined in a quantitative and statistically meaningful manner. To address a deficiency of quantifiable errors in coded-aperture imaging, we present uncertainty matrices containing covariance terms between image pixels for MURA mask patterns. We calculated these correlated uncertainties as functions of variation in mask rank, mask pattern over-sampling, and whether or not anti-mask data are included. Utilizing simulated point source data, we found that correlations arose when two or more image pixels were summed. Furthermore, we found that the presence of correlations was heightened by the process of over-sampling, while correlations were suppressed by the inclusion of anti-mask data and with increased mask rank. As an application of this result, we explored how statistics-based alarming is impacted in a radiological search scenario

  15. Progressive statistics for studies in sports medicine and exercise science.

    Science.gov (United States)

    Hopkins, William G; Marshall, Stephen W; Batterham, Alan M; Hanin, Juri

    2009-01-01

    Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.

  16. Sampling Transition Pathways in Highly Correlated Complex Systems

    Energy Technology Data Exchange (ETDEWEB)

    Chandler, David

    2004-10-20

    This research grant supported my group's efforts to apply and extend the method of transition path sampling that we invented during the late 1990s. This methodology is based upon a statistical mechanics of trajectory space. Traditional statistical mechanics focuses on state space, and with it, one can use Monte Carlo methods to facilitate importance sampling of states. With our formulation of a statistical mechanics of trajectory space, we have succeeded in creating algorithms by which importance sampling can be done for dynamical processes. In particular, we are able to study rare but important events without prior knowledge of transition states or mechanisms. In perhaps the most impressive application of transition path sampling, my group combined forces with Michele Parrinello and his coworkers to unravel the dynamics of autoionization of water [5]. This dynamics is the fundamental kinetic step of pH. Other applications concern the nature of dynamics far from equilibrium [1, 7], nucleation processes [2], cluster isomerization, melting and dissociation [3, 6], and molecular motors [10]. Research groups throughout the world are adopting transition path sampling. In part this has been the result of our efforts to provide pedagogical presentations of the technique [4, 8, 9], as well as providing new procedures for interpreting trajectories of complex systems [11].

  17. Statistical physics of medical ultrasonic images

    International Nuclear Information System (INIS)

    Wagner, R.F.; Insana, M.F.; Brown, D.G.; Smith, S.W.

    1987-01-01

    The physical and statistical properties of backscattered signals in medical ultrasonic imaging are reviewed in terms of: 1) the radiofrequency signal; 2) the envelope (video or magnitude) signal; and 3) the density of samples in simple and in compounded images. There is a wealth of physical information in backscattered signals in medical ultrasound. This information is contained in the radiofrequency spectrum - which is not typically displayed to the viewer - as well as in the higher statistical moments of the envelope or video signal - which are not readily accessed by the human viewer of typical B-scans. This information may be extracted from the detected backscattered signals by straightforward signal processing techniques at low resolution

  18. Texture classification by texton: statistical versus binary.

    Directory of Open Access Journals (Sweden)

    Zhenhua Guo

    Full Text Available Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, there are two limitations when using these methods. First, they need a training stage to build a texton library, so the recognition accuracy is highly dependent on the training samples; second, during feature extraction, each local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library is large and the feature dimension is high. To address these two issues, three binary texton counterpart methods are proposed in this paper: Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local features into a binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that the binary textons achieve sound results with fast feature extraction, especially when the image size is not large and the image quality is not poor.

  19. Analysis of stress corrosion data by means of the statistic of extreme values

    International Nuclear Information System (INIS)

    Imarisio, G.; Lanza, F.

    1978-01-01

    The possibility of examining stress corrosion by means of extreme value statistics is proposed. A series of tests in boiling MgCl₂ on samples made of AISI 304 has been performed. The evolution of crack dimensions and the lifetime of the samples were followed. It has been shown that the dimensions of the maximum cracks on samples corroded for different times can be organized following the statistics of extreme values. The lifetime of the samples can also be treated in the same way. A confirmation has been obtained using data taken from the literature. Possible uses of predictions obtained with this type of analysis are underlined. An extension of the approach toward less corrosive media and samples of different volumes is suggested to check the validity of the method
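
    The extreme-value treatment sketched here amounts to fitting an extreme-value (Gumbel) distribution to the maximum crack size observed on each corroded sample. A minimal illustration with scipy follows; the crack-depth values are invented placeholders, not the AISI 304 data.

        # Fit a Gumbel distribution to per-sample maximum crack depths (invented values)
        # and extrapolate to a high quantile, as in a standard extreme-value analysis.
        import numpy as np
        from scipy import stats

        max_crack_depth = np.array([42.0, 55.5, 47.3, 61.2, 50.8, 58.1, 44.9, 66.4, 52.7, 49.0])

        loc, scale = stats.gumbel_r.fit(max_crack_depth)   # location and scale of the fitted Gumbel
        print(f"Gumbel fit: location {loc:.1f}, scale {scale:.1f}")

        # Crack depth expected to be exceeded in only 1% of samples under the fitted model
        print("99th-percentile crack depth:", round(float(stats.gumbel_r.ppf(0.99, loc, scale)), 1))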

  20. Building a stronger child dental health system in Australia: statistical sampling masks the burden of dental disease distribution in Australian children.

    Science.gov (United States)

    Tennant, M; Kruger, E

    2014-01-01

    In Australia, over the past 30 years, the prevalence of dental decay in children has reduced significantly; today 60-70% of all 12-year-olds are caries free, and only 10% of children have more than two decayed teeth. However, many studies continue to report a small but significant subset of children suffering severe levels of decay. The present study applies Monte Carlo simulation to examine, at the national level, decayed, missing or filled teeth in 12-year-olds, and to shed light on both the statistical limitations of Australia's reporting to date and the problem of targeting high-risk children. A simulation for 273 000 Australian 12-year-old children found that moving between different levels of geographic clustering produced different statistical influences that drive different conclusions. At the coarse scale (i.e. state level) the gross averaging of the non-normally distributed disease burden masks the small subset of disease-bearing children. At a much higher acuity of analysis (i.e. local government area) the risk of low numbers in the sample becomes a significant issue. The results clearly highlight, first, the importance of care when examining the existing data and, second, opportunities for far greater targeting of services to children in need. The sustainability (and fairness) of universal coverage systems needs to be examined to ensure they remain highly targeted at disease burden, and not just focused on the children who are easy to reach (and suffer the least disease).
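
    The masking effect described above, a skewed disease burden hidden by coarse averaging and unstable small-area estimates at fine scales, can be reproduced in miniature with a Monte Carlo sketch; the mixture parameters below are invented and are not the Australian DMFT data.

        # Monte Carlo sketch: a skewed caries burden averaged at different geographic scales
        # (invented mixture parameters, 273 000 simulated children).
        import numpy as np

        rng = np.random.default_rng(2)
        n_children = 273_000

        severe = rng.random(n_children) < 0.10                    # small high-burden subset
        dmft = np.where(severe, rng.poisson(6.0, n_children), rng.poisson(0.2, n_children))

        # National-scale averaging: a single mean hides the high-burden subset
        print("mean DMFT:", round(float(dmft.mean()), 2))
        print("share of children with more than two affected teeth:",
              round(float((dmft > 2).mean()), 3))

        # Small-area estimates: with few children per area, the means become unstable
        area = rng.integers(0, 5000, size=n_children)             # roughly 55 children per area
        area_means = (np.bincount(area, weights=dmft, minlength=5000)
                      / np.bincount(area, minlength=5000))
        print("small-area means range from %.2f to %.2f" % (area_means.min(), area_means.max()))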

  1. An introduction to Bayesian statistics in health psychology.

    Science.gov (United States)

    Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske

    2017-09-01

    The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.
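
    As a small, self-contained illustration of the prior-to-posterior logic the article introduces, the sketch below performs a conjugate normal update for a mean blood-pressure change; the prior, the data and the known-variance assumption are all placeholders, not the article's actual example or software.

        # Conjugate normal-normal update for a mean change in blood pressure (illustrative values).
        import numpy as np

        y = np.array([8.1, 12.4, 9.7, 15.2, 7.3, 11.0, 10.5, 13.8])   # hypothetical changes (mmHg)
        sigma = 4.0                                                   # assumed known measurement SD

        prior_mean, prior_sd = 0.0, 10.0       # weakly informative prior on the mean change

        n = y.size
        post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)
        post_mean = post_var * (prior_mean / prior_sd**2 + y.sum() / sigma**2)

        print(f"posterior mean {post_mean:.1f} mmHg, posterior SD {np.sqrt(post_var):.1f}")
        print("95% credible interval:",
              np.round(post_mean + 1.96 * np.sqrt(post_var) * np.array([-1.0, 1.0]), 1))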

  2. Calculating statistical distributions from operator relations: The statistical distributions of various intermediate statistics

    International Nuclear Information System (INIS)

    Dai, Wu-Sheng; Xie, Mi

    2013-01-01

    In this paper, we give a general discussion on the calculation of the statistical distribution from a given operator relation of creation, annihilation, and number operators. Our result shows that as long as the relation between the number operator and the creation and annihilation operators can be expressed as a†b = Λ(N) or N = Λ⁻¹(a†b), where N, a†, and b denote the number, creation, and annihilation operators, i.e., N is a function of the quadratic product of the creation and annihilation operators, the corresponding statistical distribution is the Gentile distribution, a statistical distribution in which the maximum occupation number is an arbitrary integer. As examples, we discuss the statistical distributions corresponding to various operator relations. In particular, besides the Bose–Einstein and Fermi–Dirac cases, we discuss the statistical distributions for various schemes of intermediate statistics, especially various q-deformation schemes. Our result shows that the statistical distributions corresponding to various q-deformation schemes are various Gentile distributions with different maximum occupation numbers which are determined by the deformation parameter q. This result shows that the results given in much of the literature on the q-deformation distribution are inaccurate or incomplete. -- Highlights: ► A general discussion on calculating the statistical distribution from relations of creation, annihilation, and number operators. ► A systematic study of the statistical distributions corresponding to various q-deformation schemes. ► Arguing that many results on q-deformation distributions in the literature are inaccurate or incomplete

  3. Application of mathematical statistics methods to study fluorite deposits

    International Nuclear Information System (INIS)

    Chermeninov, V.B.

    1980-01-01

    The applicability of mathematical-statistical methods for increasing the reliability of sampling and for geological tasks (the study of regularities of ore formation) is considered. The reliability of core sampling (with regard to the selective abrasion of fluorite) is compared with that of neutron activation logging for fluorine. The core sampling data are characterized by higher dispersion than the neutron activation logging results (the mean values of the variation coefficients are 75% and 56%, respectively). However, the hypothesis of equality of the averages of the two samplings is confirmed; this fact testifies to the absence of considerable variability of the ore bodies

  4. Fungi identify the geographic origin of dust samples.

    Directory of Open Access Journals (Sweden)

    Neal S Grantham

    Full Text Available There is a long history of archaeologists and forensic scientists using pollen found in a dust sample to identify its geographic origin or history. Such palynological approaches have important limitations as they require time-consuming identification of pollen grains, a priori knowledge of plant species distributions, and a sufficient diversity of pollen types to permit spatial or temporal identification. We demonstrate an alternative approach based on DNA sequencing analyses of the fungal diversity found in dust samples. Using nearly 1,000 dust samples collected from across the continental U.S., our analyses identify up to 40,000 fungal taxa from these samples, many of which exhibit a high degree of geographic endemism. We develop a statistical learning algorithm via discriminant analysis that exploits this geographic endemicity in the fungal diversity to correctly identify samples to within a few hundred kilometers of their geographic origin with high probability. In addition, our statistical approach provides a measure of certainty for each prediction, in contrast with current palynology methods that are almost always based on expert opinion and devoid of statistical inference. Fungal taxa found in dust samples can therefore be used to identify the origin of that dust and, more importantly, we can quantify our degree of certainty that a sample originated in a particular place. This work opens up a new approach to forensic biology that could be used by scientists to identify the origin of dust or soil samples found on objects, clothing, or archaeological artifacts.
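
    The classification step described above, discriminant analysis whose class probabilities double as a measure of certainty, can be sketched with scikit-learn as follows; the fungal-abundance matrix and region labels are random placeholders rather than the dust dataset.

        # Sketch: discriminant analysis assigning dust samples to a geographic region,
        # with class probabilities as a certainty measure (random placeholder data).
        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(3)
        n_samples, n_taxa, n_regions = 900, 200, 6
        region = rng.integers(0, n_regions, size=n_samples)
        X = rng.poisson(2.0, size=(n_samples, n_taxa)) + 0.5 * region[:, None]  # weak regional signal

        X_train, X_test, y_train, y_test = train_test_split(X, region, random_state=0)

        lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
        print("held-out accuracy:", round(lda.score(X_test, y_test), 2))

        proba = lda.predict_proba(X_test[:1])          # certainty attached to one prediction
        print("predicted region:", int(lda.predict(X_test[:1])[0]),
              "with probability", round(float(proba.max()), 2))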

  5. Statistical Process Control in a Modern Production Environment

    DEFF Research Database (Denmark)

    Windfeldt, Gitte Bjørg

    Paper 1 is aimed at practitioners to help them test the assumption that the observations in a sample are independent and identically distributed, an assumption that is essential when using classical Shewhart charts. The test can easily be performed in the control chart setup using the samples gathered here and standard statistical software. In Paper 2 a new method for process monitoring is introduced. The method uses a statistical model of the quality characteristic and a sliding window of observations to estimate the probability that the next item will not respect the specifications. If the estimated probability exceeds a pre-determined threshold the process will be stopped. The method is flexible, allowing a complexity in modeling that remains invisible to the end user. Furthermore, the method allows diagnostic plots to be built from the parameter estimates that can provide valuable insight...
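
    The monitoring idea of Paper 2, estimating from a sliding window the probability that the next item violates the specifications and stopping when that probability is too high, can be sketched as follows; the normal model of the quality characteristic, the window length and the threshold are illustrative assumptions, not the thesis' actual model.

        # Sliding-window monitoring sketch: estimate P(next item out of spec) from the last
        # w observations under an assumed normal model (illustrative parameters and data).
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(4)
        lsl, usl = 9.0, 11.0            # lower / upper specification limits
        w, threshold = 30, 0.05         # window length and stopping threshold

        x = 10.0 + 0.01 * np.arange(200) + rng.normal(scale=0.3, size=200)  # drifting process

        for t in range(w, len(x)):
            mu, sd = x[t - w:t].mean(), x[t - w:t].std(ddof=1)
            p_out = stats.norm.cdf(lsl, mu, sd) + stats.norm.sf(usl, mu, sd)
            if p_out > threshold:
                print(f"stop at observation {t}: estimated P(out of spec) = {p_out:.3f}")
                break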

  6. In vivo Comet assay--statistical analysis and power calculations of mice testicular cells.

    Science.gov (United States)

    Hansen, Merete Kjær; Sharma, Anoop Kumar; Dybdahl, Marianne; Boberg, Julie; Kulahci, Murat

    2014-11-01

    The in vivo Comet assay is a sensitive method for evaluating DNA damage. A recurrent concern is how to analyze the data appropriately and efficiently. A popular approach is to summarize the raw data into a summary statistic prior to the statistical analysis. However, consensus on which summary statistic to use has yet to be reached. Another important consideration concerns the assessment of proper sample sizes in the design of Comet assay studies. This study aims to identify a statistic suitably summarizing the % tail DNA of mice testicular samples in Comet assay studies. A second aim is to provide curves for this statistic outlining the number of animals and gels to use. The current study was based on 11 compounds administered via oral gavage in three doses to male mice: CAS no. 110-26-9, CAS no. 512-56-1, CAS no. 111873-33-7, CAS no. 79-94-7, CAS no. 115-96-8, CAS no. 598-55-0, CAS no. 636-97-5, CAS no. 85-28-9, CAS no. 13674-87-8, CAS no. 43100-38-5 and CAS no. 60965-26-6. Testicular cells were examined using the alkaline version of the Comet assay and the DNA damage was quantified as % tail DNA using a fully automatic scoring system. From the raw data 23 summary statistics were examined. A linear mixed-effects model was fitted to the summarized data and the estimated variance components were used to generate power curves as a function of sample size. The statistic that most appropriately summarized the within-sample distributions was the median of the log-transformed data, as it most consistently conformed to the assumptions of the statistical model. Power curves for 1.5-, 2-, and 2.5-fold changes of the highest dose group compared to the control group when 50 and 100 cells were scored per gel are provided to aid in the design of future Comet assay studies on testicular cells. Copyright © 2014 Elsevier B.V. All rights reserved.
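
    The recommended summary statistic, the median of the log-transformed % tail DNA within each gel, is straightforward to compute from the raw scores, as in the sketch below; the cell-level values are simulated and the small offset added before the log transform is an assumption for handling zero scores.

        # Summarise % tail DNA per gel by the median of the log-transformed scores
        # (simulated scores; a small offset is assumed so zero values can be log-transformed).
        import numpy as np

        rng = np.random.default_rng(5)
        gels = [np.clip(rng.lognormal(mean=1.0, sigma=0.8, size=100), 0, 100) for _ in range(3)]

        offset = 0.1
        summary = [float(np.median(np.log(g + offset))) for g in gels]
        print("per-gel summary statistic (median of log % tail DNA):", np.round(summary, 2))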

  7. How to construct the statistic network? An association network of herbaceous

    Directory of Open Access Journals (Sweden)

    WenJun Zhang

    2012-06-01

    Full Text Available In the present study I define a new type of network, the statistic network. The statistic network is a weighted and non-deterministic network. In the statistic network, a connection value, i.e., connection weight, represents the connection strength and connection likelihood between two nodes, and its absolute value falls in the interval (0,1]. The connection value is expressed as a statistical measure such as a correlation coefficient, association coefficient, or Jaccard coefficient, etc. In addition, all connections of the statistic network can be statistically tested for their validity. A connection is true if the connection value is statistically significant. If none of the connection values of a node are statistically significant, it is an isolated node. An isolated node has no connection to other nodes in the statistic network. Positive and negative connection values denote distinct connection types (positive or negative association or interaction). In the statistic network, two nodes with a greater connection value will show a more similar trend in the change of their states. At any time we can obtain a sample network of the statistic network. A sample network is a non-weighted and deterministic network. The statistic network, in particular the plant association network constructed from field sampling, is mostly an information network. Most of the interspecific relationships in a plant community are competition and cooperation. Therefore, in comparison to animal networks, the methodology of the statistic network is more suitable for constructing plant association networks. Some conclusions were drawn from this study: (1) in the plant association network, most connections are weak and positive interactions. The association network constructed from Spearman rank correlation has the most connections and fewer isolated taxa. From net linear correlation, linear correlation, to Spearman rank correlation, the practical number of connections and connectance in the
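
    The construction described above, keeping a weighted connection between two taxa only when the association measure is statistically significant, can be sketched with Spearman rank correlations; the abundance matrix is a random placeholder and the 0.05 significance level is an assumed choice.

        # Sketch of a statistic (association) network: taxa are nodes, and an edge carries the
        # Spearman correlation as its weight only when the correlation is statistically significant.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(6)
        n_plots, n_taxa = 60, 10
        abundance = rng.poisson(3.0, size=(n_plots, n_taxa)).astype(float)
        abundance[:, 1] += 0.8 * abundance[:, 0]       # induce one positive association

        alpha = 0.05
        edges = {}
        for i in range(n_taxa):
            for j in range(i + 1, n_taxa):
                rho, p = stats.spearmanr(abundance[:, i], abundance[:, j])
                if p < alpha:                           # keep only statistically valid connections
                    edges[(i, j)] = round(float(rho), 2)

        isolated = set(range(n_taxa)) - {k for pair in edges for k in pair}
        print("significant connections:", edges)
        print("isolated taxa:", sorted(isolated))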

  8. Efficient Parallel Statistical Model Checking of Biochemical Networks

    Directory of Open Access Journals (Sweden)

    Paolo Ballarini

    2009-12-01

    Full Text Available We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds for a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples; however, there are three key aspects that improve the efficiency. First, the sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability that P holds is based on an efficient variant of the Wilson method which ensures faster convergence. Third, the whole methodology is designed in a parallel fashion, and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
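
    Independently of any particular simulator, the sampling/verification loop with a Wilson-type confidence interval can be sketched as below: draw a trajectory, verify the property on the fly, and stop once the interval for the satisfaction probability is narrow enough. The Bernoulli stand-in for the model, the minimum sample count and the target half-width are assumptions for illustration, not the authors' algorithm.

        # Statistical model checking sketch with a Wilson confidence-interval stopping rule.
        import math
        import random

        random.seed(7)

        def simulate_and_check():
            """Placeholder for: run one stochastic simulation and verify the property on the fly."""
            return random.random() < 0.3          # toy model with satisfaction probability 0.3

        def wilson_interval(successes, n, z=1.96):
            p_hat = successes / n
            denom = 1 + z * z / n
            centre = (p_hat + z * z / (2 * n)) / denom
            half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
            return centre - half, centre + half

        target_half_width = 0.02
        successes = n = 0
        while True:
            n += 1
            successes += simulate_and_check()
            if n >= 100:
                lo, hi = wilson_interval(successes, n)
                if (hi - lo) / 2 <= target_half_width:
                    break

        print(f"estimated probability {successes / n:.3f} from {n} samples, "
              f"95% CI ({lo:.3f}, {hi:.3f})")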

  9. Rational Arithmetic Mathematica Functions to Evaluate the Two-Sided One Sample K-S Cumulative Sampling Distribution

    Directory of Open Access Journals (Sweden)

    J. Randall Brown

    2007-06-01

    Full Text Available One of the most widely used goodness-of-fit tests is the two-sided one sample Kolmogorov-Smirnov (K-S) test, which has been implemented by many computer statistical software packages. To calculate a two-sided p value (i.e., evaluate the cumulative sampling distribution), these packages use various methods including recursion formulae, limiting distributions, and approximations of unknown accuracy developed over thirty years ago. Based on an extensive literature search for the two-sided one sample K-S test, this paper identifies an exact formula for sample sizes up to 31, six recursion formulae, and one matrix formula that can be used to calculate a p value. To ensure accurate calculation by avoiding catastrophic cancellation and eliminating rounding error, each of these formulae is implemented in rational arithmetic. For the six recursion formulae and the matrix formula, computational experience for sample sizes up to 500 shows that computational times are increasing functions of both the sample size and the number of digits in the numerator and denominator integers of the rational number test statistic. The computational times of the seven formulae vary immensely but the Durbin recursion formula is almost always the fastest. Linear search is used to calculate the inverse of the cumulative sampling distribution (find the confidence interval half-width), and tables of calculated half-widths are presented for sample sizes up to 500. Using calculated half-widths as input, computational times for the fastest formula, the Durbin recursion formula, are given for sample sizes up to two thousand.

  10. A basic introduction to statistics for the orthopaedic surgeon.

    Science.gov (United States)

    Bertrand, Catherine; Van Riet, Roger; Verstreken, Frederik; Michielsen, Jef

    2012-02-01

    Orthopaedic surgeons should review the orthopaedic literature in order to keep pace with the latest insights and practices. A good understanding of basic statistical principles is of crucial importance to the ability to read articles critically, to interpret results and to arrive at correct conclusions. This paper explains some of the key concepts in statistics, including hypothesis testing, Type I and Type II errors, testing of normality, sample size and p values.

  11. The Investigation of Construct Validity of Diagnostic and Statistical Manual of Mental Disorder-5 Personality Traits on Iranian sample with Antisocial and Borderline Personality Disorders.

    Science.gov (United States)

    Amini, Mehdi; Pourshahbaz, Abbas; Mohammadkhani, Parvaneh; Ardakani, Mohammad-Reza Khodaie; Lotfi, Mozhgan

    2014-12-01

    The goal of this study was to examine the construct validity of the diagnostic and statistical manual of mental disorders-5 (DSM-5) conceptual model of antisocial and borderline personality disorders (PDs). More specifically, the aim was to determine whether the DSM-5 five-factor structure of pathological personality trait domains replicated in an independently collected sample that differs culturally from the derivation sample. This study was conducted on a sample of 346 individuals with antisocial PD (n = 122), borderline PD (n = 130), and nonclinical subjects (n = 94). Participants were randomly selected from prisoners, out-patient, and in-patient clients, and were recruited from Tehran prisoners and the clinical psychology and psychiatry clinics of Razi and Taleghani Hospital, Tehran, Iran. The SCID-II-PQ, the SCID-II and the DSM-5 Personality Trait Rating Form (Clinician's PTRF) were used to diagnose PDs and to assess pathological traits. The data were analyzed by exploratory factor analysis. Factor analysis revealed a 5-factor solution for DSM-5 personality traits. The results showed that DSM-5 has adequate construct validity in an Iranian sample with antisocial and borderline PDs. The factors were similar in number to those in other studies, but differed in content. Exploratory factor analysis revealed five homogeneous components of antisocial and borderline PDs that may represent personality, behavioral, and affective features central to the disorders. Furthermore, the present study helps in understanding the adequacy of the DSM-5 dimensional approach to the evaluation of personality pathology, specifically in an Iranian sample.

  12. The investigation of construct validity of diagnostic and statistical manual of mental disorder-5 personality traits on iranian sample with antisocial and borderline personality disorders

    Directory of Open Access Journals (Sweden)

    Mehdi Amini

    2014-01-01

    Full Text Available Background: The goal of this study was to examine the construct validity of the diagnostic and statistical manual of mental disorders-5 (DSM-5) conceptual model of antisocial and borderline personality disorders (PDs). More specifically, the aim was to determine whether the DSM-5 five-factor structure of pathological personality trait domains replicated in an independently collected sample that differs culturally from the derivation sample. Methods: This study was conducted on a sample of 346 individuals with antisocial PD (n = 122), borderline PD (n = 130), and nonclinical subjects (n = 94). Participants were randomly selected from prisoners, out-patient, and in-patient clients, and were recruited from Tehran prisoners and the clinical psychology and psychiatry clinics of Razi and Taleghani Hospital, Tehran, Iran. The SCID-II-PQ, the SCID-II and the DSM-5 Personality Trait Rating Form (Clinician's PTRF) were used to diagnose PDs and to assess pathological traits. The data were analyzed by exploratory factor analysis. Results: Factor analysis revealed a 5-factor solution for DSM-5 personality traits. The results showed that DSM-5 has adequate construct validity in an Iranian sample with antisocial and borderline PDs. The factors were similar in number to those in other studies, but differed in content. Conclusions: Exploratory factor analysis revealed five homogeneous components of antisocial and borderline PDs that may represent personality, behavioral, and affective features central to the disorders. Furthermore, the present study helps in understanding the adequacy of the DSM-5 dimensional approach to the evaluation of personality pathology, specifically in an Iranian sample.

  13. Sampling of temporal networks: Methods and biases

    Science.gov (United States)

    Rocha, Luis E. C.; Masuda, Naoki; Holme, Petter

    2017-11-01

    Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example, human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect the network structure and thus caution is necessary to generalize results based on samples. In this paper, we study four sampling strategies applied to a variety of real-life temporal networks. We quantify the biases generated by each sampling strategy on a number of relevant statistics such as link activity, temporal paths and epidemic spread. We find that some biases are common in a variety of networks and statistics, but one strategy, uniform sampling of nodes, shows improved performance in most scenarios. Given the particularities of temporal network data and the variety of network structures, we recommend that the choice of sampling methods be problem oriented to minimize the potential biases for the specific research questions on hand. Our results help researchers to better design network data collection protocols and to understand the limitations of sampled temporal network data.

  14. Use of run statistics to validate tensile tests

    International Nuclear Information System (INIS)

    Eatherly, W.P.

    1981-01-01

    In tensile testing of irradiated graphites, it is difficult to assure alignment of sample and train for tensile measurements. By recording the location of fractures, run (sequential) statistics can readily detect a lack of randomness. The technique is based on partitioning binomial distributions
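
    A concrete version of such a check is a Wald-Wolfowitz runs test on the recorded fracture locations, coded as a binary sequence (for instance, fracture near a grip versus within the gauge section); the data and coding below are hypothetical, and the runs test is a standard stand-in that may differ in detail from the binomial-partitioning technique of the record.

        # Wald-Wolfowitz runs test for randomness of a binary fracture-location sequence
        # (hypothetical coding: 1 = fracture near a grip, 0 = fracture in the gauge section).
        import math

        fractures = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1]

        n1 = sum(fractures)
        n0 = len(fractures) - n1
        runs = 1 + sum(a != b for a, b in zip(fractures, fractures[1:]))

        mean_runs = 2 * n1 * n0 / (n1 + n0) + 1
        var_runs = 2 * n1 * n0 * (2 * n1 * n0 - n1 - n0) / ((n1 + n0) ** 2 * (n1 + n0 - 1))
        z = (runs - mean_runs) / math.sqrt(var_runs)

        # |z| well above about 2 suggests a non-random fracture pattern (e.g. misalignment)
        print(f"runs = {runs}, expected {mean_runs:.1f}, z = {z:.2f}")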

  15. Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies

    International Nuclear Information System (INIS)

    Hampton, Jerrad; Doostan, Alireza

    2015-01-01

    Sampling orthogonal polynomial bases via Monte Carlo is of interest for uncertainty quantification of models with random inputs, using Polynomial Chaos (PC) expansions. It is known that bounding a probabilistic parameter, referred to as coherence, yields a bound on the number of samples necessary to identify coefficients in a sparse PC expansion via solution to an ℓ1-minimization problem. Utilizing results for orthogonal polynomials, we bound the coherence parameter for polynomials of Hermite and Legendre type under their respective natural sampling distribution. In both polynomial bases we identify an importance sampling distribution which yields a bound with weaker dependence on the order of the approximation. For more general orthonormal bases, we propose the coherence-optimal sampling: a Markov Chain Monte Carlo sampling, which directly uses the basis functions under consideration to achieve a statistical optimality among all sampling schemes with identical support. We demonstrate these different sampling strategies numerically in both high-order and high-dimensional, manufactured PC expansions. In addition, the quality of each sampling method is compared in the identification of solutions to two differential equations, one with a high-dimensional random input and the other with a high-order PC expansion. In both cases, the coherence-optimal sampling scheme leads to similar or considerably improved accuracy

  16. Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies

    Science.gov (United States)

    Hampton, Jerrad; Doostan, Alireza

    2015-01-01

    Sampling orthogonal polynomial bases via Monte Carlo is of interest for uncertainty quantification of models with random inputs, using Polynomial Chaos (PC) expansions. It is known that bounding a probabilistic parameter, referred to as coherence, yields a bound on the number of samples necessary to identify coefficients in a sparse PC expansion via solution to an ℓ1-minimization problem. Utilizing results for orthogonal polynomials, we bound the coherence parameter for polynomials of Hermite and Legendre type under their respective natural sampling distribution. In both polynomial bases we identify an importance sampling distribution which yields a bound with weaker dependence on the order of the approximation. For more general orthonormal bases, we propose the coherence-optimal sampling: a Markov Chain Monte Carlo sampling, which directly uses the basis functions under consideration to achieve a statistical optimality among all sampling schemes with identical support. We demonstrate these different sampling strategies numerically in both high-order and high-dimensional, manufactured PC expansions. In addition, the quality of each sampling method is compared in the identification of solutions to two differential equations, one with a high-dimensional random input and the other with a high-order PC expansion. In both cases, the coherence-optimal sampling scheme leads to similar or considerably improved accuracy.

  17. Normal Approximations to the Distributions of the Wilcoxon Statistics: Accurate to What "N"? Graphical Insights

    Science.gov (United States)

    Bellera, Carine A.; Julien, Marilyse; Hanley, James A.

    2010-01-01

    The Wilcoxon statistics are usually taught as nonparametric alternatives for the 1- and 2-sample Student-"t" statistics in situations where the data appear to arise from non-normal distributions, or where sample sizes are so small that we cannot check whether they do. In the past, critical values, based on exact tail areas, were…
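
    For reference, the normal approximation discussed in this record uses the exact null mean and variance of the rank-sum statistic; the sketch below computes the approximate two-sided p-value for small made-up samples and, assuming a recent scipy, compares it with the exact p-value.

        # Normal approximation to the two-sample Wilcoxon rank-sum statistic versus the exact
        # p-value from scipy (made-up, tie-free data; method="exact" assumes a recent scipy).
        import numpy as np
        from scipy import stats

        x = np.array([1.2, 2.9, 3.1, 4.5, 5.0])
        y = np.array([0.8, 1.1, 1.9, 2.2, 2.4, 2.6, 3.0])
        n1, n2 = len(x), len(y)
        N = n1 + n2

        ranks = stats.rankdata(np.concatenate([x, y]))
        W = ranks[:n1].sum()                    # rank sum of the first sample

        mean_W = n1 * (N + 1) / 2
        var_W = n1 * n2 * (N + 1) / 12
        z = (W - mean_W) / np.sqrt(var_W)
        p_normal = 2 * stats.norm.sf(abs(z))

        p_exact = stats.mannwhitneyu(x, y, alternative="two-sided", method="exact").pvalue
        print(f"W = {W:.0f}, normal-approximation p = {p_normal:.3f}, exact p = {p_exact:.3f}")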

  18. A Story-Based Simulation for Teaching Sampling Distributions

    Science.gov (United States)

    Turner, Stephen; Dabney, Alan R.

    2015-01-01

    Statistical inference relies heavily on the concept of sampling distributions. However, sampling distributions are difficult to teach. We present a series of short animations that are story-based, with associated assessments. We hope that our contribution can be useful as a tool to teach sampling distributions in the introductory statistics…
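
    A bare-bones version of such a simulation, repeatedly drawing samples from a skewed population and examining the distribution of the sample mean, is sketched below; the population, sample size and number of repetitions are arbitrary teaching choices.

        # Simulate the sampling distribution of the mean from a skewed population.
        import numpy as np

        rng = np.random.default_rng(8)
        population = rng.exponential(scale=2.0, size=100_000)      # right-skewed population

        sample_size, n_repeats = 30, 5_000
        sample_means = np.array([rng.choice(population, sample_size).mean()
                                 for _ in range(n_repeats)])

        print("population mean:", round(float(population.mean()), 2))
        print("mean of sample means:", round(float(sample_means.mean()), 2))
        print("SD of sample means (standard error):", round(float(sample_means.std(ddof=1)), 3))
        print("theory, sigma/sqrt(n):", round(float(population.std() / np.sqrt(sample_size)), 3))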

  19. Reducing the sampling frequency of groundwater monitoring wells

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, V.M.; Ridley, M.N. [Lawrence Livermore National Lab., CA (United States); Tuckfield, R.C.; Anderson, R.A. [Westinghouse, Savannah River Co., Aiken, SC (United States)

    1996-01-01

    As part of a joint LLNL/SRTC project, a methodology for selecting sampling frequencies is evolving that introduces statistical thinking and cost effectiveness into the sampling schedule selection practices now commonly employed on environmental projects. Our current emphasis is on descriptive rather than inferential statistics. Environmental monitoring data are inherently messy, being plagued by such problems as extremely high variability and left-censoring. As a result, real data often fail to meet the assumptions required for the appropriate application of many statistical methods. Rather than abandon the quantitative approach in these cases, however, the methodology employs simple statistical techniques to bring a measure of objectivity and reproducibility to the process. The techniques are applied within the framework of decision logic, which interprets the numerical results from the standpoint of chemistry-related professional judgment and the regulatory context. This paper presents the methodology's basic concepts together with early implementation results, showing the estimated cost savings. 6 refs., 3 figs.

  20. Quality of statistical reporting in developmental disability journals.

    Science.gov (United States)

    Namasivayam, Aravind K; Yan, Tina; Wong, Wing Yiu Stephanie; van Lieshout, Pascal

    2015-12-01

    Null hypothesis significance testing (NHST) dominates quantitative data analysis, but its use is controversial and has been heavily criticized. The American Psychological Association has advocated the reporting of effect sizes (ES), confidence intervals (CIs), and statistical power analysis to complement NHST results to provide a more comprehensive understanding of research findings. The aim of this paper is to carry out a sample survey of statistical reporting practices in two journals with the highest h5-index scores in the areas of developmental disability and rehabilitation. Using a checklist that includes critical recommendations by American Psychological Association, we examined 100 randomly selected articles out of 456 articles reporting inferential statistics in the year 2013 in the Journal of Autism and Developmental Disorders (JADD) and Research in Developmental Disabilities (RDD). The results showed that for both journals, ES were reported only half the time (JADD 59.3%; RDD 55.87%). These findings are similar to psychology journals, but are in stark contrast to ES reporting in educational journals (73%). Furthermore, a priori power and sample size determination (JADD 10%; RDD 6%), along with reporting and interpreting precision measures (CI: JADD 13.33%; RDD 16.67%), were the least reported metrics in these journals, but not dissimilar to journals in other disciplines. To advance the science in developmental disability and rehabilitation and to bridge the research-to-practice divide, reforms in statistical reporting, such as providing supplemental measures to NHST, are clearly needed.

  1. Industrial commodity statistics yearbook 2001. Production statistics (1992-2001)

    International Nuclear Information System (INIS)

    2003-01-01

    This is the thirty-fifth in a series of annual compilations of statistics on world industry designed to meet both the general demand for information of this kind and the special requirements of the United Nations and related international bodies. Beginning with the 1992 edition, the title of the publication was changed to industrial Commodity Statistics Yearbook as the result of a decision made by the United Nations Statistical Commission at its twenty-seventh session to discontinue, effective 1994, publication of the Industrial Statistics Yearbook, volume I, General Industrial Statistics by the Statistics Division of the United Nations. The United Nations Industrial Development Organization (UNIDO) has become responsible for the collection and dissemination of general industrial statistics while the Statistics Division of the United Nations continues to be responsible for industrial commodity production statistics. The previous title, Industrial Statistics Yearbook, volume II, Commodity Production Statistics, was introduced in the 1982 edition. The first seven editions in this series were published under the title The Growth of World industry and the next eight editions under the title Yearbook of Industrial Statistics. This edition of the Yearbook contains annual quantity data on production of industrial commodities by country, geographical region, economic grouping and for the world. A standard list of about 530 commodities (about 590 statistical series) has been adopted for the publication. The statistics refer to the ten-year period 1992-2001 for about 200 countries and areas

  2. Industrial commodity statistics yearbook 2002. Production statistics (1993-2002)

    International Nuclear Information System (INIS)

    2004-01-01

    This is the thirty-sixth in a series of annual compilations of statistics on world industry designed to meet both the general demand for information of this kind and the special requirements of the United Nations and related international bodies. Beginning with the 1992 edition, the title of the publication was changed to industrial Commodity Statistics Yearbook as the result of a decision made by the United Nations Statistical Commission at its twenty-seventh session to discontinue, effective 1994, publication of the Industrial Statistics Yearbook, volume I, General Industrial Statistics by the Statistics Division of the United Nations. The United Nations Industrial Development Organization (UNIDO) has become responsible for the collection and dissemination of general industrial statistics while the Statistics Division of the United Nations continues to be responsible for industrial commodity production statistics. The previous title, Industrial Statistics Yearbook, volume II, Commodity Production Statistics, was introduced in the 1982 edition. The first seven editions in this series were published under the title 'The Growth of World industry' and the next eight editions under the title 'Yearbook of Industrial Statistics'. This edition of the Yearbook contains annual quantity data on production of industrial commodities by country, geographical region, economic grouping and for the world. A standard list of about 530 commodities (about 590 statistical series) has been adopted for the publication. The statistics refer to the ten-year period 1993-2002 for about 200 countries and areas

  3. Industrial commodity statistics yearbook 2000. Production statistics (1991-2000)

    International Nuclear Information System (INIS)

    2002-01-01

    This is the thirty-third in a series of annual compilations of statistics on world industry designed to meet both the general demand for information of this kind and the special requirements of the United Nations and related international bodies. Beginning with the 1992 edition, the title of the publication was changed to industrial Commodity Statistics Yearbook as the result of a decision made by the United Nations Statistical Commission at its twenty-seventh session to discontinue, effective 1994, publication of the Industrial Statistics Yearbook, volume I, General Industrial Statistics by the Statistics Division of the United Nations. The United Nations Industrial Development Organization (UNIDO) has become responsible for the collection and dissemination of general industrial statistics while the Statistics Division of the United Nations continues to be responsible for industrial commodity production statistics. The previous title, Industrial Statistics Yearbook, volume II, Commodity Production Statistics, was introduced in the 1982 edition. The first seven editions in this series were published under the title The Growth of World industry and the next eight editions under the title Yearbook of Industrial Statistics. This edition of the Yearbook contains annual quantity data on production of industrial commodities by country, geographical region, economic grouping and for the world. A standard list of about 530 commodities (about 590 statistical series) has been adopted for the publication. Most of the statistics refer to the ten-year period 1991-2000 for about 200 countries and areas

  4. Statistical Hot Channel Factors and Safety Limit CHFR/OFIR

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Byeonghee; Park, Suki [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2016-10-15

    The fuel integrity of research reactors is usually judged by comparing the critical heat flux ratio (CHFR) and the maximum fuel temperature (MFT) with the safety limits. The onset of flow instability ratio (OFIR) can also be used for the examination together with the CHFR. Hot channel factors (HCFs) are incorporated when calculating the CHFR/OFIR and MFT to account for the uncertainties of the fuel properties and thermo-hydraulic variables affecting them. The HCFs and the safety limit CHFR are sometimes estimated with too much conservatism, which reduces design flexibility and operating margins. In this paper, a statistical estimation of the HCFs and the safety limit CHFR/OFIR by random sampling of the uncertainty parameters is presented. A 15 MW pool-type research reactor is selected as the sample reactor for the estimation, and its HCFs and safety limit CHFR/OFIR are evaluated statistically. The parameters affecting the HCFs and the safety limit CHFR/OFIR are listed and their uncertainties are estimated. The relevant parameter uncertainties are sampled randomly and the HCFs and the safety limits are evaluated from them. The HCFs and the safety limit CHFR/OFIR at 95% probability are smaller than those estimated deterministically, because the statistical evaluation convolutes the correlation uncertainties and the other uncertainties in a probabilistic way, whereas the deterministic evaluation simply multiplies them.
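
    The statistical combination described above, sampling the uncertain sub-factors and taking the value at 95% probability instead of multiplying worst-case factors, can be illustrated generically as below; the sub-factor names and uncertainties are invented placeholders, not the design values of the 15 MW reactor.

        # Generic sketch: statistical combination of hot channel sub-factors by random sampling,
        # compared with a simple deterministic product of worst-case values (invented numbers).
        import numpy as np

        rng = np.random.default_rng(9)
        n_trials = 100_000

        # Hypothetical multiplicative sub-factors, each modelled as N(1, sigma)
        sigmas = {"fuel_loading": 0.03, "channel_gap": 0.04,
                  "flow_distribution": 0.05, "power_measurement": 0.02}
        samples = np.column_stack([rng.normal(1.0, s, n_trials) for s in sigmas.values()])

        combined = samples.prod(axis=1)
        statistical_hcf = np.quantile(combined, 0.95)                          # 95% probability value
        deterministic_hcf = np.prod([1.0 + 1.645 * s for s in sigmas.values()])  # worst-case product

        print(f"statistical HCF (95%): {statistical_hcf:.3f}")
        print(f"deterministic product of 95% sub-factors: {deterministic_hcf:.3f}")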

  5. Application of statistical process control to qualitative molecular diagnostic assays.

    Directory of Open Access Journals (Sweden)

    Cathal P O'Brien

    2014-11-01

    Full Text Available Modern pathology laboratories, and in particular high-throughput laboratories such as clinical chemistry, have developed a reliable system for statistical process control. Such a system is absent from the majority of molecular laboratories and, where present, is confined to quantitative assays. As the inability to apply statistical process control to an assay is an obvious disadvantage, this study aimed to solve this problem by using a frequency estimate coupled with a confidence interval calculation to detect deviations from an expected mutation frequency. The results of this study demonstrate the strengths and weaknesses of this approach and highlight minimum sample number requirements. Notably, assays with low mutation frequencies and the detection of small deviations from an expected value require larger sample numbers, with a resultant protracted time to detection. Modelled laboratory data were also used to highlight how this approach might be applied in a routine molecular laboratory. This article is the first to describe the application of statistical process control to qualitative laboratory data.
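
    The core idea of the approach, comparing an observed mutation detection frequency against its expected value by way of a confidence interval, can be sketched with a binomial proportion interval; the expected frequency, batch size and normal-approximation interval below are illustrative assumptions.

        # Flag a batch when the confidence interval for its observed mutation frequency
        # no longer covers the expected frequency (illustrative numbers, normal-approximation CI).
        import math

        expected_freq = 0.40          # assumed long-run mutation detection frequency
        positives, n = 28, 50         # observed positives in the current batch

        p_hat = positives / n
        half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
        lo, hi = p_hat - half, p_hat + half

        print(f"observed frequency {p_hat:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
        if lo <= expected_freq <= hi:
            print("within expected variation")
        else:
            print("deviation from the expected frequency - investigate the assay")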

  6. Sampling a guide for internal auditors

    CERN Document Server

    Apostolou, Barbara

    2004-01-01

    While it is possible to examine 100 percent of an audit customer's data, the time and cost associated with such a study are often prohibitive. To obtain sufficient, reliable, and relevant information with a limited data set, sampling is an efficient and effective tool. It can help you evaluate the customer's assertions, as well as reach audit conclusions and provide reasonable assurance to your organization. This handbook will help you understand sampling. It also serves as a guide for auditors and students preparing for certification. Topics include: An overview of sampling. Statistical and nonstatistical sampling issues. Sampling selection methods and risks. The pros and cons of popular sampling plans.

  7. Mathematics Anxiety and Statistics Anxiety. Shared but Also Unshared Components and Antagonistic Contributions to Performance in Statistics

    Science.gov (United States)

    Paechter, Manuela; Macher, Daniel; Martskvishvili, Khatuna; Wimmer, Sigrid; Papousek, Ilona

    2017-01-01

    In many social science majors, e.g., psychology, students report high levels of statistics anxiety. However, these majors are often chosen by students who are less prone to mathematics and who might have experienced difficulties and unpleasant feelings in their mathematics courses at school. The present study investigates whether statistics anxiety is a genuine form of anxiety that impairs students' achievements or whether learners mainly transfer previous experiences in mathematics and their anxiety in mathematics to statistics. The relationship between mathematics anxiety and statistics anxiety, their relationship to learning behaviors and to performance in a statistics examination were investigated in a sample of 225 undergraduate psychology students (164 women, 61 men). Data were recorded at three points in time: At the beginning of term students' mathematics anxiety, general proneness to anxiety, school grades, and demographic data were assessed; 2 weeks before the end of term, they completed questionnaires on statistics anxiety and their learning behaviors. At the end of term, examination scores were recorded. Mathematics anxiety and statistics anxiety correlated highly but the comparison of different structural equation models showed that they had genuine and even antagonistic contributions to learning behaviors and performance in the examination. Surprisingly, mathematics anxiety was positively related to performance. It might be that students realized over the course of their first term that knowledge and skills in higher secondary education mathematics are not sufficient to be successful in statistics. Part of mathematics anxiety may then have strengthened positive extrinsic effort motivation by the intention to avoid failure and may have led to higher effort for the exam preparation. However, via statistics anxiety mathematics anxiety also had a negative contribution to performance. Statistics anxiety led to higher procrastination in the structural

  8. Mathematics Anxiety and Statistics Anxiety. Shared but Also Unshared Components and Antagonistic Contributions to Performance in Statistics.

    Science.gov (United States)

    Paechter, Manuela; Macher, Daniel; Martskvishvili, Khatuna; Wimmer, Sigrid; Papousek, Ilona

    2017-01-01

    In many social science majors, e.g., psychology, students report high levels of statistics anxiety. However, these majors are often chosen by students who are less prone to mathematics and who might have experienced difficulties and unpleasant feelings in their mathematics courses at school. The present study investigates whether statistics anxiety is a genuine form of anxiety that impairs students' achievements or whether learners mainly transfer previous experiences in mathematics and their anxiety in mathematics to statistics. The relationship between mathematics anxiety and statistics anxiety, their relationship to learning behaviors and to performance in a statistics examination were investigated in a sample of 225 undergraduate psychology students (164 women, 61 men). Data were recorded at three points in time: At the beginning of term students' mathematics anxiety, general proneness to anxiety, school grades, and demographic data were assessed; 2 weeks before the end of term, they completed questionnaires on statistics anxiety and their learning behaviors. At the end of term, examination scores were recorded. Mathematics anxiety and statistics anxiety correlated highly but the comparison of different structural equation models showed that they had genuine and even antagonistic contributions to learning behaviors and performance in the examination. Surprisingly, mathematics anxiety was positively related to performance. It might be that students realized over the course of their first term that knowledge and skills in higher secondary education mathematics are not sufficient to be successful in statistics. Part of mathematics anxiety may then have strengthened positive extrinsic effort motivation by the intention to avoid failure and may have led to higher effort for the exam preparation. However, via statistics anxiety mathematics anxiety also had a negative contribution to performance. Statistics anxiety led to higher procrastination in the structural

  9. Mathematics Anxiety and Statistics Anxiety. Shared but Also Unshared Components and Antagonistic Contributions to Performance in Statistics

    Directory of Open Access Journals (Sweden)

    Manuela Paechter

    2017-07-01

    Full Text Available In many social science majors, e.g., psychology, students report high levels of statistics anxiety. However, these majors are often chosen by students who are less prone to mathematics and who might have experienced difficulties and unpleasant feelings in their mathematics courses at school. The present study investigates whether statistics anxiety is a genuine form of anxiety that impairs students' achievements or whether learners mainly transfer previous experiences in mathematics and their anxiety in mathematics to statistics. The relationship between mathematics anxiety and statistics anxiety, their relationship to learning behaviors and to performance in a statistics examination were investigated in a sample of 225 undergraduate psychology students (164 women, 61 men. Data were recorded at three points in time: At the beginning of term students' mathematics anxiety, general proneness to anxiety, school grades, and demographic data were assessed; 2 weeks before the end of term, they completed questionnaires on statistics anxiety and their learning behaviors. At the end of term, examination scores were recorded. Mathematics anxiety and statistics anxiety correlated highly but the comparison of different structural equation models showed that they had genuine and even antagonistic contributions to learning behaviors and performance in the examination. Surprisingly, mathematics anxiety was positively related to performance. It might be that students realized over the course of their first term that knowledge and skills in higher secondary education mathematics are not sufficient to be successful in statistics. Part of mathematics anxiety may then have strengthened positive extrinsic effort motivation by the intention to avoid failure and may have led to higher effort for the exam preparation. However, via statistics anxiety mathematics anxiety also had a negative contribution to performance. Statistics anxiety led to higher procrastination in

  10. A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data

    Directory of Open Access Journals (Sweden)

    Olson James M

    2006-04-01

    Full Text Available Abstract Background Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip® that uses multiple oligonucleotide probes (i.e. probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip® was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip® gene expression array data. Results We developed a two-step approach to predict alternative splicing from GeneChip® data. First, we clustered the probes from a probe set into pseudo-exons based on similarity of probe intensities and physical adjacency. A pseudo-exon is defined as a sequence in the gene within which multiple probes have comparable probe intensity values. Second, for each pseudo-exon, we assessed the statistical significance of the difference in probe intensity between two groups of samples. Differentially expressed pseudo-exons are predicted to be alternatively spliced. We applied our method to empirical data generated from GeneChip® Hu6800 arrays, which include 7129 probe sets and twenty probes per probe set. The dataset consists of sixty-nine medulloblastoma samples (27 metastatic and 42 non-metastatic) and four cerebellum samples as normal controls. We predicted that 577 genes would be alternatively spliced when we compared normal cerebellum samples to medulloblastomas, and predicted that thirteen genes would be alternatively spliced when we compared metastatic...
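
    As a rough illustration of the two-step idea described above (not the authors' implementation), the sketch below clusters physically adjacent probes with similar mean intensities into pseudo-exons and then tests each pseudo-exon between two sample groups with an ordinary t-test; the intensity matrix, the similarity threshold and the group labels are all invented for the example.

```python
# A minimal sketch of the two-step pseudo-exon approach, under the assumptions
# stated above. `intensity` is a (probes x samples) array for one probe set with
# probes ordered by position; `group` is a boolean mask splitting the samples.
import numpy as np
from scipy import stats

def pseudo_exons(intensity, max_gap=1.0):
    """Greedily merge adjacent probes whose mean intensities differ by <= max_gap."""
    means = intensity.mean(axis=1)
    blocks, current = [], [0]
    for i in range(1, len(means)):
        if abs(means[i] - means[current[-1]]) <= max_gap:
            current.append(i)
        else:
            blocks.append(current)
            current = [i]
    blocks.append(current)
    return blocks

def splice_candidates(intensity, group, alpha=0.01):
    """Flag pseudo-exons whose average intensity differs between the two groups."""
    hits = []
    for block in pseudo_exons(intensity):
        exon_signal = intensity[block].mean(axis=0)          # one value per sample
        t, p = stats.ttest_ind(exon_signal[group], exon_signal[~group])
        if p < alpha:
            hits.append((block, p))
    return hits

rng = np.random.default_rng(0)
demo = rng.normal(8.0, 0.3, size=(20, 73))                   # 20 probes, 73 samples
group = np.arange(73) < 69                                    # e.g. tumour vs control labels
demo[12:16, ~group] -= 2.0                                    # an "exon" lost in the controls
print(splice_candidates(demo, group))
```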

  11. THE USE OF RANKING SAMPLING METHOD WITHIN MARKETING RESEARCH

    Directory of Open Access Journals (Sweden)

    CODRUŢA DURA

    2011-01-01

    Full Text Available Marketing and statistical literature available to practitioners provides a wide range of sampling methods that can be implemented in the context of marketing research. The ranking sampling method is based on dividing the general population into several strata, namely into several subdivisions which are relatively homogenous regarding a certain characteristic. In fact, the sample will be composed by selecting, from each stratum, a certain number of components (which can be proportional or non-proportional to the size of the stratum) until the pre-established volume of the sample is reached. Using ranking sampling within marketing research requires the determination of some relevant statistical indicators - average, dispersion, sampling error etc. To that end, the paper contains a case study which illustrates the actual approach used in order to apply the ranking sample method within a marketing research study conducted by a company which provides Internet connection services, on a particular category of customers – small and medium enterprises.
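
    A minimal sketch of the proportional allocation step and the usual stratified estimates follows; the stratum sizes, target sample volume and survey figures are hypothetical, not the case-study values from the paper.

```python
# Proportional allocation across strata, as described above (a generic sketch,
# not the paper's case-study figures). Stratum sizes below are made up.
import math

strata = {"micro": 5200, "small": 2600, "medium": 1200}   # hypothetical SME counts
n_total = 400                                              # pre-established sample volume
N = sum(strata.values())

allocation = {name: round(n_total * size / N) for name, size in strata.items()}
print(allocation)          # {'micro': 231, 'small': 116, 'medium': 53}

# After the survey, the stratified mean and its sampling error combine the
# per-stratum means and variances with the stratum weights W_h = N_h / N.
def stratified_mean_and_se(stats_by_stratum):
    """stats_by_stratum: {name: (n_h, mean_h, var_h)}, var_h = sample variance."""
    mean = sum((strata[h] / N) * m for h, (n, m, v) in stats_by_stratum.items())
    var = sum((strata[h] / N) ** 2 * v / n * (1 - n / strata[h])
              for h, (n, m, v) in stats_by_stratum.items())
    return mean, math.sqrt(var)

print(stratified_mean_and_se({"micro": (231, 3.1, 1.2),
                              "small": (116, 4.4, 2.0),
                              "medium": (53, 6.0, 2.5)}))
```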

  12. Statistics for the LHC: Quantifying our Scientific Narrative (1/4)

    CERN Multimedia

    CERN. Geneva

    2011-01-01

    Now that the LHC physics program is well under way and results have begun to pour out of the experiments, the statistical methodology used for these results is a hot topic. This is a challenge at the LHC, as we have sensitivity to discover new physics in a stage of the experiments where systematic uncertainties can still be quite large. The emphasis of these lectures is how we can translate the scientific narrative of why we think we know what we know into quantitative statistical statements about the presence or absence of new physics. Topics will include statistical modeling, incorporation of control samples to constrain systematics, and Bayesian and Frequentist statistical tests that are capable of answering these questions.

  13. 2nd Conference of the International Society for Nonparametric Statistics

    CERN Document Server

    Manteiga, Wenceslao; Romo, Juan

    2016-01-01

    This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...

  14. An introduction to inferential statistics: A review and practical guide

    International Nuclear Information System (INIS)

    Marshall, Gill; Jonker, Leon

    2011-01-01

    Building on the first part of this series regarding descriptive statistics, this paper demonstrates why it is advantageous for radiographers to understand the role of inferential statistics in deducing conclusions from a sample and their application to a wider population. This is necessary so radiographers can understand the work of others, can undertake their own research and evidence base their practice. This article explains p values and confidence intervals. It introduces the common statistical tests that comprise inferential statistics, and explains the use of parametric and non-parametric statistics. To do this, the paper reviews relevant literature, and provides a checklist of points to consider before and after applying statistical tests to a data set. The paper provides a glossary of relevant terms and the reader is advised to refer to this when any unfamiliar terms are used in the text. Together with the information provided on descriptive statistics in an earlier article, it can be used as a starting point for applying statistics in radiography practice and research.

  15. An introduction to inferential statistics: A review and practical guide

    Energy Technology Data Exchange (ETDEWEB)

    Marshall, Gill, E-mail: gill.marshall@cumbria.ac.u [Faculty of Health, Medical Sciences and Social Care, University of Cumbria, Lancaster LA1 3JD (United Kingdom); Jonker, Leon [Faculty of Health, Medical Sciences and Social Care, University of Cumbria, Lancaster LA1 3JD (United Kingdom)

    2011-02-15

    Building on the first part of this series regarding descriptive statistics, this paper demonstrates why it is advantageous for radiographers to understand the role of inferential statistics in deducing conclusions from a sample and their application to a wider population. This is necessary so radiographers can understand the work of others, can undertake their own research and evidence base their practice. This article explains p values and confidence intervals. It introduces the common statistical tests that comprise inferential statistics, and explains the use of parametric and non-parametric statistics. To do this, the paper reviews relevant literature, and provides a checklist of points to consider before and after applying statistical tests to a data set. The paper provides a glossary of relevant terms and the reader is advised to refer to this when any unfamiliar terms are used in the text. Together with the information provided on descriptive statistics in an earlier article, it can be used as a starting point for applying statistics in radiography practice and research.

  16. Experimental toxicology: Issues of statistics, experimental design, and replication.

    Science.gov (United States)

    Briner, Wayne; Kirwan, Jeral

    2017-01-01

    The difficulty of replicating experiments has drawn considerable attention. Issues with replication occur for a variety of reasons ranging from experimental design to laboratory errors to inappropriate statistical analysis. Here we review a variety of guidelines for statistical analysis, design, and execution of experiments in toxicology. In general, replication can be improved by using hypothesis driven experiments with adequate sample sizes, randomization, and blind data collection techniques. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Statistics in experimental design, preprocessing, and analysis of proteomics data.

    Science.gov (United States)

    Jung, Klaus

    2011-01-01

    High-throughput experiments in proteomics, such as 2-dimensional gel electrophoresis (2-DE) and mass spectrometry (MS), usually yield high-dimensional data sets of expression values for hundreds or thousands of proteins which are, however, observed on only a relatively small number of biological samples. Statistical methods for the planning and analysis of experiments are important to avoid false conclusions and to obtain tenable results. In this chapter, the most frequent experimental designs for proteomics experiments are illustrated. In particular, focus is put on studies for the detection of differentially regulated proteins. Furthermore, issues of sample size planning, statistical analysis of expression levels as well as methods for data preprocessing are covered.

  18. Bayesian stratified sampling to assess corpus utility

    Energy Technology Data Exchange (ETDEWEB)

    Hochberg, J.; Scovel, C.; Thomas, T.; Hall, S.

    1998-12-01

    This paper describes a method for asking statistical questions about a large text corpus. The authors exemplify the method by addressing the question, "What percentage of Federal Register documents are real documents, of possible interest to a text researcher or analyst?" They estimate an answer to this question by evaluating 200 documents selected from a corpus of 45,820 Federal Register documents. Bayesian analysis and stratified sampling are used to reduce the sampling uncertainty of the estimate from over 3,100 documents to fewer than 1,000. A possible application of the method is to establish baseline statistics used to estimate recall rates for information retrieval systems.
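
    A minimal Beta-Binomial sketch of this kind of estimate is shown below. Only the corpus size (45,820 documents) and the overall sample size (200) come from the abstract; the stratum boundaries, the per-stratum counts of "real" documents and the uniform priors are invented for illustration.

```python
# Bayesian stratified estimate of a corpus-wide proportion, under the hypothetical
# strata and counts stated above; only the corpus and sample sizes are from the abstract.
import numpy as np
from scipy import stats

corpus_sizes = np.array([30000, 15820])      # two hypothetical strata (e.g. long vs short docs)
sampled      = np.array([120, 80])           # 200 documents evaluated in total
found_real   = np.array([95, 20])            # hypothetical evaluation outcomes

rng = np.random.default_rng(1)
weights = corpus_sizes / corpus_sizes.sum()

# Beta(1, 1) prior in each stratum, then Monte Carlo for the corpus-wide fraction.
draws = sum(w * stats.beta(1 + k, 1 + n - k).rvs(100_000, random_state=rng)
            for w, n, k in zip(weights, sampled, found_real))

mean = draws.mean()
lo, hi = np.quantile(draws, [0.025, 0.975])
print(f"Estimated fraction of real documents: {mean:.2f} (95% credible interval {lo:.2f}-{hi:.2f})")
print(f"Roughly {mean * 45_820:.0f} of 45,820 documents, +/- about {(hi - lo) / 2 * 45_820:.0f}")
```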

  19. The Impact of Including Below Detection Limit Samples in Post Decommissioning Soil Sample Analyses

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jung Hwan; Yim, Man-Sung [KAIST, Daejeon (Korea, Republic of)

    2016-10-15

    To meet the required standards, the site owner has to show that the soil at the facility has been sufficiently cleaned up. To do this one must know the contamination of the soil at the site prior to clean up. This involves sampling that soil to identify the degree of contamination. However, there is a technical difficulty in determining how much decontamination should be done. The problem arises when measured samples are below the detection limit. Regulatory guidelines for site reuse after decommissioning are commonly challenged because the majority of the activity in the soil is at or below the limit of detection. Using additional statistical analyses of contaminated soil after decommissioning is expected to have the following advantages: a better and more reliable probabilistic exposure assessment, better economics (lower project costs) and improved communication with the public. This research will develop an approach that defines an acceptable method for demonstrating compliance of decommissioned NPP sites and validates that compliance. Soil samples from NPPs often contain censored data. Conventional methods for dealing with censored data sets are statistically biased and limited in their usefulness. In this research, additional methods are performed using real data from a monazite manufacturing factory.

  20. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.

    Science.gov (United States)

    Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L

    2012-12-01

    Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.

  1. Using the modified sample entropy to detect determinism

    Energy Technology Data Exchange (ETDEWEB)

    Xie Hongbo, E-mail: xiehb@sjtu.or [Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hung Hom, Kowloon (Hong Kong); Department of Biomedical Engineering, Jiangsu University, Zhenjiang (China); Guo Jingyi [Department of Health Technology and Informatics, Hong Kong Polytechnic University, Hung Hom, Kowloon (Hong Kong); Zheng Yongping, E-mail: ypzheng@ieee.or [Department of Health Technology and Informatics, Hong Kong Polytechnic University, Hung Hom, Kowloon (Hong Kong); Research Institute of Innovative Products and Technologies, Hong Kong Polytechnic University (Hong Kong)

    2010-08-23

    A modified sample entropy (mSampEn), based on a nonlinear continuous and convex function, has been proposed and proven to be superior to the standard sample entropy (SampEn) in several aspects. In this Letter, we empirically investigate the ability of the mSampEn statistic combined with the surrogate data method to detect determinism. The effects of data length and noise on the ability of the proposed method to differentiate between deterministic and stochastic dynamics are tested on several benchmark time series. The noise performance of the mSampEn statistic is also compared with the singular value decomposition (SVD) and symplectic geometry spectrum (SGS) based methods. The results indicate that the mSampEn statistic is a robust index for detecting determinism in short and noisy time series.
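
    For reference, a plain implementation of the standard SampEn(m, r) statistic is sketched below; the modified statistic in the Letter replaces the hard tolerance threshold with a continuous, convex weighting, which is not reproduced here.

```python
# Standard sample entropy (SampEn) for reference; the mSampEn variant described
# above is not implemented here.
import numpy as np

def sampen(x, m=2, r=0.2):
    """SampEn(m, r) of a 1-D series; r is given as a fraction of the series SD."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    def count_matches(length):
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        # Chebyshev distance between all pairs of templates
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=-1)
        return (d <= tol).sum() - len(templates)     # exclude self-matches
    B = count_matches(m)
    A = count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

rng = np.random.default_rng(2)
noise = rng.standard_normal(500)
t = np.arange(500)
sine = np.sin(0.2 * t) + 0.1 * rng.standard_normal(500)
print("white noise :", sampen(noise))   # higher -> more irregular
print("noisy sine  :", sampen(sine))    # lower  -> more deterministic structure
```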

  2. Statistical process control charts for attribute data involving very large sample sizes: a review of problems and solutions.

    Science.gov (United States)

    Mohammed, Mohammed A; Panesar, Jagdeep S; Laney, David B; Wilson, Richard

    2013-04-01

    The use of statistical process control (SPC) charts in healthcare is increasing. The primary purpose of SPC is to distinguish between common-cause variation which is attributable to the underlying process, and special-cause variation which is extrinsic to the underlying process. This is important because improvement under common-cause variation requires action on the process, whereas special-cause variation merits an investigation to first find the cause. Nonetheless, when dealing with attribute or count data (eg, number of emergency admissions) involving very large sample sizes, traditional SPC charts often produce tight control limits with most of the data points appearing outside the control limits. This can give a false impression of common and special-cause variation, and potentially misguide the user into taking the wrong actions. Given the growing availability of large datasets from routinely collected databases in healthcare, there is a need to present a review of this problem (which arises because traditional attribute charts only consider within-subgroup variation) and its solutions (which consider within and between-subgroup variation), which involve the use of the well-established measurements chart and the more recently developed attribute charts based on Laney's innovative approach. We close by making some suggestions for practice.
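
    The between-subgroup adjustment usually attributed to Laney (the p'-chart) can be sketched as follows; the monthly counts and subgroup sizes are invented, and the formulas follow the commonly published description rather than any code from the paper.

```python
# Sketch of a Laney p'-chart: the usual p-chart limits are inflated by sigma_z,
# an estimate of between-subgroup variation taken from the moving range of the
# z-scores. The monthly counts below are invented for illustration.
import numpy as np

events = np.array([410, 395, 460, 430, 520, 480, 455, 470, 500, 445])   # e.g. emergency admissions
sizes  = np.array([9800, 9500, 10200, 9900, 11000, 10400, 9700, 10100, 10800, 9900])

p = events / sizes
pbar = events.sum() / sizes.sum()
sigma_p = np.sqrt(pbar * (1 - pbar) / sizes)      # within-subgroup (binomial) SD per point

# Between-subgroup adjustment: average moving range of the z-scores
z = (p - pbar) / sigma_p
sigma_z = np.mean(np.abs(np.diff(z))) / 1.128     # moving range -> SD (d2 for n=2)

ucl = pbar + 3 * sigma_p * sigma_z                # Laney p' limits (vary with subgroup size)
lcl = pbar - 3 * sigma_p * sigma_z
classic_ucl = pbar + 3 * sigma_p                  # ordinary p-chart limits for comparison

for i in range(len(p)):
    print(f"month {i+1:2d}: p={p[i]:.4f}  p'-limits=({lcl[i]:.4f}, {ucl[i]:.4f})"
          f"  classic UCL={classic_ucl[i]:.4f}")
```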

  3. The relation between statistical power and inference in fMRI.

    Directory of Open Access Journals (Sweden)

    Henk R Cremers

    Full Text Available Statistically underpowered studies can result in experimental failure even when all other experimental considerations have been addressed impeccably. In fMRI the combination of a large number of dependent variables, a relatively small number of observations (subjects), and a need to correct for multiple comparisons can decrease statistical power dramatically. This problem has been clearly addressed yet remains controversial, especially in regards to the expected effect sizes in fMRI, and especially for between-subjects effects such as group comparisons and brain-behavior correlations. We aimed to clarify the power problem by considering and contrasting two simulated scenarios of such possible brain-behavior correlations: weak diffuse effects and strong localized effects. Sampling from these scenarios shows that, particularly in the weak diffuse scenario, common sample sizes (n = 20-30) display extremely low statistical power, poorly represent the actual effects in the full sample, and show large variation on subsequent replications. Empirical data from the Human Connectome Project resembles the weak diffuse scenario much more than the localized strong scenario, which underscores the extent of the power problem for many studies. Possible solutions to the power problem include increasing the sample size, using less stringent thresholds, or focusing on a region-of-interest. However, these approaches are not always feasible and some have major drawbacks. The most prominent solutions that may help address the power problem include model-based (multivariate) prediction methods and meta-analyses with related synthesis-oriented approaches.
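
    A much-simplified simulation of the "weak diffuse" scenario gives the flavour of the argument: many voxels each carrying a small true brain-behavior correlation, a common sample size, and a Bonferroni-corrected threshold. The numbers of voxels, subjects and simulations below are arbitrary choices, not the authors' settings.

```python
# Simplified "weak diffuse" scenario: r = 0.2 at every voxel, n = 25 subjects,
# Bonferroni correction over 1000 voxels. Not the authors' simulation, just the gist.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_subjects, n_voxels, true_r = 25, 1000, 0.2
n_sims, alpha = 200, 0.05 / 1000            # Bonferroni-corrected threshold

detected = []
for _ in range(n_sims):
    behaviour = rng.standard_normal(n_subjects)
    noise = rng.standard_normal((n_subjects, n_voxels))
    voxels = true_r * behaviour[:, None] + np.sqrt(1 - true_r**2) * noise
    r = (behaviour - behaviour.mean()) @ (voxels - voxels.mean(0)) / (
        n_subjects * behaviour.std() * voxels.std(0))
    t = r * np.sqrt((n_subjects - 2) / (1 - r**2))
    p = 2 * stats.t.sf(np.abs(t), df=n_subjects - 2)
    detected.append((p < alpha).mean())     # fraction of truly active voxels found

print(f"average fraction of voxels detected: {np.mean(detected):.3f}")   # typically near zero
```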

  4. Adaptive multiple importance sampling for Gaussian processes

    Czech Academy of Sciences Publication Activity Database

    Xiong, X.; Šmídl, Václav; Filippone, M.

    2017-01-01

    Vol. 87, No. 8 (2017), pp. 1644-1665 ISSN 0094-9655 R&D Projects: GA MŠk(CZ) 7F14287 Institutional support: RVO:67985556 Keywords : Gaussian Process * Bayesian estimation * Adaptive importance sampling Subject RIV: BB - Applied Statistics, Operational Research OBOR OECD: Statistics and probability Impact factor: 0.757, year: 2016 http://library.utia.cas.cz/separaty/2017/AS/smidl-0469804.pdf

  5. Efficient bootstrap estimates for tail statistics

    Science.gov (United States)

    Breivik, Øyvind; Aarnes, Ole Johan

    2017-03-01

    Bootstrap resamples can be used to investigate the tail of empirical distributions as well as return value estimates from the extremal behaviour of the sample. Specifically, the confidence intervals on return value estimates or bounds on in-sample tail statistics can be obtained using bootstrap techniques. However, non-parametric bootstrapping from the entire sample is expensive. It is shown here that it suffices to bootstrap from a small subset consisting of the highest entries in the sequence to make estimates that are essentially identical to bootstraps from the entire sample. Similarly, bootstrap estimates of confidence intervals of threshold return estimates are found to be well approximated by using a subset consisting of the highest entries. This has practical consequences in fields such as meteorology, oceanography and hydrology where return values are calculated from very large gridded model integrations spanning decades at high temporal resolution or from large ensembles of independent and identically distributed model fields. In such cases the computational savings are substantial.
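
    The subset idea can be illustrated with a short sketch: bootstrap replicates of a high order statistic (here the 10th-largest value of N = 10,000 points) can be generated from the largest m entries alone, because lower entries never reach that rank. The data, m and the rank are arbitrary, and the sketch assumes m is comfortably larger than the rank of interest.

```python
# Full-sample bootstrap vs. a bootstrap restricted to the top m entries, under the
# assumptions stated above. Not the authors' code; the Gumbel sample is a stand-in
# for, e.g., annual maxima from a model integration.
import numpy as np

rng = np.random.default_rng(4)
N, k, m, B = 10_000, 10, 500, 2000
sample = rng.gumbel(size=N)
top_m = np.sort(sample)[-m:]                     # keep only the m largest values

def kth_largest(a, k):
    return np.partition(a, -k)[-k]

full = np.array([kth_largest(rng.choice(sample, N), k) for _ in range(B)])

subset = []
for _ in range(B):
    hits = max(rng.binomial(N, m / N), k)        # how many resampled points land in the top m
    subset.append(kth_largest(rng.choice(top_m, hits), k))
subset = np.array(subset)

print("full   bootstrap 95% CI:", np.quantile(full, [0.025, 0.975]))
print("subset bootstrap 95% CI:", np.quantile(subset, [0.025, 0.975]))
```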

  6. Harmonic statistics

    International Nuclear Information System (INIS)

    Eliazar, Iddo

    2017-01-01

    The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their ‘public relations’ for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford’s law, and 1/f noise. - Highlights: • Harmonic statistics are described and reviewed in detail. • Connections to various statistical laws are established. • Connections to perturbation, renormalization and dynamics are established.

  7. Harmonic statistics

    Energy Technology Data Exchange (ETDEWEB)

    Eliazar, Iddo, E-mail: eliazar@post.tau.ac.il

    2017-05-15

    The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their ‘public relations’ for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford’s law, and 1/f noise. - Highlights: • Harmonic statistics are described and reviewed in detail. • Connections to various statistical laws are established. • Connections to perturbation, renormalization and dynamics are established.

  8. Significance levels for studies with correlated test statistics.

    Science.gov (United States)

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.

  9. Statistical mechanics for a class of quantum statistics

    International Nuclear Information System (INIS)

    Isakov, S.B.

    1994-01-01

    Generalized statistical distributions for identical particles are introduced for the case where filling a single-particle quantum state by particles depends on filling states of different momenta. The system of one-dimensional bosons with a two-body potential that can be solved by means of the thermodynamic Bethe ansatz is shown to be equivalent thermodynamically to a system of free particles obeying statistical distributions of the above class. The quantum statistics arising in this way are completely determined by the two-particle scattering phases of the corresponding interacting systems. An equation determining the statistical distributions for these statistics is derived

  10. Comparing geological and statistical approaches for element selection in sediment tracing research

    Science.gov (United States)

    Laceby, J. Patrick; McMahon, Joe; Evrard, Olivier; Olley, Jon

    2015-04-01

    Elevated suspended sediment loads reduce reservoir capacity and significantly increase the cost of operating water treatment infrastructure, making the management of sediment supply to reservoirs increasingly important. Sediment fingerprinting techniques can be used to determine the relative contributions of different sources of sediment accumulating in reservoirs. The objective of this research is to compare geological and statistical approaches to element selection for sediment fingerprinting modelling. Time-integrated samplers (n=45) were used to obtain source samples from four major subcatchments flowing into the Baroon Pocket Dam in South East Queensland, Australia. The geochemistry of potential sources was compared to the geochemistry of sediment cores (n=12) sampled in the reservoir. The geochemical approach selected elements for modelling that provided expected, observed and statistical discrimination between sediment sources. Two statistical approaches selected elements for modelling with the Kruskal-Wallis H-test and Discriminatory Function Analysis (DFA). In particular, two different significance levels (0.05 & 0.35) for the DFA were included to investigate the importance of element selection on modelling results. A distribution model determined the relative contributions of different sources to sediment sampled in the Baroon Pocket Dam. Elemental discrimination was expected between one subcatchment (Obi Obi Creek) and the remaining subcatchments (Lexys, Falls and Bridge Creek). Six major elements were expected to provide discrimination. Of these six, only Fe2O3 and SiO2 provided expected, observed and statistical discrimination. Modelling results with this geological approach indicated 36% (+/- 9%) of sediment sampled in the reservoir cores was from mafic-derived sources and 64% (+/- 9%) was from felsic-derived sources. The geological and the first statistical approach (DFA0.05) differed by only 1% (σ 5%) for 5 out of 6 model groupings with only...
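
    The Kruskal-Wallis screening step mentioned above can be sketched as follows; the concentrations are simulated (with only Fe2O3 and SiO2 made to differ for Obi Obi Creek), and the DFA step and the distribution (mixing) model are not reproduced.

```python
# Kruskal-Wallis screening of candidate tracer elements across source groups,
# using simulated concentrations rather than the Baroon Pocket data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
groups = ["Obi Obi", "Lexys", "Falls", "Bridge"]
elements = ["Fe2O3", "SiO2", "Al2O3", "K2O", "MgO", "TiO2"]

# simulate source samples for each subcatchment
sources = {g: {} for g in groups}
for j, el in enumerate(elements):
    for g in groups:
        shift = 1.0 if (el in ("Fe2O3", "SiO2") and g == "Obi Obi") else 0.0
        sources[g][el] = rng.normal(10 + j + shift, 0.5, size=11)

selected = []
for el in elements:
    h, p = stats.kruskal(*(sources[g][el] for g in groups))
    if p < 0.05:
        selected.append(el)
    print(f"{el:6s}  H = {h:6.2f}  p = {p:.4f}")

print("elements passing the Kruskal-Wallis screen:", selected)
```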

  11. Use of Statistics for Data Evaluation in Environmental Radioactivity Measurements

    International Nuclear Information System (INIS)

    Sutarman

    2001-01-01

    Counting statistics provide a correction to environmental radioactivity measurement results. Statistics provides formulas to determine the standard deviation (S_B) and the minimum detectable concentration (MDC) according to the Poisson distribution. Both formulas depend on the background count rate, counting time, counting efficiency, gamma intensity, and sample size. A long background counting time results in a relatively low S_B and MDC, which can yield relatively accurate measurement results. (author)
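
    One common formulation of these quantities (after Currie) is sketched below; the report's exact expressions are not given in the abstract, so the constants and the example inputs should be read as illustrative only.

```python
# Illustrative background standard deviation and MDC, one common (Currie-style)
# formulation rather than the report's exact expressions. Symbols: B = background
# counts, t = count time (s), eff = detector efficiency, I = gamma intensity,
# m = sample size (kg).
import math

def background_sd(B):
    """Poisson standard deviation of a background count."""
    return math.sqrt(B)

def mdc(B, t, eff, intensity, mass):
    """Minimum detectable concentration (Bq/kg) for a paired background count."""
    detection_limit_counts = 2.71 + 4.65 * math.sqrt(B)      # Currie's L_D
    return detection_limit_counts / (t * eff * intensity * mass)

B, t, eff, I, m = 400, 60_000, 0.25, 0.85, 0.5   # made-up example inputs
print(f"S_B  = {background_sd(B):.1f} counts")
print(f"MDC  = {mdc(B, t, eff, I, m):.2e} Bq/kg")
```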

  12. Applied statistics for agriculture, veterinary, fishery, dairy and allied fields

    CERN Document Server

    Sahu, Pradip Kumar

    2016-01-01

    This book is aimed at a wide range of readers who lack confidence in the mathematical and statistical sciences, particularly in the fields of Agriculture, Veterinary, Fishery, Dairy and other related areas. Its goal is to present the subject of statistics and its useful tools in various disciplines in such a manner that, after reading the book, readers will be equipped to apply the statistical tools to extract otherwise hidden information from their data sets with confidence. Starting with the meaning of statistics, the book introduces measures of central tendency, dispersion, association, sampling methods, probability, inference, designs of experiments and many other subjects of interest in a step-by-step and lucid manner. The relevant theories are described in detail, followed by a broad range of real-world worked-out examples, solved either manually or with the help of statistical packages. In closing, the book also includes a chapter on which statistical packages to use, depending on the user’s respecti...

  13. Control charts for location based on different sampling schemes

    NARCIS (Netherlands)

    Mehmood, R.; Riaz, M.; Does, R.J.M.M.

    2013-01-01

    Control charts are the most important statistical process control tool for monitoring variations in a process. A number of articles are available in the literature for the X̄ control chart based on simple random sampling, ranked set sampling, median-ranked set sampling (MRSS), extreme-ranked set...

  14. Models for probability and statistical inference theory and applications

    CERN Document Server

    Stapleton, James H

    2007-01-01

    This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readersModels for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping.Ideal as a textbook for a two-semester sequence on probability and statistical inference, early chapters provide coverage on probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...

  15. Guidance for air sampling at nuclear facilities

    International Nuclear Information System (INIS)

    Breslin, A.J.

    1976-11-01

    The principal uses of air sampling at nuclear facilities are to monitor general levels of radioactive air contamination, identify sources of air contamination, evaluate the effectiveness of contaminant control equipment, determine exposures of individual workers, and provide automatic warning of hazardous concentrations of radioactivity. These applications of air sampling are discussed with respect to standards of occupational exposure, instrumentation, sample analysis, sampling protocol, and statistical treatment of concentration data. Emphasis is given to the influence of spatial and temporal variations of radionuclide concentration on the location, duration, and frequency of air sampling.

  16. Statistics with JMP graphs, descriptive statistics and probability

    CERN Document Server

    Goos, Peter

    2015-01-01

    Peter Goos, Department of Statistics, University of Leuven, Faculty of Bio-Science Engineering and University of Antwerp, Faculty of Applied Economics, Belgium. David Meintrup, Department of Mathematics and Statistics, University of Applied Sciences Ingolstadt, Faculty of Mechanical Engineering, Germany. Thorough presentation of introductory statistics and probability theory, with numerous examples and applications using JMP. Descriptive Statistics and Probability provides an accessible and thorough overview of the most important descriptive statistics for nominal, ordinal and quantitative data with partic...

  17. Statistical surrogate model based sampling criterion for stochastic global optimization of problems with constraints

    Energy Technology Data Exchange (ETDEWEB)

    Cho, Su Gil; Jang, Jun Yong; Kim, Ji Hoon; Lee, Tae Hee [Hanyang University, Seoul (Korea, Republic of); Lee, Min Uk [Romax Technology Ltd., Seoul (Korea, Republic of); Choi, Jong Su; Hong, Sup [Korea Research Institute of Ships and Ocean Engineering, Daejeon (Korea, Republic of)

    2015-04-15

    Sequential surrogate model-based global optimization algorithms, such as super-EGO, have been developed to increase the efficiency of commonly used global optimization techniques as well as to ensure the accuracy of optimization. However, earlier studies have drawbacks because they involve three phases in the optimization loop and rely on empirical parameters. We propose a united sampling criterion to simplify the algorithm and to achieve the global optimum of problems with constraints without any empirical parameters. It is able to select the points located in a feasible region with high model uncertainty as well as the points along the boundary of the constraint at the lowest objective value. The mean squared error determines which criterion is more dominant among the infill sampling criterion and the boundary sampling criterion. Also, the method guarantees the accuracy of the surrogate model because the sample points are not concentrated within extremely small regions, as they are in super-EGO. The performance of the proposed method, such as the solvability of a problem, convergence properties, and efficiency, is validated through nonlinear numerical examples with disconnected feasible regions.

  18. Statistical Power in Plant Pathology Research.

    Science.gov (United States)

    Gent, David H; Esker, Paul D; Kriss, Alissa B

    2018-01-01

    In null hypothesis testing, failure to reject a null hypothesis may have two potential interpretations. One interpretation is that the treatments being evaluated do not have a significant effect, and a correct conclusion was reached in the analysis. Alternatively, a treatment effect may have existed but the conclusion of the study was that there was none. This is termed a Type II error, which is most likely to occur when studies lack sufficient statistical power to detect a treatment effect. In basic terms, the power of a study is the ability to identify a true effect through a statistical test. The power of a statistical test is 1 - (the probability of Type II errors), and depends on the size of treatment effect (termed the effect size), variance, sample size, and significance criterion (the probability of a Type I error, α). Low statistical power is prevalent in scientific literature in general, including plant pathology. However, power is rarely reported, creating uncertainty in the interpretation of nonsignificant results and potentially underestimating small, yet biologically significant relationships. The appropriate level of power for a study depends on the impact of Type I versus Type II errors and no single level of power is acceptable for all purposes. Nonetheless, by convention 0.8 is often considered an acceptable threshold and studies with power less than 0.5 generally should not be conducted if the results are to be conclusive. The emphasis on power analysis should be in the planning stages of an experiment. Commonly employed strategies to increase power include increasing sample sizes, selecting a less stringent threshold probability for Type I errors, increasing the hypothesized or detectable effect size, including as few treatment groups as possible, reducing measurement variability, and including relevant covariates in analyses. Power analysis will lead to more efficient use of resources and more precisely structured hypotheses, and may even...
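
    The planning-stage use of power analysis described above can be sketched with a standard power routine, solving for the sample size that reaches the conventional 0.8 threshold; the effect sizes are generic Cohen's d values, not estimates from any particular pathology study.

```python
# Solve for the per-group sample size that gives power 0.8 at alpha = 0.05
# for a two-sample comparison; effect sizes are illustrative Cohen's d values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):
    n = analysis.solve_power(effect_size=d, power=0.8, alpha=0.05, ratio=1.0)
    print(f"Cohen's d = {d}: about {n:.0f} replicates per treatment group")
```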

  19. Statistical analyses of in-situ and soil-sample measurements for radionuclides in surface soil near the 116-K-2 trench

    International Nuclear Information System (INIS)

    Gilbert, R.O.; Klover, W.J.

    1988-09-01

    Radiation detection surveys are used at the US Department of Energy's Hanford Reservation near Richland, Washington, to determine areas that need posting as radiation zones or to measure dose rates in the field. The relationship between measurements made by Sodium Iodide (NaI) detectors mounted on the mobile Road Monitor vehicle and those made by hand-held GM P-11 probes and Micro-R meters is of particular interest because the Road Monitor can survey land areas in much less time than hand-held detectors. Statistical regression methods are used here to develop simple equations to predict GM P-11 probe gross gamma count-per-minute (cpm) and Micro-R-Meter μR/h measurements on the basis of NaI gross gamma count-per-second (cps) measurements obtained using the Road Monitor. These equations were estimated using data collected near the 116-K-2 Trench in the 100-K area on the Hanford Reservation. Equations are also obtained for estimating upper and lower limits within which the GM P-11 or Micro-R-Meter measurement corresponding to a given NaI Road Monitor measurement at a new location is expected to fall with high probability. An equation and limits for predicting GM P-11 measurements on the basis of Micro-R-Meter measurements are also estimated. Also, we estimate an equation that may be useful for approximating the 90Sr measurement of a surface soil sample on the basis of a spectroscopy measurement for 137Cs on that sample. 3 refs., 16 figs., 44 tabs
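
    The kind of calibration regression described above, with limits within which a new hand-held reading is expected to fall, can be sketched with an ordinary least-squares fit and its prediction interval; the readings below are simulated, not the 116-K-2 Trench data.

```python
# Predict hand-held GM P-11 readings (cpm) from Road Monitor NaI readings (cps),
# with 95% prediction limits for a new location. Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
nai_cps = rng.uniform(200, 2000, size=60)                 # Road Monitor readings
gm_cpm = 30 + 0.9 * nai_cps + rng.normal(0, 60, size=60)  # hand-held readings

X = sm.add_constant(nai_cps)
fit = sm.OLS(gm_cpm, X).fit()
print(fit.params)                                         # intercept and slope

# 95% prediction limits for GM P-11 at a new NaI reading of 1500 cps
new = sm.add_constant(np.array([1500.0]), has_constant="add")
pred = fit.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])
```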

  20. Statistical evaluation of vibration analysis techniques

    Science.gov (United States)

    Milner, G. Martin; Miller, Patrice S.

    1987-01-01

    An evaluation methodology is presented for a selection of candidate vibration analysis techniques applicable to machinery representative of the environmental control and life support system of advanced spacecraft; illustrative results are given. Attention is given to the statistical analysis of small sample experiments, the quantification of detection performance for diverse techniques through the computation of probability of detection versus probability of false alarm, and the quantification of diagnostic performance.

  1. Comparison of sampling methods for hard-to-reach francophone populations: yield and adequacy of advertisement and respondent-driven sampling.

    Science.gov (United States)

    Ngwakongnwi, Emmanuel; King-Shier, Kathryn M; Hemmelgarn, Brenda R; Musto, Richard; Quan, Hude

    2014-01-01

    Francophones who live outside the primarily French-speaking province of Quebec, Canada, risk being excluded from research by lack of a sampling frame. We examined the adequacy of random sampling, advertising, and respondent-driven sampling for recruitment of francophones for survey research. We recruited francophones residing in the city of Calgary, Alberta, through advertising and respondent-driven sampling. These 2 samples were then compared with a random subsample of Calgary francophones derived from the 2006 Canadian Community Health Survey (CCHS). We assessed the effectiveness of advertising and respondent-driven sampling in relation to the CCHS sample by comparing demographic characteristics and selected items from the CCHS (specifically self-reported general health status, perceived weight, and having a family doctor). We recruited 120 francophones through advertising and 145 through respondent-driven sampling; the random sample from the CCHS consisted of 259 records. The samples derived from advertising and respondent-driven sampling differed from the CCHS in terms of age (mean ages 41.0, 37.6, and 42.5 years, respectively), sex (proportion of males 26.1%, 40.6%, and 56.6%, respectively), education (college or higher 86.7%, 77.9%, and 59.1%, respectively), place of birth (immigrants accounting for 45.8%, 55.2%, and 3.7%, respectively), and not having a regular medical doctor (16.7%, 34.5%, and 16.6%, respectively). Differences were not tested statistically because of limitations on the analysis of CCHS data imposed by Statistics Canada. The samples generated exclusively through advertising and respondent-driven sampling were not representative of the gold standard sample from the CCHS. Use of such biased samples for research studies could generate misleading results.

  2. Effect size, confidence intervals and statistical power in psychological research.

    Directory of Open Access Journals (Sweden)

    Téllez A.

    2015-07-01

    Full Text Available Quantitative psychological research is focused on detecting the occurrence of certain population phenomena by analyzing data from a sample, and statistics is a particularly helpful mathematical tool that is used by researchers to evaluate hypotheses and make decisions to accept or reject such hypotheses. In this paper, the various statistical tools in psychological research are reviewed. The limitations of null hypothesis significance testing (NHST and the advantages of using effect size and its respective confidence intervals are explained, as the latter two measurements can provide important information about the results of a study. These measurements also can facilitate data interpretation and easily detect trivial effects, enabling researchers to make decisions in a more clinically relevant fashion. Moreover, it is recommended to establish an appropriate sample size by calculating the optimum statistical power at the moment that the research is designed. Psychological journal editors are encouraged to follow APA recommendations strictly and ask authors of original research studies to report the effect size, its confidence intervals, statistical power and, when required, any measure of clinical significance. Additionally, we must account for the teaching of statistics at the graduate level. At that level, students do not receive sufficient information concerning the importance of using different types of effect sizes and their confidence intervals according to the different types of research designs; instead, most of the information is focused on the various tools of NHST.
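
    A short sketch of the recommended reporting (effect size, its confidence interval, and the achieved power) for a two-group comparison follows; the data are simulated and the interval uses a simple large-sample approximation rather than the noncentral-t construction.

```python
# Cohen's d with an approximate 95% confidence interval, the t-test p value and
# the achieved power for a two-group comparison on simulated data.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(6)
group_a = rng.normal(102, 15, size=40)
group_b = rng.normal(95, 15, size=40)

n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1))
                    / (n1 + n2 - 2))
d = (group_a.mean() - group_b.mean()) / pooled_sd

# large-sample standard error of d (Hedges & Olkin approximation)
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci = d - 1.96 * se_d, d + 1.96 * se_d

power = TTestIndPower().power(effect_size=abs(d), nobs1=n1, alpha=0.05, ratio=n2 / n1)
t, p = stats.ttest_ind(group_a, group_b)

print(f"d = {d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.3f}, achieved power = {power:.2f}")
```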

  3. Mixing and sampling tests for Radiochemical Plant

    International Nuclear Information System (INIS)

    Ehinger, M.N.; Marfin, H.R.; Hunt, B.

    1999-01-01

    The paper describes results and test procedures used to evaluate uncertainty and bias effects introduced by the sampler systems of a radiochemical plant, and similar parameters associated with mixing. This report will concentrate on experiences at the Barnwell Nuclear Fuels Plant. Mixing and sampling tests can be conducted to establish the statistical parameters for those activities related to overall measurement uncertainties. Density measurement by state-of-the-art, commercially available equipment is the key to conducting those tests. Experience in the U.S. suggests the statistical contribution of mixing and sampling can be controlled to less than 0.01% and with new equipment and new tests in operating facilities might be controlled to better accuracy [ru

  4. Engineer’s estimate reliability and statistical characteristics of bids

    Directory of Open Access Journals (Sweden)

    Fariborz M. Tehrani

    2016-12-01

    Full Text Available The objective of this report is to provide a methodology for examining bids and evaluating the performance of engineer's estimates in capturing the true cost of projects. This study reviews the cost development for transportation projects in addition to two sources of uncertainty in a cost estimate: modeling errors and inherent variability. Sample projects are highway maintenance projects with a similar scope of work, size, and schedule. Statistical analysis of engineering estimates and bids examines the adaptability of statistical models for sample projects. Further, the variation of engineering cost estimates from inception to implementation has been presented and discussed for selected projects. Moreover, the applicability of extreme value theory is assessed for available data. The results indicate that the performance of the engineer's estimate is best evaluated based on the trimmed average of bids, excluding discordant bids.

  5. Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.

    Science.gov (United States)

    Kieffer, Kevin M.; Thompson, Bruce

    As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significant tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
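
    One reading of the "what if" idea is to hold the observed effect size fixed and recompute the p value at hypothetical sample sizes; the sketch below illustrates that general point and is not necessarily the authors' exact procedure.

```python
# Recompute the two-sample p value at hypothetical sample sizes while holding the
# observed standardized effect fixed, to show how p depends on n.
import numpy as np
from scipy import stats

d_observed = 0.30          # hypothetical standardized mean difference from a study
for n_per_group in (10, 20, 50, 100, 200, 500):
    t = d_observed * np.sqrt(n_per_group / 2)              # t for a two-sample test
    df = 2 * n_per_group - 2
    p = 2 * stats.t.sf(abs(t), df)
    print(f"n per group = {n_per_group:4d}   t = {t:5.2f}   p = {p:.4f}")
```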

  6. Kappa statistic for clustered matched-pair data.

    Science.gov (United States)

    Yang, Zhao; Zhou, Ming

    2014-07-10

    Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
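
    The sketch below computes Cohen's kappa for binary matched-pair ratings and attaches a cluster-bootstrap standard error as a generic stand-in for clustering; the paper's closed-form delta-method variance estimator is not reproduced.

```python
# Cohen's kappa for binary matched-pair ratings plus a cluster-bootstrap SE.
# The resampling respects the clustering but is not the paper's estimator.
import numpy as np

def kappa(a, b):
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)
    pe = np.mean(a) * np.mean(b) + (1 - np.mean(a)) * (1 - np.mean(b))   # binary ratings
    return (po - pe) / (1 - pe)

def clustered_kappa_se(a, b, clusters, B=2000, seed=7):
    rng = np.random.default_rng(seed)
    ids = np.unique(clusters)
    reps = []
    for _ in range(B):
        keep = rng.choice(ids, size=len(ids), replace=True)      # resample whole clusters
        idx = np.concatenate([np.flatnonzero(clusters == c) for c in keep])
        reps.append(kappa(a[idx], b[idx]))
    return np.std(reps, ddof=1)

rng = np.random.default_rng(8)
clusters = np.repeat(np.arange(60), 3)            # 60 clusters of 3 matched pairs each
truth = rng.binomial(1, 0.4, size=clusters.size)
rater1 = np.where(rng.random(truth.size) < 0.85, truth, 1 - truth)
rater2 = np.where(rng.random(truth.size) < 0.85, truth, 1 - truth)

k = kappa(rater1, rater2)
print(f"kappa = {k:.2f}, cluster-bootstrap SE = {clustered_kappa_se(rater1, rater2, clusters):.3f}")
```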

  7. Statistical Considerations of Food Allergy Prevention Studies.

    Science.gov (United States)

    Bahnson, Henry T; du Toit, George; Lack, Gideon

    Clinical studies to prevent the development of food allergy have recently helped reshape public policy recommendations on the early introduction of allergenic foods. These trials are also prompting new research, and it is therefore important to address the unique design and analysis challenges of prevention trials. We highlight statistical concepts and give recommendations that clinical researchers may wish to adopt when designing future study protocols and analysis plans for prevention studies. Topics include selecting a study sample, addressing internal and external validity, improving statistical power, choosing alpha and beta, analysis innovations to address dilution effects, and analysis methods to deal with poor compliance, dropout, and missing data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Statistics Anxiety and Business Statistics: The International Student

    Science.gov (United States)

    Bell, James A.

    2008-01-01

    Does the international student suffer from statistics anxiety? To investigate this, the Statistics Anxiety Rating Scale (STARS) was administered to sixty-six beginning statistics students, including twelve international students and fifty-four domestic students. Due to the small number of international students, nonparametric methods were used to…

  9. Clinical supervision reflected in a Danish DPCCQ sample

    DEFF Research Database (Denmark)

    Nielsen, Jan; Jacobsen, Claus Haugaard

    Core Questionnaire (DPCCQ) has only a few questions on supervision. To rectify this limitation, a recent Danish version of the DPCCQ included two new sections on supervision, one focusing on supervisees and another on supervisors and their supervisory training. This paper presents our initial findings... on giving and receiving clinical supervision as reported by therapists in Denmark. Method: Currently, the Danish sample consists of 350 clinical psychologists doing psychotherapy who completed the DPCCQ. Data are currently being prepared for statistical analysis. Results: This paper will focus primarily... on describing the amount and type of supervision received and given by the sample. Findings from these descriptive statistics will be compared within the sample across demographic parameters such as age and sex, and professional characteristics such as career level, theoretical preferences, type of clients...

  10. Spreadsheets as tools for statistical computing and statistics education

    OpenAIRE

    Neuwirth, Erich

    2000-01-01

    Spreadsheets are a ubiquitous program category, and we will discuss their use in statistics and statistics education on various levels, ranging from very basic examples to extremely powerful methods. Since the spreadsheet paradigm is very familiar to many potential users, using it as the interface to statistical methods can make statistics more easily accessible.

  11. Mathematics and Statistics Research Department progress report for period ending June 30, 1977

    International Nuclear Information System (INIS)

    Lever, W.E.; Shepherd, D.E.; Ward, R.C.; Wilson, D.G.

    1977-09-01

    Brief descriptions are given of work done in mathematical and statistical research (moving-boundary problems; numerical analysis; continuum mechanics; matrices and other operators; experiment design; statistical testing; multivariate, multipopulation classification; statistical estimation) and statistical and mathematical collaboration (analytical chemistry, biological research, chemistry and physics research, energy research, engineering technology research, environmental sciences research, health physics research, materials research, sampling inspection and quality control, uranium resource evaluation research). Most of the descriptions are a page or less in length. Educational activities, publications, seminar titles, etc., are also included

  12. Statistical analysis of planktic foraminifera of the surface Continental ...

    African Journals Online (AJOL)

    Planktic foraminiferal assemblages recorded from selected samples obtained from shallow continental shelf sediments off southwestern Nigeria were subjected to statistical analysis. The Principal Component Analysis (PCA) was used to determine variants of planktic parameters. Values obtained for these parameters were ...

  13. Estimating an appropriate sampling frequency for monitoring ground water well contamination

    International Nuclear Information System (INIS)

    Tuckfield, R.C.

    1994-01-01

    Nearly 1,500 ground water wells at the Savannah River Site (SRS) are sampled quarterly to monitor contamination by radionuclides and other hazardous constituents from nearby waste sites. Some 10,000 water samples were collected in 1993 at a laboratory analysis cost of $10,000,000. No widely accepted statistical method has been developed, to date, for estimating a technically defensible ground water sampling frequency consistent and compliant with federal regulations. Such a method is presented here based on the concept of statistical independence among successively measured contaminant concentrations in time
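
    The independence idea can be sketched by estimating the smallest lag at which successive concentration measurements are no longer significantly autocorrelated and sampling no more often than that; the quarterly series below is simulated, not Savannah River Site data.

```python
# Estimate how many quarters apart successive well measurements must be before
# they are effectively independent, from a simulated autocorrelated series.
import numpy as np

rng = np.random.default_rng(9)
n = 80                                             # 20 years of quarterly samples
ar = 0.7                                           # strong carry-over between quarters
conc = np.empty(n)
conc[0] = rng.normal()
for i in range(1, n):
    conc[i] = ar * conc[i - 1] + rng.normal()

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

threshold = 1.96 / np.sqrt(n)                      # approximate 95% band for white noise
for lag in range(1, 9):
    r = acf(conc, lag)
    flag = "correlated" if abs(r) > threshold else "~independent"
    print(f"lag {lag} quarter(s): r = {r:+.2f}  {flag}")
# The first lag falling inside the band suggests the minimum spacing at which
# samples carry essentially independent information.
```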

  14. APA's Learning Objectives for Research Methods and Statistics in Practice: A Multimethod Analysis

    Science.gov (United States)

    Tomcho, Thomas J.; Rice, Diana; Foels, Rob; Folmsbee, Leah; Vladescu, Jason; Lissman, Rachel; Matulewicz, Ryan; Bopp, Kara

    2009-01-01

    Research methods and statistics courses constitute a core undergraduate psychology requirement. We analyzed course syllabi and faculty self-reported coverage of both research methods and statistics course learning objectives to assess the concordance with APA's learning objectives (American Psychological Association, 2007). We obtained a sample of…

  15. Using remotely-sensed data for optimal field sampling

    CSIR Research Space (South Africa)

    Debba, Pravesh

    2008-09-01

    Full Text Available Statistics is the science pertaining to the collection, summary, analysis, interpretation and presentation of data. It is often impractical... studies are: where to sample, what to sample and how many samples to obtain. Conventional sampling techniques are not always suitable in environmental studies and scientists have explored the use of remotely-sensed data as ancillary information to aid...

  16. Probability sampling design in ethnobotanical surveys of medicinal plants

    Directory of Open Access Journals (Sweden)

    Mariano Martinez Espinosa

    2012-07-01

    Full Text Available Non-probability sampling design can be used in ethnobotanical surveys of medicinal plants. However, this method does not allow statistical inferences to be made from the data generated. The aim of this paper is to present a probability sampling design that is applicable in ethnobotanical studies of medicinal plants. The sampling design employed in the research titled "Ethnobotanical knowledge of medicinal plants used by traditional communities of Nossa Senhora Aparecida do Chumbo district (NSACD), Poconé, Mato Grosso, Brazil" was used as a case study. Probability sampling methods (simple random and stratified sampling) were used in this study. In order to determine the sample size, the following data were considered: population size (N) of 1179 families; confidence coefficient, 95%; sample error (d), 0.05; and a proportion (p), 0.5. The application of this sampling method resulted in a sample size (n) of at least 290 families in the district. The present study concludes that probability sampling methods necessarily have to be employed in ethnobotanical studies of medicinal plants, particularly where statistical inferences have to be made using data obtained. This can be achieved by applying different existing probability sampling methods, or better still, a combination of such methods.
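
    The quoted sample size can be reproduced with the usual finite-population formula, n = N z^2 p(1-p) / (d^2 (N-1) + z^2 p(1-p)); the paper may write the expression slightly differently, but the arithmetic gives the same figure.

```python
# Reproducing the sample-size figure quoted above (about 290 families for N = 1179,
# 95% confidence, d = 0.05, p = 0.5) with the standard finite-population formula.
import math
from statistics import NormalDist

def sample_size(N, p=0.5, d=0.05, conf=0.95):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    num = N * z**2 * p * (1 - p)
    den = d**2 * (N - 1) + z**2 * p * (1 - p)
    return math.ceil(num / den)

print(sample_size(1179))   # -> 290
```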

  17. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.

    Science.gov (United States)

    Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin

    2017-06-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as a common approach, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https
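
    The mechanics of a Hotelling-type T2 in which the covariance is supplied from prior knowledge (for example, a matrix built from STRING-style confidence scores) rather than estimated from the few available samples can be sketched as below; the exact scaling, normalisation of the scores and reference distribution used in the paper may differ.

```python
# Hotelling-type T^2 with an externally supplied (knowledge-based) covariance,
# referred to a chi-square when that covariance is treated as known. The
# confidence matrix and expression ratios below are invented.
import numpy as np
from scipy import stats

def knowledge_t2(log_ratios, sigma):
    """log_ratios: (n_samples x n_proteins) pathway expression ratios;
    sigma: externally supplied covariance/association matrix for those proteins."""
    n, p = log_ratios.shape
    xbar = log_ratios.mean(axis=0)
    t2 = n * xbar @ np.linalg.solve(sigma, xbar)       # n * xbar' Sigma^-1 xbar
    return t2, stats.chi2.sf(t2, df=p)

rng = np.random.default_rng(10)
confidence = np.array([[1.0, 0.8, 0.3],
                       [0.8, 1.0, 0.5],
                       [0.3, 0.5, 1.0]])               # hypothetical STRING-style scores
samples = rng.multivariate_normal([0.4, 0.5, 0.1], confidence, size=6)

t2, p = knowledge_t2(samples, confidence)
print(f"T^2 = {t2:.2f}, p = {p:.4f}")
```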

  18. Register-based statistics statistical methods for administrative data

    CERN Document Server

    Wallgren, Anders

    2014-01-01

    This book provides a comprehensive and up-to-date treatment of theory and practical implementation in register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking. Thi

  19. Cancer Statistics

    Science.gov (United States)

    Cancer has a major impact on society in ... success of efforts to control and manage cancer. Statistics at a Glance: The Burden of Cancer in ...

  20. Quantal bookkeeping of samples and locality

    International Nuclear Information System (INIS)

    Groenewold, H.J.

    1983-01-01

    The skeptical pruned ensemble interpretation of quantal measurements is described in the conventional representation and in an equivalent hedge-hog representation. (A symmetric hedge-hog is displayed by a spiny array of Hilbert projectors.) A fundamental problem of any individual interpretation is the distinction in the formalism of individual samples and their mutual independence. In the formal hedge-hog bookkeeping an auxiliary hedge-hog hypothesis is proposed, which associates separate real samples of quantal ensembles with separate fictitious hedge-hogs. In accordance with its private unobservable hedge-hog, each sample has, so to speak, its own definite potential answer in store for every potential observable question. The statistical distribution of the answers of the various samples to all questions is represented by the ensemble operator, which can only be attributed to the entire ensemble as a whole. Observable answers can be obtained from an individual real sample only one at a time. In this fictitious finer-grained model the hedge-hogs metaphorically represent a kind of individual memory of the corresponding samples. The principle of mutual kinematical and dynamical independence of samples as well as the principle of locality in retarded correlation is satisfied. This has to be paid for by indefinite statistics of the hedge-hog distribution in the ensemble. Even if the hedge-hog model is afterwards thrown away as an untestable fake, the logical compatibility of these two fundamental principles (whatever their significance may be) with standard quantum mechanics holds firm. The physical compatibility remains an open question. (orig.)

  1. Analysis of monazite samples

    International Nuclear Information System (INIS)

    Kartiwa Sumadi; Yayah Rohayati

    1996-01-01

    The 'monazit' analytical program has been set up for routine rare earth element analysis of monazite and xenotime mineral samples. The total relative error of the analysis is very low (less than 2.50%), and the reproducibility of the counting statistics and the stability of the instrument were excellent. The precision and accuracy of the analytical program are very good, with maximum relative percentages of 5.22% and 1.61%, respectively. The mineral compositions of the 30 monazite samples were also calculated from their chemical constituents, and the results were compared with grain-counting microscopic analysis

  2. The influence of the presence of deviant item score patterns on the power of a person-fit statistic

    NARCIS (Netherlands)

    Meijer, R.R.

    1994-01-01

    In studies investigating the power of person-fit statistics it is often assumed that the item parameters that are used to calculate the statistics can be estimated in a sample without aberrant persons. However, in practical test applications calibration samples most likely will contain aberrant persons.

  3. The Big Mac Standard: A Statistical Illustration

    OpenAIRE

    Yukinobu Kitamura; Hiroshi Fujiki

    2004-01-01

    We demonstrate a statistical procedure for selecting the most suitable empirical model to test an economic theory, using the example of the test for purchasing power parity based on the Big Mac Index. Our results show that supporting evidence for purchasing power parity, conditional on the Balassa-Samuelson effect, depends crucially on the selection of models, sample periods and economies used for estimations.

  4. Using robust statistics to improve neutron activation analysis results

    International Nuclear Information System (INIS)

    Zahn, Guilherme S.; Genezini, Frederico A.; Ticianelli, Regina B.; Figueiredo, Ana Maria G.

    2011-01-01

    Neutron activation analysis (NAA) is an analytical technique where an unknown sample is submitted to a neutron flux in a nuclear reactor, and its elemental composition is calculated by measuring the induced activity produced. In the relative NAA method, one or more well-characterized samples (usually certified reference materials - CRMs) are irradiated together with the unknown ones, and the concentration of each element is then calculated by comparing the areas of the gamma-ray peaks related to that element. When two or more CRMs are used as reference, the concentration of each element can be determined in several different ways, either using more than one gamma-ray peak for that element (when available), or using the results obtained in the comparison with each CRM. Therefore, determining the best estimate for the concentration of each element in the sample can be a delicate issue. In this work, samples from three CRMs were irradiated together and the elemental concentration in one of them was calculated using the other two as reference. Two sets of peaks were analyzed for each element: a smaller set containing only the literature-recommended gamma-ray peaks and a larger one containing all peaks related to that element that could be quantified in the gamma-ray spectra; the most recommended transition was also used as a benchmark. The resulting data for each element were then reduced using up to five different statistical approaches: the usual (non-robust) unweighted and weighted means, together with three robust means: the Limitation of Relative Statistical Weight, Normalized Residuals and Rajeval methods. The resulting concentration values were then compared to the certified value for each element, allowing for discussion on both the performance of each statistical tool and on the best choice of peaks for each element. (author)
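    For context, the two non-robust estimators mentioned are straightforward to compute; a minimal sketch is given below (the robust estimators cited -- Limitation of Relative Statistical Weight, Normalized Residuals and Rajeval -- follow their own published iterative procedures and are not reproduced here, and the concentration values are hypothetical):

```python
import numpy as np

def unweighted_mean(values):
    """Plain mean and its standard error."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(len(v))

def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean of several concentration estimates."""
    v = np.asarray(values, dtype=float)
    u = np.asarray(uncertainties, dtype=float)
    w = 1.0 / u**2
    mean = np.sum(w * v) / np.sum(w)
    return mean, np.sqrt(1.0 / np.sum(w))

# Hypothetical concentrations (mg/kg) of one element obtained from
# different peaks/CRMs, with their individual uncertainties.
conc = [12.1, 11.8, 12.6, 15.0]
unc  = [0.3, 0.4, 0.5, 1.5]
print(unweighted_mean(conc))
print(weighted_mean(conc, unc))
```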

  5. A study on the advanced statistical core thermal design methodology

    International Nuclear Information System (INIS)

    Lee, Seung Hyuk

    1992-02-01

    A statistical core thermal design methodology for generating the limit DNBR and the nominal DNBR is proposed and used in assessing the best-estimate thermal margin in a reactor core. Firstly, the Latin Hypercube Sampling method, instead of the conventional Experimental Design Technique, is utilized as an input sampling method for a regression analysis, in order to evaluate its sampling efficiency. Secondly, and as the main topic, the Modified Latin Hypercube Sampling and Hypothesis Test Statistics method is proposed as a substitute for the current statistical core thermal design method. This new methodology adopts a 'Modified Latin Hypercube Sampling Method', which uses the mean value of each interval of the input variables instead of random values, in order to avoid the extreme cases that arise in the tail areas of some parameters. Next, the independence between the input variables is verified through a 'Correlation Coefficient Test' for the statistical treatment of their uncertainties, and the distribution type of the DNBR response is determined through a 'Goodness of Fit Test'. Finally, the limit DNBR with one-sided 95% probability and 95% confidence level, DNBR(95/95), is estimated. The advantage of this methodology over the conventional statistical method using Response Surface and Monte Carlo simulation techniques lies in its simplicity of the analysis procedure, while maintaining the same level of confidence in the limit DNBR result. This methodology is applied to two cases of DNBR margin calculation. The first case is the application to the determination of the limit DNBR, where the DNBR margin is determined by the difference between the nominal DNBR and the limit DNBR. The second case is the application to the determination of the nominal DNBR, where the DNBR margin is determined by the difference between the lower limit value of the nominal DNBR and the CHF correlation limit being used. From this study, it is deduced that the proposed methodology gives a good agreement in the DNBR results
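    The stratification idea behind the modified sampling can be sketched as follows; this is a generic illustration only (it uses the probability midpoint of each equal-probability stratum as a stand-in for the stratum mean, and the input variables and their parameters are invented):

```python
import numpy as np
from scipy.stats import norm

def modified_lhs_normal(n_samples, means, stds, rng=None):
    """Latin hypercube sample of independent normal inputs in which each
    stratum is represented by its central value rather than a random
    point, avoiding draws from the extreme tails.

    means, stds : 1-D sequences with the input variables' means / std devs.
    """
    rng = np.random.default_rng(rng)
    k = len(means)
    # Central probabilities of n equal-probability strata: (i + 0.5)/n.
    probs = (np.arange(n_samples) + 0.5) / n_samples
    samples = np.empty((n_samples, k))
    for j in range(k):
        strata_values = norm.ppf(probs, loc=means[j], scale=stds[j])
        samples[:, j] = rng.permutation(strata_values)  # decouple variables
    return samples

# Hypothetical uncertain inputs (e.g. inlet temperature, flow, power factor).
X = modified_lhs_normal(50, means=[290.0, 1.0, 1.02], stds=[2.0, 0.03, 0.01])
print(X.shape)  # (50, 3)
```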

  6. Generation and Analysis of Constrained Random Sampling Patterns

    DEFF Research Database (Denmark)

    Pierzchlewski, Jacek; Arildsen, Thomas

    2016-01-01

    Random sampling is a technique for signal acquisition which is gaining popularity in practical signal processing systems. Nowadays, event-driven analog-to-digital converters make random sampling feasible in practical applications. A process of random sampling is defined by a sampling pattern, which indicates the signal sampling points in time. Practical random sampling patterns are constrained by ADC characteristics and application requirements. In this paper, we introduce statistical methods which evaluate random sampling pattern generators with emphasis on practical applications. Furthermore, we propose ... The proposed algorithm generates random sampling patterns dedicated to event-driven ADCs better than existing sampling pattern generators. Finally, implementation issues of random sampling patterns are discussed....
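    As a rough illustration of what a constrained pattern generator does (not the authors' algorithm), the sketch below draws sampling instants on a uniform time grid subject to a minimum-spacing constraint such as an ADC might impose; the grid period and spacing values are arbitrary assumptions:

```python
import numpy as np

def constrained_random_pattern(n_points, t_total, t_grid, t_min, rng=None):
    """Draw up to n_points sampling instants on a grid of period t_grid
    within [0, t_total), enforcing a minimum spacing t_min between them."""
    rng = np.random.default_rng(rng)
    grid = np.arange(0.0, t_total, t_grid)
    min_gap = int(np.ceil(t_min / t_grid))
    chosen = []
    candidates = set(range(len(grid)))
    while candidates and len(chosen) < n_points:
        idx = int(rng.choice(sorted(candidates)))
        chosen.append(idx)
        # Remove grid points closer than the minimum spacing.
        candidates -= set(range(idx - min_gap + 1, idx + min_gap))
    return np.sort(grid[np.array(chosen)])

pattern = constrained_random_pattern(20, t_total=1e-3, t_grid=1e-6, t_min=10e-6)
print(len(pattern), float(np.min(np.diff(pattern))))  # spacing >= t_min
```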

  7. Statistical aspects of tumor registries, Hiroshima and Nagasaki

    Energy Technology Data Exchange (ETDEWEB)

    Ishida, M

    1961-02-24

    Statistical considerations are presented on the tumor registries established for the purpose of studying radiation-induced carcinoma in Hiroshima and Nagasaki by observing tumors developing in the survivors of these cities. In addition to describing the background and purpose of the tumor registries, the report consists of two parts: (1) accuracy of reported tumor cases and (2) statistical aspects of the incidence of tumors based both on a current population and on a fixed sample. Under the heading of background, the discussion includes the difficulties in attaining complete registration; the various problems associated with the tumor registries; and the special characteristics of the tumor registries in Hiroshima and Nagasaki. Bayes' a posteriori probability formula was applied to the Type I and Type II errors in the autopsy data of Hiroshima ABCC. (Type I, diagnosis of what is not cancer as cancer; Type II, diagnosis of what is cancer as noncancer.) Finally, the report discusses the difficulties in estimating a current population of survivors; the advantages and disadvantages of analyses based on a fixed sample and on an estimated current population; the comparison of incidence rates based on these populations using the 20 months' data of the tumor registry in Hiroshima; and the sample size required for studying radiation-induced carcinoma. 10 references, 1 figure, 8 tables.
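    The role of Bayes' a posteriori formula for the two misclassification types can be illustrated with a small calculation; the sensitivity, specificity and prevalence figures below are purely hypothetical and are not values from the registry:

```python
def posterior_cancer_given_diagnosis(sensitivity, specificity, prevalence):
    """P(true cancer | diagnosed as cancer) via Bayes' theorem.
    Type I error rate  = 1 - specificity (non-cancer diagnosed as cancer);
    Type II error rate = 1 - sensitivity (cancer diagnosed as non-cancer)."""
    p_diag = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_diag

# Hypothetical figures for illustration only.
print(posterior_cancer_given_diagnosis(0.95, 0.98, 0.02))  # ~0.49
```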

  8. Statistical hadronization and hadronic micro-canonical ensemble II

    International Nuclear Information System (INIS)

    Becattini, F.; Ferroni, L.

    2004-01-01

    We present a Monte Carlo calculation of the micro-canonical ensemble of the ideal hadron-resonance gas including all known states up to a mass of about 1.8 GeV and full quantum statistics. The micro-canonical average multiplicities of the various hadron species are found to converge to the canonical ones for moderately low values of the total energy, around 8 GeV, thus bearing out previous analyses of hadronic multiplicities in the canonical ensemble. The main numerical computing method is an importance sampling Monte Carlo algorithm using the product of Poisson distributions to generate multi-hadronic channels. It is shown that the use of this multi-Poisson distribution allows for an efficient and fast computation of averages, which can be further improved in the limit of very large clusters. We have also studied the fitness of a previously proposed computing method, based on the Metropolis Monte Carlo algorithm, for event generation in the statistical hadronization model. We find that the use of the multi-Poisson distribution as proposal matrix dramatically improves the computation performance. However, due to the correlation of subsequent samples, this method proves to be generally less robust and effective than the importance sampling method. (orig.)
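    A very stripped-down illustration of the computing strategy -- drawing channel multiplicities from a product of Poisson distributions and re-weighting toward a constrained, micro-canonical-like target -- might look as follows; the two-species toy system, the uniform weight on the energy shell and all numerical values are assumptions, not the hadron-resonance gas of the paper:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)

# Toy "species" masses (GeV) and Poisson proposal means.
masses = np.array([0.14, 0.49])
poisson_means = np.array([6.0, 2.0])

E_total, dE = 3.0, 0.1          # energy shell defining the toy target
n_draws = 200_000

# Proposal: independent Poisson multiplicities for each species.
mult = rng.poisson(poisson_means, size=(n_draws, 2))
energy = mult @ masses

# Self-normalized importance sampling: the toy target is uniform on the
# energy shell, so the weight is the shell indicator divided by the
# proposal probability of the drawn multiplicities.
proposal_prob = poisson.pmf(mult, poisson_means).prod(axis=1)
weights = (np.abs(energy - E_total) < dE) / proposal_prob

avg_multiplicity = (weights[:, None] * mult).sum(axis=0) / weights.sum()
print(avg_multiplicity)
```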

  9. Handling missing data in ranked set sampling

    CERN Document Server

    Bouza-Herrera, Carlos N

    2013-01-01

    The existence of missing observations is a very important aspect to be considered in the application of survey sampling, for example. In human populations they may be caused by a refusal of some interviewees to give the true value for the variable of interest. Traditionally, simple random sampling is used to select samples. Most statistical models are supported by the use of samples selected by means of this design. In recent decades, an alternative design has started being used, which, in many cases, shows an improvement in terms of accuracy compared with traditional sampling. It is called Ranked Set Sampling (RSS).

  10. Software Used to Generate Cancer Statistics - SEER Cancer Statistics

    Science.gov (United States)

    Videos that highlight topics and trends in cancer statistics and definitions of statistical terms. Also software tools for analyzing and reporting cancer statistics, which are used to compile SEER's annual reports.

  11. Isotopic safeguards statistics

    International Nuclear Information System (INIS)

    Timmerman, C.L.; Stewart, K.B.

    1978-06-01

    The methods and results of our statistical analysis of isotopic data using isotopic safeguards techniques are illustrated using example data from the Yankee Rowe reactor. The statistical methods used in this analysis are the paired comparison and the regression analyses. A paired comparison results when a sample from a batch is analyzed by two different laboratories. Paired comparison techniques can be used with regression analysis to detect and identify outlier batches. The second analysis tool, linear regression, involves comparing various regression approaches. These approaches use two basic types of models: the intercept model, y = α + βx, and the initial point model, y − y₀ = β(x − x₀). The intercept model fits strictly the exposure or burnup values of isotopic functions, while the initial point model utilizes the exposure values plus the initial or fabricator's data values in the regression analysis. Two fitting methods are applied to each of these models. These methods are: (1) the usual least squares fitting approach, where x is measured without error, and (2) Deming's approach, which uses the variance estimates obtained from the paired comparison results and considers that x and y are both measured with error. The Yankee Rowe data were first measured by Nuclear Fuel Services (NFS) and remeasured by Nuclear Audit and Testing Company (NATCO). The isotopic function illustrated is the ratio of Pu/U versus ²³⁵D (in which ²³⁵D is the amount of depleted ²³⁵U expressed in weight percent), using actual numbers. Statistical results using the Yankee Rowe data indicate the attractiveness of Deming's regression model over the usual approach by simple comparison of the given regression variances with the random variance from the paired comparison results
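    The contrast between the two fitting approaches can be sketched as follows; the closed-form Deming slope assumes a known ratio delta of the y-error variance to the x-error variance (which, in the application described, would come from the paired-comparison variance estimates), and the data points are invented for illustration:

```python
import numpy as np

def ols_fit(x, y):
    """Ordinary least squares y = a + b*x (x assumed error-free)."""
    b, a = np.polyfit(x, y, 1)
    return a, b

def deming_fit(x, y, delta=1.0):
    """Deming regression y = a + b*x when both x and y carry measurement
    error; delta is the ratio of the y-error variance to the x-error
    variance."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    b = (syy - delta * sxx + np.sqrt((syy - delta * sxx) ** 2
                                     + 4 * delta * sxy ** 2)) / (2 * sxy)
    a = y.mean() - b * x.mean()
    return a, b

# Hypothetical exposure values (x) vs Pu/U ratio (y), both measured with error.
x = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
y = np.array([0.21, 0.40, 0.63, 0.79, 1.02])
print(ols_fit(x, y))
print(deming_fit(x, y, delta=1.0))
```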

  12. Toward cost-efficient sampling methods

    Science.gov (United States)

    Luo, Peng; Li, Yongli; Wu, Chong; Zhang, Guijie

    2015-09-01

    Sampling methods have received much attention in the field of complex networks in general and statistical physics in particular. This paper proposes two new sampling methods based on the idea that a small part of the vertices with high node degree could possess most of the structural information of a complex network. The two proposed sampling methods are efficient in sampling high-degree nodes, so that they remain useful even when the sampling rate is low, which makes them cost-efficient. The first new sampling method is developed on the basis of the widely used stratified random sampling (SRS) method, and the second one improves the well-known snowball sampling (SBS) method. In order to demonstrate the validity and accuracy of the two new sampling methods, we compare them with the existing sampling methods in three commonly used simulated networks (a scale-free network, a random network, and a small-world network), and also in two real networks. The experimental results illustrate that the two proposed sampling methods perform much better than the existing sampling methods in terms of recovering the true network structure characteristics reflected by the clustering coefficient, Bonacich centrality and average path length, especially when the sampling rate is low.
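    The core idea -- preferentially retaining high-degree vertices so that a small sample carries most of the structural information -- can be sketched generically as below; this is a plain degree-biased sketch, not the specific SRS and SBS variants proposed in the paper, and the toy graph is invented:

```python
import random
from collections import defaultdict

def degree_biased_sample(edges, sample_rate, seed=0):
    """Select a fraction of vertices with probability proportional to
    their degree, then keep the subgraph they induce."""
    rng = random.Random(seed)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    nodes = list(degree)
    k = max(1, int(sample_rate * len(nodes)))
    weights = [degree[n] for n in nodes]
    sampled = set()
    # Repeated weighted draws until k distinct vertices are collected.
    while len(sampled) < k:
        sampled.update(rng.choices(nodes, weights=weights, k=k - len(sampled)))
    induced = [(u, v) for u, v in edges if u in sampled and v in sampled]
    return sampled, induced

# Toy graph: a hub (node 0) plus a sparse periphery.
edges = [(0, i) for i in range(1, 20)] + [(1, 2), (3, 4), (5, 6)]
nodes, sub_edges = degree_biased_sample(edges, sample_rate=0.2)
print(len(nodes), len(sub_edges))
```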

  13. Sample design considerations of indoor air exposure surveys

    International Nuclear Information System (INIS)

    Cox, B.G.; Mage, D.T.; Immerman, F.W.

    1988-01-01

    Concern about the potential for indoor air pollution has prompted recent surveys of radon and NO₂ concentrations in homes and personal exposure studies of volatile organics, carbon monoxide and pesticides, to name a few. The statistical problems in designing sample surveys that measure the physical environment are diverse and more complicated than those encountered in traditional surveys of human attitudes and attributes. This paper addresses issues encountered when designing indoor air quality (IAQ) studies. General statistical concepts related to target population definition, frame creation, and sample selection for area household surveys and telephone surveys are presented. The implications of different measurement approaches are discussed, and response rate considerations are described

  14. Understanding Statistics and Statistics Education: A Chinese Perspective

    Science.gov (United States)

    Shi, Ning-Zhong; He, Xuming; Tao, Jian

    2009-01-01

    In recent years, statistics education in China has made great strides. However, there still exists a fairly large gap with the advanced levels of statistics education in more developed countries. In this paper, we identify some existing problems in statistics education in Chinese schools and make some proposals as to how they may be overcome. We…

  15. Mathematical statistics

    CERN Document Server

    Pestman, Wiebe R

    2009-01-01

    This textbook provides a broad and solid introduction to mathematical statistics, including the classical subjects hypothesis testing, normal regression analysis, and normal analysis of variance. In addition, non-parametric statistics and vectorial statistics are considered, as well as applications of stochastic analysis in modern statistics, e.g., Kolmogorov-Smirnov testing, smoothing techniques, robustness and density estimation. For students with some elementary mathematical background. With many exercises. Prerequisites from measure theory and linear algebra are presented.

  16. A statistical test for outlier identification in data envelopment analysis

    Directory of Open Access Journals (Sweden)

    Morteza Khodabin

    2010-09-01

    Full Text Available In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these 'deterministic' frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the presented method, each observation is deleted from the sample once and the resulting linear program is solved, leading to a distribution of efficiency estimates. Based on the achieved distribution, a pared test is designed to identify the potential outlier(s). We illustrate the method through a real data set. The method could be used in a first step, as an exploratory data analysis, before using any frontier estimation.
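    For a single-input, single-output special case (where the CCR efficiency reduces to a productivity ratio relative to the best peer), the leave-one-out pared-sample idea can be sketched as below; a full DEA implementation would replace `ccr_efficiency` with the usual linear programs, and the data are invented:

```python
def ccr_efficiency(inputs, outputs):
    """Single-input, single-output CCR efficiencies: productivity ratio
    of each unit relative to the best unit in the sample."""
    ratios = [o / i for i, o in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

def pared_sample_efficiencies(inputs, outputs, unit):
    """Efficiency of `unit` re-estimated after deleting, in turn, each of
    the other observations -- the leave-one-out distribution used by the
    pared-sample outlier test."""
    scores = []
    for drop in range(len(inputs)):
        if drop == unit:
            continue
        sub_in = [x for j, x in enumerate(inputs) if j != drop]
        sub_out = [y for j, y in enumerate(outputs) if j != drop]
        new_unit = unit - 1 if drop < unit else unit
        scores.append(ccr_efficiency(sub_in, sub_out)[new_unit])
    return scores

inputs = [10, 12, 9, 11, 2]      # the last unit is a suspicious outlier
outputs = [20, 22, 18, 21, 30]
# Unit 0's efficiency jumps when the outlier is deleted, flagging it.
print(pared_sample_efficiencies(inputs, outputs, unit=0))
```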

  17. Field Sampling from a Segmented Image

    CSIR Research Space (South Africa)

    Debba, Pravesh

    2008-06-01

    Full Text Available This paper presents a statistical method for deriving the optimal prospective field sampling scheme on a remote sensing image to represent different categories in the field. The iterated conditional modes algorithm (ICM) is used for segmentation...

  18. Improving the Crossing-SIBTEST Statistic for Detecting Non-uniform DIF.

    Science.gov (United States)

    Chalmers, R Philip

    2018-06-01

    This paper demonstrates that, after applying a simple modification to Li and Stout's (Psychometrika 61(4):647-677, 1996) CSIBTEST statistic, an improved variant of the statistic could be realized. It is shown that this modified version of CSIBTEST has a more direct association with the SIBTEST statistic presented by Shealy and Stout (Psychometrika 58(2):159-194, 1993). In particular, the asymptotic sampling distributions and general interpretation of the effect size estimates are the same for SIBTEST and the new CSIBTEST. Given the more natural connection to SIBTEST, it is shown that Li and Stout's hypothesis testing approach is insufficient for CSIBTEST; thus, an improved hypothesis testing procedure is required. Based on the presented arguments, a new chi-squared-based hypothesis testing approach is proposed for the modified CSIBTEST statistic. Positive results from a modest Monte Carlo simulation study strongly suggest the original CSIBTEST procedure and randomization hypothesis testing approach should be replaced by the modified statistic and hypothesis testing method.

  19. Using Group Projects to Assess the Learning of Sampling Distributions

    Science.gov (United States)

    Neidigh, Robert O.; Dunkelberger, Jake

    2012-01-01

    In an introductory business statistics course, student groups used sample data to compare a set of sample means to the theoretical sampling distribution. Each group was given a production measurement with a population mean and standard deviation. The groups were also provided an excel spreadsheet with 40 sample measurements per week for 52 weeks…
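    A compact version of the exercise -- comparing observed sample means with the theoretical sampling distribution implied by the central limit theorem -- could look like the sketch below; the 40 measurements per week over 52 weeks come from the abstract, while the population mean and standard deviation are invented stand-ins for the production measurement handed to each group:

```python
import numpy as np

rng = np.random.default_rng(42)

mu, sigma = 50.0, 4.0          # hypothetical population parameters
n_per_week, n_weeks = 40, 52   # 40 measurements per week for 52 weeks

data = rng.normal(mu, sigma, size=(n_weeks, n_per_week))
weekly_means = data.mean(axis=1)

# Theoretical sampling distribution of the mean: N(mu, sigma / sqrt(n)).
theoretical_se = sigma / np.sqrt(n_per_week)
print("observed mean of weekly means :", weekly_means.mean())
print("observed std of weekly means  :", weekly_means.std(ddof=1))
print("theoretical standard error    :", theoretical_se)
```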

  20. RHAPSODY. I. STRUCTURAL PROPERTIES AND FORMATION HISTORY FROM A STATISTICAL SAMPLE OF RE-SIMULATED CLUSTER-SIZE HALOS

    International Nuclear Information System (INIS)

    Wu, Hao-Yi; Hahn, Oliver; Wechsler, Risa H.; Mao, Yao-Yuan; Behroozi, Peter S.

    2013-01-01

    We present the first results from the RHAPSODY cluster re-simulation project: a sample of 96 'zoom-in' simulations of dark matter halos of mass 10^(14.8 ± 0.05) h^(-1) M_☉, selected from a 1 h^(-3) Gpc^3 volume. This simulation suite is the first to resolve this many halos with ~5 × 10^6 particles per halo in the cluster mass regime, allowing us to statistically characterize the distribution of and correlation between halo properties at fixed mass. We focus on the properties of the main halos and how they are affected by formation history, which we track back to z = 12, over five decades in mass. We give particular attention to the impact of the formation history on the density profiles of the halos. We find that the deviations from the Navarro-Frenk-White (NFW) model and the Einasto model depend on formation time. Late-forming halos tend to have considerable deviations from both models, partly due to the presence of massive subhalos, while early-forming halos deviate less but still significantly from the NFW model and are better described by the Einasto model. We find that the halo shapes depend only moderately on formation time. Departure from spherical symmetry impacts the density profiles through the anisotropic distribution of massive subhalos. Further evidence of the impact of subhalos is provided by analyzing the phase-space structure. A detailed analysis of the properties of the subhalo population in RHAPSODY is presented in a companion paper.