Statistical Symbolic Execution with Informed Sampling
Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco
2014-01-01
Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with in- formed sampling in the Symbolic PathFinder tool. We show experimentally that the informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
Statistical sampling techniques as applied to OSE inspections
International Nuclear Information System (INIS)
Davis, J.J.; Cote, R.W.
1987-01-01
The need has been recognized for statistically valid methods for gathering information during OSE inspections; and for interpretation of results, both from performance testing and from records reviews, interviews, etc. Battelle Columbus Division, under contract to DOE OSE has performed and is continuing to perform work in the area of statistical methodology for OSE inspections. This paper represents some of the sampling methodology currently being developed for use during OSE inspections. Topics include population definition, sample size requirements, level of confidence and practical logistical constraints associated with the conduct of an inspection based on random sampling. Sequential sampling schemes and sampling from finite populations are also discussed. The methods described are applicable to various data gathering activities, ranging from the sampling and examination of classified documents to the sampling of Protective Force security inspectors for skill testing
Energy Technology Data Exchange (ETDEWEB)
GREER DA; THIEN MG
2012-01-12
The ability to effectively mix, sample, certify, and deliver consistent batches of High Level Waste (HLW) feed from the Hanford Double Shell Tanks (DST) to the Waste Treatment and Immobilization Plant (WTP) presents a significant mission risk with potential to impact mission length and the quantity of HLW glass produced. DOE's Tank Operations Contractor, Washington River Protection Solutions (WRPS) has previously presented the results of mixing performance in two different sizes of small scale DSTs to support scale up estimates of full scale DST mixing performance. Currently, sufficient sampling of DSTs is one of the largest programmatic risks that could prevent timely delivery of high level waste to the WTP. WRPS has performed small scale mixing and sampling demonstrations to study the ability to sufficiently sample the tanks. The statistical evaluation of the demonstration results which lead to the conclusion that the two scales of small DST are behaving similarly and that full scale performance is predictable will be presented. This work is essential to reduce the risk of requiring a new dedicated feed sampling facility and will guide future optimization work to ensure the waste feed delivery mission will be accomplished successfully. This paper will focus on the analytical data collected from mixing, sampling, and batch transfer testing from the small scale mixing demonstration tanks and how those data are being interpreted to begin to understand the relationship between samples taken prior to transfer and samples from the subsequent batches transferred. An overview of the types of data collected and examples of typical raw data will be provided. The paper will then discuss the processing and manipulation of the data which is necessary to begin evaluating sampling and batch transfer performance. This discussion will also include the evaluation of the analytical measurement capability with regard to the simulant material used in the demonstration tests. The
Effect of the Target Motion Sampling Temperature Treatment Method on the Statistics and Performance
Viitanen, Tuomas; Leppänen, Jaakko
2014-06-01
Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of the TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviances of all estimators, including the flux estimator, by tens of percents in the vicinity of very strong resonances. This effect is actually not related to the usage of sampled responses, but is instead an inherent property of the TMS tracking method and concerns both EBT and 0 K calculations.
Effect of the Target Motion Sampling temperature treatment method on the statistics and performance
International Nuclear Information System (INIS)
Viitanen, Tuomas; Leppänen, Jaakko
2015-01-01
Highlights: • Use of the Target Motion Sampling (TMS) method with collision estimators is studied. • The expected values of the estimators agree with NJOY-based reference. • In most practical cases also the variances of the estimators are unaffected by TMS. • Transport calculation slow-down due to TMS dominates the impact on figures-of-merit. - Abstract: Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of the TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviances of all estimators, including the flux estimator, by tens of percents in the vicinity of very
Sampling, Probability Models and Statistical Reasoning Statistical
Indian Academy of Sciences (India)
Home; Journals; Resonance – Journal of Science Education; Volume 1; Issue 5. Sampling, Probability Models and Statistical Reasoning Statistical Inference. Mohan Delampady V R Padmawar. General Article Volume 1 Issue 5 May 1996 pp 49-58 ...
Statistical distribution sampling
Johnson, E. S.
1975-01-01
Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.
Lin, Johnny; Bentler, Peter M
2012-01-01
Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.
Performance modeling, loss networks, and statistical multiplexing
Mazumdar, Ravi
2009-01-01
This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I
42 CFR 402.109 - Statistical sampling.
2010-10-01
... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study, if based upon an appropriate sampling and computed by valid statistical methods, constitute prima... § 402.1. (c) Burden of proof. Once CMS or OIG has made a prima facie case, the burden is on the...
Contributions to sampling statistics
Conti, Pier; Ranalli, Maria
2014-01-01
This book contains a selection of the papers presented at the ITACOSM 2013 Conference, held in Milan in June 2013. ITACOSM is the bi-annual meeting of the Survey Sampling Group S2G of the Italian Statistical Society, intended as an international forum of scientific discussion on the developments of theory and application of survey sampling methodologies and applications in human and natural sciences. The book gathers research papers carefully selected from both invited and contributed sessions of the conference. The whole book appears to be a relevant contribution to various key aspects of sampling methodology and techniques; it deals with some hot topics in sampling theory, such as calibration, quantile-regression and multiple frame surveys, and with innovative methodologies in important topics of both sampling theory and applications. Contributions cut across current sampling methodologies such as interval estimation for complex samples, randomized responses, bootstrap, weighting, modeling, imputati...
Statistical literacy and sample survey results
McAlevey, Lynn; Sullivan, Charles
2010-10-01
Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.
Statistical sampling approaches for soil monitoring
Brus, D.J.
2014-01-01
This paper describes three statistical sampling approaches for regional soil monitoring, a design-based, a model-based and a hybrid approach. In the model-based approach a space-time model is exploited to predict global statistical parameters of interest such as the space-time mean. In the hybrid
International Nuclear Information System (INIS)
Romero, Vicente J.; Burkardt, John V.; Gunzburger, Max D.; Peterson, Janet S.
2006-01-01
A recently developed centroidal Voronoi tessellation (CVT) sampling method is investigated here to assess its suitability for use in statistical sampling applications. CVT efficiently generates a highly uniform distribution of sample points over arbitrarily shaped M-dimensional parameter spaces. On several 2-D test problems CVT has recently been found to provide exceedingly effective and efficient point distributions for response surface generation. Additionally, for statistical function integration and estimation of response statistics associated with uniformly distributed random-variable inputs (uncorrelated), CVT has been found in initial investigations to provide superior points sets when compared against latin-hypercube and simple-random Monte Carlo methods and Halton and Hammersley quasi-random sequence methods. In this paper, the performance of all these sampling methods and a new variant ('Latinized' CVT) are further compared for non-uniform input distributions. Specifically, given uncorrelated normal inputs in a 2-D test problem, statistical sampling efficiencies are compared for resolving various statistics of response: mean, variance, and exceedence probabilities
Statistical aspects of food safety sampling
Jongenburger, I.; Besten, den H.M.W.; Zwietering, M.H.
2015-01-01
In food safety management, sampling is an important tool for verifying control. Sampling by nature is a stochastic process. However, uncertainty regarding results is made even greater by the uneven distribution of microorganisms in a batch of food. This article reviews statistical aspects of
Ayam, Rufus Tekoh
2011-01-01
PURPOSE: The two approaches to audit sampling; statistical and nonstatistical have been examined in this study. The overall purpose of the study is to explore the current extent at which statistical and nonstatistical sampling approaches are utilized by independent auditors during auditing practices. Moreover, the study also seeks to achieve two additional purposes; the first is to find out whether auditors utilize different sampling techniques when auditing SME´s (Small and Medium-Sized Ente...
Gene coexpression measures in large heterogeneous samples using count statistics.
Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan
2014-11-18
With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
Measuring radioactive half-lives via statistical sampling in practice
Lorusso, G.; Collins, S. M.; Jagan, K.; Hitt, G. W.; Sadek, A. M.; Aitken-Smith, P. M.; Bridi, D.; Keightley, J. D.
2017-10-01
The statistical sampling method for the measurement of radioactive decay half-lives exhibits intriguing features such as that the half-life is approximately the median of a distribution closely resembling a Cauchy distribution. Whilst initial theoretical considerations suggested that in certain cases the method could have significant advantages, accurate measurements by statistical sampling have proven difficult, for they require an exercise in non-standard statistical analysis. As a consequence, no half-life measurement using this method has yet been reported and no comparison with traditional methods has ever been made. We used a Monte Carlo approach to address these analysis difficulties, and present the first experimental measurement of a radioisotope half-life (211Pb) by statistical sampling in good agreement with the literature recommended value. Our work also focused on the comparison between statistical sampling and exponential regression analysis, and concluded that exponential regression achieves generally the highest accuracy.
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
Statistical benchmark for BosonSampling
International Nuclear Information System (INIS)
Walschaers, Mattia; Mayer, Klaus; Buchleitner, Andreas; Kuipers, Jack; Urbina, Juan-Diego; Richter, Klaus; Tichy, Malte Christopher
2016-01-01
Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church–Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects. (fast track communication)
Directory of Open Access Journals (Sweden)
Elias Chaibub Neto
Full Text Available In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson's sample correlation coefficient, and compared its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling.
Statistical conditional sampling for variable-resolution video compression.
Directory of Open Access Journals (Sweden)
Alexander Wong
Full Text Available In this study, we investigate a variable-resolution approach to video compression based on Conditional Random Field and statistical conditional sampling in order to further improve compression rate while maintaining high-quality video. In the proposed approach, representative key-frames within a video shot are identified and stored at full resolution. The remaining frames within the video shot are stored and compressed at a reduced resolution. At the decompression stage, a region-based dictionary is constructed from the key-frames and used to restore the reduced resolution frames to the original resolution via statistical conditional sampling. The sampling approach is based on the conditional probability of the CRF modeling by use of the constructed dictionary. Experimental results show that the proposed variable-resolution approach via statistical conditional sampling has potential for improving compression rates when compared to compressing the video at full resolution, while achieving higher video quality when compared to compressing the video at reduced resolution.
Statistical Analysis Of Tank 19F Floor Sample Results
International Nuclear Information System (INIS)
Harris, S.
2010-01-01
Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by an UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
International Nuclear Information System (INIS)
Jaech, J.L.
1984-01-01
In auditing and in inspection, one selects a number of items by some set of procedures and performs measurements which are compared with the operator's values. This session considers the problem of how to select the samples to be measured, and what kinds of measurements to make. In the inspection situation, the ultimate aim is to independently verify the operator's material balance. The effectiveness of the sample plan in achieving this objective is briefly considered. The discussion focuses on the model plant
Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module
Energy Technology Data Exchange (ETDEWEB)
Martinez, Gregory D. [University of California, Physics and Astronomy Department, Los Angeles, CA (United States); McKay, James; Scott, Pat [Imperial College London, Department of Physics, Blackett Laboratory, London (United Kingdom); Farmer, Ben; Conrad, Jan [AlbaNova University Centre, Oskar Klein Centre for Cosmoparticle Physics, Stockholm (Sweden); Stockholm University, Department of Physics, Stockholm (Sweden); Roebber, Elinore [McGill University, Department of Physics, Montreal, QC (Canada); Putze, Antje [LAPTh, Universite de Savoie, CNRS, Annecy-le-Vieux (France); Collaboration: The GAMBIT Scanner Workgroup
2017-11-15
We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics. (orig.)
STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS
Energy Technology Data Exchange (ETDEWEB)
Harris, S.
2010-09-02
Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL{sub 95%}) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL{sub 95%}) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL{sub 95%} was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani
2015-01-01
This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…
Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar
2015-04-01
In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered to be satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contains several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data-sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
Paechter, Manuela; Macher, Daniel; Martskvishvili, Khatuna; Wimmer, Sigrid; Papousek, Ilona
2017-01-01
In many social science majors, e.g., psychology, students report high levels of statistics anxiety. However, these majors are often chosen by students who are less prone to mathematics and who might have experienced difficulties and unpleasant feelings in their mathematics courses at school. The present study investigates whether statistics anxiety is a genuine form of anxiety that impairs students' achievements or whether learners mainly transfer previous experiences in mathematics and their anxiety in mathematics to statistics. The relationship between mathematics anxiety and statistics anxiety, their relationship to learning behaviors and to performance in a statistics examination were investigated in a sample of 225 undergraduate psychology students (164 women, 61 men). Data were recorded at three points in time: At the beginning of term students' mathematics anxiety, general proneness to anxiety, school grades, and demographic data were assessed; 2 weeks before the end of term, they completed questionnaires on statistics anxiety and their learning behaviors. At the end of term, examination scores were recorded. Mathematics anxiety and statistics anxiety correlated highly but the comparison of different structural equation models showed that they had genuine and even antagonistic contributions to learning behaviors and performance in the examination. Surprisingly, mathematics anxiety was positively related to performance. It might be that students realized over the course of their first term that knowledge and skills in higher secondary education mathematics are not sufficient to be successful in statistics. Part of mathematics anxiety may then have strengthened positive extrinsic effort motivation by the intention to avoid failure and may have led to higher effort for the exam preparation. However, via statistics anxiety mathematics anxiety also had a negative contribution to performance. Statistics anxiety led to higher procrastination in the structural
Paechter, Manuela; Macher, Daniel; Martskvishvili, Khatuna; Wimmer, Sigrid; Papousek, Ilona
2017-01-01
In many social science majors, e.g., psychology, students report high levels of statistics anxiety. However, these majors are often chosen by students who are less prone to mathematics and who might have experienced difficulties and unpleasant feelings in their mathematics courses at school. The present study investigates whether statistics anxiety is a genuine form of anxiety that impairs students' achievements or whether learners mainly transfer previous experiences in mathematics and their anxiety in mathematics to statistics. The relationship between mathematics anxiety and statistics anxiety, their relationship to learning behaviors and to performance in a statistics examination were investigated in a sample of 225 undergraduate psychology students (164 women, 61 men). Data were recorded at three points in time: At the beginning of term students' mathematics anxiety, general proneness to anxiety, school grades, and demographic data were assessed; 2 weeks before the end of term, they completed questionnaires on statistics anxiety and their learning behaviors. At the end of term, examination scores were recorded. Mathematics anxiety and statistics anxiety correlated highly but the comparison of different structural equation models showed that they had genuine and even antagonistic contributions to learning behaviors and performance in the examination. Surprisingly, mathematics anxiety was positively related to performance. It might be that students realized over the course of their first term that knowledge and skills in higher secondary education mathematics are not sufficient to be successful in statistics. Part of mathematics anxiety may then have strengthened positive extrinsic effort motivation by the intention to avoid failure and may have led to higher effort for the exam preparation. However, via statistics anxiety mathematics anxiety also had a negative contribution to performance. Statistics anxiety led to higher procrastination in the structural
Directory of Open Access Journals (Sweden)
Manuela Paechter
2017-07-01
Full Text Available In many social science majors, e.g., psychology, students report high levels of statistics anxiety. However, these majors are often chosen by students who are less prone to mathematics and who might have experienced difficulties and unpleasant feelings in their mathematics courses at school. The present study investigates whether statistics anxiety is a genuine form of anxiety that impairs students' achievements or whether learners mainly transfer previous experiences in mathematics and their anxiety in mathematics to statistics. The relationship between mathematics anxiety and statistics anxiety, their relationship to learning behaviors and to performance in a statistics examination were investigated in a sample of 225 undergraduate psychology students (164 women, 61 men. Data were recorded at three points in time: At the beginning of term students' mathematics anxiety, general proneness to anxiety, school grades, and demographic data were assessed; 2 weeks before the end of term, they completed questionnaires on statistics anxiety and their learning behaviors. At the end of term, examination scores were recorded. Mathematics anxiety and statistics anxiety correlated highly but the comparison of different structural equation models showed that they had genuine and even antagonistic contributions to learning behaviors and performance in the examination. Surprisingly, mathematics anxiety was positively related to performance. It might be that students realized over the course of their first term that knowledge and skills in higher secondary education mathematics are not sufficient to be successful in statistics. Part of mathematics anxiety may then have strengthened positive extrinsic effort motivation by the intention to avoid failure and may have led to higher effort for the exam preparation. However, via statistics anxiety mathematics anxiety also had a negative contribution to performance. Statistics anxiety led to higher procrastination in
A course in mathematical statistics and large sample theory
Bhattacharya, Rabi; Patrangenaru, Victor
2016-01-01
This graduate-level textbook is primarily aimed at graduate students of statistics, mathematics, science, and engineering who have had an undergraduate course in statistics, an upper division course in analysis, and some acquaintance with measure theoretic probability. It provides a rigorous presentation of the core of mathematical statistics. Part I of this book constitutes a one-semester course on basic parametric mathematical statistics. Part II deals with the large sample theory of statistics — parametric and nonparametric, and its contents may be covered in one semester as well. Part III provides brief accounts of a number of topics of current interest for practitioners and other disciplines whose work involves statistical methods. Large Sample theory with many worked examples, numerical calculations, and simulations to illustrate theory Appendices provide ready access to a number of standard results, with many proofs Solutions given to a number of selected exercises from Part I Part II exercises with ...
Performance modeling, stochastic networks, and statistical multiplexing
Mazumdar, Ravi R
2013-01-01
This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of introducing an appropriate mathematical framework for modeling and analysis as well as understanding the phenomenon of statistical multiplexing. The models, techniques, and results presented form the core of traffic engineering methods used to design, control and allocate resources in communication networks.The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the importan
Illustrating Sampling Distribution of a Statistic: Minitab Revisited
Johnson, H. Dean; Evans, Marc A.
2008-01-01
Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…
Energy Technology Data Exchange (ETDEWEB)
Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.; Amidan, Brett G.
2013-04-27
This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating the • number of samples required to achieve a specified confidence in characterization and clearance decisions • confidence in making characterization and clearance decisions for a specified number of samples for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that a decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phrase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account
Pitard, Francis F
1993-01-01
Pierre Gy's Sampling Theory and Sampling Practice, Second Edition is a concise, step-by-step guide for process variability management and methods. Updated and expanded, this new edition provides a comprehensive study of heterogeneity, covering the basic principles of sampling theory and its various applications. It presents many practical examples to allow readers to select appropriate sampling protocols and assess the validity of sampling protocols from others. The variability of dynamic process streams using variography is discussed to help bridge sampling theory with statistical process control. Many descriptions of good sampling devices, as well as descriptions of poor ones, are featured to educate readers on what to look for when purchasing sampling systems. The book uses its accessible, tutorial style to focus on professional selection and use of methods. The book will be a valuable guide for mineral processing engineers; metallurgists; geologists; miners; chemists; environmental scientists; and practit...
Directory of Open Access Journals (Sweden)
D.P. van der Nest
2015-03-01
Full Text Available This article explores the use by internal audit functions of audit sampling techniques in order to test the effectiveness of controls in the banking sector. The article focuses specifically on the use of statistical and/or non-statistical sampling techniques by internal auditors. The focus of the research for this article was internal audit functions in the banking sector of South Africa. The results discussed in the article indicate that audit sampling is still used frequently as an audit evidence-gathering technique. Non-statistical sampling techniques are used more frequently than statistical sampling techniques for the evaluation of the sample. In addition, both techniques are regarded as important for the determination of the sample size and the selection of the sample items
Finite-sample instrumental variables inference using an asymptotically pivotal statistic
Bekker, P; Kleibergen, F
2003-01-01
We consider the K-statistic, Kleibergen's (2002, Econometrica 70, 1781-1803) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Whereas Kleibergen (2002) especially analyzes the asymptotic behavior of the statistic, we focus on finite-sample properties in, a
Comparative Gender Performance in Business Statistics.
Mogull, Robert G.
1989-01-01
Comparative performance of male and female students in introductory and intermediate statistics classes was examined for over 16 years at a state university. Gender means from 97 classes and 1,609 males and 1,085 females revealed a probabilistic--although statistically insignificant--superior performance by female students that appeared to…
Statistical sampling method for releasing decontaminated vehicles
International Nuclear Information System (INIS)
Lively, J.W.; Ware, J.A.
1996-01-01
Earth moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method (MIL-STD-105E, open-quotes Sampling Procedures and Tables for Inspection by Attributesclose quotes) for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been successfully used for 1 year at the former uranium mill site in Monticello, Utah (a CERCLA regulated clean-up site). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on Monticello Projects has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site
Breaking Free of Sample Size Dogma to Perform Innovative Translational Research
Bacchetti, Peter; Deeks, Steven G.; McCune, Joseph M.
2011-01-01
Innovative clinical and translational research is often delayed or prevented by reviewers’ expectations that any study performed in humans must be shown in advance to have high statistical power. This supposed requirement is not justifiable and is contradicted by the reality that increasing sample size produces diminishing marginal returns. Studies of new ideas often must start small (sometimes even with an N of 1) because of cost and feasibility concerns, and recent statistical work shows that small sample sizes for such research can produce more projected scientific value per dollar spent than larger sample sizes. Renouncing false dogma about sample size would remove a serious barrier to innovation and translation. PMID:21677197
Determination of Sr-90 in milk samples from the study of statistical results
Directory of Open Access Journals (Sweden)
Otero-Pazos Alberto
2017-01-01
Full Text Available The determination of 90Sr in milk samples is the main objective of radiation monitoring laboratories because of its environmental importance. In this paper the concentration of activity of 39 milk samples was obtained through radiochemical separation based on selective retention of Sr in a cationic resin (Dowex 50WX8, 50-100 mesh and subsequent determination by a low-level proportional gas counter. The results were checked by performing the measurement of the Sr concentration by using the flame atomic absorption spectroscopy technique, to finally obtain the mass of 90Sr. From the data obtained a statistical treatment was performed using linear regressions. A reliable estimate of the mass of 90Sr was obtained based on the gravimetric technique, and secondly, the counts per minute of the third measurement in the 90Sr and 90Y equilibrium, without having to perform the analysis. These estimates have been verified with 19 milk samples, obtaining overlapping results. The novelty of the manuscript is the possibility of determining the concentration of 90Sr in milk samples, without the need to perform the third measurement in the equilibrium.
Statistical sampling for holdup measurement
International Nuclear Information System (INIS)
Picard, R.R.; Pillay, K.K.S.
1986-01-01
Nuclear materials holdup is a serious problem in many operating facilities. Estimating amounts of holdup is important for materials accounting and, sometimes, for process safety. Clearly, measuring holdup in all pieces of equipment is not a viable option in terms of time, money, and radiation exposure to personnel. Furthermore, 100% measurement is not only impractical but unnecessary for developing estimated values. Principles of statistical sampling are valuable in the design of cost effective holdup monitoring plans and in qualifying uncertainties in holdup estimates. The purpose of this paper is to describe those principles and to illustrate their use
New Hybrid Monte Carlo methods for efficient sampling. From physics to biology and statistics
International Nuclear Information System (INIS)
Akhmatskaya, Elena; Reich, Sebastian
2011-01-01
We introduce a class of novel hybrid methods for detailed simulations of large complex systems in physics, biology, materials science and statistics. These generalized shadow Hybrid Monte Carlo (GSHMC) methods combine the advantages of stochastic and deterministic simulation techniques. They utilize a partial momentum update to retain some of the dynamical information, employ modified Hamiltonians to overcome exponential performance degradation with the system’s size and make use of multi-scale nature of complex systems. Variants of GSHMCs were developed for atomistic simulation, particle simulation and statistics: GSHMC (thermodynamically consistent implementation of constant-temperature molecular dynamics), MTS-GSHMC (multiple-time-stepping GSHMC), meso-GSHMC (Metropolis corrected dissipative particle dynamics (DPD) method), and a generalized shadow Hamiltonian Monte Carlo, GSHmMC (a GSHMC for statistical simulations). All of these are compatible with other enhanced sampling techniques and suitable for massively parallel computing allowing for a range of multi-level parallel strategies. A brief description of the GSHMC approach, examples of its application on high performance computers and comparison with other existing techniques are given. Our approach is shown to resolve such problems as resonance instabilities of the MTS methods and non-preservation of thermodynamic equilibrium properties in DPD, and to outperform known methods in sampling efficiency by an order of magnitude. (author)
Statistical Sampling For In-Service Inspection Of Liquid Waste Tanks At The Savannah River Site
International Nuclear Information System (INIS)
Harris, S.
2011-01-01
Savannah River Remediation, LLC (SRR) is implementing a statistical sampling strategy for In-Service Inspection (ISI) of Liquid Waste (LW) Tanks at the United States Department of Energy's Savannah River Site (SRS) in Aiken, South Carolina. As a component of SRS's corrosion control program, the ISI program assesses tank wall structural integrity through the use of ultrasonic testing (UT). The statistical strategy for ISI is based on the random sampling of a number of vertically oriented unit areas, called strips, within each tank. The number of strips to inspect was determined so as to attain, over time, a high probability of observing at least one of the worst 5% in terms of pitting and corrosion across all tanks. The probability estimation to determine the number of strips to inspect was performed using the hypergeometric distribution. Statistical tolerance limits for pit depth and corrosion rates were calculated by fitting the lognormal distribution to the data. In addition to the strip sampling strategy, a single strip within each tank was identified to serve as the baseline for a longitudinal assessment of the tank safe operational life. The statistical sampling strategy enables the ISI program to develop individual profiles of LW tank wall structural integrity that collectively provide a high confidence in their safety and integrity over operational lifetimes.
Sampling methods to the statistical control of the production of blood components.
Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo
2017-12-01
The control of blood components specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The control reliability is dependent of the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply uniquely with the 1% specification to the produced blood components. Nevertheless, on a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation to detect abnormal patterns and to assure that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments contributing to the robustness of sampling and related statistical process control decisions for the purpose they are suggested for. Copyright © 2017 Elsevier Ltd. All rights reserved.
A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.
Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin
2017-06-01
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data stays difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as common approach; which fundamentally implies genes or proteins as independent units. The independent assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed by the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https
Multivariate statistics high-dimensional and large-sample approximations
Fujikoshi, Yasunori; Shimizu, Ryoichi
2010-01-01
A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic
Sample Reuse in Statistical Remodeling.
1987-08-01
as the jackknife and bootstrap, is an expansion of the functional, T(Fn), or of its distribution function or both. Frangos and Schucany (1987a) used...accelerated bootstrap. In the same report Frangos and Schucany demonstrated the small sample superiority of that approach over the proposals that take...higher order terms of an Edgeworth expansion into account. In a second report Frangos and Schucany (1987b) examined the small sample performance of
The Role of the Sampling Distribution in Understanding Statistical Inference
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
Weighted statistical parameters for irregularly sampled time series
Rimoldini, Lorenzo
2014-01-01
Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
[Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].
Suzukawa, Yumi; Toyoda, Hideki
2012-04-01
This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in the fields like perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.
Energy Technology Data Exchange (ETDEWEB)
Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung [Hanyang Univ., Seoul (Korea, Republic of); Noh, Jae Man [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)
2013-10-15
The uncertainty evaluation with statistical method is performed by repetition of transport calculation with sampling the directly perturbed nuclear data. Hence, the reliable uncertainty result can be obtained by analyzing the results of the numerous transport calculations. One of the problems in the uncertainty analysis with the statistical approach is known as that the cross section sampling from the normal (Gaussian) distribution with relatively large standard deviation leads to the sampling error of the cross sections such as the sampling of the negative cross section. Some collection methods are noted; however, the methods can distort the distribution of the sampled cross sections. In this study, a sampling method of the nuclear data is proposed by using lognormal distribution. After that, the criticality calculations with sampled nuclear data are performed and the results are compared with that from the normal distribution which is conventionally used in the previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling error. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was pursued with the normal and lognormal distribution. The uncertainties, which are caused by covariance of (n,.) cross sections, were evaluated by solving GODIVA problem. The results show that the sampling method with lognormal distribution can efficiently solve the negative sampling problem referred in the previous studies. It is expected that this study will contribute to increase the accuracy of the sampling-based uncertainty analysis.
International Nuclear Information System (INIS)
Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung; Noh, Jae Man
2013-01-01
The uncertainty evaluation with statistical method is performed by repetition of transport calculation with sampling the directly perturbed nuclear data. Hence, the reliable uncertainty result can be obtained by analyzing the results of the numerous transport calculations. One of the problems in the uncertainty analysis with the statistical approach is known as that the cross section sampling from the normal (Gaussian) distribution with relatively large standard deviation leads to the sampling error of the cross sections such as the sampling of the negative cross section. Some collection methods are noted; however, the methods can distort the distribution of the sampled cross sections. In this study, a sampling method of the nuclear data is proposed by using lognormal distribution. After that, the criticality calculations with sampled nuclear data are performed and the results are compared with that from the normal distribution which is conventionally used in the previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling error. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was pursued with the normal and lognormal distribution. The uncertainties, which are caused by covariance of (n,.) cross sections, were evaluated by solving GODIVA problem. The results show that the sampling method with lognormal distribution can efficiently solve the negative sampling problem referred in the previous studies. It is expected that this study will contribute to increase the accuracy of the sampling-based uncertainty analysis
Constrained statistical inference: sample-size tables for ANOVA and regression
Directory of Open Access Journals (Sweden)
Leonard eVanbrabant
2015-01-01
Full Text Available Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient beta1 is larger than beta2 and beta3. The corresponding hypothesis is H: beta1 > {beta2, beta3} and this is known as an (order constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction, when an increasing number of constraints is included into the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample-size at a prespecified power (say, 0.80 for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample-size decreases with 30% to 50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., beta1 > beta2 results in a higher power than assigning a positive or a negative sign to the parameters (e.g., beta1 > 0.
Directory of Open Access Journals (Sweden)
Shirin Iranfar
2013-12-01
Full Text Available Introduction: Test anxiety is a common phenomenon among students and is one of the problems of educational system. The present study was conducted to investigate the test anxiety in vital statistics course and its association with academic performance of students at Kermanshah University of Medical Sciences. This study was descriptive-analytical and the study sample included the students studying in nursing and midwifery, paramedicine and health faculties that had taken vital statistics course and were selected through census method. Sarason questionnaire was used to analyze the test anxiety. Data were analyzed by descriptive and inferential statistics. The findings indicated no significant correlation between test anxiety and score of vital statistics course.
Effect of model choice and sample size on statistical tolerance limits
International Nuclear Information System (INIS)
Duran, B.S.; Campbell, K.
1980-03-01
Statistical tolerance limits are estimates of large (or small) quantiles of a distribution, quantities which are very sensitive to the shape of the tail of the distribution. The exact nature of this tail behavior cannot be ascertained brom small samples, so statistical tolerance limits are frequently computed using a statistical model chosen on the basis of theoretical considerations or prior experience with similar populations. This report illustrates the effects of such choices on the computations
The Performance Analysis Based on SAR Sample Covariance Matrix
Directory of Open Access Journals (Sweden)
Esra Erten
2012-03-01
Full Text Available Multi-channel systems appear in several fields of application in science. In the Synthetic Aperture Radar (SAR context, multi-channel systems may refer to different domains, as multi-polarization, multi-interferometric or multi-temporal data, or even a combination of them. Due to the inherent speckle phenomenon present in SAR images, the statistical description of the data is almost mandatory for its utilization. The complex images acquired over natural media present in general zero-mean circular Gaussian characteristics. In this case, second order statistics as the multi-channel covariance matrix fully describe the data. For practical situations however, the covariance matrix has to be estimated using a limited number of samples, and this sample covariance matrix follow the complex Wishart distribution. In this context, the eigendecomposition of the multi-channel covariance matrix has been shown in different areas of high relevance regarding the physical properties of the imaged scene. Specifically, the maximum eigenvalue of the covariance matrix has been frequently used in different applications as target or change detection, estimation of the dominant scattering mechanism in polarimetric data, moving target indication, etc. In this paper, the statistical behavior of the maximum eigenvalue derived from the eigendecomposition of the sample multi-channel covariance matrix in terms of multi-channel SAR images is simplified for SAR community. Validation is performed against simulated data and examples of estimation and detection problems using the analytical expressions are as well given.
Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...
Chance constrained problems: penalty reformulation and performance of sample approximation technique
Czech Academy of Sciences Publication Activity Database
Branda, Martin
2012-01-01
Roč. 48, č. 1 (2012), s. 105-122 ISSN 0023-5954 R&D Projects: GA ČR(CZ) GBP402/12/G097 Institutional research plan: CEZ:AV0Z10750506 Keywords : chance constrained problems * penalty functions * asymptotic equivalence * sample approximation technique * investment problem Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.619, year: 2012 http://library.utia.cas.cz/separaty/2012/E/branda-chance constrained problems penalty reformulation and performance of sample approximation technique.pdf
Improving Statistics Education through Simulations: The Case of the Sampling Distribution.
Earley, Mark A.
This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…
Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.
Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas
2002-01-01
Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…
Statistical learning methods: Basics, control and performance
Energy Technology Data Exchange (ETDEWEB)
Zimmermann, J. [Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)]. E-mail: zimmerm@mppmu.mpg.de
2006-04-01
The basics of statistical learning are reviewed with a special emphasis on general principles and problems for all different types of learning methods. Different aspects of controlling these methods in a physically adequate way will be discussed. All principles and guidelines will be exercised on examples for statistical learning methods in high energy and astrophysics. These examples prove in addition that statistical learning methods very often lead to a remarkable performance gain compared to the competing classical algorithms.
Statistical learning methods: Basics, control and performance
International Nuclear Information System (INIS)
Zimmermann, J.
2006-01-01
The basics of statistical learning are reviewed with a special emphasis on general principles and problems for all different types of learning methods. Different aspects of controlling these methods in a physically adequate way will be discussed. All principles and guidelines will be exercised on examples for statistical learning methods in high energy and astrophysics. These examples prove in addition that statistical learning methods very often lead to a remarkable performance gain compared to the competing classical algorithms
Directory of Open Access Journals (Sweden)
Scheid Anika
2012-07-01
Full Text Available Abstract Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent stochastic context-free grammar (SCFG that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples, where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones, then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst
Mohd Fo'ad Rohani; Mohd Aizaini Maarof; Ali Selamat; Houssain Kettani
2010-01-01
This paper proposes a Multi-Level Sampling (MLS) approach for continuous Loss of Self-Similarity (LoSS) detection using iterative window. The method defines LoSS based on Second Order Self-Similarity (SOSS) statistical model. The Optimization Method (OM) is used to estimate self-similarity parameter since it is fast and more accurate in comparison with other estimation methods known in the literature. Probability of LoSS detection is introduced to measure continuous LoSS detection performance...
Nomogram for sample size calculation on a straightforward basis for the kappa statistic.
Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo
2014-09-01
Kappa is a widely used measure of agreement. However, it may not be straightforward in some situation such as sample size calculation due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation that the level of agreement under a certain marginal prevalence is considered in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. The sample size formulae using a simple proportion of agreement instead of a kappa statistic and nomograms to eliminate the inconvenience of using a mathematical formula were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
Statistics Anxiety, Trait Anxiety, Learning Behavior, and Academic Performance
Macher, Daniel; Paechter, Manuela; Papousek, Ilona; Ruggeri, Kai
2012-01-01
The present study investigated the relationship between statistics anxiety, individual characteristics (e.g., trait anxiety and learning strategies), and academic performance. Students enrolled in a statistics course in psychology (N = 147) filled in a questionnaire on statistics anxiety, trait anxiety, interest in statistics, mathematical…
Examination of statistical noise in SPECT image and sampling pitch
International Nuclear Information System (INIS)
Takaki, Akihiro; Soma, Tsutomu; Murase, Kenya; Watanabe, Hiroyuki; Murakami, Tomonori; Kawakami, Kazunori; Teraoka, Satomi; Kojima, Akihiro; Matsumoto, Masanori
2008-01-01
Statistical noise in single photon emission computed tomography (SPECT) image was examined for its relation with total count and with sampling pitch by simulation and phantom experiment to obtain their projection data under defined conditions. The former SPECT simulation was performed on assumption of a virtual, homogeneous water column (20 cm diameter) as an absorbing mass. In the latter, used were 3D-Hoffman brain phantom (Data Spectrum Corp.) filled with 370 MBq of 99m Tc-pertechnetate solution and a facing 2-detector SPECT machine with a low-energy/high-resolution collimator, E-CAM (Siemens). Projected data by the two methods were reconstructed through the filtered back projection to make each transaxial image. The noise was evaluated by vision, by their root mean square uncertainty calculated from average count and standard deviation (SD) in the region of interest (ROI) defined in reconstructed images and by normalized mean squares calculated from the difference between the reference image obtained with common sampling pitch to and all of obtained slices of, the simulation and phantom. As a conclusion, the pitch was recommended to be set in the machine as to approximating the value calculated by the sampling theorem, though the projection counts per one angular direction were smaller with the same total time of data acquisition. (R.T.)
Statistical methods for detecting differentially abundant features in clinical metagenomic samples.
Directory of Open Access Journals (Sweden)
James Robert White
2009-04-01
Full Text Available Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries are computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them.We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software
Self-assessed performance improves statistical fusion of image labels
Energy Technology Data Exchange (ETDEWEB)
Bryan, Frederick W., E-mail: frederick.w.bryan@vanderbilt.edu; Xu, Zhoubing; Asman, Andrew J.; Allen, Wade M. [Electrical Engineering, Vanderbilt University, Nashville, Tennessee 37235 (United States); Reich, Daniel S. [Translational Neuroradiology Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892 (United States); Landman, Bennett A. [Electrical Engineering, Vanderbilt University, Nashville, Tennessee 37235 (United States); Biomedical Engineering, Vanderbilt University, Nashville, Tennessee 37235 (United States); and Radiology and Radiological Sciences, Vanderbilt University, Nashville, Tennessee 37235 (United States)
2014-03-15
Purpose: Expert manual labeling is the gold standard for image segmentation, but this process is difficult, time-consuming, and prone to inter-individual differences. While fully automated methods have successfully targeted many anatomies, automated methods have not yet been developed for numerous essential structures (e.g., the internal structure of the spinal cord as seen on magnetic resonance imaging). Collaborative labeling is a new paradigm that offers a robust alternative that may realize both the throughput of automation and the guidance of experts. Yet, distributing manual labeling expertise across individuals and sites introduces potential human factors concerns (e.g., training, software usability) and statistical considerations (e.g., fusion of information, assessment of confidence, bias) that must be further explored. During the labeling process, it is simple to ask raters to self-assess the confidence of their labels, but this is rarely done and has not been previously quantitatively studied. Herein, the authors explore the utility of self-assessment in relation to automated assessment of rater performance in the context of statistical fusion. Methods: The authors conducted a study of 66 volumes manually labeled by 75 minimally trained human raters recruited from the university undergraduate population. Raters were given 15 min of training during which they were shown examples of correct segmentation, and the online segmentation tool was demonstrated. The volumes were labeled 2D slice-wise, and the slices were unordered. A self-assessed quality metric was produced by raters for each slice by marking a confidence bar superimposed on the slice. Volumes produced by both voting and statistical fusion algorithms were compared against a set of expert segmentations of the same volumes. Results: Labels for 8825 distinct slices were obtained. Simple majority voting resulted in statistically poorer performance than voting weighted by self-assessed performance
Self-assessed performance improves statistical fusion of image labels
International Nuclear Information System (INIS)
Bryan, Frederick W.; Xu, Zhoubing; Asman, Andrew J.; Allen, Wade M.; Reich, Daniel S.; Landman, Bennett A.
2014-01-01
Purpose: Expert manual labeling is the gold standard for image segmentation, but this process is difficult, time-consuming, and prone to inter-individual differences. While fully automated methods have successfully targeted many anatomies, automated methods have not yet been developed for numerous essential structures (e.g., the internal structure of the spinal cord as seen on magnetic resonance imaging). Collaborative labeling is a new paradigm that offers a robust alternative that may realize both the throughput of automation and the guidance of experts. Yet, distributing manual labeling expertise across individuals and sites introduces potential human factors concerns (e.g., training, software usability) and statistical considerations (e.g., fusion of information, assessment of confidence, bias) that must be further explored. During the labeling process, it is simple to ask raters to self-assess the confidence of their labels, but this is rarely done and has not been previously quantitatively studied. Herein, the authors explore the utility of self-assessment in relation to automated assessment of rater performance in the context of statistical fusion. Methods: The authors conducted a study of 66 volumes manually labeled by 75 minimally trained human raters recruited from the university undergraduate population. Raters were given 15 min of training during which they were shown examples of correct segmentation, and the online segmentation tool was demonstrated. The volumes were labeled 2D slice-wise, and the slices were unordered. A self-assessed quality metric was produced by raters for each slice by marking a confidence bar superimposed on the slice. Volumes produced by both voting and statistical fusion algorithms were compared against a set of expert segmentations of the same volumes. Results: Labels for 8825 distinct slices were obtained. Simple majority voting resulted in statistically poorer performance than voting weighted by self-assessed performance
Statistical evaluation of diagnostic performance topics in ROC analysis
Zou, Kelly H; Bandos, Andriy I; Ohno-Machado, Lucila; Rockette, Howard E
2016-01-01
Statistical evaluation of diagnostic performance in general and Receiver Operating Characteristic (ROC) analysis in particular are important for assessing the performance of medical tests and statistical classifiers, as well as for evaluating predictive models or algorithms. This book presents innovative approaches in ROC analysis, which are relevant to a wide variety of applications, including medical imaging, cancer research, epidemiology, and bioinformatics. Statistical Evaluation of Diagnostic Performance: Topics in ROC Analysis covers areas including monotone-transformation techniques in parametric ROC analysis, ROC methods for combined and pooled biomarkers, Bayesian hierarchical transformation models, sequential designs and inferences in the ROC setting, predictive modeling, multireader ROC analysis, and free-response ROC (FROC) methodology. The book is suitable for graduate-level students and researchers in statistics, biostatistics, epidemiology, public health, biomedical engineering, radiology, medi...
Directory of Open Access Journals (Sweden)
R. Eric Heidel
2016-01-01
Full Text Available Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
Statistical analyses to support guidelines for marine avian sampling. Final report
Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris
2012-01-01
Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical
Statistical characterization of a large geochemical database and effect of sample size
Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.
2005-01-01
The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompasses 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reasons relate to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively
Statistical sampling strategies
International Nuclear Information System (INIS)
Andres, T.H.
1987-01-01
Systems assessment codes use mathematical models to simulate natural and engineered systems. Probabilistic systems assessment codes carry out multiple simulations to reveal the uncertainty in values of output variables due to uncertainty in the values of the model parameters. In this paper, methods are described for sampling sets of parameter values to be used in a probabilistic systems assessment code. Three Monte Carlo parameter selection methods are discussed: simple random sampling, Latin hypercube sampling, and sampling using two-level orthogonal arrays. Three post-selection transformations are also described: truncation, importance transformation, and discretization. Advantages and disadvantages of each method are summarized
Attitude towards statistics and performance among post-graduate students
Rosli, Mira Khalisa; Maat, Siti Mistima
2017-05-01
For student to master Statistics is a necessity, especially for those post-graduates that are involved in the research field. The purpose of this research was to identify the attitude towards Statistics among the post-graduates and to determine the relationship between the attitude towards Statistics and post-graduates' of Faculty of Education, UKM, Bangi performance. 173 post-graduate students were chosen randomly to participate in the study. These students registered in Research Methodology II course that was introduced by faculty. A survey of attitude toward Statistics using 5-points Likert scale was used for data collection purposes. The instrument consists of four components such as affective, cognitive competency, value and difficulty. The data was analyzed using the SPSS version 22 in producing the descriptive and inferential Statistics output. The result of this research showed that there is a medium and positive relation between attitude towards statistics and students' performance. As a conclusion, educators need to access students' attitude towards the course to accomplish the learning outcomes.
Statistical sampling and modelling for cork oak and eucalyptus stands
Paulo, M.J.
2002-01-01
This thesis focuses on the use of modern statistical methods to solve problems on sampling, optimal cutting time and agricultural modelling in Portuguese cork oak and eucalyptus stands. The results are contained in five chapters that have been submitted for publication
Statistical Methods and Tools for Hanford Staged Feed Tank Sampling
Energy Technology Data Exchange (ETDEWEB)
Fountain, Matthew S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Brigantic, Robert T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Peterson, Reid A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
2013-10-01
This report summarizes work conducted by Pacific Northwest National Laboratory to technically evaluate the current approach to staged feed sampling of high-level waste (HLW) sludge to meet waste acceptance criteria (WAC) for transfer from tank farms to the Hanford Waste Treatment and Immobilization Plant (WTP). The current sampling and analysis approach is detailed in the document titled Initial Data Quality Objectives for WTP Feed Acceptance Criteria, 24590-WTP-RPT-MGT-11-014, Revision 0 (Arakali et al. 2011). The goal of this current work is to evaluate and provide recommendations to support a defensible, technical and statistical basis for the staged feed sampling approach that meets WAC data quality objectives (DQOs).
A Model of Statistics Performance Based on Achievement Goal Theory.
Bandalos, Deborah L.; Finney, Sara J.; Geske, Jenenne A.
2003-01-01
Tests a model of statistics performance based on achievement goal theory. Both learning and performance goals affected achievement indirectly through study strategies, self-efficacy, and test anxiety. Implications of these findings for teaching and learning statistics are discussed. (Contains 47 references, 3 tables, 3 figures, and 1 appendix.)…
Statistical sampling applied to the radiological characterization of historical waste
Directory of Open Access Journals (Sweden)
Zaffora Biagio
2016-01-01
Full Text Available The evaluation of the activity of radionuclides in radioactive waste is required for its disposal in final repositories. Easy-to-measure nuclides, like γ-emitters and high-energy X-rays, can be measured via non-destructive nuclear techniques from outside a waste package. Some radionuclides are difficult-to-measure (DTM from outside a package because they are α- or β-emitters. The present article discusses the application of linear regression, scaling factors (SF and the so-called “mean activity method” to estimate the activity of DTM nuclides on metallic waste produced at the European Organization for Nuclear Research (CERN. Various statistical sampling techniques including simple random sampling, systematic sampling, stratified and authoritative sampling are described and applied to 2 waste populations of activated copper cables. The bootstrap is introduced as a tool to estimate average activities and standard errors in waste characterization. The analysis of the DTM Ni-63 is used as an example. Experimental and theoretical values of SFs are calculated and compared. Guidelines for sampling historical waste using probabilistic and non-probabilistic sampling are finally given.
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerab!e progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic
Emons, W.H.M.; Meijer, R.R.; Sijtsma, K.
2002-01-01
The accuracy with which the theoretical sampling distribution of van der Flier's person-.t statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I
Effect of sample size on bias correction performance
Reiter, Philipp; Gutjahr, Oliver; Schefczyk, Lukas; Heinemann, Günther; Casper, Markus C.
2014-05-01
The output of climate models often shows a bias when compared to observed data, so that a preprocessing is necessary before using it as climate forcing in impact modeling (e.g. hydrology, species distribution). A common bias correction method is the quantile matching approach, which adapts the cumulative distribution function of the model output to the one of the observed data by means of a transfer function. Especially for precipitation we expect the bias correction performance to strongly depend on sample size, i.e. the length of the period used for calibration of the transfer function. We carry out experiments using the precipitation output of ten regional climate model (RCM) hindcast runs from the EU-ENSEMBLES project and the E-OBS observational dataset for the period 1961 to 2000. The 40 years are split into a 30 year calibration period and a 10 year validation period. In the first step, for each RCM transfer functions are set up cell-by-cell, using the complete 30 year calibration period. The derived transfer functions are applied to the validation period of the respective RCM precipitation output and the mean absolute errors in reference to the observational dataset are calculated. These values are treated as "best fit" for the respective RCM. In the next step, this procedure is redone using subperiods out of the 30 year calibration period. The lengths of these subperiods are reduced from 29 years down to a minimum of 1 year, only considering subperiods of consecutive years. This leads to an increasing number of repetitions for smaller sample sizes (e.g. 2 for a length of 29 years). In the last step, the mean absolute errors are statistically tested against the "best fit" of the respective RCM to compare the performances. In order to analyze if the intensity of the effect of sample size depends on the chosen correction method, four variations of the quantile matching approach (PTF, QUANT/eQM, gQM, GQM) are applied in this study. The experiments are further
The Statistics of Radio Astronomical Polarimetry: Disjoint, Superposed, and Composite Samples
Energy Technology Data Exchange (ETDEWEB)
Straten, W. van [Centre for Astrophysics and Supercomputing, Swinburne University of Technology, Hawthorn, VIC 3122 (Australia); Tiburzi, C., E-mail: willem.van.straten@aut.ac.nz [Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, D-53121 Bonn (Germany)
2017-02-01
A statistical framework is presented for the study of the orthogonally polarized modes of radio pulsar emission via the covariances between the Stokes parameters. To accommodate the typically heavy-tailed distributions of single-pulse radio flux density, the fourth-order joint cumulants of the electric field are used to describe the superposition of modes with arbitrary probability distributions. The framework is used to consider the distinction between superposed and disjoint modes, with particular attention to the effects of integration over finite samples. If the interval over which the polarization state is estimated is longer than the timescale for switching between two or more disjoint modes of emission, then the modes are unresolved by the instrument. The resulting composite sample mean exhibits properties that have been attributed to mode superposition, such as depolarization. Because the distinction between disjoint modes and a composite sample of unresolved disjoint modes depends on the temporal resolution of the observing instrumentation, the arguments in favor of superposed modes of pulsar emission are revisited, and observational evidence for disjoint modes is described. In principle, the four-dimensional covariance matrix that describes the distribution of sample mean Stokes parameters can be used to distinguish between disjoint modes, superposed modes, and a composite sample of unresolved disjoint modes. More comprehensive and conclusive interpretation of the covariance matrix requires more detailed consideration of various relevant phenomena, including temporally correlated subpulse modulation (e.g., jitter), statistical dependence between modes (e.g., covariant intensities and partial coherence), and multipath propagation effects (e.g., scintillation and scattering).
International Nuclear Information System (INIS)
Gilbert, R.O.
1978-01-01
Some statistical aspects of compositing field samples of soils for determining the content of Pu are discussed. Some of the potential problems involved in pooling samples are reviewed. This is followed by more detailed discussions and examples of compositing designs, adequacy of mixing, statistical models and their role in compositing, and related topics
Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic
Emons, Wilco H.M.; Meijer, R.R.; Sijtsma, Klaas
2002-01-01
The accuracy with which the theoretical sampling distribution of van der Flier’s person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I
Review of Statistical Analyses Resulting from Performance of HLDWD-DWPF-005
International Nuclear Information System (INIS)
Beck, R.S.
1997-01-01
The Engineering Department at the Defense Waste Processing Facility (DWPF) has reviewed two reports from the Statistical Consulting Section (SCS) involving the statistical analysis of test results for analysis of small sample inserts (references 1 ampersand 2). The test results cover two proposed analytical methods, a room temperature hydrofluoric acid preparation (Cold Chem) and a sodium peroxide/sodium hydroxide fusion modified for insert samples (Modified Fusion). The reports support implementation of the proposed small sample containers and analytical methods at DWPF. Hydragard sampler valve performance was typical of previous results (reference 3). Using an element from each major feed stream. lithium from the frit and iron from the sludge, the sampler was determined to deliver a uniform mixture in either sample container.The lithium to iron ratios were equivalent for the standard 15 ml vial and the 3 ml insert.The proposed method provide equivalent analyses as compared to the current methods. The biases associated with the proposed methods on a vitrified basis are less than 5% for major elements. The sum of oxides for the proposed method compares favorably with the sum of oxides for the conventional methods. However, the average sum of oxides for the Cold Chem method was 94.3% which is below the minimum required recovery of 95%. Both proposed methods, cold Chem and Modified Fusion, will be required at first to provide an accurate analysis which will routinely meet the 95% and 105% average sum of oxides limit for Product Composition Control System (PCCS).Issued to be resolved during phased implementation are as follows: (1) Determine calcine/vitrification factor for radioactive feed; (2) Evaluate covariance matrix change against process operating ranges to determine optimum sample size; (3) Evaluate sources for low sum of oxides; and (4) Improve remote operability of production versions of equipment and instruments for installation in 221-S.The specifics of
Statistical analysis of RHIC beam position monitors performance
Calaga, R.; Tomás, R.
2004-04-01
A detailed statistical analysis of beam position monitors (BPM) performance at RHIC is a critical factor in improving regular operations and future runs. Robust identification of malfunctioning BPMs plays an important role in any orbit or turn-by-turn analysis. Singular value decomposition and Fourier transform methods, which have evolved as powerful numerical techniques in signal processing, will aid in such identification from BPM data. This is the first attempt at RHIC to use a large set of data to statistically enhance the capability of these two techniques and determine BPM performance. A comparison from run 2003 data shows striking agreement between the two methods and hence can be used to improve BPM functioning at RHIC and possibly other accelerators.
Statistical analysis of RHIC beam position monitors performance
Directory of Open Access Journals (Sweden)
R. Calaga
2004-04-01
Full Text Available A detailed statistical analysis of beam position monitors (BPM performance at RHIC is a critical factor in improving regular operations and future runs. Robust identification of malfunctioning BPMs plays an important role in any orbit or turn-by-turn analysis. Singular value decomposition and Fourier transform methods, which have evolved as powerful numerical techniques in signal processing, will aid in such identification from BPM data. This is the first attempt at RHIC to use a large set of data to statistically enhance the capability of these two techniques and determine BPM performance. A comparison from run 2003 data shows striking agreement between the two methods and hence can be used to improve BPM functioning at RHIC and possibly other accelerators.
Performing Inferential Statistics Prior to Data Collection
Trafimow, David; MacDonald, Justin A.
2017-01-01
Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…
International Nuclear Information System (INIS)
Dowdy, E.J.; Hansen, G.E.; Robba, A.A.; Pratt, J.C.
1980-01-01
The complete formalism for the use of statistical neutron fluctuation measurements for the nondestructive assay of fissionable materials has been developed. This formalism includes the effect of detector deadtime, neutron multiplicity, random neutron pulse contributions from (α,n) contaminants in the sample, and the sample multiplication of both fission-related and background neutrons
Directory of Open Access Journals (Sweden)
CODRUŢA DURA
2010-01-01
Full Text Available The sample represents a particular segment of the statistical populationchosen to represent it as a whole. The representativeness of the sample determines the accuracyfor estimations made on the basis of calculating the research indicators and the inferentialstatistics. The method of random sampling is part of probabilistic methods which can be usedwithin marketing research and it is characterized by the fact that it imposes the requirementthat each unit belonging to the statistical population should have an equal chance of beingselected for the sampling process. When the simple random sampling is meant to be rigorouslyput into practice, it is recommended to use the technique of random number tables in order toconfigure the sample which will provide information that the marketer needs. The paper alsodetails the practical procedure implemented in order to create a sample for a marketingresearch by generating random numbers using the facilities offered by Microsoft Excel.
Galbraith, Niall D; Manktelow, Ken I; Morris, Neil G
2010-11-01
Previous studies demonstrate that people high in delusional ideation exhibit a data-gathering bias on inductive reasoning tasks. The current study set out to investigate the factors that may underpin such a bias by examining healthy individuals, classified as either high or low scorers on the Peters et al. Delusions Inventory (PDI). More specifically, whether high PDI scorers have a relatively poor appreciation of sample size and heterogeneity when making statistical judgments. In Expt 1, high PDI scorers made higher probability estimates when generalizing from a sample of 1 with regard to the heterogeneous human property of obesity. In Expt 2, this effect was replicated and was also observed in relation to the heterogeneous property of aggression. The findings suggest that delusion-prone individuals are less appreciative of the importance of sample size when making statistical judgments about heterogeneous properties; this may underpin the data gathering bias observed in previous studies. There was some support for the hypothesis that threatening material would exacerbate high PDI scorers' indifference to sample size.
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu
2015-01-01
Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H
2016-01-01
Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell
Sample Size Requirements for Assessing Statistical Moments of Simulated Crop Yield Distributions
Lehmann, N.; Finger, R.; Klein, T.; Calanca, P.
2013-01-01
Mechanistic crop growth models are becoming increasingly important in agricultural research and are extensively used in climate change impact assessments. In such studies, statistics of crop yields are usually evaluated without the explicit consideration of sample size requirements. The purpose of
THESEE-3, Orgel Reactor Performance and Statistic Hot Channel Factors
International Nuclear Information System (INIS)
Chambaud, B.
1974-01-01
1 - Nature of physical problem solved: The code applies to a heavy-water moderated organic-cooled reactor channel. Different fuel cluster models can be used (circular or hexagonal patterns). The code gives coolant temperatures and velocities and cladding temperatures throughout the channel and also channel performances, such as power, outlet temperature, boiling and burn-out safety margins (see THESEE-1). In a further step, calculations are performed with statistical values obtained by random retrieval of geometrical in- put data and taking into account construction tolerances, vibrations, etc. The code evaluates the mean value and standard deviation for the more important thermal and hydraulic parameters. 2 - Method of solution: First step calculations are performed for nominal values of parameters by solving iteratively the non-linear system of equations which give the pressure drops in subchannels of the current zone (see THESEE-1). Then a Gaussian probability distribution of possible statistical values of the geometrical input data is assumed. A random number generation routine determines the statistical case. Calculations are performed in the same way as for the nominal case. In the case of several channels, statistical performances must be adjusted to equalize the normal pressure drop. A special subroutine (AVERAGE) then determines the mean value and standard deviation, and thus probability functions of the most significant thermal and hydraulic results. 3 - Restrictions on the complexity of the problem: Maximum 7 fuel clusters, each divided into 10 axial zones. Fuel bundle geometries are restricted to the following models - circular pattern 6/7, 18/19, 36/67 rods, with or without fillers. The fuel temperature distribution is not studied. The probability distribution of the statistical input is assumed to be a Gaussian function. The principle of random retrieval of statistical values is correct, but some additional correlations could be found from a more
International Nuclear Information System (INIS)
Tang Jie; Nett, Brian E; Chen Guanghong
2009-01-01
Of all available reconstruction methods, statistical iterative reconstruction algorithms appear particularly promising since they enable accurate physical noise modeling. The newly developed compressive sampling/compressed sensing (CS) algorithm has shown the potential to accurately reconstruct images from highly undersampled data. The CS algorithm can be implemented in the statistical reconstruction framework as well. In this study, we compared the performance of two standard statistical reconstruction algorithms (penalized weighted least squares and q-GGMRF) to the CS algorithm. In assessing the image quality using these iterative reconstructions, it is critical to utilize realistic background anatomy as the reconstruction results are object dependent. A cadaver head was scanned on a Varian Trilogy system at different dose levels. Several figures of merit including the relative root mean square error and a quality factor which accounts for the noise performance and the spatial resolution were introduced to objectively evaluate reconstruction performance. A comparison is presented between the three algorithms for a constant undersampling factor comparing different algorithms at several dose levels. To facilitate this comparison, the original CS method was formulated in the framework of the statistical image reconstruction algorithms. Important conclusions of the measurements from our studies are that (1) for realistic neuro-anatomy, over 100 projections are required to avoid streak artifacts in the reconstructed images even with CS reconstruction, (2) regardless of the algorithm employed, it is beneficial to distribute the total dose to more views as long as each view remains quantum noise limited and (3) the total variation-based CS method is not appropriate for very low dose levels because while it can mitigate streaking artifacts, the images exhibit patchy behavior, which is potentially harmful for medical diagnosis.
[The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].
Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel
2017-01-01
The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
The research protocol VI: How to choose the appropriate statistical test. Inferential statistics
Directory of Open Access Journals (Sweden)
Eric Flores-Ruiz
2017-10-01
Full Text Available The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
Overestimation of test performance by ROC analysis: Effect of small sample size
International Nuclear Information System (INIS)
Seeley, G.W.; Borgstrom, M.C.; Patton, D.D.; Myers, K.J.; Barrett, H.H.
1984-01-01
New imaging systems are often observer-rated by ROC techniques. For practical reasons the number of different images, or sample size (SS), is kept small. Any systematic bias due to small SS would bias system evaluation. The authors set about to determine whether the area under the ROC curve (AUC) would be systematically biased by small SS. Monte Carlo techniques were used to simulate observer performance in distinguishing signal (SN) from noise (N) on a 6-point scale; P(SN) = P(N) = .5. Four sample sizes (15, 25, 50 and 100 each of SN and N), three ROC slopes (0.8, 1.0 and 1.25), and three intercepts (0.8, 1.0 and 1.25) were considered. In each of the 36 combinations of SS, slope and intercept, 2000 runs were simulated. Results showed a systematic bias: the observed AUC exceeded the expected AUC in every one of the 36 combinations for all sample sizes, with the smallest sample sizes having the largest bias. This suggests that evaluations of imaging systems using ROC curves based on small sample size systematically overestimate system performance. The effect is consistent but subtle (maximum 10% of AUC standard deviation), and is probably masked by the s.d. in most practical settings. Although there is a statistically significant effect (F = 33.34, P<0.0001) due to sample size, none was found for either the ROC curve slope or intercept. Overestimation of test performance by small SS seems to be an inherent characteristic of the ROC technique that has not previously been described
International Nuclear Information System (INIS)
Moscati, A.F. Jr.; Hediger, E.M.; Rupp, M.J.
1986-01-01
High concentrations of lead in soils along an abandoned railroad line prompted a remedial investigation to characterize the extent of contamination across a 7-acre site. Contamination was thought to be spotty across the site reflecting its past use in battery recycling operations at discrete locations. A screening technique was employed to delineate the more highly contaminated areas by testing a statistically determined minimum number of random samples from each of seven discrete site areas. The approach not only quickly identified those site areas which would require more extensive grid sampling, but also provided a statistically defensible basis for excluding other site areas from further consideration, thus saving the cost of additional sample collection and analysis. The reduction in the number of samples collected in ''clean'' areas of the site ranged from 45 to 60%
EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.
Tong, Xiaoxiao; Bentler, Peter M
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ(2) test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.
Ecotoxicology statistical sampling
International Nuclear Information System (INIS)
Saona, G.
2012-01-01
This presentation introduces to general concepts in toxicology sample designs such as the distribution of organic or inorganic contaminants, a microbiological contamination, and the determination of the position in an eco toxicological bioassays ecosystem.
Improved custom statistics visualization for CA Performance Center data
Talevi, Iacopo
2017-01-01
The main goal of my project is to understand and experiment the possibilities that CA Performance Center (CA PC) offers for creating custom applications to display stored information through interesting visual means, such as maps. In particular, I have re-written some of the network statistics web pages in order to fetch data from new statistics modules in CA PC, which has its own API, and stop using the RRD data.
International Nuclear Information System (INIS)
Eberhardt, L.L.; Thomas, J.M.
1986-07-01
This project was designed to develop guidance for implementing 10 CFR Part 61 and to determine the overall needs for sampling and statistical work in characterizing, surveying, monitoring, and closing commercial low-level waste sites. When cost-effectiveness and statistical reliability are of prime importance, then double sampling, compositing, and stratification (with optimal allocation) are identified as key issues. If the principal concern is avoiding questionable statistical practice, then the applicability of kriging (for assessing spatial pattern), methods for routine monitoring, and use of standard textbook formulae in reporting monitoring results should be reevaluated. Other important issues identified include sampling for estimating model parameters and the use of data from left-censored (less than detectable limits) distributions
U.S. Department of Health & Human Services — This section contains statistical information and reports related to the percentage of electronic transactions being sent to Medicare contractors in the formats...
Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz
2015-03-01
FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.
Statistical Power in Meta-Analysis
Liu, Jin
2015-01-01
Statistical power is important in a meta-analysis study, although few studies have examined the performance of simulated power in meta-analysis. The purpose of this study is to inform researchers about statistical power estimation on two sample mean difference test under different situations: (1) the discrepancy between the analytical power and…
Directory of Open Access Journals (Sweden)
M.M. Mohie El-Din
2011-10-01
Full Text Available In this paper, two sample Bayesian prediction intervals for order statistics (OS are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two samples case. The class of the inverse exponential-type distributions includes several important distributions such the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.
Hansen, John P
2003-01-01
Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
Humans make efficient use of natural image statistics when performing spatial interpolation.
D'Antona, Anthony D; Perry, Jeffrey S; Geisler, Wilson S
2013-12-16
Visual systems learn through evolution and experience over the lifespan to exploit the statistical structure of natural images when performing visual tasks. Understanding which aspects of this statistical structure are incorporated into the human nervous system is a fundamental goal in vision science. To address this goal, we measured human ability to estimate the intensity of missing image pixels in natural images. Human estimation accuracy is compared with various simple heuristics (e.g., local mean) and with optimal observers that have nearly complete knowledge of the local statistical structure of natural images. Human estimates are more accurate than those of simple heuristics, and they match the performance of an optimal observer that knows the local statistical structure of relative intensities (contrasts). This optimal observer predicts the detailed pattern of human estimation errors and hence the results place strong constraints on the underlying neural mechanisms. However, humans do not reach the performance of an optimal observer that knows the local statistical structure of the absolute intensities, which reflect both local relative intensities and local mean intensity. As predicted from a statistical analysis of natural images, human estimation accuracy is negligibly improved by expanding the context from a local patch to the whole image. Our results demonstrate that the human visual system exploits efficiently the statistical structure of natural images.
Directory of Open Access Journals (Sweden)
Lee Tae-Hoon
2016-12-01
Full Text Available In many cases, a X¯$\\overline X $ control chart based on a performance variable is used in industrial fields. Typically, the control chart monitors the measurements of a performance variable itself. However, if the performance variable is too costly or impossible to measure, and a less expensive surrogate variable is available, the process may be more efficiently controlled using surrogate variables. In this paper, we present a model for the economic statistical design of a VSI (Variable Sampling Interval X¯$\\overline X $ control chart using a surrogate variable that is linearly correlated with the performance variable. We derive the total average profit model from an economic viewpoint and apply the model to a Very High Temperature Reactor (VHTR nuclear fuel measurement system and derive the optimal result using genetic algorithms. Compared with the control chart based on a performance variable, the proposed model gives a larger expected net income per unit of time in the long-run if the correlation between the performance variable and the surrogate variable is relatively high. The proposed model was confined to the sample mean control chart under the assumption that a single assignable cause occurs according to the Poisson process. However, the model may also be extended to other types of control charts using a single or multiple assignable cause assumptions such as VSS (Variable Sample Size X¯$\\overline X $ control chart, EWMA, CUSUM charts and so on.
Bakker, A.; Dierdorp, A.; Maanen, J.A. van; Eijkelhof, H.M.C.
2012-01-01
To stimulate students’ shuttling between contextual and statistical spheres, we based tasks on professional practices. This article focuses on two tasks to support reasoning about sampling by students aged 16-17. The purpose of the tasks was to find out which smaller sample size would have been
Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar
2016-02-01
The design of surface water quality sampling location is a crucial decision-making process for rationalization of monitoring network. The quantity, quality, and types of available dataset (watershed characteristics and water quality data) may affect the selection of appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for design of sampling locations. However, their performance may vary significantly with quantity, quality, and types of available dataset. In this paper, an attempt has been made to evaluate performance of these techniques by accounting the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data is usually difficult to obtain, whereas watershed information may be made available through application of geospatial techniques. A case study of Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at different sampling sites were estimated and accounted using modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The designed optimum number of sampling locations for monsoon and non-monsoon seasons by modified Sanders approach are eight and seven while that for FA/PCA are eleven and nine, respectively. Less variation in the number and locations of designed sampling sites were obtained by both techniques, which shows stability of results. A geospatial analysis has also been carried out to check the significance of designed sampling location with respect to river basin characteristics and land use of the study area. Both methods are equally efficient; however, modified Sanders
Statistical issues in reporting quality data: small samples and casemix variation.
Zaslavsky, A M
2001-12-01
To present two key statistical issues that arise in analysis and reporting of quality data. Casemix variation is relevant to quality reporting when the units being measured have differing distributions of patient characteristics that also affect the quality outcome. When this is the case, adjustment using stratification or regression may be appropriate. Such adjustments may be controversial when the patient characteristic does not have an obvious relationship to the outcome. Stratified reporting poses problems for sample size and reporting format, but may be useful when casemix effects vary across units. Although there are no absolute standards of reliability, high reliabilities (interunit F > or = 10 or reliability > or = 0.9) are desirable for distinguishing above- and below-average units. When small or unequal sample sizes complicate reporting, precision may be improved using indirect estimation techniques that incorporate auxiliary information, and 'shrinkage' estimation can help to summarize the strength of evidence about units with small samples. With broader understanding of casemix adjustment and methods for analyzing small samples, quality data can be analysed and reported more accurately.
Statistical evaluation of the data obtained from the K East Basin Sandfilter Backwash Pit samples
International Nuclear Information System (INIS)
Welsh, T.L.
1994-01-01
Samples were obtained from different locations from the K Each Sandfilter Backwash Pit to characterize the sludge material. These samples were analyzed chemically for elements, radionuclides, and residual compounds. The analytical results were statistically analyzed to determine the mean analyte content and the associated variability for each mean value
Directory of Open Access Journals (Sweden)
Zeyu Lin
2017-01-01
Full Text Available In this paper, the authors presented a study on the discrimination of handlebar grip samples, to provide effective forensic science service for hit and run traffic cases. 50 bicycle handlebar grip samples, 49 electric bike handlebar grip samples, and 96 motorcycle handlebar grip samples have been randomly collected by the local police in Beijing (China. Fourier transform infrared microspectroscopy (FTIR was utilized as analytical technology. Then, target absorption selection, data pretreatment, and discrimination of linked samples and unlinked samples were chosen as three steps to improve the discrimination of FTIR spectrums collected from different handlebar grip samples. Principal component analysis and receiver operating characteristic curve were utilized to evaluate different data selection methods and different data pretreatment methods, respectively. It is possible to explore the evidential value of handlebar grip residue evidence through instrumental analysis and statistical treatments. It will provide a universal discrimination method for other forensic science samples as well.
Energy Technology Data Exchange (ETDEWEB)
Wilson, Mollye C.; Einfeld, Wayne; Boucher, Raymond M.; Brown, Gary Stephen; Tezak, Matthew Stephen
2011-06-01
Recovery of Bacillus atrophaeous spores from grime-treated and clean surfaces was measured in a controlled chamber study to assess sampling method performance. Outdoor surfaces investigated by wipe and vacuum sampling methods included stainless steel, glass, marble and concrete. Bacillus atrophaeous spores were used as a surrogate for Bacillus anthracis spores in this study designed to assess whether grime-coated surfaces significantly affected surface sampling method performance when compared to clean surfaces. A series of chamber tests were carried out in which known amounts of spores were allowed to gravitationally settle onto both clean and dirty surfaces. Reference coupons were co-located with test coupons in all chamber experiments to provide a quantitative measure of initial surface concentrations of spores on all surfaces, thereby allowing sampling recovery calculations. Results from these tests, carried out under both low and high humidity conditions, show that spore recovery from grime-coated surfaces is the same as or better than spore recovery from clean surfaces. Statistically significant differences between method performance for grime-coated and clean surfaces were observed in only about half of the chamber tests conducted.
A Divergence Statistics Extension to VTK for Performance Analysis
Energy Technology Data Exchange (ETDEWEB)
Pebay, Philippe Pierre [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Bennett, Janine Camille [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2015-02-01
This report follows the series of previous documents ([PT08, BPRT09b, PT09, BPT09, PT10, PB13], where we presented the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k -means, order and auto-correlative statistics engines which we developed within the Visualization Tool Kit ( VTK ) as a scalable, parallel and versatile statistics package. We now report on a new engine which we developed for the calculation of divergence statistics, a concept which we hereafter explain and whose main goal is to quantify the discrepancy, in a stasticial manner akin to measuring a distance, between an observed empirical distribution and a theoretical, "ideal" one. The ease of use of the new diverence statistics engine is illustrated by the means of C++ code snippets. Although this new engine does not yet have a parallel implementation, it has already been applied to HPC performance analysis, of which we provide an example.
DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data
International Nuclear Information System (INIS)
Harris, S.P.
1997-01-01
This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing HydragardTM sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method.Both new methods use the existing HydragardTM sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. Samples in small 3 ml containers (Inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a HydragardTM sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and that no vitrification step is required.A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7.The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock-up Facility using 3 ml inserts and 15 ml peanut vials. A number of the insert samples were analyzed by Cold Chem and compared with full peanut vial samples analyzed by the current methods. The remaining inserts were analyzed by
Statistical sampling plan for the TRU waste assay facility
International Nuclear Information System (INIS)
Beauchamp, J.J.; Wright, T.; Schultz, F.J.; Haff, K.; Monroe, R.J.
1983-08-01
Due to limited space, there is a need to dispose appropriately of the Oak Ridge National Laboratory transuranic waste which is presently stored below ground in 55-gal (208-l) drums within weather-resistant structures. Waste containing less than 100 nCi/g transuranics can be removed from the present storage and be buried, while waste containing greater than 100 nCi/g transuranics must continue to be retrievably stored. To make the necessary measurements needed to determine the drums that can be buried, a transuranic Neutron Interrogation Assay System (NIAS) has been developed at Los Alamos National Laboratory and can make the needed measurements much faster than previous techniques which involved γ-ray spectroscopy. The previous techniques are reliable but time consuming. Therefore, a validation study has been planned to determine the ability of the NIAS to make adequate measurements. The validation of the NIAS will be based on a paired comparison of a sample of measurements made by the previous techniques and the NIAS. The purpose of this report is to describe the proposed sampling plan and the statistical analyses needed to validate the NIAS. 5 references, 4 figures, 5 tables
Statistics and sampling in transuranic studies
International Nuclear Information System (INIS)
Eberhardt, L.L.; Gilbert, R.O.
1980-01-01
The existing data on transuranics in the environment exhibit a remarkably high variability from sample to sample (coefficients of variation of 100% or greater). This chapter stresses the necessity of adequate sample size and suggests various ways to increase sampling efficiency. Objectives in sampling are regarded as being of great importance in making decisions as to sampling methodology. Four different classes of sampling methods are described: (1) descriptive sampling, (2) sampling for spatial pattern, (3) analytical sampling, and (4) sampling for modeling. A number of research needs are identified in the various sampling categories along with several problems that appear to be common to two or more such areas
International Nuclear Information System (INIS)
Pollock, D.A.; Brown, G.; Capone, D.W. II; Christopherson, D.; Seuntjens, J.M.; Woltz, J.
1992-01-01
This work has demonstrated the statistical concepts behind the XBAR R method for determining sample limits to verify billet I c performance and process uniformity. Using a preliminary population estimate for μ and σ from a stable production lot of only 5 billets, we have shown that reasonable sensitivity to systematic process drift and random within billet variation may be achieved, by using per billet subgroup sizes of moderate proportions. The effects of subgroup size (n) and sampling risk (α and β) on the calculated control limits have been shown to be important factors that need to be carefully considered when selecting an actual number of measurements to be used per billet, for each supplier process. Given the present method of testing in which individual wire samples are ramped to I c only once, with measurement uncertainty due to repeatability and reproducibility (typically > 1.4%), large subgroups (i.e. >30 per billet) appear to be unnecessary, except as an inspection tool to confirm wire process history for each spool. The introduction of the XBAR R method or a similar Statistical Quality Control procedure is recommend for use in the superconducing wire production program, particularly when the program transitions from requiring tests for all pieces of wire to sampling each production unit
Performance samples on academic tasks : improving prediction of academic performance
Tanilon, Jenny
2011-01-01
This thesis is about the development and validation of a performance-based test, labeled as Performance Samples on academic tasks in Education and Child Studies (PSEd). PSEd is designed to identify students who are most able to perform the academic tasks involved in an Education and Child Studies
Statistical assessment of fish behavior from split-beam hydro-acoustic sampling
International Nuclear Information System (INIS)
McKinstry, Craig A.; Simmons, Mary Ann; Simmons, Carver S.; Johnson, Robert L.
2005-01-01
Statistical methods are presented for using echo-traces from split-beam hydro-acoustic sampling to assess fish behavior in response to a stimulus. The data presented are from a study designed to assess the response of free-ranging, lake-resident fish, primarily kokanee (Oncorhynchus nerka) and rainbow trout (Oncorhynchus mykiss) to high intensity strobe lights, and was conducted at Grand Coulee Dam on the Columbia River in Northern Washington State. The lights were deployed immediately upstream from the turbine intakes, in a region exposed to daily alternating periods of high and low flows. The study design included five down-looking split-beam transducers positioned in a line at incremental distances upstream from the strobe lights, and treatments applied in randomized pseudo-replicate blocks. Statistical methods included the use of odds-ratios from fitted loglinear models. Fish-track velocity vectors were modeled using circular probability distributions. Both analyses are depicted graphically. Study results suggest large increases of fish activity in the presence of the strobe lights, most notably at night and during periods of low flow. The lights also induced notable bimodality in the angular distributions of the fish track velocity vectors. Statistical/SUMmaries are presented along with interpretations on fish behavior
Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data
Directory of Open Access Journals (Sweden)
Alexander P. Kartun-Giles
2018-04-01
Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing and equilibrium (static sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.
The CEO performance effect : Statistical issues and a complex fit perspective
Blettner, D.P.; Chaddad, F.R.; Bettis, R.
2012-01-01
How CEOs affect strategy and performance is important to strategic management research. We show that sophisticated statistical analysis alone is problematic for establishing the magnitude and causes of CEO impact on performance. We discuss three problem areas that substantially distort the
DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data
Energy Technology Data Exchange (ETDEWEB)
Harris, S.P. [Westinghouse Savannah River Company, AIKEN, SC (United States)
1997-09-18
This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing HydragardTM sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method.Either new DWPF analytical method could result in a two to three fold improvement in sample analysis time.Both new methods use the existing HydragardTM sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples.The insert sample is named after the initial trials which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (Inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a HydragardTM sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and that no vitrification step is required.A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7.The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock
Robust H2 performance for sampled-data systems
DEFF Research Database (Denmark)
Rank, Mike Lind
1997-01-01
Robust H2 performance conditions under structured uncertainty, analogous to well known methods for H∞ performance, have recently emerged in both discrete and continuous-time. This paper considers the extension into uncertain sampled-data systems, taking into account inter-sample behavior. Convex...... conditions for robust H2 performance are derived for different uncertainty sets...
Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J
2017-01-01
Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical
Directory of Open Access Journals (Sweden)
David C Pavlacky
Full Text Available Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1 coordination across organizations and regions, 2 meaningful management and conservation objectives, and 3 rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17. We provide two examples for the Brewer's sparrow (Spizella breweri in BCR 17 demonstrating the ability of the design to 1 determine hierarchical population responses to landscape change and 2 estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous
International Nuclear Information System (INIS)
Hofland, G.S.; Barton, C.C.
1990-01-01
The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program's results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig
Modified Distribution-Free Goodness-of-Fit Test Statistic.
Chun, So Yeon; Browne, Michael W; Shapiro, Alexander
2018-03-01
Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.
International Nuclear Information System (INIS)
Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth
2006-01-01
Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity, comes a new problem - the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines
Energy Technology Data Exchange (ETDEWEB)
Wallace, Jack, E-mail: jack.wallace@ce.queensu.ca [Department of Civil Engineering, Queen’s University, Ellis Hall, 58 University Avenue, Kingston, Ontario K7L 3N6 (Canada); Champagne, Pascale, E-mail: champagne@civil.queensu.ca [Department of Civil Engineering, Queen’s University, Ellis Hall, 58 University Avenue, Kingston, Ontario K7L 3N6 (Canada); Monnier, Anne-Charlotte, E-mail: anne-charlotte.monnier@insa-lyon.fr [National Institute for Applied Sciences – Lyon, 20 Avenue Albert Einstein, 69621 Villeurbanne Cedex (France)
2015-01-15
Highlights: • Performance of a hybrid passive landfill leachate treatment system was evaluated. • 33 Water chemistry parameters were sampled for 21 months and statistically analyzed. • Parameters were strongly linked and explained most (>40%) of the variation in data. • Alkalinity, ammonia, COD, heavy metals, and iron were criteria for performance. • Eight other parameters were key in modeling system dynamics and criteria. - Abstract: A pilot-scale hybrid-passive treatment system operated at the Merrick Landfill in North Bay, Ontario, Canada, treats municipal landfill leachate and provides for subsequent natural attenuation. Collected leachate is directed to a hybrid-passive treatment system, followed by controlled release to a natural attenuation zone before entering the nearby Little Sturgeon River. The study presents a comprehensive evaluation of the performance of the system using multivariate statistical techniques to determine the interactions between parameters, major pollutants in the leachate, and the biological and chemical processes occurring in the system. Five parameters (ammonia, alkalinity, chemical oxygen demand (COD), “heavy” metals of interest, with atomic weights above calcium, and iron) were set as criteria for the evaluation of system performance based on their toxicity to aquatic ecosystems and importance in treatment with respect to discharge regulations. System data for a full range of water quality parameters over a 21-month period were analyzed using principal components analysis (PCA), as well as principal components (PC) and partial least squares (PLS) regressions. PCA indicated a high degree of association for most parameters with the first PC, which explained a high percentage (>40%) of the variation in the data, suggesting strong statistical relationships among most of the parameters in the system. Regression analyses identified 8 parameters (set as independent variables) that were most frequently retained for modeling
Efficient statistical tests to compare Youden index: accounting for contingency correlation.
Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan
2015-04-30
Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples
Energy Technology Data Exchange (ETDEWEB)
Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J
2007-10-24
Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharides, pure proteins and protein mixture samples, all of LDA, PLSDA, and SIMCA were found to produce excellent classification given a sufficient number of compound variables calculated. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data that can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples
Direct Learning of Systematics-Aware Summary Statistics
CERN. Geneva
2018-01-01
Complex machine learning tools, such as deep neural networks and gradient boosting algorithms, are increasingly being used to construct powerful discriminative features for High Energy Physics analyses. These methods are typically trained with simulated or auxiliary data samples by optimising some classification or regression surrogate objective. The learned feature representations are then used to build a sample-based statistical model to perform inference (e.g. interval estimation or hypothesis testing) over a set of parameters of interest. However, the effectiveness of the mentioned approach can be reduced by the presence of known uncertainties that cause differences between training and experimental data, included in the statistical model via nuisance parameters. This work presents an end-to-end algorithm, which leverages on existing deep learning technologies but directly aims to produce inference-optimal sample-summary statistics. By including the statistical model and a differentiable approximation of ...
Performance in College Chemistry: a Statistical Comparison Using Gender and Jungian Personality Type
Greene, Susan V.; Wheeler, Henry R.; Riley, Wayne D.
This study sorted college introductory chemistry students by gender and Jungian personality type. It recognized differences from the general population distribution and statistically compared the students' grades with their Jungian personality types. Data from 577 female students indicated that ESFP (extroverted, sensory, feeling, perceiving) and ENFP (extroverted, intuitive, feeling, perceiving) profiles performed poorly at statistically significant levels when compared with the distribution of females enrolled in introductory chemistry. The comparable analysis using data from 422 male students indicated that the poorly performing male profiles were ISTP (introverted, sensory, thinking, perceiving) and ESTP (extroverted, sensory, thinking, perceiving). ESTJ (extroverted, sensory, thinking, judging) female students withdrew from the course at a statistically significant level. For both genders, INTJ (introverted, intuitive, thinking, judging) students were the best performers. By examining the documented characteristics of Jungian profiles that correspond with poorly performing students in chemistry, one may more effectively assist the learning process and the retention of these individuals in the fields of natural science, engineering, and technology.
The statistical-inference approach to generalized thermodynamics
International Nuclear Information System (INIS)
Lavenda, B.H.; Scherer, C.
1987-01-01
Limit theorems, such as the central-limit theorem and the weak law of large numbers, are applicable to statistical thermodynamics for sufficiently large sample size of indipendent and identically distributed observations performed on extensive thermodynamic (chance) variables. The estimation of the intensive thermodynamic quantities is a problem in parametric statistical estimation. The normal approximation to the Gibbs' distribution is justified by the analysis of large deviations. Statistical thermodynamics is generalized to include the statistical estimation of variance as well as mean values
International Nuclear Information System (INIS)
Aslan, B.; Zech, G.
2005-01-01
We introduce the novel concept of statistical energy as a statistical tool. We define statistical energy of statistical distributions in a similar way as for electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications
Oil pipeline performance review 1995, 1996, 1997, 1998 : Technical/statistical report
International Nuclear Information System (INIS)
2000-12-01
This document provides a summary of the pipeline performance and reportable pipeline failures of liquid hydrocarbon pipelines in Canada, for the years 1995 through 1998. The year 1994 was the last one for which the Oil Pipeline Performance Review (OPPR) was published on an annual basis. The OPPR will continue to be published until such time as the Pipeline Risk Assesment Sub-Committee (PRASC) has obtained enough pipeline failure data to be aggregated into a meaningful report. The shifts in the mix of reporting pipeline companies is apparent in the data presented, comparing the volumes transported and the traffic volume during the previous ten-year period. Another table presents a summary of the failures which occurred during the period under consideration, 1995-1998, allowing for a comparison with the data for the previous ten-year period. From the current perspective and from an historical context, this document provides a statistical review of the performance of the pipelines, covering refined petroleum product pipelines, clean oil pipelines and High Vapour Pressure (HVP) pipelines downstream of battery limits. Classified as reportable are spills of 1.5 cubic metre or more of liquid hydrocarbons, any amount of HVP material, any incident involving an injury, a death, a fire, or an explosion. For those companies that responded to the survey, the major items, including number of failures and volumes released are accurate. Samples of the forms used for collecting the information are provided within the document. 6 tabs., 1 fig
Performance of local information-based link prediction: a sampling perspective
Zhao, Jichang; Feng, Xu; Dong, Li; Liang, Xiao; Xu, Ke
2012-08-01
Link prediction is pervasively employed to uncover the missing links in the snapshots of real-world networks, which are usually obtained through different kinds of sampling methods. In the previous literature, in order to evaluate the performance of the prediction, known edges in the sampled snapshot are divided into the training set and the probe set randomly, without considering the underlying sampling approaches. However, different sampling methods might lead to different missing links, especially for the biased ways. For this reason, random partition-based evaluation of performance is no longer convincing if we take the sampling method into account. In this paper, we try to re-evaluate the performance of local information-based link predictions through sampling method governed division of the training set and the probe set. It is interesting that we find that for different sampling methods, each prediction approach performs unevenly. Moreover, most of these predictions perform weakly when the sampling method is biased, which indicates that the performance of these methods might have been overestimated in the prior works.
Predicting sample size required for classification performance
Directory of Open Access Journals (Sweden)
Figueroa Rosa L
2012-02-01
Full Text Available Abstract Background Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. Methods We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness of fit measures. As control we used an un-weighted fitting method. Results A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 to 560 annotated samples to achieve mean average and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method (p Conclusions This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.
International Nuclear Information System (INIS)
Balcazar G, M.; Flores R, J.H.
1992-01-01
As part of the knowledge about the radiometric surface exploration, carried out in the geothermal field of Chipilapa, El Salvador, its were considered the geo-statistical parameters starting from the calculated variogram of the field data, being that the maxim distance of correlation of the samples in 'radon' in the different observation addresses (N-S, E-W, N W-S E, N E-S W), it was of 121 mts for the monitoring grill in future prospectus in the same area. Being derived of it an optimization (minimum cost) in the spacing of the field samples by means of geo-statistical techniques, without losing the detection of the anomaly. (Author)
A statistical model for predicting muscle performance
Byerly, Diane Leslie De Caix
The objective of these studies was to develop a capability for predicting muscle performance and fatigue to be utilized for both space- and ground-based applications. To develop this predictive model, healthy test subjects performed a defined, repetitive dynamic exercise to failure using a Lordex spinal machine. Throughout the exercise, surface electromyography (SEMG) data were collected from the erector spinae using a Mega Electronics ME3000 muscle tester and surface electrodes placed on both sides of the back muscle. These data were analyzed using a 5th order Autoregressive (AR) model and statistical regression analysis. It was determined that an AR derived parameter, the mean average magnitude of AR poles, significantly correlated with the maximum number of repetitions (designated Rmax) that a test subject was able to perform. Using the mean average magnitude of AR poles, a test subject's performance to failure could be predicted as early as the sixth repetition of the exercise. This predictive model has the potential to provide a basis for improving post-space flight recovery, monitoring muscle atrophy in astronauts and assessing the effectiveness of countermeasures, monitoring astronaut performance and fatigue during Extravehicular Activity (EVA) operations, providing pre-flight assessment of the ability of an EVA crewmember to perform a given task, improving the design of training protocols and simulations for strenuous International Space Station assembly EVA, and enabling EVA work task sequences to be planned enhancing astronaut performance and safety. Potential ground-based, medical applications of the predictive model include monitoring muscle deterioration and performance resulting from illness, establishing safety guidelines in the industry for repetitive tasks, monitoring the stages of rehabilitation for muscle-related injuries sustained in sports and accidents, and enhancing athletic performance through improved training protocols while reducing
Statistical analysis in MSW collection performance assessment.
Teixeira, Carlos Afonso; Avelino, Catarina; Ferreira, Fátima; Bentes, Isabel
2014-09-01
The increase of Municipal Solid Waste (MSW) generated over the last years forces waste managers pursuing more effective collection schemes, technically viable, environmentally effective and economically sustainable. The assessment of MSW services using performance indicators plays a crucial role for improving service quality. In this work, we focus on the relevance of regular system monitoring as a service assessment tool. In particular, we select and test a core-set of MSW collection performance indicators (effective collection distance, effective collection time and effective fuel consumption) that highlights collection system strengths and weaknesses and supports pro-active management decision-making and strategic planning. A statistical analysis was conducted with data collected in mixed collection system of Oporto Municipality, Portugal, during one year, a week per month. This analysis provides collection circuits' operational assessment and supports effective short-term municipality collection strategies at the level of, e.g., collection frequency and timetables, and type of containers. Copyright © 2014 Elsevier Ltd. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Solaimani, Mohiuddin [Univ. of Texas-Dallas, Richardson, TX (United States); Iftekhar, Mohammed [Univ. of Texas-Dallas, Richardson, TX (United States); Khan, Latifur [Univ. of Texas-Dallas, Richardson, TX (United States); Thuraisingham, Bhavani [Univ. of Texas-Dallas, Richardson, TX (United States); Ingram, Joey Burton [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2015-09-01
Anomaly detection refers to the identi cation of an irregular or unusual pat- tern which deviates from what is standard, normal, or expected. Such deviated patterns typically correspond to samples of interest and are assigned different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable chal- lenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data where anomalies appear as a group. We have developed a distributed statistical approach to build a model and later use it to detect anomaly. As a case study, we investigate group anomaly de- tection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We have built our framework using Apache Spark to get higher throughput and lower data processing time on streaming data. We have developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint with higher accuracy by implementing a cluster-based technique to detect sporadic and continuous anomalies. We conclude that our cluster-based technique out- performs other statistical techniques with higher accuracy and lower processing time.
Can We Use Polya’s Method to Improve Students’ Performance in the Statistics Classes?
Directory of Open Access Journals (Sweden)
Indika Wickramasinghe
2015-01-01
Full Text Available In this study, Polya’s problem-solving method is introduced in a statistics class in an effort to enhance students’ performance. Teaching the method was applied to one of the two introductory-level statistics classes taught by the same instructor, and a comparison was made between the performances in the two classes. The results indicate there was a significant improvement of the students’ performance in the class in which Polya’s method was introduced.
Zhang, Qian; Harman, Ciaran J.; Kirchner, James W.
2018-02-01
River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling - in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) - are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β = 0) to Brown noise (β = 2) and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb-Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among all methods for a wide range of
Statistical modelling of networked human-automation performance using working memory capacity.
Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja
2014-01-01
This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.
Energy Technology Data Exchange (ETDEWEB)
Cabalin, L.M.; Gonzalez, A. [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain); Ruiz, J. [Department of Applied Physics I, University of Malaga, E-29071 Malaga (Spain); Laserna, J.J., E-mail: laserna@uma.e [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain)
2010-08-15
Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s{sup -1}. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.
Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.
2010-08-01
Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s - 1 . Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.
International Nuclear Information System (INIS)
Cabalin, L.M.; Gonzalez, A.; Ruiz, J.; Laserna, J.J.
2010-01-01
Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s -1 . Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.
Business Statistics: A Comparison of Student Performance in Three Learning Modes
Simmons, Gerald R.
2014-01-01
The purpose of this study was to compare the performance of three teaching modes and age groups of business statistics sections in terms of course exam scores. The research questions were formulated to determine the performance of the students within each teaching mode, to compare each mode in terms of exam scores, and to compare exam scores by…
International Nuclear Information System (INIS)
Fossum, Kristian; Mannseth, Trond
2014-01-01
We assess and compare parameter sampling capabilities of one sequential and one simultaneous Bayesian, ensemble-based, joint state-parameter (JS) estimation method. In the companion paper, part I (Fossum and Mannseth 2014 Inverse Problems 30 114002), analytical investigations lead us to propose three claims, essentially stating that the sequential method can be expected to outperform the simultaneous method for weakly nonlinear forward models. Here, we assess the reliability and robustness of these claims through statistical analysis of results from a range of numerical experiments. Samples generated by the two approximate JS methods are compared to samples from the posterior distribution generated by a Markov chain Monte Carlo method, using four approximate measures of distance between probability distributions. Forward-model nonlinearity is assessed from a stochastic nonlinearity measure allowing for sufficiently large model dimensions. Both toy models (with low computational complexity, and where the nonlinearity is fairly easy to control) and two-phase porous-media flow models (corresponding to down-scaled versions of problems to which the JS methods have been frequently applied recently) are considered in the numerical experiments. Results from the statistical analysis show strong support of all three claims stated in part I. (paper)
A comparison of linear and nonlinear statistical techniques in performance attribution.
Chan, N H; Genovese, C R
2001-01-01
Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks using factors derived from some commonly used cross sectional linear multifactor models. By rebalancing this portfolio monthly, the cumulative returns for procedures based on standard linear multifactor model and three nonlinear techniques-model selection, additive models, and neural networks-are calculated and compared. It is found that the first two nonlinear techniques, especially in combination, outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.
International Nuclear Information System (INIS)
Ghanbari, Y.; Habibnia, A.; Memar, A.
2009-01-01
In geochemical stream sediment surveys in Moghangegh Region in north west of Iran, sheet 1:50,000, 152 samples were collected and after the analyze and processing of data, it revealed that Yb, Sc, Ni, Li, Eu, Cd, Co, as contents in one sample is far higher than other samples. After detecting this sample as an outlier sample, the effect of this sample on multivariate statistical data processing for destructive effects of outlier sample in geochemical exploration was investigated. Pearson and Spear man correlation coefficient methods and cluster analysis were used for multivariate studies and the scatter plot of some elements together the regression profiles are given in case of 152 and 151 samples and the results are compared. After investigation of multivariate statistical data processing results, it was realized that results of existence of outlier samples may appear as the following relations between elements: - true relation between two elements, which have no outlier frequency in the outlier sample. - false relation between two elements which one of them has outlier frequency in the outlier sample. - complete false relation between two elements which both have outlier frequency in the outlier sample
Bhuiya, Abbas; Hanifi, S M A; Roy, Nikhil; Streatfield, P Kim
2007-03-01
This paper compared the performance of the lot quality assurance sampling (LQAS) method in identifying inadequately-performing health work-areas with that of using health and demographic surveillance system (HDSS) data and examined the feasibility of applying the method by field-level programme supervisors. The study was carried out in Matlab, the field site of ICDDR,B, where a HDSS has been in place for over 30 years. The LQAS method was applied in 57 work-areas of community health workers in ICDDR,B-served areas in Matlab during July-September 2002. The performance of the LQAS method in identifying work-areas with adequate and inadequate coverage of various health services was compared with those of the HDSS. The health service-coverage indicators included coverage of DPT, measles, BCG vaccination, and contraceptive use. It was observed that the difference in the proportion of work-areas identified to be inadequately performing using the LQAS method with less than 30 respondents, and the HDSS was not statistically significant. The consistency between the LQAS method and the HDSS in identifying work-areas was greater for adequately-performing areas than inadequately-performing areas. It was also observed that the field managers could be trained to apply the LQAS method in monitoring their performance in reaching the target population.
Exact distributions of two-sample rank statistics and block rank statistics using computer algebra
Wiel, van de M.A.
1998-01-01
We derive generating functions for various rank statistics and we use computer algebra to compute the exact null distribution of these statistics. We present various techniques for reducing time and memory space used by the computations. We use the results to write Mathematica notebooks for
International Nuclear Information System (INIS)
Gilbert, R.O.; Bernhardt, D.E.; Hahn, P.B.
1983-01-01
A summary of a field soil sampling study conducted around the Rocky Flats Colorado plant in May 1977 is preseted. Several different soil sampling techniques that had been used in the area were applied at four different sites. One objective was to comparethe average 239 - 240 Pu concentration values obtained by the various soil sampling techniques used. There was also interest in determining whether there are differences in the reproducibility of the various techniques and how the techniques compared with the proposed EPA technique of sampling to 1 cm depth. Statistically significant differences in average concentrations between the techniques were found. The differences could be largely related to the differences in sampling depth-the primary physical variable between the techniques. The reproducibility of the techniques was evaluated by comparing coefficients of variation. Differences between coefficients of variation were not statistically significant. Average (median) coefficients ranged from 21 to 42 percent for the five sampling techniques. A laboratory study indicated that various sample treatment and particle sizing techniques could increase the concentration of plutonium in the less than 10 micrometer size fraction by up to a factor of about 4 compared to the 2 mm size fraction
De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos
2014-06-01
Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
Energy Technology Data Exchange (ETDEWEB)
Mishra, Srikanta; Schuetter, Jared
2014-11-01
We compare two approaches for building a statistical proxy model (metamodel) for CO₂ geologic sequestration from the results of full-physics compositional simulations. The first approach involves a classical Box-Behnken or Augmented Pairs experimental design with a quadratic polynomial response surface. The second approach used a space-filling maxmin Latin Hypercube sampling or maximum entropy design with the choice of five different meta-modeling techniques: quadratic polynomial, kriging with constant and quadratic trend terms, multivariate adaptive regression spline (MARS) and additivity and variance stabilization (AVAS). Simulations results for CO₂ injection into a reservoir-caprock system with 9 design variables (and 97 samples) were used to generate the data for developing the proxy models. The fitted models were validated with using an independent data set and a cross-validation approach for three different performance metrics: total storage efficiency, CO₂ plume radius and average reservoir pressure. The Box-Behnken–quadratic polynomial metamodel performed the best, followed closely by the maximin LHS–kriging metamodel.
Statistical Control Charts: Performances of Short Term Stock Trading in Croatia
Directory of Open Access Journals (Sweden)
Dumičić Ksenija
2015-03-01
Full Text Available Background: The stock exchange, as a regulated financial market, in modern economies reflects their economic development level. The stock market indicates the mood of investors in the development of a country and is an important ingredient for growth. Objectives: This paper aims to introduce an additional statistical tool used to support the decision-making process in stock trading, and it investigate the usage of statistical process control (SPC methods into the stock trading process. Methods/Approach: The individual (I, exponentially weighted moving average (EWMA and cumulative sum (CUSUM control charts were used for gaining trade signals. The open and the average prices of CROBEX10 index stocks on the Zagreb Stock Exchange were used in the analysis. The statistical control charts capabilities for stock trading in the short-run were analysed. Results: The statistical control chart analysis pointed out too many signals to buy or sell stocks. Most of them are considered as false alarms. So, the statistical control charts showed to be not so much useful in stock trading or in a portfolio analysis. Conclusions: The presence of non-normality and autocorellation has great impact on statistical control charts performances. It is assumed that if these two problems are solved, the use of statistical control charts in a portfolio analysis could be greatly improved.
Finite-sample instrumental variables Inference using an Asymptotically Pivotal Statistic
Bekker, P.; Kleibergen, F.R.
2001-01-01
The paper considers the K-statistic, Kleibergen’s (2000) adaptation ofthe Anderson-Rubin (AR) statistic in instrumental variables regression.Compared to the AR-statistic this K-statistic shows improvedasymptotic efficiency in terms of degrees of freedom in overidentifiedmodels and yet it shares,
A preliminary study on identification of Thai rice samples by INAA and statistical analysis
Kongsri, S.; Kukusamude, C.
2017-09-01
This study aims to investigate the elemental compositions in 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify rice according to their types and rice cultivars using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified by SRM 1568a Rice Flour. All elements were found to be in a good agreement with the certified values. The precisions in term of %RSD were lower than 7%. The LODs were obtained in range of 0.01 to 29 mg kg-1. The concentration of 9 elements distributed in Thai rice samples was evaluated and used as chemical indicators to identify the type of rice samples. The result found that Mg, Cl, As, Br, Mn, K, Rb, and Zn concentrations in Thai jasmine rice samples are significantly different but there was no evidence that Al is significantly different from concentration in Sung Yod rice samples at 95% confidence interval. Our results may provide preliminary information for discrimination of rice samples and may be useful database of Thai rice.
Finite-sample instrumental variables inference using an asymptotically pivotal statistic
Bekker, Paul A.; Kleibergen, Frank
2001-01-01
The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidenti?ed models and yet it shares,
Spatial scan statistics to assess sampling strategy of antimicrobial resistance monitoring programme
DEFF Research Database (Denmark)
Vieira, Antonio; Houe, Hans; Wegener, Henrik Caspar
2009-01-01
Pie collection and analysis of data on antimicrobial resistance in human and animal Populations are important for establishing a baseline of the occurrence of resistance and for determining trends over time. In animals, targeted monitoring with a stratified sampling plan is normally used. However...... sampled by the Danish Integrated Antimicrobial Resistance Monitoring and Research Programme (DANMAP), by identifying spatial Clusters of samples and detecting areas with significantly high or low sampling rates. These analyses were performed for each year and for the total 5-year study period for all...... by an antimicrobial monitoring program....
Statistical analyses of the performance of Macedonian investment and pension funds
Directory of Open Access Journals (Sweden)
Petar Taleski
2015-10-01
Full Text Available The foundation of the post-modern portfolio theory is creating a portfolio based on a desired target return. This specifically applies to the performance of investment and pension funds that provide a rate of return meeting payment requirements from investment funds. A desired target return is the goal of an investment or pension fund. It is the primary benchmark used to measure performances, dynamic monitoring and evaluation of the risk–return ratio on investment funds. The analysis in this paper is based on monthly returns of Macedonian investment and pension funds (June 2011 - June 2014. Such analysis utilizes the basic, but highly informative statistical characteristic moments like skewness, kurtosis, Jarque–Bera, and Chebyishev’s Inequality. The objective of this study is to perform a trough analysis, utilizing the above mentioned and other types of statistical techniques (Sharpe, Sortino, omega, upside potential, Calmar, Sterling to draw relevant conclusions regarding the risks and characteristic moments in Macedonian investment and pension funds. Pension funds are the second largest segment of the financial system, and has great potential for further growth due to constant inflows from pension insurance. The importance of investment funds for the financial system in the Republic of Macedonia is still small, although open-end investment funds have been the fastest growing segment of the financial system. Statistical analysis has shown that pension funds have delivered a significantly positive volatility-adjusted risk premium in the analyzed period more so than investment funds.
CONFIDENCE LEVELS AND/VS. STATISTICAL HYPOTHESIS TESTING IN STATISTICAL ANALYSIS. CASE STUDY
Directory of Open Access Journals (Sweden)
ILEANA BRUDIU
2009-05-01
Full Text Available Estimated parameters with confidence intervals and testing statistical assumptions used in statistical analysis to obtain conclusions on research from a sample extracted from the population. Paper to the case study presented aims to highlight the importance of volume of sample taken in the study and how this reflects on the results obtained when using confidence intervals and testing for pregnant. If statistical testing hypotheses not only give an answer "yes" or "no" to some questions of statistical estimation using statistical confidence intervals provides more information than a test statistic, show high degree of uncertainty arising from small samples and findings build in the "marginally significant" or "almost significant (p very close to 0.05.
Energy Technology Data Exchange (ETDEWEB)
Lee, Kyung Hoon; Park, Ho Jin; Lee, Chung Chan; Cho, Jin Young [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)
2015-10-15
The purpose of this paper is to study the effect on output parameters in the lattice physics calculation due to the last input uncertainty such as manufacturing deviations from nominal value for material composition and geometric dimensions. In a nuclear design and analysis, the lattice physics calculations are usually employed to generate lattice parameters for the nodal core simulation and pin power reconstruction. These lattice parameters which consist of homogenized few-group cross-sections, assembly discontinuity factors, and form-functions can be affected by input uncertainties which arise from three different sources: 1) multi-group cross-section uncertainties, 2) the uncertainties associated with methods and modeling approximations utilized in lattice physics codes, and 3) fuel/assembly manufacturing uncertainties. In this paper, data provided by the light water reactor (LWR) uncertainty analysis in modeling (UAM) benchmark has been used as the manufacturing uncertainties. First, the effect of each input parameter has been investigated through sensitivity calculations at the fuel assembly level. Then, uncertainty in prediction of peaking factor due to the most sensitive input parameter has been estimated using the statistical sampling method, often called the brute force method. For our analysis, the two-dimensional transport lattice code DeCART2D and its ENDF/B-VII.1 based 47-group library were used to perform the lattice physics calculation. Sensitivity calculations have been performed in order to study the influence of manufacturing tolerances on the lattice parameters. The manufacturing tolerance that has the largest influence on the k-inf is the fuel density. The second most sensitive parameter is the outer clad diameter.
Li, Jin-Na; Er, Meng-Joo; Tan, Yen-Kheng; Yu, Hai-Bin; Zeng, Peng
2015-09-01
This paper investigates an adaptive sampling rate control scheme for networked control systems (NCSs) subject to packet disordering. The main objectives of the proposed scheme are (a) to avoid heavy packet disordering existing in communication networks and (b) to stabilize NCSs with packet disordering, transmission delay and packet loss. First, a novel sampling rate control algorithm based on statistical characteristics of disordering entropy is proposed; secondly, an augmented closed-loop NCS that consists of a plant, a sampler and a state-feedback controller is transformed into an uncertain and stochastic system, which facilitates the controller design. Then, a sufficient condition for stochastic stability in terms of Linear Matrix Inequalities (LMIs) is given. Moreover, an adaptive tracking controller is designed such that the sampling period tracks a desired sampling period, which represents a significant contribution. Finally, experimental results are given to illustrate the effectiveness and advantages of the proposed scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
Q. Zhang
2018-02-01
Full Text Available River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1 fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2 the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling – in the form of spectral slope (β or other equivalent scaling parameters (e.g., Hurst exponent – are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1 they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β = 0 to Brown noise (β = 2 and (2 their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb–Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among
Ten-Doménech, Isabel; Beltrán-Iturat, Eduardo; Herrero-Martínez, José Manuel; Sancho-Llopis, Juan Vicente; Simó-Alfonso, Ernesto Francisco
2015-06-24
In this work, a method for the separation of triacylglycerols (TAGs) present in human milk and from other mammalian species by reversed-phase high-performance liquid chromatography using a core-shell particle packed column with UV and evaporative light-scattering detectors is described. Under optimal conditions, a mobile phase containing acetonitrile/n-pentanol at 10 °C gave an excellent resolution among more than 50 TAG peaks. A small-scale method for fat extraction in these milks (particularly of interest for human milk samples) using minimal amounts of sample and reagents was also developed. The proposed extraction protocol and the traditional method were compared, giving similar results, with respect to the total fat and relative TAG contents. Finally, a statistical study based on linear discriminant analysis on the TAG composition of different types of milks (human, cow, sheep, and goat) was carried out to differentiate the samples according to their mammalian origin.
Kiekkas, Panagiotis; Panagiotarou, Aliki; Malja, Alvaro; Tahirai, Daniela; Zykai, Rountina; Bakalis, Nick; Stefanopoulos, Nikolaos
2015-12-01
Although statistical knowledge and skills are necessary for promoting evidence-based practice, health sciences students have expressed anxiety about statistics courses, which may hinder their learning of statistical concepts. To evaluate the effects of a biostatistics course on nursing students' attitudes toward statistics and to explore the association between these attitudes and their performance in the course examination. One-group quasi-experimental pre-test/post-test design. Undergraduate nursing students of the fifth or higher semester of studies, who attended a biostatistics course. Participants were asked to complete the pre-test and post-test forms of The Survey of Attitudes Toward Statistics (SATS)-36 scale at the beginning and end of the course respectively. Pre-test and post-test scale scores were compared, while correlations between post-test scores and participants' examination performance were estimated. Among 156 participants, post-test scores of the overall SATS-36 scale and of the Affect, Cognitive Competence, Interest and Effort components were significantly higher than pre-test ones, indicating that the course was followed by more positive attitudes toward statistics. Among 104 students who participated in the examination, higher post-test scores of the overall SATS-36 scale and of the Affect, Difficulty, Interest and Effort components were significantly but weakly correlated with higher examination performance. Students' attitudes toward statistics can be improved through appropriate biostatistics courses, while positive attitudes contribute to higher course achievements and possibly to improved statistical skills in later professional life. Copyright © 2015 Elsevier Ltd. All rights reserved.
International Nuclear Information System (INIS)
Edjabou, Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn; Petersen, Claus; Scheutz, Charlotte; Astrup, Thomas Fruergaard
2015-01-01
Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single
Energy Technology Data Exchange (ETDEWEB)
Edjabou, Maklawe Essonanawe, E-mail: vine@env.dtu.dk [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Petersen, Claus [Econet AS, Omøgade 8, 2.sal, 2100 Copenhagen (Denmark); Scheutz, Charlotte; Astrup, Thomas Fruergaard [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark)
2015-02-15
Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single
International Nuclear Information System (INIS)
Picard, R.R.
1987-01-01
Verification of an inventory or of a reported material unaccounted for (MUF) calls for the remeasurement of a sample of items by an inspector followed by comparison of the inspector's data to the facility's reported values. Such comparison is intended to protect against falsification of accounting data that could conceal material loss. In the international arena, the observed discrepancies between the inspector's data and the reported data are quantified using the D statistic. If data have been falsified by the facility, the standard deviations of the D and MUF-D statistics are inflated owing to the sampling distribution. Moreover, under certain conditions the distributions of those statistics can depart markedly from normality, complicating evaluation of an inspection plan's performance. Detection probabilities estimated using standard deviations appropriate for the no-falsification case in conjunction with assumed normality can be far too optimistic. Under very general conditions regarding the facility's and/or the inspector's measurement error procedures and the inspector's sampling regime, the variance of the MUF-D statistic can be broken into three components. The inspection's sensitivity against various falsification scenarios can be traced to one or more of these components. Obvious implications exist for the planning of effective inspections, particularly in the area of resource optimization
Bourne, Victoria J.
2018-01-01
Statistics anxiety is experienced by a large number of psychology students, and previous research has examined a range of potential correlates, including academic performance, mathematical ability and psychological predictors. These varying predictors are often considered separately, although there may be shared variance between them. In the…
Hagell, Peter; Westergren, Albert
Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data with simulated Rasch model fitting 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500 were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less then or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signal misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
Statistical Analysis of EGFR Structures’ Performance in Virtual Screening
Li, Yan; Li, Xiang; Dong, Zigang
2015-01-01
In this work the ability of EGFR structures to distinguish true inhibitors from decoys in docking and MM-PBSA is assessed by statistical procedures. The docking performance depends critically on the receptor conformation and bound state. The enrichment of known inhibitors is well correlated with the difference between EGFR structures rather than the bound-ligand property. The optimal structures for virtual screening can be selected based purely on the complex information. And the mixed combination of distinct EGFR conformations is recommended for ensemble docking. In MM-PBSA, a variety of EGFR structures have identically good performance in the scoring and ranking of known inhibitors, indicating that the choice of the receptor structure has little effect on the screening. PMID:26476847
Serdobolskii, Vadim Ivanovich
2007-01-01
This monograph presents mathematical theory of statistical models described by the essentially large number of unknown parameters, comparable with sample size but can also be much larger. In this meaning, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach in which sample size increases along with the number of unknown parameters.This theory opens a way for solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of an infinite sampling often break down in the solution of real problems, and, dependent on data, can be inefficient, unstable and even not applicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope the will find a satisfactory solution.Mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...
Statistical sampling methods for soils monitoring
Ann M. Abbott
2010-01-01
Development of the best sampling design to answer a research question should be an interactive venture between the land manager or researcher and statisticians, and is the result of answering various questions. A series of questions that can be asked to guide the researcher in making decisions that will arrive at an effective sampling plan are described, and a case...
Huang, Erich P; Wang, Xiao-Feng; Choudhury, Kingshuk Roy; McShane, Lisa M; Gönen, Mithat; Ye, Jingjing; Buckler, Andrew J; Kinahan, Paul E; Reeves, Anthony P; Jackson, Edward F; Guimaraes, Alexander R; Zahlmann, Gudrun
2015-02-01
Medical imaging serves many roles in patient care and the drug approval process, including assessing treatment response and guiding treatment decisions. These roles often involve a quantitative imaging biomarker, an objectively measured characteristic of the underlying anatomic structure or biochemical process derived from medical images. Before a quantitative imaging biomarker is accepted for use in such roles, the imaging procedure to acquire it must undergo evaluation of its technical performance, which entails assessment of performance metrics such as repeatability and reproducibility of the quantitative imaging biomarker. Ideally, this evaluation will involve quantitative summaries of results from multiple studies to overcome limitations due to the typically small sample sizes of technical performance studies and/or to include a broader range of clinical settings and patient populations. This paper is a review of meta-analysis procedures for such an evaluation, including identification of suitable studies, statistical methodology to evaluate and summarize the performance metrics, and complete and transparent reporting of the results. This review addresses challenges typical of meta-analyses of technical performance, particularly small study sizes, which often causes violations of assumptions underlying standard meta-analysis techniques. Alternative approaches to address these difficulties are also presented; simulation studies indicate that they outperform standard techniques when some studies are small. The meta-analysis procedures presented are also applied to actual [18F]-fluorodeoxyglucose positron emission tomography (FDG-PET) test-retest repeatability data for illustrative purposes. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
Statistical evaluation of vibration analysis techniques
Milner, G. Martin; Miller, Patrice S.
1987-01-01
An evaluation methodology is presented for a selection of candidate vibration analysis techniques applicable to machinery representative of the environmental control and life support system of advanced spacecraft; illustrative results are given. Attention is given to the statistical analysis of small sample experiments, the quantification of detection performance for diverse techniques through the computation of probability of detection versus probability of false alarm, and the quantification of diagnostic performance.
Eum, H. I.; Cannon, A. J.
2015-12-01
Climate models are a key provider to investigate impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch of spatial resolution between GCMs and regional applications, in particular a region characterized by complex terrain such as Korean peninsula. Therefore, a downscaling procedure is an essential to assess regional impacts of climate change. Numerous statistical downscaling methods have been used mainly due to the computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. By split sampling scheme, all methods are calibrated with observational station data for 19 years from 1973 to 1991 are and tested for the recent 19 years from 1992 to 2010. To assess skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure an ability of reproducing temporal correlation, distribution, spatial correlation, and extreme events. In addition, we employ Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR and all methods lead to large improvements in representing all performance metrics. According to seasonal performance metrics evaluated, when TOPSIS is applied, MACA is identified as the most reliable and robust method for all variables and seasons. Note that such result is derived from CFSR output which is recognized as near perfect climate data in climate studies. Therefore, the
European downstream oil industry safety performance. Statistical summary of reported incidents 2009
International Nuclear Information System (INIS)
Burton, A.; Den Haan, K.H.
2010-10-01
The sixteenth such report by CONCAWE, this issue includes statistics on workrelated personal injuries for the European downstream oil industry's own employees as well as contractors for the year 2009. Data were received from 33 companies representing more than 97% of the European refining capacity. Trends over the last sixteen years are highlighted and the data are also compared to similar statistics from related industries. In addition, this report presents the results of the first Process Safety Performance Indicator data gathering exercise amongst the CONCAWE membership.
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Choice of Sample Split in Out-of-Sample Forecast Evaluation
DEFF Research Database (Denmark)
Hansen, Peter Reinhard; Timmermann, Allan
, while conversely the power of forecast evaluation tests is strongest with long out-of-sample periods. To deal with size distortions, we propose a test statistic that is robust to the effect of considering multiple sample split points. Empirical applications to predictabil- ity of stock returns......Out-of-sample tests of forecast performance depend on how a given data set is split into estimation and evaluation periods, yet no guidance exists on how to choose the split point. Empirical forecast evaluation results can therefore be difficult to interpret, particularly when several values...... and inﬂation demonstrate that out-of-sample forecast evaluation results can critically depend on how the sample split is determined....
Directory of Open Access Journals (Sweden)
Gerardo Manfreda
2013-04-01
Full Text Available Campylobacteriosis represents the most important food-borne illness in the EU. Broilers, as well as poultry meat, spread the majority of strains responsible for human cases. The main aims of this study were to suggest an approach for the definition of performance objectives (POs based on prevalence and concentration of Campylobacter species (spp. in broiler carcasses; moreover, sampling plans to determine the acceptability of broiler batches at the slaughterhouses in relation to such POs were formulated. The dataset used in this study was the one regarding Italy composed during the European Food Safety Authority baseline survey which was performed in the EU in 2008. A total of 393 carcasses obtained from 393 different batches collected from 48 Italian slaughterhouses were included in the analysis. Uncertainty in prevalence and concentration of Campylobacter spp. on carcasses was quantified assuming a beta and log normal distribution. Statistical analysis and distribution fitting were performed in ModelRisk v4.3 (Monte Carlo simulation with 10,000 iterations. By taking the 50th percentile of prevalence distribution as safety limit, sampling plans were subsequently calculated basing on the binomial approach. Final values of number of samples were equal to 4 or 5 to test with qualitative analysis. Considering a limit of quantification of 10 colony forming units/g, a higher number of samples (i.e. 10-13 would be necessary to test using enumeration. An increase of the sensibility of the analytical technique should be necessary to achieve realistic and useful sampling plans based on concentration data.
Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)
2000-01-01
Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates due to changes in rain statistics due 1) to evolution of the official algorithms used to process the data, and 2) differences from other remote sensing systems such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.
Marković, Snežana; Kerč, Janez; Horvat, Matej
2017-03-01
We are presenting a new approach of identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields comparable results to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during manufacturing process, especially in cases when historical process data is not straightforwardly available. In the presented case the changes of lactose characteristics are influencing the performance of the extrusion/spheronization process step. The pellet cores produced by using one (considered as less suitable) lactose source were on average larger and more fragile, leading to consequent breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process leading to compromised film coating.
Energy Technology Data Exchange (ETDEWEB)
Kyle, Jennifer E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Casey, Cameron P. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Stratton, Kelly G. [National Security Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zink, Erika M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Kim, Young-Mo [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zheng, Xueyun [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Monroe, Matthew E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Weitz, Karl K. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Bloodsworth, Kent J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Orton, Daniel J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Ibrahim, Yehia M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Moore, Ronald J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Lee, Christine G. [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Research Service, Portland Veterans Affairs Medical Center, Portland OR USA; Pedersen, Catherine [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Orwoll, Eric [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Smith, Richard D. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Burnum-Johnson, Kristin E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Baker, Erin S. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA
2017-02-05
The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples such as smaller blood volume required, storage at room temperature, and ability for sampling in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatograph-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating degradation occurs in the DBS samples affecting quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types indicating that lipid quantitation was more stable across the sample types.
Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben
2017-09-15
Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by i ntegrating individual level ge notype data and s ummary s tatistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohns Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% ( ±0.4% ) to 69.4% ( ±0.1% ) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Use of Unlabeled Samples for Mitigating the Hughes Phenomenon
Landgrebe, David A.; Shahshahani, Behzad M.
1993-01-01
The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.
Performance evaluation soil samples utilizing encapsulation technology
Dahlgran, James R.
1999-01-01
Performance evaluation soil samples and method of their preparation using encapsulation technology to encapsulate analytes which are introduced into a soil matrix for analysis and evaluation by analytical laboratories. Target analytes are mixed in an appropriate solvent at predetermined concentrations. The mixture is emulsified in a solution of polymeric film forming material. The emulsified solution is polymerized to form microcapsules. The microcapsules are recovered, quantitated and introduced into a soil matrix in a predetermined ratio to form soil samples with the desired analyte concentration.
Sampling Efficiency and Performance of Selected Thoracic Aerosol Samplers.
Görner, Peter; Simon, Xavier; Boivin, Alexis; Bau, Sébastien
2017-08-01
Measurement of worker exposure to a thoracic health-related aerosol fraction is necessary in a number of occupational situations. This is the case of workplaces with atmospheres polluted by fibrous particles, such as cotton dust or asbestos, and by particles inducing irritation or bronchoconstriction such as acid mists or flour dust. Three personal and two static thoracic aerosol samplers were tested under laboratory conditions. Sampling efficiency with respect to particle aerodynamic diameter was measured in a horizontal low wind tunnel and in a vertical calm air chamber. Sampling performance was evaluated against conventional thoracic penetration. Three of the tested samplers performed well, when sampling the thoracic aerosol at nominal flow rate and two others performed well at optimized flow rate. The limit of flow rate optimization was found when using cyclone samplers. © The Author 2017. Published by Oxford University Press on behalf of the British Occupational Hygiene Society.
Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...
PRIS-STATISTICS: Power Reactor Information System Statistical Reports. User's Manual
International Nuclear Information System (INIS)
2013-01-01
The IAEA developed the Power Reactor Information System (PRIS)-Statistics application to assist PRIS end users with generating statistical reports from PRIS data. Statistical reports provide an overview of the status, specification and performance results of every nuclear power reactor in the world. This user's manual was prepared to facilitate the use of the PRIS-Statistics application and to provide guidelines and detailed information for each report in the application. Statistical reports support analyses of nuclear power development and strategies, and the evaluation of nuclear power plant performance. The PRIS database can be used for comprehensive trend analyses and benchmarking against best performers and industrial standards.
DEFF Research Database (Denmark)
Edjabou, Vincent Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona
2015-01-01
Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both...... comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub......-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10-50 waste fractions, organised according to a three-level (tiered approach) facilitating,comparison of the waste data between individual sub-areas with different fractionation (waste...
Energy Technology Data Exchange (ETDEWEB)
Price-Whelan, Adrian M.; Agüeros, Marcel A. [Department of Astronomy, Columbia University, 550 W 120th Street, New York, NY 10027 (United States); Fournier, Amanda P. [Department of Physics, Broida Hall, University of California, Santa Barbara, CA 93106 (United States); Street, Rachel [Las Cumbres Observatory Global Telescope Network, Inc., 6740 Cortona Drive, Suite 102, Santa Barbara, CA 93117 (United States); Ofek, Eran O. [Benoziyo Center for Astrophysics, Weizmann Institute of Science, 76100 Rehovot (Israel); Covey, Kevin R. [Lowell Observatory, 1400 West Mars Hill Road, Flagstaff, AZ 86001 (United States); Levitan, David; Sesar, Branimir [Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA 91125 (United States); Laher, Russ R.; Surace, Jason, E-mail: adrn@astro.columbia.edu [Spitzer Science Center, California Institute of Technology, Mail Stop 314-6, Pasadena, CA 91125 (United States)
2014-01-20
Many photometric time-domain surveys are driven by specific goals, such as searches for supernovae or transiting exoplanets, which set the cadence with which fields are re-imaged. In the case of the Palomar Transient Factory (PTF), several sub-surveys are conducted in parallel, leading to non-uniform sampling over its ∼20,000 deg{sup 2} footprint. While the median 7.26 deg{sup 2} PTF field has been imaged ∼40 times in the R band, ∼2300 deg{sup 2} have been observed >100 times. We use PTF data to study the trade off between searching for microlensing events in a survey whose footprint is much larger than that of typical microlensing searches, but with far-from-optimal time sampling. To examine the probability that microlensing events can be recovered in these data, we test statistics used on uniformly sampled data to identify variables and transients. We find that the von Neumann ratio performs best for identifying simulated microlensing events in our data. We develop a selection method using this statistic and apply it to data from fields with >10 R-band observations, 1.1 × 10{sup 9} light curves, uncovering three candidate microlensing events. We lack simultaneous, multi-color photometry to confirm these as microlensing events. However, their number is consistent with predictions for the event rate in the PTF footprint over the survey's three years of operations, as estimated from near-field microlensing models. This work can help constrain all-sky event rate predictions and tests microlensing signal recovery in large data sets, which will be useful to future time-domain surveys, such as that planned with the Large Synoptic Survey Telescope.
Visual Sample Plan Version 7.0 User's Guide
Energy Technology Data Exchange (ETDEWEB)
Matzke, Brett D. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Newburn, Lisa LN [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hathaway, John E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bramer, Lisa M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wilson, John E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Dowson, Scott T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Sego, Landon H. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Pulsipher, Brent A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
2014-03-01
User's guide for VSP 7.0 This user's guide describes Visual Sample Plan (VSP) Version 7.0 and provides instructions for using the software. VSP selects the appropriate number and location of environmental samples to ensure that the results of statistical tests performed to provide input to risk decisions have the required confidence and performance. VSP Version 7.0 provides sample-size equations or algorithms needed by specific statistical tests appropriate for specific environmental sampling objectives. It also provides data quality assessment and statistical analysis functions to support evaluation of the data and determine whether the data support decisions regarding sites suspected of contamination. The easy-to-use program is highly visual and graphic. VSP runs on personal computers with Microsoft Windows operating systems (XP, Vista, Windows 7, and Windows 8). Designed primarily for project managers and users without expertise in statistics, VSP is applicable to two- and three-dimensional populations to be sampled (e.g., rooms and buildings, surface soil, a defined layer of subsurface soil, water bodies, and other similar applications) for studies of environmental quality. VSP is also applicable for designing sampling plans for assessing chem/rad/bio threat and hazard identification within rooms and buildings, and for designing geophysical surveys for unexploded ordnance (UXO) identification.
A Statistical Primer: Understanding Descriptive and Inferential Statistics
Gillian Byrne
2007-01-01
As libraries and librarians move more towards evidence‐based decision making, the data being generated in libraries is growing. Understanding the basics of statistical analysis is crucial for evidence‐based practice (EBP), in order to correctly design and analyze researchas well as to evaluate the research of others. This article covers the fundamentals of descriptive and inferential statistics, from hypothesis construction to sampling to common statistical techniques including chi‐square, co...
International Nuclear Information System (INIS)
Baron, Jorge H.; Nunez Mac Leod, J.E.
2000-01-01
The present paper deals with the utilization of advanced sampling statistical methods to perform uncertainty and sensitivity analysis on numerical models. Such models may represent physical phenomena, logical structures (such as boolean expressions) or other systems, and various of their intrinsic parameters and/or input variables are usually treated as random variables simultaneously. In the present paper a simple method to scale-up Latin Hypercube Sampling (LHS) samples is presented, starting with a small sample and duplicating its size at each step, making it possible to use the already run numerical model results with the smaller sample. The method does not distort the statistical properties of the random variables and does not add any bias to the samples. The results is a significant reduction in numerical models running time can be achieved (by re-using the previously run samples), keeping all the advantages of LHS, until an acceptable representation level is achieved in the output variables. (author)
Mariano, A.J.; Ryan, E.H.; Huntley, H.S.; Laurindo, L.C.; Coelho, E.; Ozgokmen, TM; Berta, M.; Bogucki, D; Chen, S.S.; Curcic, M.; Drouin, K.L.; Gough, M; Haus, BK; Haza, A.C.; Hogan, P
2016-01-01
The Grand LAgrangian Deployment (GLAD) used multiscale sampling and GPS technology to observe time series of drifter positions with initial drifter separation of O(100 m) to O(10 km), and nominal 5 min sampling, during the summer and fall of 2012 in the northern Gulf of Mexico. Histograms of the velocity field and its statistical parameters are non-Gaussian; most are multimodal. The dominant periods for the surface velocity field are 1–2 days due to inertial oscillations, tides, and the sea b...
Energy Technology Data Exchange (ETDEWEB)
Curley, G. Michael [North American Electric Reliability Corporation (United States); Mandula, Jiri [International Atomic Energy Agency (IAEA)
2008-05-15
The WEC Committee on the Performance of Generating Plant (PGP) has been collecting and analysing power plant performance statistics worldwide for more than 30 years and has produced regular reports, which include examples of advanced techniques and methods for improving power plant performance through benchmarking. A series of reports from the various working groups was issued in 2008. This reference presents the results of Working Group 2 (WG2). WG2's main task is to facilitate the collection and input on an annual basis of power plant performance data (unit-by-unit and aggregated data) into the WEC PGP database. The statistics will be collected for steam, nuclear, gas turbine and combined cycle, hydro and pump storage plant. WG2 will also oversee the ongoing development of the availability statistics database, including the contents, the required software, security issues and other important information. The report is divided into two sections: Thermal generating, combined cycle/co-generation, combustion turbine, hydro and pumped storage unavailability factors and availability statistics; and nuclear power generating units.
Magnetic resonance imaging of the wrist: Diagnostic performance statistics
International Nuclear Information System (INIS)
Hobby, Jonathan L.; Tom, Brian D.M.; Bearcroft, Philip W.P.; Dixon, Adrian K.
2001-01-01
AIM: To review the published diagnostic performance statistics for magnetic resonance imaging (MRI) of the wrist for tears of the triangular fibrocartilage complex, the intrinsic carpal ligaments, and for osteonecrosis of the carpal bones. MATERIALS AND METHODS: We used Medline and Embase to search the English language literature. Studies evaluating the diagnostic performance of MRI of the wrist in living patients with surgical confirmation of MR findings were identified. RESULTS: We identified 11 studies reporting the diagnostic performance of MRI for tears of the triangular fibrocartilage complex for a total of 410 patients, six studies for the scapho-lunate ligament (159 patients), six studies for the luno-triquetral ligament (142 patients) and four studies (56 patients) for osteonecrosis of the carpal bones. CONCLUSIONS: Magnetic resonance imaging is an accurate means of diagnosing tears of the triangular fibrocartilage and carpal osteonecrosis. Although MRI is highly specific for tears of the intrinsic carpal ligaments, its sensitivity is low. The diagnostic performance of MRI in the wrist is improved by using high-resolution T2* weighted 3D gradient echo sequences. Using current imaging techniques without intra-articular contrast medium, magnetic resonance imaging cannot reliably exclude tears of the intrinsic carpal ligaments. Hobby, J.L. (2001)
International Nuclear Information System (INIS)
Matsuda, Hideharu; Minato, Susumu
2002-01-01
The accuracy of statistical quantity like the mean value and contour map obtained by measurement of the environmental gamma-ray dose rate was evaluated by random sampling of 5 different model distribution maps made by the mean slope, -1.3, of power spectra calculated from the actually measured values. The values were derived from 58 natural gamma dose rate data reported worldwide ranging in the means of 10-100 Gy/h rates and 10 -3 -10 7 km 2 areas. The accuracy of the mean value was found around ±7% even for 60 or 80 samplings (the most frequent number) and the standard deviation had the accuracy less than 1/4-1/3 of the means. The correlation coefficient of the frequency distribution was found 0.860 or more for 200-400 samplings (the most frequent number) but of the contour map, 0.502-0.770. (K.H.)
Evaluation of Approaches to Analyzing Continuous Correlated Eye Data When Sample Size Is Small.
Huang, Jing; Huang, Jiayan; Chen, Yong; Ying, Gui-Shuang
2018-02-01
To evaluate the performance of commonly used statistical methods for analyzing continuous correlated eye data when sample size is small. We simulated correlated continuous data from two designs: (1) two eyes of a subject in two comparison groups; (2) two eyes of a subject in the same comparison group, under various sample size (5-50), inter-eye correlation (0-0.75) and effect size (0-0.8). Simulated data were analyzed using paired t-test, two sample t-test, Wald test and score test using the generalized estimating equations (GEE) and F-test using linear mixed effects model (LMM). We compared type I error rates and statistical powers, and demonstrated analysis approaches through analyzing two real datasets. In design 1, paired t-test and LMM perform better than GEE, with nominal type 1 error rate and higher statistical power. In design 2, no test performs uniformly well: two sample t-test (average of two eyes or a random eye) achieves better control of type I error but yields lower statistical power. In both designs, the GEE Wald test inflates type I error rate and GEE score test has lower power. When sample size is small, some commonly used statistical methods do not perform well. Paired t-test and LMM perform best when two eyes of a subject are in two different comparison groups, and t-test using the average of two eyes performs best when the two eyes are in the same comparison group. When selecting the appropriate analysis approach the study design should be considered.
Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.
Breunig, Nancy A.
Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…
Performance of an active well coincidence counter for HEU samples
International Nuclear Information System (INIS)
Ferrari, Francesca; Peerani, Paolo
2010-01-01
Neutron coincidence counting is the reference NDA technique used in nuclear safeguards to measure the mass of nuclear material in samples. For high-enriched uranium (HEU) samples active neutron interrogation is generally performed and the most common device used by nuclear inspectors is the Active Well Coincidence Counter (AWCC). Within her master thesis at the Polytechnic of Milan, the first author performed an intensive study on the characteristics and performances of the AWCC in order to assess the 235 U mass in HEU oxide samples at the PERLA laboratory of JRC. The work has been summarised in this paper that starts with the optimisation of the use of AWCC for nuclear safeguards, describing the calibration procedure, reporting results of a series of verification measurements, summarising the performances that can be obtained with this instruments during inspections at fuel production plants and concluding with the discussion of uncertainties related to these measurements.
Understanding Computational Bayesian Statistics
Bolstad, William M
2011-01-01
A hands-on introduction to computational statistics from a Bayesian point of view Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic
Engineer’s estimate reliability and statistical characteristics of bids
Directory of Open Access Journals (Sweden)
Fariborz M. Tehrani
2016-12-01
Full Text Available The objective of this report is to provide a methodology for examining bids and evaluating the performance of engineer’s estimates in capturing the true cost of projects. This study reviews the cost development for transportation projects in addition to two sources of uncertainties in a cost estimate, including modeling errors and inherent variability. Sample projects are highway maintenance projects with a similar scope of the work, size, and schedule. Statistical analysis of engineering estimates and bids examines the adaptability of statistical models for sample projects. Further, the variation of engineering cost estimates from inception to implementation has been presented and discussed for selected projects. Moreover, the applicability of extreme values theory is assessed for available data. The results indicate that the performance of engineer’s estimate is best evaluated based on trimmed average of bids, excluding discordant bids.
Directory of Open Access Journals (Sweden)
Rui Xu
2013-01-01
Full Text Available Minimum description length (MDL based group-wise registration was a state-of-the-art method to determine the corresponding points of 3D shapes for the construction of statistical shape models (SSMs. However, it suffered from the problem that determined corresponding points did not uniformly spread on original shapes, since corresponding points were obtained by uniformly sampling the aligned shape on the parameterized space of unit sphere. We proposed a particle-system based method to obtain adaptive sampling positions on the unit sphere to resolve this problem. Here, a set of particles was placed on the unit sphere to construct a particle system whose energy was related to the distortions of parameterized meshes. By minimizing this energy, each particle was moved on the unit sphere. When the system became steady, particles were treated as vertices to build a spherical mesh, which was then relaxed to slightly adjust vertices to obtain optimal sampling-positions. We used 47 cases of (left and right lungs and 50 cases of livers, (left and right kidneys, and spleens for evaluations. Experiments showed that the proposed method was able to resolve the problem of the original MDL method, and the proposed method performed better in the generalization and specificity tests.
International Nuclear Information System (INIS)
Reiser, I; Lu, Z
2014-01-01
Purpose: Recently, task-based assessment of diagnostic CT systems has attracted much attention. Detection task performance can be estimated using human observers, or mathematical observer models. While most models are well established, considerable bias can be introduced when performance is estimated from a limited number of image samples. Thus, the purpose of this work was to assess the effect of sample size on bias and uncertainty of two channelized Hotelling observers and a template-matching observer. Methods: The image data used for this study consisted of 100 signal-present and 100 signal-absent regions-of-interest, which were extracted from CT slices. The experimental conditions included two signal sizes and five different x-ray beam current settings (mAs). Human observer performance for these images was determined in 2-alternative forced choice experiments. These data were provided by the Mayo clinic in Rochester, MN. Detection performance was estimated from three observer models, including channelized Hotelling observers (CHO) with Gabor or Laguerre-Gauss (LG) channels, and a template-matching observer (TM). Different sample sizes were generated by randomly selecting a subset of image pairs, (N=20,40,60,80). Observer performance was quantified as proportion of correct responses (PC). Bias was quantified as the relative difference of PC for 20 and 80 image pairs. Results: For n=100, all observer models predicted human performance across mAs and signal sizes. Bias was 23% for CHO (Gabor), 7% for CHO (LG), and 3% for TM. The relative standard deviation, σ(PC)/PC at N=20 was highest for the TM observer (11%) and lowest for the CHO (Gabor) observer (5%). Conclusion: In order to make image quality assessment feasible in the clinical practice, a statistically efficient observer model, that can predict performance from few samples, is needed. Our results identified two observer models that may be suited for this task
Statistical performance evaluation of ECG transmission using wireless networks.
Shakhatreh, Walid; Gharaibeh, Khaled; Al-Zaben, Awad
2013-07-01
This paper presents simulation of the transmission of biomedical signals (using ECG signal as an example) over wireless networks. Investigation of the effect of channel impairments including SNR, pathloss exponent, path delay and network impairments such as packet loss probability; on the diagnosability of the received ECG signal are presented. The ECG signal is transmitted through a wireless network system composed of two communication protocols; an 802.15.4- ZigBee protocol and an 802.11b protocol. The performance of the transmission is evaluated using higher order statistics parameters such as kurtosis and Negative Entropy in addition to the common techniques such as the PRD, RMS and Cross Correlation.
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
[A comparison of convenience sampling and purposive sampling].
Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien
2014-06-01
Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation not by statistical power analysis.
DEFF Research Database (Denmark)
Njage, Patrick Murigu Kamau; Sawe, Chemutai Tonui; Onyango, Cecilia Moraa
2017-01-01
assessment scheme and statistical modeling were used to systematically assess the microbial performance of core control and assurance activities in five Kenyan fresh produce processing and export companies. Generalized linear mixed models and correlated random-effects joint models for multivariate clustered...... the maximum safety level for environmental samples. Escherichia coli was detected in five of the six CSLs, including the final product. Among the processing-environment samples, the hand or glove swabs of personnel revealed a higher level of predicted contamination with E. coli, and 80% of the factories were...... of contamination with coliforms in water at the inlet than in the final rinse water. Four (80%) of the five assessed processors had poor to unacceptable counts of Enterobacteriaceae on processing surfaces. Personnel-, equipment-, and product-related hygiene measures to improve the performance of preventive...
Global health business: the production and performativity of statistics in Sierra Leone and Germany.
Erikson, Susan L
2012-01-01
The global push for health statistics and electronic digital health information systems is about more than tracking health incidence and prevalence. It is also experienced on the ground as means to develop and maintain particular norms of health business, knowledge, and decision- and profit-making that are not innocent. Statistics make possible audit and accountability logics that undergird the management of health at a distance and that are increasingly necessary to the business of health. Health statistics are inextricable from their social milieus, yet as business artifacts they operate as if they are freely formed, objectively originated, and accurate. This article explicates health statistics as cultural forms and shows how they have been produced and performed in two very different countries: Sierra Leone and Germany. In both familiar and surprising ways, this article shows how statistics and their pursuit organize and discipline human behavior, constitute subject positions, and reify existing relations of power.
Classical boson sampling algorithms with superior performance to near-term experiments
Neville, Alex; Sparrow, Chris; Clifford, Raphaël; Johnston, Eric; Birchall, Patrick M.; Montanaro, Ashley; Laing, Anthony
2017-12-01
It is predicted that quantum computers will dramatically outperform their conventional counterparts. However, large-scale universal quantum computers are yet to be built. Boson sampling is a rudimentary quantum algorithm tailored to the platform of linear optics, which has sparked interest as a rapid way to demonstrate such quantum supremacy. Photon statistics are governed by intractable matrix functions, which suggests that sampling from the distribution obtained by injecting photons into a linear optical network could be solved more quickly by a photonic experiment than by a classical computer. The apparently low resource requirements for large boson sampling experiments have raised expectations of a near-term demonstration of quantum supremacy by boson sampling. Here we present classical boson sampling algorithms and theoretical analyses of prospects for scaling boson sampling experiments, showing that near-term quantum supremacy via boson sampling is unlikely. Our classical algorithm, based on Metropolised independence sampling, allowed the boson sampling problem to be solved for 30 photons with standard computing hardware. Compared to current experiments, a demonstration of quantum supremacy over a successful implementation of these classical methods on a supercomputer would require the number of photons and experimental components to increase by orders of magnitude, while tackling exponentially scaling photon loss.
A statistical approach to nuclear fuel design and performance
Cunning, Travis Andrew
As CANDU fuel failures can have significant economic and operational consequences on the Canadian nuclear power industry, it is essential that factors impacting fuel performance are adequately understood. Current industrial practice relies on deterministic safety analysis and the highly conservative "limit of operating envelope" approach, where all parameters are assumed to be at their limits simultaneously. This results in a conservative prediction of event consequences with little consideration given to the high quality and precision of current manufacturing processes. This study employs a novel approach to the prediction of CANDU fuel reliability. Probability distributions are fitted to actual fuel manufacturing datasets provided by Cameco Fuel Manufacturing, Inc. They are used to form input for two industry-standard fuel performance codes: ELESTRES for the steady-state case and ELOCA for the transient case---a hypothesized 80% reactor outlet header break loss of coolant accident. Using a Monte Carlo technique for input generation, 105 independent trials are conducted and probability distributions are fitted to key model output quantities. Comparing model output against recognized industrial acceptance criteria, no fuel failures are predicted for either case. Output distributions are well removed from failure limit values, implying that margin exists in current fuel manufacturing and design. To validate the results and attempt to reduce the simulation burden of the methodology, two dimensional reduction methods are assessed. Using just 36 trials, both methods are able to produce output distributions that agree strongly with those obtained via the brute-force Monte Carlo method, often to a relative discrepancy of less than 0.3% when predicting the first statistical moment, and a relative discrepancy of less than 5% when predicting the second statistical moment. In terms of global sensitivity, pellet density proves to have the greatest impact on fuel performance
Directory of Open Access Journals (Sweden)
Fiser Ondrej
2011-01-01
Full Text Available Long-term monthly and annual statistics of the attenuation of electromagnetic waves that have been obtained from 6 years of measurements on a free space optical path, 853 meters long, with a wavelength of 850 nm and on a precisely parallel radio path with a frequency of 58 GHz are presented. All the attenuation events observed are systematically classified according to the hydrometeor type causing the particular event. Monthly and yearly propagation statistics on the free space optical path and radio path are obtained. The influence of individual hydrometeors on attenuation is analysed. The obtained propagation statistics are compared to the calculated statistics using ITU-R models. The calculated attenuation statistics both at 850 nm and 58 GHz underestimate the measured statistics for higher attenuation levels. The availability performance of a simulated hybrid FSO/RF system is analysed based on the measured data.
Kooijman, Pieter C; Kok, Sander J; Weusten, Jos J A M; Honing, Maarten
2016-05-05
Preparation of samples according to an optimized method is crucial for accurate determination of polymer sample characteristics by Matrix-Assisted Laser Desorption Ionization (MALDI) analysis. Sample preparation conditions such as matrix choice, cationization agent, deposition technique or even the deposition volume should be chosen to suit the sample of interest. Many sample preparation protocols have been developed and employed, yet finding the optimal sample preparation protocol remains a challenge. Because an objective comparison between the results of diverse protocols is not possible, "gut-feeling" or "good enough" is often decisive in the search for an optimum. This implies that sub-optimal protocols are used, leading to a loss of mass spectral information quality. To address this problem a novel analytical strategy based on MALDI imaging and statistical data processing was developed in which eight parameters were formulated to objectively quantify the quality of sample deposition and optimal MALDI matrix composition and finally sum up to an overall quality score of the sample deposition. These parameters can be established in a fully automated way using commercially available mass spectrometry imaging instruments without any hardware adjustments. With the newly developed analytical strategy the highest quality MALDI spots were selected, resulting in more reproducible and more valuable spectra for PEG in a variety of matrices. Moreover, our method enables an objective comparison of sample preparation protocols for any analyte and opens up new fields of investigation by presenting MALDI performance data in a clear and concise way. Copyright © 2016 Elsevier B.V. All rights reserved.
Sampling of temporal networks: Methods and biases
Rocha, Luis E. C.; Masuda, Naoki; Holme, Petter
2017-11-01
Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example, human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect the network structure and thus caution is necessary to generalize results based on samples. In this paper, we study four sampling strategies applied to a variety of real-life temporal networks. We quantify the biases generated by each sampling strategy on a number of relevant statistics such as link activity, temporal paths and epidemic spread. We find that some biases are common in a variety of networks and statistics, but one strategy, uniform sampling of nodes, shows improved performance in most scenarios. Given the particularities of temporal network data and the variety of network structures, we recommend that the choice of sampling methods be problem oriented to minimize the potential biases for the specific research questions on hand. Our results help researchers to better design network data collection protocols and to understand the limitations of sampled temporal network data.
International Nuclear Information System (INIS)
Tauhata, L.; Vianna, M.E.; Oliveira, A.E. de; Clain, A.F.; Ferreira, A.C.M.; Bernardes, E.M.
2000-01-01
The accuracy and precision of results of the radionuclide analyses in environmental samples are widely claimed internationally due to its consequences in the decision process coupled to evaluation of environmental pollution, impact, internal and external population exposure. These characteristics of measurement of the laboratories can be shown clearly using intercomparison data, due to the existence of a reference value and the need of three determinations for each analysis. In intercomparison studies accuracy in radionuclide assays in low-level environmental samples has usually been the main focus in performance evaluation and it can be estimated by taking into account the deviation between the experimental laboratory mean value and the reference value. The laboratory repeatability of measurements or their standard deviation is seldom included in performance evaluation. In order to show the influence of the uncertainties in performance evaluation of the laboratories, data of 22 intercomparison runs which distributed 790 spiked environmental samples to 20 Brazilian participant laboratories were compared, using the 'Normalised Standard Deviation' as statistical criteria for performance evaluation of U.S.EPA. It mainly takes into account the laboratory accuracy and the performance evaluation using the same data classified by normalised standard deviation modified by a weight reactor that includes the individual laboratory uncertainty. The results show a relative decrease in laboratory performance in each radionuclide assay: 1.8% for 65 Zn, 2.8% for 40 K, 3.4 for 60 Co, 3.7% for 134 Cs, 4.0% for 137 Cs, 4.4% for Th and U nat , 4.5% for 3 H, 6.3% for 133 Ba, 8.6% for 90 Sr, 10.6% for Gross Alpha, 10.9% for 106 Ru, 11.1% for 226 Ra, 11.5% for Gross Beta and 13.6% for 228 Ra. The changes in the parameters of the statistical distribution function were negligible and the distribution remained as Gaussian type for all radionuclides analysed. Data analyses in terms of
Computational statistics handbook with Matlab
Martinez, Wendy L
2007-01-01
Prefaces Introduction What Is Computational Statistics? An Overview of the Book Probability Concepts Introduction Probability Conditional Probability and Independence Expectation Common Distributions Sampling Concepts Introduction Sampling Terminology and Concepts Sampling Distributions Parameter Estimation Empirical Distribution Function Generating Random Variables Introduction General Techniques for Generating Random Variables Generating Continuous Random Variables Generating Discrete Random Variables Exploratory Data Analysis Introduction Exploring Univariate Data Exploring Bivariate and Trivariate Data Exploring Multidimensional Data Finding Structure Introduction Projecting Data Principal Component Analysis Projection Pursuit EDA Independent Component Analysis Grand Tour Nonlinear Dimensionality Reduction Monte Carlo Methods for Inferential Statistics Introduction Classical Inferential Statistics Monte Carlo Methods for Inferential Statist...
Directory of Open Access Journals (Sweden)
Jędrzej S. Bojanowski
2014-12-01
Full Text Available Cloud property data sets derived from passive sensors onboard the polar orbiting satellites (such as the NOAA’s Advanced Very High Resolution Radiometer have global coverage and now span a climatological time period. Synoptic surface observations (SYNOP are often used to characterize the accuracy of satellite-based cloud cover. Infrequent overpasses of polar orbiting satellites combined with the 3- or 6-h SYNOP frequency lead to collocation time differences of up to 3 h. The associated collocation error degrades the cloud cover performance statistics such as the Hanssen-Kuiper’s discriminant (HK by up to 45%. Limiting the time difference to 10 min, on the other hand, introduces a sampling error due to a lower number of corresponding satellite and SYNOP observations. This error depends on both the length of the validated time series and the SYNOP frequency. The trade-off between collocation and sampling error call for an optimum collocation time difference. It however depends on cloud cover characteristics and SYNOP frequency, and cannot be generalized. Instead, a method is presented to reconstruct the unbiased (true HK from HK affected by the collocation differences, which significantly (t-test p < 0.01 improves the validation results.
ACS sampling system: design, implementation, and performance evaluation
Di Marcantonio, Paolo; Cirami, Roberto; Chiozzi, Gianluca
2004-09-01
By means of ACS (ALMA Common Software) framework we designed and implemented a sampling system which allows sampling of every Characteristic Component Property with a specific, user-defined, sustained frequency limited only by the hardware. Collected data are sent to various clients (one or more Java plotting widgets, a dedicated GUI or a COTS application) using the ACS/CORBA Notification Channel. The data transport is optimized: samples are cached locally and sent in packets with a lower and user-defined frequency to keep network load under control. Simultaneous sampling of the Properties of different Components is also possible. Together with the design and implementation issues we present the performance of the sampling system evaluated on two different platforms: on a VME based system using VxWorks RTOS (currently adopted by ALMA) and on a PC/104+ embedded platform using Red Hat 9 Linux operating system. The PC/104+ solution offers, as an alternative, a low cost PC compatible hardware environment with free and open operating system.
Sampling Approaches for Multi-Domain Internet Performance Measurement Infrastructures
Energy Technology Data Exchange (ETDEWEB)
Calyam, Prasad
2014-09-15
The next-generation of high-performance networks being developed in DOE communities are critical for supporting current and emerging data-intensive science applications. The goal of this project is to investigate multi-domain network status sampling techniques and tools to measure/analyze performance, and thereby provide “network awareness” to end-users and network operators in DOE communities. We leverage the infrastructure and datasets available through perfSONAR, which is a multi-domain measurement framework that has been widely deployed in high-performance computing and networking communities; the DOE community is a core developer and the largest adopter of perfSONAR. Our investigations include development of semantic scheduling algorithms, measurement federation policies, and tools to sample multi-domain and multi-layer network status within perfSONAR deployments. We validate our algorithms and policies with end-to-end measurement analysis tools for various monitoring objectives such as network weather forecasting, anomaly detection, and fault-diagnosis. In addition, we develop a multi-domain architecture for an enterprise-specific perfSONAR deployment that can implement monitoring-objective based sampling and that adheres to any domain-specific measurement policies.
Statistical techniques for sampling and monitoring natural resources
Hans T. Schreuder; Richard Ernst; Hugo Ramirez-Maldonado
2004-01-01
We present the statistical theory of inventory and monitoring from a probabilistic point of view. We start with the basics and show the interrelationships between designs and estimators illustrating the methods with a small artificial population as well as with a mapped realistic population. For such applications, useful open source software is given in Appendix 4....
Analysis of statistical misconception in terms of statistical reasoning
Maryati, I.; Priatna, N.
2018-05-01
Reasoning skill is needed for everyone to face globalization era, because every person have to be able to manage and use information from all over the world which can be obtained easily. Statistical reasoning skill is the ability to collect, group, process, interpret, and draw conclusion of information. Developing this skill can be done through various levels of education. However, the skill is low because many people assume that statistics is just the ability to count and using formulas and so do students. Students still have negative attitude toward course which is related to research. The purpose of this research is analyzing students’ misconception in descriptive statistic course toward the statistical reasoning skill. The observation was done by analyzing the misconception test result and statistical reasoning skill test; observing the students’ misconception effect toward statistical reasoning skill. The sample of this research was 32 students of math education department who had taken descriptive statistic course. The mean value of misconception test was 49,7 and standard deviation was 10,6 whereas the mean value of statistical reasoning skill test was 51,8 and standard deviation was 8,5. If the minimal value is 65 to state the standard achievement of a course competence, students’ mean value is lower than the standard competence. The result of students’ misconception study emphasized on which sub discussion that should be considered. Based on the assessment result, it was found that students’ misconception happen on this: 1) writing mathematical sentence and symbol well, 2) understanding basic definitions, 3) determining concept that will be used in solving problem. In statistical reasoning skill, the assessment was done to measure reasoning from: 1) data, 2) representation, 3) statistic format, 4) probability, 5) sample, and 6) association.
Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A
2012-01-01
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
International Nuclear Information System (INIS)
Inada, Naohisa; Oguri, Masamune; Shin, Min-Su; Kayo, Issha; Fukugita, Masataka; Strauss, Michael A.; Gott, J. Richard; Hennawi, Joseph F.; Morokuma, Tomoki; Becker, Robert H.; Gregg, Michael D.; White, Richard L.; Kochanek, Christopher S.; Chiu, Kuenley; Johnston, David E.; Clocchiatti, Alejandro; Richards, Gordon T.; Schneider, Donald P.; Frieman, Joshua A.
2010-01-01
We present the second report of our systematic search for strongly lensed quasars from the data of the Sloan Digital Sky Survey (SDSS). From extensive follow-up observations of 136 candidate objects, we find 36 lenses in the full sample of 77,429 spectroscopically confirmed quasars in the SDSS Data Release 5. We then define a complete sample of 19 lenses, including 11 from our previous search in the SDSS Data Release 3, from the sample of 36,287 quasars with i Λ = 0.84 +0.06 -0.08 (stat.) +0.09 -0.07 (syst.) assuming a flat universe, which is in good agreement with other cosmological observations. We also report the discoveries of seven binary quasars with separations ranging from 1.''1 to 16.''6, which are identified in the course of our lens survey. This study concludes the construction of our statistical lens sample in the full SDSS-I data set.
Hydrazine Determination in Sludge Samples by High Performance Liquid Chromatography
Energy Technology Data Exchange (ETDEWEB)
G. Elias; G. A. Park
2006-02-01
A high-performance liquid chromatographic method using ultraviolet (UV) detection was developed to detect and quantify hydrazine in a variety of environmental matrices. The method was developed primarily for sludge samples, but it is also applicable to soil and water samples. The hydrazine in the matrices was derivatized to their hydrazones with benzaldehyde. The derivatized hydrazones were separated using high performance liquid chromatography (HPLC) with a reversed-phase C-18 column in an isocratic mode with methanol-water (95:5, v/v), and detected with UV detection at 313 nm. The detection limit (25 ml) for the new analytical method is 0.0067 mg ml-1of hydrazine. Hydrazine showed low recovery in soil samples because components in soil oxidized hydrazine. Sludge samples that contained relatively high soil content also showed lower recovery. The technique is relatively simple and cost-effective, and is applicable for hydrazine analysis in different environmental matrices.
International Nuclear Information System (INIS)
Juang, K.-W.; Lee, D.-Y.; Teng, Y.-L.
2005-01-01
Correctly classifying 'contaminated' areas in soils, based on the threshold for a contaminated site, is important for determining effective clean-up actions. Pollutant mapping by means of kriging is increasingly being used for the delineation of contaminated soils. However, those areas where the kriged pollutant concentrations are close to the threshold have a high possibility for being misclassified. In order to reduce the misclassification due to the over- or under-estimation from kriging, an adaptive sampling using the cumulative distribution function of order statistics (CDFOS) was developed to draw additional samples for delineating contaminated soils, while kriging. A heavy-metal contaminated site in Hsinchu, Taiwan was used to illustrate this approach. The results showed that compared with random sampling, adaptive sampling using CDFOS reduced the kriging estimation errors and misclassification rates, and thus would appear to be a better choice than random sampling, as additional sampling is required for delineating the 'contaminated' areas. - A sampling approach was derived for drawing additional samples while kriging
Statistical and Machine Learning forecasting methods: Concerns and ways forward
Makridakis, Spyros; Assimakopoulos, Vassilios
2018-01-01
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784
Statistical and Machine Learning forecasting methods: Concerns and ways forward.
Makridakis, Spyros; Spiliotis, Evangelos; Assimakopoulos, Vassilios
2018-01-01
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
ten Veldhuis, Marie-Claire; Schleiss, Marc
2017-04-01
Urban catchments are typically characterised by a more flashy nature of the hydrological response compared to natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km2 in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
Stefanski, Philip L.
2015-01-01
Commercially available software packages today allow users to quickly perform the routine evaluations of (1) descriptive statistics to numerically and graphically summarize both sample and population data, (2) inferential statistics that draws conclusions about a given population from samples taken of it, (3) probability determinations that can be used to generate estimates of reliability allowables, and finally (4) the setup of designed experiments and analysis of their data to identify significant material and process characteristics for application in both product manufacturing and performance enhancement. This paper presents examples of analysis and experimental design work that has been conducted using Statgraphics®(Registered Trademark) statistical software to obtain useful information with regard to solid rocket motor propellants and internal insulation material. Data were obtained from a number of programs (Shuttle, Constellation, and Space Launch System) and sources that include solid propellant burn rate strands, tensile specimens, sub-scale test motors, full-scale operational motors, rubber insulation specimens, and sub-scale rubber insulation analog samples. Besides facilitating the experimental design process to yield meaningful results, statistical software has demonstrated its ability to quickly perform complex data analyses and yield significant findings that might otherwise have gone unnoticed. One caveat to these successes is that useful results not only derive from the inherent power of the software package, but also from the skill and understanding of the data analyst.
Statistical learning in high energy and astrophysics
International Nuclear Information System (INIS)
Zimmermann, J.
2005-01-01
This thesis studies the performance of statistical learning methods in high energy and astrophysics where they have become a standard tool in physics analysis. They are used to perform complex classification or regression by intelligent pattern recognition. This kind of artificial intelligence is achieved by the principle ''learning from examples'': The examples describe the relationship between detector events and their classification. The application of statistical learning methods is either motivated by the lack of knowledge about this relationship or by tight time restrictions. In the first case learning from examples is the only possibility since no theory is available which would allow to build an algorithm in the classical way. In the second case a classical algorithm exists but is too slow to cope with the time restrictions. It is therefore replaced by a pattern recognition machine which implements a fast statistical learning method. But even in applications where some kind of classical algorithm had done a good job, statistical learning methods convinced by their remarkable performance. This thesis gives an introduction to statistical learning methods and how they are applied correctly in physics analysis. Their flexibility and high performance will be discussed by showing intriguing results from high energy and astrophysics. These include the development of highly efficient triggers, powerful purification of event samples and exact reconstruction of hidden event parameters. The presented studies also show typical problems in the application of statistical learning methods. They should be only second choice in all cases where an algorithm based on prior knowledge exists. Some examples in physics analyses are found where these methods are not used in the right way leading either to wrong predictions or bad performance. Physicists also often hesitate to profit from these methods because they fear that statistical learning methods cannot be controlled in a
Statistical learning in high energy and astrophysics
Energy Technology Data Exchange (ETDEWEB)
Zimmermann, J.
2005-06-16
This thesis studies the performance of statistical learning methods in high energy and astrophysics where they have become a standard tool in physics analysis. They are used to perform complex classification or regression by intelligent pattern recognition. This kind of artificial intelligence is achieved by the principle ''learning from examples'': The examples describe the relationship between detector events and their classification. The application of statistical learning methods is either motivated by the lack of knowledge about this relationship or by tight time restrictions. In the first case learning from examples is the only possibility since no theory is available which would allow to build an algorithm in the classical way. In the second case a classical algorithm exists but is too slow to cope with the time restrictions. It is therefore replaced by a pattern recognition machine which implements a fast statistical learning method. But even in applications where some kind of classical algorithm had done a good job, statistical learning methods convinced by their remarkable performance. This thesis gives an introduction to statistical learning methods and how they are applied correctly in physics analysis. Their flexibility and high performance will be discussed by showing intriguing results from high energy and astrophysics. These include the development of highly efficient triggers, powerful purification of event samples and exact reconstruction of hidden event parameters. The presented studies also show typical problems in the application of statistical learning methods. They should be only second choice in all cases where an algorithm based on prior knowledge exists. Some examples in physics analyses are found where these methods are not used in the right way leading either to wrong predictions or bad performance. Physicists also often hesitate to profit from these methods because they fear that statistical learning methods cannot
Clinical Strategies for Sampling Word Recognition Performance.
Schlauch, Robert S; Carney, Edward
2018-04-17
Computer simulation was used to estimate the statistical properties of searches for maximum word recognition ability (PB max). These involve presenting multiple lists and discarding all scores but that of the 1 list that produced the highest score. The simulations, which model limitations inherent in the precision of word recognition scores, were done to inform clinical protocols. A secondary consideration was a derivation of 95% confidence intervals for significant changes in score from phonemic scoring of a 50-word list. The PB max simulations were conducted on a "client" with flat performance intensity functions. The client's performance was assumed to be 60% initially and 40% for a second assessment. Thousands of estimates were obtained to examine the precision of (a) single lists and (b) multiple lists using a PB max procedure. This method permitted summarizing the precision for assessing a 20% drop in performance. A single 25-word list could identify only 58.4% of the cases in which performance fell from 60% to 40%. A single 125-word list identified 99.8% of the declines correctly. Presenting 3 or 5 lists to find PB max produced an undesirable finding: an increase in the word recognition score. A 25-word list produces unacceptably low precision for making clinical decisions. This finding holds in both single and multiple 25-word lists, as in a search for PB max. A table is provided, giving estimates of 95% critical ranges for successive presentations of a 50-word list analyzed by the number of phonemes correctly identified.
Effects of Concept Mapping Strategy on Learning Performance in Business and Economics Statistics
Chiou, Chei-Chang
2009-01-01
A concept map (CM) is a hierarchically arranged, graphic representation of the relationships among concepts. Concept mapping (CMING) is the process of constructing a CM. This paper examines whether a CMING strategy can be useful in helping students to improve their learning performance in a business and economics statistics course. A single…
International Nuclear Information System (INIS)
Thomas, J.M.; Eberhardt, L.L.; Skalski, J.R.; Simmons, M.A.
1984-05-01
As part of a larger study funded by the US Nuclear Regulatory Commission we have been investigating field sampling strategies and compositing as a means of detecting spills or migration at commercial low-level radioactive waste disposal sites. The overall project is designed to produce information for developing guidance on implementing 10 CFR part 61. Compositing (pooling samples) for detection is discussed first, followed by our development of a statistical test to allow a decision as to whether any component of a composite exceeds a prescribed maximum acceptable level. The question of optimal field sampling designs and an Apple computer program designed to show the difficulties in constructing efficient field designs and using compositing schemes are considered. 6 references, 3 figures, 3 tables
Analysis of stress corrosion data by means of the statistic of extreme values
International Nuclear Information System (INIS)
Imarisio, G.; Lanza, F.
1978-01-01
The possibility of examining stress corosion by means of extreme statistic was proposed. A series of test in boiling MgCl 2 of samples made on AISI 304 have been performed. Evolution of cracks dimension and time of life of samples was followed. It has been shown that the dimensions of the maximum cracks on the sample corroded for different times can be organized following the extreme values statistic. Also the life time of sample can be treated in the same way. A confirmation has been obtained using data taken from literature. Possible uses of predictions obtained with this type of analysis have been underlined. An extension of the toward less corrosive media and samples of several volumes is suggested to check the validity of the method
Sampling the Mouse Hippocampal Dentate Gyrus
Directory of Open Access Journals (Sweden)
Lisa Basler
2017-12-01
Full Text Available Sampling is a critical step in procedures that generate quantitative morphological data in the neurosciences. Samples need to be representative to allow statistical evaluations, and samples need to deliver a precision that makes statistical evaluations not only possible but also meaningful. Sampling generated variability should, e.g., not be able to hide significant group differences from statistical detection if they are present. Estimators of the coefficient of error (CE have been developed to provide tentative answers to the question if sampling has been “good enough” to provide meaningful statistical outcomes. We tested the performance of the commonly used Gundersen-Jensen CE estimator, using the layers of the mouse hippocampal dentate gyrus as an example (molecular layer, granule cell layer and hilus. We found that this estimator provided useful estimates of the precision that can be expected from samples of different sizes. For all layers, we found that a smoothness factor (m of 0 generally provided better estimates than an m of 1. Only for the combined layers, i.e., the entire dentate gyrus, better CE estimates could be obtained using an m of 1. The orientation of the sections impacted on CE sizes. Frontal (coronal sections are typically most efficient by providing the smallest CEs for a given amount of work. Applying the estimator to 3D-reconstructed layers and using very intense sampling, we observed CE size plots with m = 0 to m = 1 transitions that should also be expected but are not often observed in real section series. The data we present also allows the reader to approximate the sampling intervals in frontal, horizontal or sagittal sections that provide CEs of specified sizes for the layers of the mouse dentate gyrus.
Energy Technology Data Exchange (ETDEWEB)
Einfeld, Wayne; Krauter, Paula A.; Boucher, Raymond M.; Tezak, Mathew; Amidan, Brett G. (Pacific Northwest National Laboratory, Richland, WA); Piepel, Greg F. (Pacific Northwest National Laboratory, Richland, WA)
2011-05-01
Recovery of spores from environmental surfaces is known to vary due to sampling methodology, techniques, spore size and characteristics, surface materials, and environmental conditions. A series of tests were performed to evaluate a new, validated sponge-wipe method. Specific factors evaluated were the effects of contaminant concentrations and surface materials on recovery efficiency (RE), false negative rate (FNR), limit of detection (LOD) - and the uncertainties of these quantities. Ceramic tile and stainless steel had the highest mean RE values (48.9 and 48.1%, respectively). Faux leather, vinyl tile, and painted wood had mean RE values of 30.3, 25.6, and 25.5, respectively, while plastic had the lowest mean RE (9.8%). Results show a roughly linear dependence of surface roughness on RE, where the smoothest surfaces have the highest mean RE values. REs were not influenced by the low spore concentrations tested (3 x 10{sup -3} to 1.86 CFU/cm{sup 2}). The FNR data were consistent with RE data, showing a trend of smoother surfaces resulting in higher REs and lower FNRs. Stainless steel generally had the lowest mean FNR (0.123) and plastic had the highest mean FNR (0.479). The LOD{sub 90} varied with surface material, from 0.015 CFU/cm{sup 2} on stainless steel up to 0.039 on plastic. Selecting sampling locations on the basis of surface roughness and using roughness to interpret spore recovery data can improve sampling. Further, FNR values, calculated as a function of concentration and surface material, can be used pre-sampling to calculate the numbers of samples for statistical sampling plans with desired performance, and post-sampling to calculate the confidence in characterization and clearance decisions.
Explorations in statistics: the log transformation.
Curran-Everett, Douglas
2018-06-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability-the standard deviation-varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.
Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu
2015-09-01
Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to Weibull distribution, we discovered Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Energy Technology Data Exchange (ETDEWEB)
Comnes, G.A.; Belden, T.N.; Kahn, E.P.
1995-02-01
The market for long-term bulk power is becoming increasingly competitive and mature. Given that many privately developed power projects have been or are being developed in the US, it is possible to begin to evaluate the performance of the market by analyzing its revealed prices. Using a consistent method, this paper presents levelized contract prices for a sample of privately developed US generation properties. The sample includes 26 projects with a total capacity of 6,354 MW. Contracts are described in terms of their choice of technology, choice of fuel, treatment of fuel price risk, geographic location, dispatchability, expected dispatch niche, and size. The contract price analysis shows that gas technologies clearly stand out as the most attractive. At an 80% capacity factor, coal projects have an average 20-year levelized price of $0.092/kWh, whereas natural gas combined cycle and/or cogeneration projects have an average price of $0.069/kWh. Within each technology type subsample, however, there is considerable variation. Prices for natural gas combustion turbines and one wind project are also presented. A preliminary statistical analysis is conducted to understand the relationship between price and four categories of explanatory factors including product heterogeneity, geographic heterogeneity, economic and technological change, and other buyer attributes (including avoided costs). Because of residual price variation, we are unable to accept the hypothesis that electricity is a homogeneous product. Instead, the analysis indicates that buyer value still plays an important role in the determination of price for competitively-acquired electricity.
International Nuclear Information System (INIS)
Enqvist, Andreas
2008-03-01
One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identifying and quantifying unknown material, to prevent accidental and clandestine transports and uses of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguard area. These are low order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived, and compared to Monte Carlo simulations. An experimental approach is also addressed in which cross correlation measurements were made using liquid scintillation detectors. First the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed and discussed whether they can be used in applications to distinguish possible sample
Energy Technology Data Exchange (ETDEWEB)
Enqvist, Andreas
2008-03-15
One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identifying and quantifying unknown material, to prevent accidental and clandestine transports and uses of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguard area. These are low order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived, and compared to Monte Carlo simulations. An experimental approach is also addressed in which cross correlation measurements were made using liquid scintillation detectors. First the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed and discussed whether they can be used in applications to distinguish possible
Statistics for Learning Genetics
Charles, Abigail Sheena
This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and more directly, students' perceptions of, and performance in, doing statistically-based genetics problems. This issue is at the emerging edge of modern college-level genetics instruction, and this study attempts to identify key theoretical components for creating a specialized biological statistics curriculum. The goal of this curriculum will be to prepare biology students with the skills for assimilating quantitatively-based genetic processes, increasingly at the forefront of modern genetics. To fulfill this, two college level classes at two universities were surveyed. One university was located in the northeastern US and the other in the West Indies. There was a sample size of 42 students and a supplementary interview was administered to a select 9 students. Interviews were also administered to professors in the field in order to gain insight into the teaching of statistics in genetics. Key findings indicated that students had very little to no background in statistics (55%). Although students did perform well on exams with 60% of the population receiving an A or B grade, 77% of them did not offer good explanations on a probability question associated with the normal distribution provided in the survey. The scope and presentation of the applicable statistics/mathematics in some of the most used textbooks in genetics teaching, as well as genetics syllabi used by instructors do not help the issue. It was found that the text books, often times, either did not give effective explanations for students, or completely left out certain topics. The omission of certain statistical/mathematical oriented topics was seen to be also true with the genetics syllabi reviewed for this study. Nonetheless
Touchton, Michael
2015-01-01
I administer a quasi-experiment using undergraduate political science majors in statistics classes to evaluate whether "flipping the classroom" (the treatment) alters students' applied problem-solving performance and satisfaction relative to students in a traditional classroom environment (the control). I also assess whether general…
Experimental performance evaluation of two stack sampling systems in a plutonium facility
International Nuclear Information System (INIS)
Glissmeyer, J.A.
1992-04-01
The evaluation of two routine stack sampling systems at the Z-Plant plutonium facility operated by Rockwell International for USERDA is part of a larger study, sponsored by Rockwell and conducted by Battelle, Pacific Northwest Laboratories, of gaseous effluent sampling systems. The gaseous effluent sampling systems evaluated are located at the main plant ventilation stack (291-Z-1) and at a vessel vent stack (296-Z-3). A preliminary report, which was a paper study issued in April 1976, identified many deficiencies in the existing sampling systems and made recommendations for corrective action. The objectives of this experimental evaluation of those sampling systems were as follows: Characterize the radioactive aerosols in the stack effluents; Develop a tracer aerosol technique for validating particulate effluent sampling system performance; Evaluate the performance of the existing routine sampling systems and their compliance with the sponsor's criteria; and Recommend corrective action where required. The tracer aerosol approach to sampler evaluation was chosen because the low concentrations of radioactive particulates in the effluents would otherwise require much longer sampling times and thus more time to complete this evaluation. The following report describes the sampling systems that are the subject of this study and then details the experiments performed. The results are then presented and discussed. Much of the raw and finished data are included in the appendices
Predicting energy performance of a net-zero energy building: A statistical approach
International Nuclear Information System (INIS)
Kneifel, Joshua; Webb, David
2016-01-01
Highlights: • A regression model is applied to actual energy data from a net-zero energy building. • The model is validated through a rigorous statistical analysis. • Comparisons are made between model predictions and those of a physics-based model. • The model is a viable baseline for evaluating future models from the energy data. - Abstract: Performance-based building requirements have become more prevalent because it gives freedom in building design while still maintaining or exceeding the energy performance required by prescriptive-based requirements. In order to determine if building designs reach target energy efficiency improvements, it is necessary to estimate the energy performance of a building using predictive models and different weather conditions. Physics-based whole building energy simulation modeling is the most common approach. However, these physics-based models include underlying assumptions and require significant amounts of information in order to specify the input parameter values. An alternative approach to test the performance of a building is to develop a statistically derived predictive regression model using post-occupancy data that can accurately predict energy consumption and production based on a few common weather-based factors, thus requiring less information than simulation models. A regression model based on measured data should be able to predict energy performance of a building for a given day as long as the weather conditions are similar to those during the data collection time frame. This article uses data from the National Institute of Standards and Technology (NIST) Net-Zero Energy Residential Test Facility (NZERTF) to develop and validate a regression model to predict the energy performance of the NZERTF using two weather variables aggregated to the daily level, applies the model to estimate the energy performance of hypothetical NZERTFs located in different cities in the Mixed-Humid Climate Zone, and compares these
Statistical Process Control in a Modern Production Environment
DEFF Research Database (Denmark)
Windfeldt, Gitte Bjørg
gathered here and standard statistical software. In Paper 2 a new method for process monitoring is introduced. The method uses a statistical model of the quality characteristic and a sliding window of observations to estimate the probability that the next item will not respect the specications......Paper 1 is aimed at practicians to help them test the assumption that the observations in a sample are independent and identically distributed. An assumption that is essential when using classical Shewhart charts. The test can easily be performed in the control chart setup using the samples....... If the estimated probability exceeds a pre-determined threshold the process will be stopped. The method is exible, allowing a complexity in modeling that remains invisible to the end user. Furthermore, the method allows to build diagnostic plots based on the parameters estimates that can provide valuable insight...
Naghshpour, Shahdad
2012-01-01
Statistics is the branch of mathematics that deals with real-life problems. As such, it is an essential tool for economists. Unfortunately, the way you and many other economists learn the concept of statistics is not compatible with the way economists think and learn. The problem is worsened by the use of mathematical jargon and complex derivations. Here's a book that proves none of this is necessary. All the examples and exercises in this book are constructed within the field of economics, thus eliminating the difficulty of learning statistics with examples from fields that have no relation to business, politics, or policy. Statistics is, in fact, not more difficult than economics. Anyone who can comprehend economics can understand and use statistics successfully within this field, including you! This book utilizes Microsoft Excel to obtain statistical results, as well as to perform additional necessary computations. Microsoft Excel is not the software of choice for performing sophisticated statistical analy...
Howard Stauffer; Nadav Nur
2005-01-01
The papers included in the Advances in Statistics section of the Partners in Flight (PIF) 2002 Proceedings represent a small sample of statistical topics of current importance to Partners In Flight research scientists: hierarchical modeling, estimation of detection probabilities, and Bayesian applications. Sauer et al. (this volume) examines a hierarchical model...
Geometric statistical inference
International Nuclear Information System (INIS)
Periwal, Vipul
1999-01-01
A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined
Frontiers in statistical quality control 11
Schmid, Wolfgang
2015-01-01
The main focus of this edited volume is on three major areas of statistical quality control: statistical process control (SPC), acceptance sampling and design of experiments. The majority of the papers deal with statistical process control, while acceptance sampling and design of experiments are also treated to a lesser extent. The book is organized into four thematic parts, with Part I addressing statistical process control. Part II is devoted to acceptance sampling. Part III covers the design of experiments, while Part IV discusses related fields. The twenty-three papers in this volume stem from The 11th International Workshop on Intelligent Statistical Quality Control, which was held in Sydney, Australia from August 20 to August 23, 2013. The event was hosted by Professor Ross Sparks, CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia and was jointly organized by Professors S. Knoth, W. Schmid and Ross Sparks. The papers presented here were carefully selected and reviewed by the scientifi...
Directory of Open Access Journals (Sweden)
Casey Olives
Full Text Available Originally a binary classifier, Lot Quality Assurance Sampling (LQAS has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%, and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa.We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa.Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87. In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50, the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error.This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
Comparison of small n statistical tests of differential expression applied to microarrays
Directory of Open Access Journals (Sweden)
Lee Anna Y
2009-02-01
Full Text Available Abstract Background DNA microarrays provide data for genome wide patterns of expression between observation classes. Microarray studies often have small samples sizes, however, due to cost constraints or specimen availability. This can lead to poor random error estimates and inaccurate statistical tests of differential expression. We compare the performance of the standard t-test, fold change, and four small n statistical test methods designed to circumvent these problems. We report results of various normalization methods for empirical microarray data and of various random error models for simulated data. Results Three Empirical Bayes methods (CyberT, BRB, and limma t-statistics were the most effective statistical tests across simulated and both 2-colour cDNA and Affymetrix experimental data. The CyberT regularized t-statistic in particular was able to maintain expected false positive rates with simulated data showing high variances at low gene intensities, although at the cost of low true positive rates. The Local Pooled Error (LPE test introduced a bias that lowered false positive rates below theoretically expected values and had lower power relative to the top performers. The standard two-sample t-test and fold change were also found to be sub-optimal for detecting differentially expressed genes. The generalized log transformation was shown to be beneficial in improving results with certain data sets, in particular high variance cDNA data. Conclusion Pre-processing of data influences performance and the proper combination of pre-processing and statistical testing is necessary for obtaining the best results. All three Empirical Bayes methods assessed in our study are good choices for statistical tests for small n microarray studies for both Affymetrix and cDNA data. Choice of method for a particular study will depend on software and normalization preferences.
Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy
2006-01-01
We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…
Zhang, Kai; Shardt, Yuri A W; Chen, Zhiwen; Peng, Kaixiang
2017-03-01
Using the expected detection delay (EDD) index to measure the performance of multivariate statistical process monitoring (MSPM) methods for constant additive faults have been recently developed. This paper, based on a statistical investigation of the T 2 - and Q-test statistics, extends the EDD index to the multiplicative and drift fault cases. As well, it is used to assess the performance of common MSPM methods that adopt these two test statistics. Based on how to use the measurement space, these methods can be divided into two groups, those which consider the complete measurement space, for example, principal component analysis-based methods, and those which only consider some subspace that reflects changes in key performance indicators, such as partial least squares-based methods. Furthermore, a generic form for them to use T 2 - and Q-test statistics are given. With the extended EDD index, the performance of these methods to detect drift and multiplicative faults is assessed using both numerical simulations and the Tennessee Eastman process. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
Using the modified sample entropy to detect determinism
Energy Technology Data Exchange (ETDEWEB)
Xie Hongbo, E-mail: xiehb@sjtu.or [Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hung Hom, Kowloon (Hong Kong); Department of Biomedical Engineering, Jiangsu University, Zhenjiang (China); Guo Jingyi [Department of Health Technology and Informatics, Hong Kong Polytechnic University, Hung Hom, Kowloon (Hong Kong); Zheng Yongping, E-mail: ypzheng@ieee.or [Department of Health Technology and Informatics, Hong Kong Polytechnic University, Hung Hom, Kowloon (Hong Kong); Reseach Institute of Innovative Products and Technologies, Hong Kong Polytechnic University (Hong Kong)
2010-08-23
A modified sample entropy (mSampEn), based on the nonlinear continuous and convex function, has been proposed and proven to be superior to the standard sample entropy (SampEn) in several aspects. In this Letter, we empirically investigate the ability of the mSampEn statistic combined with surrogate data method to detect determinism. The effects of the datasets length and noise on the proposed method to differentiate between deterministic and stochastic dynamics are tested on several benchmark time series. The noise performance of the mSampEn statistic is also compared with the singular value decomposition (SVD) and symplectic geometry spectrum (SGS) based methods. The results indicate that the mSampEn statistic is a robust index for detecting determinism in short and noisy time series.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
Mathematical background and attitudes toward statistics in a sample of Spanish college students.
Carmona, José; Martínez, Rafael J; Sánchez, Manuel
2005-08-01
To examine the relation of mathematical background and initial attitudes toward statistics of Spanish college students in social sciences the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. Analysis suggested grades in previous courses are more related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related with students' affective responses to statistics but not with their valuing of statistics. Implications of possible research are discussed.
Performance Comparison of Reconstruction Algorithms in Discrete Blind Multi-Coset Sampling
DEFF Research Database (Denmark)
Grigoryan, Ruben; Arildsen, Thomas; Tandur, Deepaknath
2012-01-01
This paper investigates the performance of different reconstruction algorithms in discrete blind multi-coset sampling. Multi-coset scheme is a promising compressed sensing architecture that can replace traditional Nyquist-rate sampling in the applications with multi-band frequency sparse signals...
Sample representativeness verification of the FADN CZ farm business sample
Directory of Open Access Journals (Sweden)
Marie Prášilová
2011-01-01
Full Text Available Sample representativeness verification is one of the key stages of statistical work. After having joined the European Union the Czech Republic joined also the Farm Accountancy Data Network system of the Union. This is a sample of bodies and companies doing business in agriculture. Detailed production and economic data on the results of farming business are collected from that sample annually and results for the entire population of the country´s farms are then estimated and assessed. It is important hence, that the sample be representative. Representativeness is to be assessed as to the number of farms included in the survey and also as to the degree of accordance of the measures and indices as related to the population. The paper deals with the special statistical techniques and methods of the FADN CZ sample representativeness verification including the necessary sample size statement procedure. The Czech farm population data have been obtained from the Czech Statistical Office data bank.
Statistical characterization report for Single-Shell Tank 241-T-107
International Nuclear Information System (INIS)
Cromar, R.D.; Wilmarth, S.R.; Jensen, L.
1994-01-01
This report contains the results of the statistical analysis of data from three core samples obtained from single-shell tank 241-T-107 (T-107). Four specific topics are addressed. They are summarized below. Section 3.0 contains mean concentration estimates of analytes found in T-107. The estimates of open-quotes errorclose quotes associated with the concentration estimates are given as 95% confidence intervals (CI) on the mean. The results given are based on three types of samples: core composite samples, core segment samples, and drainable liquid samples. Section 4.0 contains estimates of the spatial variability (variability between cores and between segments) and the analytical variability (variability between the primary and the duplicate analysis). Statistical tests were performed to test the hypothesis that the between cores and the between segments spatial variability is zero. The results of the tests are as follows. Based on the core composite data, the between cores variance is significantly different from zero for 35 out of 74 analytes; i.e., for 53% of the analytes there is no statistically significant difference between the concentration means for two cores. Based on core segment data, the between segments variance is significantly different from zero for 22 out of 24 analytes and the between cores variance is significantly different from zero for 4 out of 24 analytes; i.e., for 8% of the analytes there is no statistically significant difference between segment means and for 83% of the analytes there is no difference between the means from the three cores. Section 5.0 contains the results of the application of multiple comparison methods to the core composite data, the core segment data, and the drainable liquid data. Section 6.0 contains the results of a statistical test conducted to determine the 222-S Analytical Laboratory's ability to homogenize solid core segments
Mahalanobis, P C
1965-01-01
Contributions to Statistics focuses on the processes, methodologies, and approaches involved in statistics. The book is presented to Professor P. C. Mahalanobis on the occasion of his 70th birthday. The selection first offers information on the recovery of ancillary information and combinatorial properties of partially balanced designs and association schemes. Discussions focus on combinatorial applications of the algebra of association matrices, sample size analogy, association matrices and the algebra of association schemes, and conceptual statistical experiments. The book then examines latt
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.
Ojeda, Mario Miguel; Sahai, Hardeo
2002-01-01
Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…
Parametric analysis of the statistical model of the stick-slip process
Lima, Roberta; Sampaio, Rubens
2017-06-01
In this paper it is performed a parametric analysis of the statistical model of the response of a dry-friction oscillator. The oscillator is a spring-mass system which moves over a base with a rough surface. Due to this roughness, the mass is subject to a dry-frictional force modeled as a Coulomb friction. The system is stochastically excited by an imposed bang-bang base motion. The base velocity is modeled by a Poisson process for which a probabilistic model is fully specified. The excitation induces in the system stochastic stick-slip oscillations. The system response is composed by a random sequence alternating stick and slip-modes. With realizations of the system, a statistical model is constructed for this sequence. In this statistical model, the variables of interest of the sequence are modeled as random variables, as for example, the number of time intervals in which stick or slip occur, the instants at which they begin, and their duration. Samples of the system response are computed by integration of the dynamic equation of the system using independent samples of the base motion. Statistics and histograms of the random variables which characterize the stick-slip process are estimated for the generated samples. The objective of the paper is to analyze how these estimated statistics and histograms vary with the system parameters, i.e., to make a parametric analysis of the statistical model of the stick-slip process.
LHCb: Statistical Comparison of CPU performance for LHCb applications on the Grid
Graciani, R
2009-01-01
The usage of CPU resources by LHCb on the Grid id dominated by two different applications: Gauss and Brunel. Gauss the application doing the Monte Carlo simulation of proton-proton collisions. Brunel is the application responsible for the reconstruction of the signals recorded by the detector converting them into objects that can be used for later physics analysis of the data (tracks, clusters,…) Both applications are based on the Gaudi and LHCb software frameworks. Gauss uses Pythia and Geant as underlying libraries for the simulation of the collision and the later passage of the generated particles through the LHCb detector. While Brunel makes use of LHCb specific code to process the data from each sub-detector. Both applications are CPU bound. Large Monte Carlo productions or data reconstructions running on the Grid are an ideal benchmark to compare the performance of the different CPU models for each case. Since the processed events are only statistically comparable, only statistical comparison of the...
Statistical data fusion for cross-tabulation
Kamakura, W.A.; Wedel, M.
The authors address the situation in which a researcher wants to cross-tabulate two sets of discrete variables collected in independent samples, but a subset of the variables is common to both samples. The authors propose a statistical data-fusion model that allows for statistical tests of
Business statistics I essentials
Clark, Louise
2014-01-01
REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As its name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams, doing homework and will remain a lasting reference source for students, teachers, and professionals. Business Statistics I includes descriptive statistics, introduction to probability, probability distributions, sampling and sampling distributions, interval estimation, and hypothesis t
The Statistical Analysis Techniques to Support the NGNP Fuel Performance Experiments
International Nuclear Information System (INIS)
Pham, Bihn T.; Einerson, Jeffrey J.
2010-01-01
This paper describes the development and application of statistical analysis techniques to support the AGR experimental program on NGNP fuel performance. The experiments conducted in the Idaho National Laboratory's Advanced Test Reactor employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule. The tests are instrumented with thermocouples embedded in graphite blocks and the target quantity (fuel/graphite temperature) is regulated by the He-Ne gas mixture that fills the gap volume. Three techniques for statistical analysis, namely control charting, correlation analysis, and regression analysis, are implemented in the SAS-based NGNP Data Management and Analysis System (NDMAS) for automated processing and qualification of the AGR measured data. The NDMAS also stores daily neutronic (power) and thermal (heat transfer) code simulation results along with the measurement data, allowing for their combined use and comparative scrutiny. The ultimate objective of this work includes (a) a multi-faceted system for data monitoring and data accuracy testing, (b) identification of possible modes of diagnostics deterioration and changes in experimental conditions, (c) qualification of data for use in code validation, and (d) identification and use of data trends to support effective control of test conditions with respect to the test target. Analysis results and examples given in the paper show the three statistical analysis techniques providing a complementary capability to warn of thermocouple failures. It also suggests that the regression analysis models relating calculated fuel temperatures and thermocouple readings can enable online regulation of experimental parameters (i.e. gas mixture content), to effectively maintain the target quantity (fuel temperature) within a given range.
The statistical analysis techniques to support the NGNP fuel performance experiments
Energy Technology Data Exchange (ETDEWEB)
Pham, Binh T., E-mail: Binh.Pham@inl.gov; Einerson, Jeffrey J.
2013-10-15
This paper describes the development and application of statistical analysis techniques to support the Advanced Gas Reactor (AGR) experimental program on Next Generation Nuclear Plant (NGNP) fuel performance. The experiments conducted in the Idaho National Laboratory’s Advanced Test Reactor employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule. The tests are instrumented with thermocouples embedded in graphite blocks and the target quantity (fuel temperature) is regulated by the He–Ne gas mixture that fills the gap volume. Three techniques for statistical analysis, namely control charting, correlation analysis, and regression analysis, are implemented in the NGNP Data Management and Analysis System for automated processing and qualification of the AGR measured data. The neutronic and thermal code simulation results are used for comparative scrutiny. The ultimate objective of this work includes (a) a multi-faceted system for data monitoring and data accuracy testing, (b) identification of possible modes of diagnostics deterioration and changes in experimental conditions, (c) qualification of data for use in code validation, and (d) identification and use of data trends to support effective control of test conditions with respect to the test target. Analysis results and examples given in the paper show the three statistical analysis techniques providing a complementary capability to warn of thermocouple failures. It also suggests that the regression analysis models relating calculated fuel temperatures and thermocouple readings can enable online regulation of experimental parameters (i.e. gas mixture content), to effectively maintain the fuel temperature within a given range.
Statistics 101 for Radiologists.
Anvari, Arash; Halpern, Elkan F; Samir, Anthony E
2015-10-01
Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.
Turking Statistics: Student-Generated Surveys Increase Student Engagement and Performance
Whitley, Cameron T.; Dietz, Thomas
2018-01-01
Thirty years ago, Hubert M. Blalock Jr. published an article in "Teaching Sociology" about the importance of teaching statistics. We honor Blalock's legacy by assessing how using Amazon Mechanical Turk (MTurk) in statistics classes can enhance student learning and increase statistical literacy among social science gradaute students. In…
Evaluation of Respondent-Driven Sampling
McCreesh, Nicky; Frost, Simon; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda Ndagire; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G
2012-01-01
Background Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex-workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total-population data. Methods Total-population data on age, tribe, religion, socioeconomic status, sexual activity and HIV status were available on a population of 2402 male household-heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, employing current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). Results We recruited 927 household-heads. Full and small RDS samples were largely representative of the total population, but both samples under-represented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven-sampling statistical-inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven-sampling bootstrap 95% confidence intervals included the population proportion. Conclusions Respondent-driven sampling produced a generally representative sample of this well-connected non-hidden population. However, current respondent-driven-sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience-sampling
Evaluation of respondent-driven sampling.
McCreesh, Nicky; Frost, Simon D W; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda N; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G
2012-01-01
Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total population data. Total population data on age, tribe, religion, socioeconomic status, sexual activity, and HIV status were available on a population of 2402 male household heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, using current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). We recruited 927 household heads. Full and small RDS samples were largely representative of the total population, but both samples underrepresented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven sampling statistical inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven sampling bootstrap 95% confidence intervals included the population proportion. Respondent-driven sampling produced a generally representative sample of this well-connected nonhidden population. However, current respondent-driven sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience sampling method, and caution is required
Sampling, Probability Models and Statistical Reasoning -RE ...
Indian Academy of Sciences (India)
random sampling allows data to be modelled with the help of probability ... g based on different trials to get an estimate of the experimental error. ... research interests lie in the .... if e is indeed the true value of the proportion of defectives in the.
Rohatgi, Vijay K
2003-01-01
Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth
Jeffrey J. Green; Courtenay C. Stone; Abera Zegeye; Thomas A. Charles
2007-01-01
We use a binary probit model to assess the impact of several changes in math prerequisites on student performance in an undergraduate business statistics course. While the initial prerequisites did not necessarily provide students with the necessary math skills, our study, the first to examine the effect of math prerequisite changes, shows that these changes were deleterious to student performance. Our results helped convince the College of Business to change the math prerequisite again begin...
Identification of AE Bursts by Classification of Physical and Statistical Parameters
International Nuclear Information System (INIS)
Mieza, J.I.; Oliveto, M.E.; Lopez Pumarega, M.I.; Armeite, M.; Ruzzante, J.E.; Piotrkowski, R.
2005-01-01
Physical and statistical parameters obtained with the Principal Components method, extracted from Acoustic Emission bursts coming from triaxial deformation tests were analyzed. The samples came from seamless steel tubes used in the petroleum industry and some of them were provided with a protective coating. The purpose of our work was to identify bursts originated in the breakage of the coating, from those originated in damage mechanisms in the bulk steel matrix. Analysis was performed by statistical distributions, fractal analysis and clustering methods
Tabor, Josh
2010-01-01
On the 2009 AP[c] Statistics Exam, students were asked to create a statistic to measure skewness in a distribution. This paper explores several of the most popular student responses and evaluates which statistic performs best when sampling from various skewed populations. (Contains 8 figures, 3 tables, and 4 footnotes.)
A laboratory evaluation of the influence of weighing gauges performance on extreme events statistics
Colli, Matteo; Lanza, Luca
2014-05-01
The effects of inaccurate ground based rainfall measurements on the information derived from rain records is yet not much documented in the literature. La Barbera et al. (2002) investigated the propagation of the systematic mechanic errors of tipping bucket type rain gauges (TBR) into the most common statistics of rainfall extremes, e.g. in the assessment of the return period T (or the related non-exceedance probability) of short-duration/high intensity events. Colli et al. (2012) and Lanza et al. (2012) extended the analysis to a 22-years long precipitation data set obtained from a virtual weighing type gauge (WG). The artificial WG time series was obtained basing on real precipitation data measured at the meteo-station of the University of Genova and modelling the weighing gauge output as a linear dynamic system. This approximation was previously validated with dedicated laboratory experiments and is based on the evidence that the accuracy of WG measurements under real world/time varying rainfall conditions is mainly affected by the dynamic response of the gauge (as revealed during the last WMO Field Intercomparison of Rainfall Intensity Gauges). The investigation is now completed by analyzing actual measurements performed by two common weighing gauges, the OTT Pluvio2 load-cell gauge and the GEONOR T-200 vibrating-wire gauge, since both these instruments demonstrated very good performance under previous constant flow rate calibration efforts. A laboratory dynamic rainfall generation system has been arranged and validated in order to simulate a number of precipitation events with variable reference intensities. Such artificial events were generated basing on real world rainfall intensity (RI) records obtained from the meteo-station of the University of Genova so that the statistical structure of the time series is preserved. The influence of the WG RI measurements accuracy on the associated extreme events statistics is analyzed by comparing the original intensity
Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello
2012-01-01
Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
Brus, D.J.
2015-01-01
In balanced sampling a linear relation between the soil property of interest and one or more covariates with known means is exploited in selecting the sampling locations. Recent developments make this sampling design attractive for statistical soil surveys. This paper introduces balanced sampling
Moyé, Lemuel A; Lai, Dejian; Jing, Kaiyan; Baraniuk, Mary Sarah; Kwak, Minjung; Penn, Marc S; Wu, Colon O
2011-01-01
The assumptions that anchor large clinical trials are rooted in smaller, Phase II studies. In addition to specifying the target population, intervention delivery, and patient follow-up duration, physician-scientists who design these Phase II studies must select the appropriate response variables (endpoints). However, endpoint measures can be problematic. If the endpoint assesses the change in a continuous measure over time, then the occurrence of an intervening significant clinical event (SCE), such as death, can preclude the follow-up measurement. Finally, the ideal continuous endpoint measurement may be contraindicated in a fraction of the study patients, a change that requires a less precise substitution in this subset of participants.A score function that is based on the U-statistic can address these issues of 1) intercurrent SCE's and 2) response variable ascertainments that use different measurements of different precision. The scoring statistic is easy to apply, clinically relevant, and provides flexibility for the investigators' prospective design decisions. Sample size and power formulations for this statistic are provided as functions of clinical event rates and effect size estimates that are easy for investigators to identify and discuss. Examples are provided from current cardiovascular cell therapy research.
Fourcade, Yoan; Engler, Jan O; Rödder, Dennis; Secondi, Jean
2014-01-01
MAXENT is now a common species distribution modeling (SDM) tool used by conservation practitioners for predicting the distribution of a species from a set of records and environmental predictors. However, datasets of species occurrence used to train the model are often biased in the geographical space because of unequal sampling effort across the study area. This bias may be a source of strong inaccuracy in the resulting model and could lead to incorrect predictions. Although a number of sampling bias correction methods have been proposed, there is no consensual guideline to account for it. We compared here the performance of five methods of bias correction on three datasets of species occurrence: one "virtual" derived from a land cover map, and two actual datasets for a turtle (Chrysemys picta) and a salamander (Plethodon cylindraceus). We subjected these datasets to four types of sampling biases corresponding to potential types of empirical biases. We applied five correction methods to the biased samples and compared the outputs of distribution models to unbiased datasets to assess the overall correction performance of each method. The results revealed that the ability of methods to correct the initial sampling bias varied greatly depending on bias type, bias intensity and species. However, the simple systematic sampling of records consistently ranked among the best performing across the range of conditions tested, whereas other methods performed more poorly in most cases. The strong effect of initial conditions on correction performance highlights the need for further research to develop a step-by-step guideline to account for sampling bias. However, this method seems to be the most efficient in correcting sampling bias and should be advised in most cases.
Energy Technology Data Exchange (ETDEWEB)
Inada, Naohisa; /Wako, RIKEN /Tokyo U., ICEPP; Oguri, Masamune; /Natl. Astron. Observ. of Japan /Stanford U., Phys. Dept.; Shin, Min-Su; /Michigan U. /Princeton U. Observ.; Kayo, Issha; /Tokyo U., ICRR; Strauss, Michael A.; /Princeton U. Observ.; Hennawi, Joseph F.; /UC, Berkeley /Heidelberg, Max Planck Inst. Astron.; Morokuma, Tomoki; /Natl. Astron. Observ. of Japan; Becker, Robert H.; /LLNL, Livermore /UC, Davis; White, Richard L.; /Baltimore, Space Telescope Sci.; Kochanek, Christopher S.; /Ohio State U.; Gregg, Michael D.; /LLNL, Livermore /UC, Davis /Exeter U.
2010-05-01
We present the second report of our systematic search for strongly lensed quasars from the data of the Sloan Digital Sky Survey (SDSS). From extensive follow-up observations of 136 candidate objects, we find 36 lenses in the full sample of 77,429 spectroscopically confirmed quasars in the SDSS Data Release 5. We then define a complete sample of 19 lenses, including 11 from our previous search in the SDSS Data Release 3, from the sample of 36,287 quasars with i < 19.1 in the redshift range 0.6 < z < 2.2, where we require the lenses to have image separations of 1 < {theta} < 20 and i-band magnitude differences between the two images smaller than 1.25 mag. Among the 19 lensed quasars, 3 have quadruple-image configurations, while the remaining 16 show double images. This lens sample constrains the cosmological constant to be {Omega}{sub {Lambda}} = 0.84{sub -0.08}{sup +0.06}(stat.){sub -0.07}{sup + 0.09}(syst.) assuming a flat universe, which is in good agreement with other cosmological observations. We also report the discoveries of 7 binary quasars with separations ranging from 1.1 to 16.6, which are identified in the course of our lens survey. This study concludes the construction of our statistical lens sample in the full SDSS-I data set.
Testing statistical hypotheses
Lehmann, E L
2005-01-01
The third edition of Testing Statistical Hypotheses updates and expands upon the classic graduate text, emphasizing optimality theory for hypothesis testing and confidence sets. The principal additions include a rigorous treatment of large sample optimality, together with the requisite tools. In addition, an introduction to the theory of resampling methods such as the bootstrap is developed. The sections on multiple testing and goodness of fit testing are expanded. The text is suitable for Ph.D. students in statistics and includes over 300 new problems out of a total of more than 760. E.L. Lehmann is Professor of Statistics Emeritus at the University of California, Berkeley. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences, and the recipient of honorary degrees from the University of Leiden, The Netherlands and the University of Chicago. He is the author of Elements of Large-Sample Theory and (with George Casella) he is also the author of Theory of Point Estimat...
A statistical test for outlier identification in data envelopment analysis
Directory of Open Access Journals (Sweden)
Morteza Khodabin
2010-09-01
Full Text Available In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these ‘‘deterministic’’ frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the presented method, each observation is deleted from the sample once and the resulting linear program is solved, leading to a distribution of efficiency estimates. Based on the achieved distribution, a pared test is designed to identify the potential outlier(s. We illustrate the method through a real data set. The method could be used in a first step, as an exploratory data analysis, before using any frontier estimation.
McCray, Wilmon Wil L., Jr.
The research was prompted by a need to conduct a study that assesses process improvement, quality management and analytical techniques taught to students in U.S. colleges and universities undergraduate and graduate systems engineering and the computing science discipline (e.g., software engineering, computer science, and information technology) degree programs during their academic training that can be applied to quantitatively manage processes for performance. Everyone involved in executing repeatable processes in the software and systems development lifecycle processes needs to become familiar with the concepts of quantitative management, statistical thinking, process improvement methods and how they relate to process-performance. Organizations are starting to embrace the de facto Software Engineering Institute (SEI) Capability Maturity Model Integration (CMMI RTM) Models as process improvement frameworks to improve business processes performance. High maturity process areas in the CMMI model imply the use of analytical, statistical, quantitative management techniques, and process performance modeling to identify and eliminate sources of variation, continually improve process-performance; reduce cost and predict future outcomes. The research study identifies and provides a detail discussion of the gap analysis findings of process improvement and quantitative analysis techniques taught in U.S. universities systems engineering and computing science degree programs, gaps that exist in the literature, and a comparison analysis which identifies the gaps that exist between the SEI's "healthy ingredients " of a process performance model and courses taught in U.S. universities degree program. The research also heightens awareness that academicians have conducted little research on applicable statistics and quantitative techniques that can be used to demonstrate high maturity as implied in the CMMI models. The research also includes a Monte Carlo simulation optimization
Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics
Dowding, Irene; Haufe, Stefan
2018-01-01
Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
Efficient Parallel Statistical Model Checking of Biochemical Networks
Directory of Open Access Journals (Sweden)
Paolo Ballarini
2009-12-01
Full Text Available We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rule them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that a LTL property P holds of a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples, however there are three key aspects that improve the efficiency: first, the sample generation is driven by on-the-fly verification of P which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P to hold is based on an efficient variant of the Wilson method which ensures a faster convergence. Third, the whole methodology is designed according to a parallel fashion and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
Test plan for evaluating the performance of the in-tank fluidic sampling system
International Nuclear Information System (INIS)
BOGER, R.M.
1999-01-01
The PHMC will provide Low Activity Wastes (LAW) tank wastes for final treatment by a privatization contractor from double-shell feed tanks, 241-AP-102 and 241-AP-104, Concerns about the inability of the baseline ''grab'' sampling to provide large volume samples within time constraints has led to the development of a conceptual sampling system that would be deployed in a feed tank riser, This sampling system will provide large volume, representative samples without the environmental, radiation exposure, and sample volume impacts of the current base-line ''grab'' sampling method. This test plan identifies ''proof-of-principle'' cold tests for the conceptual sampling system using simulant materials. The need for additional testing was identified as a result of completing tests described in the revision test plan document, Revision 1 outlines tests that will evaluate the performance and ability to provide samples that are representative of a tanks' content within a 95 percent confidence interval, to recovery from plugging, to sample supernatant wastes with over 25 wt% solids content, and to evaluate the impact of sampling at different heights within the feed tank. The test plan also identifies operating parameters that will optimize the performance of the sampling system
The Impact of Including Below Detection Limit Samples in Post Decommissioning Soil Sample Analyses
Energy Technology Data Exchange (ETDEWEB)
Kim, Jung Hwan; Yim, Man-Sung [KAIST, Daejeon (Korea, Republic of)
2016-10-15
To meet the required standards the site owner has to show that the soil at the facility has been sufficiently cleaned up. To do this one must know the contamination of the soil at the site prior to clean up. This involves sampling that soil to identify the degree of contamination. However there is a technical difficulty in determining how much decontamination should be done. The problem arises when measured samples are below the detection limit. Regulatory guidelines for site reuse after decommissioning are commonly challenged because the majority of the activity in the soil at or below the limit of detection. Using additional statistical analyses of contaminated soil after decommissioning is expected to have the following advantages: a better and more reliable probabilistic exposure assessment, better economics (lower project costs) and improved communication with the public. This research will develop an approach that defines an acceptable method for demonstrating compliance of decommissioned NPP sites and validates that compliance. Soil samples from NPP often contain censored data. Conventional methods for dealing with censored data sets are statistically biased and limited in their usefulness. In this research, additional methods are performed using real data from a monazite manufacturing factory.
Energy Technology Data Exchange (ETDEWEB)
Cho, Su Gil; Jang, Jun Yong; Kim, Ji Hoon; Lee, Tae Hee [Hanyang University, Seoul (Korea, Republic of); Lee, Min Uk [Romax Technology Ltd., Seoul (Korea, Republic of); Choi, Jong Su; Hong, Sup [Korea Research Institute of Ships and Ocean Engineering, Daejeon (Korea, Republic of)
2015-04-15
Sequential surrogate model-based global optimization algorithms, such as super-EGO, have been developed to increase the efficiency of commonly used global optimization technique as well as to ensure the accuracy of optimization. However, earlier studies have drawbacks because there are three phases in the optimization loop and empirical parameters. We propose a united sampling criterion to simplify the algorithm and to achieve the global optimum of problems with constraints without any empirical parameters. It is able to select the points located in a feasible region with high model uncertainty as well as the points along the boundary of constraint at the lowest objective value. The mean squared error determines which criterion is more dominant among the infill sampling criterion and boundary sampling criterion. Also, the method guarantees the accuracy of the surrogate model because the sample points are not located within extremely small regions like super-EGO. The performance of the proposed method, such as the solvability of a problem, convergence properties, and efficiency, are validated through nonlinear numerical examples with disconnected feasible regions.
International Nuclear Information System (INIS)
Hickey, E.E.; Stoetzel, G.A.; Strom, D.J.; Cicotte, G.R.; Wiblin, C.M.; McGuire, S.A.
1993-09-01
This report provides technical information on air sampling that will be useful for facilities following the recommendations in the NRC's Regulatory Guide 8.25, Revision 1, ''Air sampling in the Workplace.'' That guide addresses air sampling to meet the requirements in NRC's regulations on radiation protection, 10 CFR Part 20. This report describes how to determine the need for air sampling based on the amount of material in process modified by the type of material, release potential, and confinement of the material. The purposes of air sampling and how the purposes affect the types of air sampling provided are discussed. The report discusses how to locate air samplers to accurately determine the concentrations of airborne radioactive materials that workers will be exposed to. The need for and the methods of performing airflow pattern studies to improve the accuracy of air sampling results are included. The report presents and gives examples of several techniques that can be used to evaluate whether the airborne concentrations of material are representative of the air inhaled by workers. Methods to adjust derived air concentrations for particle size are described. Methods to calibrate for volume of air sampled and estimate the uncertainty in the volume of air sampled are described. Statistical tests for determining minimum detectable concentrations are presented. How to perform an annual evaluation of the adequacy of the air sampling is also discussed
Optimization of counting time using count statistics on a diffraction beamline
Energy Technology Data Exchange (ETDEWEB)
Marais, D., E-mail: Deon.Marais@necsa.co.za [Research and Development Division, South African Nuclear Energy Corporation (Necsa) SOC Limited, PO Box 582, Pretoria 0001 (South Africa); School of Mechanical and Nuclear Engineering, North-West University, Potchefstroom 2520 (South Africa); Venter, A.M., E-mail: Andrew.Venter@necsa.co.za [Research and Development Division, South African Nuclear Energy Corporation (Necsa) SOC Limited, PO Box 582, Pretoria 0001 (South Africa); Faculty of Agriculture Science and Technology, North-West University, Mahikeng 2790 (South Africa); Markgraaff, J., E-mail: Johan.Markgraaff@nwu.ac.za [School of Mechanical and Nuclear Engineering, North-West University, Potchefstroom 2520 (South Africa)
2016-05-11
The feasibility of an alternative data acquisition strategy to improve the efficiency of beam time usage with neutron strain scanner instruments is demonstrated. By performing strain measurements against set statistical criteria, rather than time, not only leads to substantially reduced sample investigation time but also renders data of similar quality throughout.
Luchko, Tyler; Blinov, Nikolay; Limon, Garrett C.; Joyce, Kevin P.; Kovalenko, Andriy
2016-11-01
Implicit solvent methods for classical molecular modeling are frequently used to provide fast, physics-based hydration free energies of macromolecules. Less commonly considered is the transferability of these methods to other solvents. The Statistical Assessment of Modeling of Proteins and Ligands 5 (SAMPL5) distribution coefficient dataset and the accompanying explicit solvent partition coefficient reference calculations provide a direct test of solvent model transferability. Here we use the 3D reference interaction site model (3D-RISM) statistical-mechanical solvation theory, with a well tested water model and a new united atom cyclohexane model, to calculate partition coefficients for the SAMPL5 dataset. The cyclohexane model performed well in training and testing (R=0.98 for amino acid neutral side chain analogues) but only if a parameterized solvation free energy correction was used. In contrast, the same protocol, using single solute conformations, performed poorly on the SAMPL5 dataset, obtaining R=0.73 compared to the reference partition coefficients, likely due to the much larger solute sizes. Including solute conformational sampling through molecular dynamics coupled with 3D-RISM (MD/3D-RISM) improved agreement with the reference calculation to R=0.93. Since our initial calculations only considered partition coefficients and not distribution coefficients, solute sampling provided little benefit comparing against experiment, where ionized and tautomer states are more important. Applying a simple pK_{ {a}} correction improved agreement with experiment from R=0.54 to R=0.66, despite a small number of outliers. Better agreement is possible by accounting for tautomers and improving the ionization correction.
Luchko, Tyler; Blinov, Nikolay; Limon, Garrett C; Joyce, Kevin P; Kovalenko, Andriy
2016-11-01
Implicit solvent methods for classical molecular modeling are frequently used to provide fast, physics-based hydration free energies of macromolecules. Less commonly considered is the transferability of these methods to other solvents. The Statistical Assessment of Modeling of Proteins and Ligands 5 (SAMPL5) distribution coefficient dataset and the accompanying explicit solvent partition coefficient reference calculations provide a direct test of solvent model transferability. Here we use the 3D reference interaction site model (3D-RISM) statistical-mechanical solvation theory, with a well tested water model and a new united atom cyclohexane model, to calculate partition coefficients for the SAMPL5 dataset. The cyclohexane model performed well in training and testing ([Formula: see text] for amino acid neutral side chain analogues) but only if a parameterized solvation free energy correction was used. In contrast, the same protocol, using single solute conformations, performed poorly on the SAMPL5 dataset, obtaining [Formula: see text] compared to the reference partition coefficients, likely due to the much larger solute sizes. Including solute conformational sampling through molecular dynamics coupled with 3D-RISM (MD/3D-RISM) improved agreement with the reference calculation to [Formula: see text]. Since our initial calculations only considered partition coefficients and not distribution coefficients, solute sampling provided little benefit comparing against experiment, where ionized and tautomer states are more important. Applying a simple [Formula: see text] correction improved agreement with experiment from [Formula: see text] to [Formula: see text], despite a small number of outliers. Better agreement is possible by accounting for tautomers and improving the ionization correction.
Lyons, L.
2016-01-01
Accelerators and detectors are expensive, both in terms of money and human effort. It is thus important to invest effort in performing a good statistical anal- ysis of the data, in order to extract the best information from it. This series of five lectures deals with practical aspects of statistical issues that arise in typical High Energy Physics analyses.
MANAGERIAL DECISION IN INNOVATIVE EDUCATION SYSTEMS STATISTICAL SURVEY BASED ON SAMPLE THEORY
Directory of Open Access Journals (Sweden)
Gheorghe SĂVOIU
2012-12-01
Full Text Available Before formulating the statistical hypotheses and the econometrictesting itself, a breakdown of some of the technical issues is required, which are related to managerial decision in innovative educational systems, the educational managerial phenomenon tested through statistical and mathematical methods, respectively the significant difference in perceiving the current qualities, knowledge, experience, behaviour and desirable health, obtained through a questionnaire applied to a stratified population at the end,in the educational environment, either with educational activities, or with simultaneously managerial and educational activities. The details having to do with research focused on the survey theory, turning into a working tool the questionnaires and statistical data that are processed from those questionnaires, are summarized below.
International Nuclear Information System (INIS)
Ishii, Kyoko; Matsumiya, Hisato; Horie, Hideki; Miyagi, Kazumi
2009-01-01
The purpose of this work is to evaluate quantitatively and statistically the safety performance of Super-Safe, Small, and Simple reactor (4S) by analyzing with ARGO code, a plant dynamics code for a sodium-cooled fast reactor. In this evaluation, an Anticipated Transient Without Scram (ATWS) is assumed, and an Unprotected Loss of Flow (ULOF) event is selected as a typical ATWS case. After a metric concerned with safety design is defined as performance factor a Phenomena Identification Ranking Table (PIRT) is produced in order to select the plausible phenomena that affect the metric. Then a sensitivity analysis is performed for the parameters related to the selected plausible phenomena. Finally the metric is evaluated with statistical methods whether it satisfies the given safety acceptance criteria. The result is as follows: The Cumulative Damage Fraction (CDF) for the cladding is defined as a metric, and the statistical estimation of the one-sided upper tolerance limit of 95 percent probability at a 95 percent confidence level in CDF is within the safety acceptance criterion; CDF < 0.1. The result shows that the 4S safety performance is acceptable in the ULOF event. (author)
Statistical concepts a second course
Lomax, Richard G
2012-01-01
Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes
An introduction to statistical computing a simulation-based approach
Voss, Jochen
2014-01-01
A comprehensive introduction to sampling-based methods in statistical computing The use of computers in mathematics and statistics has opened up a wide range of techniques for studying otherwise intractable problems. Sampling-based simulation techniques are now an invaluable tool for exploring statistical models. This book gives a comprehensive introduction to the exciting area of sampling-based methods. An Introduction to Statistical Computing introduces the classical topics of random number generation and Monte Carlo methods. It also includes some advanced met
International Nuclear Information System (INIS)
Yamaoka, Naoto; Watanabe, Wataru; Hontani, Hidekata
2010-01-01
Most of the time when we construct statistical point cloud model, we need to calculate the corresponding points. Constructed statistical model will not be the same if we use different types of method to calculate the corresponding points. This article proposes the effect to statistical model of human organ made by different types of method to calculate the corresponding points. We validated the performance of statistical model by registering a surface of an organ in a 3D medical image. We compare two methods to calculate corresponding points. The first, the 'Generalized Multi-Dimensional Scaling (GMDS)', determines the corresponding points by the shapes of two curved surfaces. The second approach, the 'Entropy-based Particle system', chooses corresponding points by calculating a number of curved surfaces statistically. By these methods we construct the statistical models and using these models we conducted registration with the medical image. For the estimation, we use non-parametric belief propagation and this method estimates not only the position of the organ but also the probability density of the organ position. We evaluate how the two different types of method that calculates corresponding points affects the statistical model by change in probability density of each points. (author)
A statistical evaluation of asbestos air concentrations
International Nuclear Information System (INIS)
Lange, J.H.
1999-01-01
Both area and personal air samples collected during an asbestos abatement project were matched and statistically analysed. Among the many parameters studied were fibre concentrations and their variability. Mean values for area and personal samples were 0.005 and 0.024 f cm - - 3 of air, respectively. Summary values for area and personal samples suggest that exposures are low with no single exposure value exceeding the current OSHA TWA value of 0.1 f cm -3 of air. Within- and between-worker analysis suggests that these data are homogeneous. Comparison of within- and between-worker values suggests that the exposure source and variability for abatement are more related to the process than individual practices. This supports the importance of control measures for abatement. Study results also suggest that area and personal samples are not statistically related, that is, there is no association observed for these two sampling methods when data are analysed by correlation or regression analysis. Personal samples were statistically higher in concentration than area samples. Area sampling cannot be used as a surrogate exposure for asbestos abatement workers. (author)
Sampling the Mouse Hippocampal Dentate Gyrus
Lisa Basler; Lisa Basler; Stephan Gerdes; David P. Wolfer; David P. Wolfer; David P. Wolfer; Lutz Slomianka; Lutz Slomianka
2017-01-01
Sampling is a critical step in procedures that generate quantitative morphological data in the neurosciences. Samples need to be representative to allow statistical evaluations, and samples need to deliver a precision that makes statistical evaluations not only possible but also meaningful. Sampling generated variability should, e.g., not be able to hide significant group differences from statistical detection if they are present. Estimators of the coefficient of error (CE) have been develope...
Sensor performance as a function of sampling (d) and optical blur (Fλ)
Bijl, P.; Hogervorst, M.A.
2009-01-01
Detector sampling and optical blur are two major factors affecting Target Acquisition (TA) performance with modern EO and IR systems. In order to quantify their relative significance, we simulated five realistic LWIR and MWIR sensors from very under-sampled (detector pitch d >> diffraction blur Fλ)
Industrial statistics with Minitab
Cintas, Pere Grima; Llabres, Xavier Tort-Martorell
2012-01-01
Industrial Statistics with MINITAB demonstrates the use of MINITAB as a tool for performing statistical analysis in an industrial context. This book covers introductory industrial statistics, exploring the most commonly used techniques alongside those that serve to give an overview of more complex issues. A plethora of examples in MINITAB are featured along with case studies for each of the statistical techniques presented. Industrial Statistics with MINITAB: Provides comprehensive coverage of user-friendly practical guidance to the essential statistical methods applied in industry.Explores
Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S
2016-11-01
There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.
Time Series Analysis Based on Running Mann Whitney Z Statistics
A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
A.R. Mason; H.G. Paul
1994-01-01
Procedures for monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm are recommended based on many years experience in sampling these species in eastern Oregon and Washington. It is shown that statistically reliable estimates of larval density can be made for a population by sampling host trees in a series of permanent plots in a...
FUNSTAT and statistical image representations
Parzen, E.
1983-01-01
General ideas of functional statistical inference analysis of one sample and two samples, univariate and bivariate are outlined. ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Raunig, David L; McShane, Lisa M; Pennello, Gene; Gatsonis, Constantine; Carson, Paul L; Voyvodic, James T; Wahl, Richard L; Kurland, Brenda F; Schwarz, Adam J; Gönen, Mithat; Zahlmann, Gudrun; Kondratovich, Marina V; O'Donnell, Kevin; Petrick, Nicholas; Cole, Patricia E; Garra, Brian; Sullivan, Daniel C
2015-02-01
Technological developments and greater rigor in the quantitative measurement of biological features in medical images have given rise to an increased interest in using quantitative imaging biomarkers to measure changes in these features. Critical to the performance of a quantitative imaging biomarker in preclinical or clinical settings are three primary metrology areas of interest: measurement linearity and bias, repeatability, and the ability to consistently reproduce equivalent results when conditions change, as would be expected in any clinical trial. Unfortunately, performance studies to date differ greatly in designs, analysis method, and metrics used to assess a quantitative imaging biomarker for clinical use. It is therefore difficult or not possible to integrate results from different studies or to use reported results to design studies. The Radiological Society of North America and the Quantitative Imaging Biomarker Alliance with technical, radiological, and statistical experts developed a set of technical performance analysis methods, metrics, and study designs that provide terminology, metrics, and methods consistent with widely accepted metrological standards. This document provides a consistent framework for the conduct and evaluation of quantitative imaging biomarker performance studies so that results from multiple studies can be compared, contrasted, or combined. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
Practical continuous-variable quantum key distribution without finite sampling bandwidth effects.
Li, Huasheng; Wang, Chao; Huang, Peng; Huang, Duan; Wang, Tao; Zeng, Guihua
2016-09-05
In a practical continuous-variable quantum key distribution system, finite sampling bandwidth of the employed analog-to-digital converter at the receiver's side may lead to inaccurate results of pulse peak sampling. Then, errors in the parameters estimation resulted. Subsequently, the system performance decreases and security loopholes are exposed to eavesdroppers. In this paper, we propose a novel data acquisition scheme which consists of two parts, i.e., a dynamic delay adjusting module and a statistical power feedback-control algorithm. The proposed scheme may improve dramatically the data acquisition precision of pulse peak sampling and remove the finite sampling bandwidth effects. Moreover, the optimal peak sampling position of a pulse signal can be dynamically calibrated through monitoring the change of the statistical power of the sampled data in the proposed scheme. This helps to resist against some practical attacks, such as the well-known local oscillator calibration attack.
Energy Technology Data Exchange (ETDEWEB)
Kurtz, S.E.; Fields, D.E.
1983-10-01
The KSTEST code presented here is designed to perform the Kolmogorov-Smirnov one-sample test. The code may be used as a stand-alone program or the principal subroutines may be excerpted and used to service other programs. The Kolmogorov-Smirnov one-sample test is a nonparametric goodness-of-fit test. A number of codes to perform this test are in existence, but they suffer from the inability to provide meaningful results in the case of small sample sizes (number of values less than or equal to 80). The KSTEST code overcomes this inadequacy by using two distinct algorithms. If the sample size is greater than 80, an asymptotic series developed by Smirnov is evaluated. If the sample size is 80 or less, a table of values generated by Birnbaum is referenced. Valid results can be obtained from KSTEST when the sample contains from 3 to 300 data points. The program was developed on a Digital Equipment Corporation PDP-10 computer using the FORTRAN-10 language. The code size is approximately 450 card images and the typical CPU execution time is 0.19 s.
Statistical process control for alpha spectroscopy
Energy Technology Data Exchange (ETDEWEB)
Richardson, W; Majoras, R E [Oxford Instruments, Inc. P.O. Box 2560, Oak Ridge TN 37830 (United States); Joo, I O; Seymour, R S [Accu-Labs Research, Inc. 4663 Table Mountain Drive, Golden CO 80403 (United States)
1995-10-01
Statistical process control(SPC) allows for the identification of problems in alpha spectroscopy processes before they occur, unlike standard laboratory Q C which only identifies problems after a process fails. SPC tools that are directly applicable to alpha spectroscopy include individual X-charts and X-bar charts, process capability plots, and scatter plots. Most scientists are familiar with the concepts the and methods employed by SPC. These tools allow analysis of process bias, precision, accuracy and reproducibility as well as process capability. Parameters affecting instrument performance are monitored and analyzed using SPC methods. These instrument parameters can also be compared to sampling, preparation, measurement, and analysis Q C parameters permitting the evaluation of cause effect relationships. Three examples of SPC, as applied to alpha spectroscopy , are presented. The first example investigates background contamination using averaging to show trends quickly. A second example demonstrates how SPC can identify sample processing problems, analyzing both how and why this problem occurred. A third example illustrates how SPC can predict when an alpha spectroscopy process is going to fail. This allows for an orderly and timely shutdown of the process to perform preventative maintenance, avoiding the need to repeat costly sample analyses. 7 figs., 2 tabs.
Statistical process control for alpha spectroscopy
International Nuclear Information System (INIS)
Richardson, W.; Majoras, R.E.; Joo, I.O.; Seymour, R.S.
1995-01-01
Statistical process control(SPC) allows for the identification of problems in alpha spectroscopy processes before they occur, unlike standard laboratory Q C which only identifies problems after a process fails. SPC tools that are directly applicable to alpha spectroscopy include individual X-charts and X-bar charts, process capability plots, and scatter plots. Most scientists are familiar with the concepts the and methods employed by SPC. These tools allow analysis of process bias, precision, accuracy and reproducibility as well as process capability. Parameters affecting instrument performance are monitored and analyzed using SPC methods. These instrument parameters can also be compared to sampling, preparation, measurement, and analysis Q C parameters permitting the evaluation of cause effect relationships. Three examples of SPC, as applied to alpha spectroscopy , are presented. The first example investigates background contamination using averaging to show trends quickly. A second example demonstrates how SPC can identify sample processing problems, analyzing both how and why this problem occurred. A third example illustrates how SPC can predict when an alpha spectroscopy process is going to fail. This allows for an orderly and timely shutdown of the process to perform preventative maintenance, avoiding the need to repeat costly sample analyses. 7 figs., 2 tabs
Applied statistical designs for the researcher
Paulson, Daryl S
2003-01-01
Research and Statistics Basic Review of Parametric Statistics Exploratory Data Analysis Two Sample Tests Completely Randomized One-Factor Analysis of Variance One and Two Restrictions on Randomization Completely Randomized Two-Factor Factorial Designs Two-Factor Factorial Completely Randomized Blocked Designs Useful Small Scale Pilot Designs Nested Statistical Designs Linear Regression Nonparametric Statistics Introduction to Research Synthesis and "Meta-Analysis" and Conclusory Remarks References Index.
Constrained statistical inference : sample-size tables for ANOVA and regression
Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves
2015-01-01
Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and
Direction-of-Arrival Estimation Based on Sparse Recovery with Second-Order Statistics
Directory of Open Access Journals (Sweden)
H. Chen
2015-04-01
Full Text Available Traditional direction-of-arrival (DOA estimation techniques perform Nyquist-rate sampling of the received signals and as a result they require high storage. To reduce sampling ratio, we introduce level-crossing (LC sampling which captures samples whenever the signal crosses predetermined reference levels, and the LC-based analog-to-digital converter (LC ADC has been shown to efficiently sample certain classes of signals. In this paper, we focus on the DOA estimation problem by using second-order statistics based on the LC samplings recording on one sensor, along with the synchronous samplings of the another sensors, a sparse angle space scenario can be found by solving an $ell_1$ minimization problem, giving the number of sources and their DOA's. The experimental results show that our proposed method, when compared with some existing norm-based constrained optimization compressive sensing (CS algorithms, as well as subspace method, improves the DOA estimation performance, while using less samples when compared with Nyquist-rate sampling and reducing sensor activity especially for long time silence signal.
Toward cost-efficient sampling methods
Luo, Peng; Li, Yongli; Wu, Chong; Zhang, Guijie
2015-09-01
The sampling method has been paid much attention in the field of complex network in general and statistical physics in particular. This paper proposes two new sampling methods based on the idea that a small part of vertices with high node degree could possess the most structure information of a complex network. The two proposed sampling methods are efficient in sampling high degree nodes so that they would be useful even if the sampling rate is low, which means cost-efficient. The first new sampling method is developed on the basis of the widely used stratified random sampling (SRS) method and the second one improves the famous snowball sampling (SBS) method. In order to demonstrate the validity and accuracy of two new sampling methods, we compare them with the existing sampling methods in three commonly used simulation networks that are scale-free network, random network, small-world network, and also in two real networks. The experimental results illustrate that the two proposed sampling methods perform much better than the existing sampling methods in terms of achieving the true network structure characteristics reflected by clustering coefficient, Bonacich centrality and average path length, especially when the sampling rate is low.
Catch me if you can: Comparing ballast water sampling skids to traditional net sampling
Bradie, Johanna; Gianoli, Claudio; Linley, Robert Dallas; Schillak, Lothar; Schneider, Gerd; Stehouwer, Peter; Bailey, Sarah
2018-03-01
With the recent ratification of the International Convention for the Control and Management of Ships' Ballast Water and Sediments, 2004, it will soon be necessary to assess ships for compliance with ballast water discharge standards. Sampling skids that allow the efficient collection of ballast water samples in a compact space have been developed for this purpose. We ran 22 trials on board the RV Meteor from June 4-15, 2015 to evaluate the performance of three ballast water sampling devices (traditional plankton net, Triton sampling skid, SGS sampling skid) for three organism size classes: ≥ 50 μm, ≥ 10 μm to Natural sea water was run through the ballast water system and untreated samples were collected using paired sampling devices. Collected samples were analyzed in parallel by multiple analysts using several different analytic methods to quantify organism concentrations. To determine whether there were differences in the number of viable organisms collected across sampling devices, results were standardized and statistically treated to filter out other sources of variability, resulting in an outcome variable representing the mean difference in measurements that can be attributed to sampling devices. These results were tested for significance using pairwise Tukey contrasts. Differences in organism concentrations were found in 50% of comparisons between sampling skids and the plankton net for ≥ 50 μm, and ≥ 10 μm to < 50 μm size classes, with net samples containing either higher or lower densities. There were no differences for < 10 μm organisms. Future work will be required to explicitly examine the potential effects of flow velocity, sampling duration, sampled volume, and organism concentrations on sampling device performance.
Robustness of S1 statistic with Hodges-Lehmann for skewed distributions
Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping
2016-10-01
Analysis of variance (ANOVA) is a common use parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under the influence of non- normal and heteroscedastic settings. When the assumptions are violated, researchers are looking for alternative such as Kruskal-Wallis under nonparametric or robust method. This study focused on flexible method, S1 statistic for comparing groups using median as the location estimator. S1 statistic was modified by substituting the median with Hodges-Lehmann and the default scale estimator with the variance of Hodges-Lehmann and MADn to produce two different test statistics for comparing groups. Bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistic in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The propose procedures show improvement compared to the original statistic especially under extremely skewed distribution.
Page, G P; Amos, C I; Boerwinkle, E
1998-04-01
We present a test statistic, the quantitative LOD (QLOD) score, for the testing of both linkage and exclusion of quantitative-trait loci in randomly selected human sibships. As with the traditional LOD score, the boundary values of 3, for linkage, and -2, for exclusion, can be used for the QLOD score. We investigated the sample sizes required for inferring exclusion and linkage, for various combinations of linked genetic variance, total heritability, recombination distance, and sibship size, using fixed-size sampling. The sample sizes required for both linkage and exclusion were not qualitatively different and depended on the percentage of variance being linked or excluded and on the total genetic variance. Information regarding linkage and exclusion in sibships larger than size 2 increased as approximately all possible pairs n(n-1)/2 up to sibships of size 6. Increasing the recombination (theta) distance between the marker and the trait loci reduced empirically the power for both linkage and exclusion, as a function of approximately (1-2theta)4.
Frontiers in statistical quality control
Wilrich, Peter-Theodor
2004-01-01
This volume treats the four main categories of Statistical Quality Control: General SQC Methodology, On-line Control including Sampling Inspection and Statistical Process Control, Off-line Control with Data Analysis and Experimental Design, and, fields related to Reliability. Experts with international reputation present their newest contributions.
A statistical evaluation of asbestos air concentrations
Energy Technology Data Exchange (ETDEWEB)
Lange, J.H. [Envirosafe Training and Consultants, Pittsburgh, PA (United States)
1999-07-01
Both area and personal air samples collected during an asbestos abatement project were matched and statistically analysed. Among the many parameters studied were fibre concentrations and their variability. Mean values for area and personal samples were 0.005 and 0.024 f cm{sup -}-{sup 3} of air, respectively. Summary values for area and personal samples suggest that exposures are low with no single exposure value exceeding the current OSHA TWA value of 0.1 f cm{sup -3} of air. Within- and between-worker analysis suggests that these data are homogeneous. Comparison of within- and between-worker values suggests that the exposure source and variability for abatement are more related to the process than individual practices. This supports the importance of control measures for abatement. Study results also suggest that area and personal samples are not statistically related, that is, there is no association observed for these two sampling methods when data are analysed by correlation or regression analysis. Personal samples were statistically higher in concentration than area samples. Area sampling cannot be used as a surrogate exposure for asbestos abatement workers. (author)
International Nuclear Information System (INIS)
Gallaher, B.; Mercier, T.; Black, P.; Mullen, K.
2000-01-01
Four governmental agencies conducted a round of groundwater, surface water, and spring water sampling at the Los Alamos National Laboratory during 1998. Samples were split among the four parties and sent to independent analytical laboratories. Results from three of the agencies were available for this study. Comparisons of analytical results that were paired by location and date were made between the various analytical laboratories. The results for over 50 split samples analyzed for inorganic chemicals, metals, and radionuclides were compared. Statistical analyses included non-parametric (sign test and signed-ranks test) and parametric (paired t-test and linear regression) methods. The data pairs were tested for statistically significant differences, defined by an observed significance level, or p-value, less than 0.05. The main conclusion is that the laboratories' performances are similar across most of the analytes that were measured. In some 95% of the laboratory measurements there was agreement on whether contaminant levels exceeded regulatory limits. The most significant differences in performance were noted for the radioactive suite, particularly for gross alpha particle activity and Sr-90
Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.
Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L
2012-12-01
Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.
National transportation statistics 2010
2010-01-01
National Transportation Statistics presents statistics on the U.S. transportation system, including its physical components, safety record, economic performance, the human and natural environment, and national security. This is a large online documen...
Statistical hadronization and hadronic micro-canonical ensemble II
International Nuclear Information System (INIS)
Becattini, F.; Ferroni, L.
2004-01-01
We present a Monte Carlo calculation of the micro-canonical ensemble of the ideal hadron-resonance gas including all known states up to a mass of about 1.8 GeV and full quantum statistics. The micro-canonical average multiplicities of the various hadron species are found to converge to the canonical ones for moderately low values of the total energy, around 8 GeV, thus bearing out previous analyses of hadronic multiplicities in the canonical ensemble. The main numerical computing method is an importance sampling Monte Carlo algorithm using the product of Poisson distributions to generate multi-hadronic channels. It is shown that the use of this multi-Poisson distribution allows for an efficient and fast computation of averages, which can be further improved in the limit of very large clusters. We have also studied the fitness of a previously proposed computing method, based on the Metropolis Monte Carlo algorithm, for event generation in the statistical hadronization model. We find that the use of the multi-Poisson distribution as proposal matrix dramatically improves the computation performance. However, due to the correlation of subsequent samples, this method proves to be generally less robust and effective than the importance sampling method. (orig.)
Mulholland, Henry
1968-01-01
Fundamentals of Statistics covers topics on the introduction, fundamentals, and science of statistics. The book discusses the collection, organization and representation of numerical data; elementary probability; the binomial Poisson distributions; and the measures of central tendency. The text describes measures of dispersion for measuring the spread of a distribution; continuous distributions for measuring on a continuous scale; the properties and use of normal distribution; and tests involving the normal or student's 't' distributions. The use of control charts for sample means; the ranges
How Sample Size Affects a Sampling Distribution
Mulekar, Madhuri S.; Siegel, Murray H.
2009-01-01
If students are to understand inferential statistics successfully, they must have a profound understanding of the nature of the sampling distribution. Specifically, they must comprehend the determination of the expected value and standard error of a sampling distribution as well as the meaning of the central limit theorem. Many students in a high…
Oravec, Heather Ann; Daniels, Christopher C.
2014-01-01
The National Aeronautics and Space Administration has been developing a novel docking system to meet the requirements of future exploration missions to low-Earth orbit and beyond. A dynamic gas pressure seal is located at the main interface between the active and passive mating components of the new docking system. This seal is designed to operate in the harsh space environment, but is also to perform within strict loading requirements while maintaining an acceptable level of leak rate. In this study, a candidate silicone elastomer seal was designed, and multiple subscale test articles were manufactured for evaluation purposes. The force required to fully compress each test article at room temperature was quantified and found to be below the maximum allowable load for the docking system. However, a significant amount of scatter was observed in the test results. Due to the stochastic nature of the mechanical performance of this candidate docking seal, a statistical process control technique was implemented to isolate unusual compression behavior from typical mechanical performance. The results of this statistical analysis indicated a lack of process control, suggesting a variation in the manufacturing phase of the process. Further investigation revealed that changes in the manufacturing molding process had occurred which may have influenced the mechanical performance of the seal. This knowledge improves the chance of this and future space seals to satisfy or exceed design specifications.
Statistical inferences with jointly type-II censored samples from two Pareto distributions
Abu-Zinadah, Hanaa H.
2017-08-01
In the several fields of industries the product comes from more than one production line, which is required to work the comparative life tests. This problem requires sampling of the different production lines, then the joint censoring scheme is appeared. In this article we consider the life time Pareto distribution with jointly type-II censoring scheme. The maximum likelihood estimators (MLE) and the corresponding approximate confidence intervals as well as the bootstrap confidence intervals of the model parameters are obtained. Also Bayesian point and credible intervals of the model parameters are presented. The life time data set is analyzed for illustrative purposes. Monte Carlo results from simulation studies are presented to assess the performance of our proposed method.
Güleda Doğan
2017-01-01
This editorial is on statistical sampling, which is one of the most two important reasons for editorial rejection from our journal Turkish Librarianship. The stages of quantitative research, the stage in which we are sampling, the importance of sampling for a research, deciding on sample size and sampling methods are summarised briefly.
Directory of Open Access Journals (Sweden)
Obed M. Ali
2015-12-01
Full Text Available In this study, the fuel properties and engine performance of blended palm biodiesel-diesel using diethyl ether as additive have been investigated. The properties of B30 blended palm biodiesel-diesel fuel were measured and analyzed statistically with the addition of 2%, 4%, 6% and 8% (by volume diethyl ether additive. The engine tests were conducted at increasing engine speeds from 1500 rpm to 3500 rpm and under constant load. Optimization of independent variables was performed using the desirability approach of the response surface methodology (RSM with the goal of minimizing emissions and maximizing performance parameters. The experiments were designed using a statistical tool known as design of experiments (DoE based on RSM.
Boslaugh, Sarah
2013-01-01
Need to learn statistics for your job? Want help passing a statistics course? Statistics in a Nutshell is a clear and concise introduction and reference for anyone new to the subject. Thoroughly revised and expanded, this edition helps you gain a solid understanding of statistics without the numbing complexity of many college texts. Each chapter presents easy-to-follow descriptions, along with graphics, formulas, solved examples, and hands-on exercises. If you want to perform common statistical analyses and learn a wide range of techniques without getting in over your head, this is your book.
Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno
2016-01-22
Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if they occur on prognostic factors. Thus, the diagnosis of a such imbalance is essential to adjust statistical analysis if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n =500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for pre selection of the covariates needed to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
Preferential sampling in veterinary parasitological surveillance
Directory of Open Access Journals (Sweden)
Lorenzo Cecconi
2016-04-01
Full Text Available In parasitological surveillance of livestock, prevalence surveys are conducted on a sample of farms using several sampling designs. For example, opportunistic surveys or informative sampling designs are very common. Preferential sampling refers to any situation in which the spatial process and the sampling locations are not independent. Most examples of preferential sampling in the spatial statistics literature are in environmental statistics with focus on pollutant monitors, and it has been shown that, if preferential sampling is present and is not accounted for in the statistical modelling and data analysis, statistical inference can be misleading. In this paper, working in the context of veterinary parasitology, we propose and use geostatistical models to predict the continuous and spatially-varying risk of a parasite infection. Specifically, breaking with the common practice in veterinary parasitological surveillance to ignore preferential sampling even though informative or opportunistic samples are very common, we specify a two-stage hierarchical Bayesian model that adjusts for preferential sampling and we apply it to data on Fasciola hepatica infection in sheep farms in Campania region (Southern Italy in the years 2013-2014.
An Evaluation of the Use of Statistical Procedures in Soil Science
Directory of Open Access Journals (Sweden)
Laene de Fátima Tavares
2016-01-01
Full Text Available ABSTRACT Experimental statistical procedures used in almost all scientific papers are fundamental for clearer interpretation of the results of experiments conducted in agrarian sciences. However, incorrect use of these procedures can lead the researcher to incorrect or incomplete conclusions. Therefore, the aim of this study was to evaluate the characteristics of the experiments and quality of the use of statistical procedures in soil science in order to promote better use of statistical procedures. For that purpose, 200 articles, published between 2010 and 2014, involving only experimentation and studies by sampling in the soil areas of fertility, chemistry, physics, biology, use and management were randomly selected. A questionnaire containing 28 questions was used to assess the characteristics of the experiments, the statistical procedures used, and the quality of selection and use of these procedures. Most of the articles evaluated presented data from studies conducted under field conditions and 27 % of all papers involved studies by sampling. Most studies did not mention testing to verify normality and homoscedasticity, and most used the Tukey test for mean comparisons. Among studies with a factorial structure of the treatments, many had ignored this structure, and data were compared assuming the absence of factorial structure, or the decomposition of interaction was performed without showing or mentioning the significance of the interaction. Almost none of the papers that had split-block factorial designs considered the factorial structure, or they considered it as a split-plot design. Among the articles that performed regression analysis, only a few of them tested non-polynomial fit models, and none reported verification of the lack of fit in the regressions. The articles evaluated thus reflected poor generalization and, in some cases, wrong generalization in experimental design and selection of procedures for statistical analysis.
Milewski, Emil G
2012-01-01
REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As its name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams, doing homework and will remain a lasting reference source for students, teachers, and professionals. Statistics II discusses sampling theory, statistical inference, independent and dependent variables, correlation theory, experimental design, count data, chi-square test, and time se
Statistical quality management using miniTAB 14
International Nuclear Information System (INIS)
An, Seong Jin
2007-01-01
This book explains statistical quality management giving descriptions of definition of quality, quality management, quality cost, basic methods of quality management, principles of control chart, control chart for variables, control chart for attributes, capability analysis, other issues of statistical process control, acceptance sampling, sampling for variable acceptance, design and analysis of experiment, Taguchi quality engineering, reaction surface methodology reliability analysis.
Leppäranta, Matti; Lewis, John E; Heini, Anniina; Arvola, Lauri
2018-06-04
Spatial variability, an essential characteristic of lake ecosystems, has often been neglected in field research and monitoring. In this study, we apply spatial statistical methods for the key physics and chemistry variables and chlorophyll a over eight sampling dates in two consecutive years in a large (area 103 km 2 ) eutrophic boreal lake in southern Finland. In the four summer sampling dates, the water body was vertically and horizontally heterogenic except with color and DOC, in the two winter ice-covered dates DO was vertically stratified, while in the two autumn dates, no significant spatial differences in any of the measured variables were found. Chlorophyll a concentration was one order of magnitude lower under the ice cover than in open water. The Moran statistic for spatial correlation was significant for chlorophyll a and NO 2 +NO 3 -N in all summer situations and for dissolved oxygen and pH in three cases. In summer, the mass centers of the chemicals were within 1.5 km from the geometric center of the lake, and the 2nd moment radius ranged in 3.7-4.1 km respective to 3.9 km for the homogeneous situation. The lateral length scales of the studied variables were 1.5-2.5 km, about 1 km longer in the surface layer. The detected spatial "noise" strongly suggests that besides vertical variation also the horizontal variation in eutrophic lakes, in particular, should be considered when the ecosystems are monitored.
Small sample approach, and statistical and epidemiological aspects
Offringa, Martin; van der Lee, Hanneke
2011-01-01
In this chapter, the design of pharmacokinetic studies and phase III trials in children is discussed. Classical approaches and relatively novel approaches, which may be more useful in the context of drug research in children, are discussed. The burden of repeated blood sampling in pediatric
Strong laws for L- and U-statistics
Aaronson, J; Burton, R; Dehling, H; Gilat, D; Hill, T; Weiss, B
Strong laws of large numbers are given for L-statistics (linear combinations of order statistics) and for U-statistics (averages of kernels of random samples) for ergodic stationary processes, extending classical theorems; of Hoeffding and of Helmers for lid sequences. Examples are given to show
Impact of Cesium decontamination on performances of high activity sample analysis
Energy Technology Data Exchange (ETDEWEB)
Maillard, Christophe; Boyer-Deslys, Valerie; Dautheribes, Jean L.; Esbelin, Eric; Beres, Andre; Jan, Steve; Baghdadi, Sarah; Rivier, Cedric [CEA, Nuclear Energy Division, Bagnols sur Ceze (France). RadioChemistry and Processes Dept.
2017-09-01
Experiments in the ATALANTE facility can lead to high activity samples (for example the dissolution of hulls and spent fuels), essentially coming from the presence of {sup 137}Cs. Usually, these samples are handled in a shielded cell. The removal of this radionuclide from the sample would make it possible to handle it in glove boxes without having to perform an important dilution in the shielded cells beforehand. It would allow to analyze samples using techniques usually implemented in glove boxes (such as ICP, α spectrometry..) and to reach lower detection and quantification limits. To do so, a separation by extraction chromatography using a Triskem AMP-PAN column was developed. A cesium decontamination factor higher than 5 x 10{sup 4} and detection limits improvement up to a factor 100 were obtained.
Indian Academy of Sciences (India)
inference and finite population sampling. Sudhakar Kunte. Elements of statistical computing are discussed in this series. ... which captain gets an option to decide whether to field first or bat first ... may of course not be fair, in the sense that the team which wins ... describe two methods of drawing a random number between 0.
Thompson, Steven K
2012-01-01
Praise for the Second Edition "This book has never had a competitor. It is the only book that takes a broad approach to sampling . . . any good personal statistics library should include a copy of this book." —Technometrics "Well-written . . . an excellent book on an important subject. Highly recommended." —Choice "An ideal reference for scientific researchers and other professionals who use sampling." —Zentralblatt Math Features new developments in the field combined with all aspects of obtaining, interpreting, and using sample data Sampling provides an up-to-date treat
International Nuclear Information System (INIS)
Vaganov, P.A.; Kol'tsov, A.A.; Kulikov, V.D.; Mejer, V.A.
1983-01-01
The multi-element instrumental neutron activation analysis of samples of mountain rocks (sandstones, aleurolites and shales of one of gold deposits) is performed. The spectra of irradiated samples are measured by Ge(Li) detector of the volume of 35 mm 3 . The content of 22 chemical elements is determined in each sample. The results of analysis serve as reliable basis for multi-dimensional statistic information processing, they constitute the basis for the generalized characteristics of rocks which brings about the solution of classification problem for rocks of different deposits
Frontiers in statistical quality control
Wilrich, Peter-Theodor
2001-01-01
The book is a collection of papers presented at the 5th International Workshop on Intelligent Statistical Quality Control in Würzburg, Germany. Contributions deal with methodology and successful industrial applications. They can be grouped in four catagories: Sampling Inspection, Statistical Process Control, Data Analysis and Process Capability Studies and Experimental Design.
Spatial delayed nonmatching-to-sample performances in long-living Ames dwarf mice.
Derenne, Adam; Brown-Borg, Holly M; Martner, Sarah; Wolff, Wendy; Frerking, Morgan
2014-01-17
Ames dwarf mice have an extended lifespan by comparison with normal mice. Behavioral testing has revealed that sometimes Ames dwarf mice also evince superior performances relative to normal mice, but in other cases they do not. In this experiment, Ames dwarf and normal mice were compared on a T-maze test and on a delayed nonmatching-to-sample variant of a T-maze test. On the simple T-maze, Ames dwarf and normal mice committed comparable numbers of errors. On the nonmatching-to-sample task, normal mice mastered the discrimination by the end of the experiment while Ames dwarf mice did not. The apparatus, distances traveled and session duration were equivalent between the two tasks. The poorer performances of Ames dwarf mice on the nonmatching-to-sample task suggests that Ames dwarf mice may not be as capable of learning relatively cognitively complex tasks as normal mice. © 2013.
The Math Problem: Advertising Students' Attitudes toward Statistics
Fullerton, Jami A.; Kendrick, Alice
2013-01-01
This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…
Spiric, Aurelija; Trbovic, Dejana; Vranic, Danijela; Djinovic, Jasna; Petronijevic, Radivoj; Matekalo-Sverak, Vesna
2010-07-05
Studies performed on lipid extraction from animal and fish tissues do not provide information on its influence on fatty acid composition of the extracted lipids as well as on cholesterol content. Data presented in this paper indicate the impact of extraction procedures on fatty acid profile of fish lipids extracted by the modified Soxhlet and ASE (accelerated solvent extraction) procedure. Cholesterol was also determined by direct saponification method, too. Student's paired t-test used for comparison of the total fat content in carp fish population obtained by two extraction methods shows that differences between values of the total fat content determined by ASE and modified Soxhlet method are not statistically significant. Values obtained by three different methods (direct saponification, ASE and modified Soxhlet method), used for determination of cholesterol content in carp, were compared by one-way analysis of variance (ANOVA). The obtained results show that modified Soxhlet method gives results which differ significantly from the results obtained by direct saponification and ASE method. However the results obtained by direct saponification and ASE method do not differ significantly from each other. The highest quantities for cholesterol (37.65 to 65.44 mg/100 g) in the analyzed fish muscle were obtained by applying direct saponification method, as less destructive one, followed by ASE (34.16 to 52.60 mg/100 g) and modified Soxhlet extraction method (10.73 to 30.83 mg/100 g). Modified Soxhlet method for extraction of fish lipids gives higher values for n-6 fatty acids than ASE method (t(paired)=3.22 t(c)=2.36), while there is no statistically significant difference in the n-3 content levels between the methods (t(paired)=1.31). The UNSFA/SFA ratio obtained by using modified Soxhlet method is also higher than the ratio obtained using ASE method (t(paired)=4.88 t(c)=2.36). Results of Principal Component Analysis (PCA) showed that the highest positive impact to
Use of statistical process control in evaluation of academic performance
Directory of Open Access Journals (Sweden)
Ezequiel Gibbon Gautério
2014-05-01
Full Text Available The aim of this article was to study some indicators of academic performance (number of students per class, dropout rate, failure rate and scores obtained by the students to identify a pattern of behavior that would enable to implement improvements in the teaching-learning process. The sample was composed of five classes of undergraduate courses in Engineering. The data were collected for three years. Initially an exploratory analysis with analytical and graphical techniques was performed. An analysis of variance and Tukey’s test investigated some sources of variability. This information was used in the construction of control charts. We have found evidence that classes with more students are associated with higher failure rates and lower mean. Moreover, when the course was later in the curriculum, the students had higher scores. The results showed that although they have been detected some special causes interfering in the process, it was possible to stabilize it and to monitor it.
Some connections between importance sampling and enhanced sampling methods in molecular dynamics.
Lie, H C; Quer, J
2017-11-21
In molecular dynamics, enhanced sampling methods enable the collection of better statistics of rare events from a reference or target distribution. We show that a large class of these methods is based on the idea of importance sampling from mathematical statistics. We illustrate this connection by comparing the Hartmann-Schütte method for rare event simulation (J. Stat. Mech. Theor. Exp. 2012, P11004) and the Valsson-Parrinello method of variationally enhanced sampling [Phys. Rev. Lett. 113, 090601 (2014)]. We use this connection in order to discuss how recent results from the Monte Carlo methods literature can guide the development of enhanced sampling methods.
Equivalent statistics and data interpretation.
Francis, Gregory
2017-08-01
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
Sample size determination and power
Ryan, Thomas P, Jr
2013-01-01
THOMAS P. RYAN, PhD, teaches online advanced statistics courses for Northwestern University and The Institute for Statistics Education in sample size determination, design of experiments, engineering statistics, and regression analysis.
Statistical sampling strategies for survey of soil contamination
Brus, D.J.
2011-01-01
This chapter reviews methods for selecting sampling locations in contaminated soils for three situations. In the first situation a global estimate of the soil contamination in an area is required. The result of the surey is a number or a series of numbers per contaminant, e.g. the estimated mean
Basics of modern mathematical statistics
Spokoiny, Vladimir
2015-01-01
This textbook provides a unified and self-contained presentation of the main approaches to and ideas of mathematical statistics. It collects the basic mathematical ideas and tools needed as a basis for more serious studies or even independent research in statistics. The majority of existing textbooks in mathematical statistics follow the classical asymptotic framework. Yet, as modern statistics has changed rapidly in recent years, new methods and approaches have appeared. The emphasis is on finite sample behavior, large parameter dimensions, and model misspecifications. The present book provides a fully self-contained introduction to the world of modern mathematical statistics, collecting the basic knowledge, concepts and findings needed for doing further research in the modern theoretical and applied statistics. This textbook is primarily intended for graduate and postdoc students and young researchers who are interested in modern statistical methods.
Statistical intervals a guide for practitioners
Hahn, Gerald J
2011-01-01
Presents a detailed exposition of statistical intervals and emphasizes applications in industry. The discussion differentiates at an elementary level among different kinds of statistical intervals and gives instruction with numerous examples and simple math on how to construct such intervals from sample data. This includes confidence intervals to contain a population percentile, confidence intervals on probability of meeting specified threshold value, and prediction intervals to include observation in a future sample. Also has an appendix containing computer subroutines for nonparametric stati
Directory of Open Access Journals (Sweden)
AbdolRahman Maleki Afzal
2017-06-01
Full Text Available The purpose of this research is to review the relationship between effectiveness of marketing strategies performance Marketing Bank branches in the city of Sanandaj. This research in terms of purpose is applied; and also in terms of method is descriptive correlational research and in term of duration is a single period. The statistical population contains staff at the branch of the bank of Sanandaj city. To choose the statistical sample by using classical random sampling and according to the statistical formula, a sample of 302 people is selected from among the statistical population. According to research methodology and the type of the underlying data in this study, questionnaire of standard marketing strategies and researcher marketing performance are used as the main tool to measure and data collection. In order to do the statistical analysis, SPSS statistical analysis software is used. The findings show that there is a significant positive relationship between the effectiveness of marketing strategies and Performance Marketing Banks city of Sanandaj. Also, the results show that between all of the components of the effectiveness of marketing strategies and Performance Marketing (customer-philosophy (0.611, Integrated Marketing Effort (0.562, Marketing Information (0.361, Strategic orientation (0.573, Orientation Efficiency (0.678 there is a significant relationship.
Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D
2017-01-01
If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
Statistical cluster analysis and diagnosis of nuclear system level performance
International Nuclear Information System (INIS)
Teichmann, T.; Levine, M.M.; Samanta, P.K.; Kato, W.Y.
1985-01-01
The complexity of individual nuclear power plants and the importance of maintaining reliable and safe operations makes it desirable to complement the deterministic analyses of these plants by corresponding statistical surveys and diagnoses. Based on such investigations, one can then explore, statistically, the anticipation, prevention, and when necessary, the control of such failures and malfunctions. This paper, and the accompanying one by Samanta et al., describe some of the initial steps in exploring the feasibility of setting up such a program on an integrated and global (industry-wide) basis. The conceptual statistical and data framework was originally outlined in BNL/NUREG-51609, NUREG/CR-3026, and the present work aims at showing how some important elements might be implemented in a practical way (albeit using hypothetical or simulated data)
Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka
2015-01-01
The problem for establishing noninferiority is discussed between a new treatment and a standard (control) treatment with ordinal categorical data. A measure of treatment effect is used and a method of specifying noninferiority margin for the measure is provided. Two Z-type test statistics are proposed where the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of the deviation from nominal level and the power.
Statistical physics of fracture: scientific discovery through high-performance computing
International Nuclear Information System (INIS)
Kumar, Phani; Nukala, V V; Simunovic, Srdan; Mills, Richard T
2006-01-01
The paper presents the state-of-the-art algorithmic developments for simulating the fracture of disordered quasi-brittle materials using discrete lattice systems. Large scale simulations are often required to obtain accurate scaling laws; however, due to computational complexity, the simulations using the traditional algorithms were limited to small system sizes. We have developed two algorithms: a multiple sparse Cholesky downdating scheme for simulating 2D random fuse model systems, and a block-circulant preconditioner for simulating 2D random fuse model systems. Using these algorithms, we were able to simulate fracture of largest ever lattice system sizes (L = 1024 in 2D, and L = 64 in 3D) with extensive statistical sampling. Our recent simulations on 1024 processors of Cray-XT3 and IBM Blue-Gene/L have further enabled us to explore fracture of 3D lattice systems of size L = 200, which is a significant computational achievement. These largest ever numerical simulations have enhanced our understanding of physics of fracture; in particular, we analyze damage localization and its deviation from percolation behavior, scaling laws for damage density, universality of fracture strength distribution, size effect on the mean fracture strength, and finally the scaling of crack surface roughness
Avoiding Pitfalls in the Statistical Analysis of Heterogeneous Tumors
Directory of Open Access Journals (Sweden)
Judith-Anne W. Chapman
2009-01-01
Full Text Available Information about tumors is usually obtained from a single assessment of a tumor sample, performed at some point in the course of the development and progression of the tumor, with patient characteristics being surrogates for natural history context. Differences between cells within individual tumors (intratumor heterogeneity and between tumors of different patients (intertumor heterogeneity may mean that a small sample is not representative of the tumor as a whole, particularly for solid tumors which are the focus of this paper. This issue is of increasing importance as high-throughput technologies generate large multi-feature data sets in the areas of genomics, proteomics, and image analysis. Three potential pitfalls in statistical analysis are discussed (sampling, cut-points, and validation and suggestions are made about how to avoid these pitfalls.
Probability an introduction with statistical applications
Kinney, John J
2014-01-01
Praise for the First Edition""This is a well-written and impressively presented introduction to probability and statistics. The text throughout is highly readable, and the author makes liberal use of graphs and diagrams to clarify the theory."" - The StatisticianThoroughly updated, Probability: An Introduction with Statistical Applications, Second Edition features a comprehensive exploration of statistical data analysis as an application of probability. The new edition provides an introduction to statistics with accessible coverage of reliability, acceptance sampling, confidence intervals, h
Gika, Helen G; Theodoridis, Georgios A; Wilson, Ian D
2008-05-02
Typically following collection biological samples are kept in a freezer for periods ranging from a few days to several months before analysis. Experience has shown that in LC-MS-based metabonomics research the best analytical practice is to store samples as these are collected, complete the sample set and analyse it in a single run. However, this approach is prudent only if the samples stored in the refrigerator or in the freezer are stable. Another important issue is the stability of the samples following the freeze-thaw process. To investigate these matters urine samples were collected from 6 male volunteers and analysed by LC-MS and ultra-performance liquid chromatography (UPLC)-MS [in both positive and negative electrospray ionization (ESI)] on the day of collection or at intervals of up to 6 months storage at -20 degrees C and -80 degrees C. Other sets of these samples underwent a series of up to nine freeze-thaw cycles. The stability of samples kept at 4 degrees C in an autosampler for up to 6 days was also assessed, with clear differences appearing after 48h. Data was analysed using multivariate statistical analysis (principal component analysis). The results show that sample storage at both -20 and -80 degrees C appeared to ensure sample stability. Similarly up to nine freeze thaw cycles were without any apparent effect on the profile.
STATISTICAL TOOLS FOR CLASSIFYING GALAXY GROUP DYNAMICS
International Nuclear Information System (INIS)
Hou, Annie; Parker, Laura C.; Harris, William E.; Wilman, David J.
2009-01-01
The dynamical state of galaxy groups at intermediate redshifts can provide information about the growth of structure in the universe. We examine three goodness-of-fit tests, the Anderson-Darling (A-D), Kolmogorov, and χ 2 tests, in order to determine which statistical tool is best able to distinguish between groups that are relaxed and those that are dynamically complex. We perform Monte Carlo simulations of these three tests and show that the χ 2 test is profoundly unreliable for groups with fewer than 30 members. Power studies of the Kolmogorov and A-D tests are conducted to test their robustness for various sample sizes. We then apply these tests to a sample of the second Canadian Network for Observational Cosmology Redshift Survey (CNOC2) galaxy groups and find that the A-D test is far more reliable and powerful at detecting real departures from an underlying Gaussian distribution than the more commonly used χ 2 and Kolmogorov tests. We use this statistic to classify a sample of the CNOC2 groups and find that 34 of 106 groups are inconsistent with an underlying Gaussian velocity distribution, and thus do not appear relaxed. In addition, we compute velocity dispersion profiles (VDPs) for all groups with more than 20 members and compare the overall features of the Gaussian and non-Gaussian groups, finding that the VDPs of the non-Gaussian groups are distinct from those classified as Gaussian.
Statistical analysis of archeomagnetic samples of Teotihuacan, Mexico
Soler-Arechalde, A. M.
2012-12-01
Teotihuacan was the one of the most important metropolis of Mesoamerica during the Classic Period (1 to 600 AC). The city had a continuous growth in different stages that usually concluded with a ritual. Fire was an important element natives would burn entire structures. An example of this is the Quetzalcoatl pyramid in La Ciudadela (350 AC), it was burned and a new structure was built over it, also the Big Fire at 570 AC, that marks its end. These events are suitable to archaeomagnetic dating. The inclusion of ash in the stucco enhances the magnetic signal of detrital type that also allows us to make dating. This increases the number of samples to be processed as well as the number of dates. The samples have been analyzed according to their type: floor, wall, talud and painting and whether or not exposed to fire. Sequences of directions obtained in excavations in strict stratigraphic control will be shown. A sequence of images was used to analyze the improving of Teotihuacan secular variation curve through more than a decade of continuous work at the area.
Statistical study on the strength of structural materials and elements
International Nuclear Information System (INIS)
Blume, J.A.; Dalal, J.S.; Honda, K.K.
1975-07-01
Strength data for structural materials and elements including concrete, reinforcing steel, structural steel, plywood elements, reinforced concrete beams, reinforced concrete columns, brick masonry elements, and concrete masonry walls were statistically analyzed. Sample statistics were computed for these data, and distribution parameters were derived for normal, lognormal, and Weibull distributions. Goodness-of-fit tests were performed on these distributions. Most data, except those for masonry elements, displayed fairly small dispersion. Dispersion in data for structural materials was generally found to be smaller than for structural elements. Lognormal and Weibull distributions displayed better overall fits to data than normal distribution, although either Weibull or lognormal distribution can be used to represent the data analyzed. (auth)
Errors in patient specimen collection: application of statistical process control.
Dzik, Walter Sunny; Beckman, Neil; Selleng, Kathleen; Heddle, Nancy; Szczepiorkowski, Zbigniew; Wendel, Silvano; Murphy, Michael
2008-10-01
Errors in the collection and labeling of blood samples for pretransfusion testing increase the risk of transfusion-associated patient morbidity and mortality. Statistical process control (SPC) is a recognized method to monitor the performance of a critical process. An easy-to-use SPC method was tested to determine its feasibility as a tool for monitoring quality in transfusion medicine. SPC control charts were adapted to a spreadsheet presentation. Data tabulating the frequency of mislabeled and miscollected blood samples from 10 hospitals in five countries from 2004 to 2006 were used to demonstrate the method. Control charts were produced to monitor process stability. The participating hospitals found the SPC spreadsheet very suitable to monitor the performance of the sample labeling and collection and applied SPC charts to suit their specific needs. One hospital monitored subcategories of sample error in detail. A large hospital monitored the number of wrong-blood-in-tube (WBIT) events. Four smaller-sized facilities, each following the same policy for sample collection, combined their data on WBIT samples into a single control chart. One hospital used the control chart to monitor the effect of an educational intervention. A simple SPC method is described that can monitor the process of sample collection and labeling in any hospital. SPC could be applied to other critical steps in the transfusion processes as a tool for biovigilance and could be used to develop regional or national performance standards for pretransfusion sample collection. A link is provided to download the spreadsheet for free.
Nonparametric statistics for social and behavioral sciences
Kraska-MIller, M
2013-01-01
Introduction to Research in Social and Behavioral SciencesBasic Principles of ResearchPlanning for ResearchTypes of Research Designs Sampling ProceduresValidity and Reliability of Measurement InstrumentsSteps of the Research Process Introduction to Nonparametric StatisticsData AnalysisOverview of Nonparametric Statistics and Parametric Statistics Overview of Parametric Statistics Overview of Nonparametric StatisticsImportance of Nonparametric MethodsMeasurement InstrumentsAnalysis of Data to Determine Association and Agreement Pearson Chi-Square Test of Association and IndependenceContingency
Statistical prediction of Late Miocene climate
Digital Repository Service at National Institute of Oceanography (India)
Fernandes, A.A; Gupta, S.M.
by making certain simplifying assumptions; for example in modelling ocean 4 currents, the geostrophic approximation is made. In case of statistical prediction no such a priori assumption need be made. statistical prediction comprises of using observed data... the number of equations. In this case the equations are overdetermined, and therefore one has to look for a solution that best fits the sample data in a least squares sense. To this end we express the sample .... (2.1)+ ry = y + data as follows: n L c. (x...
The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.
Shen, Dan; Shen, Haipeng; Zhu, Hongtu; Marron, J S
2016-10-01
The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
Oxidant-antioxidant status in tissue samples of oral leukoplakia
Directory of Open Access Journals (Sweden)
Kumar Chandan Srivastava
2014-01-01
Full Text Available Background: Imbalances between the oxidant-antioxidant status have been implicated in the pathogenesis of several diseases, including oral cancer. Majority of oral cancer are preceded by a well-recognized group of pre-malignant lesions. However, only a small fraction of those lesions, undergo malignant transformation. Hence, there is a great need to identify biological markers, which will assist in identifying lesion carrying high-risk. This study aims to evaluate and compare the status of oxidative stress and antioxidant enzymes in tissue samples of patients with various clinicopathological stages of oral pre-malignancy. Materials and Methods: A case control study consisting of 20 new histopathologically proven leukoplakia patients and equal number of age, sex, and habit matched healthy subjects were recruited for this study. Their tissue samples were subjected to evaluation of lipid peroxidation product, thiobarbituric acid reactive substances and antioxidant enzymes, namely, superoxide dismutase (SOD, catalase (CAT, reduced glutathione (GSH, and glutathione peroxidase (GPx using spectrophotometric methods. The data are expressed as mean ± standard deviation. The statistical comparisons were performed by independent Student′s unpaired t-test and one-way analysis of variance. Pearson′s correlation was performed for the biochemical parameters within the group and between the groups. For statistically significant correlations, simple linear regression was performed. P- value < 0.05 was considered statistically significant. Results: Significant reduction in lipid peroxidation (P < 0.001 SOD and CAT (P < 0.001 was observed in the tissue of leukoplakia patients as compared to the healthy controls. On the other hand, GSH and GPx were significantly increased in tumor samples. Conclusion: Reduced lipid peroxidation and increased activity of GSH and GPx provides the suitable environment for the tumor growth and malignant transformation in the later
Understanding the Sampling Distribution and the Central Limit Theorem.
Lewis, Charla P.
The sampling distribution is a common source of misuse and misunderstanding in the study of statistics. The sampling distribution, underlying distribution, and the Central Limit Theorem are all interconnected in defining and explaining the proper use of the sampling distribution of various statistics. The sampling distribution of a statistic is…
Research design and statistical analysis
Myers, Jerome L; Lorch Jr, Robert F
2013-01-01
Research Design and Statistical Analysis provides comprehensive coverage of the design principles and statistical concepts necessary to make sense of real data. The book's goal is to provide a strong conceptual foundation to enable readers to generalize concepts to new research situations. Emphasis is placed on the underlying logic and assumptions of the analysis and what it tells the researcher, the limitations of the analysis, and the consequences of violating assumptions. Sampling, design efficiency, and statistical models are emphasized throughout. As per APA recommendations
USING STATISTICAL SURVEY IN ECONOMICS
Directory of Open Access Journals (Sweden)
Delia TESELIOS
2012-01-01
Full Text Available Statistical survey is an effective method of statistical investigation that involves gathering quantitative data, which is often preferred in statistical reports due to the information which can be obtained regarding the entire population studied by observing a part of it. Therefore, because of the information provided, polls are used in many research areas. In economics, statistics are used in the decision making process in choosing competitive strategies in the analysis of certain economic phenomena, the formulation of forecasts. Economic study presented in this paper is to illustrate how a simple random sampling is used to analyze the existing parking spaces situation in a given locality.
Environmental radionuclide concentrations: statistical model to determine uniformity of distribution
International Nuclear Information System (INIS)
Cawley, C.N.; Fenyves, E.J.; Spitzberg, D.B.; Wiorkowski, J.; Chehroudi, M.T.
1980-01-01
In the evaluation of data from environmental sampling and measurement, a basic question is whether the radionuclide (or pollutant) is distributed uniformly. Since physical measurements have associated errors, it is inappropriate to consider the measurements alone in this determination. Hence, a statistical model has been developed. It consists of a weighted analysis of variance with subsequent t-tests between weighted and independent means. A computer program to perform the calculations is included
Use of Statistical Heuristics in Everyday Inductive Reasoning.
Nisbett, Richard E.; And Others
1983-01-01
In everyday reasoning, people use statistical heuristics (judgmental tools that are rough intuitive equivalents of statistical principles). Use of statistical heuristics is more likely when (1) sampling is clear, (2) the role of chance is clear, (3) statistical reasoning is normative for the event, or (4) the subject has had training in…
International Nuclear Information System (INIS)
Bennett, J.T.; Crowder, C.A.; Connolly, M.J.
1994-01-01
Gas samples from drums of radioactive waste at the Department of Energy (DOE) Idaho National Engineering Laboratory are being characterized for 29 volatile organic compounds to determine the feasibility of storing the waste in DOE's Waste Isolation Pilot Plant (WIPP) in Carlsbad, New Mexico. Quality requirements for the gas chromatography (GC) and GC/mass spectrometry chemical methods used to analyze the waste are specified in the Quality Assurance Program Plan for the WIPP Experimental Waste Characterization Program. Quality requirements consist of both objective criteria (data quality objectives, DQOs) and statistical criteria (process control). The DQOs apply to routine sample analyses, while the statistical criteria serve to determine and monitor precision and accuracy (P ampersand A) of the analysis methods and are also used to assign upper confidence limits to measurement results close to action levels. After over two years and more than 1000 sample analyses there are two general conclusions concerning the two approaches to quality control: (1) Objective criteria (e.g., ± 25% precision, ± 30% accuracy) based on customer needs and the usually prescribed criteria for similar EPA- approved methods are consistently attained during routine analyses. (2) Statistical criteria based on short term method performance are almost an order of magnitude more stringent than objective criteria and are difficult to satisfy following the same routine laboratory procedures which satisfy the objective criteria. A more cost effective and representative approach to establishing statistical method performances criteria would be either to utilize a moving average of P ampersand A from control samples over a several month time period or to determine within a sample variation by one-way analysis of variance of several months replicate sample analysis results or both. Confidence intervals for results near action levels could also be determined by replicate analysis of the sample in
Estimation of global network statistics from incomplete data.
Directory of Open Access Journals (Sweden)
Catherine A Bliss
Full Text Available Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
Gómez, Miguel A; Lorenzo, Alberto; Barakat, Rubén; Ortega, Enrique; Palao, José M
2008-02-01
The aim of the present study was to identify game-related statistics that differentiate winning and losing teams according to game location. The sample included 306 games of the 2004-2005 regular season of the Spanish professional men's league (ACB League). The independent variables were game location (home or away) and game result (win or loss). The game-related statistics registered were free throws (successful and unsuccessful), 2- and 3-point field goals (successful and unsuccessful), offensive and defensive rebounds, blocks, assists, fouls, steals, and turnovers. Descriptive and inferential analyses were done (one-way analysis of variance and discriminate analysis). The multivariate analysis showed that winning teams differ from losing teams in defensive rebounds (SC = .42) and in assists (SC = .38). Similarly, winning teams differ from losing teams when they play at home in defensive rebounds (SC = .40) and in assists (SC = .41). On the other hand, winning teams differ from losing teams when they play away in defensive rebounds (SC = .44), assists (SC = .30), successful 2-point field goals (SC = .31), and unsuccessful 3-point field goals (SC = -.35). Defensive rebounds and assists were the only game-related statistics common to all three analyses.
Pseudo-populations a basic concept in statistical surveys
Quatember, Andreas
2015-01-01
This book emphasizes that artificial or pseudo-populations play an important role in statistical surveys from finite universes in two manners: firstly, the concept of pseudo-populations may substantially improve users’ understanding of various aspects in the sampling theory and survey methodology; an example of this scenario is the Horvitz-Thompson estimator. Secondly, statistical procedures exist in which pseudo-populations actually have to be generated. An example of such a scenario can be found in simulation studies in the field of survey sampling, where close-to-reality pseudo-populations are generated from known sample and population data to form the basis for the simulation process. The chapters focus on estimation methods, sampling techniques, nonresponse, questioning designs and statistical disclosure control.This book is a valuable reference in understanding the importance of the pseudo-population concept and applying it in teaching and research.
Using robust statistics to improve neutron activation analysis results
International Nuclear Information System (INIS)
Zahn, Guilherme S.; Genezini, Frederico A.; Ticianelli, Regina B.; Figueiredo, Ana Maria G.
2011-01-01
Neutron activation analysis (NAA) is an analytical technique where an unknown sample is submitted to a neutron flux in a nuclear reactor, and its elemental composition is calculated by measuring the induced activity produced. By using the relative NAA method, one or more well-characterized samples (usually certified reference materials - CRMs) are irradiated together with the unknown ones, and the concentration of each element is then calculated by comparing the areas of the gamma ray peaks related to that element. When two or more CRMs are used as reference, the concentration of each element can be determined by several different ways, either using more than one gamma ray peak for that element (when available), or using the results obtained in the comparison with each CRM. Therefore, determining the best estimate for the concentration of each element in the sample can be a delicate issue. In this work, samples from three CRMs were irradiated together and the elemental concentration in one of them was calculated using the other two as reference. Two sets of peaks were analyzed for each element: a smaller set containing only the literature-recommended gamma-ray peaks and a larger one containing all peaks related to that element that could be quantified in the gamma-ray spectra; the most recommended transition was also used as a benchmark. The resulting data for each element was then reduced using up to five different statistical approaches: the usual (and not robust) unweighted and weighted means, together with three robust means: the Limitation of Relative Statistical Weight, Normalized Residuals and Rajeval. The resulting concentration values were then compared to the certified value for each element, allowing for discussion on both the performance of each statistical tool and on the best choice of peaks for each element. (author)
Final Sampling and Analysis Plan for Background Sampling, Fort Sheridan, Illinois
National Research Council Canada - National Science Library
1995-01-01
.... This Background Sampling and Analysis Plan (BSAP) is designed to address this issue through the collection of additional background samples at Fort Sheridan to support the statistical analysis and the Baseline Risk Assessment (BRA...
International Nuclear Information System (INIS)
Yahya, Noorazrul; Ebert, Martin A.; Bulsara, Max; House, Michael J.; Kennedy, Angel; Joseph, David J.; Denham, James W.
2016-01-01
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions
Energy Technology Data Exchange (ETDEWEB)
Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au [School of Physics, University of Western Australia, Western Australia 6009, Australia and School of Health Sciences, National University of Malaysia, Bangi 43600 (Malaysia); Ebert, Martin A. [School of Physics, University of Western Australia, Western Australia 6009, Australia and Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Bulsara, Max [Institute for Health Research, University of Notre Dame, Fremantle, Western Australia 6959 (Australia); House, Michael J. [School of Physics, University of Western Australia, Western Australia 6009 (Australia); Kennedy, Angel [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Joseph, David J. [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008, Australia and School of Surgery, University of Western Australia, Western Australia 6009 (Australia); Denham, James W. [School of Medicine and Public Health, University of Newcastle, New South Wales 2308 (Australia)
2016-05-15
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions
Writing to Learn Statistics in an Advanced Placement Statistics Course
Northrup, Christian Glenn
2012-01-01
This study investigated the use of writing in a statistics classroom to learn if writing provided a rich description of problem-solving processes of students as they solved problems. Through analysis of 329 written samples provided by students, it was determined that writing provided a rich description of problem-solving processes and enabled…
Directory of Open Access Journals (Sweden)
Scott Morris
Full Text Available Next generation sequencing tests (NGS are usually performed on relatively small core biopsy or fine needle aspiration (FNA samples. Data is limited on what amount of tumor by volume or minimum number of FNA passes are needed to yield sufficient material for running NGS. We sought to identify the amount of tumor for running the PCDx NGS platform.2,723 consecutive tumor tissues of all cancer types were queried and reviewed for inclusion. Information on tumor volume, success of performing NGS, and results of NGS were compiled. Assessment of sequence analysis, mutation calling and sensitivity, quality control, drug associations, and data aggregation and analysis were performed.6.4% of samples were rejected from all testing due to insufficient tumor quantity. The number of genes with insufficient sensitivity make definitive mutation calls increased as the percentage of tumor decreased, reaching statistical significance below 5% tumor content. The number of drug associations also decreased with a lower percentage of tumor, but this difference only became significant between 1-3%. The number of drug associations did decrease with smaller tissue size as expected. Neither specimen size or percentage of tumor affected the ability to pass mRNA quality control. A tumor area of 10 mm2 provides a good margin of error for specimens to yield adequate drug association results.Specimen suitability remains a major obstacle to clinical NGS testing. We determined that PCR-based library creation methods allow the use of smaller specimens, and those with a lower percentage of tumor cells to be run on the PCDx NGS platform.
A comparison of fitness-case sampling methods for genetic programming
Martínez, Yuliana; Naredo, Enrique; Trujillo, Leonardo; Legrand, Pierrick; López, Uriel
2017-11-01
Genetic programming (GP) is an evolutionary computation paradigm for automatic program induction. GP has produced impressive results but it still needs to overcome some practical limitations, particularly its high computational cost, overfitting and excessive code growth. Recently, many researchers have proposed fitness-case sampling methods to overcome some of these problems, with mixed results in several limited tests. This paper presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world data-sets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. Comparisons are carried out using non-parametric multigroup tests and post hoc pairwise statistical tests. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification.
The large sample size fallacy.
Lantz, Björn
2013-06-01
Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.
International Nuclear Information System (INIS)
Potter, G.L.; Ellsaesser, H.W.; MacCracken, M.C.; Luther, F.M.
1978-06-01
Results from the zonal model indicate quite reasonable agreement with observation in terms of the parameters and processes that influence the radiation and energy balance calculations. The model produces zonal statistics similar to those from general circulation models, and has also been shown to produce similar responses in sensitivity studies. Further studies of model performance are planned, including: comparison with July data; comparison of temperature and moisture transport and wind fields for winter and summer months; and a tabulation of atmospheric energetics. Based on these preliminary performance studies, however, it appears that the zonal model can be used in conjunction with more complex models to help unravel the problems of understanding the processes governing present climate and climate change. As can be seen in the subsequent paper on model sensitivity studies, in addition to reduced cost of computation, the zonal model facilitates analysis of feedback mechanisms and simplifies analysis of the interactions between processes
Directory of Open Access Journals (Sweden)
Brion Philippe
2015-12-01
Full Text Available Using as much administrative data as possible is a general trend among most national statistical institutes. Different kinds of administrative sources, from tax authorities or other administrative bodies, are very helpful material in the production of business statistics. However, these sources often have to be completed by information collected through statistical surveys. This article describes the way Insee has implemented such a strategy in order to produce French structural business statistics. The originality of the French procedure is that administrative and survey variables are used jointly for the same enterprises, unlike the majority of multisource systems, in which the two kinds of sources generally complement each other for different categories of units. The idea is to use, as much as possible, the richness of the administrative sources combined with the timeliness of a survey, even if the latter is conducted only on a sample of enterprises. One main issue is the classification of enterprises within the NACE nomenclature, which is a cornerstone variable in producing the breakdown of the results by industry. At a given date, two values of the corresponding code may coexist: the value of the register, not necessarily up to date, and the value resulting from the data collected via the survey, but only from a sample of enterprises. Using all this information together requires the implementation of specific statistical estimators combining some properties of the difference estimators with calibration techniques. This article presents these estimators, as well as their statistical properties, and compares them with those of other methods.
Statistical universals reveal the structures and functions of human music.
Savage, Patrick E; Brown, Steven; Sakai, Emi; Currie, Thomas E
2015-07-21
Music has been called "the universal language of mankind." Although contemporary theories of music evolution often invoke various musical universals, the existence of such universals has been disputed for decades and has never been empirically demonstrated. Here we combine a music-classification scheme with statistical analyses, including phylogenetic comparative methods, to examine a well-sampled global set of 304 music recordings. Our analyses reveal no absolute universals but strong support for many statistical universals that are consistent across all nine geographic regions sampled. These universals include 18 musical features that are common individually as well as a network of 10 features that are commonly associated with one another. They span not only features related to pitch and rhythm that are often cited as putative universals but also rarely cited domains including performance style and social context. These cross-cultural structural regularities of human music may relate to roles in facilitating group coordination and cohesion, as exemplified by the universal tendency to sing, play percussion instruments, and dance to simple, repetitive music in groups. Our findings highlight the need for scientists studying music evolution to expand the range of musical cultures and musical features under consideration. The statistical universals we identified represent important candidates for future investigation.
Abbey, Craig K.; Samuelson, Frank W.; Gallas, Brandon D.; Boone, John M.; Niklason, Loren T.
2013-03-01
The receiver operating characteristic (ROC) curve has become a common tool for evaluating diagnostic imaging technologies, and the primary endpoint of such evaluations is the area under the curve (AUC), which integrates sensitivity over the entire false positive range. An alternative figure of merit for ROC studies is expected utility (EU), which focuses on the relevant region of the ROC curve as defined by disease prevalence and the relative utility of the task. However if this measure is to be used, it must also have desirable statistical properties keep the burden of observer performance studies as low as possible. Here, we evaluate effect size and variability for EU and AUC. We use two observer performance studies recently submitted to the FDA to compare the EU and AUC endpoints. The studies were conducted using the multi-reader multi-case methodology in which all readers score all cases in all modalities. ROC curves from the study were used to generate both the AUC and EU values for each reader and modality. The EU measure was computed assuming an iso-utility slope of 1.03. We find mean effect sizes, the reader averaged difference between modalities, to be roughly 2.0 times as big for EU as AUC. The standard deviation across readers is roughly 1.4 times as large, suggesting better statistical properties for the EU endpoint. In a simple power analysis of paired comparison across readers, the utility measure required 36% fewer readers on average to achieve 80% statistical power compared to AUC.
Comparison of the performances of the CS model coil and the Good Joint SULTAN sample
International Nuclear Information System (INIS)
Wesche, Rainer; Herzog, Robert; Bruzzone, Pierluigi
2008-01-01
The relevance of short sample measurements in SULTAN for the prediction of the performance of the coils of the International Thermonuclear Experimental Reactor (ITER) is assessed using the case of the Nb 3 Sn high-field central solenoid model coil (CSMC) conductor, for which both coil performance and short sample SULTAN results (Good Joint (GJ) sample) are available. A least-squares fit procedure, based on a uniform current distribution among the strands and the Durham scaling relations for the field, temperature and strain dependences of the strand J c provides a thermal strain of -0.294% and a degradation factor of approximately 60% for the GJ sample. In the calculation of the voltage along Layer 1A of the CSMC the hoop stress and the variation of the magnetic field in the conductor cross-section were taken into account. The temperature profile, used in the calculations, is based on published temperature profiles and empirical relations between helium inlet and outlet temperatures. A comparison with the GJ results indicates that short sample measurements in SULTAN provide a conservative estimate of the coil performance
Use Of R in Statistics Lithuania
Directory of Open Access Journals (Sweden)
Tomas Rudys
2016-06-01
Full Text Available Recently R becoming more and more popular among official statistics offices. It can be used not even for research purposes, but also for a production of official statistics. Statistics Lithuania recently started an analysis of possibilities where R can be used and could it replace some other statistical programming languages or systems. For this reason a work group was arranged. In the paper we will present overview of the current situation on implementation of R in Statistics Lithuania, some problems we are chasing with and some future plans. At the current situation R is used mainly for research purposes. Looking for- ward a short courses on basic R was prepared and at the moment we are starting to use R for data analysis, data manipulation from Oracle data bases, some reports preparation, data editing, survey estimation. On the other hand we found some problems working with big data sets, also survey sampling as there are surveys with complex sampling designs. We are also analysing the running of R on our servers in order to have possibilities to use more random access memory (RAM. Despite the problems, we are trying to use R in more fields in production of official statistics.
Fisher statistics for analysis of diffusion tensor directional information.
Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P
2012-04-30
A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (pstatistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.
Evaluation of analytical results on DOE Quality Assessment Program Samples
International Nuclear Information System (INIS)
Jaquish, R.E.; Kinnison, R.R.; Mathur, S.P.; Sastry, R.
1985-01-01
Criteria were developed for evaluating the participants analytical results in the DOE Quality Assessment Program (QAP). Historical data from previous QAP studies were analyzed using descriptive statistical methods to determine the interlaboratory precision that had been attained. Performance criteria used in other similar programs were also reviewed. Using these data, precision values and control limits were recommended for each type of analysis performed in the QA program. Results of the analysis performed by the QAP participants on the November 1983 samples were statistically analyzed and evaluated. The Environmental Measurements Laboratory (EML) values were used as the known values and 3-sigma precision values were used as control limits. Results were submitted by 26 participating laboratories for 49 different radionuclide media combinations. The participants reported 419 results and of these, 350 or 84% were within control limits. Special attention was given to the data from gamma spectral analysis of air filters and water samples. both normal probability and box plots were prepared for each nuclide to help evaluate the distribution of the data. Results that were outside the expected range were identified and suggestions made that laboratories check calculations, and procedures on these results
Cao, Yan; Wang, Shaozhan; Li, Yinghua; Chen, Xiaofei; Chen, Langdong; Wang, Dongyao; Zhu, Zhenyu; Yuan, Yongfang; Lv, Diya
2018-03-09
Cell membrane chromatography (CMC) has been successfully applied to screen bioactive compounds from Chinese herbs for many years, and some offline and online two-dimensional (2D) CMC-high performance liquid chromatography (HPLC) hyphenated systems have been established to perform screening assays. However, the requirement of sample preparation steps for the second-dimensional analysis in offline systems and the need for an interface device and technical expertise in the online system limit their extensive use. In the present study, an offline 2D CMC-HPLC analysis combined with the XCMS (various forms of chromatography coupled to mass spectrometry) Online statistical tool for data processing was established. First, our previously reported online 2D screening system was used to analyze three Chinese herbs that were reported to have potential anti-inflammatory effects, and two binding components were identified. By contrast, the proposed offline 2D screening method with XCMS Online analysis was applied, and three more ingredients were discovered in addition to the two compounds revealed by the online system. Then, cross-validation of the three compounds was performed, and they were confirmed to be included in the online data as well, but were not identified there because of their low concentrations and lack of credible statistical approaches. Last, pharmacological experiments showed that these five ingredients could inhibit IL-6 release and IL-6 gene expression on LPS-induced RAW cells in a dose-dependent manner. Compared with previous 2D CMC screening systems, this newly developed offline 2D method needs no sample preparation steps for the second-dimensional analysis, and it is sensitive, efficient, and convenient. It will be applicable in identifying active components from Chinese herbs and practical in discovery of lead compounds derived from herbs. Copyright © 2018 Elsevier B.V. All rights reserved.
DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT
Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...
Identification of mine waters by statistical multivariate methods
Energy Technology Data Exchange (ETDEWEB)
Mali, N [IGGG, Ljubljana (Slovenia)
1992-01-01
Three water-bearing aquifers are present in the Velenje lignite mine. The aquifer waters have differing chemical composition; a geochemical water analysis can therefore determine the source of mine water influx. Mine water samples from different locations in the mine were analyzed, the results of chemical content and of electric conductivity of mine water were statistically processed by means of MICROGAS, SPSS-X and IN STATPAC computer programs, which apply three multivariate statistical methods (discriminate, cluster and factor analysis). Reliability of calculated values was determined with the Kolmogorov and Smirnov tests. It is concluded that laboratory analysis of single water samples can produce measurement errors, but statistical processing of water sample data can identify origin and movement of mine water. 15 refs.
Statistical Model of Extreme Shear
DEFF Research Database (Denmark)
Larsen, Gunner Chr.; Hansen, Kurt Schaldemose
2004-01-01
In order to continue cost-optimisation of modern large wind turbines, it is important to continously increase the knowledge on wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describe the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements...... are consistent, given the inevitabel uncertainties associated with model as well as with the extreme value data analysis. Keywords: Statistical model, extreme wind conditions, statistical analysis, turbulence, wind loading, statistical analysis, turbulence, wind loading, wind shear, wind turbines....
Estimating HIES Data through Ratio and Regression Methods for Different Sampling Designs
Directory of Open Access Journals (Sweden)
Faqir Muhammad
2007-01-01
Full Text Available In this study, comparison has been made for different sampling designs, using the HIES data of North West Frontier Province (NWFP for 2001-02 and 1998-99 collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators has also been considered using bootstrap and Jacknife. A two-stage stratified random sample design is adopted by HIES. In the first stage, enumeration blocks and villages are treated as the first stage Primary Sampling Units (PSU. The sample PSU’s are selected with probability proportional to size. Secondary Sampling Units (SSU i.e., households are selected by systematic sampling with a random start. They have used a single study variable. We have compared the HIES technique with some other designs, which are: Stratified Simple Random Sampling. Stratified Systematic Sampling. Stratified Ranked Set Sampling. Stratified Two Phase Sampling. Ratio and Regression methods were applied with two study variables, which are: Income (y and Household sizes (x. Jacknife and Bootstrap are used for variance replication. Simple Random Sampling with sample size (462 to 561 gave moderate variances both by Jacknife and Bootstrap. By applying Systematic Sampling, we received moderate variance with sample size (467. In Jacknife with Systematic Sampling, we obtained variance of regression estimator greater than that of ratio estimator for a sample size (467 to 631. At a sample size (952 variance of ratio estimator gets greater than that of regression estimator. The most efficient design comes out to be Ranked set sampling compared with other designs. The Ranked set sampling with jackknife and bootstrap, gives minimum variance even with the smallest sample size (467. Two Phase sampling gave poor performance. Multi-stage sampling applied by HIES gave large variances especially if used with a single study variable.
Gyarmathy, V Anna; Johnston, Lisa G; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A
2014-02-01
Respondent driven sampling (RDS) and incentivized snowball sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania ("original sample") to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1-12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called "strudel effect" is discussed in the paper. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Oliveira Mendes, Thiago de; Pinto, Liliane Pereira; Santos, Laurita dos; Tippavajhala, Vamshi Krishna; Téllez Soto, Claudio Alberto; Martin, Airton Abrahão
2016-07-01
The analysis of biological systems by spectroscopic techniques involves the evaluation of hundreds to thousands of variables. Hence, different statistical approaches are used to elucidate regions that discriminate classes of samples and to propose new vibrational markers for explaining various phenomena like disease monitoring, mechanisms of action of drugs, food, and so on. However, the technical statistics are not always widely discussed in applied sciences. In this context, this work presents a detailed discussion including the various steps necessary for proper statistical analysis. It includes univariate parametric and nonparametric tests, as well as multivariate unsupervised and supervised approaches. The main objective of this study is to promote proper understanding of the application of various statistical tools in these spectroscopic methods used for the analysis of biological samples. The discussion of these methods is performed on a set of in vivo confocal Raman spectra of human skin analysis that aims to identify skin aging markers. In the Appendix, a complete routine of data analysis is executed in a free software that can be used by the scientific community involved in these studies.
The matchmaking paradox: a statistical explanation
International Nuclear Information System (INIS)
Eliazar, Iddo I; Sokolov, Igor M
2010-01-01
Medical surveys regarding the number of heterosexual partners per person yield different female and male averages-a result which, from a physical standpoint, is impossible. In this paper we term this puzzle the 'matchmaking paradox', and establish a statistical model explaining it. We consider a bipartite graph with N male and N female nodes (N >> 1), and B bonds connecting them (B >> 1). Each node is associated a random 'attractiveness level', and the bonds connect to the nodes randomly-with probabilities which are proportionate to the nodes' attractiveness levels. The population's average bonds-per-nodes B/N is estimated via a sample average calculated from a survey of size n (n >> 1). A comprehensive statistical analysis of this model is carried out, asserting that (i) the sample average well estimates the population average if and only if the attractiveness levels possess a finite mean; (ii) if the attractiveness levels are governed by a 'fat-tailed' probability law then the sample average displays wild fluctuations and strong skew-thus providing a statistical explanation to the matchmaking paradox.
Effect of Task Presentation on Students' Performances in Introductory Statistics Courses
Tomasetto, Carlo; Matteucci, Maria Cristina; Carugati, Felice; Selleri, Patrizia
2009-01-01
Research on academic learning indicates that many students experience major difficulties with introductory statistics and methodology courses. We hypothesized that students' difficulties may depend in part on the fact that statistics tasks are commonly viewed as related to the threatening domain of math. In two field experiments which we carried…
Comparison of sampling techniques for Bayesian parameter estimation
Allison, Rupert; Dunkley, Joanna
2014-02-01
The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hasting sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.
Bias expansion of spatial statistics and approximation of differenced ...
Indian Academy of Sciences (India)
Investigations of spatial statistics, computed from lattice data in the plane, can lead to a special lattice point counting problem. The statistical goal is to expand the asymptotic expectation or large-sample bias of certain spatial covariance estimators, where this bias typically depends on the shape of a spatial sampling region.
Torimitsu, Suguru; Nishida, Yoshifumi; Takano, Tachio; Koizumi, Yoshinori; Makino, Yohsuke; Yajima, Daisuke; Hayakawa, Mutsumi; Inokuchi, Go; Motomura, Ayumi; Chiba, Fumiko; Otsuka, Katsura; Kobayashi, Kazuhiro; Odo, Yuriko; Iwase, Hirotaro
2014-01-01
The purpose of this research was to investigate the biomechanical properties of the adult human skull and the structural changes that occur with age in both sexes. The heads of 94 Japanese cadavers (54 male cadavers, 40 female cadavers) autopsied in our department were used in this research. A total of 376 cranial samples, four from each skull, were collected. Sample fracture load was measured by a bending test. A statistically significant negative correlation between the sample fracture load and cadaver age was found. This indicates that the stiffness of cranial bones in Japanese individuals decreases with age, and the risk of skull fracture thus probably increases with age. Prior to the bending test, the sample mass, the sample thickness, the ratio of the sample thickness to cadaver stature (ST/CS), and the sample density were measured and calculated. Significant negative correlations between cadaver age and sample thickness, ST/CS, and the sample density were observed only among the female samples. Computerized tomographic (CT) images of 358 cranial samples were available. The computed tomography value (CT value) of cancellous bone which refers to a quantitative scale for describing radiodensity, cancellous bone thickness and cortical bone thickness were measured and calculated. Significant negative correlation between cadaver age and the CT value or cortical bone thickness was observed only among the female samples. These findings suggest that the skull is substantially affected by decreased bone metabolism resulting from osteoporosis. Therefore, osteoporosis prevention and treatment may increase cranial stiffness and reinforce the skull structure, leading to a decrease in the risk of skull fractures. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Small Scale Mixing Demonstration Batch Transfer and Sampling Performance of Simulated HLW - 12307
Energy Technology Data Exchange (ETDEWEB)
Jensen, Jesse; Townson, Paul; Vanatta, Matt [EnergySolutions, Engineering and Technology Group, Richland, WA, 99354 (United States)
2012-07-01
The ability to effectively mix, sample, certify, and deliver consistent batches of High Level Waste (HLW) feed from the Hanford Double Shell Tanks (DST) to the Waste treatment Plant (WTP) has been recognized as a significant mission risk with potential to impact mission length and the quantity of HLW glass produced. At the end of 2009 DOE's Tank Operations Contractor, Washington River Protection Solutions (WRPS), awarded a contract to EnergySolutions to design, fabricate and operate a demonstration platform called the Small Scale Mixing Demonstration (SSMD) to establish pre-transfer sampling capacity, and batch transfer performance data at two different scales. This data will be used to examine the baseline capacity for a tank mixed via rotational jet mixers to transfer consistent or bounding batches, and provide scale up information to predict full scale operational performance. This information will then in turn be used to define the baseline capacity of such a system to transfer and sample batches sent to WTP. The Small Scale Mixing Demonstration (SSMD) platform consists of 43'' and 120'' diameter clear acrylic test vessels, each equipped with two scaled jet mixer pump assemblies, and all supporting vessels, controls, services, and simulant make up facilities. All tank internals have been modeled including the air lift circulators (ALCs), the steam heating coil, and the radius between the wall and floor. The test vessels are set up to simulate the transfer of HLW out of a mixed tank, and collect a pre-transfer sample in a manner similar to the proposed baseline configuration. The collected material is submitted to an NQA-1 laboratory for chemical analysis. Previous work has been done to assess tank mixing performance at both scales. This work involved a combination of unique instruments to understand the three dimensional distribution of solids using a combination of Coriolis meter measurements, in situ chord length distribution
Mathematical Anxiety among Business Statistics Students.
High, Robert V.
A survey instrument was developed to identify sources of mathematics anxiety among undergraduate business students in a statistics class. A number of statistics classes were selected at two colleges in Long Island, New York. A final sample of n=102 respondents indicated that there was a relationship between the mathematics grade in prior…
International Nuclear Information System (INIS)
Oh, Jung Su; Lee, Jae Sung; Kim, Yu Kyeong; Chung, June Key; Lee, Myung Chul; Lee, Dong Soo
2005-01-01
In the statistical probabilistic mapping, commonly, differences between two or more groups of subjects are statistically analyzed following spatial normalization. However, to our best knowledge, there is few study which performed the statistical mapping in the individual brain space rather than in the stereotaxic brain space, i.e., template space. Therefore, in the current study, a new method for mapping the statistical results in the template space onto individual brain space has been developed. Four young subjects with epilepsy and their age-matched thirty normal healthy subjects were recruited. Both FDG PET and T1 structural MRI was scanned in these groups. Statistical analysis on the decreased FDG metabolism in epilepsy was performed on the SPM with two sample t-test (p < 0.001, intensity threshold 100). To map the statistical results onto individual space, inverse deformation was performed as follows. With SPM deformation toolbox, DCT (discrete cosine transform) basis-encoded deformation fields between individual T1 images and T1 MNI template were obtained. Afterward, inverse of those fields, i.e., inverse deformation fields were obtained. Since both PET and T1 images have been already normalized in the same MNI space, inversely deformed results in PET is on the individual brain MRI space. By applying inverse deformation field on the statistical results of the PET, the statistical map of decreased metabolism in individual spaces were obtained. With statistical results in the template space, localization of decreased metabolism was in the inferior temporal lobe, which was slightly inferior to the hippocampus. The statistical results in the individual space were commonly located in the hippocampus, where the activation should be decreased according to a priori knowledge of neuroscience. With our newly developed statistical mapping on the individual spaces, the localization of the brain functional mapping became more appropriate in the sense of neuroscience
Energy Technology Data Exchange (ETDEWEB)
Oh, Jung Su; Lee, Jae Sung; Kim, Yu Kyeong; Chung, June Key; Lee, Myung Chul; Lee, Dong Soo [Seoul National University Hospital, Seoul (Korea, Republic of)
2005-07-01
In the statistical probabilistic mapping, commonly, differences between two or more groups of subjects are statistically analyzed following spatial normalization. However, to our best knowledge, there is few study which performed the statistical mapping in the individual brain space rather than in the stereotaxic brain space, i.e., template space. Therefore, in the current study, a new method for mapping the statistical results in the template space onto individual brain space has been developed. Four young subjects with epilepsy and their age-matched thirty normal healthy subjects were recruited. Both FDG PET and T1 structural MRI was scanned in these groups. Statistical analysis on the decreased FDG metabolism in epilepsy was performed on the SPM with two sample t-test (p < 0.001, intensity threshold 100). To map the statistical results onto individual space, inverse deformation was performed as follows. With SPM deformation toolbox, DCT (discrete cosine transform) basis-encoded deformation fields between individual T1 images and T1 MNI template were obtained. Afterward, inverse of those fields, i.e., inverse deformation fields were obtained. Since both PET and T1 images have been already normalized in the same MNI space, inversely deformed results in PET is on the individual brain MRI space. By applying inverse deformation field on the statistical results of the PET, the statistical map of decreased metabolism in individual spaces were obtained. With statistical results in the template space, localization of decreased metabolism was in the inferior temporal lobe, which was slightly inferior to the hippocampus. The statistical results in the individual space were commonly located in the hippocampus, where the activation should be decreased according to a priori knowledge of neuroscience. With our newly developed statistical mapping on the individual spaces, the localization of the brain functional mapping became more appropriate in the sense of neuroscience.
Working with sample data exploration and inference
Chaffe-Stengel, Priscilla
2014-01-01
Managers and analysts routinely collect and examine key performance measures to better understand their operations and make good decisions. Being able to render the complexity of operations data into a coherent account of significant events requires an understanding of how to work well with raw data and to make appropriate inferences. Although some statistical techniques for analyzing data and making inferences are sophisticated and require specialized expertise, there are methods that are understandable and applicable by anyone with basic algebra skills and the support of a spreadsheet package. By applying these fundamental methods themselves rather than turning over both the data and the responsibility for analysis and interpretation to an expert, managers will develop a richer understanding and potentially gain better control over their environment. This text is intended to describe these fundamental statistical techniques to managers, data analysts, and students. Statistical analysis of sample data is enh...
Comparison of Statistical Methods for Detector Testing Programs
Energy Technology Data Exchange (ETDEWEB)
Rennie, John Alan [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Abhold, Mark [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-10-14
A typical goal for any detector testing program is to ascertain not only the performance of the detector systems under test, but also the confidence that systems accepted using that testing program’s acceptance criteria will exceed a minimum acceptable performance (which is usually expressed as the minimum acceptable success probability, p). A similar problem often arises in statistics, where we would like to ascertain the fraction, p, of a population of items that possess a property that may take one of two possible values. Typically, the problem is approached by drawing a fixed sample of size n, with the number of items out of n that possess the desired property, x, being termed successes. The sample mean gives an estimate of the population mean p ≈ x/n, although usually it is desirable to accompany such an estimate with a statement concerning the range within which p may fall and the confidence associated with that range. Procedures for establishing such ranges and confidence limits are described in detail by Clopper, Brown, and Agresti for two-sided symmetric confidence intervals.
Applied statistics for economists
Lewis, Margaret
2012-01-01
This book is an undergraduate text that introduces students to commonly-used statistical methods in economics. Using examples based on contemporary economic issues and readily-available data, it not only explains the mechanics of the various methods, it also guides students to connect statistical results to detailed economic interpretations. Because the goal is for students to be able to apply the statistical methods presented, online sources for economic data and directions for performing each task in Excel are also included.
International Nuclear Information System (INIS)
Brodsky, A.
1986-04-01
This report provides statistical concepts and formulas for defining minimum detectable amount (MDA), bias and precision of sample analytical measurements of radioactivity for radiobioassay purposes. The defined statistical quantities and accuracy criteria were developed for use in standard performance criteria for radiobioassay, but are also useful in intralaboratory quality assurance programs. This report also includes a literature review and analysis of accuracy needs and accuracy recommendations of national and international scientific organizations for radiation or radioactivity measurements used for radiation protection purposes. Computer programs are also included for calculating the probabilities of passing or failing multiple analytical tests for different acceptable ranges of bias and precision
Nam, Sungsik
2011-08-01
Spread spectrum receivers with generalized selection combining (GSC) RAKE reception were proposed and have been studied as alternatives to the classical two fundamental schemes: maximal ratio combining and selection combining because the number of diversity paths increases with the transmission bandwidth. Previous work on performance analyses of GSC RAKE receivers based on the signal to noise ratio focused on the development of methodologies to derive exact closed-form expressions for various performance measures. However, some open problems related to the performance evaluation of GSC RAKE receivers still remain to be solved such as the exact performance analysis of the capture probability and an exact assessment of the impact of self-interference on GSC RAKE receivers. The major difficulty in these problems is to derive some joint statistics of ordered exponential variates. With this motivation in mind, we capitalize in this paper on some new order statistics results to derive exact closed-form expressions for the capture probability and outage probability of GSC RAKE receivers subject to self-interference over independent and identically distributed Rayleigh fading channels, and compare it to that of partial RAKE receivers. © 2011 IEEE.
Galaxies distribution in the universe: large-scale statistics and structures
International Nuclear Information System (INIS)
Maurogordato, Sophie
1988-01-01
This research thesis addresses the distribution of galaxies in the Universe, and more particularly large scale statistics and structures. Based on an assessment of the main used statistical techniques, the author outlines the need to develop additional tools to correlation functions in order to characterise the distribution. She introduces a new indicator: the probability of a volume randomly tested in the distribution to be void. This allows a characterisation of void properties at the work scales (until 10h"-"1 Mpc) in the Harvard Smithsonian Center for Astrophysics Redshift Survey, or CfA catalog. A systematic analysis of statistical properties of different sub-samples has then been performed with respect to the size and location, luminosity class, and morphological type. This analysis is then extended to different scenarios of structure formation. A program of radial speed measurements based on observations allows the determination of possible relationships between apparent structures. The author also presents results of the search for south extensions of Perseus supernova [fr
Automated statistical modeling of analytical measurement systems
International Nuclear Information System (INIS)
Jacobson, J.J.
1992-01-01
The statistical modeling of analytical measurement systems at the Idaho Chemical Processing Plant (ICPP) has been completely automated through computer software. The statistical modeling of analytical measurement systems is one part of a complete quality control program used by the Remote Analytical Laboratory (RAL) at the ICPP. The quality control program is an integration of automated data input, measurement system calibration, database management, and statistical process control. The quality control program and statistical modeling program meet the guidelines set forth by the American Society for Testing Materials and American National Standards Institute. A statistical model is a set of mathematical equations describing any systematic bias inherent in a measurement system and the precision of a measurement system. A statistical model is developed from data generated from the analysis of control standards. Control standards are samples which are made up at precise known levels by an independent laboratory and submitted to the RAL. The RAL analysts who process control standards do not know the values of those control standards. The object behind statistical modeling is to describe real process samples in terms of their bias and precision and, to verify that a measurement system is operating satisfactorily. The processing of control standards gives us this ability
Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr
2012-01-01
Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task specially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.
An experimental verification of laser-velocimeter sampling bias and its correction
Johnson, D. A.; Modarress, D.; Owen, F. K.
1982-01-01
The existence of 'sampling bias' in individual-realization laser velocimeter measurements is experimentally verified and shown to be independent of sample rate. The experiments were performed in a simple two-stream mixing shear flow with the standard for comparison being laser-velocimeter results obtained under continuous-wave conditions. It is also demonstrated that the errors resulting from sampling bias can be removed by a proper interpretation of the sampling statistics. In addition, data obtained in a shock-induced separated flow and in the near-wake of airfoils are presented, both bias-corrected and uncorrected, to illustrate the effects of sampling bias in the extreme.
Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.
Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M
2014-01-01
Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.
Angeler, David G; Viedma, Olga; Moreno, José M
2009-11-01
Time lag analysis (TLA) is a distance-based approach used to study temporal dynamics of ecological communities by measuring community dissimilarity over increasing time lags. Despite its increased use in recent years, its performance in comparison with other more direct methods (i.e., canonical ordination) has not been evaluated. This study fills this gap using extensive simulations and real data sets from experimental temporary ponds (true zooplankton communities) and landscape studies (landscape categories as pseudo-communities) that differ in community structure and anthropogenic stress history. Modeling time with a principal coordinate of neighborhood matrices (PCNM) approach, the canonical ordination technique (redundancy analysis; RDA) consistently outperformed the other statistical tests (i.e., TLAs, Mantel test, and RDA based on linear time trends) using all real data. In addition, the RDA-PCNM revealed different patterns of temporal change, and the strength of each individual time pattern, in terms of adjusted variance explained, could be evaluated, It also identified species contributions to these patterns of temporal change. This additional information is not provided by distance-based methods. The simulation study revealed better Type I error properties of the canonical ordination techniques compared with the distance-based approaches when no deterministic component of change was imposed on the communities. The simulation also revealed that strong emphasis on uniform deterministic change and low variability at other temporal scales is needed to result in decreased statistical power of the RDA-PCNM approach relative to the other methods. Based on the statistical performance of and information content provided by RDA-PCNM models, this technique serves ecologists as a powerful tool for modeling temporal change of ecological (pseudo-) communities.
Haris, K; Pai, Archana
2015-01-01
In this article, we revisit the problem of coherent multi-detector search of gravitational wave from compact binary coalescence with Neutron stars and Black Holes using advanced interferometers like LIGO-Virgo. Based on the loss of optimal multi-detector signal-to-noise ratio (SNR), we construct a hybrid statistic as a best of maximum-likelihood-ratio(MLR) statistic tuned for face-on and face-off binaries. The statistical properties of the hybrid statistic is studied. The performance of this ...
Statistical evaluations of current sampling procedures and incomplete core recovery
International Nuclear Information System (INIS)
Heasler, P.G.; Jensen, L.
1994-03-01
This document develops two formulas that describe the effects of incomplete recovery on core sampling results for the Hanford waste tanks. The formulas evaluate incomplete core recovery from a worst-case (i.e.,biased) and best-case (i.e., unbiased) perspective. A core sampler is unbiased if the sample material recovered is a random sample of the material in the tank, while any sampler that preferentially recovers a particular type of waste over others is a biased sampler. There is strong evidence to indicate that the push-mode sampler presently used at the Hanford site is a biased one. The formulas presented here show the effects of incomplete core recovery on the accuracy of composition measurements, as functions of the vertical variability in the waste. These equations are evaluated using vertical variability estimates from previously sampled tanks (B110, U110, C109). Assuming that the values of vertical variability used in this study adequately describes the Hanford tank farm, one can use the formulas to compute the effect of incomplete recovery on the accuracy of an average constituent estimate. To determine acceptable recovery limits, we have assumed that the relative error of such an estimate should be no more than 20%
Statistical distributions applications and parameter estimates
Thomopoulos, Nick T
2017-01-01
This book gives a description of the group of statistical distributions that have ample application to studies in statistics and probability. Understanding statistical distributions is fundamental for researchers in almost all disciplines. The informed researcher will select the statistical distribution that best fits the data in the study at hand. Some of the distributions are well known to the general researcher and are in use in a wide variety of ways. Other useful distributions are less understood and are not in common use. The book describes when and how to apply each of the distributions in research studies, with a goal to identify the distribution that best applies to the study. The distributions are for continuous, discrete, and bivariate random variables. In most studies, the parameter values are not known a priori, and sample data is needed to estimate parameter values. In other scenarios, no sample data is available, and the researcher seeks some insight that allows the estimate of ...
Application of nonparametric statistics to material strength/reliability assessment
International Nuclear Information System (INIS)
Arai, Taketoshi
1992-01-01
An advanced material technology requires data base on a wide variety of material behavior which need to be established experimentally. It may often happen that experiments are practically limited in terms of reproducibility or a range of test parameters. Statistical methods can be applied to understanding uncertainties in such a quantitative manner as required from the reliability point of view. Statistical assessment involves determinations of a most probable value and the maximum and/or minimum value as one-sided or two-sided confidence limit. A scatter of test data can be approximated by a theoretical distribution only if the goodness of fit satisfies a test criterion. Alternatively, nonparametric statistics (NPS) or distribution-free statistics can be applied. Mathematical procedures by NPS are well established for dealing with most reliability problems. They handle only order statistics of a sample. Mathematical formulas and some applications to engineering assessments are described. They include confidence limits of median, population coverage of sample, required minimum number of a sample, and confidence limits of fracture probability. These applications demonstrate that a nonparametric statistical estimation is useful in logical decision making in the case a large uncertainty exists. (author)
Judgment sampling: a health care improvement perspective.
Perla, Rocco J; Provost, Lloyd P
2012-01-01
Sampling plays a major role in quality improvement work. Random sampling (assumed by most traditional statistical methods) is the exception in improvement situations. In most cases, some type of "judgment sample" is used to collect data from a system. Unfortunately, judgment sampling is not well understood. Judgment sampling relies upon those with process and subject matter knowledge to select useful samples for learning about process performance and the impact of changes over time. It many cases, where the goal is to learn about or improve a specific process or system, judgment samples are not merely the most convenient and economical approach, they are technically and conceptually the most appropriate approach. This is because improvement work is done in the real world in complex situations involving specific areas of concern and focus; in these situations, the assumptions of classical measurement theory neither can be met nor should an attempt be made to meet them. The purpose of this article is to describe judgment sampling and its importance in quality improvement work and studies with a focus on health care settings.
Model Complexity and Out-of-Sample Performance: Evidence from S&P 500 Index Returns
Kaeck, Andreas; Rodrigues, Paulo; Seeger, Norman J.
We apply a range of out-of-sample specification tests to more than forty competing stochastic volatility models to address how model complexity affects out-of-sample performance. Using daily S&P 500 index returns, model confidence set estimations provide strong evidence that the most important model
Environmental restoration and statistics: Issues and needs
International Nuclear Information System (INIS)
Gilbert, R.O.
1991-10-01
Statisticians have a vital role to play in environmental restoration (ER) activities. One facet of that role is to point out where additional work is needed to develop statistical sampling plans and data analyses that meet the needs of ER. This paper is an attempt to show where statistics fits into the ER process. The statistician, as member of the ER planning team, works collaboratively with the team to develop the site characterization sampling design, so that data of the quality and quantity required by the specified data quality objectives (DQOs) are obtained. At the same time, the statistician works with the rest of the planning team to design and implement, when appropriate, the observational approach to streamline the ER process and reduce costs. The statistician will also provide the expertise needed to select or develop appropriate tools for statistical analysis that are suited for problems that are common to waste-site data. These data problems include highly heterogeneous waste forms, large variability in concentrations over space, correlated data, data that do not have a normal (Gaussian) distribution, and measurements below detection limits. Other problems include environmental transport and risk models that yield highly uncertain predictions, and the need to effectively communicate to the public highly technical information, such as sampling plans, site characterization data, statistical analysis results, and risk estimates. Even though some statistical analysis methods are available ''off the shelf'' for use in ER, these problems require the development of additional statistical tools, as discussed in this paper. 29 refs
Testing Homogeneity in a Semiparametric Two-Sample Problem
Directory of Open Access Journals (Sweden)
Yukun Liu
2012-01-01
Full Text Available We study a two-sample homogeneity testing problem, in which one sample comes from a population with density f(x and the other is from a mixture population with mixture density (1−λf(x+λg(x. This problem arises naturally from many statistical applications such as test for partial differential gene expression in microarray study or genetic studies for gene mutation. Under the semiparametric assumption g(x=f(xeα+βx, a penalized empirical likelihood ratio test could be constructed, but its implementation is hindered by the fact that there is neither feasible algorithm for computing the test statistic nor available research results on its theoretical properties. To circumvent these difficulties, we propose an EM test based on the penalized empirical likelihood. We prove that the EM test has a simple chi-square limiting distribution, and we also demonstrate its competitive testing performances by simulations. A real-data example is used to illustrate the proposed methodology.
Noser, Thomas C.; Tanner, John R.; Shah, Situl
2008-01-01
The purpose of this study was to measure the comprehension of basic mathematical skills of students enrolled in statistics classes at a large regional university, and to determine if the scores earned on a basic math skills test are useful in forecasting student performance in these statistics classes, and to determine if students' basic math…
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high throughput tool for proteomics based biomarker discovery. Until now, multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification, indexing; and high dimensional peak differential analysis with the concurrent statistical tests based false discovery rate (FDR). "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets to identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. Presented web application supplies large scale MS data online uploading and analysis with a simple user interface. This bioinformatic tool will facilitate the discovery of the potential protein biomarkers using MS.
Special nuclear material inventory sampling plans
International Nuclear Information System (INIS)
Vaccaro, H.; Goldman, A.
1987-01-01
Since their introduction in 1942, sampling inspection procedures have been common quality assurance practice. The U.S. Department of Energy (DOE) supports such sampling of special nuclear materials inventories. The DOE Order 5630.7 states, Operations Offices may develop and use statistically valid sampling plans appropriate for their site-specific needs. The benefits for nuclear facilities operations include reduced worker exposure and reduced work load. Improved procedures have been developed for obtaining statistically valid sampling plans that maximize these benefits. The double sampling concept is described and the resulting sample sizes for double sample plans are compared with other plans. An algorithm is given for finding optimal double sampling plans that assist in choosing the appropriate detection and false alarm probabilities for various sampling plans
Bayesian statistics an introduction
Lee, Peter M
2012-01-01
Bayesian Statistics is the school of thought that combines prior beliefs with the likelihood of a hypothesis to arrive at posterior beliefs. The first edition of Peter Lee’s book appeared in 1989, but the subject has moved ever onwards, with increasing emphasis on Monte Carlo based techniques. This new fourth edition looks at recent techniques such as variational methods, Bayesian importance sampling, approximate Bayesian computation and Reversible Jump Markov Chain Monte Carlo (RJMCMC), providing a concise account of the way in which the Bayesian approach to statistics develops as wel
Statistical evaluation of cleanup: How should it be done?
International Nuclear Information System (INIS)
Gilbert, R.O.
1993-02-01
This paper discusses statistical issues that must be addressed when conducting statistical tests for the purpose of evaluating if a site has been remediated to guideline values or standards. The importance of using the Data Quality Objectives (DQO) process to plan and design the sampling plan is emphasized. Other topics discussed are: (1) accounting for the uncertainty of cleanup standards when conducting statistical tests, (2) determining the number of samples and measurements needed to attain specified DQOs, (3) considering whether the appropriate testing philosophy in a given situation is ''guilty until proven innocent'' or ''innocent until proven guilty'' when selecting a statistical test for evaluating the attainment of standards, (4) conducting tests using data sets that contain measurements that have been reported by the laboratory as less than the minimum detectable activity, and (5) selecting statistical tests that are appropriate for risk-based or background-based standards. A recent draft report by Berger that provides guidance on sampling plans and data analyses for final status surveys at US Nuclear Regulatory Commission licensed facilities serves as a focal point for discussion
International Nuclear Information System (INIS)
Abu-Hamdeh, Nidal H.; Alnefaie, Khaled A.; Almitani, Khalid H.
2013-01-01
Highlights: • The successes of using olive waste/methanol as an adsorbent/adsorbate pair. • The experimental gross cycle coefficient of performance obtained was COP a = 0.75. • Optimization showed expanding adsorbent mass to a certain range increases the COP. • The statistical optimization led to optimum tank volume between 0.2 and 0.3 m 3 . • Increasing the collector area to a certain range increased the COP. - Abstract: The current work demonstrates a developed model of a solar adsorption refrigeration system with specific requirements and specifications. The recent scheme can be employed as a refrigerator and cooler unit suitable for remote areas. The unit runs through a parabolic trough solar collector (PTC) and uses olive waste as adsorbent with methanol as adsorbate. Cooling production, COP (coefficient of performance, and COP a (cycle gross coefficient of performance) were used to assess the system performance. The system’s design optimum parameters in this study were arrived to through statistical and experimental methods. The lowest temperature attained in the refrigerated space was 4 °C and the equivalent ambient temperature was 27 °C. The temperature started to decrease steadily at 20:30 – when the actual cooling started – until it reached 4 °C at 01:30 in the next day when it rose again. The highest COP a obtained was 0.75
Franz, H. B.; Mahaffy, P. R.; Kasprzak, W.; Lyness, E.; Raaen, E.
2011-01-01
The Sample Analysis at Mars (SAM) instrument suite comprises the largest science payload on the Mars Science Laboratory (MSL) "Curiosity" rover. SAM will perform chemical and isotopic analysis of volatile compounds from atmospheric and solid samples to address questions pertaining to habitability and geochemical processes on Mars. Sulfur is a key element of interest in this regard, as sulfur compounds have been detected on the Martian surface by both in situ and remote sensing techniques. Their chemical and isotopic composition can belp constrain environmental conditions and mechanisms at the time of formation. A previous study examined the capability of the SAM quadrupole mass spectrometer (QMS) to determine sulfur isotope ratios of SO2 gas from a statistical perspective. Here we discuss the development of a method for determining sulfur isotope ratios with the QMS by sampling SO2 generated from heating of solid sulfate samples in SAM's pyrolysis oven. This analysis, which was performed with the SAM breadboard system, also required development of a novel treatment of the QMS dead time to accommodate the characteristics of an aging detector.
International Nuclear Information System (INIS)
Mccoy, J.C.
1997-01-01
The Transuranic Performance Demonstration Program (TPDP) sample packaging is used to transport highway route controlled quantities of weapons grade (WG) plutonium samples from the Plutonium Finishing Plant (PFP) to the Waste Receiving and Processing (WRAP) facility and back. The purpose of these shipments is to test the nondestructive assay equipment in the WRAP facility as part of the Nondestructive Waste Assay PDP. The PDP is part of the U. S. Department of Energy (DOE) National TRU Program managed by the U. S. Department of Energy, Carlsbad Area Office, Carlsbad, New Mexico. Details of this program are found in CAO-94-1045, Performance Demonstration Program Plan for Nondestructive Assay for the TRU Waste Characterization Program (CAO 1994); INEL-96/0129, Design of Benign Matrix Drums for the Non-Destructive Assay Performance Demonstration Program for the National TRU Program (INEL 1996a); and INEL-96/0245, Design of Phase 1 Radioactive Working Reference Materials for the Nondestructive Assay Performance Demonstration Program for the National TRU Program (INEL 1996b). Other program documentation is maintained by the national TRU program and each DOE site participating in the program. This safety analysis report for packaging (SARP) provides the analyses and evaluations necessary to demonstrate that the TRU PDP sample packaging meets the onsite transportation safety requirements of WHC-CM-2-14, Hazardous Material Packaging and Shipping, for an onsite Transportation Hazard Indicator (THI) 2 packaging. This SARP, however, does not include evaluation of any operations within the PFP or WRAP facilities, including handling, maintenance, storage, or operating requirements, except as they apply directly to transportation between the gate of PFP and the gate of the WRAP facility. All other activities are subject to the requirements of the facility safety analysis reports (FSAR) of the PFP or WRAP facility and requirements of the PDP
Statistical characterization report for Single-Shell Tank 241-T-104
International Nuclear Information System (INIS)
Cromar, R.D.; Wilmarth, S.R.; Jensen, L.
1994-01-01
This report contains the results of the statistical analysis of data from two core samples obtained from single-shell tank 241-T-104 (T-104). Section 2.0 contains a description of the core samples and the chemical analyses performed on the core samples. Section 3.0 contains mean concentration estimates and associated 95% confidence intervals (CIs) on the mean for each of the analytes found in the core composite samples. Section 4.0 contains estimates of the spatial variability (variability between cores) and estimates of the analytical variability from the core composite data. Two types of analytical variability were estimated from the core composite data: (1) sample composite variability (variability between composite samples within the same core) and (2) analytical measurement variability (variability between the primary and duplicate analyses within each core composite sample). Estimates of the analytical measurement variability were used as the reference value to test the significance of the spatial and sample composite variability. Spatial variability was significantly different from zero for 32 out of 80 analytes. The sample composite variance was significantly different from zero for 18 out of the 80 analytes
Statistical characterization report for single-shell tank 241-T-111
International Nuclear Information System (INIS)
Cromar, R.D.; Wilmarth, S.R.
1994-01-01
This report contains the results of the statistical analysis of data from two core samples obtained from single-shell tank 241-T-111 (T-111). Section 2.0 contains a description of the core samples and the chemical analyses performed on the core samples. Section 3.0 contains mean concentration estimates and associated 95% confidence intervals (CIs) on the mean for each of the analytes found in the core samples from T-111. Section 4.0 contains estimates of the spatial variability (variability between cores) and estimates of the analytical variability from the core composite data. Two types of analytical variability were estimated from the core composite data: (1) sample composite variability (variability between composite samples within the same core) and (2) analytical measurement variability (variability between the primary and duplicate analyses within each core composite sample). Estimates of the analytical measurement variability were used as the reference value to test the significance of the spatial and sample composite variability. Spatial variability was significantly different from zero for 39 out of 85 analytes. The sample composite variance was significantly different from zero for (a different) 39 out of the 85 analytes
International Nuclear Information System (INIS)
Kim, Jin Gyeong; Park, Jin Ho; Park, Hyeon Jin; Lee, Jae Jun; Jun, Whong Seok; Whang, Jin Su
2009-08-01
This book explains statistics for engineers using MATLAB, which includes arrangement and summary of data, probability, probability distribution, sampling distribution, assumption, check, variance analysis, regression analysis, categorical data analysis, quality assurance such as conception of control chart, consecutive control chart, breakthrough strategy and analysis using Matlab, reliability analysis like measurement of reliability and analysis with Maltab, and Markov chain.
Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong
2016-01-01
Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.
Comparison of sampling techniques for use in SYVAC
International Nuclear Information System (INIS)
Dalrymple, G.J.
1984-01-01
The Stephen Howe review (reference TR-STH-1) recommended the use of a deterministic generator (DG) sampling technique for sampling the input values to the SYVAC (SYstems Variability Analysis Code) program. This technique was compared with Monte Carlo simple random sampling (MC) by taking a 1000 run case of SYVAC using MC as the reference case. The results show that DG appears relatively inaccurate for most values of consequence when used with 11 sample intervals. If 22 sample intervals are used then DG generates cumulative distribution functions that are statistically similar to the reference distribution. 400 runs of DG or MC are adequate to generate a representative cumulative distribution function. The MC technique appears to perform better than DG for the same number of runs. However, the DG predicts higher doses and in view of the importance of generating data in the high dose region this sampling technique with 22 sample intervals is recommended for use in SYVAC. (author)
Comparing distribution models for small samples of overdispersed counts of freshwater fish
Vaudor, Lise; Lamouroux, Nicolas; Olivier, Jean-Michel
2011-05-01
The study of species abundance often relies on repeated abundance counts whose number is limited by logistic or financial constraints. The distribution of abundance counts is generally right-skewed (i.e. with many zeros and few high values) and needs to be modelled for statistical inference. We used an extensive dataset involving about 100,000 fish individuals of 12 freshwater fish species collected in electrofishing points (7 m 2) during 350 field surveys made in 25 stream sites, in order to compare the performance and the generality of four distribution models of counts (Poisson, negative binomial and their zero-inflated counterparts). The negative binomial distribution was the best model (Bayesian Information Criterion) for 58% of the samples (species-survey combinations) and was suitable for a variety of life histories, habitat, and sample characteristics. The performance of the models was closely related to samples' statistics such as total abundance and variance. Finally, we illustrated the consequences of a distribution assumption by calculating confidence intervals around the mean abundance, either based on the most suitable distribution assumption or on an asymptotical, distribution-free (Student's) method. Student's method generally corresponded to narrower confidence intervals, especially when there were few (≤3) non-null counts in the samples.
Directory of Open Access Journals (Sweden)
Ammar Saleem Khazaal
2018-01-01
Full Text Available This research reviews a statistical study to check the conformity of aggregates (Coarse and Fine was used in Kirkuk city to the requirements of the Iraqi specifications. The data of sieve analysis (215 samples of aggregates being obtained from of National Central Construction Laboratory and Technical College Construction Laboratory in Kirkuk city have analyzed using the statistical program SAS. The results showed that 5%, 17%, and 18% of fine aggregate samples are passing sieve sizes 10 mm, 4.75 mm, and 2.36 mm, respectively, which were less than the minimum limit allowed by the Iraqi specifications for each sieve. The percentages passing sieve sizes 1.18mm, 600micrometers, and 300micrometers were more than the upper limit of specification by 5%, 20%, and 30% respectively. The samples were passing sieve sizes 1.18mm, and 600micrometers less than the minimum limit of specification by 17%, and 4%, respectively. The results showed that the deviation in a sieve size of 150 micrometers for the upper limit of the specification performs 2% of the total number of samples. For Coarse aggregate, the samples passing sieves size 37.5mm and 20mm were comforting the Iraqi specifications by 100% and 83% respectively, it has found that the samples were passing sieve sizes 10 mm was 5% was more than the higher limit of Iraqi specifications, and 27% of these samples were less than the minimum limit, whereas sample passing sieve size 5mm was 1% which is more than the upper limit of the Iraqi specification. As a result of statistical analysis of data for fine aggregate, it has found that the samples were passing sieve sizes 10 mm, 2.36 mm, 1.18 mm and 150micrometers conforming from statistical point of view the Iraqi specifications, whereas the samples were passing sieve sizes 4.75 mm, 600micrometers and 300 micrometers didn’t conform. Statistical analysis of the results of the coarse aggregates also showed that conforming to sieve sizes of 37.5 mm and 20 mm and
BROËT, PHILIPPE; TSODIKOV, ALEXANDER; DE RYCKE, YANN; MOREAU, THIERRY
2010-01-01
This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests. PMID:15293627
Broët, Philippe; Tsodikov, Alexander; De Rycke, Yann; Moreau, Thierry
2004-06-01
This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests.
Saini, Komal; Singh, Parminder; Bajwa, Bikramjit Singh
2016-12-01
LED flourimeter has been used for microanalysis of uranium concentration in groundwater samples collected from six districts of South West (SW), West (W) and North East (NE) Punjab, India. Average value of uranium content in water samples of SW Punjab is observed to be higher than WHO, USEPA recommended safe limit of 30µgl -1 as well as AERB proposed limit of 60µgl -1 . Whereas, for W and NE region of Punjab, average level of uranium concentration was within AERB recommended limit of 60µgl -1 . Average value observed in SW Punjab is around 3-4 times the value observed in W Punjab, whereas its value is more than 17 times the average value observed in NE region of Punjab. Statistical analysis of carcinogenic as well as non carcinogenic risks due to uranium have been evaluated for each studied district. Copyright © 2016 Elsevier Ltd. All rights reserved.
Statistics for environmental science and management
National Research Council Canada - National Science Library
Manly, B.F.J
2009-01-01
.... Additional topics covered include environmental monitoring, impact assessment, censored data, environmental sampling, the role of statistics in environmental science, assessing site reclamation...
Theunissen, Raf; Kadosh, Jesse S.; Allen, Christian B.
2015-06-01
Spatially varying signals are typically sampled by collecting uniformly spaced samples irrespective of the signal content. For signals with inhomogeneous information content, this leads to unnecessarily dense sampling in regions of low interest or insufficient sample density at important features, or both. A new adaptive sampling technique is presented directing sample collection in proportion to local information content, capturing adequately the short-period features while sparsely sampling less dynamic regions. The proposed method incorporates a data-adapted sampling strategy on the basis of signal curvature, sample space-filling, variable experimental uncertainty and iterative improvement. Numerical assessment has indicated a reduction in the number of samples required to achieve a predefined uncertainty level overall while improving local accuracy for important features. The potential of the proposed method has been further demonstrated on the basis of Laser Doppler Anemometry experiments examining the wake behind a NACA0012 airfoil and the boundary layer characterisation of a flat plate.
Cho, Hyun-Deok; Kim, Unyong; Suh, Joon Hyuk; Eom, Han Young; Kim, Junghyun; Lee, Seul Gi; Choi, Yong Seok; Han, Sang Beom
2016-04-01
Analytical methods using high-performance liquid chromatography with diode array and tandem mass spectrometry detection were developed for the discrimination of the rhizomes of four Atractylodes medicinal plants: A. japonica, A. macrocephala, A. chinensis, and A. lancea. A quantitative study was performed, selecting five bioactive components, including atractylenolide I, II, III, eudesma-4(14),7(11)-dien-8-one and atractylodin, on twenty-six Atractylodes samples of various origins. Sample extraction was optimized to sonication with 80% methanol for 40 min at room temperature. High-performance liquid chromatography with diode array detection was established using a C18 column with a water/acetonitrile gradient system at a flow rate of 1.0 mL/min, and the detection wavelength was set at 236 nm. Liquid chromatography with tandem mass spectrometry was applied to certify the reliability of the quantitative results. The developed methods were validated by ensuring specificity, linearity, limit of quantification, accuracy, precision, recovery, robustness, and stability. Results showed that cangzhu contained higher amounts of atractylenolide I and atractylodin than baizhu, and especially atractylodin contents showed the greatest variation between baizhu and cangzhu. Multivariate statistical analysis, such as principal component analysis and hierarchical cluster analysis, were also employed for further classification of the Atractylodes plants. The established method was suitable for quality control of the Atractylodes plants. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Statistical Methods for Environmental Pollution Monitoring
Energy Technology Data Exchange (ETDEWEB)
Gilbert, Richard O. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
1987-01-01
The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics book. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.
BrightStat.com: free statistics online.
Stricker, Daniel
2008-10-01
Powerful software for statistical analysis is expensive. Here I present BrightStat, a statistical software running on the Internet which is free of charge. BrightStat's goals, its main capabilities and functionalities are outlined. Three different sample runs, a Friedman test, a chi-square test, and a step-wise multiple regression are presented. The results obtained by BrightStat are compared with results computed by SPSS, one of the global leader in providing statistical software, and VassarStats, a collection of scripts for data analysis running on the Internet. Elementary statistics is an inherent part of academic education and BrightStat is an alternative to commercial products.
Statistical inference for discrete-time samples from affine stochastic delay differential equations
DEFF Research Database (Denmark)
Küchler, Uwe; Sørensen, Michael
2013-01-01
Statistical inference for discrete time observations of an affine stochastic delay differential equation is considered. The main focus is on maximum pseudo-likelihood estimators, which are easy to calculate in practice. A more general class of prediction-based estimating functions is investigated...
International Nuclear Information System (INIS)
Guha, S.; Taylor, J.H.
1996-01-01
It is critical that summary statistics on background data, or background levels, be computed based on standardized and defensible statistical methods because background levels are frequently used in subsequent analyses and comparisons performed by separate analysts over time. The final background for naturally occurring radionuclide concentrations in soil at a RCRA facility, and the associated statistical methods used to estimate these concentrations, are presented. The primary objective is to describe, via a case study, the statistical methods used to estimate 95% upper tolerance limits (UTL) on radionuclide background soil data sets. A 95% UTL on background samples can be used as a screening level concentration in the absence of definitive soil cleanup criteria for naturally occurring radionuclides. The statistical methods are based exclusively on EPA guidance. This paper includes an introduction, a discussion of the analytical results for the radionuclides and a detailed description of the statistical analyses leading to the determination of 95% UTLs. Soil concentrations reported are based on validated data. Data sets are categorized as surficial soil; samples collected at depths from zero to one-half foot; and deep soil, samples collected from 3 to 5 feet. These data sets were tested for statistical outliers and underlying distributions were determined by using the chi-squared test for goodness-of-fit. UTLs for the data sets were then computed based on the percentage of non-detects and the appropriate best-fit distribution (lognormal, normal, or non-parametric). For data sets containing greater than approximately 50% nondetects, nonparametric UTLs were computed
7 CFR 52.38a - Definitions of terms applicable to statistical sampling.
2010-01-01
... the number of defects (or defectives), which exceed the sample unit tolerance (“T”), in a series of... accumulation of defects (or defectives) allowed to exceed the sample unit tolerance (“T”) in any sample unit or consecutive group of sample units. (ii) CuSum value. The accumulated number of defects (or defectives) that...
Kabeshova, Anastasiia; Launay, Cyrille P; Gromov, Vasilii A; Fantino, Bruno; Levinoff, Elise J; Allali, Gilles; Beauchet, Olivier
2016-01-01
To compare performance criteria (i.e., sensitivity, specificity, positive predictive value, negative predictive value, area under receiver operating characteristic curve and accuracy) of linear and non-linear statistical models for fall risk in older community-dwellers. Participants were recruited in two large population-based studies, "Prévention des Chutes, Réseau 4" (PCR4, n=1760, cross-sectional design, retrospective collection of falls) and "Prévention des Chutes Personnes Agées" (PCPA, n=1765, cohort design, prospective collection of falls). Six linear statistical models (i.e., logistic regression, discriminant analysis, Bayes network algorithm, decision tree, random forest, boosted trees), three non-linear statistical models corresponding to artificial neural networks (multilayer perceptron, genetic algorithm and neuroevolution of augmenting topologies [NEAT]) and the adaptive neuro fuzzy interference system (ANFIS) were used. Falls ≥1 characterizing fallers and falls ≥2 characterizing recurrent fallers were used as outcomes. Data of studies were analyzed separately and together. NEAT and ANFIS had better performance criteria compared to other models. The highest performance criteria were reported with NEAT when using PCR4 database and falls ≥1, and with both NEAT and ANFIS when pooling data together and using falls ≥2. However, sensitivity and specificity were unbalanced. Sensitivity was higher than specificity when identifying fallers, whereas the converse was found when predicting recurrent fallers. Our results showed that NEAT and ANFIS were non-linear statistical models with the best performance criteria for the prediction of falls but their sensitivity and specificity were unbalanced, underscoring that models should be used respectively for the screening of fallers and the diagnosis of recurrent fallers. Copyright © 2015 European Federation of Internal Medicine. Published by Elsevier B.V. All rights reserved.
Using Pre-Statistical Analysis to Streamline Monitoring Assessments
International Nuclear Information System (INIS)
Reed, J.K.
1999-01-01
A variety of statistical methods exist to aid evaluation of groundwater quality and subsequent decision making in regulatory programs. These methods are applied because of large temporal and spatial extrapolations commonly applied to these data. In short, statistical conclusions often serve as a surrogate for knowledge. However, facilities with mature monitoring programs that have generated abundant data have inherently less uncertainty because of the sheer quantity of analytical results. In these cases, statistical tests can be less important, and ''expert'' data analysis should assume an important screening role.The WSRC Environmental Protection Department, working with the General Separations Area BSRI Environmental Restoration project team has developed a method for an Integrated Hydrogeological Analysis (IHA) of historical water quality data from the F and H Seepage Basins groundwater remediation project. The IHA combines common sense analytical techniques and a GIS presentation that force direct interactive evaluation of the data. The IHA can perform multiple data analysis tasks required by the RCRA permit. These include: (1) Development of a groundwater quality baseline prior to remediation startup, (2) Targeting of constituents for removal from RCRA GWPS, (3) Targeting of constituents for removal from UIC, permit, (4) Targeting of constituents for reduced, (5)Targeting of monitoring wells not producing representative samples, (6) Reduction in statistical evaluation, and (7) Identification of contamination from other facilities
Topics in computer simulations of statistical systems
International Nuclear Information System (INIS)
Salvador, R.S.
1987-01-01
Several computer simulations studying a variety of topics in statistical mechanics and lattice gauge theories are performed. The first study describes a Monte Carlo simulation performed on Ising systems defined on Sierpinsky carpets of dimensions between one and four. The critical coupling and the exponent γ are measured as a function of dimension. The Ising gauge theory in d = 4 - epsilon, for epsilon → 0 + , is then studied by performing a Monte Carlo simulation for the theory defined on fractals. A high statistics Monte Carlo simulation for the three-dimensional Ising model is presented for lattices of sizes 8 3 to 44 3 . All the data obtained agrees completely, within statistical errors, with the forms predicted by finite-sizing scaling. Finally, a method to estimate numerically the partition function of statistical systems is developed
Evaluating the performance of species richness estimators: sensitivity to sample grain size
DEFF Research Database (Denmark)
Hortal, Joaquín; Borges, Paulo A. V.; Gaspar, Clara
2006-01-01
and several recent estimators [proposed by Rosenzweig et al. (Conservation Biology, 2003, 17, 864-874), and Ugland et al. (Journal of Animal Ecology, 2003, 72, 888-897)] performed poorly. 3. Estimations developed using the smaller grain sizes (pair of traps, traps, records and individuals) presented similar....... Data obtained with standardized sampling of 78 transects in natural forest remnants of five islands were aggregated in seven different grains (i.e. ways of defining a single sample): islands, natural areas, transects, pairs of traps, traps, database records and individuals to assess the effect of using...
The power and statistical behaviour of allele-sharing statistics when ...
Indian Academy of Sciences (India)
Unknown
3Human Genetics Division, School of Medicine, University of Southampton, Southampton SO16 6YD, UK. Abstract ... that the statistic S-#alleles gives good performance for recessive ... (H50) of the families are linked to the single marker. The.
An audit of the statistics and the comparison with the parameter in the population
Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad
2015-10-01
The sufficient sample size that is needed to closely estimate the statistics for particular parameters are use to be an issue. Although sample size might had been calculated referring to objective of the study, however, it is difficult to confirm whether the statistics are closed with the parameter for a particular population. All these while, guideline that uses a p-value less than 0.05 is widely used as inferential evidence. Therefore, this study had audited results that were analyzed from various sub sample and statistical analyses and had compared the results with the parameters in three different populations. Eight types of statistical analysis and eight sub samples for each statistical analysis were analyzed. Results found that the statistics were consistent and were closed to the parameters when the sample study covered at least 15% to 35% of population. Larger sample size is needed to estimate parameter that involve with categorical variables compared with numerical variables. Sample sizes with 300 to 500 are sufficient to estimate the parameters for medium size of population.
Fraley, R. Chris; Vazire, Simine
2014-01-01
The authors evaluate the quality of research reported in major journals in social-personality psychology by ranking those journals with respect to their N-pact Factors (NF)—the statistical power of the empirical studies they publish to detect typical effect sizes. Power is a particularly important attribute for evaluating research quality because, relative to studies that have low power, studies that have high power are more likely to (a) to provide accurate estimates of effects, (b) to produce literatures with low false positive rates, and (c) to lead to replicable findings. The authors show that the average sample size in social-personality research is 104 and that the power to detect the typical effect size in the field is approximately 50%. Moreover, they show that there is considerable variation among journals in sample sizes and power of the studies they publish, with some journals consistently publishing higher power studies than others. The authors hope that these rankings will be of use to authors who are choosing where to submit their best work, provide hiring and promotion committees with a superior way of quantifying journal quality, and encourage competition among journals to improve their NF rankings. PMID:25296159
Hong-Ghi Min
2011-01-01
Using Monte Carlo simulation of the Portfolio-balance model of the exchange rates, we report finite sample properties of the GMM estimator for testing over-identifying restrictions in the simultaneous equations model. F-form of Sargans statistic performs better than its chi-squared form while Hansens GMM statistic has the smallest bias.
Statistical Processing Algorithms for Human Population Databases
Directory of Open Access Journals (Sweden)
Camelia COLESCU
2012-01-01
Full Text Available The article is describing some algoritms for statistic functions aplied to a human population database. The samples are specific for the most interesting periods, when the evolution of statistical datas has spectacolous value. The article describes the most usefull form of grafical prezentation of the results
Types of non-probabilistic sampling used in marketing research. „Snowball” sampling
Manuela Rozalia Gabor
2007-01-01
A significant way of investigating a firm’s market is the statistical sampling. The sampling typology provides a non / probabilistic models of gathering information and this paper describes thorough information related to network sampling, named “snowball” sampling. This type of sampling enables the survey of occurrence forms concerning the decision power within an organisation and of the interpersonal relation network governing a certain collectivity, a certain consumer panel. The snowball s...
Ma, Junjun; Xiong, Xiong; He, Feng; Zhang, Wei
2017-04-01
The stock price fluctuation is studied in this paper with intrinsic time perspective. The event, directional change (DC) or overshoot, are considered as time scale of price time series. With this directional change law, its corresponding statistical properties and parameter estimation is tested in Chinese stock market. Furthermore, a directional change trading strategy is proposed for invest in the market portfolio in Chinese stock market, and both in-sample and out-of-sample performance are compared among the different method of model parameter estimation. We conclude that DC method can capture important fluctuations in Chinese stock market and gain profit due to the statistical property that average upturn overshoot size is bigger than average downturn directional change size. The optimal parameter of DC method is not fixed and we obtained 1.8% annual excess return with this DC-based trading strategy.
International Nuclear Information System (INIS)
Adekola, A.S.; Colaresi, J.; Douwen, J.; Jaederstroem, H.; Mueller, W.F.; Yocum, K.M.; Carmichael, K.
2015-01-01
Environmental scientific research requires a detector that has sensitivity low enough to reveal the presence of any contaminant in the sample at a reasonable counting time. Canberra developed the germanium detector geometry called Small Anode Germanium (SAGe) Well detector, which is now available commercially. The SAGe Well detector is a new type of low capacitance germanium well detector manufactured using small anode technology capable of advancing many environmental scientific research applications. The performance of this detector has been evaluated for a range of sample sizes and geometries counted inside the well, and on the end cap of the detector. The detector has energy resolution performance similar to semi-planar detectors, and offers significant improvement over the existing coaxial and Well detectors. Energy resolution performance of 750 eV Full Width at Half Maximum (FWHM) at 122 keV γ-ray energy and resolution of 2.0 - 2.3 keV FWHM at 1332 keV γ-ray energy are guaranteed for detector volumes up to 425 cm 3 . The SAGe Well detector offers an optional 28 mm well diameter with the same energy resolution as the standard 16 mm well. Such outstanding resolution performance will benefit environmental applications in revealing the detailed radionuclide content of samples, particularly at low energy, and will enhance the detection sensitivity resulting in reduced counting time. The detector is compatible with electric coolers without any sacrifice in performance and supports the Canberra Mathematical efficiency calibration method (In situ Object Calibration Software or ISOCS, and Laboratory Source-less Calibration Software or LABSOCS). In addition, the SAGe Well detector supports true coincidence summing available in the ISOCS/LABSOCS framework. The improved resolution performance greatly enhances detection sensitivity of this new detector for a range of sample sizes and geometries counted inside the well. This results in lower minimum detectable
Energy Technology Data Exchange (ETDEWEB)
Adekola, A.S.; Colaresi, J.; Douwen, J.; Jaederstroem, H.; Mueller, W.F.; Yocum, K.M.; Carmichael, K. [Canberra Industries Inc., 800 Research Parkway, Meriden, CT 06450 (United States)
2015-07-01
Environmental scientific research requires a detector that has sensitivity low enough to reveal the presence of any contaminant in the sample at a reasonable counting time. Canberra developed the germanium detector geometry called Small Anode Germanium (SAGe) Well detector, which is now available commercially. The SAGe Well detector is a new type of low capacitance germanium well detector manufactured using small anode technology capable of advancing many environmental scientific research applications. The performance of this detector has been evaluated for a range of sample sizes and geometries counted inside the well, and on the end cap of the detector. The detector has energy resolution performance similar to semi-planar detectors, and offers significant improvement over the existing coaxial and Well detectors. Energy resolution performance of 750 eV Full Width at Half Maximum (FWHM) at 122 keV γ-ray energy and resolution of 2.0 - 2.3 keV FWHM at 1332 keV γ-ray energy are guaranteed for detector volumes up to 425 cm{sup 3}. The SAGe Well detector offers an optional 28 mm well diameter with the same energy resolution as the standard 16 mm well. Such outstanding resolution performance will benefit environmental applications in revealing the detailed radionuclide content of samples, particularly at low energy, and will enhance the detection sensitivity resulting in reduced counting time. The detector is compatible with electric coolers without any sacrifice in performance and supports the Canberra Mathematical efficiency calibration method (In situ Object Calibration Software or ISOCS, and Laboratory Source-less Calibration Software or LABSOCS). In addition, the SAGe Well detector supports true coincidence summing available in the ISOCS/LABSOCS framework. The improved resolution performance greatly enhances detection sensitivity of this new detector for a range of sample sizes and geometries counted inside the well. This results in lower minimum detectable
Sample preparation of environmental samples using benzene synthesis followed by high-performance LSC
International Nuclear Information System (INIS)
Filippis, S. De; Noakes, J.E.
1991-01-01
Liquid scintillation counting (LSC) techniques have been widely employed as the detection method for determining environmental levels of tritium and 14 C. Since anthropogenic and nonanthropogenic inputs to the environment are a concern, sampling the environment surrounding a nuclear power facility or fuel reprocessing operation requires the collection of many different sample types, including agriculture products, water, biota, aquatic life, soil, and vegetation. These sample types are not suitable for the direct detection of tritium of 14 C for liquid scintillation techniques. Each sample type must be initially prepared in order to obtain the carbon or hydrogen component of interest and present this in a chemical form that is compatible with common chemicals used in scintillation counting applications. Converting the sample of interest to chemically pure benzene as a sample preparation technique has been widely accepted for processing samples for radiocarbon age-dating applications. The synthesized benzene is composed of the carbon or hydrogen atoms from the original sample and is ideal as a solvent for LSC with excellent photo-optical properties. Benzene synthesis followed by low-background scintillation counting can be applied to the preparation and measurement of environmental samples yielding good detection sensitivities, high radionuclide counting efficiency, and shorter preparation time. The method of benzene synthesis provides a unique approach to the preparation of a wide variety of environmental sample types using similar chemistry for all samples
The COMT Val158 allele is associated with impaired delayed-match-to-sample performance in ADHD
Directory of Open Access Journals (Sweden)
Matthews Natasha
2012-05-01
Full Text Available Abstract Background This study explored the association between three measures of working memory ability and genetic variation in a range of catecholamine genes in a sample of children with ADHD. Methods One hundred and eighteen children with ADHD performed three working memory measures taken from the CANTAB battery (Spatial Span, Delayed-match-to-sample, and Spatial Working Memory. Associations between performance on working memory measures and allelic variation in catecholamine genes (including those for the noradrenaline transporter [NET1], the dopamine D4 and D2 receptor genes [DRD4; DRD2], the gene encoding dopamine beta hydroxylase [DBH] and catechol-O-methyl transferase [COMT] were investigated using regression models that controlled for age, IQ, gender and medication status on the day of test. Results Significant associations were found between performance on the delayed-match-to-sample task and COMT genotype. More specifically, val/val homozygotes produced significantly more errors than did children who carried a least one met allele. There were no further associations between allelic variants and performance across the other working memory tasks. Conclusions The working memory measures employed in the present study differed in the degree to which accurate task performance depended upon either the dynamic updating and/or manipulation of items in working memory, as in the spatial span and spatial working memory tasks, or upon the stable maintenance of representations, as in the delay-match–to-sample task. The results are interpreted as evidence of a relationship between tonic dopamine levels associated with the met COMT allele and the maintenance of stable working memory representations required to perform the delayed-match-to-sample-task.
Identifying User Profiles from Statistical Grouping Methods
Directory of Open Access Journals (Sweden)
Francisco Kelsen de Oliveira
2018-02-01
Full Text Available This research aimed to group users into subgroups according to their levels of knowledge about technology. Statistical hierarchical and non-hierarchical clustering methods were studied, compared and used in the creations of the subgroups from the similarities of the skill levels with these users’ technology. The research sample consisted of teachers who answered online questionnaires about their skills with the use of software and hardware with educational bias. The statistical methods of grouping were performed and showed the possibilities of groupings of the users. The analyses of these groups allowed to identify the common characteristics among the individuals of each subgroup. Therefore, it was possible to define two subgroups of users, one with skill in technology and another with skill with technology, so that the partial results of the research showed two main algorithms for grouping with 92% similarity in the formation of groups of users with skill with technology and the other with little skill, confirming the accuracy of the techniques of discrimination against individuals.
2012 Aerospace Medical Certification Statistical Handbook
2013-12-01
2012 Aerospace Medical Certification Statistical Handbook Valerie J. Skaggs Ann I. Norris Civil Aerospace Medical Institute Federal Aviation...Certification Statistical Handbook December 2013 6. Performing Organization Code 7. Author(s) 8. Performing Organization Report No. Skaggs VJ, Norris AI 9...2.57 Hayfever 14,477 2.49 Asthma 12,558 2.16 Other general heart pathology (abnormal ECG, open heart surgery, etc.). Wolff-Parkinson-White syndrome
Data splitting for artificial neural networks using SOM-based stratified sampling.
May, R J; Maier, H R; Dandy, G C
2010-03-01
Data splitting is an important consideration during artificial neural network (ANN) development where hold-out cross-validation is commonly employed to ensure generalization. Even for a moderate sample size, the sampling methodology used for data splitting can have a significant effect on the quality of the subsets used for training, testing and validating an ANN. Poor data splitting can result in inaccurate and highly variable model performance; however, the choice of sampling methodology is rarely given due consideration by ANN modellers. Increased confidence in the sampling is of paramount importance, since the hold-out sampling is generally performed only once during ANN development. This paper considers the variability in the quality of subsets that are obtained using different data splitting approaches. A novel approach to stratified sampling, based on Neyman sampling of the self-organizing map (SOM), is developed, with several guidelines identified for setting the SOM size and sample allocation in order to minimize the bias and variance in the datasets. Using an example ANN function approximation task, the SOM-based approach is evaluated in comparison to random sampling, DUPLEX, systematic stratified sampling, and trial-and-error sampling to minimize the statistical differences between data sets. Of these approaches, DUPLEX is found to provide benchmark performance with good model performance, with no variability. The results show that the SOM-based approach also reliably generates high-quality samples and can therefore be used with greater confidence than other approaches, especially in the case of non-uniform datasets, with the benefit of scalability to perform data splitting on large datasets. Copyright 2009 Elsevier Ltd. All rights reserved.
Defining And Characterizing Sample Representativeness For DWPF Melter Feed Samples
Energy Technology Data Exchange (ETDEWEB)
Shine, E. P.; Poirier, M. R.
2013-10-29
Representative sampling is important throughout the Defense Waste Processing Facility (DWPF) process, and the demonstrated success of the DWPF process to achieve glass product quality over the past two decades is a direct result of the quality of information obtained from the process. The objective of this report was to present sampling methods that the Savannah River Site (SRS) used to qualify waste being dispositioned at the DWPF. The goal was to emphasize the methodology, not a list of outcomes from those studies. This methodology includes proven methods for taking representative samples, the use of controlled analytical methods, and data interpretation and reporting that considers the uncertainty of all error sources. Numerous sampling studies were conducted during the development of the DWPF process and still continue to be performed in order to evaluate options for process improvement. Study designs were based on use of statistical tools applicable to the determination of uncertainties associated with the data needs. Successful designs are apt to be repeated, so this report chose only to include prototypic case studies that typify the characteristics of frequently used designs. Case studies have been presented for studying in-tank homogeneity, evaluating the suitability of sampler systems, determining factors that affect mixing and sampling, comparing the final waste glass product chemical composition and durability to that of the glass pour stream sample and other samples from process vessels, and assessing the uniformity of the chemical composition in the waste glass product. Many of these studies efficiently addressed more than one of these areas of concern associated with demonstrating sample representativeness and provide examples of statistical tools in use for DWPF. The time when many of these designs were implemented was in an age when the sampling ideas of Pierre Gy were not as widespread as they are today. Nonetheless, the engineers and
Defining And Characterizing Sample Representativeness For DWPF Melter Feed Samples
International Nuclear Information System (INIS)
Shine, E. P.; Poirier, M. R.
2013-01-01
Representative sampling is important throughout the Defense Waste Processing Facility (DWPF) process, and the demonstrated success of the DWPF process to achieve glass product quality over the past two decades is a direct result of the quality of information obtained from the process. The objective of this report was to present sampling methods that the Savannah River Site (SRS) used to qualify waste being dispositioned at the DWPF. The goal was to emphasize the methodology, not a list of outcomes from those studies. This methodology includes proven methods for taking representative samples, the use of controlled analytical methods, and data interpretation and reporting that considers the uncertainty of all error sources. Numerous sampling studies were conducted during the development of the DWPF process and still continue to be performed in order to evaluate options for process improvement. Study designs were based on use of statistical tools applicable to the determination of uncertainties associated with the data needs. Successful designs are apt to be repeated, so this report chose only to include prototypic case studies that typify the characteristics of frequently used designs. Case studies have been presented for studying in-tank homogeneity, evaluating the suitability of sampler systems, determining factors that affect mixing and sampling, comparing the final waste glass product chemical composition and durability to that of the glass pour stream sample and other samples from process vessels, and assessing the uniformity of the chemical composition in the waste glass product. Many of these studies efficiently addressed more than one of these areas of concern associated with demonstrating sample representativeness and provide examples of statistical tools in use for DWPF. The time when many of these designs were implemented was in an age when the sampling ideas of Pierre Gy were not as widespread as they are today. Nonetheless, the engineers and
Statistical Analysis and validation
Hoefsloot, H.C.J.; Horvatovich, P.; Bischoff, R.
2013-01-01
In this chapter guidelines are given for the selection of a few biomarker candidates from a large number of compounds with a relative low number of samples. The main concepts concerning the statistical validation of the search for biomarkers are discussed. These complicated methods and concepts are