Estimating population size with correlated sampling unit estimates
David C. Bowden; Gary C. White; Alan B. Franklin; Joseph L. Ganey
2003-01-01
Finite population sampling theory is useful in estimating total population size (abundance) from abundance estimates of each sampled unit (quadrat). We develop estimators that allow correlated quadrat abundance estimates, even for quadrats in different sampling strata. Correlated quadrat abundance estimates based on mark-recapture or distance sampling methods occur...
Basic Statistical Concepts for Sample Size Estimation
Directory of Open Access Journals (Sweden)
Vithal K Dhulkhed
2008-01-01
Full Text Available For grant proposals, the investigator has to include an estimate of the sample size. The sample should be large enough that there are sufficient data to reliably answer the research question being addressed by the study. The investigator has to involve the statistician at the very planning stage of the study. To have a meaningful dialogue with the statistician, every research worker should be familiar with the basic concepts of statistics. This paper is concerned with simple principles of sample size calculation. Concepts are explained based on logic rather than rigorous mathematical calculations, to help researchers assimilate the fundamentals.
Sample size estimation and sampling techniques for selecting a representative sample
Aamir Omair
2014-01-01
Introduction: The purpose of this article is to provide a general understanding of the concepts of sampling as applied to health-related research. Sample Size Estimation: It is important to select a representative sample in quantitative research in order to be able to generalize the results to the target population. The sample should be of the required sample size and must be selected using an appropriate probability sampling technique. There are many hidden biases which can adversely affect ...
Sampling strategies for estimating brook trout effective population size
Andrew R. Whiteley; Jason A. Coombs; Mark Hudy; Zachary Robinson; Keith H. Nislow; Benjamin H. Letcher
2012-01-01
The influence of sampling strategy on estimates of effective population size (Ne) from single-sample genetic methods has not been rigorously examined, though these methods are increasingly used. For headwater salmonids, spatially close kin association among age-0 individuals suggests that sampling strategy (number of individuals and location from...
Sample size estimation and sampling techniques for selecting a representative sample
Directory of Open Access Journals (Sweden)
Aamir Omair
2014-01-01
Full Text Available Introduction: The purpose of this article is to provide a general understanding of the concepts of sampling as applied to health-related research. Sample Size Estimation: It is important to select a representative sample in quantitative research in order to be able to generalize the results to the target population. The sample should be of the required sample size and must be selected using an appropriate probability sampling technique. There are many hidden biases which can adversely affect the outcome of the study. Important factors to consider for estimating the sample size include the size of the study population, the confidence level, the expected proportion of the outcome variable (for categorical variables) or the standard deviation of the outcome variable (for numerical variables), and the required precision (margin of accuracy) of the study. The greater the precision required, the larger the required sample size. Sampling Techniques: The probability sampling techniques applied in health-related research include simple random sampling, systematic random sampling, stratified random sampling, cluster sampling, and multistage sampling. These are recommended over the nonprobability sampling techniques because the results of the study can then be generalized to the target population.
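The factors listed in that abstract combine in the standard formula n = z^2 * p * (1 - p) / d^2. A minimal sketch (the proportion and margin below are illustrative, not taken from the article):

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p, margin, confidence=0.95):
    """Sample size needed to estimate a proportion p within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. 1.96 for 95%
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Illustrative: expected proportion 20%, +/-5 percentage points, 95% confidence
n = sample_size_proportion(0.20, 0.05)  # -> 246
```

Note how halving the margin roughly quadruples n, matching the abstract's point that greater precision demands a larger sample.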
Effects of sample size on KERNEL home range estimates
Seaman, D.E.; Millspaugh, J.J.; Kernohan, Brian J.; Brundige, Gary C.; Raedeke, Kenneth J.; Gitzen, Robert A.
1999-01-01
Kernel methods for estimating home range are being used increasingly in wildlife research, but the effect of sample size on their accuracy is not known. We used computer simulations of 10-200 points/home range and compared accuracy of home range estimates produced by fixed and adaptive kernels with the reference (REF) and least-squares cross-validation (LSCV) methods for determining the amount of smoothing. Simulated home ranges varied from simple to complex shapes created by mixing bivariate normal distributions. We used the size of the 95% home range area and the relative mean squared error of the surface fit to assess the accuracy of the kernel home range estimates. For both measures, the bias and variance approached an asymptote at about 50 observations/home range. The fixed kernel with smoothing selected by LSCV provided the least-biased estimates of the 95% home range area. All kernel methods produced similar surface fit for most simulations, but the fixed kernel with LSCV had the lowest frequency and magnitude of very poor estimates. We reviewed 101 papers published in The Journal of Wildlife Management (JWM) between 1980 and 1997 that estimated animal home ranges. A minority of these papers used nonparametric utilization distribution (UD) estimators, and most did not adequately report sample sizes. We recommend that home range studies using kernel estimates use LSCV to determine the amount of smoothing, obtain a minimum of 30 observations per animal (but preferably ≥50), and report sample sizes in published results.
Estimation of individual reference intervals in small sample sizes
DEFF Research Database (Denmark)
Hansen, Ase Marie; Garde, Anne Helene; Eller, Nanna Hurwitz
2007-01-01
In occupational health studies, the study groups most often comprise healthy subjects performing their work. Sampling is often planned in the most practical way, e.g., sampling of blood in the morning at the work site just after the work starts. Optimal use of reference intervals requires...... of that order of magnitude for all topics in question. Therefore, new methods to estimate reference intervals for small sample sizes are needed. We present an alternative method based on variance component models. The models are based on data from 37 men and 84 women taking into account biological variation...... presented in this study. The presented method enables occupational health researchers to calculate reference intervals for specific groups, i.e. smokers versus non-smokers, etc. In conclusion, the variance component models provide an appropriate tool to estimate reference intervals based on small sample...
Fearon, Elizabeth; Chabata, Sungai T; Thompson, Jennifer A; Cowan, Frances M; Hargreaves, James R
2017-09-14
While guidance exists for obtaining population size estimates using multiplier methods with respondent-driven sampling surveys, we lack specific guidance for making sample size decisions. Our aim was to guide the design of multiplier method population size estimation studies using respondent-driven sampling surveys so as to reduce the random error around the estimate obtained. The population size estimate is obtained by dividing the number of individuals receiving a service or the number of unique objects distributed (M) by the proportion of individuals in a representative survey who report receipt of the service or object (P). We have developed an approach to sample size calculation, interpreting methods to estimate the variance around estimates obtained using multiplier methods in conjunction with research into design effects and respondent-driven sampling. We describe an application to estimate the number of female sex workers in Harare, Zimbabwe. There is high variance in estimates. Random error around the size estimate reflects uncertainty from both M and P, particularly when the estimate of P in the respondent-driven sampling survey is low. As expected, sample size requirements are higher when the design effect of the survey is assumed to be greater. We suggest a method for investigating the effects of sample size on the precision of a population size estimate obtained using multiplier methods and respondent-driven sampling. Uncertainty in the size estimate is high, particularly when P is small; balancing this against other potential sources of bias, we advise researchers to consider longer service attendance reference periods and to distribute more unique objects, which is likely to result in a higher estimate of P in the respondent-driven sampling survey.
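A sketch of the multiplier estimator N = M / P with a delta-method confidence interval; the design effect, counts, and survey values below are hypothetical, not taken from the Harare application:

```python
from math import sqrt
from statistics import NormalDist

def multiplier_estimate(M, p_hat, n_survey, deff=2.0, confidence=0.95):
    """Population size estimate N = M / P with an approximate CI.

    M        : unique objects distributed (or service users counted)
    p_hat    : proportion in the survey reporting receipt
    n_survey : respondent-driven sampling survey size
    deff     : assumed design effect of the survey (hypothetical value)
    """
    N = M / p_hat
    var_p = deff * p_hat * (1 - p_hat) / n_survey    # variance of p_hat
    se_N = M * sqrt(var_p) / p_hat ** 2              # delta method for M / P
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return N, (N - z * se_N, N + z * se_N)

# Illustrative: 500 objects distributed, 25% of 400 respondents report receipt
N, (lo, hi) = multiplier_estimate(500, 0.25, 400)
```

Consistent with the abstract, the interval widens sharply as p_hat shrinks, since the standard error scales with 1 / p_hat^2.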
Determining Sample Size for Accurate Estimation of the Squared Multiple Correlation Coefficient.
Algina, James; Olejnik, Stephen
2000-01-01
Discusses determining sample size for estimation of the squared multiple correlation coefficient and presents regression equations that permit determination of the sample size for estimating this parameter for up to 20 predictor variables. (SLD)
Variance estimation, design effects, and sample size calculations for respondent-driven sampling.
Salganik, Matthew J
2006-11-01
Hidden populations, such as injection drug users and sex workers, are central to a number of public health problems. However, because of the nature of these groups, it is difficult to collect accurate information about them, and this difficulty complicates disease prevention efforts. A recently developed statistical approach called respondent-driven sampling improves our ability to study hidden populations by allowing researchers to make unbiased estimates of the prevalence of certain traits in these populations. Yet, not enough is known about the sample-to-sample variability of these prevalence estimates. In this paper, we present a bootstrap method for constructing confidence intervals around respondent-driven sampling estimates and demonstrate in simulations that it outperforms the naive method currently in use. We also use simulations and real data to estimate the design effects for respondent-driven sampling in a number of situations. We conclude with practical advice about the power calculations that are needed to determine the appropriate sample size for a study using respondent-driven sampling. In general, we recommend a sample size twice as large as would be needed under simple random sampling.
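Salganik's closing recommendation (roughly double the simple-random-sampling size, i.e., an assumed design effect near 2) can be sketched as follows; the prevalence and margin below are illustrative:

```python
from math import ceil
from statistics import NormalDist

def rds_sample_size(p, margin, deff=2.0, confidence=0.95):
    """Sample size for estimating a proportion under respondent-driven
    sampling: the usual SRS formula inflated by an assumed design effect."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_srs = z ** 2 * p * (1 - p) / margin ** 2
    return ceil(deff * n_srs)

# Illustrative: 10% trait prevalence, +/-5% margin, 95% confidence
n = rds_sample_size(0.10, 0.05)  # deff = 2 doubles the SRS requirement
```

Setting deff=1.0 recovers the plain simple-random-sampling requirement, so the function makes the "twice as large" rule of thumb explicit.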
An Update on Using the Range to Estimate σ When Determining Sample Sizes.
Rhiel, George Steven; Markowski, Edward
2017-04-01
In this research, we develop a strategy for using a range estimator of σ when determining a sample size for estimating a mean. Previous research by Rhiel is extended to provide dn values for use in calculating a range estimate of σ when working with sampling frames up to size 1,000,000. This allows the use of the range estimator of σ with "big data." A strategy is presented for using the range estimator of σ for determining sample sizes based on the dn values developed in this study.
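A hedged sketch of the range-based strategy: sigma is estimated as range / dn and plugged into the usual mean-estimation formula. Real dn values depend on the frame size and come from tables such as those extended by Rhiel and Markowski; the value used below is purely illustrative:

```python
from math import ceil
from statistics import NormalDist

def sample_size_from_range(data_range, d_n, margin, confidence=0.95):
    """Sample size for estimating a mean, with sigma estimated as
    range / d_n (the d_n value must come from published tables)."""
    sigma_hat = data_range / d_n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma_hat / margin) ** 2)

# Illustrative: observed range 40, assumed d_n = 5, target margin of error 2
n = sample_size_from_range(40, 5.0, 2.0)
```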
Blinded sample size re-estimation in three-arm trials with 'gold standard' design.
Mütze, Tobias; Friede, Tim
2017-10-15
In this article, we study blinded sample size re-estimation in the 'gold standard' design with an internal pilot study for normally distributed outcomes. The 'gold standard' design is a three-arm clinical trial design that includes an active control and a placebo control in addition to an experimental treatment. We focus on the absolute margin approach to hypothesis testing in three-arm trials, in which the non-inferiority of the experimental treatment and the assay sensitivity are assessed by pairwise comparisons. We compare several blinded sample size re-estimation procedures in a simulation study assessing operating characteristics including power and type I error. We find that sample size re-estimation based on the popular one-sample variance estimator results in overpowered trials. Moreover, sample size re-estimation based on unbiased variance estimators such as the Xing-Ganju variance estimator results in underpowered trials; this is expected, because an overestimation of the variance, and thus of the sample size, is in general required for the re-estimation procedure to eventually meet the target power. To overcome this problem, we propose an inflation factor for the sample size re-estimation with the Xing-Ganju variance estimator and show that this approach results in adequately powered trials. Because of favorable features of the Xing-Ganju variance estimator, such as unbiasedness and a distribution independent of the group means, the inflation factor does not depend on the nuisance parameter and, therefore, can be calculated prior to a trial. Moreover, we prove that sample size re-estimation based on the Xing-Ganju variance estimator does not bias the effect estimate. Copyright © 2017 John Wiley & Sons, Ltd.
Sample size for estimation of the Pearson correlation coefficient in cherry tomato tests
Directory of Open Access Journals (Sweden)
Bruno Giacomini Sari
2017-09-01
Full Text Available ABSTRACT: The aim of this study was to determine the required sample size for estimation of the Pearson coefficient of correlation between cherry tomato variables. Two uniformity tests were set up in a protected environment in the spring/summer of 2014. The observed variables in each plant were mean fruit length, mean fruit width, mean fruit weight, number of bunches, number of fruits per bunch, number of fruits, and total weight of fruits, with calculation of the Pearson correlation matrix between them. Sixty-eight sample sizes were planned for one greenhouse and 48 for the other, with an initial sample size of 10 plants and the others obtained by adding five plants at a time. For each planned sample size, 3000 estimates of the Pearson correlation coefficient were obtained through bootstrap re-samplings with replacement. The sample size for each correlation coefficient was determined when the 95% confidence interval amplitude was less than or equal to 0.4. Obtaining estimates of the Pearson correlation coefficient with high precision is difficult for parameters with a weak linear relation. Accordingly, a larger sample size is necessary to estimate them. Linear relations involving variables dealing with size and number of fruits per plant have less precision. To estimate the coefficient of correlation between productivity variables of cherry tomato, with a 95% confidence interval amplitude of 0.4, it is necessary to sample 275 plants in a 250 m² greenhouse, and 200 plants in a 200 m² greenhouse.
Effects of Sample Size on Estimates of Population Growth Rates Calculated with Matrix Models
Fiske, Ian J.; Bruna, Emilio M.; Bolker, Benjamin M.
2008-01-01
Background Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (λ) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of λ: Jensen's Inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of λ due to increased sampling variance. We investigated if sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of λ. Methodology/Principal Findings Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating λ for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of λ with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. Conclusions/Significance We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more-realistic inverse J-shaped population structure exacerbated this bias. However, our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities. PMID:18769483
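For context, λ is the dominant eigenvalue of the projection matrix; a minimal power-iteration sketch with a made-up two-stage matrix (not the study's data):

```python
def dominant_eigenvalue(matrix, iterations=200):
    """Approximate the dominant eigenvalue (lambda) of a projection
    matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)   # rescale by the largest component
        v = [x / lam for x in w]
    return lam

# Hypothetical 2-stage matrix: stasis/fecundity in row 1, survival in row 2
A = [[0.5, 1.2],
     [0.3, 0.6]]
lam = dominant_eigenvalue(A)  # lambda > 1 implies a growing population
```

Because λ is a nonlinear function of the matrix entries, sampling noise in those entries propagates to λ exactly as the Jensen's Inequality argument in the abstract describes.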
Effects of sample size on estimates of population growth rates calculated with matrix models.
Directory of Open Access Journals (Sweden)
Ian J Fiske
Full Text Available BACKGROUND: Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (λ) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of λ: Jensen's Inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of λ due to increased sampling variance. We investigated if sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of λ. METHODOLOGY/PRINCIPAL FINDINGS: Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating λ for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of λ with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. CONCLUSIONS/SIGNIFICANCE: We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more-realistic inverse J-shaped population structure exacerbated this bias. However, our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.
Post-stratified estimation: with-in strata and total sample size recommendations
James A. Westfall; Paul L. Patterson; John W. Coulston
2011-01-01
Post-stratification is used to reduce the variance of estimates of the mean. Because the stratification is not fixed in advance, within-strata sample sizes can be quite small. The survey statistics literature provides some guidance on minimum within-strata sample sizes; however, the recommendations and justifications are inconsistent and apply broadly for many...
Evaluating the performance of species richness estimators: sensitivity to sample grain size
DEFF Research Database (Denmark)
Hortal, Joaquín; Borges, Paulo A. V.; Gaspar, Clara
2006-01-01
Fifteen species richness estimators (three asymptotic, based on species accumulation curves; 11 nonparametric; and one based on the species-area relationship) were compared by examining their performance in estimating the total species richness of epigean arthropods in the Azorean Laurisilva forests...... different sampling units on species richness estimations. 2. Estimated species richness scores depended both on the estimator considered and on the grain size used to aggregate data. However, several estimators (ACE, Chao1, Jackknife1 and 2 and Bootstrap) were precise in spite of grain variations. Weibull...... scores in a number of estimators (the above-mentioned plus ICE, Chao2, Michaelis-Menten, Negative Exponential and Clench). The estimations from those four sample sizes were also highly correlated. 4. Contrary to other studies, we conclude that most species richness estimators may be useful......
The Impact of Sample Size and Other Factors When Estimating Multilevel Logistic Models
Schoeneberger, Jason A.
2016-01-01
The design of research studies utilizing binary multilevel models must necessarily incorporate knowledge of multiple factors, including estimation method, variance component size, or number of predictors, in addition to sample sizes. This Monte Carlo study examined the performance of random effect binary outcome multilevel models under varying…
Voss, Sebastian; Zimmermann, Beate; Zimmermann, Alexander
2016-09-01
In recent decades, an increasing number of studies have analyzed spatial patterns in throughfall by means of variograms. The estimation of the variogram from sample data requires an appropriate sampling scheme: most importantly, a large sample and a layout of sampling locations that often has to serve both variogram estimation and geostatistical prediction. While some recommendations on these aspects exist, they focus on Gaussian data and high ratios of the variogram range to the extent of the study area. However, many hydrological data, and throughfall data in particular, do not follow a Gaussian distribution. In this study, we examined the effect of extent, sample size, sampling design, and calculation method on variogram estimation of throughfall data. For our investigation, we first generated non-Gaussian random fields based on throughfall data with large outliers. Subsequently, we sampled the fields with three extents (plots with edge lengths of 25 m, 50 m, and 100 m), four common sampling designs (two grid-based layouts, transect and random sampling), and five sample sizes (50, 100, 150, 200, 400). We then estimated the variogram parameters by method-of-moments (non-robust and robust estimators) and residual maximum likelihood. Our key findings are threefold. First, the choice of the extent has a substantial influence on the estimation of the variogram. A comparatively small ratio of the extent to the correlation length is beneficial for variogram estimation. Second, a combination of a minimum sample size of 150, a design that ensures the sampling of small distances, and variogram estimation by residual maximum likelihood offers a good compromise between accuracy and efficiency. Third, studies relying on method-of-moments based variogram estimation may have to employ at least 200 sampling points for reliable variogram estimates. These suggested sample sizes exceed the number recommended by studies dealing with Gaussian data by up to 100%. Given that most previous...
Estimating the Size of a Large Network and its Communities from a Random Sample.
Chen, Lin; Karbasi, Amin; Crawford, Forrest W
2016-01-01
Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios.
Vallejo, Adriana; Muniesa, Ana; Ferreira, Chelo; de Blas, Ignacio
2013-10-01
Nowadays, the formula used to calculate the sample size for estimating a proportion (such as a prevalence) is based on the Normal distribution; however, it could instead be based on a Binomial distribution, whose confidence interval can be calculated using the Wilson score method. Comparing the two formulae (Normal and Binomial distributions), the variation in the amplitude of the confidence intervals is relevant in the tails and the center of the curves. In order to calculate the needed sample size, we simulated an iterative sampling procedure, which shows an underestimation of the sample size for values of prevalence close to 0 or 1, and an overestimation for values close to 0.5. In light of these results, we propose an algorithm based on the Wilson score method that provides values for the sample size similar to those obtained empirically by simulation. Copyright © 2013 Elsevier Ltd. All rights reserved.
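A sketch of the underlying idea: search for the smallest n whose Wilson score interval reaches the required precision. This is a simplified reading of the approach, with illustrative inputs, not the authors' exact algorithm:

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

def sample_size_wilson(p, precision, confidence=0.95):
    """Smallest n whose Wilson interval half-width is <= precision."""
    n = 1
    while True:
        lo, hi = wilson_interval(p, n, confidence)
        if (hi - lo) / 2 <= precision:
            return n
        n += 1

# Illustrative: expected prevalence 50%, +/-5% precision
n = sample_size_wilson(0.5, 0.05)
```

Near p = 0.5 this yields a slightly smaller n than the Normal-based formula (385), while for prevalences near 0 or 1 it yields a larger n, matching the under-/over-estimation pattern the abstract describes.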
Sample Size Calculation for Estimating or Testing a Nonzero Squared Multiple Correlation Coefficient
Krishnamoorthy, K.; Xia, Yanping
2008-01-01
The problems of hypothesis testing and interval estimation of the squared multiple correlation coefficient of a multivariate normal distribution are considered. It is shown that available one-sided tests are uniformly most powerful, and the one-sided confidence intervals are uniformly most accurate. An exact method of calculating sample size to…
B-graph sampling to estimate the size of a hidden population
Spreen, M.; Bogaerts, S.
2015-01-01
Link-tracing designs are often used to estimate the size of hidden populations by utilizing the relational links between their members. A major problem in studies of hidden populations is the lack of a convenient sampling frame. The most frequently applied design in studies of hidden populations is respondent-driven sampling, in which no sampling frame is used.
A simple nomogram for sample size for estimating sensitivity and specificity of medical tests
Directory of Open Access Journals (Sweden)
Malhotra Rajeev
2010-01-01
Full Text Available Sensitivity and specificity measure the inherent validity of a diagnostic test against a gold standard. Researchers develop new diagnostic methods to reduce cost, risk, invasiveness, and time. An adequate sample size is a must to precisely estimate the validity of a diagnostic test. In practice, researchers generally decide on the sample size arbitrarily, either at their convenience or from the previous literature. We have devised a simple nomogram that yields statistically valid sample sizes for anticipated sensitivity or anticipated specificity. MS Excel version 2007 was used to derive the values required to plot the nomogram, using varying absolute precision, known prevalence of disease, and a 95% confidence level, based on the formula already available in the literature. The nomogram plot was obtained by suitably arranging the lines and distances to conform to this formula. This nomogram can be easily used to determine the sample size for estimating the sensitivity or specificity of a diagnostic test with the required precision at the 95% confidence level. Sample sizes at the 90% and 99% confidence levels, respectively, can also be obtained by multiplying the number obtained for the 95% confidence level by 0.70 or 1.75. A nomogram instantly provides the required number of subjects by just moving a ruler and can be used repeatedly without redoing the calculations; it can also be applied for reverse calculations. This nomogram is not applicable to hypothesis-testing set-ups and applies only when both the diagnostic test and the gold standard results are dichotomous.
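The formula behind such nomograms (commonly attributed to Buderer) divides the usual proportion sample size by disease prevalence, because only diseased subjects contribute to the sensitivity estimate. A sketch with illustrative inputs:

```python
from math import ceil
from statistics import NormalDist

def sample_size_sensitivity(sens, precision, prevalence, confidence=0.95):
    """Total subjects needed to estimate sensitivity to +/- precision."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_diseased = z ** 2 * sens * (1 - sens) / precision ** 2
    return ceil(n_diseased / prevalence)   # inflate for non-diseased subjects

# Illustrative: anticipated sensitivity 0.90, precision 0.05, prevalence 0.20
n = sample_size_sensitivity(0.90, 0.05, 0.20)
```

The 0.70 and 1.75 multipliers quoted in the abstract fall out of the same formula: (1.645/1.960)^2 ≈ 0.70 and (2.576/1.960)^2 ≈ 1.73.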
A simple method for estimating genetic diversity in large populations from finite sample sizes
Directory of Open Access Journals (Sweden)
Rajora Om P
2009-12-01
Full Text Available Abstract Background Sample size is one of the critical factors affecting the accuracy of the estimation of population genetic diversity parameters. Small sample sizes often lead to significant errors in determining the allelic richness, which is one of the most important and commonly used estimators of genetic diversity in populations. Correct estimation of allelic richness in natural populations is challenging since they often do not conform to model assumptions. Here, we introduce a simple and robust approach to estimate the genetic diversity in large natural populations based on the empirical data for finite sample sizes. Results We developed a non-linear regression model to infer genetic diversity estimates in large natural populations from finite sample sizes. The allelic richness values predicted by our model were in good agreement with those observed in the simulated data sets and the true allelic richness observed in the source populations. The model has been validated using simulated population genetic data sets with different evolutionary scenarios implied in the simulated populations, as well as large microsatellite and allozyme experimental data sets for four conifer species with contrasting patterns of inherent genetic diversity and mating systems. Our model was a better predictor for allelic richness in natural populations than the widely-used Ewens sampling formula, coalescent approach, and rarefaction algorithm. Conclusions Our regression model was capable of accurately estimating allelic richness in natural populations regardless of the species and marker system. This regression modeling approach is free from assumptions and can be widely used for population genetic and conservation applications.
B-Graph Sampling to Estimate the Size of a Hidden Population
Directory of Open Access Journals (Sweden)
Spreen Marinus
2015-12-01
Full Text Available Link-tracing designs are often used to estimate the size of hidden populations by utilizing the relational links between their members. A major problem in studies of hidden populations is the lack of a convenient sampling frame. The most frequently applied design in studies of hidden populations is respondent-driven sampling in which no sampling frame is used. However, in some studies multiple but incomplete sampling frames are available. In this article, we introduce the B-graph design that can be used in such situations. In this design, all available incomplete sampling frames are joined and turned into one sampling frame, from which a random sample is drawn and selected respondents are asked to mention their contacts. By considering the population as a bipartite graph of a two-mode network (those from the sampling frame and those who are not on the frame), the number of respondents who are directly linked to the sampling frame members can be estimated using Chao's and Zelterman's estimators for sparse data. The B-graph sampling design is illustrated using the data of a social network study from Utrecht, the Netherlands.
Wan, Xiang; Wang, Wenqian; Liu, Jiming; Tong, Tiejun
2014-12-19
In systematic reviews and meta-analysis, researchers often pool the results of the sample mean and standard deviation from a set of similar clinical trials. A number of the trials, however, reported results using the median, the minimum and maximum values, and/or the first and third quartiles. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we propose to improve the existing literature in several directions. First, we show that the sample standard deviation estimation in Hozo et al.'s method (BMC Med Res Methodol 5:13, 2005) has some serious limitations and is always less satisfactory in practice. Inspired by this, we propose a new estimation method that incorporates the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under several other interesting settings where the interquartile range is also available for the trials. We demonstrate the performance of the proposed methods through simulation studies for the three frequently encountered scenarios, respectively. For the first two scenarios, our method greatly improves existing methods and provides a nearly unbiased estimate of the true sample standard deviation for normal data and a slightly biased estimate for skewed data. For the third scenario, our method still performs very well for both normal data and skewed data. Furthermore, we compare the estimators of the sample mean and standard deviation under all three scenarios and present some suggestions on which scenario is preferred in real-world applications. In this paper, we discuss different approximation methods in the estimation of the sample mean and standard deviation and propose some new estimation methods to improve the existing literature. We conclude our work with a summary table (an Excel spreadsheet including all formulas) that serves as comprehensive guidance for performing meta-analysis in different scenarios.
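For the scenario where a trial reports only the minimum (a), median (m), maximum (b), and sample size n, the estimators proposed by Wan et al. can be sketched as follows; the example numbers are made up:

```python
from statistics import NormalDist

def mean_sd_from_median_range(a, m, b, n):
    """Approximate the sample mean and SD from min, median, max, and n
    (Wan et al.'s estimators, intended for roughly normal data)."""
    mean = (a + 2 * m + b) / 4
    # The range divisor xi(n) grows with n via the normal quantile function
    xi = 2 * NormalDist().inv_cdf((n - 0.375) / (n + 0.25))
    sd = (b - a) / xi
    return mean, sd

# Illustrative trial: median 10, range [4, 18], n = 50
mean, sd = mean_sd_from_median_range(4, 10, 18, 50)  # mean -> 10.5
```

Unlike a fixed range/4 rule, the divisor xi(n) increases with n, so the SD estimate does not inflate as trials get larger, which is the sample-size correction the abstract highlights.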
Estimating the Size of a Large Network and its Communities from a Random Sample
Chen, Lin; Crawford, Forrest W
2016-01-01
Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V;E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that correctly estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhausti...
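The vertex-counting problem described in this abstract can be illustrated with a toy estimator (this is not the PULSE algorithm, which handles block structure and comes with theoretical guarantees): under a plain Erdős-Rényi assumption, a sampled vertex's induced-subgraph degree over its total degree estimates (|W| − 1)/(N − 1), which can be solved for N.

```python
# Toy population-size estimate from a random vertex sample of an
# Erdos-Renyi graph G(N, p). Illustrative only; the PULSE algorithm
# in the paper is considerably more sophisticated.
import numpy as np

rng = np.random.default_rng(1)
N, p, w = 2000, 0.01, 200            # true size, edge prob., sample size

# Build the random graph as a symmetric boolean adjacency matrix.
upper = np.triu(rng.random((N, N)) < p, 1)
adj = upper | upper.T

W = rng.choice(N, size=w, replace=False)      # sampled vertex set
total_deg = adj[W].sum(axis=1)                # observed total degrees
induced_deg = adj[np.ix_(W, W)].sum(axis=1)   # degrees inside the sample

# E[induced] / E[total] ~ (w - 1) / (N - 1), so solve for N.
N_hat = 1 + (w - 1) * total_deg.sum() / induced_deg.sum()
print(round(N_hat))
```

With a 10% vertex sample the estimate typically lands within a few percent of the true N = 2000 in this homogeneous setting; community structure is exactly what breaks this naive ratio and motivates the SBM-aware approach of the paper.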
Sample size for estimation of the Pearson correlation coefficient in cherry tomato tests
Bruno Giacomini Sari; Alessandro Dal’Col Lúcio; Cinthya Souza Santana; Dionatan Ketzer Krysczun; André Luís Tischler; Lucas Drebes
2017-01-01
ABSTRACT: The aim of this study was to determine the required sample size for estimation of the Pearson coefficient of correlation between cherry tomato variables. Two uniformity tests were set up in a protected environment in the spring/summer of 2014. The observed variables in each plant were mean fruit length, mean fruit width, mean fruit weight, number of bunches, number of fruits per bunch, number of fruits, and total weight of fruits, with calculation of the Pearson correlation matrix b...
Prediction accuracy of a sample-size estimation method for ROC studies.
Chakraborty, Dev P
2010-05-01
Sample-size estimation is an important consideration when planning a receiver operating characteristic (ROC) study. The aim of this work was to assess the prediction accuracy of a sample-size estimation method using the Monte Carlo simulation method. Two ROC ratings simulators characterized by low reader and high case variabilities (LH) and high reader and low case variabilities (HL) were used to generate pilot data sets in two modalities. Dorfman-Berbaum-Metz multiple-reader multiple-case (DBM-MRMC) analysis of the ratings yielded estimates of the modality-reader, modality-case, and error variances. These were input to the Hillis-Berbaum (HB) sample-size estimation method, which predicted the number of cases needed to achieve 80% power for 10 readers and an effect size of 0.06 in the pivotal study. Predictions that generalized to readers and cases (random-all), to cases only (random-cases), and to readers only (random-readers) were generated. A prediction-accuracy index defined as the probability that any single prediction yields true power in the 75%-90% range was used to assess the HB method. For random-case generalization, the HB-method prediction-accuracy was reasonable, approximately 50% for five readers and 100 cases in the pilot study. Prediction-accuracy was generally higher under LH conditions than under HL conditions. Under ideal conditions (many readers in the pilot study) the DBM-MRMC-based HB method overestimated the number of cases. The overestimates could be explained by the larger modality-reader variance estimates when reader variability was large (HL). The largest benefit of increasing the number of readers in the pilot study was realized for LH, where 15 readers were enough to yield prediction accuracy >50% under all generalization conditions, but the benefit was lesser for HL where prediction accuracy was approximately 36% for 15 readers under random-all and random-reader conditions. The HB method tends to overestimate the number of cases
Distance software: design and analysis of distance sampling surveys for estimating population size.
Thomas, Len; Buckland, Stephen T; Rexstad, Eric A; Laake, Jeff L; Strindberg, Samantha; Hedley, Sharon L; Bishop, Jon Rb; Marques, Tiago A; Burnham, Kenneth P
2010-02-01
1. Distance sampling is a widely used technique for estimating the size or density of biological populations. Many distance sampling designs and most analyses use the software Distance. 2. We briefly review distance sampling and its assumptions, outline the history, structure and capabilities of Distance, and provide hints on its use. 3. Good survey design is a crucial prerequisite for obtaining reliable results. Distance has a survey design engine, with a built-in geographic information system, that allows properties of different proposed designs to be examined via simulation, and survey plans to be generated. 4. A first step in analysis of distance sampling data is modelling the probability of detection. Distance contains three increasingly sophisticated analysis engines for this: conventional distance sampling, which models detection probability as a function of distance from the transect and assumes all objects at zero distance are detected; multiple-covariate distance sampling, which allows covariates in addition to distance; and mark-recapture distance sampling, which relaxes the assumption of certain detection at zero distance. 5. All three engines allow estimation of density or abundance, stratified if required, with associated measures of precision calculated either analytically or via the bootstrap. 6. Advanced analysis topics covered include the use of multipliers to allow analysis of indirect surveys (such as dung or nest surveys), the density surface modelling analysis engine for spatial and habitat modelling, and information about accessing the analysis engines directly from other software. 7. Synthesis and applications. Distance sampling is a key method for producing abundance and density estimates in challenging field conditions. The theory underlying the methods continues to expand to cope with realistic estimation situations. In step with theoretical developments, state-of-the-art software that implements these methods is described that makes the methods
Florey, C D
1993-01-01
The common failure to include an estimation of sample size in grant proposals imposes a major handicap on applicants, particularly for those proposing work in any aspect of research in the health services. Members of research committees need evidence that a study is of adequate size for there to be a reasonable chance of a clear answer at the end. A simple illustrated explanation of the concepts in determining sample size should encourage the faint hearted to pay more attention to this increasingly important aspect of grantsmanship.
Hui, Tin-Yu J; Burt, Austin
2015-05-01
The effective population size Ne is a key parameter in population genetics and evolutionary biology, as it quantifies the expected distribution of changes in allele frequency due to genetic drift. Several methods of estimating Ne have been described, the most direct of which uses allele frequencies measured at two or more time points. A new likelihood-based estimator for contemporary effective population size using temporal data is developed in this article. The existing likelihood methods are computationally intensive and unable to handle the case when the underlying Ne is large. This article works around this problem by using a hidden Markov algorithm and applying continuous approximations to allele frequencies and transition probabilities. Extensive simulations are run to evaluate the performance of the proposed estimator, and the results show that it is more accurate and has lower variance than previous methods. The new estimator also reduces the computational time by at least 1000-fold and relaxes the upper bound of Ne to several million, hence allowing the estimation of larger Ne. Finally, we demonstrate how this algorithm can cope with nonconstant Ne scenarios and be used as a likelihood-ratio test to test for the equality of Ne throughout the sampling horizon. An R package "NB" is now available for download to implement the method described in this article. Copyright © 2015 by the Genetics Society of America.
Sediment grain size estimation using airborne remote sensing, field sampling, and robust statistic.
Castillo, Elena; Pereda, Raúl; Luis, Julio Manuel de; Medina, Raúl; Viguri, Javier
2011-10-01
Remote sensing has been used since the 1980s to study parameters related to coastal zones. It was not until the beginning of the twenty-first century, however, that imagery with good temporal and spectral resolution became available, encouraging the development of reliable imagery acquisition systems that treat remote sensing as a water management tool. Nevertheless, the spatial resolution provided is not well adapted to coastal studies. This article introduces a new methodology for estimating the most fundamental physical property of intertidal sediment, grain size, in coastal zones. The study combines hyperspectral information (a CASI-2 flight), robust statistics, and simultaneous field work (chemical and radiometric sampling) performed over Santander Bay, Spain. Field data acquisition was used to build a spectral library in order to study different atmospheric correction algorithms for the CASI-2 data and to develop algorithms for estimating grain size in an estuary. Two robust estimation techniques (the MVE and MCD multivariate M-estimators of location and scale) were applied to the CASI-2 imagery, and the results showed that robust adjustments give acceptable and meaningful algorithms. These adjustments yielded the following estimated R² values: 0.93 for the sandy loam contribution, 0.94 for the silty loam, and 0.67 for the clay loam. Robust statistics are a powerful tool for large datasets.
Multiple sensitive estimation and optimal sample size allocation in the item sum technique.
Perri, Pier Francesco; Rueda García, María Del Mar; Cobo Rodríguez, Beatriz
2017-09-27
For surveys of sensitive issues in life sciences, statistical procedures can be used to reduce nonresponse and social desirability response bias. Both of these phenomena provoke nonsampling errors that are difficult to deal with and can seriously flaw the validity of the analyses. The item sum technique (IST) is a very recent indirect questioning method derived from the item count technique that seeks to procure more reliable responses on quantitative items than direct questioning while preserving respondents' anonymity. This article addresses two important questions concerning the IST: (i) its implementation when two or more sensitive variables are investigated and efficient estimates of their unknown population means are required; (ii) the determination of the optimal sample size to achieve minimum variance estimates. These aspects are of great relevance for survey practitioners engaged in sensitive research and, to the best of our knowledge, have not been studied so far. In this article, theoretical results for multiple estimation and optimal allocation are obtained under a generic sampling design and then particularized to simple random sampling and stratified sampling designs. Theoretical considerations are integrated with a number of simulation studies based on data from two real surveys and conducted to ascertain the efficiency gain derived from optimal allocation in different situations. One of the surveys concerns cannabis consumption among university students. Our findings highlight some methodological advances that can be obtained in life sciences IST surveys when optimal allocation is achieved. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Desu, M M
2012-01-01
One of the most important problems in designing an experiment or a survey is sample size determination and this book presents the currently available methodology. It includes both random sampling from standard probability distributions and from finite populations. Also discussed is sample size determination for estimating parameters in a Bayesian setting by considering the posterior distribution of the parameter and specifying the necessary requirements. The determination of the sample size is considered for ranking and selection problems as well as for the design of clinical trials. Appropria
Naing, Nyi Nyi
2003-01-01
Determining the minimum required sample size 'n' needed to measure a characteristic of a particular population is of particular importance. This article highlights the determination of an appropriate sample size for estimating population parameters.
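The textbook calculation behind articles like this one, for estimating a population proportion p to within an absolute margin of error d at confidence level 1 − α, is the standard Cochran formula n = z²·p(1 − p)/d². A minimal sketch (illustrative, not taken verbatim from the article):

```python
# Minimum sample size for estimating a population proportion, via the
# standard Cochran formula n = z^2 * p * (1 - p) / d^2.
import math
from statistics import NormalDist

def n_for_proportion(p, d, alpha=0.05):
    """Required n for anticipated proportion p and margin of error d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# Worst case p = 0.5 with a 5% margin at 95% confidence:
print(n_for_proportion(0.5, 0.05))  # 385
```

Taking p = 0.5 maximizes p(1 − p), which is why 385 (often quoted as "about 384") is the conservative default when no prior estimate of the proportion is available.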
Effects of sample size on estimation of rainfall extremes at high temperatures
Directory of Open Access Journals (Sweden)
B. Boessenkool
2017-09-01
Full Text Available High precipitation quantiles tend to rise with temperature, following the so-called Clausius–Clapeyron (CC scaling. It is often reported that the CC-scaling relation breaks down and even reverts for very high temperatures. In our study, we investigate this reversal using observational climate data from 142 stations across Germany. One of the suggested meteorological explanations for the breakdown is limited moisture supply. Here we argue that, instead, it could simply originate from undersampling. As rainfall frequency generally decreases with higher temperatures, rainfall intensities as dictated by CC scaling are less likely to be recorded than for moderate temperatures. Empirical quantiles are conventionally estimated from order statistics via various forms of plotting position formulas. They have in common that their largest representable return period is given by the sample size. In small samples, high quantiles are underestimated accordingly. The small-sample effect is weaker, or disappears completely, when using parametric quantile estimates from a generalized Pareto distribution (GPD fitted with L moments. For those, we obtain quantiles of rainfall intensities that continue to rise with temperature.
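The undersampling argument above can be reproduced numerically: because plotting-position estimates cannot represent return periods beyond the sample size, empirical high quantiles from small samples are biased low. A sketch using exponential data as a stand-in for rainfall intensities (an illustrative assumption, not the study's data):

```python
# Small-sample underestimation of a high quantile by empirical
# (plotting-position-style) estimation. Exp(1) stands in for rainfall.
import numpy as np

rng = np.random.default_rng(0)
true_q99 = -np.log(1 - 0.99)       # 99th percentile of Exp(1): ln(100)

n, reps = 20, 5000                 # small samples, many replications
samples = rng.exponential(1.0, size=(reps, n))
emp_q99 = np.percentile(samples, 99, axis=1)  # empirical per-sample estimate

print(round(true_q99, 2), round(emp_q99.mean(), 2))
```

The average empirical estimate falls well short of the true value ln(100) ≈ 4.61, because with n = 20 the estimator can never exceed the sample maximum; a parametric fit such as the GPD with L-moments, as the study notes, does not share this hard ceiling.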
Bahçecitapar, Melike Kaya
2017-07-01
Determining the sample size necessary for correct results is a crucial step in the design of longitudinal studies. Simulation-based statistical power calculation is a flexible approach to determine the number of subjects and repeated measures of longitudinal studies, especially in complex designs. Several papers have provided sample size/statistical power calculations for longitudinal studies incorporating data analysis by linear mixed effects models (LMMs). In this study, different estimation methods (methods based on maximum likelihood (ML) and restricted ML) with different iterative algorithms (quasi-Newton and ridge-stabilized Newton-Raphson) in fitting LMMs to generated longitudinal data for simulation-based power calculation are compared. This study examines the statistical power of F-test statistics for the parameter representing the difference in responses over time from two treatment groups in the LMM with a longitudinal covariate. The most common procedures in SAS, such as PROC GLIMMIX using the quasi-Newton algorithm and PROC MIXED using the ridge-stabilized algorithm, are used for analyzing the generated longitudinal data in simulation. It is seen that both procedures present similar results. Moreover, it is found that the magnitude of the parameter of interest in the model for simulations affects statistical power calculations in both procedures substantially.
Directory of Open Access Journals (Sweden)
B SOLEYMANI
2001-06-01
Full Text Available In many cases, the estimate of variance used to determine sample size in clinical trials derives from limited primary or pilot studies in which the number of samples is small. Since in such cases the variance estimate may be far from the real variance, the resulting sample size is liable to be smaller or larger than what is really needed. In this article an attempt has been made to give a solution to this problem in the case of the normal distribution. Based on the distribution of (n-1)S²/σ², which is chi-square for normal variables, an appropriate estimate of variance is determined and used to calculate the sample size. Also, the total probability of ensuring the specified precision and power has been achieved. In the method presented here, the probability of attaining the desired precision and power is higher than that of the usual method, but the results of the two methods converge as the sample size of the primary study increases.
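The chi-square adjustment described above can be sketched generically (an illustrative reconstruction, not the authors' exact procedure): use (n−1)S²/σ² ~ χ²ₙ₋₁ to turn the pilot SD into a one-sided upper confidence limit for σ, then feed that limit into a standard two-group sample size formula.

```python
# Pilot-variance uncertainty folded into a sample size calculation via
# the chi-square distribution of (n-1)S^2/sigma^2. Generic sketch only.
import math
from statistics import NormalDist
from scipy.stats import chi2

def upper_sd(s_pilot, n_pilot, gamma=0.95):
    """One-sided upper 100*gamma% confidence limit for sigma from a pilot SD."""
    df = n_pilot - 1
    return math.sqrt(df * s_pilot ** 2 / chi2.ppf(1 - gamma, df))

def n_per_group(sd, delta, alpha=0.05, power=0.8):
    """Normal-approximation n per group for a two-sample test of means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd ** 2 / delta ** 2)

s_up = upper_sd(s_pilot=1.0, n_pilot=10)
print(n_per_group(1.0, 0.5), n_per_group(s_up, 0.5))  # adjusted size is larger
```

With only 10 pilot subjects the upper limit on σ is far above the pilot SD itself, so the adjusted sample size is substantially larger; as the pilot grows, the two calculations converge, matching the abstract's closing observation.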
Estimating sample size for a small-quadrat method of botanical ...
African Journals Online (AJOL)
... in eight plant communities in the Nylsvley Nature Reserve. Illustrates with a table. Keywords: Botanical surveys; Grass density; Grasslands; Mixed Bushveld; Nylsvley Nature Reserve; Quadrat size species density; Small-quadrat method; Species density; Species richness; botany; sample size; method; survey; south africa
Sample sizes to control error estimates in determining soil bulk density in California forest soils
Youzhi Han; Jianwei Zhang; Kim G. Mattson; Weidong Zhang; Thomas A. Weber
2016-01-01
Characterizing forest soil properties with high variability is challenging, sometimes requiring large numbers of soil samples. Soil bulk density is a standard variable needed along with element concentrations to calculate nutrient pools. This study aimed to determine the optimal sample size, the number of observation (n), for predicting the soil bulk density with a...
Directory of Open Access Journals (Sweden)
Stefanović Milena
2013-01-01
Full Text Available In studies of population variability, particular attention has to be paid to the selection of a representative sample. The aim of this study was to assess the size of a new representative sample on the basis of the variability of the chemical content of the initial sample, using a whitebark pine population as an example. Statistical analysis included the content of 19 characteristics (terpene hydrocarbons and their derivatives) of the initial sample of 10 elements (trees). It was determined that the new sample should contain 20 trees so that the mean value calculated from it represents the basic set with a probability higher than 95%. Determination of the lower limit of the representative sample size that guarantees satisfactory reliability of generalization proved to be very important for achieving cost efficiency of the research. [Projects of the Ministry of Science of the Republic of Serbia, nos. OI-173011, TR-37002 and III-43007]
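The kind of calculation reported here, deriving a required sample size from the variability of an initial sample, is commonly done with the t-based margin-of-error condition t(α/2, n−1)·s/√n ≤ d, solved iteratively because the t quantile itself depends on n. A generic sketch with illustrative values (not the study's data):

```python
# Smallest n whose t-based confidence half-width for the mean is <= d,
# given a pilot SD s. Iterative because t_{alpha/2, n-1} depends on n.
import math
from scipy.stats import t

def n_for_mean(s, d, alpha=0.05, n_start=2):
    """Smallest n with t_{alpha/2, n-1} * s / sqrt(n) <= d."""
    n = n_start
    while t.ppf(1 - alpha / 2, n - 1) * s / math.sqrt(n) > d:
        n += 1
    return n

# Pilot SD of 10 units, target margin of error of 5 units at 95% confidence:
print(n_for_mean(10, 5))  # 18
```

Halving the target margin roughly quadruples the required n, which is why fixing the lower limit of the representative sample size matters so much for the cost efficiency the authors emphasize.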
Umesh P. Agarwal; Sally A. Ralph; Carlos Baez; Richard S. Reiner; Steve P. Verrill
2017-01-01
Although X-ray diffraction (XRD) has been the most widely used technique to investigate crystallinity index (CrI) and crystallite size (L200) of cellulose materials, there are not many studies that have taken into account the role of sample moisture on these measurements. The present investigation focuses on a variety of celluloses and cellulose...
Uijlenhoet, R.; Porrà, J.M.; Sempere Torres, D.; Creutin, J.D.
2006-01-01
A stochastic model of the microstructure of rainfall is used to derive explicit expressions for the magnitude of the sampling fluctuations in rainfall properties estimated from raindrop size measurements in stationary rainfall. The model is a marked point process, in which the points represent the
Teare, M Dawn; Dimairo, Munyaradzi; Shephard, Neil; Hayman, Alex; Whitehead, Amy; Walters, Stephen J
2014-07-03
External pilot or feasibility studies can be used to estimate key unknown parameters to inform the design of the definitive randomised controlled trial (RCT). However, there is little consensus on how large pilot studies need to be, and some suggest inflating estimates to adjust for the lack of precision when planning the definitive RCT. We use a simulation approach to illustrate the sampling distribution of the standard deviation for continuous outcomes and the event rate for binary outcomes. We present the impact of increasing the pilot sample size on the precision and bias of these estimates, and predicted power under three realistic scenarios. We also illustrate the consequences of using a confidence interval argument to inflate estimates so the required power is achieved with a pre-specified level of confidence. We limit our attention to external pilot and feasibility studies prior to a two-parallel-balanced-group superiority RCT. For normally distributed outcomes, the relative gain in precision of the pooled standard deviation (SDp) is less than 10% (for each five subjects added per group) once the total sample size is 70. For true proportions between 0.1 and 0.5, we find the gain in precision for each five subjects added to the pilot sample is less than 5% once the sample size is 60. Adjusting the required sample sizes for the imprecision in the pilot study estimates can result in excessively large definitive RCTs and also requires a pilot sample size of 60 to 90 for the true effect sizes considered here. We recommend that an external pilot study has at least 70 measured subjects (35 per group) when estimating the SDp for a continuous outcome. If the event rate in an intervention group needs to be estimated by the pilot then a total of 60 to 100 subjects is required. Hence if the primary outcome is binary a total of at least 120 subjects (60 in each group) may be required in the pilot trial. It is very much more efficient to use a larger pilot study, than to
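The diminishing-returns claim above (relative precision gain under 10% per extra five subjects per group once the total reaches about 70) can be checked against the large-sample approximation Var(S) ≈ σ²/(2·df) for a normal SD, with df = n_total − 2 for the pooled SD of two equal groups. A sketch under that normal approximation (not the authors' simulation code):

```python
# Approximate precision of the pooled SD of two equal groups of normal
# data: SE(S) ~ sigma / sqrt(2 * (n_total - 2)).
import math

def se_pooled_sd(n_total, sigma=1.0):
    """Large-sample standard error of the pooled SD, two equal groups."""
    return sigma / math.sqrt(2 * (n_total - 2))

for n in (30, 50, 70, 90):
    gain = 1 - se_pooled_sd(n + 10) / se_pooled_sd(n)
    print(n, round(100 * gain, 1))  # % precision gain from 5 more per group
```

Under this approximation the gain from adding five subjects per group drops below 10% somewhere between a total of 30 and 70 subjects, consistent with the pilot-size recommendation in the abstract.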
Ellison, Laura E.; Lukacs, Paul M.
2014-01-01
Concern for migratory tree-roosting bats in North America has grown because of possible population declines from wind energy development. This concern has driven interest in estimating population-level changes. Mark-recapture methodology is one possible analytical framework for assessing bat population changes, but sample size requirements to produce reliable estimates have not been estimated. To illustrate the sample sizes necessary for a mark-recapture-based monitoring program we conducted power analyses using a statistical model that allows reencounters of live and dead marked individuals. We ran 1,000 simulations for each of five broad sample size categories in a Burnham joint model, and then compared the proportion of simulations in which 95% confidence intervals overlapped between and among years for a 4-year study. Additionally, we conducted sensitivity analyses of sample size to various capture probabilities and recovery probabilities. More than 50,000 individuals per year would need to be captured and released to accurately determine 10% and 15% declines in annual survival. To detect more dramatic declines of 33% or 50% survival over four years, then sample sizes of 25,000 or 10,000 per year, respectively, would be sufficient. Sensitivity analyses reveal that increasing recovery of dead marked individuals may be more valuable than increasing capture probability of marked individuals. Because of the extraordinary effort that would be required, we advise caution should such a mark-recapture effort be initiated because of the difficulty in attaining reliable estimates. We make recommendations for what techniques show the most promise for mark-recapture studies of bats because some techniques violate the assumptions of mark-recapture methodology when used to mark bats.
Gupta, Manan; Joshi, Amitabh; Vidya, T N C
2017-01-01
Mark-recapture estimators are commonly used for population size estimation, and typically yield unbiased estimates for most solitary species with low to moderate home range sizes. However, these methods assume independence of captures among individuals, an assumption that is clearly violated in social species that show fission-fusion dynamics, such as the Asian elephant. In the specific case of Asian elephants, doubts have been raised about the accuracy of population size estimates. More importantly, the potential problem for the use of mark-recapture methods posed by social organization in general has not been systematically addressed. We developed an individual-based simulation framework to systematically examine the potential effects of type of social organization, as well as other factors such as trap density and arrangement, spatial scale of sampling, and population density, on bias in population sizes estimated by POPAN, Robust Design, and Robust Design with detection heterogeneity. In the present study, we ran simulations with biological, demographic and ecological parameters relevant to Asian elephant populations, but the simulation framework is easily extended to address questions relevant to other social species. We collected capture history data from the simulations, and used those data to test for bias in population size estimation. Social organization significantly affected bias in most analyses, but the effect sizes were variable, depending on other factors. Social organization tended to introduce large bias when trap arrangement was uniform and sampling effort was low. POPAN clearly outperformed the two Robust Design models we tested, yielding close to zero bias if traps were arranged at random in the study area, and when population density and trap density were not too low. Social organization did not have a major effect on bias for these parameter combinations at which POPAN gave more or less unbiased population size estimates. Therefore, the
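The mark-recapture logic whose assumptions are probed here can be illustrated with the simplest two-occasion estimator (Chapman's variant of Lincoln-Petersen), under independent captures, exactly the assumption that fission-fusion societies violate. A toy simulation (illustrative parameters, unrelated to the elephant study):

```python
# Two-occasion mark-recapture with independent captures, estimated by
# Chapman's nearly unbiased variant of the Lincoln-Petersen estimator.
import numpy as np

rng = np.random.default_rng(7)
N, p = 1000, 0.2                  # true population size, capture probability

occ1 = rng.random(N) < p          # captured and marked on occasion 1
occ2 = rng.random(N) < p          # captured on occasion 2, independently

n1, n2 = occ1.sum(), occ2.sum()
m2 = (occ1 & occ2).sum()          # marked recaptures

N_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
print(round(N_hat))
```

With independent captures the estimate lands near the true N; when individuals are captured in cohesive groups instead, capture histories become correlated, m2 becomes highly variable, and the estimator can be badly biased, which is the failure mode the simulation framework above quantifies.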
Brunstrom, Jeffrey M; Rogers, Peter J; Pothos, Emmanuel M; Calitri, Raff; Tapper, Katy
2008-09-01
This paper (i) explores the proposition that body weight is associated with large portion sizes and (ii) introduces a new technique for measuring everyday portion size. In our paradigm, the participant is shown a picture of a food portion and is asked to indicate whether it is larger or smaller than their usual portion. After responding to a range of different portions an estimate of everyday portion size is calculated using probit analysis. Importantly, this estimate is likely to be robust because it is based on many responses. First-year undergraduate students (N=151) completed our procedure for 12 commonly consumed foods. As expected, portion sizes were predicted by gender and by a measure of dieting and dietary restraint. Furthermore, consistent with reports of hungry supermarket shoppers, portion-size estimates tended to be higher in hungry individuals. However, we found no evidence for a relationship between BMI and portion size in any of the test foods. We consider reasons why this finding should be anticipated. In particular, we suggest that the difference in total energy expenditure of individuals with a higher and lower BMI is too small to be detected as a concomitant difference in portion size (at least in our sample).
Estimation of the Shape Parameter of Ged Distribution for a Small Sample Size
Directory of Open Access Journals (Sweden)
Purczyński Jan
2014-06-01
Full Text Available In this paper a new method of estimating the shape parameter of the generalized error distribution (GED), called the 'approximated moment method', is proposed. The following estimators were considered: the one obtained through the maximum likelihood method (MLM), the approximated fast estimator (AFE), and the approximated moment method (AMM). The quality of each estimator was evaluated on the basis of the relative mean square error. Computer simulations were conducted using random number generators for the following shape parameters: s = 0.5, s = 1.0 (Laplace distribution), s = 2.0 (Gaussian distribution), and s = 3.0.
Directory of Open Access Journals (Sweden)
Ayfer SAYIN
2016-12-01
Full Text Available In scale adaptation studies, and for cross-validation during scale development, confirmatory factor analysis is conducted. Confirmatory factor analysis, a multivariate statistical method, can be estimated via various parameter estimation methods and uses several fit indexes to evaluate model fit. In this study, the model fit indexes used in confirmatory factor analysis are examined under different parameter estimation methods and different sample sizes. For the purpose of this study, the answers of 60, 100, 250, 500 and 1000 students who participated in the PISA 2012 program were drawn from the two-dimensional "thoughts on the importance of mathematics" dimension. Estimations were based on the methods of maximum likelihood (ML), unweighted least squares (ULS) and generalized least squares (GLS). As a result of the study, it was found that the model fit indexes were affected by these conditions, although some fit indexes were affected less than others. Based on these analyses, some suggestions were made.
Neubauer, Simon; Gunz, Philipp; Weber, Gerhard W; Hublin, Jean-Jacques
2012-04-01
Estimation of endocranial volume in Australopithecus africanus is important in interpreting early hominin brain evolution. However, the number of individuals available for investigation is limited and most of these fossils are, to some degree, incomplete and/or distorted. Uncertainties of the required reconstruction ('missing data uncertainty') and the small sample size ('small sample uncertainty') both potentially bias estimates of the average and within-group variation of endocranial volume in A. africanus. We used CT scans, electronic preparation (segmentation), mirror-imaging and semilandmark-based geometric morphometrics to generate and reconstruct complete endocasts for Sts 5, Sts 60, Sts 71, StW 505, MLD 37/38, and Taung, and measured their endocranial volumes (EV). To get a sense of the reliability of these new EV estimates, we then used simulations based on samples of chimpanzees and humans to: (a) test the accuracy of our approach, (b) assess missing data uncertainty, and (c) appraise small sample uncertainty. Incorporating missing data uncertainty of the five adult individuals, A. africanus was found to have an average adult endocranial volume of 454-461 ml with a standard deviation of 66-75 ml. EV estimates for the juvenile Taung individual range from 402 to 407 ml. Our simulations show that missing data uncertainty is small given the missing portions of the investigated fossils, but that small sample sizes are problematic for estimating species average EV. It is important to take these uncertainties into account when different fossil groups are being compared. Copyright © 2012 Elsevier Ltd. All rights reserved.
Sillett, T Scott; Chandler, Richard B; Royle, J Andrew; Kery, Marc; Morrison, Scott A
2012-10-01
Population size and habitat-specific abundance estimates are essential for conservation management. A major impediment to obtaining such estimates is that few statistical models are able to simultaneously account for both spatial variation in abundance and heterogeneity in detection probability, and still be amenable to large-scale applications. The hierarchical distance-sampling model of J. A. Royle, D. K. Dawson, and S. Bates provides a practical solution. Here, we extend this model to estimate habitat-specific abundance and rangewide population size of a bird species of management concern, the Island Scrub-Jay (Aphelocoma insularis), which occurs solely on Santa Cruz Island, California, USA. We surveyed 307 randomly selected, 300 m diameter, point locations throughout the 250-km2 island during October 2008 and April 2009. Population size was estimated to be 2267 (95% CI 1613-3007) and 1705 (1212-2369) during the fall and spring respectively, considerably lower than a previously published but statistically problematic estimate of 12 500. This large discrepancy emphasizes the importance of proper survey design and analysis for obtaining reliable information for management decisions. Jays were most abundant in low-elevation chaparral habitat; the detection function depended primarily on the percent cover of chaparral and forest within count circles. Vegetation change on the island has been dramatic in recent decades, due to release from herbivory following the eradication of feral sheep (Ovis aries) from the majority of the island in the mid-1980s. We applied best-fit fall and spring models of habitat-specific jay abundance to a vegetation map from 1985, and estimated the population size of A. insularis was 1400-1500 at that time. The 20-30% increase in the jay population suggests that the species has benefited from the recovery of native vegetation since sheep removal. Nevertheless, this jay's tiny range and small population size make it vulnerable to natural
Bayesian adaptive approach to estimating sample sizes for seizures of illicit drugs.
Moroni, Rossana; Aalberg, Laura; Reinikainen, Tapani; Corander, Jukka
2012-01-01
A considerable amount of discussion can be found in the forensics literature about the issue of using statistical sampling to obtain for chemical analyses an appropriate subset of units from a police seizure suspected to contain illicit material. Use of the Bayesian paradigm has been suggested as the most suitable statistical approach to solving the question of how large a sample needs to be to ensure legally and practically acceptable purposes. Here, we introduce a hypergeometric sampling model combined with a specific prior distribution for the homogeneity of the seizure, where a parameter for the analyst's expectation of homogeneity (α) is included. Our results show how an adaptive approach to sampling can minimize the practical efforts needed in the laboratory analyses, as the model allows the scientist to decide sequentially how to proceed, while maintaining a sufficiently high confidence in the conclusions. © 2011 American Academy of Forensic Sciences.
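The sequential idea described above can be sketched in simplified form. This is not the authors' exact model (which includes a prior parameter α for the analyst's expectation of homogeneity); it uses a hypergeometric likelihood with a uniform prior over the number of illicit units, and the function names, θ target and confidence level are illustrative assumptions:

```python
from math import comb, ceil

def posterior_prob_at_least(k_pos, n_sampled, n_total, theta=0.9):
    """Posterior P(K >= ceil(theta * n_total)) for K = number of illicit units
    among n_total, given k_pos positives in n_sampled analysed units, with a
    uniform prior on K and a hypergeometric likelihood (perfect tests assumed)."""
    def lik(K):
        if K < k_pos or n_total - K < n_sampled - k_pos:
            return 0.0
        return comb(K, k_pos) * comb(n_total - K, n_sampled - k_pos) / comb(n_total, n_sampled)
    weights = [lik(K) for K in range(n_total + 1)]
    return sum(weights[ceil(theta * n_total):]) / sum(weights)

def units_needed(n_total, theta=0.9, confidence=0.95):
    """Smallest all-positive sample giving the required posterior confidence;
    an adaptive analyst could stop here if no negative units appear."""
    for n in range(n_total + 1):
        if posterior_prob_at_least(n, n, n_total, theta) >= confidence:
            return n
    return n_total
```

Analysing units one at a time and recomputing the posterior after each result is what lets the laboratory stop early while keeping the stated confidence.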
Bacchetti, Peter; Wolf, Leslie E; Segal, Mark R; McCulloch, Charles E
2005-01-15
The belief is widespread that studies are unethical if their sample size is not large enough to ensure adequate power. The authors examine how sample size influences the balance that determines the ethical acceptability of a study: the balance between the burdens that participants accept and the clinical or scientific value that a study can be expected to produce. The average projected burden per participant remains constant as the sample size increases, but the projected study value does not increase as rapidly as the sample size if it is assumed to be proportional to power or inversely proportional to confidence interval width. This implies that the value per participant declines as the sample size increases and that smaller studies therefore have more favorable ratios of projected value to participant burden. The ethical treatment of study participants therefore does not require consideration of whether study power is less than the conventional goal of 80% or 90%. Lower power does not make a study unethical. The analysis addresses only ethical acceptability, not optimality; large studies may be desirable for other than ethical reasons.
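The declining value-per-participant argument can be checked numerically. The sketch below is my construction, not the paper's analysis: it takes "study value" proportional to the normal-approximation power of a two-sample comparison of means and "burden" proportional to total enrollment:

```python
from statistics import NormalDist

def power_two_sample(n_per_arm, delta=0.5, sd=1.0, alpha=0.05):
    """Approximate power of a two-arm z-test to detect a mean difference delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = delta / (sd * (2 / n_per_arm) ** 0.5)  # noncentrality of the test statistic
    return 1 - NormalDist().cdf(z - ncp)

# value per participant = power / total sample size: it declines as n grows,
# because power is bounded by 1 while burden grows linearly
value_per_participant = [power_two_sample(n) / (2 * n) for n in (20, 50, 100, 200)]
```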
DEFF Research Database (Denmark)
Gardi, Jonathan Eyal; Nyengaard, Jens Randel; Gundersen, Hans Jørgen Gottlieb
2008-01-01
The proportionator is a novel and radically different approach to sampling with microscopes based on well-known statistical theory (probability proportional to size - PPS sampling). It uses automatic image analysis, with a large range of options, to assign to every field of view in the section a ...
Tsai, Ming-Yen; Chen, Shih-Yu; Lin, Chung-Chun
2017-04-01
The Meridian Energy Analysis Device is currently a popular tool in the scientific research of meridian electrophysiology. In this field, it is generally believed that measuring the electrical conductivity of meridians provides information about the balance of bioenergy or Qi-blood in the body. This review draws on the PubMed database (original articles from 1956 to 2014) and the author's clinical experience. In this short communication, we provide clinical examples of Meridian Energy Analysis Device application, especially in the field of traditional Chinese medicine, discuss the reliability of the measurements, and put the values obtained into context by considering items of considerable variability and by estimating sample size. The Meridian Energy Analysis Device is making a valuable contribution to the diagnosis of Qi-blood dysfunction. It can be assessed from short-term and long-term meridian bioenergy recordings. It is one of the few methods that allow outpatient traditional Chinese medicine diagnosis, monitoring the progress, therapeutic effect and evaluation of patient prognosis. The holistic approaches underlying the practice of traditional Chinese medicine and new trends in modern medicine toward the use of objective instruments require in-depth knowledge of the mechanisms of meridian energy, and the Meridian Energy Analysis Device can feasibly be used for understanding and interpreting traditional Chinese medicine theory, especially in view of its expansion in Western countries.
DEFF Research Database (Denmark)
Haugbøl, Steven; Pinborg, Lars H; Arfan, Haroon M
2006-01-01
PURPOSE: To determine the reproducibility of measurements of brain 5-HT2A receptors with an [18F]altanserin PET bolus/infusion approach. Further, to estimate the sample size needed to detect regional differences between two groups and, finally, to evaluate how partial volume correction affects … reproducibility and the required sample size. METHODS: For assessment of the variability, six subjects were investigated with [18F]altanserin PET twice, at an interval of less than 2 weeks. The sample size required to detect a 20% difference was estimated from [18F]altanserin PET studies in 84 healthy subjects … % (range 5-12%), whereas in regions with a low receptor density, BP1 reproducibility was lower, with a median difference of 17% (range 11-39%). Partial volume correction reduced the variability in the sample considerably. The sample size required to detect a 20% difference in brain regions with high …
Directory of Open Access Journals (Sweden)
Eva-Maria Willing
Full Text Available Population genetic studies provide insights into the evolutionary processes that influence the distribution of sequence variants within and among wild populations. FST is among the most widely used measures for genetic differentiation and plays a central role in ecological and evolutionary genetic studies. It is commonly thought that large sample sizes are required in order to precisely infer FST and that small sample sizes lead to overestimation of genetic differentiation. Until recently, studies in ecological model organisms incorporated a limited number of genetic markers, but since the emergence of next generation sequencing, the panel size of genetic markers available even in non-reference organisms has rapidly increased. In this study we examine whether a large number of genetic markers can substitute for small sample sizes when estimating FST. We tested the behavior of three different estimators that infer FST and that are commonly used in population genetic studies. By simulating populations, we assessed the effects of sample size and the number of markers on the various estimates of genetic differentiation. Furthermore, we tested the effect of ascertainment bias on these estimates. We show that the population sample size can be significantly reduced (as small as n = 4-6) when using an appropriate estimator and a large number of bi-allelic genetic markers (k > 1,000). Therefore, conservation genetic studies can now obtain almost the same statistical power as studies performed on model organisms using markers developed with next-generation sequencing.
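The core simulation idea, namely the precision of a multi-locus FST estimate as a function of sample size and marker number, can be sketched as follows. This uses a crude ratio-of-sums Nei-style estimator without the small-sample corrections of the estimators the paper compares, and the allele frequencies are illustrative assumptions:

```python
import random
from statistics import pstdev

def multilocus_fst(n_ind, n_loci, p1=0.5, p2=0.7, seed=0):
    """Ratio-of-sums Nei-style FST from sampled allele frequencies in two
    populations with true allele frequencies p1 and p2 at every locus."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_loci):
        # sample 2 * n_ind alleles per population at this bi-allelic locus
        x1 = sum(rng.random() < p1 for _ in range(2 * n_ind)) / (2 * n_ind)
        x2 = sum(rng.random() < p2 for _ in range(2 * n_ind)) / (2 * n_ind)
        pbar = (x1 + x2) / 2
        ht = 2 * pbar * (1 - pbar)          # expected total heterozygosity
        hs = x1 * (1 - x1) + x2 * (1 - x2)  # mean within-population heterozygosity
        num += ht - hs
        den += ht
    return num / den if den else 0.0

# with only n = 4 individuals per population, many loci shrink the spread
# of the estimate across replicate data sets
spread = {k: pstdev([multilocus_fst(4, k, seed=s) for s in range(30)])
          for k in (100, 2000)}
```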
Trattner, Sigal; Cheng, Bin; Pieniazek, Radoslaw L; Hoffmann, Udo; Douglas, Pamela S; Einstein, Andrew J
2014-04-01
Effective dose (ED) is a widely used metric for comparing ionizing radiation burden between different imaging modalities, scanners, and scan protocols. In computed tomography (CT), ED can be estimated by performing scans on an anthropomorphic phantom in which metal-oxide-semiconductor field-effect transistor (MOSFET) solid-state dosimeters have been placed to enable organ dose measurements. Here a statistical framework is established to determine the sample size (number of scans) needed for estimating ED to a desired precision and confidence, for a particular scanner and scan protocol, subject to practical limitations. The statistical scheme involves solving equations which minimize the sample size required for estimating ED to desired precision and confidence. It is subject to a constrained variation of the estimated ED and solved using the Lagrange multiplier method. The scheme incorporates measurement variation introduced both by MOSFET calibration, and by variation in MOSFET readings between repeated CT scans. Sample size requirements are illustrated on cardiac, chest, and abdomen-pelvis CT scans performed on a 320-row scanner and chest CT performed on a 16-row scanner. Sample sizes for estimating ED vary considerably between scanners and protocols. Sample size increases as the required precision or confidence is higher and also as the anticipated ED is lower. For example, for a helical chest protocol, for 95% confidence and 5% precision for the ED, 30 measurements are required on the 320-row scanner and 11 on the 16-row scanner when the anticipated ED is 4 mSv; these sample sizes are 5 and 2, respectively, when the anticipated ED is 10 mSv. Applying the suggested scheme, it was found that even at modest sample sizes, it is feasible to estimate ED with high precision and a high degree of confidence. As CT technology develops enabling ED to be lowered, more MOSFET measurements are needed to estimate ED with the same precision and confidence. © 2014 American
DEFF Research Database (Denmark)
Kostoulas, P.; Nielsen, Søren Saxmose; Browne, W. J.
2013-01-01
SUMMARY Disease cases are often clustered within herds or generally groups that share common characteristics. Sample size formulae must adjust for the within-cluster correlation of the primary sampling units. Traditionally, the intra-cluster correlation coefficient (ICC), which is an average meas...... subsp. paratuberculosis infection, in Danish dairy cattle and a study on critical control points for Salmonella cross-contamination of pork, in Greek slaughterhouses....
Scott, Thomas F; Schramke, Carol J; Cutter, Gary
2003-06-01
Risk factors for short-term progression in early relapsing remitting MS have been identified recently. Previously we determined potential risk factors for rapid progression of early relapsing remitting MS and identified three groups of high-risk patients. These non-mutually exclusive groups of patients were drawn from a consecutively studied sample of 98 patients with newly diagnosed MS. High-risk patients had a history of either poor recovery from initial attacks, more than two attacks in the first two years of disease, or a combination of at least four other risk factors. Our objective was to determine differences in the sample sizes required to show a meaningful treatment effect when using a high-risk sample versus a random sample of patients. Power analyses were used to calculate the different sample sizes needed for hypothetical treatment trials. We found that substantially smaller numbers of patients should be needed to show a significant treatment effect by employing these high-risk groups of patients as compared to a random population of MS patients (e.g., 58% reduction in sample size in one model). The use of patients at higher risk of progression to perform drug treatment trials can be considered as a means to reduce the number of patients needed to show a significant treatment effect for patients with very early MS.
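The sample-size saving from enriching a trial with high-risk patients can be illustrated with a standard two-proportion calculation. The event rates below are hypothetical, not the paper's, and the formula is the usual normal approximation rather than the authors' power analysis:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, p_treated, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for comparing two proportions."""
    za = NormalDist().inv_cdf(1 - alpha / 2)
    zb = NormalDist().inv_cdf(power)
    var = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((za + zb) ** 2 * var / (p_control - p_treated) ** 2)

# a treatment cutting progression risk by one third needs far fewer patients
# when the baseline risk is 60% (enriched high-risk sample) than 30% (unselected)
n_high = n_per_arm(0.60, 0.40)
n_rand = n_per_arm(0.30, 0.20)
```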
Hans T. Schreuder; Jin-Mann S. Lin; John Teply
2000-01-01
The Forest Inventory and Analysis units in the USDA Forest Service have been mandated by Congress to go to an annualized inventory where a certain percentage of plots, say 20 percent, will be measured in each State each year. Although this will result in an annual sample size that will be too small for reliable inference for many areas, it is a sufficiently large...
Chan, Kelvin K W; Xie, Feng; Willan, Andrew R; Pullenayegum, Eleanor M
2018-01-01
Resource-constrained countries have difficulty conducting large EQ-5D valuation studies, which limits their ability to conduct cost-utility analyses using a value set specific to their own population. When estimates of similar but related parameters are available, shrinkage estimators reduce uncertainty and yield estimators with smaller mean square error (MSE). We hypothesized that health utilities based on shrinkage estimators can reduce MSE and mean absolute error (MAE) when compared to country-specific health utilities. We conducted a simulation study (1,000 iterations) based on the observed means and standard deviations (or standard errors) of the EQ-5D-3L valuation studies from 14 counties. In each iteration, the simulated data were fitted with the model based on the country-specific functional form of the scoring algorithm to create country-specific health utilities ("naïve" estimators). Shrinkage estimators were calculated based on the empirical Bayes estimation methods. The performance of shrinkage estimators was compared with those of the naïve estimators over a range of different sample sizes based on MSE, MAE, mean bias, standard errors and the width of confidence intervals. The MSE of the shrinkage estimators was smaller than the MSE of the naïve estimators on average, as theoretically predicted. Importantly, the MAE of the shrinkage estimators was also smaller than the MAE of the naïve estimators on average. In addition, the reduction in MSE with the use of shrinkage estimators did not substantially increase bias. The degree of reduction in uncertainty by shrinkage estimators is most apparent in valuation studies with small sample size. Health utilities derived from shrinkage estimation allow valuation studies with small sample size to "borrow strength" from other valuation studies to reduce uncertainty.
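The empirical-Bayes shrinkage idea can be sketched for a single summary statistic per country. This is a DerSimonian-Laird-style random-effects version of my own construction; the paper's method operates on the coefficients of country-specific scoring algorithms, so treat this only as an illustration of "borrowing strength":

```python
def shrink(estimates, std_errors):
    """Pull each country-specific estimate toward the precision-weighted grand
    mean; the shrinkage factor balances within-country error (se^2) against a
    method-of-moments estimate of between-country variance (tau^2)."""
    k = len(estimates)
    w = [1 / se ** 2 for se in std_errors]
    grand = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - grand) ** 2 for wi, e in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    if tau2 == 0.0:
        return [grand] * k              # no between-country variation detected
    return [(e / se ** 2 + grand / tau2) / (1 / se ** 2 + 1 / tau2)
            for e, se in zip(estimates, std_errors)]
```

Countries with large standard errors (small valuation studies) are shrunk most, which is the mechanism behind the reduced mean square error reported above.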
Fang, J; Cui, L Y; Liu, M S; Guan, Y Z; Ding, Q Y; Du, H; Li, B H; Wu, S
2017-03-07
Objective: The study aimed to investigate whether the sample sizes required for F-wave studies differ according to the nerve examined, the F-wave parameter measured, and whether subjects are amyotrophic lateral sclerosis (ALS) patients or healthy controls. Methods: The F-waves in the median, ulnar, tibial, and deep peroneal nerves of 55 ALS patients and 52 healthy subjects were studied to assess the effect of sample size on the accuracy of measurements of the following F-wave parameters: F-wave minimum latency, maximum latency, mean latency, F-wave persistence, F-wave chronodispersion, and mean and maximum F-wave amplitude. A hundred stimuli were used in the F-wave study; the values obtained from 100 stimuli were considered "true" values and were compared with the corresponding values from smaller samples of 20, 40, 60 and 80 stimuli. F-wave parameters obtained from different sample sizes were compared between the ALS patients and the normal controls. Results: Significant differences were not detected with samples above 60 stimuli for chronodispersion in all four nerves in normal participants, nor with samples above 40 stimuli for maximum F-wave amplitude in the median, ulnar and tibial nerves in normal participants. When comparing ALS patients and normal controls, significant differences were detected for several parameters, including maximum F-wave latency (median nerve, Z = -3.560) and F-wave latency (median nerve, Z = -3.243), F-wave chronodispersion (Z = -3.152), F-wave persistence in the median nerve (Z = 6.139), F-wave amplitude in the tibial nerve (t = 2.981), F-wave amplitude in the ulnar nerve (Z = -2.134 and Z = -2.552), F-wave persistence in the tibial nerve (Z = 2.119), and F-wave amplitude in the peroneal nerve (t = 2.693). Conclusion: The sample size required for an F-wave study differed according to the nerve, the F-wave parameter, and between ALS patients and healthy subjects.
NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel
2017-08-01
Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.
Directory of Open Access Journals (Sweden)
Pierre Mollet
Full Text Available We conducted a survey of an endangered and cryptic forest grouse, the capercaillie Tetrao urogallus, based on droppings collected on two sampling occasions in eight forest fragments in central Switzerland in early spring 2009. We used genetic analyses to sex and individually identify birds. We estimated sex-dependent detection probabilities and population size using a modern spatial capture-recapture (SCR) model for the data from pooled surveys. A total of 127 capercaillie genotypes were identified (77 males, 46 females, and 4 of unknown sex). The SCR model yielded a total population size estimate (posterior mean) of 137.3 capercaillies (posterior sd 4.2, 95% CRI 130-147). The observed sex ratio was skewed towards males (0.63). The posterior mean of the sex ratio under the SCR model was 0.58 (posterior sd 0.02, 95% CRI 0.54-0.61), suggesting a male-biased sex ratio in our study area. A subsampling simulation study indicated that a reduced sampling effort representing 75% of the actual detections would still yield practically acceptable estimates of total size and sex ratio in our population. Hence, field work and financial effort could be reduced without compromising accuracy when the SCR model is used to estimate key population parameters of cryptic species.
Anwar Fitrianto; Lee Ceng Yik
2014-01-01
When independent variables are highly correlated in a multiple linear regression model, an analysis based on the common ordinary least squares (OLS) method can be misleading. In this situation, the ridge regression estimator is suggested instead. We conducted a simulation study to compare the performance of the ridge regression estimator and OLS, and found that the Hoerl and Kennard ridge regression estimation method has better performance ...
Sample size determination and power
Ryan, Thomas P, Jr
2013-01-01
THOMAS P. RYAN, PhD, teaches online advanced statistics courses for Northwestern University and The Institute for Statistics Education in sample size determination, design of experiments, engineering statistics, and regression analysis.
Jacob Strunk; Hailemariam Temesgen; Hans-Erik Andersen; James P. Flewelling; Lisa Madsen
2012-01-01
Using lidar in an area-based model-assisted approach to forest inventory has the potential to increase estimation precision for some forest inventory variables. This study documents the bias and precision of a model-assisted (regression estimation) approach to forest inventory with lidar-derived auxiliary variables relative to lidar pulse density and the number of...
Hamilton, Matthew B; Tartakovsky, Maria; Battocletti, Amy
2018-02-03
The genetic effective population size, Ne, can be estimated from the average gametic disequilibrium (r̂2) between pairs of loci, but such estimates require evaluation of assumptions and currently have few methods to estimate confidence intervals. SpEED-Ne is a suite of Matlab computer code functions to estimate N̂e from r̂2 with a graphical user interface and a rich set of outputs that aid in understanding data patterns and comparing multiple estimators. SpEED-Ne includes functions to either generate or input simulated genotype data to facilitate comparative studies of N̂e estimators under various population genetic scenarios. SpEED-Ne was validated with data simulated under both time-forward and time-backward coalescent models of genetic drift. Three classes of estimators were compared with simulated data to examine several general questions: what are the impacts of microsatellite null alleles on N̂e, how should missing data be treated, and does disequilibrium contributed by reduced recombination among some loci in a sample impact N̂e. Estimators differed greatly in precision in the scenarios examined, and a widely employed N̂e estimator exhibited the largest variances among replicate data sets. SpEED-Ne implements several jackknife approaches to estimate confidence intervals, and simulated data showed that jackknifing over loci and jackknifing over individuals provided approximately 95% confidence interval coverage for some estimators and should be useful for empirical studies. SpEED-Ne provides an open-source extensible tool for estimation of Ne from empirical genotype data and to conduct simulations of both microsatellite and single nucleotide polymorphism (SNP) data types to develop expectations and to compare N̂e estimators. This article is protected by copyright. All rights reserved.
Sample size for morphological traits of pigeonpea
Directory of Open Access Journals (Sweden)
Giovani Facco
2015-12-01
Full Text Available The objectives of this study were to determine the sample size (i.e., number of plants) required to accurately estimate the average of morphological traits of pigeonpea (Cajanus cajan L.) and to check for variability in sample size between evaluation periods and seasons. Two uniformity trials (i.e., experiments without treatment) were conducted for two growing seasons. In the first season (2011/2012), the seeds were sown by broadcast seeding, and in the second season (2012/2013), the seeds were sown in rows spaced 0.50 m apart. The ground area in each experiment was 1,848 m2, and 360 plants were marked in the central area, in a 2 m × 2 m grid. Three morphological traits (number of nodes, plant height and stem diameter) were evaluated 13 times during the first season and 22 times in the second season. Measurements for all three morphological traits were normally distributed, as confirmed through the Kolmogorov-Smirnov test. Randomness was confirmed using the Run Test, and the descriptive statistics were calculated. For each trait, the sample size (n) was calculated for semiamplitudes of the confidence interval (i.e., estimation errors) equal to 2, 4, 6, ..., 20% of the estimated mean, with a confidence coefficient (1-α) of 95%. Subsequently, n was fixed at 360 plants, and the estimation error of the estimated percentage of the average for each trait was calculated. Variability of the sample size for the pigeonpea culture was observed between the morphological traits evaluated, among the evaluation periods and between seasons. Therefore, to assess with an accuracy of 6% of the estimated average, at least 136 plants must be evaluated throughout the pigeonpea crop cycle to determine the sample size for the traits (number of nodes, plant height and stem diameter) in the different evaluation periods and between seasons.
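The calculation above reduces to the familiar relation n = (t · CV / e)², where CV is the coefficient of variation and e the semiamplitude as a percentage of the mean. A simplified sketch, using the normal quantile in place of Student's t and a hypothetical trait CV:

```python
import math
from statistics import NormalDist

def sample_size_for_precision(cv_percent, error_percent, conf=0.95):
    """Plants needed so the CI half-width equals error_percent of the mean:
    n = (z * CV / e)^2, with no finite-population or t-distribution correction."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * cv_percent / error_percent) ** 2)
```

Halving the allowed error roughly quadruples the required number of plants, which is why the required n rises so steeply toward the 2% end of the range studied.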
Determining sample size for tree utilization surveys
Stanley J. Zarnoch; James W. Bentley; Tony G. Johnson
2004-01-01
The U.S. Department of Agriculture Forest Service has conducted many studies to determine what proportion of the timber harvested in the South is actually utilized. This paper describes the statistical methods used to determine required sample sizes for estimating utilization ratios for a required level of precision. The data used are those for 515 hardwood and 1,557...
Christensen, Jette; Stryhn, Henrik; Vallières, André; El Allaki, Farouk
2011-05-01
In 2008, Canada designed and implemented the Canadian Notifiable Avian Influenza Surveillance System (CanNAISS) with six surveillance activities in a phased-in approach. CanNAISS was a surveillance system because it had more than one surveillance activity or component in 2008: passive surveillance; pre-slaughter surveillance; and voluntary enhanced notifiable avian influenza surveillance. Our objectives were to give a short overview of two active surveillance components in CanNAISS; describe the CanNAISS scenario tree model and its application to estimation of probability of populations being free of NAI virus infection and sample size determination. Our data from the pre-slaughter surveillance component included diagnostic test results from 6296 serum samples representing 601 commercial chicken and turkey farms collected from 25 August 2008 to 29 January 2009. In addition, we included data from a sub-population of farms with high biosecurity standards: 36,164 samples from 55 farms sampled repeatedly over the 24 months study period from January 2007 to December 2008. All submissions were negative for Notifiable Avian Influenza (NAI) virus infection. We developed the CanNAISS scenario tree model, so that it will estimate the surveillance component sensitivity and the probability of a population being free of NAI at the 0.01 farm-level and 0.3 within-farm-level prevalences. We propose that a general model, such as the CanNAISS scenario tree model, may have a broader application than more detailed models that require disease specific input parameters, such as relative risk estimates. Crown Copyright © 2011. Published by Elsevier B.V. All rights reserved.
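The population-freedom arithmetic behind a scenario tree can be sketched in a simplified single-level form. This assumes perfect specificity and collapses the tree to one detection probability per sampled unit; the prevalence, sensitivity and prior values below are illustrative, not CanNAISS parameters:

```python
def freedom_from_infection(n, design_prev, test_se, prior_free=0.5):
    """Return (surveillance sensitivity, posterior P(free)) after n negative
    samples, where design_prev is the prevalence the system must detect."""
    sse = 1 - (1 - design_prev * test_se) ** n   # P(>= 1 positive | infected)
    post = prior_free / (prior_free + (1 - prior_free) * (1 - sse))
    return sse, post
```

Inverting the first line for n is the usual way such models drive sample size determination: keep increasing n until the surveillance sensitivity, and hence the posterior probability of freedom, reaches the target.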
Size Estimates in Inverse Problems
Di Cristo, Michele
2014-01-06
Detection of inclusions or obstacles inside a body by boundary measurements is an inverse problem that is very useful in practical applications. When only a finite number of measurements is available, we try to recover some information on the embedded object, such as its size. In this talk we review some recent results on several inverse problems. The idea is to provide constructive upper and lower estimates of the area/volume of the unknown defect in terms of a quantity related to the work that can be expressed with the available boundary data.
National Research Council Canada - National Science Library
Matsuo, Eder; Sediyama, Tuneo; Cruz, Cosme Damiao; Oliveira, Rita de Cassia Teixeira; Cadore, Luiz Renato
2012-01-01
The objective of this study was to estimate the genetic parameters and optimal sample size for the lengths of the hypocotyl and epicotyls and to analyze the conversion of quantitative data in multiple...
How Sample Size Affects a Sampling Distribution
Mulekar, Madhuri S.; Siegel, Murray H.
2009-01-01
If students are to understand inferential statistics successfully, they must have a profound understanding of the nature of the sampling distribution. Specifically, they must comprehend the determination of the expected value and standard error of a sampling distribution as well as the meaning of the central limit theorem. Many students in a high…
Predicting sample size required for classification performance
Directory of Open Access Journals (Sweden)
Figueroa Rosa L
2012-02-01
Full Text Available Abstract Background: Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. Methods: We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness-of-fit measures. As a control we used an un-weighted fitting method. Results: A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 and 560 annotated samples to achieve mean absolute and root mean squared errors below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method. Conclusions: This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.
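A toy version of the curve-fitting step: fit accuracy(x) ≈ a − b·x^(−c) to learning-curve points by a coarse grid search over (a, c) with a closed-form weighted solution for b. The paper uses proper nonlinear weighted least squares; this grid search merely illustrates the inverse power law model, and the grid bounds are assumptions:

```python
def fit_inverse_power(xs, ys, weights=None):
    """Return (a, b, c) minimising sum w * (y - (a - b * x**-c))**2 on a grid."""
    w = weights or [1.0] * len(xs)
    best, best_sse = None, float("inf")
    for a in [i / 100 for i in range(50, 101)]:        # asymptote in [0.50, 1.00]
        for c in [i / 100 for i in range(5, 151, 5)]:  # decay rate in [0.05, 1.50]
            t = [x ** (-c) for x in xs]
            # given a and c, b has a closed-form weighted least-squares solution
            denom = sum(wi * ti * ti for wi, ti in zip(w, t))
            b = sum(wi * ti * (a - y) for wi, ti, y in zip(w, t, ys)) / denom
            sse = sum(wi * (y - (a - b * ti)) ** 2 for wi, ti, y in zip(w, t, ys))
            if sse < best_sse:
                best, best_sse = (a, b, c), sse
    return best

# noiseless points generated from a=0.9, b=0.5, c=0.5 are recovered exactly,
# and the fitted asymptote a predicts the achievable accuracy at large samples
xs = [10, 20, 50, 100, 200]
ys = [0.9 - 0.5 * x ** -0.5 for x in xs]
a, b, c = fit_inverse_power(xs, ys)
```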
Uyei, Jennifer; Li, Lingfeng; Braithwaite, R Scott
2017-01-01
Given the serious health consequences of discontinuing antiretroviral therapy, randomised control trials of interventions to improve retention in care may be warranted. As funding for global HIV research is finite, it may be argued that choices about sample size should be tied to maximising health. For an East African setting, we calculated expected value of sample information and expected net benefit of sampling to identify the optimal sample size (greatest return on investment) and to quantify net health gains associated with research. Two hypothetical interventions were analysed: (1) one aimed at reducing disengagement from HIV care and (2) another aimed at finding/relinking disengaged patients. When the willingness to pay (WTP) threshold was within a plausible range (1-3 × GDP; US$1377-4130/QALY), the optimal sample size was zero for both interventions, meaning that no further research was recommended because the pre-research probability of an intervention's effectiveness and value was sufficient to support a decision on whether to adopt the intervention and any new information gained from additional research would likely not change that decision. In threshold analyses, at a higher WTP of $5200 the optimal sample size for testing a risk reduction intervention was 2750 per arm. For the outreach intervention, the optimal sample size remained zero across a wide range of WTP thresholds and was insensitive to variation. Limitations, including not varying all inputs in the model, may have led to an underestimation of the value of investing in new research. In summary, more research is not always needed, particularly when there is moderately robust prestudy belief about intervention effectiveness and little uncertainty about the value (cost-effectiveness) of the intervention. Users can test their own assumptions at http://torchresearch.org.
Determining sample size when assessing mean equivalence.
Asberg, Arne; Solem, Kristine B; Mikkelsen, Gustav
2014-11-01
When we want to assess whether two analytical methods are equivalent, we could test if the difference between the mean results is within the specification limits of 0 ± an acceptance criterion. Testing the null hypothesis of zero difference is less interesting, and so is the sample size estimation based on testing that hypothesis. Power function curves for equivalence testing experiments are not widely available. In this paper we present power function curves to help decide on the number of measurements when testing equivalence between the means of two analytical methods. Computer simulation was used to calculate the probability that the 90% confidence interval for the difference between the means of two analytical methods would exceed the specification limits of 0 ± 1, 0 ± 2 or 0 ± 3 analytical standard deviations (SDa), respectively. The probability of getting a nonequivalence alarm increases with increasing difference between the means when the difference is well within the specification limits. The probability increases with decreasing sample size and with smaller acceptance criteria. We may need at least 40-50 measurements with each analytical method when the specification limits are 0 ± 1 SDa, and 10-15 and 5-10 when the specification limits are 0 ± 2 and 0 ± 3 SDa, respectively. The power function curves provide information of the probability of false alarm, so that we can decide on the sample size under less uncertainty.
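The simulation logic of the abstract can be sketched as follows. For simplicity this assumes a known analytical SD of 1 and a z-based confidence interval, whereas the paper simulates full experiments with estimated SDs; all settings here are illustrative.

```python
# Sketch: probability that the 90% CI for the difference between two
# method means falls outside specification limits of 0 ± limit (in SDa).
import numpy as np

rng = np.random.default_rng(1)

def nonequivalence_alarm_prob(n, true_diff, limit, n_sim=2000):
    """Fraction of simulated experiments whose 90% CI for the mean
    difference crosses (-limit, +limit); analytical SD taken as 1."""
    alarms = 0
    z = 1.645  # two-sided 90% CI
    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(true_diff, 1.0, n)
        d = y.mean() - x.mean()
        half = z * np.sqrt(2.0 / n)  # SE of the difference, known SD = 1
        if d - half < -limit or d + half > limit:
            alarms += 1
    return alarms / n_sim

p_small_n = nonequivalence_alarm_prob(n=10, true_diff=0.0, limit=1.0)
p_large_n = nonequivalence_alarm_prob(n=50, true_diff=0.0, limit=1.0)
```

As in the abstract, the alarm probability falls as the sample size grows and as the acceptance criterion widens.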
Sample size calculations for skewed distributions.
Cundill, Bonnie; Alexander, Neal D E
2015-04-02
Sample size calculations should correspond to the intended method of analysis. Nevertheless, for non-normal distributions, they are often done on the basis of normal approximations, even when the data are to be analysed using generalized linear models (GLMs). For the case of comparison of two means, we use GLM theory to derive sample size formulae, with particular cases being the negative binomial, Poisson, binomial, and gamma families. By simulation we estimate the performance of normal approximations, which, via the identity link, are special cases of our approach, and for common link functions such as the log. The negative binomial and gamma scenarios are motivated by examples in hookworm vaccine trials and insecticide-treated materials, respectively. Calculations on the link function (log) scale work well for the negative binomial and gamma scenarios examined and are often superior to the normal approximations. However, they have little advantage for the Poisson and binomial distributions. The proposed method is suitable for sample size calculations for comparisons of means of highly skewed outcome variables.
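A hedged sketch of a calculation on the log-link scale for the negative binomial case described above. The per-observation variance term 1/μ + 1/k is the usual delta-method approximation for Var(log mean) and stands in for the paper's exact formulae; the example numbers are invented.

```python
# Sketch: per-group sample size on the log scale for two NB means.
from math import log, ceil
from scipy.stats import norm

def n_per_group_nb(mu0, mu1, k, alpha=0.05, power=0.9):
    """Per-group n to detect log(mu1) - log(mu0) for negative binomial
    outcomes with dispersion k (delta-method approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    v0 = 1.0 / mu0 + 1.0 / k   # approx. Var(log mean) per observation
    v1 = 1.0 / mu1 + 1.0 / k
    effect = log(mu1) - log(mu0)
    return ceil(z ** 2 * (v0 + v1) / effect ** 2)

# e.g. mean egg counts 100 vs 50 (a 50% reduction), dispersion k = 0.5
n = n_per_group_nb(100.0, 50.0, 0.5)
```

With heavy overdispersion (small k), the 1/k term dominates, which is why normal approximations on the raw scale can misbehave for such data.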
Optimal flexible sample size design with robust power.
Zhang, Lanju; Cui, Lu; Yang, Bo
2016-08-30
It is well recognized that sample size determination is challenging because of the uncertainty on the treatment effect size. Several remedies are available in the literature. Group sequential designs start with a sample size based on a conservative (smaller) effect size and allow early stop at interim looks. Sample size re-estimation designs start with a sample size based on an optimistic (larger) effect size and allow sample size increase if the observed effect size is smaller than planned. Different opinions favoring one type over the other exist. We propose an optimal approach using an appropriate optimality criterion to select the best design among all the candidate designs. Our results show that (1) for the same type of designs, for example, group sequential designs, there is room for significant improvement through our optimization approach; (2) optimal promising zone designs appear to have no advantages over optimal group sequential designs; and (3) optimal designs with sample size re-estimation deliver the best adaptive performance. We conclude that to deal with the challenge of sample size determination due to effect size uncertainty, an optimal approach can help to select the best design that provides most robust power across the effect size range of interest. Copyright © 2016 John Wiley & Sons, Ltd.
MetSizeR: selecting the optimal sample size for metabolomic studies using an analysis based approach
Nyamundanda, Gift; Gormley, Isobel Claire; Fan, Yue; Gallagher, William M.; Brennan, Lorraine
2013-01-01
Background: Determining sample sizes for metabolomic experiments is important but due to the complexity of these experiments, there are currently no standard methods for sample size estimation in metabolomics. Since pilot studies are rarely done in metabolomics, currently existing sample size estimation approaches which rely on pilot data can not be applied. Results: In this article, an analysis based approach called MetSizeR is developed to estimate sample size for metabolomic experiments.
How to calculate sample size and why.
Kim, Jeehyoung; Seo, Bong Soo
2013-09-01
Calculating the sample size is essential to reduce the cost of a study and to test the hypothesis effectively. Referring to pilot studies and previous research studies, we can choose a proper hypothesis and simplify the studies by using a website or Microsoft Excel sheet that contains formulas for calculating sample size in the beginning stage of the study. There are numerous formulas for calculating the sample size for complicated statistics and studies, but most studies can use basic calculating methods for sample size calculation.
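The basic calculation the abstract refers to, for comparing two means, is short enough to sketch directly (a textbook normal-approximation formula, not code from the paper):

```python
# Sketch: per-group sample size for detecting a difference in two means.
from math import ceil
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Per-group n to detect a mean difference delta with common SD sd,
    two-sided test at level alpha."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (z * sd / delta) ** 2)

n = n_per_group(delta=5.0, sd=10.0)   # detect a 0.5 SD difference
```

This reproduces the familiar result that roughly 63 subjects per group are needed to detect a half-SD difference with 80% power at the 5% level.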
Sample size determination for the fluctuation experiment.
Zheng, Qi
2017-01-01
The Luria-Delbrück fluctuation experiment protocol is increasingly employed to determine microbial mutation rates in the laboratory. An important question raised at the planning stage is "How many cultures are needed?" For over 70 years sample sizes have been determined either by intuition or by following published examples where sample sizes were chosen intuitively. This paper proposes a practical method for determining the sample size. The proposed method relies on existing algorithms for computing the expected Fisher information under two commonly used mutant distributions. The role of partial plating in reducing sample size is discussed. Copyright © 2016 Elsevier B.V. All rights reserved.
Considerations in determining sample size for pilot studies.
Hertzog, Melody A
2008-04-01
There is little published guidance concerning how large a pilot study should be. General guidelines, for example using 10% of the sample required for a full study, may be inadequate for aims such as assessment of the adequacy of instrumentation or providing statistical estimates for a larger study. This article illustrates how confidence intervals constructed around a desired or anticipated value can help determine the sample size needed. Samples ranging in size from 10 to 40 per group are evaluated for their adequacy in providing estimates precise enough to meet a variety of possible aims. General sample size guidelines by type of aim are offered.
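The confidence-interval reasoning above can be sketched as a precision-based sample size calculation (a normal-theory approximation for illustration, not the article's exact tables):

```python
# Sketch: pilot sample size chosen for estimation precision, not power.
from math import ceil
from scipy.stats import norm

def n_for_mean_halfwidth(sd, halfwidth, conf=0.95):
    """n so that the CI for a mean has the desired half-width."""
    z = norm.ppf(0.5 + conf / 2)
    return ceil((z * sd / halfwidth) ** 2)

# To pin down a mean to within ±0.25 SD with 95% confidence:
n = n_for_mean_halfwidth(sd=1.0, halfwidth=0.25)
```

Tightening the desired half-width quickly pushes the pilot beyond rule-of-thumb sizes such as "10% of the full study".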
Additional Considerations in Determining Sample Size.
Levin, Joel R.; Subkoviak, Michael J.
Levin's (1975) sample-size determination procedure for completely randomized analysis of variance designs is extended to designs in which information from antecedent or blocking variables is incorporated. In particular, a researcher's choice of designs is framed in terms of determining the respective sample sizes necessary to detect specified contrasts…
Determining Sample Size for Research Activities
Krejcie, Robert V.; Morgan, Daryle W.
1970-01-01
A formula for determining sample size, which originally appeared in 1960, has lacked a table for easy reference. This article supplies a graph of the function and a table of values which permits easy determination of the size of sample needed to be representative of a given population. (DG)
A review of software for sample size determination.
Dattalo, Patrick
2009-09-01
The size of a sample is an important element in determining the statistical precision with which population values can be estimated. This article identifies and describes free and commercial programs for sample size determination. Programs are categorized as follows: (a) multiple procedure for sample size determination; (b) single procedure for sample size determination; and (c) Web-based. Programs are described in terms of (a) cost; (b) ease of use, including interface, operating system and hardware requirements, and availability of documentation and technical support; (c) file management, including input and output formats; and (d) analytical and graphical capabilities.
Johnston, Lisa G; Prybylski, Dimitri; Raymond, H Fisher; Mirzazadeh, Ali; Manopaiboon, Chomnad; McFarland, Willi
2013-04-01
Estimating the sizes of populations at highest risk for HIV is essential for developing and monitoring effective HIV prevention and treatment programs. We provide several country examples of how service multiplier methods have been used in respondent-driven sampling surveys and provide guidance on how to maximize this method's use. Population size estimates were conducted in 4 countries (Mauritius: intravenous drug users [IDU] and female sex workers [FSW]; Papua New Guinea: FSW and men who have sex with men [MSM]; Thailand: IDU; United States: IDU) using adjusted proportions of population members reporting attending a service, project or study listed in a respondent-driven sampling survey, and the estimated total number of population members who visited one of the listed services, projects, or studies collected from the providers. The median population size estimates were 8866 for IDU and 667 for FSW in Mauritius. Median point estimates for FSW were 4190 in Port Moresby and 8712 in Goroka, Papua New Guinea, and 2126 for MSM in Port Moresby and 4200 for IDU in Bangkok, Thailand. Median estimates for IDU were 1050 in Chiang Mai, Thailand, and 15,789 in 2005 and 15,554 in 2009 in San Francisco. Our estimates for almost all groups in each country fall within the range of other regional and national estimates, indicating that the service multiplier method, assuming all assumptions are met, can produce informative estimates. We suggest using multiple multipliers whenever possible, garnering program data from the widest possible range of services, projects, and studies. A median of several estimates is likely more robust to potential biases than a single estimate.
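The service multiplier calculation described above reduces to a single division; the counts and proportions below are invented for illustration.

```python
# Sketch: service multiplier population size estimate.
# population size = (count of population members using a service) /
#                   (adjusted proportion of survey respondents reporting use)
import statistics

def multiplier_estimate(service_count, proportion_reporting):
    return service_count / proportion_reporting

# 600 unique IDU seen at a needle exchange; 38% of RDS respondents
# (after RDS adjustment) report attending it.
est = multiplier_estimate(600, 0.38)

# With several service multipliers, take the median for robustness,
# as the abstract recommends:
estimates = [multiplier_estimate(c, p) for c, p in
             [(600, 0.38), (250, 0.21), (900, 0.55)]]
median_est = statistics.median(estimates)
```

Each multiplier requires that the service count and the survey proportion refer to the same population and time window; violations of that assumption are the main source of bias.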
Sample size in qualitative interview studies
DEFF Research Database (Denmark)
Malterud, Kirsti; Siersma, Volkert Dirk; Guassora, Ann Dorrit Kristiane
2016-01-01
Sample sizes must be ascertained in qualitative studies like in quantitative studies but not by the same means. The prevailing concept for sample size in qualitative studies is “saturation.” Saturation is closely tied to a specific methodology, and the term is inconsistently applied. We propose the concept “information power” to guide adequate sample size for qualitative studies. Information power indicates that the more information the sample holds, relevant for the actual study, the fewer participants are needed. We suggest that the size of a sample with sufficient information power depends on (a) the aim of the study, (b) sample specificity, (c) use of established theory, (d) quality of dialogue, and (e) analysis strategy. We present a model where these elements of information and their relevant dimensions are related to information power. Application of this model in the planning...
Sample size determination in medical and surgical research.
Flikkema, Robert M; Toledo-Pereyra, Luis H
2012-02-01
One of the most critical yet frequently misunderstood principles of research is sample size determination. Obtaining an inadequate sample is a serious problem that can invalidate an entire study. Without an extensive background in statistics, the seemingly simple question of selecting a sample size can become quite a daunting task. This article aims to give a researcher with no background in statistics the basic tools needed for sample size determination. After reading this article, the researcher will be aware of all the factors involved in a power analysis and will be able to work more effectively with the statistician when determining sample size. This work also reviews the power of a statistical hypothesis, as well as how to estimate the effect size of a research study. These are the two key components of sample size determination. Several examples will be considered throughout the text.
Biostatistics Series Module 5: Determining Sample Size.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Determining the appropriate sample size for a study, whatever be its type, is a fundamental aspect of biomedical research. An adequate sample ensures that the study will yield reliable information, regardless of whether the data ultimately suggests a clinically important difference between the interventions or elements being studied. The probability of Type 1 and Type 2 errors, the expected variance in the sample and the effect size are the essential determinants of sample size in interventional studies. Any method for deriving a conclusion from experimental data carries with it some risk of drawing a false conclusion. Two types of false conclusion may occur, called Type 1 and Type 2 errors, whose probabilities are denoted by the symbols α and β. A Type 1 error occurs when one concludes that a difference exists between the groups being compared when, in reality, it does not. This is akin to a false positive result. A Type 2 error occurs when one concludes that difference does not exist when, in reality, a difference does exist, and it is equal to or larger than the effect size defined by the alternative to the null hypothesis. This may be viewed as a false negative result. When considering the risk of Type 2 error, it is more intuitive to think in terms of power of the study or (1 - β). Power denotes the probability of detecting a difference when a difference does exist between the groups being compared. Smaller α or larger power will increase sample size. Conventional acceptable values for power and α are 80% or above and 5% or below, respectively, when calculating sample size. Increasing variance in the sample tends to increase the sample size required to achieve a given power level. The effect size is the smallest clinically important difference that is sought to be detected and, rather than statistical convention, is a matter of past experience and clinical judgment. Larger samples are required if smaller differences are to be detected.
Ratio estimation in poststratified sampling over two occasions ...
African Journals Online (AJOL)
Expressions for the optimum matching or replacement fractions of both estimators are obtained since the estimators are based on a partial replacement of sample units on the second occasion. Conditions under which one estimator is preferred to the other are obtained for repeated samples of fixed sizes.
Estimating Search Engine Index Size Variability
DEFF Research Database (Denmark)
Van den Bosch, Antal; Bogers, Toine; De Kunder, Maurice
2016-01-01
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web.
Morton, S E; Chiew, Y S; Pretty, C; Moltchanova, E; Scarrott, C; Redmond, D; Shaw, G M; Chase, J G
2017-02-01
Randomised controlled trials have sought to improve mechanical ventilation treatment. However, few trials to date have shown clinical significance. It is hypothesised that aside from effective treatment, the outcome metrics and sample sizes of the trial also affect the significance, and thus impact trial design. In this study, a Monte-Carlo simulation method was developed and used to investigate several outcome metrics of ventilation treatment, including 1) length of mechanical ventilation (LoMV); 2) Ventilator Free Days (VFD); and 3) LoMV-28, a combination of the other metrics. As these metrics have highly skewed distributions, it also investigated the impact of imposing clinically relevant exclusion criteria on study power to enable better design for significance. Data from invasively ventilated patients from a single intensive care unit were used in this analysis to demonstrate the method. Use of LoMV as an outcome metric required 160 patients/arm to reach 80% power with a clinically expected intervention difference of 25% LoMV if clinically relevant exclusion criteria were applied to the cohort, but 400 patients/arm if they were not. However, only 130 patients/arm would be required for the same statistical significance at the same intervention difference if VFD was used. A Monte-Carlo simulation approach using local cohort data combined with objective patient selection criteria can yield better design of ventilation studies to desired power and significance, with fewer patients per arm than traditional trial design methods, which in turn reduces patient risk. Outcome metrics, such as VFD, should be used when a difference in mortality is also expected between the two cohorts. Finally, the non-parametric approach taken is readily generalisable to a range of trial types where outcome data is similarly skewed. Copyright © 2016. Published by Elsevier Inc.
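A minimal Monte-Carlo power estimate in the spirit of the study above: simulate skewed outcomes, apply a rank test, and count rejections. Lognormal draws stand in for length-of-ventilation data; all distribution settings are assumptions for illustration, not the study's cohort data.

```python
# Sketch: simulation-based power for a skewed outcome metric.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)

def simulated_power(n_per_arm, reduction, n_sim=500, alpha=0.05):
    """Fraction of simulated trials rejecting H0 with a Mann-Whitney test."""
    rejections = 0
    for _ in range(n_sim):
        control = rng.lognormal(mean=np.log(10.0), sigma=0.8, size=n_per_arm)
        treated = rng.lognormal(mean=np.log(10.0 * (1 - reduction)),
                                sigma=0.8, size=n_per_arm)
        _, p = mannwhitneyu(control, treated, alternative='two-sided')
        if p < alpha:
            rejections += 1
    return rejections / n_sim

power_25pct = simulated_power(n_per_arm=130, reduction=0.25)
power_null = simulated_power(n_per_arm=130, reduction=0.0)
```

Replacing the synthetic draws with resamples from a local cohort, and applying exclusion criteria before resampling, gives the kind of cohort-specific trial design the study advocates.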
Graph Sampling for Covariance Estimation
Chepuri, Sundeep Prabhakar
2017-04-25
In this paper the focus is on subsampling as well as reconstructing the second-order statistics of signals residing on nodes of arbitrary undirected graphs. Second-order stationary graph signals may be obtained by graph filtering zero-mean white noise and they admit a well-defined power spectrum whose shape is determined by the frequency response of the graph filter. Estimating the graph power spectrum forms an important component of stationary graph signal processing and related inference tasks such as Wiener prediction or inpainting on graphs. The central result of this paper is that by sampling a significantly smaller subset of vertices and using simple least squares, we can reconstruct the second-order statistics of the graph signal from the subsampled observations, and more importantly, without any spectral priors. To this end, both a nonparametric approach as well as parametric approaches including moving average and autoregressive models for the graph power spectrum are considered. The results specialize for undirected circulant graphs in that the graph nodes leading to the best compression rates are given by the so-called minimal sparse rulers. A near-optimal greedy algorithm is developed to design the subsampling scheme for the non-parametric and the moving average models, whereas a particular subsampling scheme that allows linear estimation for the autoregressive model is proposed. Numerical experiments on synthetic as well as real datasets related to climatology and processing handwritten digits are provided to demonstrate the developed theory.
Particle size distribution in ground biological samples.
Koglin, D; Backhaus, F; Schladot, J D
1997-05-01
Modern trace and retrospective analysis of Environmental Specimen Bank (ESB) samples require surplus material prepared and characterized as reference materials. Before the biological samples could be analyzed and stored for long periods at cryogenic temperatures, the materials have to be pre-crushed. As a second step, a milling and homogenization procedure has to follow. For this preparation, a grinding device is cooled with liquid nitrogen to a temperature of -190 degrees C. It is a significant condition for homogeneous samples that at least 90% of the particles should be smaller than 200 microns. In the German ESB the particle size distribution of the processed material is determined by means of a laser particle sizer. The decrease of particle sizes of deer liver and bream muscles after different grinding procedures as well as the consequences of ultrasonic treatment of the sample before particle size measurements have been investigated.
Estimating software development project size, using probabilistic ...
African Journals Online (AJOL)
This paper describes the quantitative process of managing the size of software development projects by Purchasers (Clients) and Vendors (Development Houses) where there are no historical databases. Probabilistic approach was used to estimate the software project size, using the data collected when we developed a ...
Sample Size and Statistical Power Calculation in Genetic Association Studies
Directory of Open Access Journals (Sweden)
Eun Pyo Hong
2012-06-01
A sample size with sufficient statistical power is critical to the success of genetic association studies to detect causal genes of human complex diseases. Genome-wide association studies require much larger sample sizes to achieve an adequate statistical power. We estimated the statistical power with increasing numbers of markers analyzed and compared the sample sizes that were required in case-control studies and case-parent studies. We computed the effective sample size and statistical power using Genetic Power Calculator. An analysis using a larger number of markers requires a larger sample size. Testing a single-nucleotide polymorphism (SNP) marker requires 248 cases, while testing 500,000 SNPs and 1 million markers requires 1,206 cases and 1,255 cases, respectively, under the assumption of an odds ratio of 2, 5% disease prevalence, 5% minor allele frequency, complete linkage disequilibrium (LD), 1:1 case/control ratio, and a 5% error rate in an allelic test. Under a dominant model, a smaller sample size is required to achieve 80% power than other genetic models. We found that a much lower sample size was required with a strong effect size, common SNP, and increased LD. In addition, studying a common disease in a case-control study of a 1:4 case-control ratio is one way to achieve higher statistical power. We also found that case-parent studies require more samples than case-control studies. Although we have not covered all plausible cases in study design, the estimates of sample size and statistical power computed under various assumptions in this study may be useful to determine the sample size in designing a population-based genetic association study.
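One way the number of markers drives sample size can be sketched by treating the allelic test as a two-proportion comparison with a Bonferroni-corrected α. This reproduces the qualitative pattern of the abstract but is not Genetic Power Calculator's exact model; all numbers are illustrative.

```python
# Sketch: cases needed for an allelic test as a two-proportion comparison.
from math import ceil, sqrt
from scipy.stats import norm

def cases_for_allelic_test(p_case, p_ctrl, n_markers=1,
                           alpha=0.05, power=0.8, ratio=1.0):
    """Cases needed (controls = ratio * cases) to compare allele
    frequencies p_case vs p_ctrl at a Bonferroni-corrected alpha."""
    a = alpha / n_markers
    za, zb = norm.ppf(1 - a / 2), norm.ppf(power)
    pbar = (p_case + ratio * p_ctrl) / (1 + ratio)
    num = (za * sqrt((1 + 1 / ratio) * pbar * (1 - pbar))
           + zb * sqrt(p_case * (1 - p_case)
                       + p_ctrl * (1 - p_ctrl) / ratio)) ** 2
    # each person contributes 2 alleles, so halve the allele-count n
    return ceil(num / (p_case - p_ctrl) ** 2 / 2)

n_single = cases_for_allelic_test(0.095, 0.05)             # one SNP
n_gwas = cases_for_allelic_test(0.095, 0.05, n_markers=500_000)
```

The genome-wide correction raises the per-test z-threshold from about 1.96 to above 5, which is why the required cases grow several-fold rather than linearly in the number of markers.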
Improving your Hypothesis Testing: Determining Sample Sizes.
Luftig, Jeffrey T.; Norton, Willis P.
1982-01-01
This article builds on an earlier discussion of the importance of the Type II error (beta) and power to the hypothesis testing process (CE 511 484), and illustrates the methods by which sample size calculations should be employed so as to improve the research process. (Author/CT)
Anderson, Samantha F; Kelley, Ken; Maxwell, Scott E
2017-11-01
The sample size necessary to obtain a desired level of statistical power depends in part on the population value of the effect size, which is, by definition, unknown. A common approach to sample-size planning uses the sample effect size from a prior study as an estimate of the population value of the effect to be detected in the future study. Although this strategy is intuitively appealing, effect-size estimates, taken at face value, are typically not accurate estimates of the population effect size because of publication bias and uncertainty. We show that the use of this approach often results in underpowered studies, sometimes to an alarming degree. We present an alternative approach that adjusts sample effect sizes for bias and uncertainty, and we demonstrate its effectiveness for several experimental designs. Furthermore, we discuss an open-source R package, BUCSS, and user-friendly Web applications that we have made available to researchers so that they can easily implement our suggested methods.
Estimating the size of the homeless population in Budapest, Hungary
David, B; Snijders, TAB
In this study we try to estimate the size of the homeless population in Budapest by using two non-standard sampling methods: snowball sampling and the capture-recapture method. Using two methods and three different data sets, we are able to compare the methods as well as the results.
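The capture-recapture idea mentioned above, in its simplest two-sample form, is Chapman's bias-corrected Lincoln-Petersen estimator; the counts below are invented for illustration.

```python
# Sketch: two-sample capture-recapture population size estimate.
def chapman_estimate(n1, n2, m):
    """n1 marked in sample 1, n2 caught in sample 2, m recaptured in both."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# 400 people registered at shelters in wave 1; 350 surveyed in wave 2,
# of whom 50 were already registered in wave 1:
N_hat = chapman_estimate(400, 350, 50)
```

The estimator assumes a closed population and equal catchability in both waves, assumptions that are hard to satisfy for homeless populations, which is why the study compares it against snowball sampling.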
Comparison of distance sampling estimates to a known population ...
African Journals Online (AJOL)
Line-transect sampling was used to obtain abundance estimates of an Ant-eating Chat Myrmecocichla formicivora population to compare these with the true size of the population. The population size was determined by a long-term banding study, and abundance estimates were obtained by surveying line transects.
Lo, J W; Fung, C H
1999-01-01
To guide cytotechnologists and pathologists in calculating the false negative proportion, or rate, and the number of Papanicolaou smears to be reevaluated for a meaningful assessment of screening performance, a computer program written in BASIC was prepared, based on several recent publications in the field of cytopathology. A complete program listing and sample runs are provided to help users be cognizant of the necessary inputs to run the program. The output from the program gives the results of the various calculations. Since the tedious manual calculations are handled by the computer program, it is more likely for those involved in the interpretation of Papanicolaou smears to follow the approaches suggested by experts in these two areas.
Effects of Mesh Size on Sieved Samples of Corophium volutator
Crewe, Tara L.; Hamilton, Diana J.; Diamond, Antony W.
2001-08-01
Corophium volutator (Pallas), gammaridean amphipods found on intertidal mudflats, are frequently collected in mud samples sieved on mesh screens. However, mesh sizes used vary greatly among studies, raising the possibility that sampling methods bias results. The effect of using different mesh sizes on the resulting size-frequency distributions of Corophium was tested by collecting Corophium from mud samples with 0·5 and 0·25 mm sieves. More than 90% of Corophium less than 2 mm long passed through the larger sieve. A significantly smaller, but still substantial, proportion of 2-2·9 mm Corophium (30%) was also lost. Larger size classes were unaffected by mesh size. Mesh size significantly changed the observed size-frequency distribution of Corophium, and effects varied with sampling date. It is concluded that a 0·5 mm sieve is suitable for studies concentrating on adults, but to accurately estimate Corophium density and size-frequency distributions, a 0·25 mm sieve must be used.
Sample size requirements for training high-dimensional risk predictors.
Dobbin, Kevin K; Song, Xiao
2013-09-01
A common objective of biomarker studies is to develop a predictor of patient survival outcome. Determining the number of samples required to train a predictor from survival data is important for designing such studies. Existing sample size methods for training studies use parametric models for the high-dimensional data and cannot handle a right-censored dependent variable. We present a new training sample size method that is non-parametric with respect to the high-dimensional vectors, and is developed for a right-censored response. The method can be applied to any prediction algorithm that satisfies a set of conditions. The sample size is chosen so that the expected performance of the predictor is within a user-defined tolerance of optimal. The central method is based on a pilot dataset. To quantify uncertainty, a method to construct a confidence interval for the tolerance is developed. Adequacy of the size of the pilot dataset is discussed. An alternative model-based version of our method for estimating the tolerance when no adequate pilot dataset is available is presented. The model-based method requires a covariance matrix be specified, but we show that the identity covariance matrix provides adequate sample size when the user specifies three key quantities. Application of the sample size method to two microarray datasets is discussed.
Conservative Sample Size Determination for Repeated Measures Analysis of Covariance.
Morgan, Timothy M; Case, L Douglas
2013-07-05
In the design of a randomized clinical trial with one pre and multiple post randomized assessments of the outcome variable, one needs to account for the repeated measures in determining the appropriate sample size. Unfortunately, one seldom has a good estimate of the variance of the outcome measure, let alone the correlations among the measurements over time. We show how sample sizes can be calculated by making conservative assumptions regarding the correlations for a variety of covariance structures. The most conservative choice for the correlation depends on the covariance structure and the number of repeated measures. In the absence of good estimates of the correlations, the sample size is often based on a two-sample t-test, making the 'ultra' conservative and unrealistic assumption that there are zero correlations between the baseline and follow-up measures while at the same time assuming there are perfect correlations between the follow-up measures. Compared to the case of taking a single measurement, substantial savings in sample size can be realized by accounting for the repeated measures, even with very conservative assumptions regarding the parameters of the assumed correlation matrix. Assuming compound symmetry, the sample size from the two-sample t-test calculation can be reduced at least 44%, 56%, and 61% for repeated measures analysis of covariance by taking 2, 3, and 4 follow-up measures, respectively. The results offer a rational basis for determining a fairly conservative, yet efficient, sample size for clinical trials with repeated measures and a baseline value.
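The quoted reductions (44%, 56%, 61%) are reproducible from the Frison-Pocock ANCOVA variance factor with one baseline and k follow-up measures under compound symmetry, f(ρ) = (1 + (k-1)ρ)/k - ρ², maximized over ρ to get the conservative (worst-case) value. This reading of the abstract is sketched below for illustration.

```python
# Sketch: worst-case ANCOVA variance factor under compound symmetry,
# relative to a single post-treatment measurement (factor 1).
def worst_case_factor(k):
    """Max over rho of f(rho) = (1 + (k-1)*rho)/k - rho**2, attained
    at rho = (k-1)/(2k), which simplifies to ((k+1)/(2k))**2."""
    return ((k + 1) / (2 * k)) ** 2

# Percent reduction in sample size from taking k follow-up measures:
reductions = {k: round(100 * (1 - worst_case_factor(k))) for k in (2, 3, 4)}
```

Because the factor is maximized over ρ, the resulting sample size is conservative for any true compound-symmetric correlation, which is the guarantee the article offers.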
Uncertainty of the sample size reduction step in pesticide residue analysis of large-sized crops.
Omeroglu, P Yolci; Ambrus, Á; Boyacioglu, D; Majzik, E Solymosne
2013-01-01
To estimate the uncertainty of the sample size reduction step, each unit in laboratory samples of papaya and cucumber was cut into four segments in longitudinal directions and two opposite segments were selected for further homogenisation while the other two were discarded. Jackfruit was cut into six segments in longitudinal directions, and all segments were kept for further analysis. To determine the pesticide residue concentrations in each segment, they were individually homogenised and analysed by chromatographic methods. One segment from each unit of the laboratory sample was drawn randomly to obtain 50 theoretical sub-samples with an MS Office Excel macro. The residue concentrations in a sub-sample were calculated from the weight of segments and the corresponding residue concentration. The coefficient of variation calculated from the residue concentrations of 50 sub-samples gave the relative uncertainty resulting from the sample size reduction step. The sample size reduction step, which is performed by selecting one longitudinal segment from each unit of the laboratory sample, resulted in relative uncertainties of 17% and 21% for field-treated jackfruits and cucumber, respectively, and 7% for post-harvest treated papaya. The results demonstrated that sample size reduction is an inevitable source of uncertainty in pesticide residue analysis of large-sized crops. The post-harvest treatment resulted in a lower variability because the dipping process leads to a more uniform residue concentration on the surface of the crops than does the foliar application of pesticides.
Impaired hand size estimation in CRPS.
Peltz, Elena; Seifert, Frank; Lanz, Stefan; Müller, Rüdiger; Maihöfner, Christian
2011-10-01
A triad of clinical symptoms, i.e., autonomic, motor and sensory dysfunctions, characterizes complex regional pain syndromes (CRPS). Sensory dysfunction comprises sensory loss or spontaneous and stimulus-evoked pain. Furthermore, a disturbance in the body schema may occur. In the present study, patients with CRPS of the upper extremity and healthy controls estimated their hand sizes on the basis of expanded or compressed schematic drawings of hands. In patients with CRPS we found an impairment in accurate hand size estimation; patients estimated their own CRPS-affected hand to be larger than it actually was when measured objectively. Moreover, overestimation correlated significantly with disease duration, neglect score, and increase of two-point discrimination thresholds (TPDT) compared to the unaffected hand and to control subjects' estimations. In line with previous functional imaging studies in CRPS patients demonstrating changes in central somatotopic maps, we suggest an involvement of the central nervous system in this disruption of the body schema. Potential cortical areas may be the primary somatosensory and posterior parietal cortices, which have been proposed to play a critical role in integrating visuospatial information. CRPS patients perceive their affected hand to be bigger than it is. The magnitude of this overestimation correlates with disease duration, decreased tactile thresholds, and neglect score. Suggesting a disrupted body schema as the source of this impairment, our findings corroborate the current assumption of a CNS involvement in CRPS. Copyright © 2011 American Pain Society. Published by Elsevier Inc. All rights reserved.
Defining sample size and sampling strategy for dendrogeomorphic rockfall reconstructions
Morel, Pauline; Trappmann, Daniel; Corona, Christophe; Stoffel, Markus
2015-05-01
Optimized sampling strategies have recently been proposed for dendrogeomorphic reconstructions of mass movements with a large spatial footprint, such as landslides, snow avalanches, and debris flows. Such guidelines have, by contrast, been largely missing for rockfalls and cannot be transposed owing to the sporadic nature of this process and the occurrence of individual rocks and boulders. Based on a data set of 314 European larch (Larix decidua Mill.) trees (i.e., 64 trees/ha) growing on an active rockfall slope, this study bridges this gap and proposes an optimized sampling strategy for the spatial and temporal reconstruction of rockfall activity. Using random extractions of trees, iterative mapping, and a stratified sampling strategy based on an arbitrary selection of trees, we investigate subsets of the full tree-ring data set to define the optimal sample size and sampling design for the development of frequency maps of rockfall activity. Spatially, our results demonstrate that sampling only 6 representative trees per ha can be sufficient to yield a reasonable mapping of the spatial distribution of rockfall frequencies on a slope, especially if the oldest and most heavily affected individuals are included in the analysis. At the same time, however, sampling such a low number of trees risks causing significant errors if nonrepresentative trees are chosen for analysis; an increased number of samples therefore improves the quality of the frequency maps in this case. Temporally, we demonstrate that at least 40 trees/ha are needed to obtain reliable rockfall chronologies. These results will facilitate the design of future studies, decrease the cost-benefit ratio of dendrogeomorphic studies, and thus permit the production of reliable reconstructions with reasonable temporal effort.
Sample Size Requirements for Traditional and Regression-Based Norms.
Oosterhuis, Hannah E M; van der Ark, L Andries; Sijtsma, Klaas
2016-04-01
Test norms enable determining the position of an individual test taker in the group. The most frequently used approach to obtain test norms is traditional norming. Regression-based norming may be more efficient than traditional norming and is rapidly growing in popularity, but little is known about its technical properties. A simulation study was conducted to compare the sample size requirements for traditional and regression-based norming by examining the 95% interpercentile ranges for percentile estimates as a function of sample size, norming method, size of covariate effects on the test score, test length, and number of answer categories in an item. Provided the assumptions of the linear regression model hold in the data, for a subdivision of the total group into eight equal-size subgroups, we found that regression-based norming requires samples 2.5 to 5.5 times smaller than traditional norming. Sample size requirements are presented for each norming method, test length, and number of answer categories. We emphasize that additional research is needed to establish sample size requirements when the assumptions of the linear regression model are violated. © The Author(s) 2015.
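Regression-based norming replaces subgroup-specific norm tables with a regression of the test score on the covariate; a minimal sketch under the linear model with normal residuals that the study assumes (the function name and the norming data are illustrative, not from the paper):

```python
from statistics import NormalDist, mean

def regression_norm_percentile(scores, covariate, new_score, new_value):
    """Percentile for new_score at covariate value new_value, under a
    regression-based norm: score ~ b0 + b1 * covariate, normal residuals."""
    n = len(scores)
    mx, my = mean(covariate), mean(scores)
    sxx = sum((x - mx) ** 2 for x in covariate)
    b1 = sum((x - mx) * (y - my) for x, y in zip(covariate, scores)) / sxx
    b0 = my - b1 * mx
    residuals = [y - (b0 + b1 * x) for x, y in zip(covariate, scores)]
    sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5  # residual SD
    z = (new_score - (b0 + b1 * new_value)) / sd
    return 100 * NormalDist().cdf(z)

# illustrative norming sample: raw test scores against age
ages = [10, 11, 12, 13]
scores = [20, 23, 24, 27]
pct = regression_norm_percentile(scores, ages, new_score=26, new_value=12)
```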
Heckmann, T.; Gegg, K.; Gegg, A.; Becht, M.
2013-06-01
Predictive spatial modelling is an important task in natural hazard assessment and regionalisation of geomorphic processes or landforms. Logistic regression is a multivariate statistical approach frequently used in predictive modelling; it can be conducted stepwise in order to select from a number of candidate independent variables those that lead to the best model. In our case study on a debris flow susceptibility model, we investigate the sensitivity of model selection and quality to different sample sizes in light of the following problem: on the one hand, a sample has to be large enough to cover the variability of geofactors within the study area, and to yield stable results; on the other hand, the sample must not be too large, because a large sample is likely to violate the assumption of independent observations due to spatial autocorrelation. Using stepwise model selection with 1000 random samples for a number of sample sizes between n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors and the diversity of the resulting models as a function of sample size; the multiplicity of different models is assessed using numerical indices borrowed from information theory and biodiversity research. Model diversity decreases with increasing sample size and reaches either a local minimum or a plateau; even larger sample sizes do not further reduce it, and approach the upper limit of sample size given, in this study, by the autocorrelation range of the spatial datasets. In this way, an optimised sample size can be derived from an exploratory analysis. Model uncertainty due to sampling and model selection, and its predictive ability, are explored statistically and spatially through the example of 100 models estimated in one study area and validated in a neighbouring area: depending on the study area and on sample size, the predicted probabilities for debris flow release differed, on average, by 7 to 23 percentage points. In view of these results, we
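The model multiplicity described above can be quantified with diversity indices; the abstract does not say which index it uses, so the Shannon index below is one plausible choice borrowed from information theory (the variable names are invented for illustration):

```python
from collections import Counter
from math import log

def shannon_model_diversity(selected_variable_sets):
    """Shannon index H of the distinct models (variable subsets) selected
    across repeated samples; H = 0 when every sample yields the same model,
    and H grows as model multiplicity increases."""
    counts = Counter(frozenset(s) for s in selected_variable_sets)
    n = sum(counts.values())
    return -sum((c / n) * log(c / n) for c in counts.values())

# e.g. stepwise selection on small samples may yield many distinct models...
small_sample_models = [{"slope"}, {"slope", "curvature"}, {"landcover"},
                       {"slope", "landcover"}, {"curvature"}]
# ...while larger samples converge on one stable model
large_sample_models = [{"slope", "landcover"}] * 5
```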
Heckmann, T.; Gegg, K.; Gegg, A.; Becht, M.
2014-02-01
Predictive spatial modelling is an important task in natural hazard assessment and regionalisation of geomorphic processes or landforms. Logistic regression is a multivariate statistical approach frequently used in predictive modelling; it can be conducted stepwise in order to select from a number of candidate independent variables those that lead to the best model. In our case study on a debris flow susceptibility model, we investigate the sensitivity of model selection and quality to different sample sizes in light of the following problem: on the one hand, a sample has to be large enough to cover the variability of geofactors within the study area, and to yield stable and reproducible results; on the other hand, the sample must not be too large, because a large sample is likely to violate the assumption of independent observations due to spatial autocorrelation. Using stepwise model selection with 1000 random samples for a number of sample sizes between n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors and the diversity of the resulting models as a function of sample size; the multiplicity of different models is assessed using numerical indices borrowed from information theory and biodiversity research. Model diversity decreases with increasing sample size and reaches either a local minimum or a plateau; even larger sample sizes do not further reduce it, and they approach the upper limit of sample size given, in this study, by the autocorrelation range of the spatial data sets. In this way, an optimised sample size can be derived from an exploratory analysis. Model uncertainty due to sampling and model selection, and its predictive ability, are explored statistically and spatially through the example of 100 models estimated in one study area and validated in a neighbouring area: depending on the study area and on sample size, the predicted probabilities for debris flow release differed, on average, by 7 to 23 percentage points. In
Software sizing, cost estimation and scheduling
Cheadle, William G.
1988-01-01
The Technology Implementation and Support Section at Martin Marietta Astronautics Group Denver is tasked with software development analysis, data collection, software productivity improvement, and developing and applying various computerized software tools and models. The computerized tools are parametric models that reflect actuals taken from a large database of completed software development projects. Martin Marietta's database consists of over 300 completed projects and hundreds of cost estimating relationships (CERs) that are used in sizing, costing, scheduling, and productivity improvement equations, studies, models, and computerized tools.
Improved variance estimation along sample eigenvectors
Hendrikse, A.J.; Veldhuis, Raymond N.J.; Spreeuwers, Lieuwe Jan
Second order statistics estimates in the form of sample eigenvalues and sample eigenvectors give a suboptimal description of the population density. So far, attempts have been made only to reduce the bias in the sample eigenvalues. However, because the sample eigenvectors differ from the population
Regression Estimator Using Double Ranked Set Sampling
Directory of Open Access Journals (Sweden)
Hani M. Samawi
2002-06-01
The performance of a regression estimator based on the double ranked set sample (DRSS) scheme, introduced by Al-Saleh and Al-Kadiri (2000), is investigated when the mean of the auxiliary variable X is unknown. Our primary analysis and simulation indicates that using the DRSS regression estimator for estimating the population mean substantially increases relative efficiency compared to the regression estimator based on simple random sampling (SRS) or ranked set sampling (RSS) (Yu and Lam, 1997). Moreover, the regression estimator using DRSS is also more efficient than the naïve estimators of the population mean using SRS, RSS (when the correlation coefficient is at least 0.4), and DRSS (for a high correlation coefficient, at least 0.91). The theory is illustrated using a real data set of trees.
Heckmann, Tobias; Gegg, Katharina; Becht, Michael
2013-04-01
Statistical approaches to landslide susceptibility modelling on the catchment and regional scale are used very frequently compared to heuristic and physically based approaches. In the present study, we deal with the problem of the optimal sample size for a logistic regression model. More specifically, a stepwise approach has been chosen in order to select those independent variables (from a number of derivatives of a digital elevation model and landcover data) that explain best the spatial distribution of debris flow initiation zones in two neighbouring central alpine catchments in Austria (used mutually for model calculation and validation). In order to minimise problems arising from spatial autocorrelation, we sample a single raster cell from each debris flow initiation zone within an inventory. In addition, as suggested by previous work using the "rare events logistic regression" approach, we take a sample of the remaining "non-event" raster cells. The recommendations given in the literature on the size of this sample appear to be motivated by practical considerations, e.g. the time and cost of acquiring data for non-event cases, which do not apply to the case of spatial data. In our study, we aim at finding empirically an "optimal" sample size in order to avoid two problems: First, a sample too large will violate the independent sample assumption as the independent variables are spatially autocorrelated; hence, a variogram analysis leads to a sample size threshold above which the average distance between sampled cells falls below the autocorrelation range of the independent variables. Second, if the sample is too small, repeated sampling will lead to very different results, i.e. the independent variables and hence the result of a single model calculation will be extremely dependent on the choice of non-event cells. Using a Monte-Carlo analysis with stepwise logistic regression, 1000 models are calculated for a wide range of sample sizes. For each sample size
Food models for portion size estimation of Asian foods.
Lanerolle, P; Thoradeniya, T; de Silva, A
2013-08-01
Novel portion size estimation aids relevant to the cognitive potential of children are necessary for improved accuracy in dietary recall. We developed graduated realistic food models for Asian foods and tested their accuracy and precision in children. Food models were constructed for nine commonly consumed food items using a range of low cost materials. These were tested among a random sample of 80 school children (aged 10-16 years). A total of 719 estimations were made. A high percentage (68%) of correct estimations and high correlations (r > 0.95) were observed for these foods. Portion size estimation using realistic food models is found to be accurate and precise and is suitable for use in children. © 2013 The Authors Journal of Human Nutrition and Dietetics © 2013 The British Dietetic Association Ltd.
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, "What sample size is needed for my study?" While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information, such as the model of interest, strength of the…
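A Monte Carlo sample-size search of the kind the article describes can be sketched as follows, here in Python rather than R, using a normal approximation to the slope t test; the effect size, error variance, and step size are invented for illustration:

```python
import random
from statistics import NormalDist

def mc_power(n, beta1=0.3, sigma=1.0, alpha=0.05, reps=2000, seed=7):
    """Monte Carlo power for testing H0: beta1 = 0 in y = beta1*x + e.
    Uses a normal approximation to the t test (adequate for moderate n)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        # simulate data under the assumed model, then fit OLS
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [beta1 * xi + rng.gauss(0, sigma) for xi in x]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        b0 = my - b1 * mx
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        se = (sse / (n - 2) / sxx) ** 0.5  # standard error of the slope
        hits += abs(b1 / se) > z_crit
    return hits / reps

# increase n until the simulated power reaches the target
n = 20
while mc_power(n) < 0.80:
    n += 10
```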
National Research Council Canada - National Science Library
Rita de Cássia Teixeira Oliveira; Cosme Damião Cruz; Tuneo Sediyama; Éder Matsuo; Luiz Renato Cadore
2012-01-01
The objective of this study was to estimate the genetic parameters and optimal sample size for the lengths of the hypocotyl and epicotyl and to analyze the conversion of quantitative data in multiple...
Modelling complete particle-size distributions from operator estimates of particle-size
Roberson, Sam; Weltje, Gert Jan
2014-05-01
Estimates of particle-size made by operators in the field and laboratory represent a vast and relatively untapped data archive. The wide spatial distribution of particle-size estimates makes them ideal for constructing geological models and soil maps. This study uses a large data set from the Netherlands (n = 4837) containing both operator estimates of particle size and complete particle-size distributions measured by laser granulometry. This study introduces a logit-based constrained-cubic-spline (CCS) algorithm to interpolate complete particle-size distributions from operator estimates. The CCS model is compared to four other models: (i) a linear interpolation; (ii) a log-hyperbolic interpolation; (iii) an empirical logistic function; and (iv) an empirical arctan function. Operator estimates were found to be both inaccurate and imprecise; only 14% of samples were successfully classified using the Dutch classification scheme for fine sediment. Operator estimates of sediment particle-size encompass the same range of values as particle-size distributions measured by laser analysis. However, the distributions measured by laser analysis show that most of the sand percentage values lie between zero and one, so the majority of the variability in the data is lost because operator estimates are made to the nearest 1% at best, and more frequently to the nearest 5%. A method for constructing complete particle-size distributions from operator estimates of sediment texture using a logit constrained cubic spline (CCS) interpolation algorithm is presented. This model and four other previously published methods are compared to establish the best approach to modelling particle-size distributions. The logit-CCS model is the most accurate method, although both logit-linear and log-linear interpolation models provide reasonable alternatives. Models based on empirical distribution functions are less accurate than interpolation algorithms for modelling particle-size distributions in
Directory of Open Access Journals (Sweden)
Carlos Montenegro Silva
2009-01-01
The performance of different sample sizes for estimating the size composition of squat lobster (Pleuroncodes monodon) catches was analyzed using a computer resampling procedure. The data were gathered in May 2002 between 29°10'S and 32°10'S. These data were used to test seven sampling scenarios for fishing trips (1-7 trips), twelve scenarios for the number of individuals sampled (25, 50, ..., 300, in steps of 25), and two strategies for sampling tows within a fishing trip (a census of all tows and systematic tow sampling). By testing the combination of all these scenarios, we were able to analyze the performance of 168 sample-size scenarios for estimating the composition of sizes by sex. The results indicated that the error index for the estimated size frequency distribution decreased as the number of fishing trips increased, with progressively smaller decreases between adjacent scenarios. Likewise, the error index decreased as the number of individuals sampled increased, with only marginal improvements beyond 175 individuals.
Small Sample Sizes Yield Biased Allometric Equations in Temperate Forests
Duncanson, L.; Rourke, O.; Dubayah, R.
2015-11-01
Accurate quantification of forest carbon stocks is required for constraining the global carbon cycle and its impacts on climate. The accuracies of forest biomass maps are inherently dependent on the accuracy of the field biomass estimates used to calibrate models, which are generated with allometric equations. Here, we provide a quantitative assessment of the sensitivity of allometric parameters to sample size in temperate forests, focusing on the allometric relationship between tree height and crown radius. We use LiDAR remote sensing to isolate between 10,000 and more than 1,000,000 tree height and crown radius measurements per site in six U.S. forests. We find that fitted allometric parameters are highly sensitive to sample size, producing systematic overestimates of height. We extend our analysis to biomass through the application of empirical relationships from the literature, and show that given the small sample sizes used in common allometric equations for biomass, the average site-level biomass bias is ~+70% with a standard deviation of 71%, ranging from -4% to +193%. These findings underscore the importance of increasing the sample sizes used for allometric equation generation.
Sample size determination for longitudinal designs with binary response.
Kapur, Kush; Bhaumik, Runa; Tang, X Charlene; Hur, Kwan; Reda, Domenic J; Bhaumik, Dulal K
2014-09-28
In this article, we develop appropriate statistical methods for determining the required sample size while comparing the efficacy of an intervention to a control with repeated binary response outcomes. Our proposed methodology incorporates the complexity of the hierarchical nature of underlying designs and provides solutions when varying attrition rates are present over time. We explore how the between-subject variability and attrition rates jointly influence the computation of the sample size formula. Our procedure also shows how efficient estimation methods play a crucial role in power analysis. A practical guideline is provided for when information regarding individual variance components is unavailable. The validity of our methods is established by extensive simulation studies. Results are illustrated with the help of two randomized clinical trials in the areas of contraception and insomnia. Copyright © 2014 John Wiley & Sons, Ltd.
Accurate Biomass Estimation via Bayesian Adaptive Sampling
Wheeler, K.; Knuth, K.; Castle, P.
2005-12-01
Typical estimates of standing wood derived from remote sensing sources take advantage of aggregate measurements of canopy heights (e.g. LIDAR) and canopy diameters (segmentation of IKONOS imagery) to obtain a wood volume estimate by assuming homogeneous species and a fixed function that returns volume. The validation of such techniques uses manually measured diameter at breast height (DBH) records. Our goal is to improve the accuracy and applicability of biomass estimation methods for heterogeneous forests and transitional areas. We are developing estimates with quantifiable uncertainty using a new form of estimation function, active sampling, and volumetric reconstruction image rendering for species-specific mass truth. Initially we are developing a Bayesian adaptive sampling method for BRDF associated with the MISR Rahman model with respect to categorical biomes. This involves characterizing the probability distributions of the 3 free parameters of the Rahman model for the 6 categories of biomes used by MISR. Subsequently, these distributions can be used to determine the optimal sampling methodology to distinguish biomes during acquisition. We have a remotely controlled semi-autonomous helicopter that has stereo imaging, lidar, differential GPS, and spectrometers covering wavelengths from visible to NIR. We intend to automatically vary the waypoints of the flight path via the Bayesian adaptive sampling method. The second critical part of this work is in automating the validation of biomass estimates using machine vision techniques. This involves taking 2-D pictures of trees of known species, and then, via Bayesian techniques, reconstructing 3-D models of the trees to estimate the distribution moments associated with wood volume. Similar techniques have been developed by the medical imaging community. This then provides probability distributions conditional upon species. The final part of this work is in relating the BRDF actively sampled measurements to species
Lithiasis size estimation variability depending on image technical methodology.
Argüelles Salido, Enrique; Aguilar García, Jesús; Lozano-Blasco, Jose María; Subirá Rios, Jorge; Beardo Villar, Pastora; Campoy-Martínez, Pedro; Medina-López, Rafael A
2013-11-01
The lithiasic size is a determining factor in selecting the most suitable treatment, surgical or medical. However, the method for obtaining a reliable lithiasic size is not standardized. Our objectives are to determine the differences between the estimated lithiasic sizes shown by plain radiography and by computerized axial tomography (CT) scan (using different techniques) in relation to the actual size, and to establish which type of imaging is ideal for this purpose. We present an in vitro model with lithiases obtained in cooperation with four centers: lithiases >0.5 cm, intact, and visible via simple radiography. A sample of 245 lithiases was obtained, with 87 rejected as they did not fulfill the inclusion criteria. Initially, the three main actual diameters of each lithiasis were measured with a calibrator; then a plain X-ray and a CT scan were taken of the samples to determine the surface size in cm² for simple radiography, and the surface size and volume in cm³ for the CT scan, in bone and soft-tissue windows (Toshiba Aquillion 64, sections of 0.5 mm, 120 kV, 250 mA). The tomographic area was calculated by employing the formula recommended by the European Association of Urology and the scanner software. The actual, radiographic, and tomographic measurements were taken by three different researchers who were unaware of each other's results. The statistics program IBM SPSS Statistics® 19 was used, and differences were analyzed using the Wilcoxon signed-rank test. The bone window CT scan slightly overestimated the actual lithiasic size (0.12 vs. 0.17 cm³), while in the soft-tissue window the actual volume was practically doubled (0.12 vs. 0.21 cm³). Lithiasis measurements can be estimated, although the craniocaudal diameter measurement will be overestimated. Using the soft-tissue window gives an overestimated size.
Sample Size Growth with an Increasing Number of Comparisons
Directory of Open Access Journals (Sweden)
Chi-Hong Tseng
2012-01-01
An appropriate sample size is crucial for the success of many studies that involve a large number of comparisons. Sample size formulas for testing multiple hypotheses are provided in this paper. They can be used to determine the sample sizes required to provide adequate power while controlling familywise error rate or false discovery rate, to derive the growth rate of sample size with respect to an increasing number of comparisons or decrease in effect size, and to assess reliability of study designs. It is demonstrated that practical sample sizes can often be achieved even when adjustments for a large number of comparisons are made as in many genomewide studies.
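For familywise error rate control via a Bonferroni adjustment, one of the simpler cases such formulas cover, the slow growth of sample size with the number of comparisons can be seen directly; this sketch uses a standard normal-approximation formula and is not the paper's own derivation:

```python
from statistics import NormalDist

def n_per_group(delta, m=1, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sample comparison of means with
    standardized effect delta, Bonferroni-adjusted for m comparisons so
    that the familywise error rate stays at alpha (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / (2 * m))  # critical value with alpha split m ways
    z_beta = z(power)
    return 2 * ((z_alpha + z_beta) / delta) ** 2

# growth with m is slow (roughly like log m): 1 vs 1000 comparisons
single = n_per_group(0.5)            # about 63 per group
genomewide = n_per_group(0.5, m=1000)
```

Even at m = 1000 the required n roughly triples rather than growing a thousand-fold, which is the sense in which practical sample sizes remain achievable in genomewide settings.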
Optimizing Sampling Efficiency for Biomass Estimation Across NEON Domains
Abercrombie, H. H.; Meier, C. L.; Spencer, J. J.
2013-12-01
Over the course of 30 years, the National Ecological Observatory Network (NEON) will measure plant biomass and productivity across the U.S. to enable an understanding of terrestrial carbon cycle responses to ecosystem change drivers. Over the next several years, prior to operational sampling at a site, NEON will complete construction and characterization phases during which a limited amount of sampling will be done at each site to inform sampling designs and guide standardization of data collection across all sites. Sampling biomass in 60+ sites distributed among 20 different eco-climatic domains poses major logistical and budgetary challenges. Traditional biomass sampling methods such as clip harvesting and direct measurements of Leaf Area Index (LAI) involve collecting and processing plant samples, and are time and labor intensive. Possible alternatives include using indirect sampling methods for estimating LAI, such as digital hemispherical photography (DHP) or a LI-COR 2200 Plant Canopy Analyzer. These LAI estimates can then be used as a proxy for biomass. The biomass estimates calculated can then inform the clip harvest sampling design during NEON operations, optimizing both sample size and number so that standardized uncertainty limits can be achieved with a minimum amount of sampling effort. In 2011, LAI and clip harvest data were collected from co-located sampling points at the Central Plains Experimental Range located in northern Colorado, a shortgrass steppe ecosystem that is the NEON Domain 10 core site. LAI was measured with a LI-COR 2200 Plant Canopy Analyzer. The layout of the sampling design included four 300 m transects, with clip harvest plots spaced every 50 m and LAI sub-transects spaced every 10 m. LAI was measured at four points along 6 m sub-transects running perpendicular to the 300 m transect. Clip harvest plots were co-located 4 m from corresponding LAI transects, and had dimensions of 0.1 m by 2 m. We conducted regression analyses
A power analysis for fidelity measurement sample size determination.
Stokes, Lynne; Allor, Jill H
2016-03-01
The importance of assessing fidelity has been emphasized recently with increasingly sophisticated definitions, assessment procedures, and integration of fidelity data into analyses of outcomes. Fidelity is often measured through observation and coding of instructional sessions, either live or by video. However, little guidance has been provided about how to determine the number of observations needed to precisely measure fidelity. We propose a practical method for determining a reasonable sample size for fidelity data collection when fidelity assessment requires observation. The proposed methodology is based on consideration of the power of tests of the treatment effect on the outcome itself, as well as of the relationship between fidelity and outcome. It makes use of the methodology of probability sampling from a finite population, because the fidelity parameters of interest are estimated over a specific, limited time frame using a sample. For example, consider a fidelity measure defined as the number of minutes of exposure to a treatment curriculum during the 36 weeks of the study. In this case, the finite population is the 36 sessions, the parameter (number of minutes over the entire 36 sessions) is a total, and the sample is the observed sessions. Software for the sample size calculation is provided. (c) 2016 APA, all rights reserved.
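The finite-population machinery the authors invoke can be sketched for the 36-session example; the variance and precision target below are illustrative assumptions, not values from the paper:

```python
import math
from statistics import NormalDist

def sessions_to_observe(N, sd, margin, conf=0.95):
    """Number of sessions to sample from a finite population of N sessions
    so that mean fidelity (e.g. minutes of exposure per session) is
    estimated within +/- margin at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n0 = (z * sd / margin) ** 2          # infinite-population sample size
    return math.ceil(n0 / (1 + n0 / N))  # finite population correction

# 36-session study: assumed SD of 10 minutes, +/- 5 minute target precision
n_obs = sessions_to_observe(N=36, sd=10.0, margin=5.0)
```

The finite population correction is what keeps the answer below N even for very tight precision targets.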
Simple and multiple linear regression: sample size considerations.
Hanley, James A
2016-11-01
The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
Sample size calculations for pilot randomized trials: a confidence interval approach.
Cocks, Kim; Torgerson, David J
2013-02-01
To describe a method using confidence intervals (CIs) to estimate the sample size for a pilot randomized trial. Using one-sided CIs and the estimated effect size that would be sought in a large trial, we calculated the sample size needed for pilot trials. Using an 80% one-sided CI, we estimated that a pilot trial should have at least 9% of the sample size of the main planned trial. Using the estimated effect size difference for the main trial together with a one-sided CI thus allows a pilot trial sample size to be calculated, making its results more useful than at present. Copyright © 2013 Elsevier Inc. All rights reserved.
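The 9% figure can be recovered from a ratio of normal quantiles. A sketch of that scaling argument (a simplified reading of the approach, not the paper's full derivation; the function name and example numbers are illustrative):

```python
import math
from statistics import NormalDist

def pilot_n_per_arm(main_n_per_arm, ci_level=0.80):
    """Pilot per-arm size as a fraction of the main trial's per-arm size.

    The main trial is assumed designed with two-sided alpha = 0.05 and
    80% power; the pilot uses a one-sided CI at `ci_level`.  The ratio
    of the pilot quantile to the main-trial quantile sum, squared,
    gives roughly 0.09 -- the paper's "at least 9%" result.
    """
    z_main = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)
    z_pilot = NormalDist().inv_cdf(ci_level)
    return math.ceil(main_n_per_arm * (z_pilot / z_main) ** 2)
```

For a main trial of 63 per arm, this suggests a pilot of 6 per arm; for 100 per arm, 10.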
An expert system for the calculation of sample size.
Ebell, M H; Neale, A V; Hodgkins, B J
1994-06-01
Calculation of sample size is a useful technique for researchers who are designing a study, and for clinicians who wish to interpret research findings. The elements that must be specified to calculate the sample size include alpha, beta, Type I and Type II errors, 1- and 2-tail tests, confidence intervals, and confidence levels. A computer software program written by one of the authors (MHE), Sample Size Expert, facilitates sample size calculations. The program uses an expert system to help inexperienced users calculate sample sizes for analytic and descriptive studies. The software is available at no cost from the author or electronically via several on-line information services.
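The elements the abstract lists (alpha, power, effect size, 1- vs 2-tailed testing) fit together in one closed-form relationship. A sketch of that underlying calculation (a generic normal-approximation power formula for a two-arm comparison of means, not the Sample Size Expert program itself):

```python
from statistics import NormalDist

_z = NormalDist()  # standard normal distribution

def power_two_sample(n_per_arm, delta, sd, alpha=0.05, tails=2):
    """Approximate power of a two-arm comparison of means.

    Inputs are exactly the elements a sample size tool collects:
    alpha, number of tails, the effect size delta (difference in
    means), the outcome SD, and the per-arm n.
    """
    z_alpha = _z.inv_cdf(1 - alpha / tails)       # critical value
    ncp = (delta / sd) * (n_per_arm / 2) ** 0.5   # noncentrality term
    return _z.cdf(ncp - z_alpha)
```

With 64 subjects per arm and a standardized effect of 0.5, the approximation gives power near the conventional 80%; halving alpha's tails or enlarging n raises power, which is the interplay such software makes explicit.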
The PowerAtlas: a power and sample size atlas for microarray experimental design and research
Directory of Open Access Journals (Sweden)
Wang Jelai
2006-02-01
Full Text Available Abstract Background Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies. Results To address this challenge, we have developed the Microarray PowerAtlas. The atlas enables estimation of statistical power by allowing investigators to appropriately plan studies by building upon previous studies that have similar experimental characteristics. Currently, there are sample sizes and power estimates based on 632 experiments from Gene Expression Omnibus (GEO. The PowerAtlas also permits investigators to upload their own pilot data and derive power and sample size estimates from these data. This resource will be updated regularly with new datasets from GEO and other databases such as The Nottingham Arabidopsis Stock Center (NASC. Conclusion This resource provides a valuable tool for investigators who are planning efficient microarray studies and estimating required sample sizes.
Power Spectrum Estimation of Randomly Sampled Signals
DEFF Research Database (Denmark)
Velte, Clara M.; Buchhave, Preben; K. George, William
2014-01-01
A number of alternative methods attempting to produce correct power spectra have been invented and tested. The objective of the current study is to create a simple computer-generated signal for baseline testing of residence time weighting and some of the most commonly proposed algorithms (or algorithms which most modern algorithms ultimately are based on), sample-and-hold and the direct spectral estimator without residence time weighting, and compare how they perform in relation to power spectra based on the equidistantly sampled reference signal. The computer-generated signal is a Poisson process with a sample rate ... Sample-and-hold and the free-running processor perform well only under particular circumstances, with high data rate and low inherent bias, respectively, while residence time weighting provides non-biased estimates regardless of setting. The free-running processor was also tested and compared to residence time weighting using actual LDA measurements in a turbulent round jet. Power spectra from ...
Evaluation of sampling strategies to estimate crown biomass
Directory of Open Access Journals (Sweden)
Krishna P Poudel
2015-01-01
Full Text Available Background Depending on tree and site characteristics, crown biomass accounts for a significant portion of the total aboveground biomass in the tree. Crown biomass estimation is useful for different purposes including evaluating the economic feasibility of crown utilization for energy production or forest products, fuel load assessments and fire management strategies, and wildfire modeling. However, crown biomass is difficult to predict because of the variability within and among species and sites. Thus the allometric equations used for predicting crown biomass should be based on data collected with precise and unbiased sampling strategies. In this study, we evaluate the performance of different sampling strategies to estimate crown biomass and evaluate the effect of sample size in estimating crown biomass. Methods Using data collected from 20 destructively sampled trees, we evaluated 11 different sampling strategies using six evaluation statistics: bias, relative bias, root mean square error (RMSE), relative RMSE, amount of biomass sampled, and relative biomass sampled. We also evaluated the performance of the selected sampling strategies when different numbers of branches (3, 6, 9, and 12) are selected from each tree. A tree-specific log-linear model with branch diameter and branch length as covariates was used to obtain individual branch biomass. Results Compared to all other methods, stratified sampling with the probability-proportional-to-size estimation technique produced better results when three or six branches per tree were sampled. However, systematic sampling with the ratio estimation technique was the best when at least nine branches per tree were sampled. Under the stratified sampling strategy, selecting an unequal number of branches per stratum produced approximately similar results to simple random sampling, but it further decreased RMSE when information on branch diameter is used in the design and estimation phases. Conclusions Use of
Planning Educational Research: Determining the Necessary Sample Size.
Olejnik, Stephen F.
1984-01-01
This paper discusses the sample size problem and four factors affecting its solution: significance level, statistical power, analysis procedure, and effect size. The interrelationship between these factors is discussed and demonstrated by calculating minimal sample size requirements for a variety of research conditions. (Author)
Ayutyanont, Napatkamon; Langbaum, Jessica B.; Hendrix, Suzanne B.; Chen, Kewei; Fleisher, Adam S.; Friesenhahn, Michel; Ward, Michael; Aguirre, Camilo; Acosta-Baena, Natalia; Madrigal, Lucìa; Muñoz, Claudia; Tirado, Victoria; Moreno, Sonia; Tariot, Pierre N.; Lopera, Francisco; Reiman, Eric M.
2014-01-01
Objective There is a need to identify a cognitive composite that is sensitive to tracking preclinical AD decline to be used as a primary endpoint in treatment trials. Method We capitalized on longitudinal data, collected from 1995 to 2010, from cognitively unimpaired presenilin 1 (PSEN1) E280A mutation carriers from the world’s largest known early-onset autosomal dominant AD (ADAD) kindred to identify a composite cognitive test with the greatest statistical power to track preclinical AD decline and estimate the number of carriers age 30 and older needed to detect a treatment effect in the Alzheimer’s Prevention Initiative’s (API) preclinical AD treatment trial. The mean-to-standard-deviation ratios (MSDRs) of change over time were calculated in a search for the optimal combination of one to seven cognitive tests/sub-tests drawn from the neuropsychological test battery in cognitively unimpaired mutation carriers during two- and five-year follow-up periods, using data from non-carriers during the same time period to correct for aging and practice effects. Combinations that performed well were then evaluated for robustness across follow-up years, occurrence of selected items within top performing combinations and representation of relevant cognitive domains. Results This optimal test combination included CERAD Word List Recall, CERAD Boston Naming Test (high frequency items), MMSE Orientation to Time, CERAD Constructional Praxis and Raven's Progressive Matrices (Set A) with an MSDR of 1.62. This composite is more sensitive than using either the CERAD Word List Recall (MSDR=0.38) or the entire CERAD-Col battery (MSDR=0.76). A sample size of 75 cognitively normal PSEN1-E280A mutation carriers age 30 and older per treatment arm allows for a detectable treatment effect of 29% in a 60-month trial (80% power, p=0.05). Conclusions We have identified a composite cognitive test score representing multiple cognitive domains that has improved power compared to the most
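The MSDR criterion and the resulting per-arm sample size are straightforward to compute. A sketch (the per-arm formula below is a generic two-sample normal approximation for detecting a fractional slowing of decline, not the paper's exact 60-month design, so it will not reproduce the 75-per-arm figure exactly):

```python
import math
from statistics import mean, stdev

def msdr(change_scores):
    """Mean-to-standard-deviation ratio of longitudinal change scores:
    the criterion used to rank candidate cognitive composites."""
    return mean(change_scores) / stdev(change_scores)

def n_per_arm(msdr_value, treatment_fraction, z_a=1.959964, z_b=0.841621):
    """Rough per-arm n to detect a treatment slowing decline by
    `treatment_fraction`, at two-sided alpha = 0.05 and 80% power."""
    return math.ceil(2 * ((z_a + z_b) / (treatment_fraction * msdr_value)) ** 2)
```

With MSDR = 1.62 and a 29% treatment effect, this rough formula gives a per-arm n in the low 70s, the same ballpark as the paper's 75.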
The Sample Size Influence in the Accuracy of the Image Classification of the Remote Sensing
Directory of Open Access Journals (Sweden)
Thomaz C. e C. da Costa
2004-12-01
Full Text Available Landuse/landcover maps produced by classification of remote sensing images incorporate uncertainty. This uncertainty is measured by accuracy indices computed from reference samples. The size of the reference sample is often defined by a binomial approximation without the use of a pilot sample; in this way the accuracy is not estimated but fixed a priori. If the a priori accuracy diverges from the accuracy later estimated, the sampling error will deviate from the expected error. Determining the sample size from a pilot sample (the theoretically correct procedure) is justified when no prior accuracy estimate exists for the work area, with reference to the intended use of the remote sensing product.
Sample size determination in clinical trials with multiple endpoints
Sozu, Takashi; Hamasaki, Toshimitsu; Evans, Scott R
2015-01-01
This book integrates recent methodological developments for calculating the sample size and power in trials with more than one endpoint considered as multiple primary or co-primary, offering an important reference work for statisticians working in this area. The determination of sample size and the evaluation of power are fundamental and critical elements in the design of clinical trials. If the sample size is too small, important effects may go unnoticed; if the sample size is too large, it represents a waste of resources and unethically puts more participants at risk than necessary. Recently many clinical trials have been designed with more than one endpoint considered as multiple primary or co-primary, creating a need for new approaches to the design and analysis of these clinical trials. The book focuses on the evaluation of power and sample size determination when comparing the effects of two interventions in superiority clinical trials with multiple endpoints. Methods for sample size calculation in clin...
Better Size Estimation for Sparse Matrix Products
DEFF Research Database (Denmark)
Amossen, Rasmus Resen; Campagna, Andrea; Pagh, Rasmus
2010-01-01
We consider the problem of doing fast and reliable estimation of the number of non-zero entries in a sparse Boolean matrix product. Let n denote the total number of non-zero entries in the input matrices. We show how to compute a 1 ± ε approximation (with small probability of error) in expected...
Preeminence and prerequisites of sample size calculations in clinical trials
Directory of Open Access Journals (Sweden)
Richa Singhal
2015-01-01
Full Text Available The key components while planning a clinical study are the study design, study duration, and sample size. These features are an integral part of planning a clinical trial efficiently, ethically, and cost-effectively. This article describes some of the prerequisites for sample size calculation. It also explains that sample size calculation is different for different study designs. The article describes in detail the sample size calculation for a randomized controlled trial when the primary outcome is a continuous variable and when it is a proportion or a qualitative variable.
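The two cases the article distinguishes reduce to two standard closed-form formulas, sketched below (normal-approximation versions at two-sided alpha = 0.05 and 80% power; the example inputs in the usage note are illustrative):

```python
import math

Z_A, Z_B = 1.959964, 0.841621  # two-sided alpha = 0.05; power = 0.80

def n_continuous(delta, sd, z_a=Z_A, z_b=Z_B):
    """Per-arm n for an RCT with a continuous primary outcome:
    delta is the clinically relevant difference in means."""
    return math.ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)

def n_proportion(p1, p2, z_a=Z_A, z_b=Z_B):
    """Per-arm n for an RCT whose primary outcome is a proportion
    (p1 and p2 are the anticipated event rates in the two arms)."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)
```

For example, detecting a 5-point difference with SD 10 needs 63 per arm, and detecting 50% vs 30% response rates needs 91 per arm.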
Requirements for Minimum Sample Size for Sensitivity and Specificity Analysis
Adnan, Tassha Hilda
2016-01-01
Sensitivity and specificity analysis is commonly used for screening and diagnostic tests. The main issue researchers face is to determine the sufficient sample sizes that are related with screening and diagnostic studies. Although formulas for sample size calculation are available, the majority of researchers are not mathematicians or statisticians; hence, sample size calculation might not be easy for them. This review paper provides sample size tables with regard to sensitivity and specificity analysis. These tables were derived from formulation of sensitivity and specificity tests using Power Analysis and Sample Size (PASS) software based on desired type I error, power and effect size. The approaches on how to use the tables were also discussed. PMID:27891446
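Tables of this kind come from a closed-form calculation. A sketch of one common version (a Buderer-style prevalence adjustment; whether this matches the PASS formulas exactly is an assumption):

```python
import math

def n_for_sensitivity(sens, precision, prevalence, z=1.959964):
    """Total subjects so the 95% CI half-width around the anticipated
    sensitivity is `precision`; dividing by prevalence accounts for the
    fact that only diseased subjects contribute to sensitivity."""
    n_diseased = z ** 2 * sens * (1 - sens) / precision ** 2
    return math.ceil(n_diseased / prevalence)

def n_for_specificity(spec, precision, prevalence, z=1.959964):
    """Analogous total for specificity, driven by non-diseased subjects."""
    n_healthy = z ** 2 * spec * (1 - spec) / precision ** 2
    return math.ceil(n_healthy / (1 - prevalence))
```

For an anticipated sensitivity of 0.90 estimated to within ±0.05 at 20% prevalence, the sketch gives 692 subjects; the specificity requirement is usually much smaller because most subjects are disease-free.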
Comparing interval estimates for small sample ordinal CFA models.
Natesan, Prathiba
2015-01-01
Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors. However, these studies have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased. This can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis (CFA) models for small sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied. Two Bayesian prior specifications, informative and relatively less informative, were studied. Undercoverage of confidence intervals and underestimation of standard errors were common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more often positively biased than negatively biased; that is, most intervals that did not contain the true value were greater than the true value. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected, with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the nature of statistical uncertainty that comes with the data (e.g., small sample). Bayesian point estimates were also more accurate than non-Bayesian estimates. The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading.
Estimation of population size using open capture-recapture models
McDonald, T.L.; Amstrup, Steven C.
2001-01-01
One of the most important needs for wildlife managers is an accurate estimate of population size. Yet, for many species, including most marine species and large mammals, accurate and precise estimation of numbers is one of the most difficult of all research challenges. Open-population capture-recapture models have proven useful in many situations to estimate survival probabilities but typically have not been used to estimate population size. We show that open-population models can be used to estimate population size by developing a Horvitz-Thompson-type estimate of population size and an estimator of its variance. Our population size estimate keys on the probability of capture at each trap occasion and therefore is quite general and can be made a function of external covariates measured during the study. Here we define the estimator and investigate its bias, variance, and variance estimator via computer simulation. Computer simulations make extensive use of real data taken from a study of polar bears (Ursus maritimus) in the Beaufort Sea. The population size estimator is shown to be useful because it was negligibly biased in all situations studied. The variance estimator is shown to be useful in all situations, but caution is warranted in cases of extreme capture heterogeneity.
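The Horvitz-Thompson idea the authors build on can be sketched in a few lines (textbook forms; the paper derives its own variance estimator for the open-population setting, and the per-animal capture probabilities here are assumed to have been estimated already, e.g. from an open-population model with covariates):

```python
def ht_population_size(capture_probs):
    """Horvitz-Thompson-type estimate of abundance at one occasion:
    each captured animal with estimated capture probability p_i
    stands in for 1/p_i animals in the population."""
    return sum(1.0 / p for p in capture_probs)

def ht_variance(capture_probs):
    """A standard variance estimator for the independent-capture case:
    sum of (1 - p_i) / p_i**2."""
    return sum((1.0 - p) / p ** 2 for p in capture_probs)
```

For example, 30 captured bears each with estimated capture probability 0.6 yield an abundance estimate of 50.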
Determination of the optimal sample size for a clinical trial accounting for the population size
Miller, Frank; Day, Simon; Hee, Siew Wan; Madan, Jason; Zohar, Sarah; Posch, Martin
2016-01-01
The problem of choosing a sample size for a clinical trial is a very common one. In some settings, such as rare diseases or other small populations, the large sample sizes usually associated with the standard frequentist approach may be infeasible, suggesting that the sample size chosen should reflect the size of the population under consideration. Incorporation of the population size is possible in a decision‐theoretic approach either explicitly by assuming that the population size is fixed and known, or implicitly through geometric discounting of the gain from future patients reflecting the expected population size. This paper develops such approaches. Building on previous work, an asymptotic expression is derived for the sample size for single and two‐arm clinical trials in the general case of a clinical trial with a primary endpoint with a distribution of one parameter exponential family form that optimizes a utility function that quantifies the cost and gain per patient as a continuous function of this parameter. It is shown that as the size of the population, N, or expected size, N* in the case of geometric discounting, becomes large, the optimal trial size is O(N^(1/2)) or O(N*^(1/2)). The sample size obtained from the asymptotic expression is also compared with the exact optimal sample size in examples with responses with Bernoulli and Poisson distributions, showing that the asymptotic approximations can also be reasonable in relatively small sample sizes. PMID:27184938
Particle size distribution: A key factor in estimating powder dustiness.
López Lilao, Ana; Sanfélix Forner, Vicenta; Mallol Gasch, Gustavo; Monfort Gimeno, Eliseo
2017-12-01
A wide variety of raw materials, involving more than 20 samples of quartzes, feldspars, nephelines, carbonates, dolomites, sands, zircons, and alumina, were selected and characterised. These raw materials were selected to encompass a wide range of particle sizes (1.6-294 µm) and true densities (2650-4680 kg/m^3). The dustiness of the raw materials, i.e., their tendency to generate dust on handling, was determined using the continuous drop method. The influence of some key material parameters (particle size distribution, flowability, and specific surface area) on dustiness was assessed. In this regard, dustiness was found to be significantly affected by particle size distribution. Data analysis enabled development of a model for predicting the dustiness of the studied materials, assuming that dustiness depended on the particle fraction susceptible to emission and on the bulk material's susceptibility to release these particles. On the one hand, the developed model allows the dustiness mechanisms to be better understood. In this regard, it may be noted that relative emission increased with mean particle size. However, this did not necessarily imply that dustiness did, because dustiness also depended on the fraction of particles susceptible to be emitted. On the other hand, the developed model enables dustiness to be estimated using just the particle size distribution data. The quality of the fits was quite good and the fact that only particle size distribution data are needed facilitates industrial application, since these data are usually known by raw materials managers, thus making additional tests unnecessary. This model may therefore be deemed a key tool in drawing up efficient preventive and/or corrective measures to reduce dust emissions during bulk powder processing, both inside and outside industrial facilities. It is recommended, however
Software Size Estimation Using Expert Estimation: A Fuzzy Logic Approach
Stevenson, Glenn A.
2012-01-01
For decades software managers have been using formal methodologies such as the Constructive Cost Model and Function Points to estimate the effort of software projects during the early stages of project development. While some research shows these methodologies to be effective, many software managers feel that they are overly complicated to use and…
Kühberger, Anton; Fritz, Astrid; Scherndl, Thomas
2014-01-01
Background The p value obtained from a significance test provides no information about the magnitude or importance of the underlying phenomenon. Therefore, additional reporting of effect size is often recommended. Effect sizes are theoretically independent from sample size. Yet this may not hold true empirically: non-independence could indicate publication bias. Methods We investigate whether effect size is independent from sample size in psychological research. We randomly sampled 1,000 psychological articles from all areas of psychological research. We extracted p values, effect sizes, and sample sizes of all empirical papers, and calculated the correlation between effect size and sample size, and investigated the distribution of p values. Results We found a negative correlation of r = −.45 [95% CI: −.53; −.35] between effect size and sample size. In addition, we found an inordinately high number of p values just passing the boundary of significance. Additional data showed that neither implicit nor explicit power analysis could account for this pattern of findings. Conclusion The negative correlation between effect size and samples size, and the biased distribution of p values indicate pervasive publication bias in the entire field of psychology. PMID:25192357
Wang, Ming; Kong, Lan; Li, Zheng; Zhang, Lijun
2016-05-10
Generalized estimating equations (GEE) is a general statistical method to fit marginal models for longitudinal data in biomedical studies. The variance-covariance matrix of the regression parameter coefficients is usually estimated by a robust "sandwich" variance estimator, which does not perform satisfactorily when the sample size is small. To reduce the downward bias and improve the efficiency, several modified variance estimators have been proposed for bias-correction or efficiency improvement. In this paper, we provide a comprehensive review on recent developments of modified variance estimators and compare their small-sample performance theoretically and numerically through simulation and real data examples. In particular, Wald tests and t-tests based on different variance estimators are used for hypothesis testing, and the guideline on appropriate sample sizes for each estimator is provided for preserving type I error in general cases based on numerical results. Moreover, we develop a user-friendly R package "geesmv" incorporating all of these variance estimators for public usage in practice. Copyright © 2015 John Wiley & Sons, Ltd.
Sample size computation for association studies using case–parents ...
Indian Academy of Sciences (India)
Sample size for case–control association studies is discussed. We consider a candidate locus with two alleles A and a, where A is putatively associated with the disease status (increasing ... Keywords: sample size; association tests; genotype relative risk; power; autism.
Understanding Power and Rules of Thumb for Determining Sample Sizes
Betsy L. Morgan; Carmen R. Wilson Van Voorhis
2007-01-01
This article addresses the definition of power and its relationship to Type I and Type II errors. We discuss the relationship of sample size and power. Finally, we offer statistical rules of thumb guiding the selection of sample sizes large enough for sufficient power to detect differences, associations, chi-square, and factor analyses.
Olives, Casey; Valadez, Joseph J; Pagano, Marcello
2014-03-01
To assess the bias incurred when curtailment of Lot Quality Assurance Sampling (LQAS) is ignored, to present unbiased estimators, to consider the impact of cluster sampling by simulation and to apply our method to published polio immunization data from Nigeria. We present estimators of coverage when using two kinds of curtailed LQAS strategies: semicurtailed and curtailed. We study the proposed estimators with independent and clustered data using three field-tested LQAS designs for assessing polio vaccination coverage, with samples of size 60 and decision rules of 9, 21 and 33, and compare them to biased maximum likelihood estimators. Lastly, we present estimates of polio vaccination coverage from previously published data in 20 local government authorities (LGAs) from five Nigerian states. Simulations illustrate substantial bias if one ignores the curtailed sampling design. Proposed estimators show no bias. Clustering does not affect the bias of these estimators. Across simulations, standard errors show signs of inflation as clustering increases. Neither sampling strategy nor LQAS design influences estimates of polio vaccination coverage in 20 Nigerian LGAs. When coverage is low, semicurtailed LQAS strategies considerably reduce the sample size required to make a decision. Curtailed LQAS designs further reduce the sample size when coverage is high. Results presented dispel the misconception that curtailed LQAS data are unsuitable for estimation. These findings augment the utility of LQAS as a tool for monitoring vaccination efforts by demonstrating that unbiased estimation using curtailed designs is not only possible but these designs also reduce the sample size. © 2014 John Wiley & Sons Ltd.
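The bias from ignoring curtailment is easy to demonstrate by simulation. A toy sketch (the n = 60 sample with decision rule 33 comes from the abstract; the true coverage of 0.70 and the naive successes/sampled estimator are illustrative, and the paper's unbiased estimators are not reproduced here):

```python
import random

random.seed(42)

def semicurtailed_lqas(p, n=60, d=33):
    """Sample until the decision is forced: accept once d successes
    are seen, reject once n - d + 1 failures are seen.
    Returns (successes, number actually sampled)."""
    succ = fail = 0
    while succ < d and fail < n - d + 1:
        if random.random() < p:
            succ += 1
        else:
            fail += 1
    return succ, succ + fail

runs = [semicurtailed_lqas(0.70) for _ in range(20_000)]
naive = [s / m for s, m in runs]
bias = sum(naive) / len(naive) - 0.70                # naive estimator is biased upward
avg_sampled = sum(m for _, m in runs) / len(runs)    # well under the full n = 60
```

The same simulation also shows the sample size saving the abstract describes: on average far fewer than 60 subjects are needed before the decision is forced.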
Estimation of Optimal Size of Plots for Experiments with Radiometer ...
African Journals Online (AJOL)
An experimental error can lead to rework and, consequently, to the loss of financial and human resources. One way to reduce this problem is the estimation of the optimum size of experimental plot to carry out the treatments. The objective of this study was to estimate the optimal size of plots for reflectance measurements in ...
Power Spectrum Estimation of Randomly Sampled Signals
DEFF Research Database (Denmark)
Velte, C. M.; Buchhave, P.; K. George, W.
... sine waves. The primary signal and the corresponding power spectrum are shown in Figure 1. The conventional spectrum shows multiple erroneous mixing frequencies and the peak values are too low. The residence-time-weighted spectrum is correct. The sample-and-hold spectrum has lower power than the correct spectrum, and the f^-2 filtering effect appearing for low data densities is evident (Adrian and Yao 1987). The remaining tests also show that sample-and-hold and the free-running processor perform well only under very particular circumstances, with high data rate and low inherent bias, respectively. Residence time weighting provides non-biased estimates regardless of setting. The free-running processor was also tested and compared to residence time weighting using actual LDA measurements in a turbulent round jet. Power spectra from measurements on the jet centerline and the outer part of the jet ...
OPTIMAL SAMPLE SIZE FOR STATISTICAL ANALYSIS OF WINTER WHEAT QUANTITATIVE TRAITS
Andrijana Eđed; Dražen Horvat; Zdenko Lončarić
2009-01-01
In the planning phase of every research study, particular attention should be dedicated to the estimation of optimal sample size, aiming to obtain more precise and objective results of statistical analysis. The aim of this paper was to estimate the optimal sample size of wheat yield components (plant height, spike length, number of spikelets per spike, number of grains per spike, weight of grains per spike and 1000 grains weight) for determination of statistically significant differences between two treatme...
Determining the sample size required for a community radon survey.
Chen, Jing; Tracy, Bliss L; Zielinski, Jan M; Moir, Deborah
2008-04-01
Radon measurements in homes and other buildings have been included in various community health surveys, often dealing with only a few hundred randomly sampled households. It would be interesting to know whether such a small sample size can adequately represent the radon distribution in a large community. An analysis of radon measurement data obtained from the Winnipeg case-control study with randomly sampled subsets of different sizes has shown that a sample size of one to several hundred can serve the survey purpose well.
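The adequacy of a few-hundred-household sample can be checked by simulation against a synthetic community. A sketch (indoor radon is commonly modelled as lognormal; the parameters and community size below are made up, not the Winnipeg data):

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical community: 50,000 homes with lognormal radon levels
# (geometric mean exp(4.0) ~ 55 Bq/m^3, geometric SD exp(0.8)).
community = [random.lognormvariate(4.0, 0.8) for _ in range(50_000)]

def geometric_mean(xs):
    return math.exp(statistics.fmean(math.log(x) for x in xs))

true_gm = geometric_mean(community)

# Repeatedly survey 200 random homes and see how far the sample
# geometric mean strays from the community value.
sample_gms = [geometric_mean(random.sample(community, 200)) for _ in range(200)]
rel_err = statistics.fmean(abs(g - true_gm) / true_gm for g in sample_gms)
```

Under these assumptions the typical relative error of a 200-home survey is only a few percent, consistent with the paper's conclusion that one to several hundred homes can suffice.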
How Many Words Do Children Know? A Corpus-Based Estimation of Children's Total Vocabulary Size
Segbers, Jutta; Schroeder, Sascha
2017-01-01
In this article we present a new method for estimating children's total vocabulary size based on a language corpus in German. We drew a virtual sample of different lexicon sizes from a corpus and let the virtual sample "take" a vocabulary test by comparing whether the items were included in the virtual lexicons or not. This enabled us to…
Spake, Laure; Cardoso, Hugo F V
2018-01-01
The population on which forensic juvenile skeletal age estimation methods are applied has not been critically considered. Previous research suggests that child victims of homicide tend to be from socioeconomically disadvantaged contexts, and that these contexts impair linear growth. This study investigates whether juvenile skeletal remains examined by forensic anthropologists are short for age compared to their normal healthy peers. Cadaver lengths were obtained from records of autopsies of 1256 individuals, aged birth to eighteen years at death, conducted between 2000 and 2015 in Australia, New Zealand, and the U.S. Growth status of the forensic population, represented by homicide victims, and of the general population, represented by accident victims, were compared using height for age Z-scores and independent sample t-tests. Cadaver lengths of the accident victims were compared to growth references using one sample t-tests to evaluate whether accident victims reflect the general population. Homicide victims are shorter for age than accident victims in samples from the U.S., but not in Australia and New Zealand. Accident victims are more representative of the general population in Australia and New Zealand. Different results in Australia and New Zealand as opposed to the U.S. may be linked to socioeconomic inequality. These results suggest that physical anthropologists should critically select reference samples when devising forensic juvenile skeletal age estimation methods. Children examined in forensic investigations may be short for age, and thus methods developed on normal healthy children may yield inaccurate results. A healthy reference population may not necessarily constitute an appropriate growth comparison for the forensic anthropology population. Copyright © 2017 Elsevier B.V. All rights reserved.
Estimate of the particle size in nanoparticles of magnetite
Energy Technology Data Exchange (ETDEWEB)
Paresque, M.C.; Castro, J.A.; Campos, M.F.; Oliveira, E.M.; Liuzzi, M.A.S.C. [Universidade Federal Fluminense (UFF), Niteroi, RJ (Brazil)
2016-07-01
Full Text: Nanocrystalline particles of Fe3O4 were produced by co-precipitation in an aqueous medium. The particle size of magnetite is a very important parameter because, at around 30 nm, there is a transition from superparamagnetic to ferromagnetic behavior. This transition profoundly affects the properties of the nanofluid. The Langevin model allows an estimate of the particle size directly from measured hysteresis curves. In this study, the particle size was also determined by X-ray diffraction with Rietveld analysis and by a laser particle size analyzer. These two methods pointed to a particle size of around 20 nm. (author)
Determining sample size and a passing criterion for respirator fit-test panels.
Landsittel, D; Zhuang, Z; Newcomb, W; Berry Ann, R
2014-01-01
Few studies have proposed methods for sample size determination and specification of a passing criterion (e.g., number needed to pass from a given panel size) for respirator fit-tests. One approach is to account for between- and within-subject variability, and thus take full advantage of the multiple donning measurements within subject, using a random effects model. The corresponding sample size calculation, however, may be difficult to implement in practice, as it depends on the model-specific and test panel-specific variance estimates, and thus does not yield a single sample size or specific cutoff for the number needed to pass. A simple binomial approach is therefore proposed to simultaneously determine both the required sample size and the optimal cutoff for the number of subjects needed to achieve a passing result. The method essentially conducts a global search of the type I and type II errors under different null and alternative hypotheses, across the range of possible sample sizes, to find the lowest sample size which yields at least one cutoff satisfying, or approximately satisfying, all pre-determined limits for the different error rates. Benchmark testing of 98 respirators (conducted by the National Institute for Occupational Safety and Health) is used to illustrate the binomial approach and show how sample size estimates from the random effects model can vary substantially depending on estimated variance components. For the binomial approach, probability calculations show that a sample size of 35 to 40 yields acceptable error rates under different null and alternative hypotheses. For the random effects model, the required sample sizes are generally smaller, but can vary substantially based on the estimated variance components. Overall, despite some limitations, the binomial approach represents a highly practical approach with reasonable statistical properties.
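The global search described in this abstract is easy to sketch. The pass rates, error limits, and maximum panel size below are illustrative assumptions, not the values used in the NIOSH benchmark study; the sketch finds the smallest panel size and pass cutoff whose binomial type I and type II error rates stay within pre-set limits:

```python
from math import comb

def binom_sf(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def panel_design(p_good=0.90, p_bad=0.60, alpha=0.05, beta=0.20, n_max=60):
    """Smallest panel size n and pass cutoff c such that
    - a poorly fitting respirator (true pass rate p_bad) is accepted
      with probability <= alpha (type I error), and
    - a well fitting respirator (true pass rate p_good) is rejected
      with probability <= beta (type II error)."""
    for n in range(1, n_max + 1):
        for c in range(n + 1):
            type1 = binom_sf(c, n, p_bad)          # accept a bad respirator
            type2 = 1.0 - binom_sf(c, n, p_good)   # reject a good respirator
            if type1 <= alpha and type2 <= beta:
                return n, c
    return None

print(panel_design())  # with these assumed rates: panel of 14, cutoff 12
```

With tighter hypotheses (e.g., pass rates closer together), the same search produces panel sizes in the 35-40 range reported in the abstract.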
Methods for sample size determination in cluster randomized trials.
Rutterford, Clare; Copas, Andrew; Eldridge, Sandra
2015-06-01
The use of cluster randomized trials (CRTs) is increasing, along with the variety in their design and analysis. The simplest approach for their sample size calculation is to calculate the sample size assuming individual randomization and inflate this by a design effect to account for randomization by cluster. The assumptions of a simple design effect may not always be met; alternative or more complicated approaches are required. We summarise a wide range of sample size methods available for cluster randomized trials. For those familiar with sample size calculations for individually randomized trials but with less experience in the clustered case, this manuscript provides formulae for a wide range of scenarios with associated explanation and recommendations. For those with more experience, comprehensive summaries are provided that allow quick identification of methods for a given design, outcome and analysis method. We present first those methods applicable to the simplest two-arm, parallel group, completely randomized design followed by methods that incorporate deviations from this design such as: variability in cluster sizes; attrition; non-compliance; or the inclusion of baseline covariates or repeated measures. The paper concludes with methods for alternative designs. There is a large amount of methodology available for sample size calculations in CRTs. This paper gives the most comprehensive description of published methodology for sample size calculation and provides an important resource for those designing these trials. © The Author 2015. Published by Oxford University Press on behalf of the International Epidemiological Association.
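The simplest approach the abstract describes, inflating an individually randomized sample size by a design effect, can be stated concretely. The per-arm sample size, cluster size, and intracluster correlation (ICC) below are hypothetical illustration values:

```python
from math import ceil

def design_effect(cluster_size, icc):
    # Variance inflation from randomizing clusters rather than individuals
    return 1 + (cluster_size - 1) * icc

def crt_sample_size(n_per_arm_individual, cluster_size, icc):
    """Per-arm sample size and number of clusters for a two-arm parallel
    CRT, obtained by inflating the individually randomized sample size
    by the design effect."""
    n_inflated = ceil(n_per_arm_individual * design_effect(cluster_size, icc))
    k_clusters = ceil(n_inflated / cluster_size)
    return n_inflated, k_clusters

# e.g. 64 per arm under individual randomization, clusters of 20, ICC 0.05
print(crt_sample_size(64, 20, 0.05))  # -> (125, 7): 125 per arm, 7 clusters
```

As the abstract notes, this simple design effect assumes equal cluster sizes and no attrition; variable cluster sizes or repeated measures need the more elaborate formulae the review summarizes.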
Greene, Tom
2015-01-01
Performing well-powered randomized controlled trials is of fundamental importance in clinical research. The goal of sample size calculations is to assure that statistical power is acceptable while maintaining a small probability of a type I error. This chapter overviews the fundamentals of sample size calculation for standard types of outcomes for two-group studies. It considers (1) the problems of determining the size of the treatment effect that the studies will be designed to detect, (2) the modifications to sample size calculations to account for loss to follow-up and nonadherence, (3) the options when initial calculations indicate that the feasible sample size is insufficient to provide adequate power, and (4) the implication of using multiple primary endpoints. Sample size estimates for longitudinal cohort studies must take account of confounding by baseline factors.
Dang, Qianyu; Mazumdar, Sati; Houck, Patricia R
2008-08-01
The generalized linear mixed model (GLIMMIX) provides a powerful technique to model correlated outcomes with different types of distributions. The model can now be easily implemented with SAS PROC GLIMMIX in version 9.1. For binary outcomes, linearization methods of penalized quasi-likelihood (PQL) or marginal quasi-likelihood (MQL) provide relatively accurate variance estimates for fixed effects. Using GLIMMIX based on these linearization methods, we derived formulas for power and sample size calculations for longitudinal designs with attrition over time. We found that the power and sample size estimates depend on the within-subject correlation and the size of random effects. In this article, we present tables of minimum sample sizes commonly used to test hypotheses for longitudinal studies. A simulation study was used to compare the results. We also provide a Web link to the SAS macro that we developed to compute power and sample sizes for correlated binary outcomes.
Neuromuscular dose-response studies: determining sample size.
Kopman, A F; Lien, C A; Naguib, M
2011-02-01
Investigators planning dose-response studies of neuromuscular blockers have rarely used a priori power analysis to determine the minimal sample size their protocols require. Institutional Review Boards and peer-reviewed journals now generally ask for this information. This study outlines a proposed method for meeting these requirements. The slopes of the dose-response relationships of eight neuromuscular blocking agents were determined using regression analysis. These values were substituted for γ in the Hill equation. When this is done, the coefficient of variation (COV) around the mean value of the ED₅₀ for each drug is easily calculated. Using these values, we performed an a priori one-sample two-tailed t-test of the means to determine the required sample size when the allowable error in the ED₅₀ was varied from ±10% to ±20%. The COV averaged 22% (range 15-27%). We used a COV value of 25% in determining the sample size. If the allowable error in finding the mean ED₅₀ is ±15%, a sample size of 24 is needed to achieve a power of 80%. Increasing 'accuracy' beyond this point requires increasingly larger sample sizes (e.g. an 'n' of 37 for a ±12% error). On the basis of the results of this retrospective analysis, a total sample size of not less than 24 subjects should be adequate for determining a neuromuscular blocking drug's clinical potency with a reasonable degree of assurance.
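Using the abstract's COV of 25% and an allowable error of ±15%, a normal-approximation version of this one-sample calculation reproduces the reported n of 24. Guenther's small-sample correction stands in here for the exact noncentral-t computation (our assumption, not the authors' exact procedure), so results can differ by a unit from the paper's (e.g., it yields 36 rather than 37 for ±12%):

```python
from math import ceil
from statistics import NormalDist

def n_for_ed50(cov, allowable_error, alpha=0.05, power=0.80):
    """One-sample, two-tailed sample size for estimating a mean ED50 to
    within +/- allowable_error (as a fraction of the mean), given a
    coefficient of variation cov.  Normal-theory formula with Guenther's
    small-sample correction approximating the t-test."""
    d = allowable_error / cov                 # standardized effect size
    z = NormalDist()
    za = z.inv_cdf(1 - alpha / 2)
    zb = z.inv_cdf(power)
    return ceil((za + zb) ** 2 / d ** 2 + za ** 2 / 2)

print(n_for_ed50(cov=0.25, allowable_error=0.15))  # -> 24
```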
Determination of the optimal sample size for a clinical trial accounting for the population size.
Stallard, Nigel; Miller, Frank; Day, Simon; Hee, Siew Wan; Madan, Jason; Zohar, Sarah; Posch, Martin
2017-07-01
The problem of choosing a sample size for a clinical trial is a very common one. In some settings, such as rare diseases or other small populations, the large sample sizes usually associated with the standard frequentist approach may be infeasible, suggesting that the sample size chosen should reflect the size of the population under consideration. Incorporation of the population size is possible in a decision-theoretic approach, either explicitly by assuming that the population size is fixed and known, or implicitly through geometric discounting of the gain from future patients reflecting the expected population size. This paper develops such approaches. Building on previous work, an asymptotic expression is derived for the sample size for single and two-arm clinical trials in the general case of a clinical trial with a primary endpoint whose distribution is of one-parameter exponential family form, optimizing a utility function that quantifies the cost and gain per patient as a continuous function of this parameter. It is shown that as the size of the population, N, or its expected size, N*, in the case of geometric discounting, becomes large, the optimal trial size is O(N^(1/2)) or O(N*^(1/2)). The sample size obtained from the asymptotic expression is also compared with the exact optimal sample size in examples with responses with Bernoulli and Poisson distributions, showing that the asymptotic approximations can be reasonable even for relatively small sample sizes. © 2016 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
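The O(N^(1/2)) behaviour can be illustrated with a toy decision-theoretic model. This is not the paper's utility function: the unit-variance normal responses, known treatment difference, and "remaining patients receive the selected arm" payoff below are all our assumptions, chosen only to show the optimal trial size growing roughly like the square root of the population size:

```python
from math import sqrt
from statistics import NormalDist

def optimal_trial_size(N, delta):
    """Per-arm trial size maximizing total expected benefit when the
    remaining N - 2n patients receive whichever arm the trial selects.
    Responses ~ Normal(mean, 1); delta is the true treatment difference,
    so the probability of selecting the better arm is
    Phi(delta * sqrt(n / 2))."""
    phi = NormalDist().cdf
    def utility(n):
        p_correct = phi(delta * sqrt(n / 2))
        # net gain per future patient is delta * (2 * p_correct - 1)
        return (N - 2 * n) * delta * (2 * p_correct - 1)
    return max(range(1, N // 2), key=utility)

print(optimal_trial_size(1000, 0.3), optimal_trial_size(4000, 0.3))
```

Quadrupling N here does not quadruple the optimal trial size; it grows by a factor closer to 2, consistent with the square-root scaling the paper derives.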
Determining the effective sample size of a parametric prior.
Morita, Satoshi; Thall, Peter F; Müller, Peter
2008-06-01
We present a definition for the effective sample size of a parametric prior distribution in a Bayesian model, and propose methods for computing the effective sample size in a variety of settings. Our approach first constructs a prior chosen to be vague in a suitable sense, and updates this prior to obtain a sequence of posteriors corresponding to each of a range of sample sizes. We then compute a distance between each posterior and the parametric prior, defined in terms of the curvature of the logarithm of each distribution, and the posterior minimizing the distance defines the effective sample size of the prior. For cases where the distance cannot be computed analytically, we provide a numerical approximation based on Monte Carlo simulation. We provide general guidelines for application, illustrate the method in several standard cases where the answer seems obvious, and then apply it to some nonstandard settings.
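A simplified, deterministic sketch of the curvature-matching idea for a Beta(a, b) prior follows. The vague-prior scaling and the device of adding idealized observations at the prior mean are our assumptions, not the paper's exact construction (which averages over the prior predictive); the sketch nonetheless recovers the "obvious" answer a + b:

```python
def beta_log_curvature(a, b, theta):
    # Second derivative of the log Beta(a, b) density at theta
    return -(a - 1) / theta**2 - (b - 1) / (1 - theta)**2

def effective_sample_size(a, b, eps=0.01, m_max=100):
    """Curvature-matching sketch of an effective sample size for a
    Beta(a, b) prior: start from a vague prior with the same mean but
    tiny information (parameters scaled by eps), add m idealized
    observations at the prior mean, and return the m whose posterior
    curvature best matches the prior's."""
    mu = a / (a + b)
    target = beta_log_curvature(a, b, mu)
    best_m, best_dist = None, float("inf")
    for m in range(m_max + 1):
        post = beta_log_curvature(eps * a + mu * m,
                                  eps * b + (1 - mu) * m, mu)
        if abs(post - target) < best_dist:
            best_m, best_dist = m, abs(post - target)
    return best_m

print(effective_sample_size(4, 6))  # -> 10, i.e. a + b
```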
Effects of sample size on the second magnetization peak in ...
Indian Academy of Sciences (India)
8+ crystals are observed at low temperatures, above the temperature where the SMP totally disappears. In particular, the onset of the SMP shifts to lower fields as the sample size decreases - a result that could be interpreted as a size effect in ...
Planning Longitudinal Field Studies: Considerations in Determining Sample Size.
St.Pierre, Robert G.
1980-01-01
Factors that influence the sample size necessary for longitudinal evaluations include the nature of the evaluation questions, nature of available comparison groups, consistency of the treatment in different sites, effect size, attrition rate, significance level for statistical tests, and statistical power. (Author/GDC)
Investigating the impact of sample size on cognate detection
List, Johann-Mattis
2013-01-01
In historical linguistics, the problem of cognate detection is traditionally approached within the framework of the comparative method. Since the method is usually carried out manually, it is very flexible regarding its input parameters. However, while the number of languages and the selection of comparanda are not important for the successful application of the method, the sample size of the comparanda is. In order to shed light on the impact of sample size on cognat...
Directory of Open Access Journals (Sweden)
Danielle C Barbiero
2011-01-01
Full Text Available The objective of this study was to determine the minimum sample size for studies of community structure and/or dominant species at different heights of a rocky intertidal zone at Rio de Janeiro. Community structure indicators suggested a minimum surface varying from 100 to 800 cm², with a minimum of 2 to 8 profiles and at least 20 to 80 quadrant sampling points, depending on the height. Indicators of species abundance suggest 100 cm² for Hypnea musciformis and 400 cm² for Ulva fasciata, Phragmatopoma lapidosa Kinberg (1867) and Gymnogongrus griffithsiae at lower heights; 200 cm² for Chthamalus spp. at intermediate heights; and 800 cm² for Littorina ziczac at the greatest height. In general, seven to eight profiles and 10 to 20 sampling points were used. Different sample sizes were related to the abundance and spatial distributions of individual species, which varied at each intertidal height according to the degree of environmental stress.
Body-size distribution, biomass estimates and life histories of ...
African Journals Online (AJOL)
The body-size distributions and biomass estimates of Caenis (Ephemeroptera: Caenidae), Cloeon (Ephemeroptera: Baetidae), Coenagrionidae (Odonata), Micronecta (Hemiptera: Corixidae), Chironominae (Diptera: Chironomidae) and Orthocladiinae (Diptera: Chironomidae), the most common and abundant insect taxa ...
Estimates of software size from state machine designs
Britcher, R. N.; Gaffney, J. E.
1982-01-01
It is demonstrated that the length, or size (in number of Source Lines of Code), of programs represented as state machines can be reliably estimated from the number of internal state machine variables.
Sample Size Determination in a Chi-Squared Test Given Information from an Earlier Study.
Gillett, Raphael
1996-01-01
A rigorous method is outlined for using information from a previous study and explicitly taking into account the variability of an effect size estimate when determining sample size for a chi-squared test. This approach assures that the average power of all experiments in a discipline attains the desired level. (SLD)
A margin based approach to determining sample sizes via tolerance bounds.
Energy Technology Data Exchange (ETDEWEB)
Newcomer, Justin T.; Freeland, Katherine Elizabeth
2013-09-01
This paper proposes a tolerance bound approach for determining sample sizes. With this new methodology we begin to think of sample size in the context of uncertainty exceeding margin. As the sample size decreases the uncertainty in the estimate of margin increases. This can be problematic when the margin is small and only a few units are available for testing. In this case there may be a true underlying positive margin to requirements but the uncertainty may be too large to conclude we have sufficient margin to those requirements with a high level of statistical confidence. Therefore, we provide a methodology for choosing a sample size large enough such that an estimated QMU uncertainty based on the tolerance bound approach will be smaller than the estimated margin (assuming there is positive margin). This ensures that the estimated tolerance bound will be within performance requirements and the tolerance ratio will be greater than one, supporting a conclusion that we have sufficient margin to the performance requirements. In addition, this paper explores the relationship between margin, uncertainty, and sample size and provides an approach and recommendations for quantifying risk when sample sizes are limited.
Mini-batch stochastic gradient descent with dynamic sample sizes
Metel, Michael R.
2017-01-01
We focus on solving constrained convex optimization problems using mini-batch stochastic gradient descent. Dynamic sample size rules are presented which ensure a descent direction with high probability. Empirical results from two applications show superior convergence compared to fixed sample implementations.
Sample size formulae for the Bayesian continual reassessment method.
Cheung, Ying Kuen
2013-01-01
In the planning of a dose finding study, a primary design objective is to maintain high accuracy in terms of the probability of selecting the maximum tolerated dose. While numerous dose finding methods have been proposed in the literature, concrete guidance on sample size determination is lacking. With a motivation to provide quick and easy calculations during trial planning, we present closed form formulae for sample size determination associated with the use of the Bayesian continual reassessment method (CRM). We examine the sampling distribution of a nonparametric optimal design and exploit it as a proxy to empirically derive an accuracy index of the CRM using linear regression. We apply the formulae to determine the sample size of a phase I trial of PTEN-long in pancreatic cancer patients and demonstrate that the formulae give results very similar to simulation. The formulae are implemented by an R function 'getn' in the package 'dfcrm'. The results are developed for the Bayesian CRM and should be validated by simulation when used for other dose finding methods. The analytical formulae we propose give quick and accurate approximation of the required sample size for the CRM. The approach used to derive the formulae can be applied to obtain sample size formulae for other dose finding methods.
On efficiency of some ratio estimators in double sampling design ...
African Journals Online (AJOL)
In this paper, three sampling ratio estimators in double sampling design are proposed with the intention of finding an alternative double sampling design estimator to the conventional ratio estimator in double sampling design discussed by Cochran (1977), Okafor (2002), Raj (1972) and Raj and Chandhok (1999).
Treatment effect on biases in size estimation in spider phobia.
Shiban, Youssef; Fruth, Martina B; Pauli, Paul; Kinateder, Max; Reichenberger, Jonas; Mühlberger, Andreas
2016-12-01
The current study investigates biases in size estimations made by spider-phobic and healthy participants before and after treatment. Forty-one spider-phobic and 20 healthy participants received virtual reality (VR) exposure treatment and were then asked to rate the size of a real spider immediately before and, on average, 15 days after the treatment. During the VR exposure treatment, skin conductance response was assessed. Prior to the treatment, both groups tended to overestimate the size of the spider, but this size estimation bias was significantly larger in the phobic group than in the control group. The VR exposure treatment reduced this bias, which was reflected in a significantly smaller size rating post treatment. However, the size estimation bias was unrelated to the skin conductance response. Our results confirm the hypothesis that size estimation by spider-phobic patients is biased. This bias is not stable over time and can be decreased with adequate treatment. Copyright © 2016 Elsevier B.V. All rights reserved.
Directory of Open Access Journals (Sweden)
R. Eric Heidel
2016-01-01
Full Text Available Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
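The interplay of effect size, variance, α, and power described above can be made concrete with the standard two-group calculation. This uses the normal approximation with Guenther's correction for the t-test; the effect sizes in the example are conventional benchmarks (Cohen's small/medium/large), not values from this article:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    t-test detecting standardized effect size d (Cohen's d), using the
    normal-theory formula plus Guenther's correction for the t-test."""
    z = NormalDist()
    za = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    zb = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (za + zb) ** 2 / d ** 2 + za ** 2 / 4)

# Medium (d = 0.5) and large (d = 0.8) effects at 80% power, alpha = 0.05
print(n_per_group(0.5), n_per_group(0.8))  # -> 64 26
```

Halving the effect size roughly quadruples the required sample size, which is why the magnitude and variance of the effect size dominate the five components listed above.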
Accuracy or precision: Implications of sample design and methodology on abundance estimation
Kowalewski, Lucas K.; Chizinski, Christopher J.; Powell, Larkin A.; Pope, Kevin L.; Pegg, Mark A.
2015-01-01
Sampling by spatially replicated counts (point counts) is an increasingly popular method of estimating the population size of organisms. Challenges exist when sampling by the point-count method: it is often impractical to sample the entire area of interest and impossible to detect every individual present. Ecologists encounter logistical limitations that force them to sample either few large sample units or many small sample units, introducing biases to sample counts. We generated a computer environment and simulated sampling scenarios to test the roles of the number of samples, sample unit area, number of organisms, and distribution of organisms in the estimation of population sizes using N-mixture models. Many sample units of small area provided estimates that were consistently closer to true abundance than scenarios with few sample units of large area. However, scenarios with few sample units of large area provided more precise abundance estimates than those with many sample units of small area. It is important to consider the accuracy and precision of abundance estimates during the sample design process, with study goals and objectives fully recognized; in practice, however, accuracy and precision are often an afterthought considered only during data analysis.
Sample size considerations for clinical research studies in nuclear cardiology.
Chiuzan, Cody; West, Erin A; Duong, Jimmy; Cheung, Ken Y K; Einstein, Andrew J
2015-12-01
Sample size calculation is an important element of research design that investigators need to consider in the planning stage of the study. Funding agencies and research review panels request a power analysis, for example, to determine the minimum number of subjects needed for an experiment to be informative. Calculating the right sample size is crucial to gaining accurate information and ensures that research resources are used efficiently and ethically. The simple question "How many subjects do I need?" does not always have a simple answer. Before calculating the sample size requirements, a researcher must address several aspects, such as purpose of the research (descriptive or comparative), type of samples (one or more groups), and data being collected (continuous or categorical). In this article, we describe some of the most frequent methods for calculating the sample size with examples from nuclear cardiology research, including for t tests, analysis of variance (ANOVA), non-parametric tests, correlation, Chi-squared tests, and survival analysis. For the ease of implementation, several examples are also illustrated via user-friendly free statistical software.
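As one of the frequently used calculations mentioned above, the per-group sample size for comparing two independent proportions (the setting behind a 2×2 Chi-squared test) can be sketched directly. The proportions below are illustrative assumptions; this uses the unpooled normal approximation, which differs slightly from pooled or continuity-corrected variants:

```python
from math import ceil
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for detecting a difference between two
    independent proportions (normal approximation, unpooled variances)."""
    z = NormalDist()
    za = z.inv_cdf(1 - alpha / 2)
    zb = z.inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)   # sum of binomial variances
    return ceil((za + zb) ** 2 * var / (p1 - p2) ** 2)

# e.g. detecting 60% vs 40% response at 80% power, two-sided alpha 0.05
print(n_two_proportions(0.6, 0.4))  # -> 95 per group
```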
Sample Size for Assessing Agreement between Two Methods of Measurement by Bland-Altman Method.
Lu, Meng-Jie; Zhong, Wei-Hua; Liu, Yu-Xiu; Miao, Hua-Zhang; Li, Yong-Chang; Ji, Mu-Huo
2016-11-01
The Bland-Altman method has been widely used for assessing agreement between two methods of measurement. However, sample size estimation for it has remained unresolved. We propose a new method of sample size estimation for Bland-Altman agreement assessment. In the Bland-Altman method, the conclusion on agreement is made based on the width of the confidence interval for the LOAs (limits of agreement) in comparison to a predefined clinical agreement limit. Under the theory of statistical inference, formulae for sample size estimation are derived, which depend on the pre-determined levels of α and β, the mean and standard deviation of the differences between the two measurements, and the predefined limits. With this new method, sample sizes are calculated under different parameter settings which occur frequently in method comparison studies, and Monte Carlo simulation is used to obtain the corresponding powers. The results of the Monte Carlo simulation showed that the achieved powers coincide with the pre-determined levels, validating the correctness of the method. The method of sample size estimation can be applied in the Bland-Altman method to assess agreement between two methods of measurement.
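A rough version of the idea, requiring the confidence bound of the limit of agreement to stay inside the clinical limit, can be coded directly. This sketch uses Bland and Altman's sqrt(3/n) approximation to the LOA standard error and ignores the power (β) side of the authors' formulae, so it is not their exact method; the numbers in the example are hypothetical:

```python
from statistics import NormalDist
from math import ceil

def bland_altman_n(mu_d, sd_d, delta, alpha=0.05):
    """Smallest n for which the upper confidence bound of the upper
    limit of agreement (|mu_d| + z * sd_d) stays inside the clinical
    agreement limit delta, using the approximation
    SE(LOA) ~= sqrt(3) * sd_d / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    loa = abs(mu_d) + z * sd_d
    if loa >= delta:
        raise ValueError("limits of agreement exceed the clinical limit")
    return ceil(3 * (z * sd_d / (delta - loa)) ** 2)

# e.g. mean difference 0.1, SD of differences 1.0, clinical limit 2.5
print(bland_altman_n(mu_d=0.1, sd_d=1.0, delta=2.5))  # -> 60
```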
The yield estimation of semiconductor products based on truncated samples
Directory of Open Access Journals (Sweden)
Gu K.
2013-01-01
Full Text Available Product yield reflects potential product quality and reliability: high yield corresponds to good quality and high reliability. Yet consumers usually cannot know the actual yield of the products they purchase. Generally, the products that consumers obtain from suppliers are all eligible, and since the quality characteristic of eligible products lies within the specifications, the observations of the quality characteristic follow a truncated normal distribution. Based on maximum likelihood estimation, this paper proposes an algorithm for calculating the parameters of the full Gaussian distribution before truncation from truncated data and for estimating product yield. The confidence interval of the yield result is derived, and the effect of sample size on the precision of the result is also analyzed. Finally, the effectiveness of the algorithm is verified with an actual instance.
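The core of the algorithm, maximum likelihood on truncated observations to recover the untruncated Gaussian and hence the yield, can be sketched with a crude grid search. The specification limits, true parameters, and grid below are illustrative assumptions; the paper's actual algorithm and its confidence intervals are not reproduced here:

```python
import random
from math import log
from statistics import NormalDist

def truncated_mle(data, lo, hi, mus, sigmas):
    """Grid-search maximum likelihood fit of the untruncated N(mu, sigma)
    given data truncated to [lo, hi] (only in-spec units are observed)."""
    best = (None, None, float("-inf"))
    for mu in mus:
        for s in sigmas:
            nd = NormalDist(mu, s)
            mass = nd.cdf(hi) - nd.cdf(lo)   # probability of passing spec
            ll = sum(log(nd.pdf(x)) for x in data) - len(data) * log(mass)
            if ll > best[2]:
                best = (mu, s, ll)
    return best[0], best[1]

# Hypothetical example: true process N(0, 1), specs at +/-2 (yield ~ 95.4%)
random.seed(1)
sample = [x for x in (random.gauss(0, 1) for _ in range(2000))
          if -2 <= x <= 2][:500]
mu_grid = [i / 50 for i in range(-10, 11)]       # mu in [-0.2, 0.2]
s_grid = [0.7 + i / 50 for i in range(31)]       # sigma in [0.7, 1.3]
mu_hat, s_hat = truncated_mle(sample, -2, 2, mu_grid, s_grid)
nd = NormalDist(mu_hat, s_hat)
yield_hat = nd.cdf(2) - nd.cdf(-2)               # estimated yield
print(round(mu_hat, 2), round(s_hat, 2), round(yield_hat, 3))
```

In practice one would replace the grid search with a proper optimizer, but the likelihood (truncated-normal density renormalized by the in-spec mass) is the same.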
Allen, John C; Thumboo, Julian; Lye, Weng Kit; Conaghan, Philip G; Chew, Li-Ching; Tan, York Kiat
2017-10-03
To determine whether novel methods of selecting joints through (i) ultrasonography (individualized-ultrasound [IUS] method), or (ii) ultrasonography and clinical examination (individualized-composite-ultrasound [ICUS] method) translate into smaller rheumatoid arthritis (RA) clinical trial sample sizes when compared to existing methods utilizing predetermined joint sites for ultrasonography. Cohen's effect size (ES) was estimated (ES^) and a 95% CI (ES^L, ES^U) calculated on a mean change in 3-month total inflammatory score for each method. Corresponding 95% CIs [nL(ES^U), nU(ES^L)] were obtained on a post hoc sample size reflecting the uncertainty in ES^. Sample size calculations were based on a one-sample t-test as the patient numbers needed to provide 80% power at α = 0.05 to reject a null hypothesis H0: ES = 0 versus alternative hypotheses H1: ES = ES^, ES = ES^L and ES = ES^U. We aimed to provide point and interval estimates on projected sample sizes for future studies reflecting the uncertainty in our study ES^ estimates. Twenty-four treated RA patients were followed up for 3 months. Utilizing the 12-joint approach and existing methods, the post hoc sample size (95% CI) was 22 (10-245). Corresponding sample sizes using ICUS and IUS were 11 (7-40) and 11 (6-38), respectively. Utilizing a seven-joint approach, the corresponding sample sizes using ICUS and IUS methods were nine (6-24) and 11 (6-35), respectively. Our pilot study suggests that sample size for RA clinical trials with ultrasound endpoints may be reduced using the novel methods, providing justification for larger studies to confirm these observations. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.
Directory of Open Access Journals (Sweden)
Annegret Grimm
Full Text Available Reliable estimates of population size are fundamental in many ecological studies and biodiversity conservation. Selecting appropriate methods to estimate abundance is often very difficult, especially if data are scarce. Most studies concerning the reliability of different estimators used simulation data based on assumptions about capture variability that do not necessarily reflect conditions in natural populations. Here, we used data from an intensively studied closed population of the arboreal gecko Gehyra variegata to construct reference population sizes for assessing twelve different population size estimators in terms of bias, precision, accuracy, and their 95%-confidence intervals. Two of the reference populations reflect natural biological entities, whereas the other reference populations reflect artificial subsets of the population. Since individual heterogeneity was assumed, we tested modifications of the Lincoln-Petersen estimator, a set of models in programs MARK and CARE-2, and a truncated geometric distribution. Ranking of methods was similar across criteria. Models accounting for individual heterogeneity performed best in all assessment criteria. For populations from heterogeneous habitats without obvious covariates explaining individual heterogeneity, we recommend using the moment estimator or the interpolated jackknife estimator (both implemented in CAPTURE/MARK). If data for capture frequencies are substantial, we recommend the sample coverage or the estimating equation (both models implemented in CARE-2). Depending on the distribution of catchabilities, our proposed multiple Lincoln-Petersen and a truncated geometric distribution obtained comparably good results. The former usually resulted in a minimum population size and the latter can be recommended when there is a long tail of low capture probabilities. Models with covariates and mixture models performed poorly. Our approach identified suitable methods and extended options to
Dong, Nianbo; Maynard, Rebecca
2013-01-01
This paper and the accompanying tool are intended to complement existing supports for conducting power analysis by offering a tool based on the framework of Minimum Detectable Effect Sizes (MDES) formulae that can be used in determining sample size requirements and in estimating minimum detectable effect sizes for a range of individual- and…
Sample size for collecting germplasms–a polyploid model with ...
Indian Academy of Sciences (India)
Numerous expressions/results developed for germplasm collection/regeneration for diploid populations by earlier workers can be directly deduced from our general expression by assigning appropriate values of the corresponding parameters. A seed factor which influences the plant sample size has also been isolated to ...
Research Note Pilot survey to assess sample size for herbaceous ...
African Journals Online (AJOL)
A pilot survey to determine sub-sample size (number of point observations per plot) for herbaceous species composition assessments, using a wheel-point apparatus applying the nearest-plant method, was conducted. Three plots differing in species composition on the Zululand coastal plain were selected, and on each plot ...
Determining sample size for assessing species composition in ...
African Journals Online (AJOL)
Species composition is measured in grasslands for a variety of reasons. Commonly, observations are made using the wheel-point apparatus, but the problem of determining optimum sample size has not yet been satisfactorily resolved. In this study the wheel-point apparatus was used to record 2 000 observations in each of ...
Sample Size Determinations for the Two Rater Kappa Statistic.
Flack, Virginia F.; And Others
1988-01-01
A method is presented for determining sample size that will achieve a pre-specified bound on confidence interval width for the interrater agreement measure "kappa." The same results can be used when a pre-specified power is desired for testing hypotheses about the value of kappa. (Author/SLD)
Mongoloid-Caucasoid Differences in Brain Size from Military Samples.
Rushton, J. Philippe; And Others
1991-01-01
Calculation of cranial capacities for the means from 4 Mongoloid and 20 Caucasoid samples (raw data from 57,378 individuals in 1978) found larger brain size for Mongoloids, a finding discussed in evolutionary terms. The conclusion is disputed by L. Willerman but supported by J. P. Rushton. (SLD)
Sample size determination for a t test given a t value from a previous study: A FORTRAN 77 program.
Gillett, R
2001-11-01
When uncertain about the magnitude of an effect, researchers commonly substitute in the standard sample-size-determination formula an estimate of effect size derived from a previous experiment. A problem with this approach is that the traditional sample-size-determination formula was not designed to deal with the uncertainty inherent in an effect-size estimate. Consequently, estimate-substitution in the traditional sample-size-determination formula can lead to a substantial loss of power. A method of sample-size determination designed to handle uncertainty in effect-size estimates is described. The procedure uses the t value and sample size from a previous study, which might be a pilot study or a related study in the same area, to establish a distribution of probable effect sizes. The sample size to be employed in the new study is that which supplies an expected power of the desired amount over the distribution of probable effect sizes. A FORTRAN 77 program is presented that permits swift calculation of sample size for a variety of t tests, including independent t tests, related t tests, t tests of correlation coefficients, and t tests of multiple regression b coefficients.
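The plug-in approach described above (substituting an effect-size estimate into the traditional formula) can be sketched as follows. This is a minimal normal-approximation version for illustration only, not Gillett's FORTRAN procedure; the function name and defaults are assumptions:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Traditional per-group sample size for a two-sided, two-sample
    comparison of means, using the normal approximation to the t test
    and a plug-in standardized effect size d (Cohen's d)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_a + z_b) / d) ** 2)

print(n_per_group(0.5))  # medium effect: 63 per group
```

Treating the previous study's point estimate of d as if it were the true effect is exactly the problem the paper addresses: sampling error in the earlier t value propagates into the power of the new study.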
Sample size and power calculation for molecular biology studies.
Jung, Sin-Ho
2010-01-01
Sample size calculation is a critical procedure when designing a new biological study. In this chapter, we consider molecular biology studies generating huge dimensional data. Microarray studies are typical examples, so that we state this chapter in terms of gene microarray data, but the discussed methods can be used for design and analysis of any molecular biology studies involving high-dimensional data. In this chapter, we discuss sample size calculation methods for molecular biology studies when the discovery of prognostic molecular markers is performed by accurately controlling false discovery rate (FDR) or family-wise error rate (FWER) in the final data analysis. We limit our discussion to the two-sample case.
Aerosol Sampling Bias from Differential Electrostatic Charge and Particle Size
Jayjock, Michael Anthony
Lack of reliable epidemiological data on long term health effects of aerosols is due in part to inadequacy of sampling procedures and the attendant doubt regarding the validity of the concentrations measured. Differential particle size has been widely accepted and studied as a major potential biasing effect in the sampling of such aerosols. However, relatively little has been done to study the effect of electrostatic particle charge on aerosol sampling. The objective of this research was to investigate the possible biasing effects of differential electrostatic charge, particle size and their interaction on the sampling accuracy of standard aerosol measuring methodologies. Field studies were first conducted to determine the levels and variability of aerosol particle size and charge at two manufacturing facilities making acrylic powder. The field work showed that the particle mass median aerodynamic diameter (MMAD) varied by almost an order of magnitude (4–34 μm) while the aerosol surface charge was relatively stable (0.6–0.9 μC/m²). The second part of this work was a series of laboratory experiments in which aerosol charge and MMAD were manipulated in a 2^n factorial design with the percentage of sampling bias for various standard methodologies as the dependent variable. The experiments used the same friable acrylic powder studied in the field work plus two size populations of ground quartz as a nonfriable control. Despite some ill conditioning of the independent variables due to experimental difficulties, statistical analysis has shown aerosol charge (at levels comparable to those measured in workroom air) is capable of having a significant biasing effect. Physical models consistent with the sampling data indicate that the level and bipolarity of the aerosol charge are determining factors in the extent and direction of the bias.
Estimated spatial requirements of the medium- to large-sized ...
African Journals Online (AJOL)
Conservation planning in the Cape Floristic Region (CFR) of South Africa, a recognised world plant diversity hotspot, required information on the estimated spatial requirements of selected medium- to large-sized mammals within each of 102 Broad Habitat Units (BHUs) delineated according to key biophysical parameters.
Estimating population size of Saddle-billed Storks Ephippiorhynchus ...
African Journals Online (AJOL)
Counting Saddle-billed Storks in a study area the size of the Kruger National Park, at 2.2 million ha, is difficult because the birds are long-lived, sparse in the landscape and have large home ranges. Aerial surveys conducted to date provide an estimate with no measure of data dispersion, thence precision. The aim of this ...
Size matters: how accurate is clinical estimation of traumatic wound size?
Peterson, N; Stevenson, H; Sahni, V
2014-01-01
The presentation of traumatic wounds is commonplace in the accident & emergency department. Often, these wounds need referral to specialist care, e.g. trauma & orthopaedic, plastic or maxillofacial surgeons. Documentation and communication of the size of the wound can influence management, e.g. Gustilo & Anderson classification of open fractures. Several papers acknowledge the variability in measurement of chronic wounds, but there is no data regarding accuracy of traumatic wound assessment. The authors hypothesised that the estimation of wound size and subsequent communication or documentation was often inaccurate, with high inter-observer variability. A study was designed to assess this hypothesis. A total of 7 scaled images of wounds related to trauma were obtained from an Internet search engine. The questionnaire asked 3 questions regarding mechanism of injury, relevant anatomy and proposed treatment, to simulate real patient assessment. One further question addressed the estimation of wound size. 50 doctors of varying experience across several specialities were surveyed. The images were analysed after data collection had finished to provide appropriate measurements, and compared to the questionnaire results by a researcher blinded to the demographics of the individual. Our results show that there is a high inter-observer variability and inaccuracy in the estimation of wound size. This inaccuracy was directional and affected by gender. Male doctors were more likely to overestimate the size of wounds, whilst their female colleagues were more likely to underestimate size. The estimation of wound size is a common requirement of clinical practice, and inaccurate interpretation of size may influence surgical management. Assessment using estimation was inaccurate, with high inter-observer variability. Assessment of traumatic wounds that require surgical management should be accurately measured, possibly using photography and ruler measurement. Copyright © 2012
Sample Size for Tablet Compression and Capsule Filling Events During Process Validation.
Charoo, Naseem Ahmad; Durivage, Mark; Rahman, Ziyaur; Ayad, Mohamad Haitham
2017-12-01
During solid dosage form manufacturing, the uniformity of dosage units (UDU) is ensured by testing samples at 2 stages, that is, the blend stage and the tablet compression or capsule/powder filling stage. The aim of this work is to propose a sample size selection approach based on quality risk management principles for process performance qualification (PPQ) and continued process verification (CPV) stages by linking UDU to potential formulation and process risk factors. The Bayes success run theorem appeared to be the most appropriate approach among the various methods considered in this work for computing sample size for PPQ. The sample sizes for high-risk (reliability level of 99%), medium-risk (reliability level of 95%), and low-risk factors (reliability level of 90%) were estimated to be 299, 59, and 29, respectively. Risk-based assignment of reliability levels was supported by the fact that at a low defect rate, the confidence to detect out-of-specification units decreases, which must be compensated by an increase in sample size to enhance the confidence in estimation. Based on the level of knowledge acquired during PPQ and the level of knowledge further required to comprehend the process, the sample size for CPV was calculated using Bayesian statistics to accomplish a reduced sampling design for CPV. Copyright © 2017 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
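The sample sizes quoted above follow from the zero-failure (success-run) form of the theorem, n ≥ ln(1 − C)/ln(R) for reliability R demonstrated at confidence C; a sketch with C = 0.95 reproduces the 299/59/29 figures (the function name is invented for illustration):

```python
from math import ceil, log

def success_run_n(reliability, confidence=0.95):
    """Smallest zero-failure sample size n such that n conforming units
    demonstrate the given reliability at the given confidence, i.e.
    the smallest n with reliability**n <= 1 - confidence."""
    return ceil(log(1.0 - confidence) / log(reliability))

for label, r in [("high risk", 0.99), ("medium risk", 0.95), ("low risk", 0.90)]:
    print(label, success_run_n(r))  # 299, 59, 29 respectively
```

The higher the reliability to be demonstrated, the more consecutive conforming units are needed, which is why the high-risk factors carry the largest PPQ sample size.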
A Simulated Experiment for Sampling Soil Microarthropods to Reduce Sample Size
Tamura, Hiroshi
1987-01-01
An experiment was conducted to examine a possibility of reducing the necessary sample size in a quantitative survey on soil microarthropods, using soybeans instead of animals. An artificially provided, intensely aggregated distribution pattern of soybeans was easily transformed to the random pattern by stirring the substrate, which is soil in a large cardboard box. This enabled the necessary sample size to be greatly reduced without sacrificing the statistical reliability. A new practical met...
Estimating Aquatic Insect Populations. Introduction to Sampling.
Chihuahuan Desert Research Inst., Alpine, TX.
This booklet introduces high school and junior high school students to the major groups of aquatic insects and to population sampling techniques. Chapter 1 consists of a short field guide which can be used to identify five separate orders of aquatic insects: odonata (dragonflies and damselflies); ephemeroptera (mayflies); diptera (true flies);…
Optimum Sampling Times for Spectral Estimation.
1982-04-30
regression has been investigated by De La Garza [4,5], Hoel and Levine [6], and Kiefer and Wolfowitz [3]. The framework for their development is as ... Designs, Ann. Math. Statistics, Vol. 36, pp. 1627–1655, 1965. 4. De La Garza, A., "Sampling of Information in Polynomial Regression," Ann. Math ...
Estimating total suspended sediment yield with probability sampling
Robert B. Thomas
1985-01-01
The "Selection At List Time" (SALT) scheme controls sampling of concentration for estimating total suspended sediment yield. The probability of taking a sample is proportional to its estimated contribution to total suspended sediment discharge. This procedure gives unbiased estimates of total suspended sediment yield and the variance of the...
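Probability-proportional-to-size selection of this kind yields an unbiased total via the Hansen–Hurwitz estimator, which averages y_i/p_i over the draws. The sketch below illustrates the idea with invented discharge figures and fixed selection probabilities; real SALT selects samples in real time from estimated contributions:

```python
import random

def hansen_hurwitz(values, probs, n, rng):
    """Unbiased estimate of the population total under with-replacement
    sampling, where unit i is drawn with probability probs[i]:
    the mean of values[i] / probs[i] over the n draws."""
    draws = rng.choices(range(len(values)), weights=probs, k=n)
    return sum(values[i] / probs[i] for i in draws) / n

# hypothetical per-period sediment discharges and rough prior estimates
y = [2.0, 5.0, 9.0, 30.0, 80.0]    # true contributions (unknown in practice)
aux = [2.0, 6.0, 8.0, 28.0, 82.0]  # estimated contributions used for selection
p = [a / sum(aux) for a in aux]    # PPS-style selection probabilities
rng = random.Random(1)
reps = [hansen_hurwitz(y, p, 10, rng) for _ in range(2000)]
est = sum(reps) / len(reps)        # hovers near the true total, 126
```

Because sampling effort concentrates on periods that contribute most to the load, the estimator stays unbiased while its variance drops relative to equal-probability sampling.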
Population-Sample Regression in the Estimation of Population Proportions
Weitzman, R. A.
2006-01-01
Focusing on a single sample obtained randomly with replacement from a single population, this article examines the regression of population on sample proportions and develops an unbiased estimator of the square of the correlation between them. This estimator turns out to be the regression coefficient. Use of the squared-correlation estimator as a…
Performance of Random Effects Model Estimators under Complex Sampling Designs
Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan
2011-01-01
In this article, we consider estimation of parameters of random effects models from samples collected via complex multistage designs. Incorporation of sampling weights is one way to reduce estimation bias due to unequal probabilities of selection. Several weighting methods have been proposed in the literature for estimating the parameters of…
Species-genetic diversity correlations in habitat fragmentation can be biased by small sample sizes.
Nazareno, Alison G; Jump, Alistair S
2012-06-01
Predicted parallel impacts of habitat fragmentation on genes and species lie at the core of conservation biology, yet tests of this rule are rare. In a recent article in Ecology Letters, Struebig et al. (2011) report that declining genetic diversity accompanies declining species diversity in tropical forest fragments. However, this study estimates diversity in many populations through extrapolation from very small sample sizes. Using the data of this recent work, we show that results estimated from the smallest sample sizes drive the species-genetic diversity correlation (SGDC), owing to a false-positive association between habitat fragmentation and loss of genetic diversity. Small sample sizes are a persistent problem in habitat fragmentation studies, the results of which often do not fit simple theoretical models. It is essential, therefore, that data assessing the proposed SGDC are sufficient in order that conclusions be robust.
Effects of sample size on the second magnetization peak in ...
Indian Academy of Sciences (India)
Effects of sample size on the second magnetization peak (SMP) in Bi2Sr2CaCuO8+δ crystals are ... a termination of the measured transition line at Tl, typically 17–20 K (see figure 1). The obscuring and eventual disappearance of the SMP with decreasing temperatures has been ...
[Variance estimation considering multistage sampling design in multistage complex sample analysis].
Li, Yichong; Zhao, Yinjun; Wang, Limin; Zhang, Mei; Zhou, Maigeng
2016-03-01
Multistage sampling is a frequently-used method in random sampling surveys in public health. Clustering or dependence between observations often exists in samples generated by multistage sampling, often called complex samples. Sampling error may be underestimated and the probability of type I error may be increased if the multistage sample design is not taken into consideration in analysis. As the variance (error) estimator for a complex sample is often complicated, statistical software usually adopts the ultimate cluster variance estimate (UCVE) to approximate it, which simply assumes that the sample comes from one-stage sampling. However, with an increased sampling fraction of the primary sampling unit, the contribution from subsequent sampling stages is no longer trivial, and the ultimate cluster variance estimate may, therefore, lead to invalid variance estimation. This paper summarizes a method of variance estimation that accounts for the multistage sampling design. Its performance is compared with that of UCVE by simulating random sampling under different sampling schemes using real-world data. Simulation showed that as the primary sampling unit (PSU) sampling fraction increased, UCVE tended to generate increasingly biased estimates, whereas accurate estimates were obtained by using the method that accounts for the multistage sampling design.
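The one-stage approximation the abstract describes can be sketched for the simplest case, equal-probability sampling of n PSUs out of N, where the variance of an estimated total is computed from the PSU-level estimates alone and all later sampling stages are ignored (function name and data are illustrative, not from the paper):

```python
def ucve_total(psu_estimates, N):
    """Ultimate-cluster-style variance estimate for a population total:
    treat the n PSU-level total estimates as a one-stage simple random
    sample of PSUs, ignoring within-PSU (later-stage) variation."""
    n = len(psu_estimates)
    mean = sum(psu_estimates) / n
    s2 = sum((t - mean) ** 2 for t in psu_estimates) / (n - 1)
    total_hat = N * mean                      # expanded total estimate
    var_hat = N ** 2 * (1 - n / N) * s2 / n   # one-stage SRS variance
    return total_hat, var_hat

# hypothetical PSU totals from 4 sampled districts out of N = 20
t_hat, v_hat = ucve_total([120.0, 150.0, 90.0, 140.0], N=20)
```

With a small PSU sampling fraction n/N the ignored within-PSU contribution is negligible; as n/N grows it becomes non-trivial, which is the bias the abstract reports.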
Generalized and synthetic regression estimators for randomized branch sampling
David L. R. Affleck; Timothy G. Gregoire
2015-01-01
In felled-tree studies, ratio and regression estimators are commonly used to convert more readily measured branch characteristics to dry crown mass estimates. In some cases, data from multiple trees are pooled to form these estimates. This research evaluates the utility of both tactics in the estimation of crown biomass following randomized branch sampling (...
Uniformity trial size in estimates of plot size in restrict areas
Directory of Open Access Journals (Sweden)
Diogo Vanderlei Schwertner
Full Text Available The aim of this study was to determine the uniformity trial size when estimating optimum plot size in order to evaluate fresh phytomass in lettuce plants and fruit weight in sweet peppers. Production data, collected in uniformity trials on lettuce in a plastic greenhouse in both summer and winter, lettuce in plastic tunnels in autumn and winter, and sweet pepper in a plastic greenhouse in the summer-autumn and spring-summer seasons, were used to plan different uniformity trial sizes in crop rows. In all the experiments, each plant was evaluated individually and considered as a basic experimental unit. For each size in a uniformity trial, 3,000 resamples, randomly taken with replacement, were used to estimate optimum plot size. Uniformity trials using 27 basic experimental units to evaluate the fresh phytomass of lettuce plants, and with 29 basic experimental units to assess fruit weight in sweet pepper, are sufficient to estimate optimum plot size, with an amplitude of the 95% confidence interval of less than or equal to two basic experimental units.
Bergh, Daniel
2015-01-01
Chi-square statistics are commonly used for tests of fit of measurement models. Chi-square is also sensitive to sample size, which is why several approaches to handling large samples in test-of-fit analysis have been developed. One strategy to handle the sample size problem may be to adjust the sample size in the analysis of fit. An alternative is to adopt a random sample approach. The purpose of this study was to analyze and compare these two strategies using simulated data. Given an original sample size of 21,000, for reductions of sample size down to the order of 5,000 the adjusted sample size function works as well as the random sample approach. In contrast, when applying adjustments to sample sizes of a lower order, the adjustment function is less effective at approximating the chi-square value for an actual random sample of the relevant size. Hence, fit is exaggerated and misfit under-estimated using the adjusted sample size function. Although there are big differences in chi-square values between the two approaches at lower sample sizes, the inferences based on the p-values may be the same.
Estimating the size of the leprosy problem: the Bangladesh experience.
Richardus, J H; Croft, R P
1995-06-01
Assessing the size of the leprosy problem in a country is an important but difficult issue for the purpose of programme planning. Different methods have been proposed, but estimates have often proved to be very different from reality. We have attempted to address this issue in Bangladesh, a country where official estimates are more than 5 times greater than the registered number of leprosy cases. A combination of methods, including surveys, data from leprosy control programmes and local knowledge elicited with the Delphi technique, was used to construct an estimate of the total number of cases in Bangladesh. This figure (173,196) is only 10% greater than the official estimate (136,000). It will be possible over the next few years to see how close this figure is to reality through data obtained from the National Leprosy Control Programme, which is now rapidly developing to cover the whole country.
Sample size calculations for clinical trials targeting tauopathies: A new potential disease target
Whitwell, Jennifer L.; Duffy, Joseph R.; Strand, Edythe A.; Machulda, Mary M.; Tosakulwong, Nirubol; Weigand, Stephen D.; Senjem, Matthew L.; Spychalla, Anthony J.; Gunter, Jeffrey L.; Petersen, Ronald C.; Jack, Clifford R.; Josephs, Keith A.
2015-01-01
Disease-modifying therapies are being developed to target tau pathology, and should, therefore, be tested in primary tauopathies. We propose that progressive apraxia of speech should be considered one such target group. In this study, we investigate potential neuroimaging and clinical outcome measures for progressive apraxia of speech and determine sample size estimates for clinical trials. We prospectively recruited 24 patients with progressive apraxia of speech who underwent two serial MRI with an interval of approximately two years. Detailed speech and language assessments included the Apraxia of Speech Rating Scale (ASRS) and Motor Speech Disorders (MSD) severity scale. Rates of ventricular expansion and rates of whole brain, striatal and midbrain atrophy were calculated. Atrophy rates across 38 cortical regions were also calculated and the regions that best differentiated patients from controls were selected. Sample size estimates required to power placebo-controlled treatment trials were calculated. The smallest sample size estimates were obtained with rates of atrophy of the precentral gyrus and supplementary motor area, with both measures requiring less than 50 subjects per arm to detect a 25% treatment effect with 80% power. These measures outperformed the other regional and global MRI measures and the clinical scales. Regional rates of cortical atrophy therefore provide the best outcome measures in progressive apraxia of speech. The small sample size estimates demonstrate feasibility for including progressive apraxia of speech in future clinical treatment trials targeting tau. PMID:26076744
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Estimation of Nanoparticle Size Distributions by Image Analysis
DEFF Research Database (Denmark)
Fisker, Rune; Carstensen, Jens Michael; Hansen, Mikkel Fougt
2000-01-01
Knowledge of the nanoparticle size distribution is important for the interpretation of experimental results in many studies of nanoparticle properties. An automated method is needed for accurate and robust estimation of particle size distribution from nanoparticle images with thousands of particles. In this paper, we present an automated image analysis technique based on a deformable ellipse model that can perform this task. Results of using this technique are shown for both nearly spherical particles and more irregularly shaped particles. The technique proves to be a very useful tool for nanoparticle ...
Mesh Size Effects on Fracture Toughness Estimation by Damage Model
Energy Technology Data Exchange (ETDEWEB)
Choi, Shin Beom; Chang, Yoon Suk; Kim, Young Jin [School of Mechanical Engineering, Sungkyunkwan Univ., Suwon (Korea, Republic of); Kim, Min Chul; Lee, Bong Sang [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)
2009-05-15
The objective of this paper is to investigate mesh size effects on fracture toughness of SA508 carbon steel by a damage model. To achieve this goal, a series of finite element analyses are carried out for CT (compact tension) and PCVN (pre-cracked V-notch) specimens, and a Weibull stress model is adopted to derive the toughness scale diagram. Finally, the toughness scale diagram, which accounts for crack-tip mesh size effects, is derived by comparing estimated fracture toughness data between CT and PCVN specimens at −60°C and −80°C.
Sample size reduction in groundwater surveys via sparse data assimilation
Hussain, Z.
2013-04-01
In this paper, we focus on sparse signal recovery methods for data assimilation in groundwater models. The objective of this work is to exploit the commonly understood spatial sparsity in hydrodynamic models and thereby reduce the number of measurements needed to image a dynamic groundwater profile. To achieve this we employ a Bayesian compressive sensing framework that lets us adaptively select the next measurement to reduce the estimation error. An extension to the Bayesian compressive sensing framework is also proposed which incorporates additional model information to estimate system states from even fewer measurements. Instead of using cumulative imaging-like measurements, such as those used in standard compressive sensing, we use sparse binary matrices. This choice of measurements can be interpreted as randomly sampling only a small subset of dug wells at each time step, instead of sampling the entire grid. Therefore, this framework offers groundwater surveyors a significant reduction in surveying effort without compromising the quality of the survey. © 2013 IEEE.
Estimation of pore size distribution using concentric double pulsed-field gradient NMR.
Benjamini, Dan; Nevo, Uri
2013-05-01
Estimation of pore size distribution of well calibrated phantoms using NMR is demonstrated here for the first time. Porous materials are a central constituent in fields as diverse as biology, geology, and oil drilling. Noninvasive characterization of monodisperse porous samples using conventional pulsed-field gradient (PFG) NMR is a well-established method. However, estimation of pore size distribution of heterogeneous polydisperse systems, which comprise most of the materials found in nature, remains extremely challenging. Concentric double pulsed-field gradient (CDPFG) is a 2-D technique where both q (the amplitude of the diffusion gradient) and φ (the relative angle between the gradient pairs) are varied. A recent prediction indicates this method should produce a more accurate and robust estimation of pore size distribution than its conventional 1-D versions. Five well defined size distribution phantoms, consisting of 1-5 different pore sizes in the range of 5-25 μm were used. The estimated pore size distributions were all in good agreement with the known theoretical size distributions, and were obtained without any a priori assumption on the size distribution model. These findings support that in addition to its theoretical benefits, the CDPFG method is experimentally reliable. Furthermore, by adding the angle parameter, sensitivity to small compartment sizes is increased without the use of strong gradients, thus making CDPFG safe for biological applications. Copyright © 2013 Elsevier Inc. All rights reserved.
Sample size of the reference sample in a case-augmented study.
Ghosh, Palash; Dewanji, Anup
2017-05-01
The case-augmented study, in which a case sample is augmented with a reference (random) sample from the source population with only covariate information known, is becoming popular in different areas of applied science such as pharmacovigilance, ecology, and econometrics. In general, the case sample is available from some source (for example, a hospital database, case registry, etc.); however, the reference sample is required to be drawn from the corresponding source population. The required minimum size of the reference sample is an important issue in this regard. In this work, we address the minimum sample size calculation and discuss related issues. Copyright © 2017 John Wiley & Sons, Ltd.
Application of cokriging techniques for the estimation of hail size
Farnell, Carme; Rigo, Tomeu; Martin-Vide, Javier
2018-01-01
There are primarily two ways of estimating hail size: the first is the direct interpolation of point observations, and the second is the transformation of remote sensing fields into measurements of hail properties. Both techniques have advantages and limitations as regards generating the resultant map of hail damage. This paper presents a new methodology that combines the above mentioned techniques in an attempt to minimise the limitations and take advantage of the benefits of interpolation and the use of remote sensing data. The methodology was tested for several episodes with good results being obtained for the estimation of hail size at practically all the points analysed. The study area presents a large database of hail episodes, and for this reason, it constitutes an optimal test bench.
Rashidian, Arash; Miles, Jeremy; Russell, Daphne; Russell, Ian
2006-11-01
Interest has been growing in the use of the theory of planned behaviour (TPB) in health services research. Sample sizes range from fewer than 50 to more than 750 in published TPB studies without sample size calculations. We estimate the sample size for a multi-stage random survey of prescribing intention and actual prescribing for asthma in British general practice. To our knowledge, this is the first systematic attempt to determine sample size for a TPB survey. We use two different approaches: reported values of regression models' goodness-of-fit (the lambda method) and zero-order correlations (the variance inflation factor or VIF method). The intra-cluster correlation coefficient (ICC) is estimated and a socioeconomic variable is used for stratification. We perform sensitivity analysis to estimate the effects of our decisions on final sample size. The VIF method is more sensitive to the requirements of a TPB study. Given a correlation of .25 between intention and behaviour, and of .4 between intention and perceived behavioural control, the proposed sample size is 148. We estimate the ICC for asthma prescribing to be around 0.07. If 10 general practitioners were sampled per cluster, the sample size would be 242. It is feasible to perform sophisticated sample size calculations for a TPB study. The VIF is the appropriate method. Our approach can be used, with adjustments, in other settings and for other regression models.
Estimation of T-cell repertoire diversity and clonal size distribution by Poisson abundance models.
Sepúlveda, Nuno; Paulino, Carlos Daniel; Carneiro, Jorge
2010-02-28
The answer to many fundamental questions in Immunology requires the quantitative characterization of the T-cell repertoire, namely T cell receptor (TCR) diversity and clonal size distribution. An increasing number of repertoire studies are based on sequencing of the TCR variable regions in T-cell samples from which one tries to estimate the diversity of the original T-cell populations. Hitherto, estimation of TCR diversity was tackled either by a "standard" method that assumes a homogeneous clonal size distribution, or by non-parametric methods, such as the abundance-coverage and incidence-coverage estimators. However, both methods show caveats. On the one hand, the samples exhibit clonal size distributions with heavy right tails, a feature that is incompatible with the assumption of an equal frequency of every TCR sequence in the repertoire. Thus, this "standard" method produces inaccurate estimates. On the other hand, non-parametric estimators are robust in a wide range of situations, but per se provide no information about the clonal size distribution. This paper redeploys Poisson abundance models from Ecology to overcome the limitations of the above inferential procedures. These models assume that each TCR variant is sampled according to a Poisson distribution with a specific sampling rate, itself varying according to some Exponential, Gamma, or Lognormal distribution, or still an appropriate mixture of Exponential distributions. With these models, one can estimate the clonal size distribution in addition to TCR diversity of the repertoire. A procedure is suggested to evaluate robustness of diversity estimates with respect to the most abundant sampled TCR sequences. For illustrative purposes, previously published data on mice with limited TCR diversity are analyzed. Two of the presented models are more consistent with the data and give the most robust TCR diversity estimates. They suggest that clonal sizes follow either a Lognormal or an appropriate mixture of
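For comparison, the nonparametric abundance-based route mentioned above can be as simple as the Chao1 lower-bound richness estimator, which uses only the singleton and doubleton counts; the sketch below uses invented toy clone sizes:

```python
from collections import Counter

def chao1(clone_counts):
    """Chao1 lower-bound estimate of the number of distinct clones:
    S_obs + f1^2 / (2 * f2), where f1 and f2 are the numbers of clones
    observed exactly once and exactly twice in the sample."""
    size_freq = Counter(clone_counts)
    s_obs = len(clone_counts)
    f1, f2 = size_freq.get(1, 0), size_freq.get(2, 0)
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2  # bias-corrected variant
    return s_obs + f1 ** 2 / (2 * f2)

# hypothetical sample: sizes of 8 distinct TCR clones
print(chao1([1, 1, 1, 2, 2, 4, 7, 30]))  # 10.25
```

As the abstract notes, such estimators are robust but say nothing about the clonal size distribution itself, which is precisely what the Poisson abundance models add.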
Estimation of LOCA break size using cascaded Fuzzy neural networks
Energy Technology Data Exchange (ETDEWEB)
Choi, Geon Pil; Yoo, Kwae Hwan; Back, Ju Hyun; Na, Man Gyun [Dept. of Nuclear Engineering, Chosun University, Gwangju (Korea, Republic of)
2017-04-15
Operators of nuclear power plants may not be equipped with sufficient information during a loss-of-coolant accident (LOCA), which can be fatal, or they may not have sufficient time to analyze the information they do have, even if this information is adequate. It is not easy to predict the progression of LOCAs in nuclear power plants. Therefore, accurate information on the LOCA break position and size should be provided to efficiently manage the accident. In this paper, the LOCA break size is predicted using a cascaded fuzzy neural network (CFNN) model. The input data of the CFNN model are the time-integrated values of each measurement signal for an initial short-time interval after a reactor scram. The training of the CFNN model is accomplished by a hybrid method combined with a genetic algorithm and a least squares method. As a result, the LOCA break size is estimated accurately by the proposed CFNN model.
Estimating a distribution function of the tumor size at metastasis.
Xu, J L; Prorok, P C
1998-09-01
In studying the relationship between the size of primary cancers and the occurrence of metastases, two quantities are of prime importance. The first is the distribution of tumor size at the point of metastatic transition, while the second is the probability that detectable metastases are present when cancer comes to medical attention. Kimmel and Flehinger (1991, Biometrics 47, 987-1004) developed a general nonparametric model and studied its two limiting cases. Because of the unidentifiability of their general model, a new identifiable model is introduced by making the hazard function for detecting a metastatic cancer a constant. The new model includes Kimmel and Flehinger's (1991) second limiting model as a special case. An estimator of the tumor size distribution at metastasis is proposed. The result is applied to a set of colorectal cancer data.
Complex sampling designs for the Customer Satisfaction Index estimation
Directory of Open Access Journals (Sweden)
Tonio Di Battista
2013-05-01
Full Text Available In this paper we focus on sampling designs best suited to meeting the needs of Customer Satisfaction (CS) assessment, with particular attention paid to adaptive sampling, which may be useful. Complex sampling designs are illustrated in order to build CS indices that may be used for inference purposes. When the phenomenon of satisfaction is rare, adaptive designs can produce gains in efficiency, relative to conventional designs, for estimating the population parameters. For such sampling designs, nonlinear estimators may be used to estimate customer satisfaction indices; these estimators are generally biased, and their variance may not be obtained in a closed-form solution. Delta, jackknife, and bootstrap procedures are introduced in order to reduce bias and estimate variance. The paper ends with a simulation study to estimate the variance of the proposed estimator.
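The jackknife bias-reduction and bootstrap variance-estimation steps can be illustrated on a toy nonlinear estimator. The sketch below uses a simple ratio-type index on simulated data as a stand-in for the CS indices discussed in the paper; all values and the index itself are illustrative assumptions.

```python
import random

random.seed(7)

# Simulated "survey": satisfaction scores y and weights x for 200 customers;
# the index is the nonlinear ratio sum(y)/sum(x) (all values are invented).
n = 200
x = [random.uniform(1.0, 5.0) for _ in range(n)]
y = [0.8 * xi + random.uniform(-0.5, 0.5) for xi in x]

def ratio(xs, ys):
    return sum(ys) / sum(xs)

theta_hat = ratio(x, y)

# Jackknife bias reduction: n*theta_hat - (n - 1) * mean(leave-one-out).
loo = [ratio(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
theta_jack = n * theta_hat - (n - 1) * sum(loo) / n

# Bootstrap variance: resample units with replacement, since no
# closed-form variance is available for such nonlinear estimators.
boot = []
for _ in range(500):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(ratio([x[i] for i in idx], [y[i] for i in idx]))
mean_b = sum(boot) / len(boot)
var_boot = sum((b - mean_b) ** 2 for b in boot) / (len(boot) - 1)

print(round(theta_hat, 4), round(theta_jack, 4), round(var_boot, 8))
```

For a ratio the jackknife correction is tiny at this sample size, but the same recipe applies unchanged to more strongly biased nonlinear indices.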
The effects of fresh and rapid desiccated tissue on estimates of Ophiopogoneae genome size
Directory of Open Access Journals (Sweden)
Guangyan Wang
2016-08-01
Full Text Available Fresh plant material is usually used for genome size estimation by flow cytometry (FCM). Lack of fresh material is cited as one of the main reasons for the dearth of studies on plants from remote locations. Genome sizes in fresh versus desiccated tissue of 16 Ophiopogoneae species and five model plant species were estimated. Our results indicated that desiccated tissue was suitable for genome size estimation; this method enables broader geographic sampling of plants when fresh tissue collection is not feasible. To be useful after desiccation, the Ophiopogoneae sample should be green, without brown or yellow markings; it should be stored in a deep freezer at −80 °C, and the storage time should be no more than 6 months.
Reliability of fish size estimates obtained from multibeam imaging sonar
Hightower, Joseph E.; Magowan, Kevin J.; Brown, Lori M.; Fox, Dewayne A.
2013-01-01
Multibeam imaging sonars have considerable potential for use in fisheries surveys because the video-like images are easy to interpret, and they contain information about fish size, shape, and swimming behavior, as well as characteristics of occupied habitats. We examined images obtained using a dual-frequency identification sonar (DIDSON) multibeam sonar for Atlantic sturgeon Acipenser oxyrinchus oxyrinchus, striped bass Morone saxatilis, white perch M. americana, and channel catfish Ictalurus punctatus of known size (20–141 cm) to determine the reliability of length estimates. For ranges up to 11 m, percent measurement error (sonar estimate – total length)/total length × 100 varied by species but was not related to the fish's range or aspect angle (orientation relative to the sonar beam). Least-square mean percent error was significantly different from 0.0 for Atlantic sturgeon (x̄ = −8.34, SE = 2.39) and white perch (x̄ = 14.48, SE = 3.99) but not striped bass (x̄ = 3.71, SE = 2.58) or channel catfish (x̄ = 3.97, SE = 5.16). Underestimating lengths of Atlantic sturgeon may be due to difficulty in detecting the snout or the longer dorsal lobe of the heterocercal tail. White perch was the smallest species tested, and it had the largest percent measurement errors (both positive and negative) and the lowest percentage of images classified as good or acceptable. Automated length estimates for the four species using Echoview software varied with position in the view-field. Estimates tended to be low at more extreme azimuthal angles (fish's angle off-axis within the view-field), but mean and maximum estimates were highly correlated with total length. Software estimates also were biased by fish images partially outside the view-field and when acoustic crosstalk occurred (when a fish perpendicular to the sonar and at relatively close range is detected in the side lobes of adjacent beams). These sources of
Dong, H.; Zhang, H.; Zuo, Y.; Gao, P.; Ye, G.
2018-01-01
Mercury intrusion porosimetry (MIP) measurements are widely used to determine pore throat size distribution (PSD) curves of porous materials. The pore throat size of porous materials has been used to estimate their compressive strength and air permeability. However, the effect of sample size on
Gluttonous predators: how to estimate prey size when there are too many prey
Directory of Open Access Journals (Sweden)
MS. Araújo
Full Text Available Prey size is an important factor in food consumption. In studies of feeding ecology, prey items are usually measured individually using calipers or ocular micrometers. Among amphibians and reptiles, there are species that feed on large numbers of small prey items (e.g., ants and termites). This high intake makes it difficult to estimate prey size consumed by these animals. We addressed this problem by developing and evaluating a procedure for subsampling the stomach contents of such predators in order to estimate prey size. Specifically, we developed a protocol based on a bootstrap procedure to obtain a subsample with a precision error of at most 5%, with a confidence level of at least 95%. This guideline should reduce the sampling effort and facilitate future studies on the feeding habits of amphibians and reptiles, and also provide a means of obtaining precise estimates of prey size.
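The subsampling idea can be sketched as a search: start from a small subsample size and increase it until the bootstrap estimate stays within 5% of the full-count value in at least 95% of replicates. The prey-length distribution, step size, and replicate count below are illustrative assumptions, not the paper's protocol.

```python
import random

random.seed(3)

# A simulated stomach with 2,000 small prey; lengths (mm) are invented.
prey = [abs(random.gauss(4.0, 1.0)) for _ in range(2000)]
full_mean = sum(prey) / len(prey)

def precise_enough(m, reps=400, error=0.05, confidence=0.95):
    """Is a subsample of size m within `error` of the full mean in at
    least `confidence` of bootstrap replicates?"""
    hits = 0
    for _ in range(reps):
        sub = [random.choice(prey) for _ in range(m)]
        if abs(sum(sub) / m - full_mean) / full_mean <= error:
            hits += 1
    return hits / reps >= confidence

# Increase the subsample size until the precision target is met.
m = 10
while not precise_enough(m):
    m += 10
print("subsample size:", m)
```

With a coefficient of variation of about 0.25 the search settles near m = 100, consistent with the rough normal-theory bound (1.96/0.2)^2 ≈ 96.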
Monte Carlo approaches for determining power and sample size in low-prevalence applications.
Williams, Michael S; Ebel, Eric D; Wagner, Bruce A
2007-11-15
The prevalence of disease in many populations is often low. For example, the prevalence of tuberculosis, brucellosis, and bovine spongiform encephalopathy range from 1 per 100,000 to less than 1 per 1,000,000 in many countries. When an outbreak occurs, epidemiological investigations often require comparing the prevalence in an exposed population with that of an unexposed population. To determine if the level of disease in the two populations is significantly different, the epidemiologist must consider the test to be used, desired power of the test, and determine the appropriate sample size for both the exposed and unexposed populations. Commonly available software packages provide estimates of the required sample sizes for this application. This study shows that these estimated sample sizes can exceed the necessary number of samples by more than 35% when the prevalence is low. We provide a Monte Carlo-based solution and show that in low-prevalence applications this approach can lead to reductions in the total sample size of more than 10,000 samples.
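The Monte Carlo approach can be sketched directly: simulate the two groups many times at a candidate sample size and count how often the test rejects. Because the events are rare, group counts are approximated here by Poisson draws; the prevalences, sample size, and test choice below are illustrative assumptions, not the paper's settings.

```python
import math
import random

random.seed(11)

# Illustrative prevalences: 1 vs 5 per 10,000 (not from the paper).
p0, p1 = 0.0001, 0.0005

def poisson(lam):
    # Knuth's method; fine for the small expected counts here
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def power(n, reps=2000):
    """Monte Carlo power of a two-sided two-proportion z-test, n per group.
    Rare-event counts are approximated by Poisson(n * p) draws."""
    rejections = 0
    for _ in range(reps):
        x0, x1 = poisson(n * p0), poisson(n * p1)
        pool = (x0 + x1) / (2 * n)
        se = math.sqrt(2 * pool * (1 - pool) / n)
        if se > 0 and abs(x1 - x0) / n / se > 1.96:
            rejections += 1
    return rejections / reps

pw = power(30000)
print("estimated power at n = 30,000 per group:", round(pw, 2))
```

Sweeping n over a grid and taking the smallest value whose simulated power clears the target (e.g. 0.80) gives the sample size directly, without the large-sample approximations that inflate conventional formulas at low prevalence.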
Lawson, Chris A.; Fisher, Anna V.
2011-01-01
Developmental studies have provided mixed evidence with regard to the question of whether children consider sample size and sample diversity in their inductive generalizations. Results from four experiments with 105 undergraduates, 105 school-age children (M = 7.2 years), and 105 preschoolers (M = 4.9 years) showed that preschoolers made a higher…
Network Structure and Biased Variance Estimation in Respondent Driven Sampling.
Verdery, Ashton M; Mouw, Ted; Bauldry, Shawn; Mucha, Peter J
2015-01-01
This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network.
Body mass estimates of hominin fossils and the evolution of human body size.
Grabowski, Mark; Hatala, Kevin G; Jungers, William L; Richmond, Brian G
2015-08-01
Body size directly influences an animal's place in the natural world, including its energy requirements, home range size, relative brain size, locomotion, diet, life history, and behavior. Thus, an understanding of the biology of extinct organisms, including species in our own lineage, requires accurate estimates of body size. Since the last major review of hominin body size based on postcranial morphology over 20 years ago, new fossils have been discovered, species attributions have been clarified, and methods improved. Here, we present the most comprehensive and thoroughly vetted set of individual fossil hominin body mass predictions to date, and estimation equations based on a large (n = 220) sample of modern humans of known body masses. We also present species averages based exclusively on fossils with reliable taxonomic attributions, estimates of species averages by sex, and a metric for levels of sexual dimorphism. Finally, we identify individual traits that appear to be the most reliable for mass estimation for each fossil species, for use when only one measurement is available for a fossil. Our results show that many early hominins were generally smaller-bodied than previously thought, an outcome likely due to larger estimates in previous studies resulting from the use of large-bodied modern human reference samples. Current evidence indicates that modern human-like large size first appeared by at least 3-3.5 Ma in some Australopithecus afarensis individuals. Our results challenge an evolutionary model arguing that body size increased from Australopithecus to early Homo. Instead, we show that there is no reliable evidence that the body size of non-erectus early Homo differed from that of australopiths, and confirm that Homo erectus evolved larger average body size than earlier hominins. Copyright © 2015 Elsevier Ltd. All rights reserved.
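The calibration step in such studies can be sketched as an ordinary least-squares fit on a reference sample of known body masses, then applied to a fossil measurement. All numbers below are invented for illustration; they are not the paper's reference data or estimation equations.

```python
# Reference sample: one postcranial measurement (e.g. femoral head breadth,
# mm) and known body mass (kg) for modern individuals. Values are invented.
ref_measure = [38.0, 40.5, 42.0, 44.5, 46.0, 48.5, 50.0, 52.5]
ref_mass = [48.0, 52.0, 55.5, 60.0, 63.5, 68.0, 71.0, 76.0]

n = len(ref_measure)
mx = sum(ref_measure) / n
my = sum(ref_mass) / n

# Ordinary least-squares slope and intercept of mass on the measurement.
slope = sum((x - mx) * (y - my) for x, y in zip(ref_measure, ref_mass)) / sum(
    (x - mx) ** 2 for x in ref_measure
)
intercept = my - slope * mx

def predict_mass(measure_mm):
    """Predicted body mass (kg) for a fossil's measurement."""
    return intercept + slope * measure_mm

print(round(predict_mass(41.0), 1))
```

The paper's point about reference samples shows up here directly: refit the same line on a larger-bodied reference sample and the predicted fossil masses shift upward, which is the bias the authors attribute to earlier studies.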
Spatial Sampling Design for Estimating Regional GPP With Spatial Heterogeneities
Wang, J.H.; Ge, Y.; Heuvelink, G.B.M.; Zhou, C.H.
2014-01-01
The estimation of regional gross primary production (GPP) is a crucial issue in carbon cycle studies. One commonly used way to estimate the characteristics of GPP is to infer the total amount of GPP by collecting field samples. In this process, the spatial sampling design will affect the error
Mars Rover/Sample Return - Phase A cost estimation
Stancati, Michael L.; Spadoni, Daniel J.
1990-01-01
This paper presents a preliminary cost estimate for the design and development of the Mars Rover/Sample Return (MRSR) mission. The estimate was generated using a modeling tool specifically built to provide useful cost estimates from design parameters of the type and fidelity usually available during early phases of mission design. The model approach and its application to MRSR are described.
Size Matters: FTIR Spectral Analysis of Apollo Regolith Samples Exhibits Grain Size Dependence.
Martin, Dayl; Joy, Katherine; Pernet-Fisher, John; Wogelius, Roy; Morlok, Andreas; Hiesinger, Harald
2017-04-01
The Mercury Thermal Infrared Spectrometer (MERTIS) on the upcoming BepiColombo mission is designed to analyse the surface of Mercury in thermal infrared wavelengths (7-14 μm) to investigate the physical properties of the surface materials [1]. Laboratory analyses of analogue materials are useful for investigating how various sample properties alter the resulting infrared spectrum. Laboratory FTIR analysis of Apollo fine (exposure to space weathering processes), and proportion of glassy material affect their average infrared spectra. Each of these samples was analysed as a bulk sample and five size fractions: 60%) causes a 'flattening' of the spectrum, with reduced reflectance in the Reststrahlen Band region (RB) as much as 30% in comparison to samples that are dominated by a high proportion of crystalline material. Apollo 15401,147 is an immature regolith with a high proportion of volcanic glass pyroclastic beads [2]. The high mafic mineral content results in a systematic shift in the Christiansen Feature (CF - the point of lowest reflectance) to longer wavelength: 8.6 μm. The glass beads dominate the spectrum, displaying a broad peak around the main Si-O stretch band (at 10.8 μm). As such, individual mineral components of this sample cannot be resolved from the average spectrum alone. Apollo 67481,96 is a sub-mature regolith composed dominantly of anorthite plagioclase [2]. The CF position of the average spectrum is shifted to shorter wavelengths (8.2 μm) due to the higher proportion of felsic minerals. Its average spectrum is dominated by anorthite reflectance bands at 8.7, 9.1, 9.8, and 10.8 μm. The average reflectance is greater than the other samples due to a lower proportion of glassy material. In each soil, the smallest fractions (0-25 and 25-63 μm) have CF positions 0.1-0.4 μm higher than the larger grain sizes. Also, the bulk-sample spectra mostly closely resemble the 0-25 μm sieved size fraction spectrum, indicating that this size fraction of each
Spatially-explicit estimation of Wright's neighborhood size in continuous populations
Andrew J. Shirk; Samuel A. Cushman
2014-01-01
Effective population size (Ne) is an important parameter in conservation genetics because it quantifies a population's capacity to resist loss of genetic diversity due to inbreeding and drift. The classical approach to estimate Ne from genetic data involves grouping sampled individuals into discretely defined subpopulations assumed to be panmictic. Importantly,...
Effect of sieve mesh size on the estimation of benthic invertebrate ...
African Journals Online (AJOL)
Characterisation of benthic invertebrate communities, taxonomic abundance and composition provides information that is used during river bioassessment. However, the mesh size of the sieves used during processing of invertebrate samples may affect the estimation of taxonomic abundance and composition. In the current ...
Sample size requirement in analytical studies for similarity assessment.
Chow, Shein-Chung; Song, Fuyu; Bai, He
2017-01-01
For the assessment of biosimilar products, the FDA recommends a stepwise approach for obtaining the totality-of-the-evidence for assessing biosimilarity between a proposed biosimilar product and its corresponding innovative biologic product. The stepwise approach starts with analytical studies for assessing similarity in critical quality attributes (CQAs), which are relevant to clinical outcomes at various stages of the manufacturing process. For CQAs that are the most relevant to clinical outcomes, the FDA requires an equivalence test be performed for similarity assessment based on an equivalence acceptance criterion (EAC) that is obtained using a single test value of some selected reference lots. In practice, we often have extremely imbalanced numbers of reference and test lots available for the establishment of EAC. In this case, to assist the sponsors, the FDA proposed an idea for determining the number of reference lots and the number of test lots required in order not to have imbalanced sample sizes when establishing EAC for the equivalence test based on extensive simulation studies. Along this line, this article not only provides statistical justification of Dong, Tsong, and Weng's proposal, but also proposes an alternative method for sample size requirement for the Tier 1 equivalence test.
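A simplified sketch of such a Tier 1 equivalence test follows, assuming the acceptance margin is set at 1.5 times the reference-lot standard deviation and using an approximate 90% confidence interval on the mean difference. The lot values, lot counts, and the fixed critical value are illustrative assumptions, not the article's procedure.

```python
import random
from statistics import mean, stdev

random.seed(5)

# Simulated lot release values for one CQA (units arbitrary, data invented).
ref_lots = [random.gauss(100.0, 2.0) for _ in range(10)]
test_lots = [random.gauss(100.5, 2.0) for _ in range(6)]

# EAC taken as +/- 1.5 * sigma_R, with sigma_R estimated from reference lots.
sigma_r = stdev(ref_lots)
margin = 1.5 * sigma_r

# Equivalence is concluded if the ~90% CI for the mean difference lies
# entirely inside (-margin, +margin); 1.81 approximates the t critical
# value for the small degrees of freedom here (a simplification).
diff = mean(test_lots) - mean(ref_lots)
se = (stdev(ref_lots) ** 2 / len(ref_lots)
      + stdev(test_lots) ** 2 / len(test_lots)) ** 0.5
lo, hi = diff - 1.81 * se, diff + 1.81 * se
equivalent = (-margin < lo) and (hi < margin)
print(round(diff, 2), round(margin, 2), equivalent)
```

The imbalance issue the article addresses is visible in `se`: with only a handful of test lots the interval widens, so even a truly similar product can fail to fit inside the margin.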
Neurocognitive aspects of body size estimation - A study of contemporary dancers
Directory of Open Access Journals (Sweden)
André Bizerra
Full Text Available Abstract Dancers use multiple forms of body language when performing their functions in the contemporary dance scene. Some neurocognitive aspects are involved in dance, and we highlight the aspect of body image, in particular the dimensional aspect of body perception. The aim of this study is to analyze the perceptual aspect of body image (body size estimation) and its possible association with the motor aspect (dynamic balance) involved in the practice of dance, comparing contemporary dancers with physically active and inactive individuals. The sample consisted of 48 subjects divided into four groups: 1) Professional Group (PG); 2) Dance Student Group (SG); 3) Physically Active Group (AG); and 4) Physically Inactive Group (IG). Two tests were used: the Image Marking Procedure (body size estimation) and the Star Excursion Balance Test (dynamic balance). It was observed that dancing and exercising contribute to a proper body size estimation but cannot be considered the only determining factors. Although dancers showed higher ability in the motor test (dynamic balance), no direct relation to the perception of body size was observed, leading us to conclude that it is a task-dependent skill acquired by repetition and training. In this study, we found a statistically significant association between educational level and body size estimation. The study opens new horizons in relation to the understanding of factors involved in the construction of the body size estimation.
Improving Sample Estimate Reliability and Validity with Linked Ego Networks
Lu, Xin
2012-01-01
Respondent-driven sampling (RDS) is currently widely used in public health, especially for the study of hard-to-access populations such as injecting drug users and men who have sex with men. The method works like a snowball sample but can, given that some assumptions are met, generate unbiased population estimates. However, recent studies have shown that traditional RDS estimators are likely to generate large variance and estimation error. To improve the performance of traditional estimators, we propose a method to generate estimates with ego network data collected by RDS. By simulating RDS processes on an empirical human social network with known population characteristics, we have shown that the precision of estimates on the composition of network link types is greatly improved with ego network data. The proposed estimator for population characteristics shows a clear advantage over traditional RDS estimators, and most importantly, the new method exhibits strong robustness to the recruitment preference of respondents.
Inverse Gaussian model for small area estimation via Gibbs sampling
African Journals Online (AJOL)
These small domains need not be geographical locations, but can represent distinct subdomains defined by several stratification factors. Sample survey data are … a stratified random sample design is used such that each cell defines a stratum from which a random sample of size n_ij is drawn. Following the terminology of a …
Estimating population size of Pygoscelid Penguins from TM data
Olson, Charles E., Jr.; Schwaller, Mathew R.; Dahmer, Paul A.
1987-01-01
A step was made toward a continent-wide population estimate of penguins. The results indicate that Thematic Mapper data can be used to identify penguin rookeries due to the unique reflectance properties of guano. Strong correlations exist between nesting populations and rookery area occupied by the birds. These correlations allow estimation of the number of nesting pairs in colonies. The success of remote sensing and biometric analyses leads one to believe that a continent-wide estimate of penguin populations is possible based on a timely sample employing ground-based and remote sensing techniques. Satellite remote sensing along the coastline may well locate previously undiscovered penguin nesting sites, or locate rookeries which have been assumed to exist for over a half century, but never located. Observations which found that penguins are one of the most sensitive elements in the complex of Southern Ocean ecosystems motivated this study.
Low-sampling-rate ultra-wideband channel estimation using equivalent-time sampling
Ballal, Tarig
2014-09-01
In this paper, a low-sampling-rate scheme for ultra-wideband channel estimation is proposed. The scheme exploits multiple observations generated by transmitting multiple pulses. In the proposed scheme, P pulses are transmitted to produce channel impulse response estimates at a desired sampling rate, while the ADC samples at a rate that is P times slower. To avoid loss of fidelity, the number of sampling periods (based on the desired rate) in the inter-pulse interval is restricted to be co-prime with P. This condition is affected when clock drift is present and the transmitted pulse locations change. To handle this case, and to achieve an overall good channel estimation performance, without using prior information, we derive an improved estimator based on the bounded data uncertainty (BDU) model. It is shown that this estimator is related to the Bayesian linear minimum mean squared error (LMMSE) estimator. Channel estimation performance of the proposed sub-sampling scheme combined with the new estimator is assessed in simulation. The results show that high reduction in sampling rate can be achieved. The proposed estimator outperforms the least squares estimator in almost all cases, while in the high SNR regime it also outperforms the LMMSE estimator. In addition to channel estimation, a synchronization method is also proposed that utilizes the same pulse sequence used for channel estimation. © 2014 IEEE.
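The co-primality condition can be illustrated in a few lines: if the inter-pulse interval spans N desired-rate sampling periods and the ADC runs P times slower, the k-th pulse is observed at phase (k·N) mod P of the fast grid, and all P phases are covered iff gcd(N, P) = 1. The values of N and P below are arbitrary examples, not the paper's parameters.

```python
from math import gcd

def phases(N, P):
    """Phases of the slow ADC grid (mod P) visited over P pulses."""
    return sorted({(k * N) % P for k in range(P)})

P = 5                              # ADC is 5x slower than the desired rate
print(phases(8, P), gcd(8, P))     # gcd = 1: all 5 phases are visited
print(phases(10, P), gcd(10, P))   # gcd = 5: phases collapse to one
```

When every phase is visited, the P slow-rate observations interleave into one channel response at the full desired rate; clock drift perturbs the pulse positions and can break this condition, which is what motivates the paper's BDU-based estimator.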
Estimating Functions of Distributions Defined over Spaces of Unknown Size
Directory of Open Access Journals (Sweden)
David H. Wolpert
2013-10-01
Full Text Available We consider Bayesian estimation of information-theoretic quantities from data, using a Dirichlet prior. Acknowledging the uncertainty of the event space size m and the Dirichlet prior’s concentration parameter c, we treat both as random variables set by a hyperprior. We show that the associated hyperprior, P(c, m), obeys a simple “Irrelevance of Unseen Variables” (IUV) desideratum iff P(c, m) = P(c)P(m). Thus, requiring IUV greatly reduces the number of degrees of freedom of the hyperprior. Some information-theoretic quantities can be expressed multiple ways, in terms of different event spaces, e.g., mutual information. With all hyperpriors (implicitly) used in earlier work, different choices of this event space lead to different posterior expected values of these information-theoretic quantities. We show that there is no such dependence on the choice of event space for a hyperprior that obeys IUV. We also derive a result that allows us to exploit IUV to greatly simplify calculations, like the posterior expected mutual information or posterior expected multi-information. We also use computer experiments to favorably compare an IUV-based estimator of entropy to three alternative methods in common use. We end by discussing how seemingly innocuous changes to the formalization of an estimation problem can substantially affect the resultant estimates of posterior expectations.
n4Studies: Sample Size Calculation for an Epidemiological Study on a Smart Device
Directory of Open Access Journals (Sweden)
Chetta Ngamjarus
2016-05-01
Full Text Available Objective: The aim of this study was to develop a sample size application (called “n4Studies”) for free use on iPhone and Android devices and to compare sample size functions between n4Studies and other applications and software. Methods: The Objective-C programming language was used to create the application for the iPhone OS (operating system), while JavaScript, jQuery Mobile, PhoneGap, and jStat were used to develop it for Android phones. Other sample size applications were found by searching the Apple App Store and Google Play Store. The applications’ characteristics and sample size functions were collected. Spearman’s rank correlation was used to investigate the relationship between the number of sample size functions and price. Results: “n4Studies” provides several functions for sample size and power calculations for various epidemiological study designs. It can be downloaded from the Apple App Store and Google Play Store. Comparing n4Studies with other applications, it covers several more types of epidemiological study designs, gives similar results for estimation of an infinite/finite population mean and infinite/finite proportion to GRANMO, for comparing two independent means to BioStats, and for comparing two independent proportions to the EpiCal application. When using the same parameters, n4Studies gives similar results to STATA, the epicalc package in R, PS, G*Power, and OpenEpi. Conclusion: “n4Studies” can be an alternative tool for calculating sample size. It may be useful to students, lecturers and researchers in conducting their research projects.
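One of the calculations such an app offers, the sample size for estimating a single proportion with a specified absolute precision, can be sketched as follows. The formulas are the standard ones (n = z²p(1−p)/d² with an optional finite-population correction); whether n4Studies implements them exactly this way is an assumption.

```python
from math import ceil

def n_for_proportion(p, d, z=1.96, N=None):
    """Sample size to estimate proportion p within +/- d at ~95% confidence."""
    n = z ** 2 * p * (1 - p) / d ** 2      # infinite-population formula
    if N is not None:                      # finite-population correction
        n = n / (1 + (n - 1) / N)
    return ceil(n)

print(n_for_proportion(0.5, 0.05))             # worst-case p = 0.5: 385
print(n_for_proportion(0.5, 0.05, N=1000))     # finite population of 1,000: 278
```

Taking p = 0.5 maximizes p(1−p) and therefore gives the most conservative sample size when no prior estimate of the proportion is available.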
Variance component estimates for alternative litter size traits in swine.
Putz, A M; Tiezzi, F; Maltecca, C; Gray, K A; Knauer, M T
2015-11-01
Litter size at d 5 (LS5) has been shown to be an effective trait to increase total number born (TNB) while simultaneously decreasing preweaning mortality. The objective of this study was to determine the optimal litter size day for selection (i.e., other than d 5). Traits included TNB, number born alive (NBA), litter size at d 2, 5, 10, 30 (LS2, LS5, LS10, LS30, respectively), litter size at weaning (LSW), number weaned (NW), piglet mortality at d 30 (MortD30), and average piglet birth weight (BirthWt). Litter size traits were assigned to biological litters and treated as a trait of the sow. In contrast, NW was the number of piglets weaned by the nurse dam. Bivariate animal models included farm, year-season, and parity as fixed effects. Number born alive was fit as a covariate for BirthWt. Random effects included additive genetics and the permanent environment of the sow. Variance components were plotted for TNB, NBA, and LS2 to LS30 using univariate animal models to determine how variances changed over time. Additive genetic variance was minimized at d 7 in Large White and at d 14 in Landrace pigs. Total phenotypic variance for litter size traits decreased over the first 10 d and then stabilized. Heritability estimates increased between TNB and LS30. Genetic correlations between TNB, NBA, and LS2 to LS29 with LS30 plateaued within the first 10 d. A genetic correlation with LS30 of 0.95 was reached at d 4 for Large White and at d 8 for Landrace pigs. Heritability estimates ranged from 0.07 to 0.13 for litter size traits and MortD30. Birth weight had an h2 of 0.24 and 0.26 for Large White and Landrace pigs, respectively. Genetic correlations among LS30, LSW, and NW ranged from 0.97 to 1.00. In the Large White breed, genetic correlations between MortD30 with TNB and LS30 were 0.23 and -0.64, respectively. These correlations were 0.10 and -0.61 in the Landrace breed. High genetic correlations of 0.98 and 0.97 were observed between LS10 and NW for Large White and
Comparison of Four Estimators under sampling without Replacement
African Journals Online (AJOL)
The results were obtained using a program written in Microsoft Visual C++ programming language. It was observed that the two-stage sampling under unequal probabilities without replacement is always better than the other three estimators considered. Keywords: Unequal probability sampling, two-stage sampling, ...
Shieh, Gwowen
2017-12-01
Covariate-dependent reference limits have been extensively applied in biology and medicine for determining the substantial magnitude and relative importance of quantitative measurements. Confidence interval and sample size procedures are available for studying regression-based reference limits. However, the existing popular methods employ different technical simplifications and are applicable only in certain limited situations. This paper describes exact confidence intervals of regression-based reference limits and compares the exact approach with the approximate methods under a wide range of model configurations. Using the ratio between the widths of confidence interval and reference interval as the relative precision index, optimal sample size procedures are presented for precise interval estimation under expected ratio and tolerance probability considerations. Simulation results show that the approximate interval methods using normal distribution have inaccurate confidence limits. The exact confidence intervals dominate the approximate procedures in one- and two-sided coverage performance. Unlike the current simplifications, the proposed sample size procedures integrate all key factors including covariate features in the optimization process and are suitable for various regression-based reference limit studies with potentially diverse configurations. The exact interval estimation has theoretical and practical advantages over the approximate methods. The corresponding sample size procedures and computing algorithms are also presented to facilitate the data analysis and research design of regression-based reference limits. Copyright © 2017 Elsevier Ltd. All rights reserved.
Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics
DEFF Research Database (Denmark)
Holland, Dominic; Wang, Yunpeng; Thompson, Wesley K
2016-01-01
Genome-wide Association Studies (GWAS) result in millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric … z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing … 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when …
Estimation of Tree Size Diversity Using Object Oriented Texture Analysis and Aster Imagery
Directory of Open Access Journals (Sweden)
Ozdemir Senturk
2008-08-01
Full Text Available This study investigates the potential of object-based texture parameters extracted from 15m spatial resolution ASTER imagery for estimating tree size diversity in a Mediterranean forested landscape in Turkey. Tree size diversity based on tree basal area was determined using the Shannon index and Gini Coefficient at the sampling plot level. Image texture parameters were calculated based on the grey level co-occurrence matrix (GLCM) for various image segmentation levels. Analyses of relationships between tree size diversity and texture parameters found that relationships between the Gini Coefficient and the GLCM values were the most statistically significant, with the highest correlation (r = 0.69) being with GLCM Homogeneity values. In contrast, Shannon Index values were weakly correlated with image derived texture parameters. The results suggest that 15m resolution Aster imagery has considerable potential in estimating tree size diversity based on the Gini Coefficient for heterogeneous Mediterranean forests.
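As a point of reference for the record above, the Gini coefficient of tree basal areas can be computed directly from a plot's measurements. The sketch below uses the standard sorted-sample identity; the basal-area values are hypothetical, not from the study.

```python
def gini(values):
    """Gini coefficient of a sample (0 = all trees equal, -> 1 = maximal size inequality)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    # Sorted-sample identity: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Basal areas (m^2) for two hypothetical plots: an even stand vs. a mixed-size stand
even_stand = [0.05, 0.05, 0.05, 0.05]
mixed_stand = [0.01, 0.02, 0.05, 0.40]
print(round(gini(even_stand), 3))   # 0.0 for identical trees
print(round(gini(mixed_stand), 3))
```

A plot dominated by one large tree scores high; a uniform plantation scores near zero, which is why the index serves as a structural-diversity measure.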
R. L. Czaplewski
2009-01-01
The minimum variance multivariate composite estimator is a relatively simple sequential estimator for complex sampling designs (Czaplewski 2009). Such designs combine a probability sample of expensive field data with multiple censuses and/or samples of relatively inexpensive multi-sensor, multi-resolution remotely sensed data. Unfortunately, the multivariate composite...
Schrago, Carlos G
2014-08-01
Reliable estimates of ancestral effective population sizes are necessary to unveil the population-level phenomena that shaped the phylogeny and molecular evolution of the African great apes. Although several methods have previously been applied to infer ancestral effective population sizes, an analysis of the influence of the selective regime on the estimates of ancestral demography has not been thoroughly conducted. In this study, three independent data sets under different selective regimes were composed to tackle this issue. The results showed that selection had a significant impact on the estimates of ancestral effective population sizes of the African great apes. The inference of the ancestral demography of African great apes was affected by the selection regime. The effects, however, were not homogeneous along the ancestral populations of great apes. The effective population size of the ancestor of humans and chimpanzees was more impacted by the selection regime than the same parameter in the ancestor of humans, chimpanzees and gorillas. Because the selection regime influenced the estimates of ancestral effective population size, it is reasonable to assume that a portion of the discrepancy found in previous studies that inferred the ancestral effective population size may be attributable to the differential action of selection on the genes sampled.
Implications of sampling design and sample size for national carbon accounting systems.
Köhl, Michael; Lister, Andrew; Scott, Charles T; Baldauf, Thomas; Plugge, Daniel
2011-11-08
Countries willing to adopt a REDD regime need to establish a national Measurement, Reporting and Verification (MRV) system that provides information on forest carbon stocks and carbon stock changes. Due to the extensive areas covered by forests, the information is generally obtained by sample-based surveys. Most operational sampling approaches utilize a combination of earth-observation data and in-situ field assessments as data sources. We compared the cost-efficiency of four sampling design alternatives (simple random sampling, regression estimators, stratified sampling, and 2-phase sampling with regression estimators) that have been proposed in the scope of REDD; three of the design alternatives combine in-situ and earth-observation data. The percent standard error relative to total survey cost was calculated under different settings of remote sensing coverage, cost per field plot, cost of remote sensing imagery, correlation between attributes quantified in remote sensing and field data, and population variability. The cost-efficiency of forest carbon stock assessments is driven by the sampling design chosen. Our results indicate that the cost of remote sensing imagery is decisive for the cost-efficiency of a sampling design. The variability of the sample population impairs cost-efficiency, but does not reverse the ranking of the individual design alternatives. Our results clearly indicate that it is important to consider cost-efficiency in the development of forest carbon stock assessments and the selection of remote sensing techniques. The development of MRV systems for REDD needs to be based on a sound optimization process that compares different data sources and sampling designs with respect to their cost-efficiency. This helps to reduce the uncertainties related to the quantification of carbon stocks and to increase the financial benefits from adopting a REDD regime.
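The design comparison in the record above can be illustrated with a minimal cost-efficiency calculation: for a fixed budget, how does the standard error achievable under simple random sampling compare with stratified sampling? All numbers below (budget, plot cost, stratum variances) are hypothetical, not taken from the study.

```python
import math

budget = 100_000.0          # total survey budget (hypothetical)
cost_per_plot = 500.0       # field cost of one in-situ plot (hypothetical)
n = int(budget // cost_per_plot)

# Population variability of carbon stock (t/ha): overall, and within two strata
s_total = 40.0
strata = [  # (stratum weight W_h, within-stratum standard deviation S_h)
    (0.6, 15.0),
    (0.4, 25.0),
]

# Simple random sampling: SE = S / sqrt(n)
se_srs = s_total / math.sqrt(n)

# Stratified sampling, proportional allocation: SE^2 = sum(W_h^2 * S_h^2 / n_h), n_h = W_h * n
se_str = math.sqrt(sum(w**2 * s**2 / (w * n) for w, s in strata))

print(f"n = {n}, SE(SRS) = {se_srs:.2f}, SE(stratified) = {se_str:.2f}")
```

When the strata capture much of the total variance (as in this toy setup), the stratified design buys a markedly smaller standard error for the same budget, which is the kind of trade-off the study quantifies across designs and cost settings.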
MEPAG Recommendations for a 2018 Mars Sample Return Caching Lander - Sample Types, Number, and Sizes
Allen, Carlton C.
2011-01-01
The return to Earth of geological and atmospheric samples from the surface of Mars is among the highest priority objectives of planetary science. The MEPAG Mars Sample Return (MSR) End-to-End International Science Analysis Group (MEPAG E2E-iSAG) was chartered to propose scientific objectives and priorities for returned sample science, and to map out the implications of these priorities, including for the proposed joint ESA-NASA 2018 mission that would be tasked with the crucial job of collecting and caching the samples. The E2E-iSAG identified four overarching scientific aims that relate to understanding: (A) the potential for life and its pre-biotic context, (B) the geologic processes that have affected the martian surface, (C) planetary evolution of Mars and its atmosphere, (D) potential for future human exploration. The types of samples deemed most likely to achieve the science objectives are, in priority order: (1A). Subaqueous or hydrothermal sediments (1B). Hydrothermally altered rocks or low temperature fluid-altered rocks (equal priority) (2). Unaltered igneous rocks (3). Regolith, including airfall dust (4). Present-day atmosphere and samples of sedimentary-igneous rocks containing ancient trapped atmosphere Collection of geologically well-characterized sample suites would add considerable value to interpretations of all collected rocks. To achieve this, the total number of rock samples should be about 30-40. In order to evaluate the size of individual samples required to meet the science objectives, the E2E-iSAG reviewed the analytical methods that would likely be applied to the returned samples by preliminary examination teams, for planetary protection (i.e., life detection, biohazard assessment) and, after distribution, by individual investigators. It was concluded that sample size should be sufficient to perform all high-priority analyses in triplicate. In keeping with long-established curatorial practice of extraterrestrial material, at least 40% by
Creel survey sampling designs for estimating effort in short-duration Chinook salmon fisheries
McCormick, Joshua L.; Quist, Michael C.; Schill, Daniel J.
2013-01-01
Chinook Salmon Oncorhynchus tshawytscha sport fisheries in the Columbia River basin are commonly monitored using roving creel survey designs and require precise, unbiased catch estimates. The objective of this study was to examine the relative bias and precision of total catch estimates using various sampling designs to estimate angling effort under the assumption that mean catch rate was known. We obtained information on angling populations based on direct visual observations of portions of Chinook Salmon fisheries in three Idaho river systems over a 23-d period. Based on the angling population, Monte Carlo simulations were used to evaluate the properties of effort and catch estimates for each sampling design. All sampling designs evaluated were relatively unbiased. Systematic random sampling (SYS) resulted in the most precise estimates. The SYS and simple random sampling designs had mean square error (MSE) estimates that were generally half of those observed with cluster sampling designs. The SYS design was more efficient (i.e., higher accuracy per unit cost) than a two-cluster design. Increasing the number of clusters available for sampling within a day decreased the MSE of estimates of daily angling effort, but the MSE of total catch estimates was variable depending on the fishery. The results of our simulations provide guidelines on the relative influence of sample sizes and sampling designs on parameters of interest in short-duration Chinook Salmon fisheries.
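The simulation logic described above (evaluating sampling designs against a known angling population) can be sketched in miniature: sample k of the day's hours, expand to a daily effort estimate, and compare mean square error between simple random and systematic selection of hours. The hourly counts are hypothetical, chosen only to give a smooth diurnal pattern.

```python
import random
random.seed(1)

# Hypothetical instantaneous angler counts for each of 12 daylight hours
hourly_anglers = [2, 4, 7, 11, 14, 16, 15, 13, 10, 8, 5, 3]
true_effort = sum(hourly_anglers)  # angler-hours for the day

def estimate(sample_hours):
    # Expand the mean sampled count to the full 12-hour day
    counts = [hourly_anglers[h] for h in sample_hours]
    return 12 * sum(counts) / len(counts)

def mse(design, reps=20000, k=4):
    err = 0.0
    for _ in range(reps):
        if design == "srs":
            hours = random.sample(range(12), k)
        else:  # systematic: random start, then every (12/k)-th hour
            start = random.randrange(12 // k)
            hours = range(start, 12, 12 // k)
        err += (estimate(hours) - true_effort) ** 2
    return err / reps

print(f"MSE SRS: {mse('srs'):.1f}, MSE SYS: {mse('sys'):.1f}")
```

With a smooth within-day effort pattern, systematic spacing covers the whole curve in every draw, so its MSE is far below that of simple random sampling — consistent with the study's finding that SYS gave the most precise estimates.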
Laser photogrammetry improves size and demographic estimates for whale sharks
Richardson, Anthony J.; Prebble, Clare E.M.; Marshall, Andrea D.; Bennett, Michael B.; Weeks, Scarla J.; Cliff, Geremy; Wintner, Sabine P.; Pierce, Simon J.
2015-01-01
Whale sharks Rhincodon typus are globally threatened, but a lack of biological and demographic information hampers an accurate assessment of their vulnerability to further decline or capacity to recover. We used laser photogrammetry at two aggregation sites to obtain more accurate size estimates of free-swimming whale sharks compared to visual estimates, allowing improved estimates of biological parameters. Individual whale sharks ranged from 432–917 cm total length (TL) (mean ± SD = 673 ± 118.8 cm, N = 122) in southern Mozambique and from 420–990 cm TL (mean ± SD = 641 ± 133 cm, N = 46) in Tanzania. By combining measurements of stranded individuals with photogrammetry measurements of free-swimming sharks, we calculated length at 50% maturity for males in Mozambique at 916 cm TL. Repeat measurements of individual whale sharks measured over periods from 347–1,068 days yielded implausible growth rates, suggesting that the growth increment over this period was not large enough to be detected using laser photogrammetry, and that the method is best applied to estimating growth rates over longer (decadal) time periods. The sex ratio of both populations was biased towards males (74% in Mozambique, 89% in Tanzania), the majority of which were immature (98% in Mozambique, 94% in Tanzania). The population structure for these two aggregations was similar to most other documented whale shark aggregations around the world. Information on small sharks, mature individuals, and females in this region is lacking, but necessary to inform conservation initiatives for this globally threatened species. PMID:25870776
Complex sample survey estimation in static state-space
Raymond L. Czaplewski
2010-01-01
Increased use of remotely sensed data is a key strategy adopted by the Forest Inventory and Analysis Program. However, multiple sensor technologies require complex sampling units and sampling designs. The Recursive Restriction Estimator (RRE) accommodates this complexity. It is a design-consistent Empirical Best Linear Unbiased Prediction for the state-vector, which...
Sampling strategies for efficient estimation of tree foliage biomass
Hailemariam Temesgen; Vicente Monleon; Aaron Weiskittel; Duncan Wilson
2011-01-01
Conifer crowns can be highly variable both within and between trees, particularly with respect to foliage biomass and leaf area. A variety of sampling schemes have been used to estimate biomass and leaf area at the individual tree and stand scales. Rarely has the effectiveness of these sampling schemes been compared across stands or even across species. In addition,...
Atkins, T. J.; Humphrey, V. F.; Duck, F. A.; Tooley, M. A.
2011-02-01
The response of two coaxially aligned weakly focused ultrasonic transducers, typical of those employed for measuring the attenuation of small samples using the immersion method, has been investigated. The effects of the sample size on transmission measurements have been analyzed by integrating the sound pressure distribution functions of the radiator and receiver over different limits to determine the size of the region that contributes to the system response. The results enable the errors introduced into measurements of attenuation to be estimated as a function of sample size. A theoretical expression has been used to examine how the transducer separation affects the receiver output. The calculations are compared with an experimental study of the axial response of three unpaired transducers in water. The separation of each transducer pair giving the maximum response was determined, and compared with the field characteristics of the individual transducers. The optimum transducer separation, for accurate estimation of sample properties, was found to fall between the sum of the focal distances and the sum of the geometric focal lengths as this reduced diffraction errors.
Sample Size of One: Operational Qualitative Analysis in the Classroom
Directory of Open Access Journals (Sweden)
John Hoven
2015-10-01
Full Text Available Qualitative analysis has two extraordinary capabilities: first, finding answers to questions we are too clueless to ask; and second, causal inference – hypothesis testing and assessment – within a single unique context (sample size of one. These capabilities are broadly useful, and they are critically important in village-level civil-military operations. Company commanders need to learn quickly, "What are the problems and possibilities here and now, in this specific village? What happens if we do A, B, and C?" – and that is an ill-defined, one-of-a-kind problem. The U.S. Army's Eighty-Third Civil Affairs Battalion is our "first user" innovation partner in a new project to adapt qualitative research methods to an operational tempo and purpose. Our aim is to develop a simple, low-cost methodology and training program for local civil-military operations conducted by non-specialist conventional forces. Complementary to that, this paper focuses on some essential basics that can be implemented by college professors without significant cost, effort, or disruption.
Estimating the Size and Impact of the Ecological Restoration Economy.
Directory of Open Access Journals (Sweden)
BenDor, Todd; Lester, T William; Livengood, Avery; Davis, Adam; Yonavjak, Logan
2015-01-01
Full Text Available Domestic public debate continues over the economic impacts of environmental regulations that require environmental restoration. This debate has occurred in the absence of broad-scale empirical research on economic output and employment resulting from environmental restoration, restoration-related conservation, and mitigation actions - the activities that are part of what we term the "restoration economy." In this article, we provide a high-level accounting of the size and scope of the restoration economy in terms of employment, value added, and overall economic output on a national scale. We conducted a national survey of businesses that participate in restoration work in order to estimate the total sales and number of jobs directly associated with the restoration economy, and to provide a profile of this nascent sector in terms of type of restoration work, industrial classification, workforce needs, and growth potential. We use survey results as inputs into a national input-output model (IMPLAN 3.1) in order to estimate the indirect and induced economic impacts of restoration activities. Based on this analysis we conclude that the domestic ecological restoration sector directly employs ~ 126,000 workers and generates ~ $9.5 billion in economic output (sales) annually. This activity supports an additional 95,000 jobs and $15 billion in economic output through indirect (business-to-business) linkages and increased household spending.
Limitations of mRNA amplification from small-size cell samples
Directory of Open Access Journals (Sweden)
Myklebost Ola
2005-10-01
Full Text Available Abstract Background Global mRNA amplification has become a widely used approach to obtain gene expression profiles from limited material. An important concern is the reliable reflection of the starting material in the results obtained. This is especially important with extremely low quantities of input RNA where stochastic effects due to template dilution may be present. This aspect remains under-documented in the literature, as quantitative measures of data reliability are most often lacking. To address this issue, we examined the sensitivity levels of each transcript in 3 different cell sample sizes. ANOVA analysis was used to estimate the overall effects of reduced input RNA in our experimental design. In order to estimate the validity of decreasing sample sizes, we examined the sensitivity levels of each transcript by applying a novel model-based method, TransCount. Results From expression data, TransCount provided estimates of absolute transcript concentrations in each examined sample. The results from TransCount were used to calculate the Pearson correlation coefficient between transcript concentrations for different sample sizes. The correlations were clearly transcript copy number dependent. A critical level was observed where stochastic fluctuations became significant. The analysis allowed us to pinpoint the gene specific number of transcript templates that defined the limit of reliability with respect to number of cells from that particular source. In the sample amplifying from 1000 cells, transcripts expressed with at least 121 transcripts/cell were statistically reliable and for 250 cells, the limit was 1806 transcripts/cell. Above these thresholds, correlation between our data sets was at acceptable values for reliable interpretation. Conclusion These results imply that the reliability of any amplification experiment must be validated empirically to justify that any gene exists in sufficient quantity in the input material. This
Wang, Ji-Peng; François, Bertrand; Lambert, Pierre
2017-09-01
Estimating hydraulic conductivity from the particle size distribution (PSD) is an important issue for various engineering problems. Classical models such as the Hazen, Beyer, and Kozeny-Carman models usually regard the grain diameter at 10% passing (d10) as an effective grain size, sometimes embedding the effects of particle size uniformity (in the Beyer model) or porosity (in the Kozeny-Carman model). This technical note applies dimensional analysis (Buckingham's Π theorem) to analyze the relationship between hydraulic conductivity and the PSD. The porosity is regarded as dependent on the grain size distribution in unconsolidated conditions. The analysis indicates that the coefficient of grain size uniformity and a dimensionless group representing the gravity effect, which is proportional to the mean grain volume, are the two main determinative parameters for estimating hydraulic conductivity. Regression analysis is then carried out on a database comprising 431 samples collected from different depositional environments, and new equations are developed for hydraulic conductivity estimation. The new equation, validated on specimens beyond the database, shows improved prediction compared with the classical models.
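For context, the classical d10-based models that the note benchmarks against can be written in a few lines. The coefficients below are common textbook values (the note's own new equations are not reproduced here), so treat the results as order-of-magnitude estimates only.

```python
GRAVITY = 9.81          # m/s^2
KIN_VISCOSITY = 1.0e-6  # m^2/s, water at ~20 degC

def hazen(d10_mm, c=1.0):
    """Hazen: K [cm/s] ~ C * d10^2 with d10 in mm; C typically quoted as 0.4-1.2."""
    return c * d10_mm ** 2

def kozeny_carman(d10_m, porosity):
    """Kozeny-Carman: K [m/s] = (g / nu) * n^3 / (1 - n)^2 * d10^2 / 180."""
    n = porosity
    return (GRAVITY / KIN_VISCOSITY) * (n ** 3 / (1 - n) ** 2) * d10_m ** 2 / 180

# Hypothetical medium sand: d10 = 0.2 mm, porosity ~ 0.35
print(f"Hazen:         {hazen(0.2):.1e} cm/s")
print(f"Kozeny-Carman: {kozeny_carman(0.2e-3, 0.35):.1e} m/s")
```

Both formulas scale as d10 squared; the Kozeny-Carman version additionally carries the porosity term that the note replaces with a PSD-dependent relation.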
An Improvement to Interval Estimation for Small Samples
Directory of Open Access Journals (Sweden)
SUN Hui-Ling
2017-02-01
Full Text Available Because it is difficult and complex to determine the probability distribution of small samples, traditional probability theory is ill-suited to parameter estimation in the small-sample setting. The Bayes Bootstrap method is commonly used in engineering practice, but it has its own limitations. This article presents an improvement to the Bayes Bootstrap method: it extends the sample by numerical simulation without altering the character of the original small sample, and the new method gives accurate interval estimates for small samples. Finally, Monte Carlo simulation is applied to model specific small-sample problems. The effectiveness and practicality of the improved Bootstrap method are demonstrated.
Estimation of Raindrop size Distribution over Darjeeling (India)
Mehta, Shyam; Mitra, Amitabha
2016-07-01
A study of the raindrop size distribution (DSD) over Darjeeling (27°01'N, 88°15'E), India, has been carried out using a Micro Rain Radar (MRR), which measures drop counts by size and rain rate at one-minute intervals at a series of heights. Starting from the general moment formula for the gamma DSD, exponential, lognormal, and gamma DSD models were applied to assess which best describes the observed distributions, with gamma parameters estimated from both lower-order and higher-order moments. DSDs were examined at altitudes from 150 m to 2000 m in vertical steps of 500 m, i.e., roughly the lowest 2 km of the 4.5 km measurement range. (i) At a height of 150 m, most DSDs behave as gamma distributions under both the low-order and high-order moment fits, with low drop concentrations at all rain rates. (ii) At the upper altitudes (450 m to 2000 m), most DSDs behave as gamma distributions only under the high-order moment fits, with high drop concentrations at all rain rates; behavior at every height is otherwise broadly similar to that at 150 m. An empirical DSD model was derived from fit parameters evaluated from the experimental data. The data fit the gamma distribution well for Darjeeling, and the relation between slope (Λ) and shape (μ) is best represented by a linear fit at 150 m (near the ground surface) using the lower-order moments, for all rain rates. At higher altitudes, neither a linear nor a polynomial fit gives a satisfactory relation between μ and Λ for any rain rate.
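The moment fitting referred to above can be illustrated for the gamma DSD N(D) ∝ D^μ exp(−ΛD), whose shape and slope follow directly from the first two sample moments of the drop diameters. The drop diameters below are synthetic draws, not MRR data, and the simple mean/variance estimator stands in for the study's low- and high-order moment schemes.

```python
import random
random.seed(7)

def fit_gamma_dsd(diameters):
    """Method-of-moments fit of N(D) ~ D^mu * exp(-Lambda * D)."""
    n = len(diameters)
    mean = sum(diameters) / n
    var = sum((d - mean) ** 2 for d in diameters) / n
    # A gamma density with shape k = mu + 1 and rate Lambda has
    #   mean = k / Lambda,  var = k / Lambda^2
    # so Lambda = mean / var and k = mean^2 / var.
    lam = mean / var
    mu = mean ** 2 / var - 1
    return mu, lam

# Synthetic drops from a gamma DSD with mu = 2 (shape 3) and Lambda = 2.5 mm^-1
drops = [random.gammavariate(3.0, 1 / 2.5) for _ in range(50000)]
mu_hat, lam_hat = fit_gamma_dsd(drops)
print(f"mu ~ {mu_hat:.2f}, Lambda ~ {lam_hat:.2f} per mm")
```

With enough drops the estimates recover the generating parameters; with one-minute MRR samples the estimator noise is what makes the choice of moment orders matter.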
Genetic sampling for estimating density of common species.
Cheng, Ellen; Hodges, Karen E; Sollmann, Rahel; Mills, L Scott
2017-08-01
Understanding population dynamics requires reliable estimates of population density, yet this basic information is often surprisingly difficult to obtain. With rare or difficult-to-capture species, genetic surveys from noninvasive collection of hair or scat have proved cost-efficient for estimating densities. Here, we explored whether noninvasive genetic sampling (NGS) also offers promise for sampling a relatively common species, the snowshoe hare (Lepus americanus Erxleben, 1777), in comparison with traditional live trapping. We optimized a protocol for single-session NGS sampling of hares. We compared spatial capture-recapture population estimates from live trapping to estimates derived from NGS, and assessed NGS costs. NGS provided population estimates similar to those derived from live trapping, but a higher density of sampling plots was required for NGS. The optimal NGS protocol for our study entailed deploying 160 sampling plots for 4 days and genotyping one pellet per plot. NGS laboratory costs ranged from approximately $670 to $3000 USD per field site. While live trapping does not incur laboratory costs, its field costs can be considerably higher than for NGS, especially when study sites are difficult to access. We conclude that NGS can work for common species, but that it will require field and laboratory pilot testing to develop cost-effective sampling protocols.
On Regression Estimators Using Extreme Ranked Set Samples
Directory of Open Access Journals (Sweden)
Hani M. Samawi
2004-06-01
Full Text Available Regression is used to estimate the population mean of the response variable in two cases: where the population mean of the concomitant (auxiliary) variable is known, and where it is unknown. In the latter case, a double sampling method is used to estimate the population mean of the concomitant variable. We investigate the performance of the two methods using extreme ranked set sampling (ERSS), as discussed by Samawi et al. (1996). Theoretical and Monte Carlo evaluation results, as well as an illustration using actual data, are presented. The results show that if the underlying joint distribution of the response and concomitant variables is symmetric, then using ERSS to obtain regression estimates is more efficient than using ranked set sampling (RSS) or simple random sampling (SRS).
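The baseline regression estimator that the ERSS comparison builds on is the classic survey-sampling form: adjust the sample mean of the response by the least-squares slope times the gap between the known population mean of the auxiliary variable and its sample mean. The data below are illustrative.

```python
def regression_estimate(y, x, x_pop_mean):
    """ybar_reg = ybar + b * (X_bar - x_bar), with b the least-squares slope of y on x."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    return ybar + b * (x_pop_mean - xbar)

# Hypothetical sample of (auxiliary x, response y); suppose the population mean of x
# is known to be 9.5, slightly below the sample mean of 10.
x = [8, 9, 10, 11, 12]
y = [16.5, 18.0, 20.5, 22.0, 23.5]
print(regression_estimate(y, x, x_pop_mean=9.5))
```

Because the sample happened to over-represent large x values, the estimator pulls the plain sample mean of y downward; in the double-sampling case, `x_pop_mean` would itself be estimated from a larger first-phase sample.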
Threshold-dependent sample sizes for selenium assessment with stream fish tissue.
Hitt, Nathaniel P; Smith, David R
2015-01-01
for estimating mean conditions. However, low sample sizes (<5 fish) did not achieve 80% power to detect near-threshold values (i.e., <1 mg Se/kg) under any scenario we evaluated. This analysis can assist the sampling design and interpretation of Se assessments from fish tissue by accounting for natural variation in stream fish populations. This article is a US government work and, as such, is in the public domain in the United States of America.
[A comparative study of different sampling designs in fish community estimation].
Zhao, Jing; Zhang, Shou-Yu; Lin, Jun; Zhou, Xi-Jie
2014-04-01
The study of fish community ecology depends on the quality and quantity of data collected from well-designed sampling programs. The optimal sampling design must be cost-efficient, and sampling results have been recognized as a significant factor affecting resources management. In this paper, the performances of stationary sampling, simple random sampling, and stratified random sampling in estimating fish community were compared based on computer simulation, using design effect (De), relative estimation error (REE), and relative bias (RB) as criteria. The results showed that the De of stationary sampling (3.37 on average) was worse than that of simple random sampling and stratified random sampling (0.961 on average). Stratified random sampling performed best among the three designs in terms of De, REE, and RB. As the sample size increased, the design effect of stratified random sampling decreased while precision and accuracy increased.
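The design effect used above is simply the variance of an estimator under a given design divided by the variance under simple random sampling of the same size; relative bias compares the mean of the estimates to the true value. The sketch below estimates both by simulation on a toy "community" with a spatial density gradient, not on the paper's data.

```python
import random
random.seed(3)

# Toy population: 10x10 grid of abundances with a spatial gradient plus noise
pop = [[(r + c) * 0.5 + random.random() for c in range(10)] for r in range(10)]
true_mean = sum(map(sum, pop)) / 100

def srs_mean(n):
    cells = random.sample([(r, c) for r in range(10) for c in range(10)], n)
    return sum(pop[r][c] for r, c in cells) / n

def stratified_mean(n):
    # Two strata (top vs. bottom rows), proportional allocation
    est = 0.0
    for rows in (range(0, 5), range(5, 10)):
        cells = random.sample([(r, c) for r in rows for c in range(10)], n // 2)
        est += 0.5 * sum(pop[r][c] for r, c in cells) / (n // 2)
    return est

def var_and_relbias(estimator, n=20, reps=5000):
    ests = [estimator(n) for _ in range(reps)]
    m = sum(ests) / reps
    var = sum((e - m) ** 2 for e in ests) / reps
    return var, (m - true_mean) / true_mean

v_srs, _ = var_and_relbias(srs_mean)
v_str, rb = var_and_relbias(stratified_mean)
print(f"De(stratified) = {v_str / v_srs:.2f}, relative bias = {rb:.3f}")
```

Because stratification removes the between-stratum part of the gradient, De comes out below 1, mirroring the paper's average De of 0.961 for the better designs.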
Designing image segmentation studies: Statistical power, sample size and reference standard quality.
Gibson, Eli; Hu, Yipeng; Huisman, Henkjan J; Barratt, Dean C
2017-12-01
Segmentation algorithms are typically evaluated by comparison to an accepted reference standard. The cost of generating accurate reference standards for medical image segmentation can be substantial. Since the study cost and the likelihood of detecting a clinically meaningful difference in accuracy both depend on the size and on the quality of the study reference standard, balancing these trade-offs supports the efficient use of research resources. In this work, we derive a statistical power calculation that enables researchers to estimate the appropriate sample size to detect clinically meaningful differences in segmentation accuracy (i.e. the proportion of voxels matching the reference standard) between two algorithms. Furthermore, we derive a formula to relate reference standard errors to their effect on the sample sizes of studies using lower-quality (but potentially more affordable and practically available) reference standards. The accuracy of the derived sample size formula was estimated through Monte Carlo simulation, demonstrating, with 95% confidence, a predicted statistical power within 4% of simulated values across a range of model parameters. This corresponds to sample size errors of less than 4 subjects and errors in the detectable accuracy difference less than 0.6%. The applicability of the formula to real-world data was assessed using bootstrap resampling simulations for pairs of algorithms from the PROMISE12 prostate MR segmentation challenge data set. The model predicted the simulated power for the majority of algorithm pairs within 4% for simulated experiments using a high-quality reference standard and within 6% for simulated experiments using a low-quality reference standard. A case study, also based on the PROMISE12 data, illustrates using the formulae to evaluate whether to use a lower-quality reference standard in a prostate segmentation study. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
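The derived formulas themselves are not reproduced in the abstract above, but the familiar normal-approximation backbone of such calculations can be sketched: the sample size needed for a paired comparison of mean per-subject accuracy between two algorithms. The effect size and standard deviation below are illustrative assumptions, not values from the paper.

```python
from statistics import NormalDist

def paired_sample_size(delta, sd_diff, alpha=0.05, power=0.80):
    """Subjects needed for a paired z-test to detect a mean accuracy difference delta,
    where sd_diff is the SD of per-subject accuracy differences."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ((z_a + z_b) * sd_diff / delta) ** 2

# Detect a 1% mean accuracy difference when per-subject differences have SD 2.5%
n = paired_sample_size(delta=0.01, sd_diff=0.025)
print(f"required n ~ {n:.0f} subjects")
```

Reference-standard errors act by inflating the effective `sd_diff` (or shrinking the detectable `delta`), which is why the paper's formula links reference quality directly to required sample size.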
SAMPLE SIZE DETERMINATION IN NON-RANDOMIZED SURVIVAL STUDIES WITH NON-CENSORED AND CENSORED DATA
Directory of Open Access Journals (Sweden)
S FAGHIHZADEH
2003-06-01
Full Text Available Introduction: In survival analysis, determining a sample size sufficient to achieve suitable statistical power is important. In both parametric and non-parametric methods of classical statistics, random selection of samples is a basic condition; in practice, however, random allocation is impossible in most clinical trials and health surveys. Fixed-effect multiple linear regression analysis covers this need, and this feature can be extended to survival regression analysis. This paper presents sample size determination for non-randomized survival analysis with censored and non-censored data. Methods: In non-randomized survival studies, linear regression with a fixed-effect variable can be used. Such a regression is, in fact, the conditional expectation of the dependent variable given the independent variable. A likelihood function with an exponential hazard is constructed by considering a binary variable for the allocation of each subject to one of the two comparison groups; by stating the variance of the coefficient of the fixed-effect independent variable in terms of the coefficient of determination, sample size formulas are obtained for both censored and non-censored data. Estimation of sample size is therefore not based on a single independent variable alone, but can attain the required power for a test adjusted for the effects of the other explanatory covariates. Since the asymptotic distribution of the maximum likelihood estimator of the parameter is normal, we obtained a formula for the variance of the regression coefficient estimator and then, by stating the variance of the regression coefficient of the fixed-effect variable in terms of the coefficient of determination, derived formulas for sample size determination with both censored and non-censored data. Results: In non-randomized survival analysis, to compare the hazard rates of two groups without censored data, we obtained estimates of the coefficient of determination, the risk ratio and the proportion of membership in each group, together with their variances, from
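The paper's exact formulas are not shown in the abstract; a closely related standard result can be sketched: the required number of events for a binary group indicator in an exponential-hazard (Cox-type) model, with the coefficient variance inflated by 1/(1 - R²) to adjust for other covariates, in the spirit of Hsieh and Lavori's variance-inflation factor. This is an illustration, not the paper's derivation:

```python
import math
from statistics import NormalDist

def events_required(hazard_ratio, p_exposed, r2=0.0, alpha=0.05, power=0.8):
    """Events needed to detect a given hazard ratio for a binary
    covariate, where r2 is the coefficient of determination from
    regressing that covariate on the remaining covariates; larger r2
    inflates the required number of events by 1 / (1 - r2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    denom = p_exposed * (1 - p_exposed) * math.log(hazard_ratio) ** 2 * (1 - r2)
    return math.ceil(z ** 2 / denom)
```

For example, a hazard ratio of 2 with equal allocation needs 66 events unadjusted, rising to 94 when R² = 0.3, which shows why correlated explanatory covariates matter in non-randomized designs.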
Horowitz, Arthur J.; Clarke, Robin T.; Merten, Gustavo Henrique
2015-01-01
Since the 1970s, there has been both continuing and growing interest in developing accurate estimates of the annual fluvial transport (fluxes and loads) of suspended sediment and sediment-associated chemical constituents. This study provides an evaluation of the effects of manual sample numbers (from 4 to 12 year−1) and sample scheduling (random-based, calendar-based and hydrology-based) on the precision, bias and accuracy of annual suspended sediment flux estimates. The evaluation is based on data from selected US Geological Survey daily suspended sediment stations in the USA and covers basins ranging in area from just over 900 km2 to nearly 2 million km2 and annual suspended sediment fluxes ranging from about 4 Kt year−1 to about 200 Mt year−1. The results appear to indicate that there is a scale effect for random-based and calendar-based sampling schemes, with larger sample numbers required as basin size decreases. All the sampling schemes evaluated display some level of positive (overestimates) or negative (underestimates) bias. The study further indicates that hydrology-based sampling schemes are likely to generate the most accurate annual suspended sediment flux estimates with the fewest number of samples, regardless of basin size. This type of scheme seems most appropriate when the determination of suspended sediment concentrations, sediment-associated chemical concentrations, annual suspended sediment and annual suspended sediment-associated chemical fluxes only represent a few of the parameters of interest in multidisciplinary, multiparameter monitoring programmes. The results are just as applicable to the calibration of autosamplers/suspended sediment surrogates currently used to measure/estimate suspended sediment concentrations and ultimately, annual suspended sediment fluxes, because manual samples are required to adjust the sample data/measurements generated by these techniques so that they provide depth-integrated and cross
Harry T. Valentine; David L. R. Affleck; Timothy G. Gregoire
2009-01-01
Systematic sampling is easy, efficient, and widely used, though it is not generally recognized that a systematic sample may be drawn from the population of interest with or without restrictions on randomization. The restrictions or the lack of them determine which estimators are unbiased, when using the sampling design as the basis for inference. We describe the...
Directory of Open Access Journals (Sweden)
Alberto Cargnelutti Filho
2012-09-01
Full Text Available The objective of this research was to determine the sample size (number of seeds) needed to estimate the mean length, major and minor diameters, and weight of seeds of jack bean (Canavalia ensiformis) and velvet bean (Stizolobium cinereum). In 300 seeds of jack bean and 300 seeds of velvet bean, the following characters were measured: length, major and minor diameters, and weight. Measures of central tendency and variability were calculated, and the hypotheses of equality of means and homogeneity of variances were then tested. Sample size was determined by resampling with replacement, using 10,000 samples. To estimate the mean length, major and minor diameters, and weight with a 95% confidence interval equal to 10% of the mean estimate, 117 and 66 seeds are sufficient for jack bean and velvet bean, respectively.
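The resampling procedure described (draw subsamples with replacement, find the smallest n whose 95% confidence interval shrinks to 10% of the mean) can be sketched as follows; `ci_halfwidth`, `required_n`, and the candidate grid are illustrative assumptions, not the authors' code:

```python
import random
import statistics

def ci_halfwidth(data, n, reps=2000, seed=7):
    """Relative 95% CI half-width of the mean for subsamples of size n,
    estimated by resampling with replacement (percentile method)."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(reps))
    lo, hi = means[int(0.025 * reps)], means[int(0.975 * reps) - 1]
    return (hi - lo) / (2 * statistics.fmean(data))

def required_n(data, target=0.10, candidates=range(10, 301, 10)):
    """Smallest candidate n whose relative half-width meets the target."""
    return next(n for n in candidates if ci_halfwidth(data, n) <= target)
```

Traits with higher relative variability (a larger coefficient of variation) demand larger n, which is why the two species needed 117 and 66 seeds respectively.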
Rooper, Christopher N.; Martin, Michael H.; Butler, John L.; Jones, Darin T.; Zimmermann, Mark
2012-01-01
Rockfish (Sebastes spp.) biomass is difficult to assess with standard bottom trawl or acoustic surveys because of their propensity to aggregate near the seafloor in highrelief areas that are inaccessible to sampling by trawling. We compared the ability of a remotely operated vehicle (ROV), a modified bottom trawl, and a stereo drop camera system (SDC) to identify rockfish species and estimate their size composition. The ability to discriminate species was highest for the bottom trawl...
Sample Size and Probability Threshold Considerations with the Tailored Data Method.
Wyse, Adam E
This article discusses sample size and probability threshold considerations in the use of the tailored data method with the Rasch model. In the tailored data method, one performs an initial Rasch analysis and then reanalyzes data after setting item responses to missing that are below a chosen probability threshold. A simple analytical formula is provided that can be used to check whether or not the application of the tailored data method with a chosen probability threshold will create situations in which the number of remaining item responses for the Rasch calibration will or will not meet minimum sample size requirements. The formula is illustrated using a real data example from a medical imaging licensure exam with several different probability thresholds. It is shown that as the probability threshold was increased more item responses were set to missing and the parameter standard errors and item difficulty estimates also tended to increase. It is suggested that some consideration should be given to the chosen probability threshold and how this interacts with potential examinee sample sizes and the accuracy of parameter estimates when calibrating data with the tailored data method.
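The article's analytical formula is not given here; a direct-count version of the same check (how many responses to an item survive a chosen probability threshold under the Rasch model) can be sketched as follows. The function names are illustrative:

```python
import math

def rasch_p(theta, b):
    """Rasch model probability of a correct response for ability theta
    on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def retained_n(thetas, b, threshold):
    """Examinees whose responses to an item of difficulty b are kept
    under the tailored-data rule (response retained only when the
    model probability is at least the threshold)."""
    return sum(rasch_p(t, b) >= threshold for t in thetas)
```

Raising the threshold sets more responses to missing, so `retained_n` falls and, as the abstract notes, parameter standard errors tend to grow; comparing `retained_n` against a minimum calibration sample size flags problematic thresholds before reanalysis.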
ESTIMATING SOIL PARTICLE-SIZE DISTRIBUTION FOR SICILIAN SOILS
Directory of Open Access Journals (Sweden)
Vincenzo Bagarello
2009-09-01
Full Text Available The soil particle-size distribution (PSD) is commonly used for soil classification and for estimating soil behavior. An accurate mathematical representation of the PSD is required to estimate soil hydraulic properties and to compare texture measurements from different classification systems. The objective of this study was to evaluate the ability of the Haverkamp and Parlange (HP) and Fredlund et al. (F) PSD models to fit 243 measured PSDs from a wide range of soil textures in Sicily and to test the effect of the number of measured particle diameters on the fitting of the theoretical PSD. For each soil textural class, the best fitting performance, established using three statistical indices (MXE, ME, RMSE), was obtained for the F model with three fitting parameters. In particular, this model performed better in the fine-textured soils than in the coarse-textured ones, but good performance (i.e., RMSE < 0.03) was detected for the majority of the investigated soil textural classes, i.e. the clay, silty-clay, silty-clay-loam, silt-loam, clay-loam, loamy-sand, and loam classes. Decreasing the number of measured data pairs from 14 to eight resulted in a worse fit of the theoretical distribution to the measured one. It was concluded that the F model with three fitting parameters has wide applicability for Sicilian soils and that the comparison of different PSD investigations can be affected by the number of measured data pairs.
Technical Note: Trend estimation from irregularly sampled, correlated data
Directory of Open Access Journals (Sweden)
T. von Clarmann
2010-07-01
Full Text Available Estimation of a trend of an atmospheric state variable is usually performed by fitting a linear regression line to a set of data of this variable sampled at different times. Often these data are irregularly sampled in space and time and clustered in the sense that error correlations among data points cause a similar error of data points sampled at similar times. Since this can affect the estimated trend, we suggest taking the full error covariance matrix of the data into account. Superimposed periodic variations can be jointly fitted in a straightforward manner, even if the shape of the periodic function is not known. Global data sets, particularly satellite data, can form the basis to estimate the error correlations. State-dependent amplitudes of superimposed periodic corrections result in a non-linear optimization problem which is solved iteratively.
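The core step, fitting a linear trend by generalized least squares with the full error covariance matrix of the data, can be sketched as below (the iterative treatment of state-dependent periodic amplitudes is omitted; the function name is illustrative):

```python
import numpy as np

def gls_trend(t, y, cov):
    """Straight-line trend fit y = a + b*t by generalized least squares,
    taking the full error covariance matrix `cov` of the data into
    account. Returns the coefficients (a, b) and their covariance."""
    X = np.column_stack([np.ones_like(t), t])
    Ci = np.linalg.inv(cov)
    A = X.T @ Ci @ X
    beta = np.linalg.solve(A, X.T @ Ci @ y)
    return beta, np.linalg.inv(A)
```

With a diagonal covariance this reduces to weighted least squares; off-diagonal terms down-weight clustered, correlated data points, which is exactly what protects the trend estimate from the clustering effect the abstract describes.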
Joint inversion of NMR and SIP data to estimate pore size distribution of geomaterials
Niu, Qifei; Zhang, Chi
2018-03-01
There is growing interest in using geophysical tools to characterize the microstructure of geomaterials because of their non-invasive nature and applicability in the field. In these applications, multiple types of geophysical data sets are usually processed separately, which may be inadequate to constrain the key features of the target variables. Simultaneous processing of multiple data sets could therefore potentially improve the resolution. In this study, we propose a method to estimate pore size distribution by joint inversion of nuclear magnetic resonance (NMR) T2 relaxation and spectral induced polarization (SIP) spectra. The petrophysical relation between NMR T2 relaxation time and SIP relaxation time is incorporated in a nonlinear least squares problem formulation, which is solved using the Gauss-Newton method. The joint inversion scheme is applied to a synthetic sample and a Berea sandstone sample. The jointly estimated pore size distributions are very close to the true model and to results from other experimental methods. Even when knowledge of the petrophysical models of the sample is incomplete, the joint inversion can still capture the main features of the pore size distribution of the samples, including the general shape and relative peak positions of the distribution curves. It is also found from the numerical example that the surface relaxivity of the sample can be extracted with the joint inversion of NMR and SIP data if the diffusion coefficient of the ions in the electrical double layer is known. Compared with individual inversions, the joint inversion could improve the resolution of the estimated pore size distribution because of the addition of extra data sets. The proposed approach might constitute a first step towards a comprehensive joint inversion that can extract the full pore geometry information of a geomaterial from NMR and SIP data.
40 CFR 761.243 - Standard wipe sample method and size.
2010-07-01
Natural Gas Pipeline: Selecting Sample Sites, Collecting Surface Samples, and Analyzing Standard PCB Wipe Samples, § 761.243 Standard wipe sample method and size. (a) Collect a surface sample from a natural gas...
Estimation of measurement uncertainty arising from manual sampling of fuels.
Theodorou, Dimitrios; Liapis, Nikolaos; Zannikos, Fanourios
2013-02-15
Sampling is an important part of any measurement process and is therefore recognized as an important contributor to the measurement uncertainty. A reliable estimation of the uncertainty arising from sampling of fuels leads to a better control of risks associated with decisions concerning whether product specifications are met or not. The present work describes and compares the results of three empirical statistical methodologies (classical ANOVA, robust ANOVA and range statistics) using data from a balanced experimental design, which includes duplicate samples analyzed in duplicate from 104 sampling targets (petroleum retail stations). These methodologies are used for the estimation of the uncertainty arising from the manual sampling of fuel (automotive diesel) and the subsequent sulfur mass content determination. The results of the three methodologies differ statistically, with the expanded uncertainty of sampling in the range of 0.34-0.40 mg kg(-1) and the relative expanded uncertainty in the range of 4.8-5.1%, depending on the methodology used. The robust ANOVA estimate (sampling expanded uncertainty of 0.34 mg kg(-1), or 4.8% in relative terms) is considered more reliable because of the presence of outliers within the 104 datasets used for the calculations. Robust ANOVA, in contrast to classical ANOVA and range statistics, accommodates outlying values, lessening their effects on the produced estimates. The results of this work also show that, in the case of manual sampling of fuels, the main contributor to the whole measurement uncertainty is the analytical measurement uncertainty, with the sampling uncertainty accounting for only 29% of the total measurement uncertainty. Copyright © 2012 Elsevier B.V. All rights reserved.
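The classical ANOVA variance split for this balanced design (two samples per target, each analysed in duplicate) can be sketched as follows; the function name and data layout are illustrative, and robust ANOVA would further down-weight outliers:

```python
import statistics

def sampling_uncertainty(targets):
    """Classical ANOVA variance components for a balanced duplicate
    design. targets: list of ((a1, a2), (b1, b2)) per sampling target,
    i.e. two samples each analysed in duplicate. Returns
    (s_sampling, s_analysis). The spread between the two sample means
    includes half the analytical variance, which is subtracted out
    (truncated at zero)."""
    d2_anal, d2_samp = [], []
    for (a1, a2), (b1, b2) in targets:
        d2_anal += [(a1 - a2) ** 2 / 2, (b1 - b2) ** 2 / 2]
        ma, mb = (a1 + a2) / 2, (b1 + b2) / 2
        d2_samp.append((ma - mb) ** 2 / 2)
    s2_anal = statistics.fmean(d2_anal)
    s2_samp = max(0.0, statistics.fmean(d2_samp) - s2_anal / 2)
    return s2_samp ** 0.5, s2_anal ** 0.5
```

The expanded uncertainty quoted in the abstract corresponds to multiplying `s_sampling` by a coverage factor of about 2.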
Guo, Jiin-Huarng; Luh, Wei-Ming
2009-05-01
When planning a study, sample size determination is one of the most important tasks facing the researcher. The size will depend on the purpose of the study, the cost limitations, and the nature of the data. By specifying the standard deviation ratio and/or the sample size ratio, the present study considers the problem of heterogeneous variances and non-normality for Yuen's two-group test and develops sample size formulas to minimize the total cost or maximize the power of the test. For a given power, the sample size allocation ratio can be manipulated so that the proposed formulas can minimize the total cost, the total sample size, or the sum of total sample size and total cost. On the other hand, for a given total cost, the optimum sample size allocation ratio can maximize the statistical power of the test. After the sample size is determined, the present simulation applies Yuen's test to the sample generated, and then the procedure is validated in terms of Type I errors and power. Simulation results show that the proposed formulas can control Type I errors and achieve the desired power under the various conditions specified. Finally, the implications for determining sample sizes in experimental studies and future research are discussed.
Sample size calculations for evaluating treatment policies in multi-stage designs.
Dawson, Ree; Lavori, Philip W
2010-12-01
Sequential multiple assignment randomized (SMAR) designs are used to evaluate treatment policies, also known as adaptive treatment strategies (ATS). The determination of SMAR sample sizes is challenging because of the sequential and adaptive nature of ATS, and the multi-stage randomized assignment used to evaluate them. We derive sample size formulae appropriate for the nested structure of successive SMAR randomizations. This nesting gives rise to ATS that have overlapping data, and hence between-strategy covariance. We focus on the case when covariance is substantial enough to reduce sample size through improved inferential efficiency. Our design calculations draw upon two distinct methodologies for SMAR trials, using the equality of the optimal semi-parametric and Bayesian predictive estimators of standard error. This 'hybrid' approach produces a generalization of the t-test power calculation that is carried out in terms of effect size and regression quantities familiar to the trialist. Simulation studies support the reasonableness of underlying assumptions as well as the adequacy of the approximation to between-strategy covariance when it is substantial. Investigation of the sensitivity of formulae to misspecification shows that the greatest influence is due to changes in effect size, which is an a priori clinical judgment on the part of the trialist. We have restricted simulation investigation to SMAR studies of two and three stages, although the methods are fully general in that they apply to 'K-stage' trials. Practical guidance is needed to allow the trialist to size a SMAR design using the derived methods. To this end, we define ATS to be 'distinct' when they differ by at least the (minimal) size of effect deemed to be clinically relevant. Simulation results suggest that the number of subjects needed to distinguish distinct strategies will be significantly reduced by adjustment for covariance only when small effects are of interest.
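The paper's hybrid SMAR formulas are not reproduced here; the underlying idea, that positive between-strategy covariance shrinks the required sample size relative to a plain two-sample comparison, can be sketched with a simple adjusted normal-approximation formula (an illustration under stated assumptions, not the derived method):

```python
import math
from statistics import NormalDist

def n_per_strategy(d, rho=0.0, alpha=0.05, power=0.8):
    """Per-strategy sample size to detect standardized effect d between
    two strategy means whose estimates share correlation rho because of
    overlapping data; rho = 0 recovers the usual two-sample formula."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (1 - rho) * (z / d) ** 2)
```

For d = 0.3 the unadjusted size is 175 per strategy, falling to 123 when the shared-data correlation is 0.3, which mirrors the abstract's point that adjustment matters mainly when covariance is substantial and effects are small.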
Robust Estimation of Diffusion-Optimized Ensembles for Enhanced Sampling
DEFF Research Database (Denmark)
Tian, Pengfei; Jónsson, Sigurdur Æ.; Ferkinghoff-Borg, Jesper
2014-01-01
The multicanonical, or flat-histogram, method is a common technique to improve the sampling efficiency of molecular simulations. The idea is that free-energy barriers in a simulation can be removed by simulating from a distribution where all values of a reaction coordinate are equally likely...... accurate estimates of the diffusion coefficient. Here, we present a simple, yet robust solution to this problem. Compared to current state-of-the-art procedures, the new estimation method requires an order of magnitude fewer data to obtain reliable estimates, thus broadening the potential scope in which...
Influence of Sample Size of Polymer Materials on Aging Characteristics in the Salt Fog Test
Otsubo, Masahisa; Anami, Naoya; Yamashita, Seiji; Honda, Chikahisa; Takenouchi, Osamu; Hashimoto, Yousuke
Polymer insulators have been used worldwide because of superior properties such as light weight, high mechanical strength and good hydrophobicity compared with porcelain insulators. In this paper, the effect of sample size on the aging characteristics in the salt fog test is examined. Leakage current was measured using a 100 MHz A/D board or a 100 MHz digital oscilloscope and separated into three components (conductive current, corona discharge current and dry-band arc discharge current) using FFT and a newly proposed current differential method. The cumulative charge of each component was estimated automatically by a personal computer. As a result, when the sample size increased under the same average applied electric field, the peak values of the leakage current and of each component current increased. In particular, the cumulative charge and arc length of the dry-band arc discharge increased remarkably with the increase of gap length.
An Efficient Estimator for the Expected Value of Sample Information.
Menzies, Nicolas A
2016-04-01
Conventional estimators for the expected value of sample information (EVSI) are computationally expensive or limited to specific analytic scenarios. I describe a novel approach that allows efficient EVSI computation for a wide range of study designs and is applicable to models of arbitrary complexity. The posterior parameter distribution produced by a hypothetical study is estimated by reweighting existing draws from the prior distribution. EVSI can then be estimated using a conventional probabilistic sensitivity analysis, with no further model evaluations and with a simple sequence of calculations (Algorithm 1). A refinement to this approach (Algorithm 2) uses smoothing techniques to improve accuracy. Algorithm performance was compared with the conventional EVSI estimator (2-level Monte Carlo integration) and an alternative developed by Brennan and Kharroubi (BK), in a cost-effectiveness case study. Compared with the conventional estimator, Algorithm 2 exhibited a root mean square error (RMSE) 8%-17% lower, with far fewer model evaluations (3-4 orders of magnitude). Algorithm 1 produced results similar to those of the conventional estimator when study evidence was weak but underestimated EVSI when study evidence was strong. Compared with the BK estimator, the proposed algorithms reduced RMSE by 18%-38% in most analytic scenarios, with 40 times fewer model evaluations. Algorithm 1 performed poorly in the context of strong study evidence. All methods were sensitive to the number of samples in the outer loop of the simulation. The proposed algorithms remove two major challenges for estimating EVSI: the difficulty of estimating the posterior parameter distribution given hypothetical study data, and the need for many model evaluations to obtain stable and unbiased results. These approaches make EVSI estimation feasible for a wide range of analytic scenarios. © The Author(s) 2015.
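A minimal sketch of the reweighting idea behind Algorithm 1, for the simple case of a binomial study informing a single probability parameter p: prior PSA draws are importance-weighted by the likelihood of each simulated dataset, so posterior expectations need no new model runs. The function name and two-strategy setup are illustrative assumptions:

```python
import random
import statistics

def evsi_reweight(p_draws, nb, n_study, n_sims=500, seed=1):
    """EVSI of a hypothetical binomial study of size n_study informing
    parameter p. p_draws: prior PSA draws of p; nb: function mapping p
    to a tuple of per-strategy net benefits. For each simulated dataset,
    posterior strategy values are formed by reweighting the prior draws
    with the binomial likelihood of the data."""
    rng = random.Random(seed)
    nbs = [nb(p) for p in p_draws]
    current = max(statistics.fmean(col) for col in zip(*nbs))
    n_strategies = len(nbs[0])
    gain = 0.0
    for _ in range(n_sims):
        p_true = rng.choice(p_draws)                            # "true" p
        x = sum(rng.random() < p_true for _ in range(n_study))  # study data
        w = [p ** x * (1 - p) ** (n_study - x) for p in p_draws]
        tot = sum(w)
        post = [sum(wi * nbi[s] for wi, nbi in zip(w, nbs)) / tot
                for s in range(n_strategies)]
        gain += max(post)
    return gain / n_sims - current
```

As the abstract warns, this plain reweighting degrades when study evidence is strong (few prior draws carry meaningful weight), which is what Algorithm 2's smoothing addresses.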
Comparing Server Energy Use and Efficiency Using Small Sample Sizes
Energy Technology Data Exchange (ETDEWEB)
Coles, Henry C.; Qin, Yong; Price, Phillip N.
2014-11-01
This report documents a demonstration that compared the energy consumption and efficiency of a limited sample size of server-type IT equipment from different manufacturers by measuring power at the server power supply power cords. The results are specific to the equipment and methods used; however, it is hoped that those responsible for IT equipment selection can use the methods described to choose models that optimize energy use efficiency. The demonstration was conducted in a data center at Lawrence Berkeley National Laboratory in Berkeley, California. It was performed with five servers of similar mechanical and electronic specifications: three from Intel and one each from Dell and Supermicro. Server IT equipment is constructed using commodity components, server manufacturer-designed assemblies, and control systems. Server compute efficiency is constrained by the commodity component specifications and integration requirements. The design freedom, outside of the commodity component constraints, provides room for the manufacturer to offer a product with competitive efficiency that meets market needs at a compelling price. A goal of the demonstration was to compare and quantify the server efficiency for three different brands. The efficiency is defined as the average compute rate (computations per unit of time) divided by the average energy consumption rate. The research team used an industry standard benchmark software package to provide a repeatable software load to obtain the compute rate and provide a variety of power consumption levels. Energy use when the servers were in an idle state (not providing computing work) was also measured. At high server compute loads, all brands, using the same key components (processors and memory), had similar results; therefore, from these results, it could not be concluded that one brand is more efficient than the other brands. The test results show that the power consumption variability caused by the key components as a
Determination of reference limits: statistical concepts and tools for sample size calculation.
Wellek, Stefan; Lackner, Karl J; Jennen-Steinmetz, Christine; Reinhard, Iris; Hoffmann, Isabell; Blettner, Maria
2014-12-01
Reference limits are estimators for 'extreme' percentiles of the distribution of a quantitative diagnostic marker in the healthy population. In most cases, interest will be in the 90% or 95% reference intervals. The standard parametric method of determining reference limits consists of computing quantities of the form X̅±c·S. The proportion of covered values in the underlying population coincides with the specificity obtained when a measurement value falling outside the corresponding reference region is classified as diagnostically suspect. Nonparametrically, reference limits are estimated by means of so-called order statistics. In both approaches, the precision of the estimate depends on the sample size. We present computational procedures for calculating minimally required numbers of subjects to be enrolled in a reference study. The much more sophisticated concept of reference bands replacing statistical reference intervals in case of age-dependent diagnostic markers is also discussed.
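Both estimation routes mentioned above, the parametric interval of the form x̄ ± c·S and the nonparametric one based on order statistics, are easy to sketch. The rank rule below (ceil(q·(n+1))) is one common convention, not necessarily the paper's:

```python
import math
import statistics

def reference_limits(values, c=1.96):
    """95% reference interval two ways: parametric (mean ± c*SD) and
    nonparametric, via the order statistics at the ranks closest to the
    2.5th and 97.5th percentiles (rank rule ceil(q*(n+1)))."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    srt = sorted(values)
    n = len(srt)
    lo = srt[max(0, math.ceil(0.025 * (n + 1)) - 1)]
    hi = srt[min(n - 1, math.ceil(0.975 * (n + 1)) - 1)]
    return (m - c * s, m + c * s), (lo, hi)
```

Because the nonparametric limits sit on extreme order statistics, their precision is driven almost entirely by n, which is exactly why minimally required numbers of reference subjects must be computed in advance.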
Shieh, Gwowen
2013-01-01
The a priori determination of a proper sample size necessary to achieve some specified power is an important problem encountered frequently in practical studies. To establish the needed sample size for a two-sample "t" test, researchers may conduct the power analysis by specifying scientifically important values as the underlying population means…
Small-sample robust estimators of noncentrality-based and incremental model fit
Boomsma, Anne; Herzog, W.
2009-01-01
Traditional estimators of fit measures based on the noncentral chi-square distribution (root mean square error of approximation [RMSEA], Steiger's , etc.) tend to overreject acceptable models when the sample size is small. To handle this problem, it is proposed to employ Bartlett's (1950), Yuan's
Precise, unbiased estimates of population size are an essential tool for fisheries management. For a wide variety of salmonid fishes, redd counts from a sample of reaches are commonly used to monitor annual trends in abundance. Using a 9-year time series of georeferenced censuses...
The role of the upper sample size limit in two-stage bioequivalence designs.
Karalis, Vangelis
2013-11-01
Two-stage designs (TSDs) are currently recommended by the regulatory authorities for bioequivalence (BE) assessment. The TSDs presented until now rely on an assumed geometric mean ratio (GMR) value of the BE metric in stage I in order to avoid inflation of type I error. In contrast, this work proposes a more realistic TSD design where sample re-estimation relies not only on the variability of stage I, but also on the observed GMR. In these cases, an upper sample size limit (UL) is introduced in order to prevent inflation of type I error. The aim of this study is to unveil the impact of UL on two TSD bioequivalence approaches which are based entirely on the interim results. Monte Carlo simulations were used to investigate several different scenarios of UL levels, within-subject variability, different starting number of subjects, and GMR. The use of UL leads to no inflation of type I error. As UL values increase, the % probability of declaring BE becomes higher. The starting sample size and the variability of the study affect type I error. Increased UL levels result in higher total sample sizes of the TSD which are more pronounced for highly variable drugs. Copyright © 2013 Elsevier B.V. All rights reserved.
Sample size allocation for food item radiation monitoring and safety inspection.
Seto, Mayumi; Uriu, Koichiro
2015-03-01
The objective of this study is to identify a procedure for determining sample size allocation for food radiation inspections of more than one food item to minimize the potential risk to consumers of internal radiation exposure. We consider a simplified case of food radiation monitoring and safety inspection in which a risk manager is required to monitor two food items, milk and spinach, in a contaminated area. Three protocols for food radiation monitoring with different sample size allocations were assessed by simulating random sampling and inspections of milk and spinach in a conceptual monitoring site. Distributions of (131)I and radiocesium concentrations were determined in reference to (131)I and radiocesium concentrations detected in Fukushima prefecture, Japan, for March and April 2011. The results of the simulations suggested that a protocol that allocates sample size to milk and spinach based on the estimation of (131)I and radiocesium concentrations using the apparent decay rate constants sequentially calculated from past monitoring data can most effectively minimize the potential risks of internal radiation exposure. © 2014 Society for Risk Analysis.
Tipton, Elizabeth; Pustejovsky, James E.
2015-01-01
Meta-analyses often include studies that report multiple effect sizes based on a common pool of subjects or that report effect sizes from several samples that were treated with very similar research protocols. The inclusion of such studies introduces dependence among the effect size estimates. When the number of studies is large, robust variance…
Estimating abundance of mountain lions from unstructured spatial sampling
Russell, Robin E.; Royle, J. Andrew; Desimone, Richard; Schwartz, Michael K.; Edwards, Victoria L.; Pilgrim, Kristy P.; Mckelvey, Kevin S.
2012-01-01
Mountain lions (Puma concolor) are often difficult to monitor because of their low capture probabilities, extensive movements, and large territories. Methods for estimating the abundance of this species are needed to assess population status, determine harvest levels, evaluate the impacts of management actions on populations, and derive conservation and management strategies. Traditional mark–recapture methods do not explicitly account for differences in individual capture probabilities due to the spatial distribution of individuals in relation to survey effort (or trap locations). However, recent advances in the analysis of capture–recapture data have produced methods estimating abundance and density of animals from spatially explicit capture–recapture data that account for heterogeneity in capture probabilities due to the spatial organization of individuals and traps. We adapt recently developed spatial capture–recapture models to estimate density and abundance of mountain lions in western Montana. Volunteers and state agency personnel collected mountain lion DNA samples in portions of the Blackfoot drainage (7,908 km2) in west-central Montana using 2 methods: snow back-tracking mountain lion tracks to collect hair samples and biopsy darting treed mountain lions to obtain tissue samples. Overall, we recorded 72 individual capture events, including captures both with and without tissue sample collection and hair samples, resulting in the identification of 50 individual mountain lions (30 females, 19 males, and 1 unknown sex individual). We estimated lion densities from 8 models containing effects of distance, sex, and survey effort on detection probability. Our population density estimates ranged from a minimum of 3.7 mountain lions/100 km2 (95% CI 2.3–5.7) under the distance only model (including only an effect of distance on detection probability) to 6.7 (95% CI 3.1–11.0) under the full model (including effects of distance, sex, survey effort, and
Generalized sample size determination formulas for experimental research with hierarchical data.
Usami, Satoshi
2014-06-01
Hierarchical data sets arise when the data for lower units (e.g., individuals such as students, clients, and citizens) are nested within higher units (e.g., groups such as classes, hospitals, and regions). In data collection for experimental research, estimating the required sample size beforehand is a fundamental question for obtaining sufficient statistical power and precision of the focused parameters. The present research extends previous research from Heo and Leon (2008) and Usami (2011b) by deriving closed-form formulas for determining the required sample size to test effects in experimental research with hierarchical data, focusing on both multisite-randomized trials (MRTs) and cluster-randomized trials (CRTs). These formulas consider both statistical power and the width of the confidence interval of a standardized effect size, on the basis of estimates from a random-intercept model for three-level data that considers both balanced and unbalanced designs. The formulas also yield some important results, such as lower bounds on the number of units needed at the highest level.
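The paper's closed-form three-level formulas are not reproduced here, but the underlying logic, inflating an individual-randomization sample size by a design effect, can be sketched for the simpler two-level cluster-randomized case. Function name and default values are illustrative assumptions:

```python
import math
from statistics import NormalDist

def crt_total_n(delta, icc, cluster_size, alpha=0.05, power=0.8):
    """Total subjects for a two-arm cluster-randomized trial detecting a
    standardized effect `delta`: the sample size under individual
    randomization is inflated by the design effect 1 + (m - 1) * ICC."""
    z = NormalDist().inv_cdf
    # per-arm n if individuals (not clusters) were randomized
    n_ind = 2 * ((z(1 - alpha / 2) + z(power)) / delta) ** 2
    deff = 1 + (cluster_size - 1) * icc
    return 2 * math.ceil(n_ind * deff)
```

Even a small intraclass correlation matters: with clusters of 20 and ICC = 0.05, the required total nearly doubles relative to ICC = 0.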
Bhandari, Mohit; Tornetta, Paul; Rampersad, Shelly-Ann; Sprague, Sheila; Heels-Ansdell, Diane; Sanders, David W.; Schemitsch, Emil H.; Swiontkowski, Marc; Walter, Stephen; Guyatt, Gordon; Buckingham, Lisa; Leece, Pamela; Viveiros, Helena; Mignott, Tashay; Ansell, Natalie; Sidorkewicz, Natalie; Agel, Julie; Bombardier, Claire; Berlin, Jesse A.; Bosse, Michael; Browner, Bruce; Gillespie, Brenda; O'Brien, Peter; Poolman, Rudolf; Macleod, Mark D.; Carey, Timothy; Leitch, Kellie; Bailey, Stuart; Gurr, Kevin; Konito, Ken; Bartha, Charlene; Low, Isolina; MacBean, Leila V.; Ramu, Mala; Reiber, Susan; Strapp, Ruth; Tieszer, Christina; Kreder, Hans; Stephen, David J. G.; Axelrod, Terry S.; Yee, Albert J. M.; Richards, Robin R.; Finkelstein, Joel; Holtby, Richard M.; Cameron, Hugh; Cameron, John; Gofton, Wade; Murnaghan, John; Schatztker, Joseph; Bulmer, Beverly; Conlan, Lisa; Laflamme, Yves; Berry, Gregory; Beaumont, Pierre; Ranger, Pierre; Laflamme, Georges-Henri; Jodoin, Alain; Renaud, Eric; Gagnon, Sylvain; Maurais, Gilles; Malo, Michel; Fernandes, Julio; Latendresse, Kim; Poirier, Marie-France; Daigneault, Gina; McKee, Michael M.; Waddell, James P.; Bogoch, Earl R.; Daniels, Timothy R.; McBroom, Robert R.; Vicente, Milena R.; Storey, Wendy; Wild, Lisa M.; McCormack, Robert; Perey, Bertrand; Goetz, Thomas J.; Pate, Graham; Penner, Murray J.; Panagiotopoulos, Kostas; Pirani, Shafique; Dommisse, Ian G.; Loomer, Richard L.; Stone, Trevor; Moon, Karyn; Zomar, Mauri; Webb, Lawrence X.; Teasdall, Robert D.; Birkedal, John Peter; Martin, David Franklin; Ruch, David S.; Kilgus, Douglas J.; Pollock, David C.; Harris, Mitchel Brion; Wiesler, Ethan Ron; Ward, William G.; Shilt, Jeffrey Scott; Koman, Andrew L.; Poehling, Gary G.; Kulp, Brenda; Creevy, William R.; Stein, Andrew B.; Bono, Christopher T.; Einhorn, Thomas A.; Brown, T. 
Desmond; Pacicca, Donna; Sledge, John B.; Foster, Timothy E.; Voloshin, Ilva; Bolton, Jill; Carlisle, Hope; Shaughnessy, Lisa; Ombremsky, William T.; LeCroy, C. Michael; Meinberg, Eric G.; Messer, Terry M.; Craig, William L.; Dirschl, Douglas R.; Caudle, Robert; Harris, Tim; Elhert, Kurt; Hage, William; Jones, Robert; Piedrahita, Luis; Schricker, Paul O.; Driver, Robin; Godwin, Jean; Hansley, Gloria; Obremskey, William Todd; Kregor, Philip James; Tennent, Gregory; Truchan, Lisa M.; Sciadini, Marcus; Shuler, Franklin D.; Driver, Robin E.; Nading, Mary Alice; Neiderstadt, Jacky; Vap, Alexander R.; Vallier, Heather A.; Patterson, Brendan M.; Wilber, John H.; Wilber, Roger G.; Sontich, John K.; Moore, Timothy Alan; Brady, Drew; Cooperman, Daniel R.; Davis, John A.; Cureton, Beth Ann; Mandel, Scott; Orr, R. Douglas; Sadler, John T. S.; Hussain, Tousief; Rajaratnam, Krishan; Petrisor, Bradley; Drew, Brian; Bednar, Drew A.; Kwok, Desmond C. H.; Pettit, Shirley; Hancock, Jill; Cole, Peter A.; Smith, Joel J.; Brown, Gregory A.; Lange, Thomas A.; Stark, John G.; Levy, Bruce; Swiontkowski, Marc F.; Garaghty, Mary J.; Salzman, Joshua G.; Schutte, Carol A.; Tastad, Linda Toddie; Vang, Sandy; Seligson, David; Roberts, Craig S.; Malkani, Arthur L.; Sanders, Laura; Gregory, Sharon Allen; Dyer, Carmen; Heinsen, Jessica; Smith, Langan; Madanagopal, Sudhakar; Coupe, Kevin J.; Tucker, Jeffrey J.; Criswell, Allen R.; Buckle, Rosemary; Rechter, Alan Jeffrey; Sheth, Dhiren Shaskikant; Urquart, Brad; Trotscher, Thea; Anders, Mark J.; Kowalski, Joseph M.; Fineberg, Marc S.; Bone, Lawrence B.; Phillips, Matthew J.; Rohrbacher, Bernard; Stegemann, Philip; Mihalko, William M.; Buyea, Cathy; Augustine, Stephen J.; Jackson, William Thomas; Solis, Gregory; Ero, Sunday U.; Segina, Daniel N.; Berrey, Hudson B.; Agnew, Samuel G.; Fitzpatrick, Michael; Campbell, Lakina C.; Derting, Lynn; McAdams, June; Goslings, J. 
Carel; Ponsen, Kees Jan; Luitse, Jan; Kloen, Peter; Joosse, Pieter; Winkelhagen, Jasper; Duivenvoorden, Raphaël; Teague, David C.; Davey, Joseph; Sullivan, J. Andy; Ertl, William J. J.; Puckett, Timothy A.; Pasque, Charles B.; Tompkins, John F.; Gruel, Curtis R.; Kammerlocher, Paul; Lehman, Thomas P.; Puffinbarger, William R.; Carl, Kathy L.; Weber, Donald W.; Jomha, Nadr M.; Goplen, Gordon R.; Masson, Edward; Beaupre, Lauren A.; Greaves, Karen E.; Schaump, Lori N.; Jeray, Kyle J.; Goetz, David R.; Westberry, Davd E.; Broderick, J. Scott; Moon, Bryan S.; Tanner, Stephanie L.; Powell, James N.; Buckley, Richard E.; Elves, Leslie; Connolly, Stephen; Abraham, Edward P.; Eastwood, Donna; Steele, Trudy; Ellis, Thomas; Herzberg, Alex; Brown, George A.; Crawford, Dennis E.; Hart, Robert; Hayden, James; Orfaly, Robert M.; Vigland, Theodore; Vivekaraj, Maharani; Bundy, Gina L.; Miclau, Theodore; Matityahu, Amir; Coughlin, R. Richard; Kandemir, Utku; McClellan, R. Trigg; Lin, Cindy Hsin-Hua; Karges, David; Cramer, Kathryn; Watson, J. Tracy; Moed, Berton; Scott, Barbara; Beck, Dennis J.; Orth, Carolyn; Puskas, David; Clark, Russell; Jones, Jennifer; Egol, Kenneth A.; Paksima, Nader; France, Monet; Wai, Eugene K.; Johnson, Garth; Wilkinson, Ross; Gruszczynski, Adam T.; Vexler, Liisa
2013-01-01
Inadequate sample size and power in randomized trials can result in misleading findings. This study demonstrates the effect of sample size in a large clinical trial by evaluating the results of the Study to Prospectively evaluate Reamed Intramedullary Nails in Patients with Tibial fractures (SPRINT)
Estimation of river and stream temperature trends under haphazard sampling
Gray, Brian R.; Lyubchich, Vyacheslav; Gel, Yulia R.; Rogala, James T.; Robertson, Dale M.; Wei, Xiaoqiao
2015-01-01
Long-term temporal trends in water temperature in rivers and streams are typically estimated under the assumption of evenly-spaced space-time measurements. However, sampling times and dates associated with historical water temperature datasets and some sampling designs may be haphazard. As a result, trends in temperature may be confounded with trends in time or space of sampling which, in turn, may yield biased trend estimators and thus unreliable conclusions. We address this concern using multilevel (hierarchical) linear models, where time effects are allowed to vary randomly by day and date effects by year. We evaluate the proposed approach by Monte Carlo simulations with imbalance, sparse data and confounding by trend in time and date of sampling. Simulation results indicate unbiased trend estimators while results from a case study of temperature data from the Illinois River, USA conform to river thermal assumptions. We also propose a new nonparametric bootstrap inference on multilevel models that allows for a relatively flexible and distribution-free quantification of uncertainties. The proposed multilevel modeling approach may be elaborated to accommodate nonlinearities within days and years when sampling times or dates typically span temperature extremes.
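A simplified, hypothetical illustration of the bootstrap idea: resampling whole years (the higher-level units) with replacement preserves within-year dependence when building a confidence interval for a linear trend slope. This sketches the general cluster-bootstrap principle, not the paper's exact multilevel procedure, and the data are simulated:

```python
import random
from statistics import mean

def slope(pairs):
    """Ordinary least-squares slope of y on x."""
    xbar = mean(x for x, _ in pairs)
    ybar = mean(y for _, y in pairs)
    num = sum((x - xbar) * (y - ybar) for x, y in pairs)
    den = sum((x - xbar) ** 2 for x, _ in pairs)
    return num / den

def cluster_bootstrap_slope(by_year, n_boot=2000, seed=1):
    """Percentile CI for the trend slope, resampling whole years so the
    within-year dependence of readings is preserved."""
    rng = random.Random(seed)
    years = list(by_year)
    boots = []
    for _ in range(n_boot):
        sample = [obs for y in rng.choices(years, k=len(years))
                  for obs in by_year[y]]
        boots.append(slope(sample))
    boots.sort()
    return boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot)]

# Hypothetical data: (year, temperature) readings with a 0.1 deg/yr trend.
rng = random.Random(0)
by_year = {y: [(y, 10 + 0.1 * y + rng.gauss(0, 0.5)) for _ in range(6)]
           for y in range(20)}
lo, hi = cluster_bootstrap_slope(by_year)
```

Haphazard sampling would enter this picture as unequal and irregular numbers of observations per year, which the resampling scheme tolerates naturally.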
DEFF Research Database (Denmark)
Jimenez Mena, Belen; Verrier, Etienne; Hospital, Frederic
We performed a simulation study of several estimators of effective population size (Ne): NeH, based on the rate of decrease in heterozygosity; NeT, based on the temporal method; and NeLD, a linkage disequilibrium-based method. We first focused on NeH, which presented an increase in the variability of values over time. The distance from the mean and the median to the true Ne also increased over time. This was caused by the fixation of alleles through time due to genetic drift and the changes in the distribution of allele frequencies. We compared the three estimators of Ne under scenarios of 3 and 20 bi-allelic loci. Increasing the number of loci largely improved the performance of NeT and NeLD. We highlight the value of NeT and NeLD when large numbers of bi-allelic loci are available, as is nowadays the case for SNP markers.
A Method of Selecting the Block Size of BMM for Estimating Extreme Loads in Engineering Vehicles
Directory of Open Access Journals (Sweden)
Jixin Wang
2016-01-01
Full Text Available Extreme loads have a significant effect on the fatigue damage of components. The block maximum method (BMM) is widely used to estimate extreme values in various fields. Selecting a reasonable block size for BMM is crucial to ensure that proper extreme values are extracted to form the extreme sample. To address this issue, this study proposes a comprehensive evaluation approach based on the multiple-criteria decision making (MCDM) method to select a proper block size. A wheel loader with six sections in one operating cycle is used as an example. First, the spading sections of each operating cycle are extracted and connected, since extreme loads often occur in that section. The extreme sample is then obtained by BMM for fitting the generalized extreme value (GEV) distribution. The Kolmogorov-Smirnov (K-S) test, Pearson's Chi-Square (χ2) test, and the average deviation in the probability distribution function (PDF) are selected as fitting tests. Comprehensive weights are calculated by the maximum entropy principle. Finally, the optimal block size, corresponding to the minimum comprehensive evaluation indicator, is obtained and exhibits a good fit. The proposed method can be flexibly applied in various situations to select a block size.
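The block-maxima extraction step itself is straightforward: split the load history into consecutive blocks and keep each block's maximum. A minimal sketch (the paper's MCDM-based selection of the block size is not reproduced):

```python
def block_maxima(series, block_size):
    """Split a load history into consecutive blocks of `block_size`
    samples and keep each block's maximum; a trailing partial block
    is discarded."""
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]

loads = [3, 7, 2, 9, 4, 4, 8, 1, 6, 5]
maxima = block_maxima(loads, 5)  # two blocks of five samples each
```

The resulting maxima are what gets fitted to the GEV distribution; the block size controls the trade-off between sample size (many small blocks) and how well each maximum approximates a true extreme (few large blocks).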
Prasifka, Jarrad R; Lopez, Miriam D; Hellmich, Richard L; Prasifka, Patricia L
2008-01-01
Estimates of arthropod population size may paradoxically increase following insecticide applications. Research with ground beetles (Coleoptera: Carabidae) suggests that such unusual results reflect increased arthropod movement and capture in traps rather than real changes in population size. However, it is unclear whether direct (hyperactivity) or indirect (prey-mediated) mechanisms produce increased movement. Video tracking of Scarites quadriceps Chaudoir indicated that brief exposure to lambda-cyhalothrin or tefluthrin increased total distance moved, maximum velocity, and percentage of time moving. Repeated measurements on individual beetles indicated that movement decreased 240 min after initial lambda-cyhalothrin exposure, but increased again following a second exposure, suggesting hyperactivity could lead to increased trap captures in the field. Two field experiments in which ground beetles were collected after lambda-cyhalothrin or permethrin application attempted to detect increases in population size estimates as a result of hyperactivity. Field trials used mark-release-recapture methods in small plots and natural carabid populations in larger plots, but found no significant short-term increases in population size estimates. The absence of an effect in the field suggests mechanisms other than hyperactivity may better explain unusual changes in population size estimates. When traps are used as a primary sampling tool, unexpected population-level effects should be interpreted carefully or with additional data less influenced by arthropod activity.
PHENIX: An R package to estimate a size-controlled phenotypic integration index.
Torices, Rubén; Muñoz-Pajares, A Jesús
2015-05-01
Organisms usually show intercorrelations between all or some of their components leading to phenotypic integration, which may have deep consequences on the evolution of phenotypes. One of the main difficulties with phenotypic integration studies is how to correct the integration measures for size. This has been considered a challenging task. In this paper, we introduce an R package (PHENIX: PHENotypic Integration indeX), in which we provide functions to estimate a size-controlled phenotypic integration index, a bootstrapping method to calculate confidence intervals, and a randomization method to simulate null distributions and test the statistical significance of the integration. PHENIX is an open source package written in R. As usual for R packages, the manual and sample data are available at: http://cran.r-project.org/web/packages/PHENIX/index.html. Functions included in this package easily estimate phenotypic integration by controlling a third variable (e.g., the size of the studied organ). PHENIX helps to estimate and test the statistical significance of the magnitude of integration using one of the most-used methodological approaches, while taking size into account.
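PHENIX itself is an R package; as a language-neutral illustration of the underlying idea of controlling an association for a third variable such as organ size, the first-order partial correlation can be sketched in Python. This is the generic textbook formula, not the package's code:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation between two trait vectors."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of traits x and y, controlling
    for a third variable z (e.g., z = size of the studied organ)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

If the raw correlation between two traits is entirely explained by their shared dependence on size (r_xy = r_xz * r_yz), the partial correlation is zero, which is exactly the effect a size-controlled integration index is after.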
Estimation of monosaccharide radioactivity in biological samples through osazone derivatization
Energy Technology Data Exchange (ETDEWEB)
Garcia, F.J.; Pons, A.; Alemany, M.; Palou, A.
1982-03-01
A method for the quantitative estimation of radioactivity in the glucose (monosaccharide) fraction of biological samples is presented. Cold glucose is added to the radioactive samples, and one aliquot receives a known amount of radioactive glucose as an internal standard. After controlled osazone formation and three washings of the yellow precipitate, the osazones are dissolved, decolored, and their radioactivity determined through scintillation counting. The overall efficiency of recovery is 23-24% of the initial radioactivity. Each sample is corrected by the recovery of its own internal standard. There is a very close linear relationship between radioactivity present in the samples and radioactivity found, despite the use of different biological samples (rat plasma, hen egg yolk and albumen).
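The internal-standard correction described here is simple arithmetic: scale each sample's measured counts by the recovery fraction of its own standard. A sketch with made-up numbers (the dpm values are hypothetical, only the 23-24% recovery range comes from the abstract):

```python
def corrected_activity(measured_dpm, standard_added_dpm, standard_recovered_dpm):
    """Correct a sample's measured radioactivity by the recovery of its
    own internal standard (recovered / added fraction)."""
    recovery = standard_recovered_dpm / standard_added_dpm
    return measured_dpm / recovery

# A 23.5% recovery (within the reported 23-24% range) scales the
# measured 470 dpm back up to an estimated 2000 dpm originally present.
estimate = corrected_activity(470.0, 1000.0, 235.0)
```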
Size variation in samples of fossil and recent murid teeth
Freudenthal, M.; Martín Suárez, E.
1990-01-01
The variability coefficient proposed by Freudenthal & Cuenca Bescós (1984) for samples of fossil cricetid teeth is calculated for about 200 samples of fossil and recent murid teeth. The results are discussed and compared with those obtained for the Cricetidae.
Forest inventory using multistage sampling with probability proportional to size. [Brazil
Parada, N. D. J. (Principal Investigator); Lee, D. C. L.; Hernandezfilho, P.; Shimabukuro, Y. E.; Deassis, O. R.; Demedeiros, J. S.
1984-01-01
A multistage sampling technique, with probability proportional to size, for forest volume inventory using remote sensing data is developed and evaluated. The study area is located in southeastern Brazil. The LANDSAT 4 digital data of the study area are used in the first stage for automatic classification of reforested areas. Four classes of pine and eucalypt with different tree volumes are classified utilizing a maximum likelihood classification algorithm. Color infrared aerial photographs are utilized in the second stage of sampling. In the third stage (ground level) the timber volume of each class is determined. The total timber volume of each class is expanded through a statistical procedure taking into account all three stages of sampling. This procedure results in an accurate timber volume estimate with a smaller number of aerial photographs and reduced time in field work.
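The expansion across stages rests on probability-proportional-to-size estimation. A minimal single-stage sketch of the Hansen-Hurwitz form, with hypothetical stand volumes and selection probabilities (the paper's full multistage expansion is not reproduced):

```python
def hansen_hurwitz_total(samples):
    """Hansen-Hurwitz estimator of a population total from units drawn
    with replacement, with probability proportional to size: average
    the y_i / p_i expansions over the sampled units."""
    return sum(y / p for y, p in samples) / len(samples)

# Hypothetical: two sampled stands as (timber volume, selection probability).
total = hansen_hurwitz_total([(120.0, 0.02), (90.0, 0.015)])
```

Each sampled unit's volume is divided by its selection probability, so large stands (which are more likely to be drawn) are not over-counted in the expanded total.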
SHEAR: sample heterogeneity estimation and assembly by reference.
Landman, Sean R; Hwang, Tae Hyun; Silverstein, Kevin A T; Li, Yingming; Dehm, Scott M; Steinbach, Michael; Kumar, Vipin
2014-01-29
Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often have a heterogeneous mix of various subpopulations with different variants, making assembly extremely difficult using existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR), a tool that predicts SVs, accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis. By making use of structural variant detection algorithms, SHEAR offers improved performance in the form of a stronger ability to handle difficult structural variant types and better computational efficiency. We compare against the leading competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR is shown to successfully estimate heterogeneity percentages in both cases, and demonstrates an improved efficiency and better ability to handle tandem duplications. SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.
Software Functional Size: For Cost Estimation and More
Ozkan, Baris; Turetken, Oktay; Demirors, Onur
Determining software characteristics that will effectively support project planning, execution, monitoring and closure remains one of the prevalent challenges software project managers face. Functional size measures were introduced to quantify one of the primary characteristics of software. Although functional size measurement methods have not been without criticisms, they hold significant promise for software project management. In this paper, we explore the contributions of functional size measurement to project management. We identified diverse uses of functional size by performing a literature survey, and investigated how functional size measurement can be incorporated into project management practices by mapping the uses of functional size to the knowledge areas defined in the project management body of knowledge (PMBOK).
Condori-Fernandez, Nelly; Daneva, Maia; Buglione, Luigi; Ormandjieva, Olga; Ormandjieva, O.; Constantinides, C.; Abran, A.; Lee, R.
2010-01-01
This paper reports on an experiment that investigates the predictability of software project size from software product size. The predictability research problem is analyzed at the stage of early requirements by accounting the size of functional requirements as well as the size of non-functional
Sample Size for the "Z" Test and Its Confidence Interval
Liu, Xiaofeng Steven
2012-01-01
The statistical power of a significance test is closely related to the length of the confidence interval (i.e. estimate precision). In the case of a "Z" test, the length of the confidence interval can be expressed as a function of the statistical power. (Contains 1 figure and 1 table.)
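The two quantities the abstract relates, power and confidence interval length, can be computed side by side. A sketch for the one-sample z test using the standard textbook formulas (not necessarily the paper's exact expressions):

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf

def n_for_power(delta, sigma, alpha=0.05, power=0.8):
    """Sample size so a one-sample z test detects a true mean shift
    `delta` with the requested power."""
    return math.ceil(((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)

def ci_half_width(sigma, n, alpha=0.05):
    """Half-length of the (1 - alpha) confidence interval for the mean."""
    return z(1 - alpha / 2) * sigma / math.sqrt(n)
```

With n chosen for 80% power against delta = 0.5, the resulting interval half-width comes out below delta, illustrating the link between power and estimate precision.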
Steinsbekk, Silje; Klöckner, Christian A; Fildes, Alison; Kristoffersen, Pernille; Rognsås, Stine L; Wichstrøm, Lars
2017-01-01
Individuals who are overweight are more likely to underestimate their body size than those who are normal weight, and overweight underestimators are less likely to engage in weight loss efforts. Underestimation of body size might represent a barrier to prevention and treatment of overweight; thus insight into how underestimation of body size develops and tracks through the childhood years is needed. The aim of the present study was therefore to examine stability in children's underestimation of body size, exploring predictors of underestimation over time. The prospective path from underestimation to BMI was also tested. In a Norwegian cohort of 6 year olds, followed up at ages 8 and 10 (analysis sample: n = 793) body size estimation was captured by the Children's Body Image Scale, height and weight were measured and BMI calculated. Overall, children were more likely to underestimate than overestimate their body size. Individual stability in underestimation was modest, but significant. Higher BMI predicted future underestimation, even when previous underestimation was adjusted for, but there was no evidence for the opposite direction of influence. Boys were more likely than girls to underestimate their body size at ages 8 and 10 (age 8: 38.0% vs. 24.1%; Age 10: 57.9% vs. 30.8%) and showed a steeper increase in underestimation with age compared to girls. In conclusion, the majority of 6, 8, and 10-year olds correctly estimate their body size (prevalence ranging from 40 to 70% depending on age and gender), although a substantial portion perceived themselves to be thinner than they actually were. Higher BMI forecasted future underestimation, but underestimation did not increase the risk for excessive weight gain in middle childhood.
Sample size calculations for randomised trials including both independent and paired data.
Yelland, Lisa N; Sullivan, Thomas R; Price, David J; Lee, Katherine J
2017-04-15
Randomised trials including a mixture of independent and paired data arise in many areas of health research, yet methods for determining the sample size for such trials are lacking. We derive design effects algebraically assuming clustering because of paired data will be taken into account in the analysis using generalised estimating equations with either an independence or exchangeable working correlation structure. Continuous and binary outcomes are considered, along with three different methods of randomisation: cluster randomisation, individual randomisation and randomisation to opposite treatment groups. The design effect is shown to depend on the intracluster correlation coefficient, proportion of observations belonging to a pair, working correlation structure, type of outcome and method of randomisation. The derived design effects are validated through simulation and example calculations are presented to illustrate their use in sample size planning. These design effects will enable appropriate sample size calculations to be performed for future randomised trials including both independent and paired data. Copyright © 2017 John Wiley & Sons, Ltd.
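The derived design effects themselves are not reproduced here; as an illustration of how such a design effect is used in planning, the familiar 1 + (m̄ - 1)ρ form can be applied with pairs treated as clusters of two and singletons as clusters of one. This is an approximation in the spirit of the paper, not its algebra:

```python
import math

def design_effect(prop_paired, icc):
    """Illustrative design effect when a proportion `prop_paired` of
    observations come in pairs (clusters of two) and the rest are
    singletons: 1 + (m_bar - 1) * icc, with m_bar the average cluster
    size. This approximates, but is not, the paper's derived effects."""
    clusters_per_obs = prop_paired / 2 + (1 - prop_paired)
    m_bar = 1.0 / clusters_per_obs
    return 1 + (m_bar - 1) * icc

def inflate_n(n_independent, prop_paired, icc):
    """Inflate a sample size computed for fully independent data."""
    return math.ceil(n_independent * design_effect(prop_paired, icc))
```

With no pairs the design effect is 1 (no inflation); with all observations paired it reduces to the usual 1 + ρ for clusters of two.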
A Bayesian adaptive blinded sample size adjustment method for risk differences.
Hartley, Andrew Montgomery
2015-01-01
Adaptive sample size adjustment (SSA) for clinical trials consists of examining early subsets of trial data to adjust estimates of sample size requirements. Blinded SSA is often preferred over unblinded SSA because it obviates many logistical complications of the latter and generally introduces less bias. On the other hand, current blinded SSA methods for binary data offer little to no new information about the treatment effect, ignore uncertainties associated with the population treatment proportions, and/or depend on enhanced randomization schemes that risk partial unblinding. I propose an innovative blinded SSA method for use when the primary analysis is a non-inferiority or superiority test regarding a risk difference. The method incorporates evidence about the treatment effect via the likelihood function of a mixture distribution. I compare the new method with an established one and with the fixed sample size study design, in terms of maximization of an expected utility function. The new method maximizes the expected utility better than do the comparators, under a range of assumptions. I illustrate the use of the proposed method with an example that incorporates a Bayesian hierarchical model. Lastly, I suggest topics for future study regarding the proposed methods. Copyright © 2015 John Wiley & Sons, Ltd.
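For contrast with the proposed Bayesian method, the simple classical idea behind blinded SSA for binary outcomes can be sketched: re-estimate the nuisance parameter from the pooled (blinded) event proportion, then re-solve the usual two-proportion sample size formula. The numbers and defaults below are illustrative only:

```python
import math
from statistics import NormalDist

def blinded_reestimate_n(events, total, delta, alpha=0.05, power=0.9):
    """Classical blinded re-estimation for a risk-difference test: the
    pooled event proportion (computed without unblinding treatment
    arms) stands in for the nuisance variance, and the standard
    two-group formula is re-solved for the per-arm sample size.
    This is the simple comparator idea, not the paper's Bayesian
    mixture-likelihood method."""
    z = NormalDist().inv_cdf
    p = events / total                 # pooled proportion, blind to arm
    var = 2 * p * (1 - p)              # approximate variance term for the difference
    n_per_arm = ((z(1 - alpha / 2) + z(power)) ** 2) * var / delta ** 2
    return math.ceil(n_per_arm)
```

Because the pooled proportion drives the variance term, an interim event rate closer to 0.5 than planned pushes the re-estimated sample size upward.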
Fischer, Jesse R.; Quist, Michael C.
2014-01-01
All freshwater fish sampling methods are biased toward particular species, sizes, and sexes and are further influenced by season, habitat, and fish behavior changes over time. However, little is known about gear-specific biases for many common fish species because few multiple-gear comparison studies exist that have incorporated seasonal dynamics. We sampled six lakes and impoundments representing a diversity of trophic and physical conditions in Iowa, USA, using multiple gear types (i.e., standard modified fyke net, mini-modified fyke net, sinking experimental gill net, bag seine, benthic trawl, boat-mounted electrofisher used diurnally and nocturnally) to determine the influence of sampling methodology and season on fisheries assessments. Specifically, we describe the influence of season on catch per unit effort, proportional size distribution, and the number of samples required to obtain 125 stock-length individuals for 12 species of recreational and ecological importance. Mean catch per unit effort generally peaked in the spring and fall as a result of increased sampling effectiveness in shallow areas and seasonal changes in habitat use (e.g., movement offshore during summer). Mean proportional size distribution decreased from spring to fall for white bass Morone chrysops, largemouth bass Micropterus salmoides, bluegill Lepomis macrochirus, and black crappie Pomoxis nigromaculatus, suggesting selectivity for large and presumably sexually mature individuals in the spring and summer. Overall, the mean number of samples required to sample 125 stock-length individuals was minimized in the fall with sinking experimental gill nets, a boat-mounted electrofisher used at night, and standard modified nets for 11 of the 12 species evaluated. Our results provide fisheries scientists with relative comparisons between several recommended standard sampling methods and illustrate the effects of seasonal variation on estimates of population indices that will be critical to
Huan, Xi-ping; Bao, Shui-lian; Yang, Hai-tao; Xu, Jin-shui; Qiu, Tao; Zhang, Xiang; Pan, Long; Zhu, Zhong-kui; Guo, Wei; Wang, Lu
2013-03-01
To estimate the size of the female sex worker and client populations in Taizhou city, a household survey using the network scale-up method (NSUM) was conducted among 3000 community residents in Taizhou city from August to October 2011. The survey aimed to estimate the social network size (c value) of Taizhou residents; the c value was adjusted by demographic characteristics, back estimation and outlier elimination. Using the adjusted c value, the number of acquaintances who were female sex workers or clients and the respect level toward female sex workers or clients were used to estimate the size of the two populations. A total of 2783 valid questionnaires were collected, of which 1380 (49.6%) were collected from Taixing city and 1403 (50.4%) from Jingjiang city. 1334 respondents (47.9%) were male and 1449 (52.1%) were female. The mean age was (39.4 ± 10.7) years. The average personal social network size using original data for Taizhou residents was 525, which differed by place, sex, age, educational level and marriage status. Using the remaining known populations through back estimation, the social network size was 419, becoming 424 after the elimination of outliers. The estimated population size for female sex workers was 6370 (95%CI: 5886 - 6853), which accounted for 0.52% (6370/1 229 980) of the total number of females aged 15 to 49. The estimated population size for clients was 15 202 (95%CI: 14 560 - 15 847), which accounted for 1.28% (15 202/1 190 340) of the total number of males aged 15 to 49, and the ratio of clients to female sex workers was 2.39:1. NSUM is an easy and quick way to estimate the size of female sex worker and client populations, but the estimated sizes are subject to bias and error due to estimation effects and sample representativeness.
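The basic scale-up calculation behind NSUM is simple: the average number of members of the hidden group that respondents know, divided by the average personal network size c, gives the hidden group's share of the population. A sketch with hypothetical survey responses (not the study's data):

```python
def nsum_size(known_in_hidden, network_size, population):
    """Basic network scale-up estimate: respondents' mean number of
    acquaintances in the hidden group, scaled by the average personal
    network size c and the total population."""
    m_bar = sum(known_in_hidden) / len(known_in_hidden)
    return m_bar * population / network_size

# Four hypothetical respondents reporting 0, 1, 2 and 3 acquaintances
# in the hidden group, with c = 400 in a population of one million.
estimate = nsum_size([0, 1, 2, 3], 400, 1_000_000)
```

The adjustments described in the abstract (back estimation, outlier elimination, respect-level weighting) all act on c or on the reported counts before this final scale-up step.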
Estimation of Food Guide Pyramid Serving Sizes by College Students.
Knaust, Gretchen; Foster, Irene M.
2000-01-01
College students (n=158) used the Food Guide Pyramid to select serving sizes on a questionnaire (73% had been instructed in its use). Overall mean scores (31% correct) indicated they generally did not know recommended serving sizes. Those who had read about or received instruction in the pyramid had higher mean scores. (SK)
Inferring Saving in Training Time From Effect Size Estimates
National Research Council Canada - National Science Library
Burright, Burke
2000-01-01
Students' time saving represents a major potential benefit of using them. This paper fills a methodology gap in estimating the students' time-saving benefit of asynchronous training technologies...
On the Estimation of Detection Probabilities for Sampling Stream-Dwelling Fishes.
Energy Technology Data Exchange (ETDEWEB)
Peterson, James T.
1999-11-01
To examine the adequacy of fish probability of detection estimates, I examined distributional properties of survey and monitoring data for bull trout (Salvelinus confluentus), brook trout (Salvelinus fontinalis), westslope cutthroat trout (Oncorhynchus clarki lewisi), chinook salmon parr (Oncorhynchus tshawytscha), and steelhead/redband trout (Oncorhynchus mykiss spp.), from 178 streams in the Interior Columbia River Basin. Negative binomial dispersion parameters varied considerably among species and streams, but were significantly (P<0.05) positively related to fish density. Across streams, the variances in fish abundances differed greatly among species and indicated that the data for all species were overdispersed with respect to the Poisson (i.e., the variances exceeded the means). This significantly affected Poisson probability of detection estimates, which were the highest across species and were, on average, 3.82, 2.66, and 3.47 times greater than baseline values. Required sample sizes for species detection at the 95% confidence level were also lowest for the Poisson, which underestimated sample size requirements an average of 72% across species. Negative binomial and Poisson-gamma probability of detection and sample size estimates were more accurate than the Poisson and generally less than 10% from baseline values. My results indicate the Poisson and binomial assumptions often are violated, which results in probability of detection estimates that are biased high and sample size estimates that are biased low. To increase the accuracy of these estimates, I recommend that future studies use predictive distributions that can incorporate multiple sources of uncertainty or excess variance and that all distributional assumptions be explicitly tested.
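The detection probabilities at issue follow directly from the zero class of each count distribution. A sketch showing why overdispersion (negative binomial) lowers per-unit detection probability, and hence raises required sample sizes, relative to a Poisson with the same mean; the parameter values are illustrative:

```python
import math

def p_detect_poisson(lam):
    """Probability at least one fish is detected when counts are Poisson(lam)."""
    return 1 - math.exp(-lam)

def p_detect_negbin(lam, k):
    """Same, for an overdispersed negative binomial with mean lam and
    dispersion k (smaller k means more overdispersion)."""
    return 1 - (k / (k + lam)) ** k

def n_units_for_detection(p_detect_unit, confidence=0.95):
    """Independent sampling units needed to detect the species at least
    once with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_detect_unit))
```

At the same mean density, the negative binomial puts more mass on zero counts, so the Poisson assumption overstates detection probability and understates the required number of units, which is exactly the bias the abstract reports.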
Abundance Estimation of Hyperspectral Data with Low Compressive Sampling Rate
Wang, Zhongliang; Feng, Yan
2017-12-01
Hyperspectral data processing typically demands enormous computational resources in terms of storage, computation, and I/O throughputs. In this paper, a compressive sensing framework with low sampling rate is described for hyperspectral imagery. It is based on the widely used linear spectral mixture model. Abundance fractions can be calculated directly from compressively sensed data with no need to reconstruct original hyperspectral imagery. The proposed abundance estimation model is based on the sparsity of abundance fractions and an alternating direction method of multipliers is developed to solve this model. Experiments show that the proposed scheme has a high potential to unmix compressively sensed hyperspectral data with low sampling rate.
Modeling extreme events: Sample fraction adaptive choice in parameter estimation
Neves, Manuela; Gomes, Ivette; Figueiredo, Fernanda; Gomes, Dora Prata
2012-09-01
When modeling extreme events there are a few primordial parameters, among which we refer the extreme value index and the extremal index. The extreme value index measures the right tail-weight of the underlying distribution and the extremal index characterizes the degree of local dependence in the extremes of a stationary sequence. Most of the semi-parametric estimators of these parameters show the same type of behaviour: nice asymptotic properties, but a high variance for small values of k, the number of upper order statistics to be used in the estimation, and a high bias for large values of k. This shows a real need for the choice of k. Choosing some well-known estimators of those parameters we revisit the application of a heuristic algorithm for the adaptive choice of k. The procedure is applied to some simulated samples as well as to some real data sets.
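The bias-variance tradeoff in the choice of k can be seen with the classical Hill estimator of a positive extreme value index, one of the semi-parametric estimators of the kind discussed above. A minimal sketch on simulated heavy-tailed data (the distribution, sample size, and k values are illustrative, not the paper's algorithm):

```python
import math
import random

def hill(sample, k):
    # Hill estimator of a positive extreme value index, using the
    # k largest order statistics of a positive sample
    x = sorted(sample, reverse=True)
    return sum(math.log(x[i] / x[k]) for i in range(k)) / k

random.seed(7)
# |Cauchy| data: the true extreme value index is 1
data = [abs(math.tan(math.pi * (random.random() - 0.5))) for _ in range(5000)]
for k in (20, 100, 500, 2500):
    # small k: high variance; large k: bias from the non-Pareto body
    print(k, round(hill(data, k), 3))
```

The plot of such estimates against k (a "Hill plot") is exactly the object whose stable region the adaptive-choice heuristics try to locate.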
El Allaki, Farouk; Christensen, Jette; Vallières, André
2015-06-01
The study objectives were (1) to conduct a systematic review of the performance of capture-recapture methods; (2) to use empirical data to estimate population size in a small-sized population (turkey breeder farms) and a medium-sized population (meat turkey farms) by applying two-source capture-recapture methods (the Lincoln-Petersen, the Chapman, and Chao's lower-bound estimators) and multi-source capture-recapture methods (the log-linear modeling and sample coverage approaches); and (3) to compare the performance of these methods in predicting the true population sizes (2007 data). Our set-up was unique in that we knew the population sizes for turkey breeder farms (99) and meat turkey farms (592) in Canada in 2007, which we applied as our true population sizes, and had surveillance data from the Canadian Notifiable Avian Influenza Surveillance System (2008-2012). We defined each calendar year of sampling as a data source. We confirmed that the two-source capture-recapture methods were sensitive to the violation of the local independence assumption. The log-linear modeling and sample coverage approaches yielded estimates that were closer to the true population sizes than were the estimates provided by the two-source methods for both populations. The performance of both multi-source capture-recapture methods depended on the number of data sources analyzed and the size of the population. Simulation studies are recommended to better understand the limits of each multi-source capture-recapture method. Crown Copyright © 2014. Published by Elsevier B.V. All rights reserved.
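The two-source estimators named in the abstract are simple closed forms. A minimal sketch with hypothetical capture counts, not the study's surveillance data (n1 farms captured by source 1, n2 by source 2, m by both):

```python
def lincoln_petersen(n1, n2, m):
    # Classic two-source capture-recapture estimate of population size
    return n1 * n2 / m

def chapman(n1, n2, m):
    # Chapman's bias-corrected variant, defined even when m == 0
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

n1, n2, m = 60, 55, 35
print(round(lincoln_petersen(n1, n2, m), 1))  # 94.3
print(round(chapman(n1, n2, m), 1))           # 93.9
```

Both forms assume the two sources capture independently; positive dependence between sources inflates m and biases the estimates low, consistent with the sensitivity to the local independence assumption that the authors report.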
Directory of Open Access Journals (Sweden)
Mehmet Öztürk
2017-07-01
Full Text Available Backscatter output from a 10 MHz acoustic Doppler velocimeter (ADV) was used to quantify suspended sediment concentrations in a laboratory setting using sand-sized particles. The experiments included (a) well-sorted sand samples ranging in size from 0.112 to 0.420 mm, obtained by sieving construction sand, (b) different, known mixtures of these well-sorted fractions, and (c) sieved natural beach sand with median sizes ranging from 0.112 to 0.325 mm. The tested concentrations ranged from 25 to 3000 mg·L−1. The backscatter output was empirically related to concentration, to sediment size non-dimensionalized by the acoustic wavelength, and to a dimensionless sediment size gradation coefficient. Size-dependent upper and lower bounds on measurable concentrations were also established empirically. The range of measurable conditions is broad enough to make the approach useful for sand sizes and concentrations commonly encountered in nature. A new method is proposed to determine concentrations in cases of mixed-size sediment suspensions when only calibration data for well-sorted constituent sands are available. This approach could potentially allow better estimates when the suspended load is derived from but is not fully representative of the bed material, and when the size characteristics of the suspended material vary in time over the period of interest. Differences in results between the construction and beach sands suggest that sediment shape may also need to be considered, and point to the importance of calibrating to sediments encountered at the site of interest.
Curtis L. VanderSchaaf; Harold E. Burkhart
2010-01-01
Maximum size-density relationships (MSDR) provide natural resource managers useful information about the relationship between tree density and average tree size. Obtaining a valid estimate of how maximum tree density changes as average tree size changes is necessary to accurately describe these relationships. This paper examines three methods to estimate the slope of...
Practical Approaches For Determination Of Sample Size In Paired Case-Control Studies
Demirel, Neslihan; Ozlem EGE ORUC; Gurler, Selma
2016-01-01
Objective: Cross-over designs and paired case-control studies used in clinical research are experimental designs that require dependent samples. Sample size determination is generally a difficult step in planning the statistical design. The aim of this study is to provide researchers with a practical approach for determining the sample size in paired case-control studies. Material and Methods: In this study, determination of sample size is discussed in detail i...
Ryskin, Rachel A.; Sarah Brown-Schmidt
2014-01-01
Seven experiments use large sample sizes to robustly estimate the effect size of a previous finding that adults are more likely to commit egocentric errors in a false-belief task when the egocentric response is plausible in light of their prior knowledge. We estimate the true effect size to be less than half of that reported in the original findings. Even though we found effects in the same direction as the original, they were substantively smaller; the original study would have had less than...
Sampling of systematic errors to estimate likelihood weights in nuclear data uncertainty propagation
Helgesson, P.; Sjöstrand, H.; Koning, A. J.; Rydén, J.; Rochman, D.; Alhassan, E.; Pomp, S.
2016-01-01
In methodologies for nuclear data (ND) uncertainty assessment and propagation based on random sampling, likelihood weights can be used to infer experimental information into the distributions for the ND. As the included number of correlated experimental points grows large, the computational time for the matrix inversion involved in obtaining the likelihood can become a practical problem. There are also other problems related to the conventional computation of the likelihood, e.g., the assumption that all experimental uncertainties are Gaussian. In this study, a way to estimate the likelihood which avoids matrix inversion is investigated; instead, the experimental correlations are included by sampling of systematic errors. It is shown that the model underlying the sampling methodology (using univariate normal distributions for random and systematic errors) implies a multivariate Gaussian for the experimental points (i.e., the conventional model). It is also shown that the likelihood estimates obtained through sampling of systematic errors approach the likelihood obtained with matrix inversion as the sample size for the systematic errors grows large. In studied practical cases, it is seen that the estimates for the likelihood weights converge impractically slowly with the sample size, compared to matrix inversion. The computational time is estimated to be greater than for matrix inversion in cases with more experimental points, too. Hence, the sampling of systematic errors has little potential to compete with matrix inversion in cases where the latter is applicable. Nevertheless, the underlying model and the likelihood estimates can be easier to intuitively interpret than the conventional model and the likelihood function involving the inverted covariance matrix. Therefore, this work can both have pedagogical value and be used to help motivating the conventional assumption of a multivariate Gaussian for experimental data. The sampling of systematic errors could also
Calculating sample sizes for cluster randomized trials: we can keep it simple and efficient !
van Breukelen, Gerard J.P.; Candel, Math J.J.M.
2012-01-01
Objective: Simple guidelines for efficient sample sizes in cluster randomized trials with unknown intraclass correlation and varying cluster sizes. Methods: A simple equation is given for the optimal number of clusters and sample size per cluster. Here, optimal means maximizing power for a given
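A standard route to such sample sizes is the design effect 1 + (m - 1)*rho, which inflates the individually-randomized sample size for a cluster size m and intraclass correlation rho. A minimal sketch with hypothetical inputs (this is the textbook design-effect calculation, not the authors' optimal-design equation):

```python
import math

def clusters_per_arm(n_simple, m, icc):
    # Inflate the individually-randomized sample size per arm by the
    # design effect 1 + (m - 1)*icc, then convert to clusters of size m
    design_effect = 1 + (m - 1) * icc
    return math.ceil(n_simple * design_effect / m)

# 64 subjects/arm suffices for d = 0.5 at 80% power under individual
# randomization; with clusters of size 20 and ICC = 0.05:
print(clusters_per_arm(64, 20, 0.05))  # 7 clusters of 20 per arm
```

Note how even a small ICC of 0.05 nearly doubles the required number of subjects (7 × 20 = 140 vs. 64), which is why ignoring clustering underpowers trials.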
Chattopadhyay, Bhargab; Kelley, Ken
2016-01-01
The coefficient of variation is an effect size measure with many potential uses in psychology and related disciplines. We propose a general theory for a sequential estimation of the population coefficient of variation that considers both the sampling error and the study cost, importantly without specific distributional assumptions. Fixed sample size planning methods, commonly used in psychology and related fields, cannot simultaneously minimize both the sampling error and the study cost. The sequential procedure we develop is the first sequential sampling procedure developed for estimating the coefficient of variation. We first present a method of planning a pilot sample size after the research goals are specified by the researcher. Then, after collecting a sample size as large as the estimated pilot sample size, a check is performed to assess whether the conditions necessary to stop the data collection have been satisfied. If not, an additional observation is collected and the check is performed again. This process continues, sequentially, until a stopping rule involving a risk function is satisfied. Our method ensures that the sampling error and the study costs are considered simultaneously so that the cost is not higher than necessary for the tolerable sampling error. We also demonstrate a variety of properties of the distribution of the final sample size for five different distributions under a variety of conditions with a Monte Carlo simulation study. In addition, we provide freely available functions via the MBESS package in R to implement the methods discussed.
DEFF Research Database (Denmark)
Engemann, Kristine; Enquist, Brian J.; Sandel, Brody Steven
2015-01-01
Macro-scale species richness studies often use museum specimens as their main source of information. However, such datasets are often strongly biased due to variation in sampling effort in space and time. These biases may strongly affect diversity estimates and may, thereby, obstruct solid inference on the underlying diversity drivers, as well as mislead conservation prioritization. In recent years, this has resulted in an increased focus on developing methods to correct for sampling bias. In this study, we use sample-size-correcting methods to examine patterns of tropical plant diversity in Ecuador, one of the most species-rich and climatically heterogeneous biodiversity hotspots. Species richness estimates were calculated based on 205,735 georeferenced specimens of 15,788 species using the Margalef diversity index, the Chao estimator, the second-order Jackknife and Bootstrapping resampling...
Sample size and power determination when limited preliminary information is available
Directory of Open Access Journals (Sweden)
Christine E. McLaren
2017-04-01
Full Text Available Abstract Background We describe a novel strategy for power and sample size determination developed for studies utilizing investigational technologies with limited available preliminary data, specifically of imaging biomarkers. We evaluated diffuse optical spectroscopic imaging (DOSI), an experimental noninvasive imaging technique that may be capable of assessing changes in mammographic density. Because there is significant evidence that tamoxifen treatment is more effective at reducing breast cancer risk when accompanied by a reduction of breast density, we designed a study to assess the changes from baseline in DOSI imaging biomarkers that may reflect fluctuations in breast density in premenopausal women receiving tamoxifen. Method While preliminary data demonstrate that DOSI is sensitive to mammographic density in women about to receive neoadjuvant chemotherapy for breast cancer, there is no information on DOSI in tamoxifen treatment. Since the relationship between magnetic resonance imaging (MRI) and DOSI has been established in previous studies, we developed a statistical simulation approach utilizing information from an investigation of MRI assessment of breast density in 16 women before and after treatment with tamoxifen to estimate the changes in DOSI biomarkers due to tamoxifen. Results Three sets of 10,000 pairs of MRI breast density data with correlation coefficients of 0.5, 0.8 and 0.9 were generated and used to simulate a corresponding 5,000,000 pairs of DOSI values representing water, ctHHB, and lipid. Minimum sample sizes needed per group for specified clinically-relevant effect sizes were obtained. Conclusion The simulation techniques we describe can be applied in studies of other experimental technologies to obtain the important preliminary data to inform the power and sample size calculations.
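The core simulation step, generating paired measurements with a fixed correlation, can be sketched with the standard bivariate-normal construction. This is a hypothetical illustration of the technique, not the authors' code or data:

```python
import math
import random

def correlated_pairs(rho, n, rng):
    # Draws from a standard bivariate normal with correlation rho:
    # y = rho*z1 + sqrt(1 - rho**2)*z2 for independent standard normals
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return out

def sample_corr(pairs):
    # Pearson correlation of the simulated pairs
    xs, ys = zip(*pairs)
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sum((x - mx) * (y - my) for x, y in pairs) / (sx * sy)

rng = random.Random(42)
print(round(sample_corr(correlated_pairs(0.8, 10000, rng)), 2))
```

In practice the marginals would be rescaled to the observed mean and SD of the pilot MRI measurements before translating to the biomarker scale.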
Increasing fMRI sampling rate improves Granger causality estimates.
Directory of Open Access Journals (Sweden)
Fa-Hsuan Lin
Full Text Available Estimation of causal interactions between brain areas is necessary for elucidating large-scale functional brain networks underlying behavior and cognition. Granger causality analysis of time series data can quantitatively estimate directional information flow between brain regions. Here, we show that such estimates are significantly improved when the temporal sampling rate of functional magnetic resonance imaging (fMRI) is increased 20-fold. Specifically, healthy volunteers performed a simple visuomotor task during blood oxygenation level dependent (BOLD) contrast based whole-head inverse imaging (InI). Granger causality analysis based on raw InI BOLD data sampled at 100-ms resolution detected the expected causal relations, whereas when the data were downsampled to the temporal resolution of 2 s typically used in echo-planar fMRI, the causality could not be detected. An additional control analysis, in which we SINC interpolated additional data points to the downsampled time series at 0.1-s intervals, confirmed that the improvements achieved with the real InI data were not explainable by the increased time-series length alone. We therefore conclude that the high temporal resolution of InI improves the Granger causality connectivity analysis of the human brain.
Cornelissen, Katri K; Bester, Andre; Cairns, Paul; Tovée, Martin J; Cornelissen, Piers L
2015-03-01
In this cross-sectional study, we investigated the influence of personal BMI on body size estimation in 42 women who have symptoms of anorexia (referred to henceforth as anorexia spectrum disorders, ANSD), and 100 healthy controls. Low BMI control participants over-estimate their size and high BMI controls under-estimate, a pattern which is predicted by a perceptual phenomenon called contraction bias. In addition, control participants' sensitivity to size change declines as their BMI increases as predicted by Weber's law. The responses of women with ANSD are very different. Low BMI participants who have ANSD are extremely accurate at estimating body size and are very sensitive to changes in body size in this BMI range. However, as BMI rises in the ANSD participant group, there is a rapid increase in over-estimation concurrent with a rapid decline in sensitivity to size change. We discuss the results in the context of signal detection theory. Copyright © 2015 Elsevier Ltd. All rights reserved.
Harris, R.; Reimus, P. W.; Ding, M.
2015-12-01
Chromium used in Los Alamos National Laboratory cooling towers was released as effluent onto laboratory property between 1956 and 1972. As a result, the underlying regional aquifer is contaminated with chromium (VI), a toxin and carcinogen. The highest concentration of chromium is ~1 ppm in monitoring well R-42, exceeding the New Mexico drinking water standard of 50 ppb. The chromium plume is currently being investigated to identify an effective remediation method. Geologic heterogeneity within the aquifer causes the hydraulic conductivity within the plume to be spatially variable. This variability, particularly with depth, is crucial for predicting plume transport behavior. Though pump tests are useful for obtaining estimates of site-specific hydraulic conductivity, they tend to interrogate hydraulic properties of only the most conductive strata. Variations in particle size distribution as a function of depth can complement pump test data by providing estimates of vertical variations in hydraulic conductivity. Samples were collected from five different sonically-drilled core holes within the chromium plume at depths ranging from 732'-1125' below the surface. To obtain particle size distributions, the samples were sieved into six different fractions from the fine sand to gravel range (>4 mm, 2-4 mm, 1.4-2 mm, 0.355-1.4 mm, 180-355 µm, and smaller than 180 µm). The Kozeny-Carman equation, K = (ρg/μ)(dm²/180)(ϕ³/(1−ϕ)²), was used to estimate hydraulic conductivity from the particle size distribution data. Pump tests estimated a hydraulic conductivity varying between 1 and 50 feet per day. The Kozeny-Carman equation narrowed this estimate down to an average value of 2.635 feet per day for the samples analyzed, with a range of 0.971 ft/day to 6.069 ft/day. The results of this study show that the Kozeny-Carman equation provides quite specific estimates of hydraulic conductivity in the Los Alamos aquifer. More importantly, it provides pertinent information on the expected
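With consistent SI units the Kozeny-Carman estimate is a one-liner. The fluid properties below (water, ρ ≈ 1000 kg/m³, μ ≈ 10⁻³ Pa·s) and the grain size and porosity are illustrative assumptions, not values measured in this study:

```python
def kozeny_carman(d_m, phi, rho=1000.0, g=9.81, mu=1.0e-3):
    # Hydraulic conductivity K [m/s] from mean grain diameter d_m [m]
    # and porosity phi: K = (rho*g/mu) * (d_m**2 / 180) * phi**3 / (1 - phi)**2
    return (rho * g / mu) * (d_m ** 2 / 180.0) * phi ** 3 / (1.0 - phi) ** 2

K = kozeny_carman(d_m=0.15e-3, phi=0.25)  # fine sand, 0.15 mm, porosity 0.25
print(K, "m/s")
print(K * 86400 / 0.3048, "ft/day")       # ~10 ft/day, within the pump-test range
```

Because K scales with the square of grain diameter, doubling d_m quadruples the estimate, which is why depth-resolved grain size data discriminate strata so sharply.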
Bridges, Ana J; Holler, Karen A
2007-11-01
The purpose of this investigation was to determine how confidence intervals (CIs) for pediatric neuropsychological norms vary as a function of sample size, and to determine optimal sample sizes for normative studies. First, the authors calculated 95% CIs for a set of published pediatric norms for four commonly used neuropsychological instruments. Second, 95% CIs were calculated for varying sample sizes (from n = 5 to n = 500). Results suggest that some pediatric norms have unacceptably wide CIs, and that normative studies should optimally use 50 to 75 participants per cell. Smaller sample sizes may lead to overpathologizing results, while the cost of obtaining larger samples may not be justifiable.
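The driver of CI width is the square root of n in the standard error of a mean. A quick sketch with an assumed score SD of 10 (illustrative, not one of the instruments studied):

```python
def ci_halfwidth(sd, n, z=1.96):
    # Half-width of an approximate 95% CI for a normative mean:
    # z * sd / sqrt(n)
    return z * sd / n ** 0.5

for n in (5, 25, 50, 75, 500):
    print(n, round(ci_halfwidth(10.0, n), 2))
```

Quadrupling n only halves the half-width, so precision gains flatten rapidly past the 50 to 75 participants per cell that the authors recommend.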
Beamforming using subspace estimation from a diagonally averaged sample covariance.
Quijano, Jorge E; Zurk, Lisa M
2017-08-01
The potential benefit of a large-aperture sonar array for high resolution target localization is often challenged by the lack of sufficient data required for adaptive beamforming. This paper introduces a Toeplitz-constrained estimator of the clairvoyant signal covariance matrix corresponding to multiple far-field targets embedded in background isotropic noise. The estimator is obtained by averaging along subdiagonals of the sample covariance matrix, followed by covariance extrapolation using the method of maximum entropy. The sample covariance is computed from limited data snapshots, a situation commonly encountered with large-aperture arrays in environments characterized by short periods of local stationarity. Eigenvectors computed from the Toeplitz-constrained covariance are used to construct signal-subspace projector matrices, which are shown to reduce background noise and improve detection of closely spaced targets when applied to subspace beamforming. Monte Carlo simulations corresponding to increasing array aperture suggest convergence of the proposed projector to the clairvoyant signal projector, thereby outperforming the classic projector obtained from the sample eigenvectors. Beamforming performance of the proposed method is analyzed using simulated data, as well as experimental data from the Shallow Water Array Performance experiment.
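The diagonal-averaging (Toeplitzification) step can be sketched in a few lines of NumPy. The array and snapshot setup below is hypothetical, and the sketch omits the paper's maximum-entropy covariance extrapolation:

```python
import numpy as np

def toeplitz_average(R):
    # Replace every diagonal of the sample covariance with its mean,
    # enforcing the Toeplitz structure expected for far-field sources
    # in isotropic noise on a uniform line array
    n = R.shape[0]
    means = {d: R.diagonal(d).mean() for d in range(-(n - 1), n)}
    return np.array([[means[j - i] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
n_el, n_snap = 8, 4  # few snapshots, so the raw R is rank-deficient
a = np.exp(1j * np.pi * np.arange(n_el) * np.sin(0.3))  # plane-wave steering vector
X = np.outer(a, rng.standard_normal(n_snap)) + 0.1 * (
    rng.standard_normal((n_el, n_snap)) + 1j * rng.standard_normal((n_el, n_snap)))
R = X @ X.conj().T / n_snap   # sample covariance from limited snapshots
T = toeplitz_average(R)
print(np.allclose(T, T.conj().T))  # averaging preserves the Hermitian property
```

Eigenvectors of T (rather than of the noisy R) would then build the signal-subspace projectors used in the beamformer.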
Radiographic Estimation of the Location and Size of kidneys in ...
African Journals Online (AJOL)
Keywords: Radiography, Location, Kidney size, Local dogs. The kidneys of dogs and cats are located retroperitoneally (Bjorling, 1993). Visualization of the kidneys on radiographs is possible due to the contrast provided by the perirenal fat (Grandage, 1975). However, this perirenal fat rarely covers the ventral surface of the ...
Size Estimation of Non-Cooperative Data Collections
Khelghati, Mohammadreza; Hiemstra, Djoerd; van Keulen, Maurice
2012-01-01
With the increasing amount of data in deep web sources (hidden from general search engines behind web forms), accessing this data has gained more attention. In the algorithms applied for this purpose, it is the knowledge of a data source size that enables the algorithms to make accurate de-
Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data.
Li, Johnson Ching-Hong
2016-12-01
In psychological science, the "new statistics" refer to the new statistical practices that focus on effect size (ES) evaluation instead of conventional null-hypothesis significance testing (Cumming, Psychological Science, 25, 7-29, 2014). In a two-independent-samples scenario, Cohen's (1988) standardized mean difference (d) is the most popular ES, but its accuracy relies on two assumptions: normality and homogeneity of variances. Five other ESs - the unscaled robust d (d_r*; Hogarty & Kromrey, 2001), scaled robust d (d_r; Algina, Keselman, & Penfield, Psychological Methods, 10, 317-328, 2005), point-biserial correlation (r_pb; McGrath & Meyer, Psychological Methods, 11, 386-401, 2006), common-language ES (CL; Cliff, Psychological Bulletin, 114, 494-509, 1993), and nonparametric estimator for CL (A_w; Ruscio, Psychological Methods, 13, 19-30, 2008) - may be robust to violations of these assumptions, but no study has systematically evaluated their performance. Thus, in this simulation study the performance of these six ESs was examined across five factors: data distribution, sample, base rate, variance ratio, and sample size. The results showed that A_w and d_r were generally robust to these violations, and A_w slightly outperformed d_r. Implications for the use of A_w and d_r in real-world research are discussed.
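The two poles of the comparison, Cohen's d and the pairwise probability-of-superiority estimator underlying the CL/A_w family, can be sketched as follows (a minimal illustration with made-up scores, not the exact estimators evaluated in the paper):

```python
import math

def cohens_d(x, y):
    # Standardized mean difference with pooled SD; accuracy assumes
    # normality and equal variances
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled

def prob_superiority(x, y):
    # Nonparametric effect size: P(X > Y) + 0.5 * P(X = Y), estimated
    # over all pairs; needs no normality or equal-variance assumption
    gt = sum(1 for a in x for b in y if a > b)
    eq = sum(1 for a in x for b in y if a == b)
    return (gt + 0.5 * eq) / (len(x) * len(y))

x, y = [5, 6, 7, 8, 9], [3, 4, 5, 6, 7]
print(round(cohens_d(x, y), 3))  # 1.265
print(prob_superiority(x, y))    # 0.82
```

The pairwise estimator only uses order information, which is why this family stays well behaved when the distributional assumptions behind d are violated.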
Duncanson, L.; Dubayah, R.
2015-12-01
Lidar remote sensing is widely applied for mapping forest carbon stocks, and technological advances have improved our ability to capture structural details from forests, even resolving individual trees. Despite these advancements, the accuracy of forest aboveground biomass models remains limited by the quality of field estimates of biomass. The accuracies of field estimates are inherently dependent on the accuracy of the allometric equations used to relate measurable attributes to biomass. These equations are calibrated with relatively small samples of often spatially clustered trees. This research focuses on one of many issues involving allometric equations - understanding how sensitive allometric parameters are to the sample sizes used to fit them. We capitalize on recent advances in lidar remote sensing to extract individual tree structural information from six high-resolution airborne lidar datasets in the United States. We remotely measure millions of tree heights and crown radii, and fit allometric equations to the relationship between tree height and radius at a 'population' level, in each site. We then extract samples from our tree database, and build allometries on these smaller samples of trees, with varying sample sizes. We show that for the allometric relationship between tree height and crown radius, small sample sizes produce biased allometric equations that overestimate height for a given crown radius. We extend this analysis using translations from the literature to address potential implications for biomass, showing that site-level biomass may be greatly overestimated when applying allometric equations developed with the typically small sample sizes used in popular allometric equations for biomass.
Usami, Satoshi
2017-03-01
Behavioral and psychological researchers have shown strong interests in investigating contextual effects (i.e., the influences of combinations of individual- and group-level predictors on individual-level outcomes). The present research provides generalized formulas for determining the sample size needed for investigating contextual effects according to the desired level of statistical power as well as the width of the confidence interval. These formulas are derived within a three-level random intercept model that includes one predictor/contextual variable at each level to simultaneously cover the various kinds of contextual effects in which researchers may be interested. The relative influences of indices included in the formulas on the standard errors of contextual effect estimates are investigated with the aim of further simplifying sample size determination procedures. In addition, simulation studies are performed to investigate the finite sample behavior of calculated statistical power, showing that estimated sample sizes based on the derived formulas can be both positively and negatively biased due to complex effects of unreliability of contextual variables, multicollinearity, and violation of assumptions regarding the known variances. Thus, it is advisable to compare estimated sample sizes under various specifications of indices and to evaluate their potential bias, as illustrated in the example.
Estimation of the vortex length scale and intensity from two-dimensional samples
Reuss, D. L.; Cheng, W. P.
1992-01-01
A method is proposed for estimating flow features that influence flame wrinkling in reciprocating internal combustion engines, where traditional statistical measures of turbulence are suspect. Candidate methods were tested in a computed channel flow where traditional turbulence measures are valid and performance can be rationally evaluated. Two concepts are tested. First, spatial filtering is applied to the two-dimensional velocity distribution and found to reveal structures corresponding to the vorticity field. Decreasing the spatial-frequency cutoff of the filter locally changes the character and size of the flow structures that are revealed by the filter. Second, vortex length scale and intensity are estimated by computing the ensemble-average velocity distribution conditionally sampled on the vorticity peaks. The resulting conditionally sampled 'average vortex' has a peak velocity less than half the rms velocity and a size approximately equal to the two-point-correlation integral length scale.
Estimation of Effective Population Size in the Sapsaree: A Korean Native Dog
Directory of Open Access Journals (Sweden)
M. Alam
2012-08-01
Full Text Available Effective population size (Ne) is an important measure to understand population structure and genetic variability in animal species. The objective of this study was to estimate Ne in Sapsaree dogs using the rate of inbreeding and genomic data that were obtained from pedigree and from the Illumina CanineSNP20 (20K) and CanineHD (170K) beadchips, respectively. Three SNP panels, i.e. Sap134 (20K), Sap60 (170K), and Sap183 (the combined panel from the 20K and 170K), were used to genotype 134, 60, and 183 animal samples, respectively. The Ne estimates based on inbreeding rate ranged from 16 to 51 for about five to 13 generations ago. With the use of SNP genotypes, two methods were applied for Ne estimation, i.e. pair-wise r2 values using a simple expectation of distance, and r2 values under a non-linear regression with respective distances assuming a finite population size. The average pair-wise Ne estimates across generations using the pairs of SNPs that were located within 5 Mb in the Sap134, Sap60, and Sap183 panels were 1,486, 1,025 and 1,293, respectively. Under the non-linear regression method, the average Ne estimates were 1,601, 528, and 1,129 for the respective panels. Also, the point estimates of past Ne at 5, 20, and 50 generations ago ranged between 64 and 75, 245 and 286, and 573 and 646, respectively, indicating a significant Ne reduction in the last several generations. These results suggest a strong necessity for minimizing inbreeding through the application of genomic selection or other breeding strategies to increase Ne, so as to maintain genetic variation and to avoid future bottlenecks in the Sapsaree population.
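LD-based Ne estimation of the kind used above typically builds on Sved's (1971) approximation E[r²] ≈ 1/(1 + 4·Ne·c), inverted for Ne, with a 1/n correction for the LD induced by finite sample size. A hedged sketch with illustrative numbers, not the study's data or its exact regression method:

```python
def ne_from_ld(r2, c, n=None):
    # Invert E[r^2] ~ 1 / (1 + 4*Ne*c) for Ne at recombination fraction c;
    # optionally subtract the ~1/n contribution of finite sample size
    # to the observed mean r^2
    r2_adj = r2 - (1.0 / n if n is not None else 0.0)
    return (1.0 / r2_adj - 1.0) / (4.0 * c)

# hypothetical mean r^2 = 0.02 at c = 0.005, with 183 genotyped animals
print(round(ne_from_ld(0.02, 0.005, n=183)))
```

Because different values of c index LD generated at different times in the past, evaluating the inversion over a range of marker distances is what yields the per-generation Ne trajectories reported in the abstract.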
Sobel Leonard, Ashley; Weissman, Daniel B; Greenbaum, Benjamin; Ghedin, Elodie; Koelle, Katia
2017-07-15
The bottleneck governing infectious disease transmission describes the size of the pathogen population transferred from the donor to the recipient host. Accurate quantification of the bottleneck size is particularly important for rapidly evolving pathogens such as influenza virus, as narrow bottlenecks reduce the amount of transferred viral genetic diversity and, thus, may decrease the rate of viral adaptation. Previous studies have estimated bottleneck sizes governing viral transmission by using statistical analyses of variants identified in pathogen sequencing data. These analyses, however, did not account for variant calling thresholds and stochastic viral replication dynamics within recipient hosts. Because these factors can skew bottleneck size estimates, we introduce a new method for inferring bottleneck sizes that accounts for these factors. Through the use of a simulated data set, we first show that our method, based on beta-binomial sampling, accurately recovers transmission bottleneck sizes, whereas other methods fail to do so. We then apply our method to a data set of influenza A virus (IAV) infections for which viral deep-sequencing data from transmission pairs are available. We find that the IAV transmission bottleneck size estimates in this study are highly variable across transmission pairs, while the mean bottleneck size of 196 virions is consistent with a previous estimate for this data set. Furthermore, regression analysis shows a positive association between estimated bottleneck size and donor infection severity, as measured by temperature. These results support findings from experimental transmission studies showing that bottleneck sizes across transmission events can be variable and influenced in part by epidemiological factors. IMPORTANCE The transmission bottleneck size describes the size of the pathogen population transferred from the donor to the recipient host and may affect the rate of pathogen adaptation within host populations. Recent
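The beta-binomial idea can be sketched as follows. This is a simplified illustration, not the authors' exact model: it assumes the donor frequency is known without error, ignores variant-calling thresholds, and all names are ours.

```python
import numpy as np
from scipy.stats import binom, betabinom

def bottleneck_loglik(nb, p_donor, x_reads, n_reads):
    """Log-likelihood of bottleneck size nb for one variant.
    k of the nb founding virions carry the variant, k ~ Binomial(nb,
    p_donor); the recipient read count is then modeled as beta-binomial,
    reflecting stochastic growth from k of nb founders."""
    like = 0.0
    for k in range(nb + 1):
        w = binom.pmf(k, nb, p_donor)
        if k == 0:          # variant lost in transmission
            like += w * (1.0 if x_reads == 0 else 0.0)
        elif k == nb:       # variant fixed in transmission
            like += w * (1.0 if x_reads == n_reads else 0.0)
        else:
            like += w * betabinom.pmf(x_reads, n_reads, k, nb - k)
    return np.log(like) if like > 0 else -np.inf

# Profile the likelihood over candidate bottleneck sizes for one variant
# at donor frequency 0.4, observed in 35 of 100 recipient reads.
sizes = range(1, 51)
ll = [bottleneck_loglik(nb, 0.4, 35, 100) for nb in sizes]
best = sizes[int(np.argmax(ll))]
```

In a full analysis the per-variant log-likelihoods are summed across variants and transmission pairs before maximizing over nb.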
Recursive estimation of inventory quality classes using sampling
Directory of Open Access Journals (Sweden)
L. Aggoun
2003-01-01
Full Text Available In this paper we propose a new discrete time discrete state inventory model for perishable items of a single product. Items in stock are assumed to belong to one of a finite number of quality classes that are ordered in such a way that Class 1 contains the best quality and the last class contains the pre-perishable quality. At the end of each epoch, items in each inventory class either stay in the same class or lose quality and move to a lower class. The movement between classes is not observed. Samples are drawn from the inventory and, based on the observations of these samples, optimal estimates for the number of items in each quality class are derived.
García-Donas, Julieta G; Dyke, Jeffrey; Paine, Robert R; Nathena, Despoina; Kranioti, Elena F
2016-02-01
Most age estimation methods have proven problematic when applied to highly fragmented skeletal remains. Rib histomorphometry is advantageous in such cases; yet it is vital to test and revise existing techniques, particularly when used in legal settings (Crowder and Rosella, 2007). This study tested the Stout & Paine (1992) and Stout et al. (1994) histological age estimation methods on a modern Greek sample using different sampling sites. Six left 4th ribs of known age and sex were selected from a modern skeletal collection. Each rib was cut into three equal segments. Two thin sections were acquired from each segment. A total of 36 thin sections were prepared and analysed. Four variables (cortical area, intact and fragmented osteon density, and osteon population density) were calculated for each section, and age was estimated according to Stout & Paine (1992) and Stout et al. (1994). The results showed that both methods produced a systematic underestimation of age (by up to 43 years), although a general improvement in accuracy was observed when applying the Stout et al. (1994) formula. Error rates increased with increasing age, with the oldest individual showing extreme differences between real and estimated age. Comparison of the different sampling sites showed small differences between the estimated ages, suggesting that any fragment of the rib could be used without introducing significant error. Yet, a larger sample should be used to confirm these results. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Wolf, Erika J.; Harrington, Kelly M.; Shaunna L Clark; Miller, Mark W.
2013-01-01
Determining sample size requirements for structural equation modeling (SEM) is a challenge often faced by investigators, peer reviewers, and grant writers. Recent years have seen a large increase in SEMs in the behavioral science literature, but consideration of sample size requirements for applied SEMs often relies on outdated rules-of-thumb. This study used Monte Carlo data simulation techniques to evaluate sample size requirements for common applied SEMs. Across a series of simulations, we...
Stamey, James D; Natanegara, Fanni; Seaman, John W
2013-01-01
In clinical trials, multiple outcomes are often collected in order to simultaneously assess effectiveness and safety. We develop a Bayesian procedure for determining the required sample size in a regression model where a continuous efficacy variable and a binary safety variable are observed. The sample size determination procedure is simulation based. The model accounts for correlation between the two variables. Through examples we demonstrate that savings in total sample size are possible when the correlation between these two variables is sufficiently high.
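The simulation-based logic of such sample size determination can be illustrated with a simpler frequentist analogue: estimate power at a candidate sample size by repeated simulation, then search for the smallest adequate n. The paper's procedure is Bayesian and bivariate; this sketch shows only the simulation idea, with invented effect size and thresholds.

```python
import numpy as np
from scipy import stats

def power_by_simulation(n, effect, sd=1.0, alpha=0.05, sims=2000, seed=1):
    """Monte Carlo power of a two-sample t-test with n subjects per arm:
    fraction of simulated trials whose p-value falls below alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        a = rng.normal(0.0, sd, n)
        b = rng.normal(effect, sd, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / sims

# Smallest n per arm (in steps of 5) giving >= 80% power for a 0.5 SD
# effect; the analytic answer is about 64 per arm.
n = 10
while power_by_simulation(n, 0.5) < 0.80:
    n += 5
```

A Bayesian version replaces the fixed effect with draws from a prior and the significance test with a posterior criterion, but the outer "simulate, evaluate, increase n" loop is the same.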
Relative power and sample size analysis on gene expression profiling data
Directory of Open Access Journals (Sweden)
den Dunnen JT
2009-09-01
Full Text Available Abstract Background With the increasing number of expression profiling technologies, researchers today are confronted with choosing the technology that has sufficient power with minimal sample size, in order to reduce cost and time. These depend on data variability, partly determined by sample type, preparation and processing. Objective measures that help experimental design, given own pilot data, are thus fundamental. Results Relative power and sample size analysis were performed on two distinct data sets. The first set consisted of Affymetrix array data derived from a nutrigenomics experiment in which weak, intermediate and strong PPARα agonists were administered to wild-type and PPARα-null mice. Our analysis confirms the hierarchy of PPARα-activating compounds previously reported and the general idea that larger effect sizes positively contribute to the average power of the experiment. A simulation experiment was performed that mimicked the effect sizes seen in the first data set. The relative power was predicted but the estimates were slightly conservative. The second, more challenging, data set describes a microarray platform comparison study using hippocampal δC-doublecortin-like kinase transgenic mice that were compared to wild-type mice, which was combined with results from Solexa/Illumina deep sequencing runs. As expected, the choice of technology greatly influences the performance of the experiment. Solexa/Illumina deep sequencing has the highest overall power followed by the microarray platforms Agilent and Affymetrix. Interestingly, Solexa/Illumina deep sequencing displays comparable power across all intensity ranges, in contrast with microarray platforms that have decreased power in the low intensity range due to background noise. This means that deep sequencing technology is especially more powerful in detecting differences in the low intensity range, compared to microarray platforms. Conclusion Power and sample size analysis
Relative power and sample size analysis on gene expression profiling data
van Iterson, M; 't Hoen, PAC; Pedotti, P; Hooiveld, GJEJ; den Dunnen, JT; van Ommen, GJB; Boer, JM; Menezes, RX
2009-01-01
Background With the increasing number of expression profiling technologies, researchers today are confronted with choosing the technology that has sufficient power with minimal sample size, in order to reduce cost and time. These depend on data variability, partly determined by sample type, preparation and processing. Objective measures that help experimental design, given own pilot data, are thus fundamental. Results Relative power and sample size analysis were performed on two distinct data sets. The first set consisted of Affymetrix array data derived from a nutrigenomics experiment in which weak, intermediate and strong PPARα agonists were administered to wild-type and PPARα-null mice. Our analysis confirms the hierarchy of PPARα-activating compounds previously reported and the general idea that larger effect sizes positively contribute to the average power of the experiment. A simulation experiment was performed that mimicked the effect sizes seen in the first data set. The relative power was predicted but the estimates were slightly conservative. The second, more challenging, data set describes a microarray platform comparison study using hippocampal δC-doublecortin-like kinase transgenic mice that were compared to wild-type mice, which was combined with results from Solexa/Illumina deep sequencing runs. As expected, the choice of technology greatly influences the performance of the experiment. Solexa/Illumina deep sequencing has the highest overall power followed by the microarray platforms Agilent and Affymetrix. Interestingly, Solexa/Illumina deep sequencing displays comparable power across all intensity ranges, in contrast with microarray platforms that have decreased power in the low intensity range due to background noise. This means that deep sequencing technology is especially more powerful in detecting differences in the low intensity range, compared to microarray platforms. Conclusion Power and sample size analysis based on pilot data give
Issues of sample size in sensitivity and specificity analysis with special reference to oncology
Directory of Open Access Journals (Sweden)
Atul Juneja
2015-01-01
Full Text Available Sample size is one of the basic issues that any medical researcher, including the oncologist, faces in a research program. This communication discusses the computation of sample size when sensitivity and specificity are being evaluated. The article presents situations the researcher can easily visualize for appropriate use of sample size techniques for sensitivity and specificity when a screening method for early detection of cancer is in question. Moreover, it should put the researcher in a position to communicate efficiently with a statistician about sample size computation and, most importantly, about the applicability of the results under the conditions of the negotiated precision.
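One common approach to these computations, in the style of Buderer's (1996) formulas (the article does not necessarily use these exact expressions), scales the usual proportion sample size by the expected disease prevalence, since sensitivity is estimated only from diseased subjects and specificity only from healthy ones. A minimal sketch:

```python
import math

def n_for_sensitivity(se, prevalence, precision, z=1.96):
    """Total screening sample size needed to estimate sensitivity se
    within +/- precision: first size the diseased subgroup, then scale
    up by prevalence to get the whole-sample requirement."""
    n_diseased = z**2 * se * (1 - se) / precision**2
    return math.ceil(n_diseased / prevalence)

def n_for_specificity(sp, prevalence, precision, z=1.96):
    """Analogous total sample size for specificity, scaled by the
    healthy fraction (1 - prevalence)."""
    n_healthy = z**2 * sp * (1 - sp) / precision**2
    return math.ceil(n_healthy / (1 - prevalence))

# Expected sensitivity 0.90, prevalence 10%, +/-0.05 precision:
print(n_for_sensitivity(0.90, 0.10, 0.05))  # → 1383
```

The prevalence in the denominator is why screening studies of rare cancers need such large total samples even for modest precision on sensitivity.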
Issues of sample size in sensitivity and specificity analysis with special reference to oncology.
Juneja, Atul; Sharma, Shashi
2015-01-01
Sample size is one of the basic issues that any medical researcher, including the oncologist, faces in a research program. This communication discusses the computation of sample size when sensitivity and specificity are being evaluated. The article presents situations the researcher can easily visualize for appropriate use of sample size techniques for sensitivity and specificity when a screening method for early detection of cancer is in question. Moreover, it should put the researcher in a position to communicate efficiently with a statistician about sample size computation and, most importantly, about the applicability of the results under the conditions of the negotiated precision.
Directory of Open Access Journals (Sweden)
Lela Sulaberidze
Full Text Available An accurate estimation of the population size of men who have sex with men (MSM) is critical to the success of HIV program planning and to monitoring the response to the epidemic as a whole, but is quite often missing. In this study, our aim was to estimate the population size of MSM in Tbilisi, Georgia and compare it with other estimates in the region. In the absence of a gold standard for estimating the population size of MSM, this study reports a range of methods, including network scale-up, mobile/web apps multiplier, service and unique object multiplier, network-based capture-recapture, Handcock RDS-based and Wisdom of Crowds methods. To apply all these methods, two surveys were conducted: first, a household survey among 1,015 adults from the general population, and second, a respondent-driven sample of 210 MSM. We also conducted a literature review of MSM size estimation in Eastern European and Central Asian countries. The median population size of MSM generated from all the above methods was estimated to be 5,100 (95% confidence interval (CI): 3,243~9,088). This corresponds to 1.42% (95% CI: 0.9%~2.53%) of the adult male population in Tbilisi. Our size estimates of the MSM population fall within ranges reported in other Eastern European and Central Asian countries. These estimates can provide valuable information for country-level HIV prevention program planning and evaluation. Furthermore, we believe that our results will narrow the gap in data availability on the estimates of the population size of MSM in the region.
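Among the methods listed, the service and unique-object multipliers have the simplest arithmetic: divide an independent benchmark count by the fraction of the survey sample reporting the benchmark. The numbers below are invented, and real RDS analyses weight the reported fraction for the sampling design.

```python
def multiplier_estimate(benchmark_count, prop_reporting):
    """Multiplier method: if benchmark_count members of the hidden
    population used a service (or received a unique object), and a
    fraction prop_reporting of the survey sample reports having done
    so, the population size is benchmark_count / prop_reporting."""
    if not 0.0 < prop_reporting <= 1.0:
        raise ValueError("prop_reporting must be in (0, 1]")
    return benchmark_count / prop_reporting

# Hypothetical: 150 clients recorded at an MSM-focused clinic, and 3%
# of the RDS sample reports attending that clinic.
size = multiplier_estimate(150, 0.03)
```

Because each multiplier rests on a different benchmark, studies like this one typically report the median across several methods rather than trusting any single estimate.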
Audiovisual Interval Size Estimation Is Associated with Early Musical Training.
Directory of Open Access Journals (Sweden)
Mary Kathryn Abel
Full Text Available Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched stimuli. Participants' ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants' ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception.
Directory of Open Access Journals (Sweden)
D Johan Kotze
Full Text Available Temporal variation in the detectability of a species can bias estimates of relative abundance if not handled correctly. For example, when effort varies in space and/or time it becomes necessary to take variation in detectability into account when data are analyzed. We demonstrate the importance of incorporating seasonality into the analysis of data with unequal sample sizes due to lost traps, at a particular density of a species. A case study of count data was simulated using a spring-active carabid beetle. Traps were 'lost' randomly during high beetle activity in high-abundance sites and during low beetle activity in low-abundance sites. Five different models were fitted to datasets with different levels of loss. If sample sizes were unequal and a seasonality variable was not included in models that assumed the number of individuals was log-normally distributed, the models severely under- or overestimated the true effect size. Results did not improve when seasonality and number of trapping days were included in these models as offset terms, but only performed well when the response variable was specified as following a negative binomial distribution. Finally, if seasonal variation of a species is unknown, which is often the case, seasonality can be added as a free factor, resulting in well-performing negative binomial models. Based on these results we recommend: (a) add sampling effort (number of trapping days in our example) to the models as an offset term; (b) if precise information is available on seasonal variation in detectability of a study object, add seasonality to the models as an offset term; (c) if information on seasonal variation in detectability is inadequate, add seasonality as a free factor; and (d) specify the response variable of count data as following a negative binomial or over-dispersed Poisson distribution.
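The offset idea in recommendation (a) can be sketched with a Poisson likelihood, which is simpler than the negative binomial the authors recommend: effort enters as log(days) with its coefficient fixed at one, so the fitted parameter is a per-trap-day rate. Names and simulated values are ours.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
days = rng.integers(3, 15, size=50)   # unequal trapping effort per trap
true_rate = 2.0                       # beetles per trap-day
counts = rng.poisson(true_rate * days)

def neg_loglik(log_rate):
    """Negative Poisson log-likelihood (up to a constant) with
    log(days) as an offset: log(mu) = log_rate + log(days)."""
    mu = np.exp(log_rate + np.log(days))
    return -(counts * np.log(mu) - mu).sum()

fit = minimize_scalar(neg_loglik, bounds=(-5, 5), method="bounded")
rate_hat = np.exp(fit.x)  # recovers ~2.0; analytically sum(counts)/sum(days)
```

With real, overdispersed trap data the same offset is kept but the Poisson likelihood is swapped for a negative binomial one, as the abstract's results require.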
Eisenberg, Sarita L.; Guo, Ling-Yu
2015-01-01
Purpose: The purpose of this study was to investigate whether a shorter language sample elicited with fewer pictures (i.e., 7) would yield a percent grammatical utterances (PGU) score similar to that computed from a longer language sample elicited with 15 pictures for 3-year-old children. Method: Language samples were elicited by asking forty…
Estimating the settling velocity of bioclastic sediment using common grain-size analysis techniques
Cuttler, Michael V. W.; Lowe, Ryan J.; Falter, James L.; Buscombe, Daniel D.
2017-01-01
Most techniques for estimating settling velocities of natural particles have been developed for siliciclastic sediments. Therefore, to understand how these techniques apply to bioclastic environments, measured settling velocities of bioclastic sedimentary deposits sampled from a nearshore fringing reef in Western Australia were compared with settling velocities calculated using results from several common grain-size analysis techniques (sieve, laser diffraction and image analysis) and established models. The effects of sediment density and shape were also examined using a range of density values and three different models of settling velocity. Sediment density was found to have a significant effect on calculated settling velocity, causing a range in normalized root-mean-square error of up to 28%, depending upon settling velocity model and grain-size method. Accounting for particle shape reduced errors in predicted settling velocity by 3% to 6% and removed any velocity-dependent bias, which is particularly important for the fastest settling fractions. When shape was accounted for and measured density was used, normalized root-mean-square errors were 4%, 10% and 18% for laser diffraction, sieve and image analysis, respectively. The results of this study show that established models of settling velocity that account for particle shape can be used to estimate settling velocity of irregularly shaped, sand-sized bioclastic sediments from sieve, laser diffraction, or image analysis-derived measures of grain size with a limited amount of error. Collectively, these findings will allow for grain-size data measured with different methods to be accurately converted to settling velocity for comparison. This will facilitate greater understanding of the hydraulic properties of bioclastic sediment which can help to increase our general knowledge of sediment dynamics in these environments.
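One widely used settling-velocity model that handles both the Stokes and turbulent regimes, with constants that can absorb particle shape, is Ferguson and Church (2004). Whether it is among the 'established models' compared in this study is not stated here, so treat this as an illustrative sketch; parameter values are typical assumptions.

```python
import math

def settling_velocity(d, R=1.65, C1=18.0, C2=1.0, g=9.81, nu=1.0e-6):
    """Ferguson & Church (2004) settling velocity (m/s) for grain
    diameter d (m): w = R*g*d^2 / (C1*nu + sqrt(0.75*C2*R*g*d^3)).
    C2 ~ 0.4 for smooth spheres, ~1.0 for typical natural sand; R is
    submerged specific gravity, lower for porous bioclastic grains."""
    return (R * g * d**2) / (C1 * nu + math.sqrt(0.75 * C2 * R * g * d**3))

# A 100-micron quartz-density grain settles near the Stokes rate
# (on the order of millimetres per second).
w = settling_velocity(100e-6)
```

Lowering R or raising C1/C2 for irregular bioclastic grains, as the study's shape and density corrections effectively do, reduces the predicted velocity for a given sieve or laser-diffraction size.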
Power and sample size determination for measures of environmental impact in aquatic systems
Energy Technology Data Exchange (ETDEWEB)
Ammann, L.P. [Univ. of Texas, Richardson, TX (United States); Dickson, K.L.; Waller, W.T.; Kennedy, J.H. [Univ. of North Texas, Denton, TX (United States); Mayer, F.L.; Lewis, M. [Environmental Protection Agency, Gulf Breeze, FL (United States)
1994-12-31
To effectively monitor the status of various freshwater and estuarine ecological systems, it is necessary to understand the statistical power associated with the measures of ecological health that are appropriate for each system. These power functions can then be used to determine sample sizes that are required to attain targeted change-detection likelihoods. A number of different measures have been proposed and are used for such monitoring. These include diversity and evenness indices, richness, and organism counts. Power functions can be estimated when preliminary or historical data are available for the region and system of interest. Unfortunately, there are a number of problems associated with the computation of power functions and sample sizes for these measures, including the presence of outliers, co-linearity among the variables, and non-normality of count data. The problems, and appropriate methods to compute the power functions, for each of the commonly employed measures of ecological health will be discussed. In addition, the relationship between power and the level of taxonomic classification used to compute the measures of diversity, evenness, richness, and organism counts will be discussed. Methods for computation of the power functions will be illustrated using data sets from previous EPA studies.
CT dose survey in adults: what sample size for what precision?
Energy Technology Data Exchange (ETDEWEB)
Taylor, Stephen [Hopital Ambroise Pare, Department of Radiology, Mons (Belgium); Muylem, Alain van [Hopital Erasme, Department of Pneumology, Brussels (Belgium); Howarth, Nigel [Clinique des Grangettes, Department of Radiology, Chene-Bougeries (Switzerland); Gevenois, Pierre Alain [Hopital Erasme, Department of Radiology, Brussels (Belgium); Tack, Denis [EpiCURA, Clinique Louis Caty, Department of Radiology, Baudour (Belgium)
2017-01-15
To determine the variability of volume computed tomographic dose index (CTDIvol) and dose-length product (DLP) data, and to propose a minimum sample size to achieve an expected precision. CTDIvol and DLP values of 19,875 consecutive CT acquisitions of the abdomen (7,268), thorax (3,805), lumbar spine (3,161), cervical spine (1,515) and head (4,106) were collected in two centers. Their variabilities were investigated according to sample size (10 to 1,000 acquisitions) and patient body weight categories (no weight selection, 67-73 kg and 60-80 kg). The 95 % confidence interval as a percentage of the median (CI95/med) value was calculated for increasing sample sizes. We deduced the sample size that sets a 95 % CI lower than 10 % of the median (CI95/med ≤ 10 %). The sample size ensuring CI95/med ≤ 10 % ranged from 15 to 900 depending on the body region and the dose descriptor considered. In sample sizes recommended by regulatory authorities (i.e., 10-20 patients), the mean CTDIvol and DLP of one sample ranged from 0.50 to 2.00 times their actual values extracted from 2,000 samples. The sampling error in CTDIvol and DLP means is high in dose surveys based on small samples of patients. Sample size should be increased at least tenfold to decrease this variability.
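The CI95/med criterion can be reproduced in miniature by resampling: draw a subsample of a given size, bootstrap its mean, and express the CI width relative to the survey median. The right-skewed 'DLP' distribution below is invented for illustration; the study's own values are not used.

```python
import numpy as np

def ci95_over_median(dlp, sample_size, boots=2000, seed=0):
    """Width of the bootstrap 95% CI of the mean of a dose subsample of
    the given size, expressed as a fraction of the full survey median
    (a sketch of the CI95/med criterion)."""
    rng = np.random.default_rng(seed)
    sub = rng.choice(dlp, sample_size, replace=False)
    means = [rng.choice(sub, sample_size).mean() for _ in range(boots)]
    lo, hi = np.percentile(means, [2.5, 97.5])
    return (hi - lo) / np.median(dlp)

# Simulated right-skewed dose survey: precision improves roughly as
# 1/sqrt(n), so small regulatory samples give wide intervals.
dlp = np.random.default_rng(1).lognormal(6.0, 0.5, 20000)
small = ci95_over_median(dlp, 20)
large = ci95_over_median(dlp, 900)
```

Running this shows why the abstract's required sizes reach into the hundreds: with 10-20 acquisitions, the relative CI width is several times the 10% target.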
Giorli, Giacomo; Drazen, Jeffrey C.; Neuheimer, Anna B.; Copeland, Adrienne; Au, Whitlow W. L.
2018-01-01
Pelagic animals that form deep sea scattering layers (DSLs) represent an important link in the food web between zooplankton and top predators. While estimating the composition, density and location of the DSL is important to understand mesopelagic ecosystem dynamics and to predict top predators' distribution, DSL composition and density are often estimated from trawls which may be biased in terms of extrusion, avoidance, and gear-associated biases. Instead, location and biomass of DSLs can be estimated from active acoustic techniques, though estimates are often in aggregate without regard to size or taxon specific information. For the first time in the open ocean, we used a DIDSON sonar to characterize the fauna in DSLs. Estimates of the numerical density and length of animals at different depths and locations along the Kona coast of the Island of Hawaii were determined. Data were collected below and inside the DSLs with the sonar mounted on a profiler. A total of 7068 animals were counted and sized. We estimated numerical densities ranging from 1 to 7 animals/m3 and individuals as long as 3 m were detected. These numerical densities were orders of magnitude higher than those estimated from trawls and average sizes of animals were much larger as well. A mixed model was used to characterize numerical density and length of animals as a function of deep sea layer sampled, location, time of day, and day of the year. Numerical density and length of animals varied by month, with numerical density also a function of depth. The DIDSON proved to be a good tool for open-ocean/deep-sea estimation of the numerical density and size of marine animals, especially larger ones. Further work is needed to understand how this methodology relates to estimates of volume backscatters obtained with standard echosounding techniques, density measures obtained with other sampling methodologies, and to precisely evaluate sampling biases.
Reliable calculation in probabilistic logic: Accounting for small sample size and model uncertainty
Energy Technology Data Exchange (ETDEWEB)
Ferson, S. [Applied Biomathematics, Setauket, NY (United States)
1996-12-31
A variety of practical computational problems arise in risk and safety assessments, forensic statistics and decision analyses in which the probability of some event or proposition E is to be estimated from the probabilities of a finite list of related subevents or propositions F, G, H, .... In practice, the analyst's knowledge may be incomplete in two ways. First, the probabilities of the subevents may be imprecisely known from statistical estimations, perhaps based on very small sample sizes. Second, relationships among the subevents may be known imprecisely. For instance, there may be only limited information about their stochastic dependencies. Representing probability estimates as interval ranges on [0, 1] has been suggested as a way to address the first source of imprecision. A suite of AND, OR and NOT operators defined with reference to the classical Fréchet inequalities permits these probability intervals to be used in calculations that address the second source of imprecision, in many cases in a best possible way. Using statistical confidence intervals as inputs unravels the closure properties of this approach, however, requiring that probability estimates be characterized by a nested stack of intervals for all possible levels of statistical confidence, from a point estimate (0% confidence) to the entire unit interval (100% confidence). The corresponding logical operations implied by convolutive application of the logical operators for every possible pair of confidence intervals reduce by symmetry to a manageably simple level-wise iteration. The resulting calculus can be implemented in software that allows users to compute comprehensive and often level-wise best possible bounds on probabilities for logical functions of events.
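The Fréchet-bound operators described above are straightforward to state for a single pair of intervals; a minimal sketch (function names are ours, and the level-wise iteration over confidence stacks is omitted):

```python
def and_interval(a, b):
    """Frechet bounds on P(A and B) with no dependence information,
    given P(A) in [a[0], a[1]] and P(B) in [b[0], b[1]]:
    [max(0, pA + pB - 1), min(pA, pB)] evaluated at the endpoints."""
    return (max(0.0, a[0] + b[0] - 1.0), min(a[1], b[1]))

def or_interval(a, b):
    """Frechet bounds on P(A or B): [max(pA, pB), min(1, pA + pB)]."""
    return (max(a[0], b[0]), min(1.0, a[1] + b[1]))

def not_interval(a):
    """Complement simply reflects the interval: [1 - hi, 1 - lo]."""
    return (1.0 - a[1], 1.0 - a[0])

# Two events known only to within [0.6, 0.8] and [0.7, 0.9]:
# their conjunction is bounded by roughly [0.3, 0.8].
bounds = and_interval((0.6, 0.8), (0.7, 0.9))
```

These bounds are best possible when nothing is known about dependence; with partial dependence information the intervals tighten, which is the "second source of imprecision" the abstract addresses.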
Gordon Luikart; Nils Ryman; David A. Tallmon; Michael K. Schwartz; Fred W. Allendorf
2010-01-01
Population census size (NC) and effective population sizes (Ne) are two crucial parameters that influence population viability, wildlife management decisions, and conservation planning. Genetic estimators of both NC and Ne are increasingly widely used because molecular markers are increasingly available, statistical methods are improving rapidly, and genetic estimators...
Some basic aspects of statistical methods and sample size determination in health science research.
Binu, V S; Mayya, Shreemathi S; Dhar, Murali
2014-04-01
A health science researcher may sometimes wonder "why are statistical methods so important in research?" The simple answer is that statistical methods are used throughout a study: in planning, designing, collecting data, analyzing data, drawing meaningful interpretations and reporting the findings. Hence, it is important that a researcher knows the concepts of at least the basic statistical methods used at various stages of a research study. This helps the researcher to conduct an appropriately well-designed study leading to valid and reliable results that can be generalized to the population. A well-designed study possesses fewer biases, which in turn gives precise, valid and reliable results. There are many statistical methods and tests that are used at various stages of a research study. In this communication, we discuss the overall importance of statistical considerations in medical research, with the main emphasis on estimating the minimum sample size for different study objectives.
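The minimum sample size formulas alluded to above are standard for simple estimation objectives; a minimal sketch for a mean and a proportion at 95% confidence (function names are ours):

```python
import math

def n_for_mean(sd, margin, z=1.96):
    """Minimum n to estimate a population mean within +/- margin,
    given an anticipated standard deviation: n = (z*sd/margin)^2."""
    return math.ceil((z * sd / margin) ** 2)

def n_for_proportion(p, margin, z=1.96):
    """Minimum n to estimate a proportion within +/- margin:
    n = z^2 * p * (1 - p) / margin^2 (largest at p = 0.5)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# The familiar worst-case survey size: p = 0.5 within +/-5%.
print(n_for_proportion(0.5, 0.05))  # → 385
```

Hypothesis-testing objectives need the power-based versions of these formulas instead, which add a second z term for the desired power.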
Oba, Yurika; Yamada, Toshihiro
2017-05-01
We estimated the sample size (the number of samples) required to evaluate the concentration of radiocesium (137Cs) in Japanese fir (Abies firma Sieb. & Zucc.) 5 years after the Fukushima Daiichi Nuclear Power Plant accident. We investigated the spatial structure of the contamination levels in this species growing in a mixed deciduous broadleaf and evergreen coniferous forest stand. We sampled 40 saplings with a tree height of 150-250 cm in a Fukushima forest community. The results showed that: (1) there was no correlation between the 137Cs concentration in needles and soil, and (2) the difference in the spatial distribution pattern of 137Cs concentration between needles and soil suggests that the contribution of root uptake to 137Cs in new needles of this species may have been minor in the 5 years after the radionuclides were released into the atmosphere. The concentration of 137Cs in needles showed a strong positive spatial autocorrelation in the distance class from 0 to 2.5 m, suggesting that statistical analysis of such data should account for spatial autocorrelation when assessing the radioactive contamination of forest trees. According to our sample size analysis, a sample size of seven trees was required to determine the mean contamination level within an error of no more than 10%. This required sample size may be feasible for most sites. Copyright © 2017 Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
Simon Boitard
2016-03-01
Full Text Available Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.
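The ABC rejection principle behind such methods can be shown with a toy model far simpler than PopSizeABC: draw Ne from a prior, simulate a summary statistic under each draw, and keep draws whose statistic falls close to the observed value. Here the statistic is expected heterozygosity under the infinite-alleles model, H = θ/(1+θ) with θ = 4·Ne·μ; the mutation rate, locus count and tolerance are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
MU = 1e-6       # hypothetical per-locus mutation rate (assumption)
N_LOCI = 5000   # hypothetical number of surveyed loci

def het(ne):
    """Expected heterozygosity under the infinite-alleles model:
    H = theta / (1 + theta), theta = 4*Ne*mu."""
    theta = 4.0 * ne * MU
    return theta / (1.0 + theta)

# 'Observed' heterozygosity from a hypothetical true Ne of 10,000,
# measured with binomial noise over N_LOCI loci.
obs = rng.binomial(N_LOCI, het(10_000)) / N_LOCI

# ABC rejection: flat prior on Ne, simulate the same noisy statistic,
# accept draws whose statistic lies within the tolerance.
prior = rng.uniform(1_000, 50_000, size=100_000)
sims = rng.binomial(N_LOCI, het(prior)) / N_LOCI
accepted = prior[np.abs(sims - obs) < 0.002]
ne_hat = np.median(accepted)
```

PopSizeABC uses the same accept/reject machinery but with a piecewise-constant Ne trajectory, coalescent simulations, and the AFS plus binned LD as summary statistics.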
45 CFR Appendix C to Part 1356 - Calculating Sample Size for NYTD Follow-Up Populations
2010-10-01
... applied when the sample is drawn from a population of one to 5,000 youth, because the sample is more than... Using Finite Population Correction The FPC is not applied when the sample is drawn from a population of... 45 Public Welfare 4 2010-10-01 2010-10-01 false Calculating Sample Size for NYTD Follow-Up...
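The finite population correction referenced in these regulatory fragments has a standard closed form: an infinite-population sample size n0 is shrunk when the population N is small. A sketch, where the baseline n0 = 384 is the familiar value for a ±5% margin at 95% confidence and is used only for illustration:

```python
import math

def fpc_adjusted_n(n0, N):
    """Finite population correction: shrink the infinite-population sample
    size n0 when sampling from a finite population of size N."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

print(fpc_adjusted_n(384, 5000))  # 357: the FPC matters for small populations
```

For very large N the correction vanishes, which is why the rule only applies it below a population-size threshold.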
Error and bias in size estimates of whale sharks: implications for understanding demography.
Sequeira, Ana M M; Thums, Michele; Brooks, Kim; Meekan, Mark G
2016-03-01
Body size and age at maturity are indicative of the vulnerability of a species to extinction. However, they are both difficult to estimate for large animals that cannot be restrained for measurement. For very large species such as whale sharks, body size is commonly estimated visually, potentially resulting in the addition of errors and bias. Here, we investigate the errors and bias associated with total lengths of whale sharks estimated visually by comparing them with measurements collected using a stereo-video camera system at Ningaloo Reef, Western Australia. Using linear mixed-effects models, we found that visual lengths were biased towards underestimation with increasing size of the shark. When using the stereo-video camera, the number of larger individuals that were possibly mature (or close to maturity) that were detected increased by approximately 10%. Mean lengths calculated by each method were, however, comparable (5.002 ± 1.194 and 6.128 ± 1.609 m, s.d.), confirming that the population at Ningaloo is mostly composed of immature sharks based on published lengths at maturity. We then collated data sets of total lengths sampled from aggregations of whale sharks worldwide between 1995 and 2013. Except for locations in the East Pacific where large females have been reported, these aggregations also largely consisted of juveniles (mean lengths less than 7 m). Sightings of the largest individuals were limited and occurred mostly prior to 2006. This result highlights the urgent need to locate and quantify the numbers of mature male and female whale sharks in order to ascertain the conservation status and ensure persistence of the species.
Estimation of typical food portion sizes for children of different ages in Great Britain.
Wrieden, Wendy L; Longbottom, Patricia J; Adamson, Ashley J; Ogston, Simon A; Payne, Anne; Haleem, Mohammad A; Barton, Karen L
2008-06-01
It is often the case in dietary assessment that it is not practicable to weigh individual intakes of foods eaten. The aim of the work described was to estimate typical food portion weights for children of different ages. Using the data available from the British National Diet and Nutrition Surveys of children aged 1 1/2-4 1/2 years (1992-1993) and young people aged 4-18 years (1997), descriptive statistics were obtained, and predicted weights were calculated by linear, quadratic and exponential regression for each age group. Following comparison of energy and nutrient intakes calculated from actual (from an earlier weighed intake study) and estimated portion weights, the final list of typical portion sizes was based on median portion weights for the 1-3- and 4-6-year age groups, and age-adjusted means using linear regression for the 7-10-, 11-14- and 15-18-year age groups. The number of foods recorded by fifty or more children was 133 for each of the younger age groups (1-3 and 4-6 years) and seventy-five for each of the older age groups. The food portion weights covered all food groups. All portion sizes increased with age with the exception of milk in tea or coffee. The present study draws on a unique source of weighed data on food portions of a large sample of children that is unlikely to be repeated and therefore provides the best possible estimates of children's food portion sizes in the UK.
Dual-filter estimation for rotating-panel sample designs
Francis Roesch
2017-01-01
Dual-filter estimators are described and tested for use in the annual estimation for national forest inventories. The dual-filter approach involves the use of a moving window estimator in the first pass, which is used as input to Theil's mixed estimator in the second pass. The moving window and dual-filter estimators are tested along with two other estimators in a...
Sample Size Induced Brittle-to-Ductile Transition of Single-Crystal Aluminum Nitride
2015-08-01
Interestingly, the dislocation plasticity of single-crystal AlN strongly depends on specimen size. As shown in Fig. 5a and b, the large plastic... (ARL-RP-0528, AUG 2015, US Army Research Laboratory)
Not too big, not too small: a goldilocks approach to sample size selection.
Broglio, Kristine R; Connor, Jason T; Berry, Scott M
2014-01-01
We present a Bayesian adaptive design for a confirmatory trial to select a trial's sample size based on accumulating data. During accrual, frequent sample size selection analyses are made and predictive probabilities are used to determine whether the current sample size is sufficient or whether continuing accrual would be futile. The algorithm explicitly accounts for complete follow-up of all patients before the primary analysis is conducted. We refer to this as a Goldilocks trial design, as it is constantly asking the question, "Is the sample size too big, too small, or just right?" We describe the adaptive sample size algorithm, describe how the design parameters should be chosen, and show examples for dichotomous and time-to-event endpoints.
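The paper's exact algorithm is not reproduced in the abstract; the sketch below shows the generic predictive-probability calculation behind such designs for a single-arm dichotomous endpoint, assuming a Beta(1,1) prior and a final success criterion P(p > p0) > 0.95 (all thresholds and counts are illustrative):

```python
import random

random.seed(7)

def post_prob_gt(x, n, p0, draws=2000):
    """Monte Carlo estimate of P(p > p0 | x successes in n), Beta(1,1) prior."""
    return sum(random.betavariate(1 + x, 1 + n - x) > p0
               for _ in range(draws)) / draws

def predictive_probability(x, n, n_max, p0=0.5, sims=500):
    """Chance the trial succeeds at full accrual n_max given data so far:
    draw a plausible response rate from the posterior, impute the outcomes
    of the remaining patients, and check the final efficacy criterion."""
    wins = 0
    for _ in range(sims):
        p = random.betavariate(1 + x, 1 + n - x)
        future = sum(random.random() < p for _ in range(n_max - n))
        if post_prob_gt(x + future, n_max, p0) > 0.95:
            wins += 1
    return wins / sims
```

A high value at an interim look suggests the current sample size is already "just right"; a low value suggests further accrual would be futile.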
Sample size determination in group-sequential clinical trials with two co-primary endpoints
Asakura, Koko; Hamasaki, Toshimitsu; Sugimoto, Tomoyuki; Hayashi, Kenichi; Evans, Scott R; Sozu, Takashi
2014-01-01
We discuss sample size determination in group-sequential designs with two endpoints as co-primary. We derive the power and sample size within two decision-making frameworks. One is to claim the test intervention’s benefit relative to control when superiority is achieved for the two endpoints at the same interim timepoint of the trial. The other is when the superiority is achieved for the two endpoints at any interim timepoint, not necessarily simultaneously. We evaluate the behaviors of sample size and power with varying design elements and provide a real example to illustrate the proposed sample size methods. In addition, we discuss sample size recalculation based on observed data and evaluate the impact on the power and Type I error rate. PMID:24676799
Ciarleglio, Maria M; Arendt, Christopher D; Peduzzi, Peter N
2016-06-01
When designing studies that have a continuous outcome as the primary endpoint, the hypothesized effect size, that is, the hypothesized difference in means relative to the assumed variability of the endpoint, plays an important role in sample size and power calculations. Point estimates for the mean difference and the variability are often calculated using historical data. However, the uncertainty in these estimates is rarely addressed. This article presents a hybrid classical and Bayesian procedure that formally integrates prior information on the distributions of the mean difference and the variability into the study's power calculation. Conditional expected power, which averages the traditional power curve using the prior distributions of these parameters as the averaging weight, is used, and the hypothesized effect size is chosen to equate the prespecified frequentist power and the conditional expected power of the trial. This hypothesized effect size is then used in traditional sample size calculations when determining sample size for the study. The effect size found using this method may be expressed as a function of the prior means of the mean difference and the variability and their prior standard deviations. We show that the "naïve" estimate of the effect size, that is, the ratio of prior means, should be down-weighted to account for the variability in the parameters. An example is presented for designing a placebo-controlled clinical trial testing the antidepressant effect of alprazolam as monotherapy for major depression. Through this method, we are able to formally integrate prior information on the uncertainty and variability of both the treatment effect and the common standard deviation into the design of the study while maintaining a frequentist framework for
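Conditional expected power as described can be sketched by brute-force averaging of the classical two-sample power formula over priors on the mean difference and the standard deviation. The normal priors and their hyperparameters below are invented for illustration, not taken from the paper:

```python
import math
import random

def classical_power(delta, sigma, n, z_alpha=1.96):
    """Normal-approximation power of a two-sample test, n per arm,
    two-sided alpha = 0.05 (so z_alpha = 1.96)."""
    z = delta / (sigma * math.sqrt(2.0 / n)) - z_alpha
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def conditional_expected_power(n, d_mean, d_sd, s_mean, s_sd, draws=20000):
    """Average the power curve over normal priors on delta and sigma
    (sigma draws truncated at a small positive value)."""
    random.seed(0)
    total = 0.0
    for _ in range(draws):
        d = random.gauss(d_mean, d_sd)
        s = max(random.gauss(s_mean, s_sd), 1e-6)
        total += classical_power(d, s, n)
    return total / draws
```

Comparing the result with the power at the prior means shows the paper's point: uncertainty in the parameters drags the expected power below the naïve calculation, so the naïve effect size must be down-weighted.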
Mixed modeling and sample size calculations for identifying housekeeping genes.
Dai, Hongying; Charnigo, Richard; Vyhlidal, Carrie A; Jones, Bridgette L; Bhandary, Madhusudan
2013-08-15
Normalization of gene expression data using internal control genes that have biologically stable expression levels is an important process for analyzing reverse transcription polymerase chain reaction data. We propose a three-way linear mixed-effects model to select optimal housekeeping genes. The mixed-effects model can accommodate multiple continuous and/or categorical variables with sample random effects, gene fixed effects, systematic effects, and gene by systematic effect interactions. We propose using the intraclass correlation coefficient among gene expression levels as the stability measure to select housekeeping genes that have low within-sample variation. Global hypothesis testing is proposed to ensure that selected housekeeping genes are free of systematic effects or gene by systematic effect interactions. A gene combination with the highest lower bound of 95% confidence interval for intraclass correlation coefficient and no significant systematic effects is selected for normalization. Sample size calculation based on the estimation accuracy of the stability measure is offered to help practitioners design experiments to identify housekeeping genes. We compare our methods with geNorm and NormFinder by using three case studies. A free software package written in SAS (Cary, NC, U.S.A.) is available at http://d.web.umkc.edu/daih under software tab. Copyright © 2013 John Wiley & Sons, Ltd.
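The stability measure used here is an intraclass correlation coefficient. A one-way random-effects ICC(1) can be sketched as follows for a balanced design, with each inner list holding repeated expression measurements of one gene within one sample; this is a deliberate simplification of the paper's three-way mixed model:

```python
def icc_oneway(groups):
    """One-way random-effects ICC(1) for a balanced design.
    groups: list of groups (e.g. samples), each a list of k measurements.
    High ICC means measurements agree within a group relative to
    between-group spread."""
    k = len(groups[0])              # measurements per group
    n = len(groups)                 # number of groups
    grand = sum(v for g in groups for v in g) / (n * k)
    msb = k * sum((sum(g) / k - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((v - sum(g) / k) ** 2 for g in groups for v in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

A candidate housekeeping gene with low within-sample variation scores near 1; a gene whose replicates disagree within samples scores near or below 0.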
Using age on clothes size label to estimate weight in emergency paediatric patients.
Elgie, Laura D; Williams, Andrew R
2012-10-01
To study formulae that estimate children's weight using their actual age. To determine whether using the age on their clothes size label in these formulae can estimate weight when their actual age is unknown. The actual age and age on the clothes labels of 188 children were inserted into formulae that estimate children's weight. These estimates were compared with their actual weight. Bland-Altman plots calculated the precision and accuracy of each of these estimates. In all formulae, using age on the clothes sizes label provided a more precise estimate than the child's actual age. In emergencies where a child's age is unknown, use of the age on their clothes label in weight-estimating formulae yields acceptable weight estimates. Even in situations where a child's age is known, the age on their clothes label may provide a more accurate and precise weight estimate than the actual age.
Placzek, Marius; Friede, Tim
2017-01-01
The importance of subgroup analyses has been increasing due to a growing interest in personalized medicine and targeted therapies. Considering designs with multiple nested subgroups and a continuous endpoint, we develop methods for the analysis and sample size determination. First, we consider the joint distribution of standardized test statistics that correspond to each (sub)population. We derive multivariate exact distributions where possible, providing approximations otherwise. Based on these results, we present sample size calculation procedures. Uncertainties about nuisance parameters, which are needed for sample size calculations, make the study prone to misspecifications. We discuss how a sample size review can be performed in order to make the study more robust. To this end, we implement an internal pilot study design where the variances and prevalences of the subgroups are reestimated in a blinded fashion and the sample size is recalculated accordingly. Simulations show that the procedures presented here do not inflate the type I error significantly and maintain the prespecified power as long as the sample size of the smallest subgroup is not too small. We pay special attention to the case of small sample sizes and derive a lower bound for the size of the internal pilot study.
Li, Xiang; Kuk, Anthony Y C; Xu, Jinfeng
2014-12-10
Human biomonitoring of exposure to environmental chemicals is important. Individual monitoring is not viable because of low individual exposure level or insufficient volume of materials and the prohibitive cost of taking measurements from many subjects. Pooling of samples is an efficient and cost-effective way to collect data. Estimation is, however, complicated as individual values within each pool are not observed but are only known up to their average or weighted average. The distribution of such averages is intractable when the individual measurements are lognormally distributed, which is a common assumption. We propose to replace the intractable distribution of the pool averages by a Gaussian likelihood to obtain parameter estimates. If the pool size is large, this method produces statistically efficient estimates, but regardless of pool size, the method yields consistent estimates as the number of pools increases. An empirical Bayes (EB) Gaussian likelihood approach, as well as its Bayesian analog, is developed to pool information from various demographic groups by using a mixed-effect formulation. We also discuss methods to estimate the underlying mean-variance relationship and to select a good model for the means, which can be incorporated into the proposed EB or Bayes framework. By borrowing strength across groups, the EB estimator is more efficient than the individual group-specific estimator. Simulation results show that the EB Gaussian likelihood estimates outperform a previous method proposed for the National Health and Nutrition Examination Surveys with much smaller bias and better coverage in interval estimation, especially after correction of bias. Copyright © 2014 John Wiley & Sons, Ltd.
Engemann, Kristine; Enquist, Brian J; Sandel, Brody; Boyle, Brad; Jørgensen, Peter M; Morueta-Holme, Naia; Peet, Robert K; Violle, Cyrille; Svenning, Jens-Christian
2015-01-01
Macro-scale species richness studies often use museum specimens as their main source of information. However, such datasets are often strongly biased due to variation in sampling effort in space and time. These biases may strongly affect diversity estimates and may thereby obstruct solid inference on the underlying diversity drivers, as well as mislead conservation prioritization. In recent years, this has resulted in an increased focus on developing methods to correct for sampling bias. In this study, we use sample-size-correcting methods to examine patterns of tropical plant diversity in Ecuador, one of the most species-rich and climatically heterogeneous biodiversity hotspots. Species richness estimates were calculated based on 205,735 georeferenced specimens of 15,788 species using the Margalef diversity index, the Chao estimator, the second-order Jackknife and Bootstrapping resampling methods, and Hill numbers and rarefaction. Species richness was heavily correlated with sampling effort; only rarefaction was able to remove this effect, and we therefore recommend this method for estimating species richness with "big data" collections. PMID:25692000
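Rarefaction, the method the authors recommend, has a closed form: the expected number of species in a random subsample of m individuals drawn from the observed abundances (Hurlbert's formula). A minimal sketch:

```python
from math import comb

def rarefied_richness(counts, m):
    """Expected species count in a random subsample of m individuals.
    counts: observed abundance of each species.
    Each species contributes 1 - C(N - n_i, m) / C(N, m), the probability
    it appears at least once in the subsample."""
    N = sum(counts)
    return sum(1 - comb(N - n_i, m) / comb(N, m) for n_i in counts)
```

Because it conditions on a common subsample size, rarefaction lets richness be compared across sites with very different sampling effort, which is exactly the bias the study is correcting.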
Pruitt, Matthew V
2002-01-01
This article examines estimates of the size of the gay population that are provided in the websites of pro- and anti-gay groups. There are marked differences in the estimates that are provided by these groups. While most pro-gay groups suggest that approximately ten percent of the population is gay, anti-gay groups argue that only 1-3 percent of the population is gay. While none of the pro-gay groups address the methodological problems associated with the Kinsey data, all of the anti-gay groups that address the issue of size discredit Kinsey's work and/or the ten percent estimate that comes from Kinsey's work and is often cited by pro-gay organizations.
Sampling characteristics and calibration of snorkel counts to estimate stream fish populations
Weaver, D.; Kwak, Thomas J.; Pollock, Kenneth
2014-01-01
Snorkeling is a versatile technique for estimating lotic fish population characteristics; however, few investigators have evaluated its accuracy at population or assemblage levels. We evaluated the accuracy of snorkeling using prepositioned areal electrofishing (PAE) for estimating fish populations in a medium-sized Appalachian Mountain river during fall 2008 and summer 2009. Strip-transect snorkel counts were calibrated with PAE counts in identical locations among macrohabitats, fish species or taxa, and seasons. Mean snorkeling efficiency (i.e., the proportion of individuals counted from the true population) among all taxa and seasons was 14.7% (SE, 2.5%), and the highest efficiencies were for River Chub Nocomis micropogon at 21.1% (SE, 5.9%), Central Stoneroller Campostoma anomalum at 20.3% (SE, 9.6%), and darters (Percidae) at 17.1% (SE, 3.7%), whereas efficiencies were lower for shiners (Notropis spp., Cyprinella spp., Luxilus spp.) at 8.2% (SE, 2.2%) and suckers (Catostomidae) at 6.6% (SE, 3.2%). Macrohabitat type, fish taxon, or sampling season did not significantly explain variance in snorkeling efficiency. Mean snorkeling detection probability (i.e., probability of detecting at least one individual of a taxon) among fish taxa and seasons was 58.4% (SE, 6.1%). We applied the efficiencies from our calibration study to adjust snorkel counts from an intensive snorkeling survey conducted in a nearby reach. Total fish density estimates from strip-transect counts adjusted for snorkeling efficiency were 7,288 fish/ha (SE, 1,564) during summer and 15,805 fish/ha (SE, 4,947) during fall. Precision of fish density estimates is influenced by variation in snorkeling efficiency and sample size and may be increased with additional sampling effort. These results demonstrate the sampling properties and utility of snorkeling to characterize lotic fish assemblages with acceptable efficiency and detection probability, less effort, and no mortality, compared with traditional
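Adjusting raw snorkel counts by a calibrated efficiency, as done in this study, is a simple scaling; the delta-method standard error below carries only the uncertainty in the efficiency estimate and ignores count error (all numbers are illustrative, not the study's data):

```python
def adjusted_density(count, area_ha, eff, eff_se):
    """Scale a raw count to a density estimate using sighting efficiency
    eff (proportion of fish actually counted), with a first-order
    (delta-method) standard error from the efficiency's SE alone."""
    density = count / eff / area_ha
    se = density * eff_se / eff
    return density, se

d, se = adjusted_density(147, 1.0, 0.147, 0.025)
print(round(d), round(se))  # 1000 170
```

The large relative SE shows why the authors note that precision of adjusted densities is driven by variation in snorkeling efficiency.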
Jeffrey H. Gove
2003-01-01
Many of the most popular sampling schemes used in forestry are probability proportional to size methods. These methods are also referred to as size biased because sampling is actually from a weighted form of the underlying population distribution. Length- and area-biased sampling are special cases of size-biased sampling where the probability weighting comes from a...
2010-07-01
... 40 Protection of Environment 5 2010-07-01 2010-07-01 false Estimated Mass Concentration... 53—Estimated Mass Concentration Measurement of PM2.5 for Idealized “Typical” Coarse Aerosol Size... Concentration (µg/m3) Estimated Mass Concentration Measurement (µg/m3) Ideal Sampler Fractional Sampling...
Complexity in Animal Communication: Estimating the Size of N-Gram Structures
Directory of Open Access Journals (Sweden)
Reginald Smith
2014-01-01
Full Text Available In this paper, new techniques that allow conditional entropy to estimate the combinatorics of symbols are applied to animal communication studies to estimate the communication’s repertoire size. By using the conditional entropy estimates at multiple orders, the paper estimates the total repertoire sizes for animal communication across bottlenose dolphins, humpback whales and several species of birds for an N-gram length of one to three. In addition to discussing the impact of this method on studies of animal communication complexity, the reliability of these estimates is compared to other methods through simulation. While entropy does undercount the total repertoire size due to rare N-grams, it gives a more accurate picture of the most frequently used repertoire than just repertoire size alone.
Complexity in Animal Communication: Estimating the Size of N-Gram Structures
Smith, Reginald
2014-01-01
In this paper, new techniques that allow conditional entropy to estimate the combinatorics of symbols are applied to animal communication studies to estimate the communication's repertoire size. By using the conditional entropy estimates at multiple orders, the paper estimates the total repertoire sizes for animal communication across bottlenose dolphins, humpback whales, and several species of birds for N-grams length one to three. In addition to discussing the impact of this method on studies of animal communication complexity, the reliability of these estimates is compared to other methods through simulation. While entropy does undercount the total repertoire size due to rare N-grams, it gives a more accurate picture of the most frequently used repertoire than just repertoire size alone.
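The entropy-based repertoire estimate described here can be sketched via the perplexity 2^H of the n-gram distribution, the size of a uniform repertoire with the same entropy; a toy symbol sequence stands in for real animal-communication data:

```python
import math
from collections import Counter

def ngram_entropy(seq, n):
    """Shannon entropy (bits) of the n-gram distribution of a sequence."""
    grams = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(grams.values())
    return -sum(c / total * math.log2(c / total) for c in grams.values())

def effective_repertoire(seq, n):
    """2**H: an entropy-based repertoire size that, as the abstract notes,
    discounts rare n-grams and so undercounts the raw repertoire."""
    return 2 ** ngram_entropy(seq, n)
```

A skewed symbol distribution gives an effective repertoire below the raw count of distinct n-grams, illustrating the undercounting the authors describe.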
Sample size and power for a stratified doubly randomized preference design.
Cameron, Briana; Esserman, Denise A
2016-11-21
The two-stage (or doubly) randomized preference trial design is an important tool for researchers seeking to disentangle the role of patient treatment preference on treatment response through estimation of selection and preference effects. Up until now, these designs have been limited by their assumption of equal preference rates and effect sizes across the entire study population. We propose a stratified two-stage randomized trial design that addresses this limitation. We begin by deriving stratified test statistics for the treatment, preference, and selection effects. Next, we develop a sample size formula for the number of patients required to detect each effect. The properties of the model and the efficiency of the design are established using a series of simulation studies. We demonstrate the applicability of the design using a study of Hepatitis C treatment modality, specialty clinic versus mobile medical clinic. In this example, a stratified preference design (stratified by alcohol/drug use) may more closely capture the true distribution of patient preferences and allow for a more efficient design than a design which ignores these differences (unstratified version). © The Author(s) 2016.
The international food unit: a new measurement aid that can improve portion size estimation.
Bucher, T; Weltert, M; Rollo, M E; Smith, S P; Jia, W; Collins, C E; Sun, M
2017-09-12
Portion size education tools, aids and interventions can be effective in helping prevent weight gain. However, consumers have difficulty estimating food portion sizes and are confused by inconsistencies in the measurement units and terminologies currently used. Visual cues are an important mediator of portion size estimation, but standardized measurement units are required. In the current study, we present a new food volume estimation tool and test the ability of young adults to accurately quantify food volumes. The International Food Unit™ (IFU™) is a 4 × 4 × 4 cm cube (64 cm³), subdivided into eight 2 cm sub-cubes for estimating smaller food volumes. Compared with currently used measures such as cups and spoons, the IFU™ standardizes estimation of food volumes with metric measures. The IFU™ design is based on binary dimensional increments, and the cubic shape facilitates portion size education and training, memory and recall, and computer processing, which is binary in nature. The performance of the IFU™ was tested in a randomized between-subject experiment (n = 128 adults, 66 men) that estimated volumes of 17 foods using four methods: the IFU™ cube, a deformable modelling clay cube, a household measuring cup or no aid (weight estimation). Estimation errors were compared between groups using Kruskal-Wallis tests and post hoc comparisons. Estimation errors differed significantly between groups (H(3) = 28.48, p < 0.001), with lower errors for 12 food portions and similar errors for 5 food portions. Weight estimation was associated with a median error of 23.5% (IQR = 79.8). The IFU™ improves volume estimation accuracy compared to other methods. The cubic shape was perceived as favourable, with subdivision and multiplication facilitating volume estimation. Further studies should investigate whether the IFU™ can facilitate portion size training and whether portion size education using the IFU™ is effective and sustainable without the aid. A 3-dimensional IFU™ could serve as a reference
Rowley, Christopher N; Woo, Tom K
2009-12-21
Transition path sampling has been established as a powerful tool for studying the dynamics of rare events. The trajectory generation moves of this Monte Carlo procedure, shooting moves and shifting modes, were developed primarily for rate constant calculations, although this method has been more extensively used to study the dynamics of reactive processes. We have devised and implemented three alternative trajectory generation moves for use with transition path sampling. The centering-shooting move incorporates a shifting move into a shooting move, which centers the transition period in the middle of the trajectory, eliminating the need for shifting moves and generating an ensemble where the transition event consistently occurs near the middle of the trajectory. We have also developed varied-perturbation size shooting moves, wherein smaller perturbations are made if the shooting point is far from the transition event. The trajectories generated using these moves decorrelate significantly faster than with conventional, constant sized perturbations. This results in an increase in the statistical efficiency by a factor of 2.5-5 when compared to the conventional shooting algorithm. On the other hand, the new algorithm breaks detailed balance and introduces a small bias in the transition time distribution. We have developed a modification of this varied-perturbation size shooting algorithm that preserves detailed balance, albeit at the cost of decreased sampling efficiency. Both varied-perturbation size shooting algorithms are found to have improved sampling efficiency when compared to the original constant perturbation size shooting algorithm.
Empirically determining the sample size for large-scale gene network inference algorithms.
Altay, G
2012-04-01
The performance of genome-wide gene regulatory network inference algorithms depends on the sample size. It is generally considered that the larger the sample size, the better the gene network inference performance. Nevertheless, there is not adequate information on determining the sample size for optimal performance. In this study, the author systematically demonstrates the effect of sample size on information-theory-based gene network inference algorithms with an ensemble approach. The empirical results showed that the inference performances of the considered algorithms tend to converge after a particular sample size region. As a specific example, a sample size region around ≃64 is sufficient to obtain most of the inference performance with respect to precision using the representative algorithm C3NET on synthetic steady-state data sets of Escherichia coli and a time-series data set of Homo sapiens subnetworks. The author verified the convergence result on a large, real data set of E. coli as well. The results give evidence to biologists to better design experiments to infer gene networks. Further, the effect of cutoff on inference performances over various sample sizes is considered. [Includes supplementary material].
Kohigashi, Tsuyoshi; Otsuka, Yoichi; Shimazu, Ryo; Matsumoto, Takuya; Iwata, Futoshi; Kawasaki, Hideya; Arakawa, Ryuichi
2016-01-01
Mass spectrometry imaging (MSI) with ambient sampling and ionization can rapidly and easily capture the distribution of chemical components in a solid sample. Because the spatial resolution of MSI is limited by the size of the sampling area, reducing sampling size is an important goal for high resolution MSI. Here, we report the first use of a nanopipette for sampling and ionization by tapping-mode scanning probe electrospray ionization (t-SPESI). The spot size of the sampling area of a dye molecular film on a glass substrate was decreased to 6 μm on average by using a nanopipette. On the other hand, ionization efficiency increased with decreasing solvent flow rate. Our results indicate the compatibility between a reduced sampling area and the ionization efficiency using a nanopipette. MSI of micropatterns of ink on a glass and a polymer substrate were also demonstrated.
Kikuchi, Takashi; Gittins, John
2011-08-01
The behavioural Bayes approach to sample size determination for clinical trials assumes that the number of subsequent patients switching to a new drug from the current drug depends on the strength of the evidence for efficacy and safety that was observed in the clinical trials. The optimal sample size is the one which maximises the expected net benefit of the trial. The approach has been developed in a series of papers by Pezeshk and the present authors (Gittins JC, Pezeshk H. A behavioral Bayes method for determining the size of a clinical trial. Drug Information Journal 2000; 34: 355-63; Gittins JC, Pezeshk H. How Large should a clinical trial be? The Statistician 2000; 49(2): 177-87; Gittins JC, Pezeshk H. A decision theoretic approach to sample size determination in clinical trials. Journal of Biopharmaceutical Statistics 2002; 12(4): 535-51; Gittins JC, Pezeshk H. A fully Bayesian approach to calculating sample sizes for clinical trials with binary responses. Drug Information Journal 2002; 36: 143-50; Kikuchi T, Pezeshk H, Gittins J. A Bayesian cost-benefit approach to the determination of sample size in clinical trials. Statistics in Medicine 2008; 27(1): 68-82; Kikuchi T, Gittins J. A behavioral Bayes method to determine the sample size of a clinical trial considering efficacy and safety. Statistics in Medicine 2009; 28(18): 2293-306; Kikuchi T, Gittins J. A Bayesian procedure for cost-benefit evaluation of a new drug in multi-national clinical trials. Statistics in Medicine 2009 (Submitted)). The purpose of this article is to provide a rationale for experimental designs which allocate more patients to the new treatment than to the control group. The model uses a logistic weight function, including an interaction term linking efficacy and safety, which determines the number of patients choosing the new drug, and hence the resulting benefit. A Monte Carlo simulation is employed for the calculation. Having a larger group of patients on the new drug in general
Directory of Open Access Journals (Sweden)
Wei Lin Teoh
Full Text Available Designs of the double sampling (DS X chart are traditionally based on the average run length (ARL criterion. However, the shape of the run length distribution changes with the process mean shift, ranging from highly skewed when the process is in control to almost symmetric when the mean shift is large. Therefore, we show that the ARL is a complicated performance measure and that the median run length (MRL is a more meaningful measure to rely on. This is because the MRL provides an intuitive and fair representation of the central tendency, especially for the right-skewed run length distribution. Since the DS X chart can effectively reduce the sample size without reducing the statistical efficiency, this paper proposes two optimal designs of the MRL-based DS X chart, for minimizing (i the in-control average sample size (ASS and (ii both the in-control and out-of-control ASSs. Comparisons with the optimal MRL-based EWMA X and Shewhart X charts demonstrate the superiority of the proposed optimal MRL-based DS X chart, as the latter requires a smaller sample size on average while maintaining the same detection speed as the two former charts. An example involving the added potassium sorbate in a yoghurt manufacturing process is used to illustrate the effectiveness of the proposed MRL-based DS X chart in reducing the sample size needed.
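The ARL-versus-MRL distinction is easy to see in the simplest case: for a Shewhart-type chart with a constant per-sample signal probability p, the run length is geometric (the DS chart's run length is more complex, so this is only an illustrative analogue):

```python
import math

def median_run_length(p_signal):
    """Median of a geometric run length: the smallest m with
    P(RL <= m) = 1 - (1 - p)^m >= 0.5."""
    return math.ceil(math.log(0.5) / math.log(1.0 - p_signal))

p = 0.0027                       # in-control signal rate for 3-sigma limits
print(median_run_length(p))      # 257
print(round(1 / p))              # ARL ~ 370: the mean sits well above the median
```

The gap between 257 and 370 is the right-skewness the abstract describes: the ARL overstates the "typical" in-control run, which is the case for preferring the MRL.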
Machine Learning Approaches to Rare Events Sampling and Estimation
Elsheikh, A. H.
2014-12-01
Given the severe impacts of rare events, we try to quantitatively answer the following two questions: How can we estimate the probability of a rare event? And what are the factors affecting these probabilities? We utilize machine learning classification methods to define the failure boundary (in the stochastic space) corresponding to a specific threshold of a rare event. The training samples for the classification algorithm are obtained using multilevel splitting and Monte Carlo (MC) simulations. Once the classifier is trained, a full MC simulation can be performed efficiently using the classifier as a reduced-order model replacing the full physics simulator. We apply the proposed method to a standard benchmark for CO2 leakage through an abandoned well. In this idealized test case, CO2 is injected into a deep aquifer and then spreads within the aquifer; upon reaching an abandoned well, it rises to a shallower aquifer. In the current study, we evaluate the probability of leakage of a pre-defined amount of the injected CO2 given a heavy-tailed distribution of the leaky well permeability. We show that machine learning based approaches significantly outperform direct MC and multilevel splitting methods in terms of efficiency and precision. The proposed algorithm's efficiency and reliability enabled us to perform a sensitivity analysis over the different modeling assumptions, including the different prior distributions on the probability of CO2 leakage.
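The classifier-as-surrogate idea can be sketched on a toy problem. Everything below is illustrative: the "expensive simulator" is a two-input threshold function standing in for the CO2 physics model, a nearest-centroid direction plus a decision-stump threshold stands in for the paper's classifier, and plain MC labelling stands in for multilevel splitting.

```python
import math
import random

random.seed(7)

def expensive_sim(x):
    # Stand-in for the full physics simulator: "failure" when x1 + x2 > 2.5.
    # For iid standard-normal inputs the true failure probability is ~0.0385.
    return x[0] + x[1] > 2.5

# 1) Label a training set by running the expensive simulator.
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20000)]
labels = [expensive_sim(x) for x in train]

# 2) Fit a minimal linear classifier: nearest-centroid direction plus a
#    decision-stump threshold chosen to maximise training accuracy.
fails = [x for x, y in zip(train, labels) if y]
safes = [x for x, y in zip(train, labels) if not y]
cf = (sum(x[0] for x in fails) / len(fails), sum(x[1] for x in fails) / len(fails))
cs = (sum(x[0] for x in safes) / len(safes), sum(x[1] for x in safes) / len(safes))
w = (cf[0] - cs[0], cf[1] - cs[1])
norm = math.hypot(w[0], w[1])
w = (w[0] / norm, w[1] / norm)

proj = sorted((x[0] * w[0] + x[1] * w[1], y) for x, y in zip(train, labels))
err = len(safes)                        # threshold at -inf: all predicted "fail"
best_t, best_err = float("-inf"), err
for i, (p, y) in enumerate(proj[:-1]):
    err += 1 if y else -1               # point p now falls below the threshold
    if err < best_err:
        best_err, best_t = err, (p + proj[i + 1][0]) / 2

# 3) Full MC using the cheap classifier in place of the simulator.
n_mc = 200000
hits = sum(random.gauss(0, 1) * w[0] + random.gauss(0, 1) * w[1] > best_t
           for _ in range(n_mc))
p_hat = hits / n_mc                     # should land near the true ~0.0385
```

The 200,000 surrogate evaluations in step 3 cost almost nothing, which is the point of the method: the simulator is only run for the (much smaller) training set.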
DEFF Research Database (Denmark)
Jimenez Mena, Belen
2016-01-01
Effective population size (Ne) is an important concept to understand the evolution of a population. In conservation, Ne is used to assess the threat status of a population, evaluate its genetic viability in the future and set conservation priorities. An accurate estimation of Ne is thus essential … is that genetic drift is homogeneous throughout the genome. We explored the variability of Ne throughout the genome of the Danish Holstein cattle, using temporally-spaced samples of individuals genotyped with a 54K SNP chip. We found heterogeneity in Ne across the genome both between chromosomes and in genomic … the population against threat status thresholds. When molecular markers are not available, populations can be managed using pedigree information. However, this is challenging to do for group-living species since individuals and their parentage are difficult to determine. We adapted a pedigree-based method …
Lawson, Chris A
2014-07-01
Three experiments with 81 3-year-olds (M = 3.62 years) examined the conditions that enable young children to use the sample size principle (SSP) of induction, the inductive rule that facilitates generalizations from large rather than small samples of evidence. In Experiment 1, children exhibited the SSP when exemplars were presented sequentially but not when exemplars were presented simultaneously. Results from Experiment 3 suggest that the advantage of sequential presentation is not due to the additional time to process the available input from the two samples but instead may be linked to better memory for specific individuals in the large sample. In addition, findings from Experiments 1 and 2 suggest that adherence to the SSP is mediated by the disparity between presented samples. Overall, these results reveal that the SSP appears early in development and is guided by basic cognitive processes triggered during the acquisition of input.
Sample size calculations in clinical research should also be based on ethical principles.
Cesana, Bruno Mario; Antonelli, Paolo
2016-03-18
Sample size calculations based on too narrow a width, or with lower and upper confidence limits bounded by fixed cut-off points, not only increase power-based sample sizes to ethically unacceptable levels (thus making research practically unfeasible) but also greatly increase the costs and burdens of clinical trials. We propose an alternative method of combining the power of a statistical test and the probability of obtaining adequate precision (the power of the confidence interval) with an acceptable increase in power-based sample sizes.
Boushey, Carol J; Harris, Jeffrey; Bruemmer, Barbara; Archer, Sujata L
2008-04-01
Members of the Board of Editors recognize the importance of providing a resource for researchers to insure quality and accuracy of reporting in the Journal. This second monograph of a periodic series focuses on study sample selection, sample size, and common statistical procedures using parametric methods, and the presentation of statistical methods and results. Attention to sample selection and sample size is critical to avoid study bias. When outcome variables adhere to a normal distribution, then parametric procedures can be used for statistical inference. Documentation that clearly outlines the steps used in the research process will advance the science of evidence-based practice in nutrition and dietetics. Real examples from problem sets and published literature are provided, as well as reference to books and online resources.
Broekhuis, Femke; Gopalaswamy, Arjun M
2016-01-01
Many ecological theories and species conservation programmes rely on accurate estimates of population density. Accurate density estimation, especially for species facing rapid declines, requires the application of rigorous field and analytical methods. However, obtaining accurate density estimates of carnivores can be challenging as carnivores naturally exist at relatively low densities and are often elusive and wide-ranging. In this study, we employ an unstructured spatial sampling field design along with a Bayesian sex-specific spatially explicit capture-recapture (SECR) analysis, to provide the first rigorous population density estimates of cheetahs (Acinonyx jubatus) in the Maasai Mara, Kenya. We estimate adult cheetah density to be between 1.28 ± 0.315 and 1.34 ± 0.337 individuals/100 km² across four candidate models specified in our analysis. Our spatially explicit approach revealed 'hotspots' of cheetah density, highlighting that cheetah are distributed heterogeneously across the landscape. The SECR models incorporated a movement range parameter which indicated that male cheetahs moved four times as much as females, possibly because female movement was restricted by their reproductive status and/or the spatial distribution of prey. We show that SECR can be used for spatially unstructured data to successfully characterise the spatial distribution of a low density species and also estimate population density when sample size is small. Our sampling and modelling framework will help determine spatial and temporal variation in cheetah densities, providing a foundation for their conservation and management. Based on our results we encourage other researchers to adopt a similar approach in estimating densities of individually recognisable species.
A multi-cyclone sampling array for the collection of size-segregated occupational aerosols.
Mischler, Steven E; Cauda, Emanuele G; Di Giuseppe, Michelangelo; Ortiz, Luis A
2013-01-01
In this study a serial multi-cyclone sampling array capable of simultaneously sampling particles of multiple size fractions, from an occupational environment, for use in in vivo and in vitro toxicity studies and physical/chemical characterization, was developed and tested. This method is an improvement over current methods used to size-segregate occupational aerosols for characterization, due to its simplicity and its ability to collect sufficient masses of nano- and ultrafine-sized particles for analysis. This method was evaluated in a chamber providing a uniform atmosphere of dust concentrations using crystalline silica particles. The multi-cyclone sampling array was used to segregate crystalline silica particles into four size fractions, from a chamber concentration of 10 mg/m³. The size distributions of the particles collected at each stage were confirmed, in the air, before and after each cyclone stage. Once collected, the particle size distribution of each size fraction was measured using light scattering techniques to further confirm the size distributions. As a final confirmation, scanning electron microscopy was used to collect images of each size fraction. The results presented here, using multiple measurement techniques, show that this multi-cyclone system was able to successfully collect distinct size-segregated particles at sufficient masses to perform toxicological evaluations and physical/chemical characterization.
Bice, K.; Clement, S. C.
1981-01-01
X-ray diffraction and spectroscopy were used to investigate the mineralogical and chemical properties of the Calvert, Ball Old Mine, Ball Martin, and Jordan Sediments. The particle size distribution and index of refraction of each sample were determined. The samples are composed primarily of quartz, kaolinite, and illite. The clay minerals are most abundant in the finer particle size fractions. The chemical properties of the four samples are similar. The Calvert sample is most notably different in that it contains a relatively high amount of iron. The dominant particle size fraction in each sample is silt, with lesser amounts of clay and sand. The indices of refraction of the sediments are the same with the exception of the Calvert sample which has a slightly higher value.
Small sample sizes in the study of ontogenetic allometry; implications for palaeobiology.
Brown, Caleb Marshall; Vavrek, Matthew J
2015-01-01
Quantitative morphometric analyses, particularly ontogenetic allometry, are common methods used in quantifying shape, and changes therein, in both extinct and extant organisms. Due to incompleteness and the potential for restricted sample sizes in the fossil record, palaeobiological analyses of allometry may encounter higher rates of error. Differences in sample size between fossil and extant studies and any resulting effects on allometric analyses have not been thoroughly investigated, and a logical lower threshold to sample size is not clear. Here we show that studies based on fossil datasets have smaller sample sizes than those based on extant taxa. A similar pattern between vertebrates and invertebrates indicates this is not a problem unique to either group, but common to both. We investigate the relationship between sample size, ontogenetic allometric relationship and statistical power using an empirical dataset of skull measurements of modern Alligator mississippiensis. Across a variety of subsampling techniques, used to simulate different taphonomic and/or sampling effects, smaller sample sizes gave less reliable and more variable results, often with the result that allometric relationships will go undetected due to Type II error (failure to reject the null hypothesis). This may result in a false impression of fewer instances of positive/negative allometric growth in fossils compared to living organisms. These limitations are not restricted to fossil data and are equally applicable to allometric analyses of rare extant taxa. No mathematically derived minimum sample size for ontogenetic allometric studies is found; rather results of isometry (but not necessarily allometry) should not be viewed with confidence at small sample sizes.
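The Type II error pattern described above is straightforward to reproduce in a toy subsampling experiment. The parameters below (true slope 1.1, noise 0.15, log size spread) are illustrative, and a z cut-off approximates the slope t-test, which slightly flatters small-sample power.

```python
import math
import random
from statistics import NormalDist

random.seed(42)
Z = NormalDist().inv_cdf(0.975)   # ~1.96; normal approx. to the t critical value

def slope_test(n, true_b=1.1, noise=0.15, reps=300):
    """Fraction of simulated studies that detect allometry (slope != 1)
    in a log-log regression at sample size n."""
    detected = 0
    for _ in range(reps):
        xs = [random.uniform(0.0, 3.0) for _ in range(n)]        # log body size
        ys = [1.0 + true_b * x + random.gauss(0, noise) for x in xs]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        resid = [y - (my + b * (x - mx)) for x, y in zip(xs, ys)]
        se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
        if abs(b - 1.0) / se > Z:
            detected += 1
    return detected / reps

low, high = slope_test(8), slope_test(50)
# Small (fossil-like) samples routinely miss the true positive allometry,
# while the larger (extant-like) samples detect it almost every time.
```

The gap between `low` and `high` is exactly the "false impression of isometry" effect the abstract warns about.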
Plaisance, L.; Knowlton, N.; Paulay, G.; Meyer, C.
2009-12-01
The cryptofauna associated with coral reefs accounts for a major part of the biodiversity in these ecosystems but has been largely overlooked in biodiversity estimates because the organisms are hard to collect and identify. We combine a semi-quantitative sampling design and a DNA barcoding approach to provide metrics for the diversity of reef-associated crustaceans. Twenty-two similar-sized dead heads of Pocillopora were sampled at 10 m depth from five central Pacific Ocean localities (four atolls in the Northern Line Islands and in Moorea, French Polynesia). All crustaceans were removed, and partial cytochrome oxidase subunit I was sequenced from 403 individuals, yielding 135 distinct taxa using a species-level criterion of 5% similarity. Most crustacean species were rare; 44% of the OTUs were represented by a single individual, and an additional 33% were represented by several specimens found only in one of the five localities. The Northern Line Islands and Moorea shared only 11 OTUs. Total numbers estimated by species richness statistics (Chao1 and ACE) suggest at least 90 species of crustaceans in Moorea and 150 in the Northern Line Islands for this habitat type. However, rarefaction curves for each region failed to approach an asymptote, and Chao1 and ACE estimators did not stabilize after sampling eight heads in Moorea, so even these diversity figures are underestimates. Nevertheless, even this modest sampling effort from a very limited habitat resulted in surprisingly high species numbers.
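For reference, the Chao1 richness estimator used above has a simple closed form based only on the observed singleton and doubleton counts. This is a generic sketch of the standard estimator, not the authors' exact implementation:

```python
from collections import Counter

def chao1(counts):
    """Chao1 species-richness estimate from per-OTU abundance counts.
    S_est = S_obs + f1^2 / (2 * f2), where f1 and f2 are the numbers of
    singleton and doubleton OTUs; when f2 = 0 the bias-corrected form
    f1 * (f1 - 1) / (2 * (f2 + 1)) is used instead."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f = Counter(counts)
    f1, f2 = f[1], f[2]
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

Because the correction term depends only on the rarest classes, a sample dominated by singletons (as 44% of OTUs were here) pushes the estimate well above the observed count, which is why the reported totals exceed the raw OTU tallies.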
Sample size determination for logistic regression on a logit-normal distribution.
Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance
2017-06-01
Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination (R²) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for R² for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
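A small sketch of the property being exploited: logit-normal outcomes map back to an exactly normal scale via the logit transform, and on that scale a standard normal-means sample-size formula applies. The parameters and the final formula below are illustrative of the general idea, not the authors' full procedure:

```python
import random
from math import ceil, exp, log
from statistics import NormalDist, mean, stdev

random.seed(3)

# A logit-normal sample: logistic transform of N(mu, sigma) draws.
mu, sigma = 0.4, 0.8
ps = [1 / (1 + exp(-random.gauss(mu, sigma))) for _ in range(5000)]

# The logit transform recovers the underlying normal scale exactly.
zs = [log(p / (1 - p)) for p in ps]
assert abs(mean(zs) - mu) < 0.05 and abs(stdev(zs) - sigma) < 0.05

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Classic two-sample normal-means sample size, usable once the
    outcomes have been mapped to an (approximately) normal scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * (sd * z / delta) ** 2)
```

Working on the transformed scale is what removes the need for R²: the design quantities are just a mean difference and a standard deviation on the normal scale.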
Jirapatnakul, Artit C; Fotin, Sergei V; Reeves, Anthony P; Biancardi, Alberto M; Yankelevitz, David F; Henschke, Claudia I
2009-01-01
Estimation of nodule location and size is an important pre-processing step in some nodule segmentation algorithms to determine the size and location of the region of interest. Ideally, such estimation methods will consistently find the same nodule location regardless of where the seed point (provided either manually or by a nodule detection algorithm) is placed relative to the "true" center of the nodule, and the size should be a reasonable estimate of the true nodule size. We developed a method that estimates nodule location and size using multi-scale Laplacian of Gaussian (LoG) filtering. Nodule candidates near a given seed point are found by searching for blob-like regions with high filter response. The candidates are then pruned according to filter response and location, and the remaining candidates are sorted by size and the largest candidate selected. This method was compared to a previously published template-based method. The methods were evaluated on the basis of stability of the estimated nodule location to changes in the initial seed point and how well the size estimates agreed with volumes determined by a semi-automated nodule segmentation method. The LoG method exhibited better stability to changes in the seed point, with 93% of nodules having the same estimated location even when the seed point was altered, compared to only 52% of nodules for the template-based method. Both methods also showed good agreement with sizes determined by a nodule segmentation method, with an average relative size difference of 5% and -5% for the LoG and template-based methods respectively.
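The core of multi-scale LoG size estimation can be sketched in 1-D (a toy signal, not the authors' 3-D CT pipeline). For a Gaussian bump of width S, the scale-normalised LoG response at the bump centre peaks near σ = √2·S, so the best-responding scale recovers the blob size:

```python
import math

# Synthetic 1-D "nodule": a Gaussian bump of width S (the true size).
S = 10.0
N = 120
signal = [math.exp(-x * x / (2 * S * S)) for x in range(-N, N + 1)]
center = N                       # grid index of the seed point (x = 0)

def log_response(sig, i, sigma):
    """Scale-normalised Laplacian-of-Gaussian response at index i."""
    half = int(4 * sigma)
    resp = 0.0
    for k in range(-half, half + 1):
        g = math.exp(-k * k / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
        resp += sig[i - k] * (k * k - sigma * sigma) / sigma ** 4 * g  # G''
    return sigma * sigma * abs(resp)   # sigma^2 keeps scales comparable

# The scale with the strongest response estimates the nodule size:
# for this bump the maximum should fall near sigma = sqrt(2) * S ~ 14.
best = max(range(6, 25), key=lambda s: log_response(signal, center, s))
```

The same search over σ, run at several candidate locations around the seed point, is what produces the blob candidates that the method then prunes and ranks.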
DEFF Research Database (Denmark)
Andreasen, Jo Bønding; Pistor-Riebold, Thea Unger; Knudsen, Ingrid Hell
2014-01-01
Background: To minimise the volume of blood used for diagnostic procedures, especially in children, we investigated whether the size of sample tubes affected whole blood coagulation analyses. Methods: We included 20 healthy individuals for rotational thromboelastometry (RoTEM®) analyses …
Sample size for equivalence trials: a case study from a vaccine lot consistency trial.
Ganju, Jitendra; Izu, Allen; Anemona, Alessandra
2008-08-30
For some trials, simple but subtle assumptions can have a profound impact on the size of the trial. A case in point is a vaccine lot consistency (or equivalence) trial. Standard sample size formulas used for designing lot consistency trials rely on only one component of variation, namely, the variation in antibody titers within lots. The other component, the variation in the means of titers between lots, is assumed to be equal to zero. In reality, some amount of variation between lots, however small, will be present even under the best manufacturing practices. Using data from a published lot consistency trial, we demonstrate that when the between-lot variation is only 0.5 per cent of the total variation, the increase in the sample size is nearly 300 per cent when compared with the size assuming that the lots are identical. The increase in the sample size is so pronounced that in order to maintain power one is led to consider a less stringent criterion for demonstration of lot consistency. The appropriate sample size formula that is a function of both components of variation is provided. We also discuss the increase in the sample size due to correlated comparisons arising from three pairs of lots as a function of the between-lot variance.
Li, Chung-I; Shyr, Yu
2016-12-01
As RNA-seq rapidly develops and costs continually decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study's optimal sample size is now a vital step in experimental design. Current methods for calculating a study's required sample size are mostly based on the hypothesis testing framework, which assumes each gene count can be modeled through Poisson or negative binomial distributions; however, these methods are limited when it comes to accommodating covariates. To address this limitation, we propose an estimating procedure based on the generalized linear model. This easy-to-use method constructs a representative exemplary dataset and estimates the conditional power, all without requiring complicated mathematical approximations or formulas. Even more attractive, the downstream analysis can be performed with current R/Bioconductor packages. To demonstrate the practicability and efficiency of this method, we apply it to three real-world studies, and introduce our on-line calculator developed to determine the optimal sample size for a RNA-seq study.
ON ESTIMATION AND HYPOTHESIS TESTING OF THE GRAIN SIZE DISTRIBUTION BY THE SALTYKOV METHOD
Directory of Open Access Journals (Sweden)
Yuri Gulbin
2011-05-01
Full Text Available The paper considers the problem of validity of unfolding the grain size distribution with the back-substitution method. Due to the ill-conditioned nature of unfolding matrices, it is necessary to evaluate the accuracy and precision of parameter estimation and to verify the possibility of expected grain size distribution testing on the basis of intersection size histogram data. To examine these questions, computer modeling was used to compare size distributions obtained stereologically with those of three-dimensional model aggregates of grains with a specified shape and random size. Results of simulations are reported and ways of improving the conventional stereological techniques are suggested. It is shown that new improvements in estimating and testing procedures enable grain size distributions to be unfolded more efficiently.
Sample size choices for XRCT scanning of highly unsaturated soil mixtures
Directory of Open Access Journals (Sweden)
Smith Jonathan C.
2016-01-01
Full Text Available Highly unsaturated soil mixtures (clay, sand and gravel) are used as building materials in many parts of the world, and there is increasing interest in understanding their mechanical and hydraulic behaviour. In the laboratory, x-ray computed tomography (XRCT) is becoming more widely used to investigate the microstructures of soils; however, a crucial issue for such investigations is the choice of sample size, especially when scanning soil mixtures with a range of particle and void sizes. In this paper we present a discussion, centred around a new set of XRCT scans, on sample sizing for scanning of samples comprising soil mixtures, where a balance has to be struck between realistic representation of the soil components and the desire for high-resolution scanning. We also comment on the appropriateness of differing sample sizes in comparison to sample sizes used for other geotechnical testing. Void size distributions for the samples are presented, and from these some hypotheses are made as to the roles of inter- and intra-aggregate voids in the mechanical behaviour of highly unsaturated soils.
Li, Chung-I; Su, Pei-Fang; Guo, Yan; Shyr, Yu
2013-01-01
Sample size determination is an important issue in the experimental design of biomedical research. Because of the complexity of RNA-seq experiments, however, the field currently lacks a sample size method widely applicable to differential expression studies utilising RNA-seq technology. In this report, we propose several methods for sample size calculation for single-gene differential expression analysis of RNA-seq data under Poisson distribution. These methods are then extended to multiple genes, with consideration for addressing the multiple testing problem by controlling false discovery rate. Moreover, most of the proposed methods allow for closed-form sample size formulas with specification of the desired minimum fold change and minimum average read count, and thus are not computationally intensive. Simulation studies to evaluate the performance of the proposed sample size formulas are presented; the results indicate that our methods work well, with achievement of desired power. Finally, our sample size calculation methods are applied to three real RNA-seq data sets.
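A minimal single-gene version of such a calculation can be written down directly: a Wald test on the log ratio of total Poisson counts, using the approximation Var(log X) ≈ 1/E[X]. This is an illustrative sketch of the general approach, not the paper's exact formulas (which also handle FDR control across multiple genes):

```python
from math import ceil, log
from statistics import NormalDist

def poisson_sample_size(mu, fold, alpha=0.05, power=0.8):
    """Per-group sample size to detect a fold change `fold` for a gene with
    mean read count `mu` per sample, via a Wald test on the log count ratio.
    With n samples per group the totals are Poisson(n*mu) and Poisson(n*mu*fold),
    so Var(log ratio) ~ (1 + 1/fold) / (n * mu)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z * z * (1 + 1 / fold) / (mu * log(fold) ** 2))
```

The closed form makes the qualitative behaviour obvious: the required n falls with read depth `mu` and rises sharply as the target fold change approaches 1.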
Estimation of optimal size of plots for experiments with radiometer in ...
African Journals Online (AJOL)
Aghomotsegin
2015-07-29
Jul 29, 2015 … Estimation of optimal size of plots for experiments with radiometer in beans. Roger Nabeyama Michels, Marcelo Giovanetti Canteri, Inês Cristina de Batista Fonseca, Marcelo … with beans, the size of the portions differ according to the … obtained through the minor form: 0.45 m × 1 m (Table 1) …
Inkmann, J.
2005-01-01
The inverse probability weighted Generalised Empirical Likelihood (IPW-GEL) estimator is proposed for the estimation of the parameters of a vector of possibly non-linear unconditional moment functions in the presence of conditionally independent sample selection or attrition. The estimator is applied …
Estimation for Domains in Double Sampling with Probabilities ...
African Journals Online (AJOL)
Available publications show that the variance of an estimator of a domain parameter depends on the variance of the study variable for the domain elements and on the variance of the mean of that variable for elements of the domain in each constituent stratum. In this article, we show that the variance of an estimator of a domain total …
Skrbinšek, Tomaž; Jelenčič, Maja; Waits, Lisette; Kos, Ivan; Jerina, Klemen; Trontelj, Peter
2012-02-01
The effective population size (Ne) could be the ideal parameter for monitoring populations of conservation concern, as it conveniently summarizes both the evolutionary potential of the population and its sensitivity to genetic stochasticity. However, tracing its change through time is difficult in natural populations. We applied four new methods for estimating Ne from a single sample of genotypes to trace temporal change in Ne for bears in the Northern Dinaric Mountains. We genotyped 510 bears using 20 microsatellite loci and determined their age. The samples were organized into cohorts with regard to the year when the animals were born and yearly samples with age categories for every year when they were alive. We used the Estimator by Parentage Assignment (EPA) to directly estimate both Ne and the generation interval for each yearly sample. For cohorts, we estimated the effective number of breeders (Nb) using linkage disequilibrium, sibship assignment and approximate Bayesian computation methods and extrapolated these estimates to Ne using the generation interval. The Ne estimate by EPA is 276 (183-350 95% CI), meeting the inbreeding-avoidance criterion of Ne > 50 but short of the long-term minimum viable population goal of Ne > 500. The results obtained by the other methods are highly consistent with this result, and all indicate a rapid increase in Ne, probably in the late 1990s and early 2000s. The new single-sample approaches to the estimation of Ne provide efficient means for including Ne in monitoring frameworks and will be of great importance for future management and conservation.
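Of the single-sample methods mentioned, the linkage-disequilibrium approach has the simplest point estimator. The sketch below assumes the standard random-mating approximation E[r²] ≈ 1/(3Ne) + 1/S (Hill-style); real implementations such as LDNe add bias corrections, so this is illustrative only, and `mean_r2` is a hypothetical pre-computed input:

```python
def ldne_point_estimate(mean_r2, sample_size):
    """Single-sample linkage-disequilibrium estimate of effective size.
    Under random mating, E[r^2] ~ 1/(3*Ne) + 1/S, where S individuals are
    genotyped, so Ne ~ 1 / (3 * (r2 - 1/S)).
    mean_r2: mean squared allele-frequency correlation over locus pairs.
    sample_size: number of genotyped individuals (S)."""
    adj = mean_r2 - 1.0 / sample_size    # remove the sampling contribution
    if adj <= 0:
        return float("inf")              # no detectable drift signal
    return 1.0 / (3.0 * adj)
```

The subtraction of 1/S is the crucial step: with small samples the sampling-induced LD can swamp the drift signal, which is why such estimators return unbounded (infinite) estimates when observed r² falls below the sampling expectation.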
Ryskin, Rachel A; Brown-Schmidt, Sarah
2014-01-01
Seven experiments use large sample sizes to robustly estimate the effect size of a previous finding that adults are more likely to commit egocentric errors in a false-belief task when the egocentric response is plausible in light of their prior knowledge. We estimate the true effect size to be less than half of that reported in the original findings. Even though we found effects in the same direction as the original, they were substantively smaller; the original study would have had less than 33% power to detect an effect of this magnitude. The influence of plausibility on the curse of knowledge in adults appears to be small enough that its impact on real-life perspective-taking may need to be reevaluated.
The efficient and unbiased estimation of nuclear size variability using the 'selector'
DEFF Research Database (Denmark)
McMillan, A M; Sørensen, Flemming Brandt
1992-01-01
The selector was used to make an unbiased estimation of nuclear size variability in one benign naevocellular skin tumour and one cutaneous malignant melanoma. The results showed that the estimates obtained using the selector were comparable to those obtained using the more time-consuming Cavalieri …
Modeling grain-size dependent bias in estimating forest area: a regional application
Daolan Zheng; Linda S. Heath; Mark J. Ducey
2008-01-01
A better understanding of scaling-up effects on estimating important landscape characteristics (e.g. forest percentage) is critical for improving ecological applications over large areas. This study illustrated effects of changing grain sizes on regional forest estimates in Minnesota, Wisconsin, and Michigan of the USA using 30-m land-cover maps (1992 and 2001)...
Geoffrey H. Donovan; Peter. Noordijk
2005-01-01
To determine the optimal suppression strategy for escaped wildfires, federal land managers are required to conduct a wildland fire situation analysis (WFSA). As part of the WFSA process, fire managers estimate final fire size and suppression costs. Estimates from 58 WFSAs conducted during the 2002 fire season are compared to actual outcomes. Results indicate that...
Estimation method for mathematical expectation of continuous variable upon ordered sample
Domchenkov, O. A.
2014-01-01
A method is proposed for estimating the mathematical expectation of a continuous variable based on analysis of the ordered sample. The approach extends to nonlinear classes of estimators.
DEFF Research Database (Denmark)
Kokkalis, Alexandros; Thygesen, Uffe Høgsbro; Nielsen, Anders
Estimation of the status of fish stocks is important for sustainable management. Data limitations and data quality hinder this task. The commonly used age-based approaches require information about individual age, which is costly and relatively inaccurate. In contrast, the size of organisms … is linked to physiology more directly than is age, and can be measured more easily and at lower cost. In this work we used a single-species size-based model to estimate the fishing mortality (F) and the status of the stock, quantified by the ratio F/Fmsy between actual fishing mortality and the fishing mortality … which leads to the maximum sustainable yield. A simulation analysis was done to investigate the sensitivity of the estimation and its improvement when stock-specific life history information is available. To evaluate our approach with real observations, data-rich fish stocks, like the North Sea cod …
Optimal sample sizes for Welch's test under various allocation and cost considerations.
Jan, Show-Li; Shieh, Gwowen
2011-12-01
The issue of the sample size necessary to ensure adequate statistical power has been the focus of considerable attention in scientific research. Conventional presentations of sample size determination do not consider budgetary and participant allocation scheme constraints, although there is some discussion in the literature. The introduction of additional allocation and cost concerns complicates study design, although the resulting procedure permits a practical treatment of sample size planning. This article presents exact techniques for optimizing sample size determinations in the context of Welch's (Biometrika, 29, 350-362, 1938) test of the difference between two means under various design and cost considerations. The allocation schemes include cases in which (1) the ratio of group sizes is given and (2) one sample size is specified. The cost implications suggest optimally assigning subjects (1) to attain maximum power performance for a fixed cost and (2) to meet a designated power level for the least cost. The proposed methods provide useful alternatives to the conventional procedures and can be readily implemented with the developed R and SAS programs that are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
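The kind of calculation the abstract describes can be illustrated with a small sketch (not the authors' R/SAS programs): approximate the power of Welch's test from the Welch-Satterthwaite degrees of freedom and a noncentral t distribution, then search for the smallest group sizes reaching a target power under a fixed allocation ratio. scipy is assumed to be available.

```python
from scipy import stats

def welch_power(n1, n2, mu1, mu2, sd1, sd2, alpha=0.05):
    """Approximate power of the two-sided Welch two-sample t-test."""
    se2 = sd1**2 / n1 + sd2**2 / n2
    # Welch-Satterthwaite degrees of freedom
    df = se2**2 / ((sd1**2 / n1)**2 / (n1 - 1) + (sd2**2 / n2)**2 / (n2 - 1))
    ncp = (mu1 - mu2) / se2**0.5              # noncentrality parameter
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    # P(reject) under the alternative, from the noncentral t distribution
    return stats.nct.sf(tcrit, df, ncp) + stats.nct.cdf(-tcrit, df, ncp)

def min_sizes(ratio, mu1, mu2, sd1, sd2, power=0.80, alpha=0.05):
    """Smallest n1 (with n2 = ratio * n1) achieving the target power."""
    n1 = 2
    while welch_power(n1, max(2, round(ratio * n1)),
                      mu1, mu2, sd1, sd2, alpha) < power:
        n1 += 1
    return n1, max(2, round(ratio * n1))

n1, n2 = min_sizes(ratio=1.0, mu1=0.0, mu2=0.5, sd1=1.0, sd2=2.0)
```

Adding a cost constraint, as in the paper, amounts to running the same search over allocations that satisfy a budget rather than a fixed ratio.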
Fienen, Michael N.; Selbig, William R.
2012-01-01
A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.
SMALL SAMPLE SIZE IN 2X2 CROSS OVER DESIGNS: CONDITIONS OF DETERMINATION
Directory of Open Access Journals (Sweden)
B SOLEYMANI
2001-09-01
Full Text Available Introduction. Determination of a small sample size in some clinical trials is a matter of importance. In cross-over studies, which are one type of clinical trial, the matter is more significant. In this article, the conditions under which determination of a small sample size in cross-over studies is possible were considered, and the effect of deviation from normality on the matter is shown. Methods. The present study considers 2x2 cross-over studies in which the variable of interest is quantitative and measurable on a ratio or interval scale. The method of consideration is based on the distributions of the variable and the sample mean, the central limit theorem, the method of sample size determination in two groups, and the cumulant or moment generating function. Results. For normal variables, or variables transformable to normal, there are no restricting factors other than the significance level and power of the test for determination of sample size; but in the case of non-normal variables, the sample size should be made large enough to guarantee the normality of the sample mean's distribution. Discussion. In cross-over studies where theoretical considerations suggest that few samples suffice, one should not proceed without taking the applied worth of the results into consideration. While determining sample size, in addition to the variance, it is necessary to consider the distribution of the variable, particularly through its skewness and kurtosis coefficients: the greater the deviation from normality, the larger the required sample. Since in medical studies most continuous variables are close to normally distributed, a small number of samples often seems adequate for convergence of the sample mean to the normal distribution.
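The two-group comparison underlying such calculations reduces, for a near-normal outcome, to the standard normal-approximation formula n = 2(z_(1-alpha/2) + z_(1-beta))^2 * sigma^2 / delta^2 per group. A minimal stdlib sketch (illustrative only; it ignores the within-subject correlation that makes true cross-over sample sizes smaller):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for comparing two means."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

n = n_per_group(delta=0.5, sigma=1.0)  # -> 63 per group for a medium effect
```

The abstract's warning is that this formula is only trustworthy when the sample mean is close to normal; with marked skewness or kurtosis the n it returns is too small.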
Norm Block Sample Sizes: A Review of 17 Individually Administered Intelligence Tests
Norfolk, Philip A.; Farmer, Ryan L.; Floyd, Randy G.; Woods, Isaac L.; Hawkins, Haley K.; Irby, Sarah M.
2015-01-01
The representativeness, recency, and size of norm samples strongly influence the accuracy of inferences drawn from their scores. Inadequate norm samples may lead to inflated or deflated scores for individuals and poorer prediction of developmental and academic outcomes. The purpose of this study was to apply Kranzler and Floyd's method for…
Asian elephants in China: estimating population size and evaluating habitat suitability.
Zhang, Li; Dong, Lu; Lin, Liu; Feng, Limin; Yan, Fan; Wang, Lanxin; Guo, Xianming; Luo, Aidong
2015-01-01
We monitored the last remaining Asian elephant populations in China over the past decade. Using DNA tools and repeat genotyping, we estimated the population sizes from 654 dung samples collected from various areas. Combined with morphological individual identifications from over 6,300 elephant photographs taken in the wild, we estimated that the total Asian elephant population size in China is between 221 and 245. Population genetic structure and diversity were examined using a 556-bp fragment of mitochondrial DNA, and 24 unique haplotypes were detected from DNA analysis of 178 individuals. A phylogenetic analysis revealed two highly divergent clades of Asian elephants, α and β, present in Chinese populations. Four populations (Mengla, Shangyong, Mengyang, and Pu'Er) carried mtDNA from the α clade, and only one population (Nangunhe) carried mtDNA belonging to the β clade. Moreover, high genetic divergence was observed between the Nangunhe population and the other four populations; however, genetic diversity among the five populations was low, possibly due to limited gene flow because of habitat fragmentation. The expansion of rubber plantations, crop cultivation, and villages along rivers and roads had caused extensive degradation of natural forest in these areas. This had resulted in the loss and fragmentation of elephant habitats and had formed artificial barriers that inhibited elephant migration. Using Geographic Information System, Global Positioning System, and Remote Sensing technology, we found that the area occupied by rubber plantations, tea farms, and urban settlements had dramatically increased over the past 40 years, resulting in the loss and fragmentation of elephant habitats and forming artificial barriers that inhibit elephant migration. The restoration of ecological corridors to facilitate gene exchange among isolated elephant populations and the establishment of cross-boundary protected areas between China and Laos to secure their natural
Yamada, Fábio Hideki; Takemoto, Ricardo Massato
2017-06-01
Accurately estimating biodiversity is fundamental to ecological understanding and prediction. Helminths are often neglected in biodiversity estimates and, when included, are often underestimated. Here we examine how sampling effort affects estimates of parasite diversity in an assemblage of freshwater fish from a floodplain in Brazil. We also examine how ecological and behavioral factors influence the sampling effort necessary to accurately estimate the parasite diversity associated with a fish species. We use our dataset to suggest that host species with wide geographic distribution (i.e., long migrations), gregarious behavior (i.e., shoaling), larger body size, higher population density, wide diet breadth (i.e., omnivory), and autochthonous origin increase the effort necessary to estimate the total diversity of parasites. However, estimating this parasite fauna has several restrictions and limitations, due to the high complexity of floodplain ecosystems, with their non-linear and non-random responses.
Saura, María; Tenesa, Albert; Woolliams, John A; Fernández, Almudena; Villanueva, Beatriz
2015-11-11
Within the genetic methods for estimating effective population size (Ne), the method based on linkage disequilibrium (LD) has advantages over other methods, although its accuracy when applied to populations with overlapping generations is a matter of controversy. The best way to account for mutation and sample size when this method is implemented is also unclear. Here we have addressed the applicability of this method using genome-wide information when generations overlap, taking advantage of a complete and accurate pedigree from an experimental population of Iberian pigs. Precise pedigree-based estimates of Ne were considered as a baseline against which to compare LD-based estimates. We assumed six different statistical models that varied in the adjustments made for mutation and sample size. The approach allowed us to determine the most suitable statistical model of adjustment when the LD method is used for species with overlapping generations. A novel approach used here was to treat different generations as replicates of the same population in order to assess the error of the LD-based Ne estimates. LD-based Ne estimates obtained by estimating the mutation parameter from the data and by correcting sample size using the 1/(2n) term were the closest to pedigree-based estimates. The Ne at the time of the foundation of the herd (26 generations ago) was 20.8 ± 3.7 (average and SD across replicates), while the pedigree-based estimate was 21. From that time on, the trend was in good agreement with that followed by pedigree-based Ne. Our results showed that when using genome-wide information, the LD method is accurate and broadly applicable to small populations even when generations overlap. This supports the use of the method for estimating Ne when pedigree information is unavailable, in order to effectively monitor and manage populations and to detect population declines early. To our knowledge this is the first study using replicates of
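The sample-size adjustment the abstract mentions (subtracting a 1/(2n) term from the mean r2 before inverting) can be sketched as follows. The 1/(3 * r2_adj) inversion is the Hill-style form for unlinked loci; the exact constants and mutation adjustment used in the paper may differ, so this is only an illustration of the adjustment's effect, not the authors' estimator.

```python
def ld_ne(mean_r2, n):
    """Illustrative LD-based Ne estimate: subtract the 1/(2n) sample-size
    term from the mean r2 observed at unlinked loci in n diploids, then
    invert (Hill-style theory; paper's exact constants may differ)."""
    r2_adj = mean_r2 - 1.0 / (2.0 * n)
    if r2_adj <= 0:
        raise ValueError("sampling noise swamps the LD signal")
    return 1.0 / (3.0 * r2_adj)

ne_hat = ld_ne(mean_r2=0.02, n=100)  # ~22, i.e. a small population
```

The guard clause reflects a real failure mode: when the sample is small relative to Ne, the observed r2 can fall below its sampling expectation and the estimate is undefined.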
Estimating Soil Water Retention Curve Using The Particle Size Distribution Based on Fractal Approach
Directory of Open Access Journals (Sweden)
M.M. Chari
2016-02-01
showed that the fractal dimensions of the particle size distributions obtained with the two methods were not significantly different from each other. DSWRC was also obtained using the suction-moisture data. The results indicate that all three fractal dimensions are related to soil texture and increase with the clay content of the soil. Linear regression relationships between Dm1 and Dm2 and DSWRC were created using 48 soil samples, with coefficients of determination of 0.902 and 0.871. Then, DSWRC was expressed on the basis of four methods: (1) Dm1 = DSWRC; (2) the regression equation obtained for Dm1; (3) Dm2 = DSWRC; and (4) the regression equation obtained for Dm2. The resulting models for determining soil moisture at a given suction were evaluated using the statistical indicators normalized root mean square error, mean error, relative error, and geometric mean modeling efficiency. The results of all four fractal approaches are close to each other, and in most soils they are consistent with the measured data. The fractal models work well in sandy loam soils, although the predicted moisture values are somewhat lower than the measured values on the retention curve. Conclusions: In this study, the approach of Skaggs et al. (24) was used, as amended by Fooladmand and Sepaskhah (8), developing the grading curve from the percentages of sand, silt and clay. The fractal dimension of the particle size distribution was obtained using the particle radii of the sand, silt and clay size classes, respectively. In general, using fractals to simulate the retention curve proved effective, and it was found that, from data such as the sand, silt and clay percentages, the retention curve can be estimated with reasonable accuracy.
Body size estimation and body dissatisfaction in eating disorder patients and normal controls.
Fernández, F; Probst, M; Meermann, R; Vandereycken, W
1994-11-01
In this study comparing 41 eating disorder patients and 34 female controls, the video distortion technique was used to test the accuracy of body size estimation and to assess the ideal body image. No difference was found in the estimation of actual body sizes, although the accuracy of estimation was quite variable in both bulimics and anorexics. With regard to the ideal body image, significant differences were found: All bulimics and 92.6% of the controls wished to be thinner versus 42.9% of the anorexics (23.8% wished to be larger). Looking at subjective body experience, as measured with a self-report questionnaire (Body Attitudes Test), body dissatisfaction appeared to be negatively correlated with the ideal body image but not with the estimation of actual body sizes.
Constrained statistical inference: sample-size tables for ANOVA and regression
Directory of Open Access Journals (Sweden)
Leonard eVanbrabant
2015-01-01
Full Text Available Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient beta1 is larger than beta2 and beta3. The corresponding hypothesis is H: beta1 > {beta2, beta3}, known as an order-constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained, so that a smaller sample size is needed. This article discusses this reduction in sample size as an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a prespecified power (say, 0.80) for an increasing number of constraints. To obtain the sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30% to 50% when the complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., beta1 > beta2) results in higher power than assigning a positive or a negative sign to the parameters (e.g., beta1 > 0).
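The power gain from a directional constraint is easiest to see in the single-constraint case, where testing beta1 > 0 instead of beta1 != 0 is just a one-sided versus two-sided test. A normal-approximation sketch of the resulting sample-size reduction (the paper's Monte Carlo tables handle the general multi-constraint case, which this sketch does not):

```python
from math import ceil
from statistics import NormalDist

def n_needed(delta, sigma, alpha=0.05, power=0.80, one_sided=False):
    """Per-group n for a two-mean comparison; a one-sided (sign-constrained)
    alternative spends all of alpha in one tail, so fewer subjects suffice."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    return ceil(2 * ((z_a + z(power)) * sigma / delta) ** 2)

unconstrained = n_needed(0.5, 1.0)                   # two-sided test
constrained = n_needed(0.5, 1.0, one_sided=True)     # one sign constraint
```

Here the single constraint cuts the requirement by roughly 20%, smaller than the 30-50% the abstract reports for a complete ordering of several parameters.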
Use of High-Frequency In-Home Monitoring Data May Reduce Sample Sizes Needed in Clinical Trials.
Directory of Open Access Journals (Sweden)
Hiroko H Dodge
Full Text Available Trials in Alzheimer's disease are increasingly focusing on prevention in asymptomatic individuals. This poses a challenge in examining treatment effects since currently available approaches are often unable to detect cognitive and functional changes among asymptomatic individuals. Resultant small effect sizes require large sample sizes using biomarkers or secondary measures for randomized controlled trials (RCTs). Better assessment approaches and outcomes capable of capturing subtle changes during asymptomatic disease stages are needed. We aimed to develop a new approach to track changes in functional outcomes by using individual-specific distributions (as opposed to group norms) of unobtrusive, continuously monitored in-home data. Our objective was to compare sample sizes required to achieve sufficient power to detect prevention trial effects in trajectories of outcomes in two scenarios: (1) annually assessed neuropsychological test scores (a conventional approach), and (2) the likelihood of having subject-specific low performance thresholds, both modeled as a function of time. One hundred nineteen cognitively intact subjects were enrolled and followed over 3 years in the Intelligent Systems for Assessing Aging Change (ISAAC) study. Using the difference in empirically identified time slopes between those who remained cognitively intact during follow-up (normal control, NC) and those who transitioned to mild cognitive impairment (MCI), we estimated comparative sample sizes required to achieve up to 80% statistical power over a range of effect sizes for detecting reductions in the difference in time slopes between NC and MCI incidence before transition. Sample size estimates indicated approximately 2000 subjects with a follow-up duration of 4 years would be needed to achieve a 30% effect size when the outcome is an annually assessed memory test score. When the outcome is likelihood of low walking speed defined using the individual-specific distributions of
Hamilton, A J; Waters, E K; Kim, H J; Pak, W S; Furlong, M J
2009-06-01
The combined action of two lepidopteran pests, Plutella xylostella L. (Plutellidae) and Pieris rapae L. (Pieridae), causes significant yield losses in cabbage (Brassica oleracea variety capitata) crops in the Democratic People's Republic of Korea. Integrated pest management (IPM) strategies for these cropping systems are in their infancy, and sampling plans have not yet been developed. We used statistical resampling to assess the performance of fixed sample size plans (ranging from 10 to 50 plants). First, the precision (D = SE/mean) of the plans in estimating the population mean was assessed. There was substantial variation in achieved D for all sample sizes, and sample sizes of at least 20 and 45 plants were required to achieve the acceptable precision level of D ≤ 0.3 at least 50 and 75% of the time, respectively. Second, the performance of the plans in classifying the population density relative to an economic threshold (ET) was assessed. To account for the different damage potentials of the two species, the ETs were defined in terms of standard insects (SIs), where 1 SI = 1 P. rapae = 5 P. xylostella larvae. The plans were implemented using different ETs for the three growth stages of the crop: precupping (1 SI/plant), cupping (0.5 SI/plant), and heading (4 SI/plant). Improvement in the classification certainty with increasing sample size could be seen through the increasing steepness of operating characteristic curves. Rather than prescribe a particular plan, we suggest that the results of these analyses be used to inform practitioners of the relative merits of the different sample sizes.
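The resampling assessment of fixed-sample-size plans can be sketched as follows: draw many samples of n plants from per-plant counts and record how often the achieved precision D = SE/mean meets the D ≤ 0.3 criterion. The counts below are invented for illustration, not the Korean field data.

```python
import random
import statistics

def achieved_precision(counts, n, reps=2000, seed=1):
    """Resample n plants with replacement; return achieved D = SE/mean per rep."""
    rng = random.Random(seed)
    ds = []
    for _ in range(reps):
        sample = [rng.choice(counts) for _ in range(n)]
        m = statistics.mean(sample)
        if m == 0:
            continue  # D is undefined when no insects are observed
        ds.append(statistics.stdev(sample) / n ** 0.5 / m)
    return ds

# hypothetical per-plant standard-insect (SI) counts for one field
field = [0] * 40 + [1] * 30 + [2] * 15 + [3] * 10 + [5] * 5

frac_ok = {}
for n in (10, 20, 45):
    ds = achieved_precision(field, n)
    frac_ok[n] = sum(d <= 0.3 for d in ds) / len(ds)
```

`frac_ok` then plays the role of the paper's "fraction of the time D ≤ 0.3 is achieved", and rises with the plan's sample size.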
ESTIMATION OF AMOXICILLIN RESIDUES IN COMMERCIAL MEAT AND MILK SAMPLES
Directory of Open Access Journals (Sweden)
Ainee Irum
2014-08-01
Full Text Available The present study was conducted to evaluate the extent of β-lactam antibiotic (amoxicillin) residues in market milk and meat. Samples were randomly collected from Faisalabad city, Pakistan. A High Performance Liquid Chromatography (HPLC) method with a fluorescence detector was used to detect, identify and quantify the amoxicillin residues in milk and meat samples. The milk samples were purified by performing a protein precipitation step, followed by derivatization. To clean up tissue samples, a liquid extraction followed by a solid-phase extraction on a C18 column (4.0 × 4.6 mm, 5 μm) was performed. 50% of meat and 90% of milk samples were found to be contaminated with residues. The residues of amoxicillin in milk were in the range of 28 to 46 μg/kg and in meat 9 to 84 μg/kg. All of the contaminated milk samples, and 40 of the 50% of contaminated meat samples, fell within the maximum residue limits.
The effect of sampling effort on estimates of methane ebullition from peat
Ramirez, Jorge A.; Baird, Andy J.; Coulthard, Tom J.
2017-05-01
We investigated the effect of sample size and sampling duration on methane bubble flux (ebullition) estimates from peat using a computer model. A field scale (10 m), seasonal (>100 days) simulation of ebullition from a two-dimensional (2-D) structurally varying peat profile was modeled at fine spatial resolution (1 mm × 1 mm). The spatial and temporal scale of this simulation was possible because of the computational efficiency of the reduced-complexity approach that was implemented, and patterns of simulated ebullition were consistent with those found in the field and laboratory. The simulated ebullition from the peat profile suggested that decreases in peat porosity—which cause increases in gas storage—produce ebullition that becomes increasingly patchy in space and erratic in time. By applying different amounts of spatial and temporal sampling effort, it was possible to determine the uncertainty in ebullition estimates from the peatland. The results suggest that traditional methods to measure ebullition can equally overestimate and underestimate flux by 20% and large ebullition events can lead to large overestimations of flux when sampling effort is low. Our findings support those of field studies, and we recommend that ebullition should be measured frequently (hourly to daily) and at many locations (n > 14).
Power and sample size calculations for Mendelian randomization studies using one genetic instrument.
Freeman, Guy; Cowling, Benjamin J; Schooling, C Mary
2013-08-01
Mendelian randomization, which is instrumental variable analysis using genetic variants as instruments, is an increasingly popular method of making causal inferences from observational studies. In order to design efficient Mendelian randomization studies, it is essential to calculate the sample sizes required. We present formulas for calculating the power of a Mendelian randomization study using one genetic instrument to detect an effect of a given size, and the minimum sample size required to detect effects for given levels of significance and power, using asymptotic statistical theory. We apply the formulas to some example data and compare the results with those from simulation methods. Power and sample size calculations using these formulas should be more straightforward to carry out than simulation approaches. The formulas make explicit that the sample size needed for a Mendelian randomization study is inversely proportional to the square of the correlation between the genetic instrument and the exposure and proportional to the residual variance of the outcome after removing the effect of the exposure, as well as inversely proportional to the square of the effect size.
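The proportionalities stated in the abstract can be turned into a rough sample-size sketch. The parametrization below is an assumption of this illustration (the paper gives the exact formulas): required n scales with the residual outcome variance and inversely with the squared instrument-exposure correlation and the squared effect size.

```python
from math import ceil
from statistics import NormalDist

def mr_sample_size(beta, rho_gx, var_x, resid_var_y, alpha=0.05, power=0.80):
    """Illustrative one-instrument Mendelian randomization sample size:
    n grows with residual outcome variance (resid_var_y) and shrinks with
    the squared instrument-exposure correlation (rho_gx) and effect (beta)."""
    z = NormalDist().inv_cdf
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * resid_var_y
                / (beta ** 2 * rho_gx ** 2 * var_x))

strong = mr_sample_size(beta=0.1, rho_gx=0.10, var_x=1.0, resid_var_y=1.0)
weak = mr_sample_size(beta=0.1, rho_gx=0.05, var_x=1.0, resid_var_y=1.0)
```

Halving the instrument strength roughly quadruples the required n, matching the abstract's inverse-square statement, and explains why Mendelian randomization studies with weak instruments need very large cohorts.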
[Explanation of samples sizes in current biomedical journals: an irrational requirement].
Silva Ayçaguer, Luis Carlos; Alonso Galbán, Patricia
2013-01-01
To discuss the theoretical relevance of current requirements for explanations of the sample sizes employed in published studies, and to assess the extent to which these requirements are currently met by authors and demanded by referees and editors. A literature review was conducted to gain insight into and critically discuss the possible rationale underlying the requirement of justifying sample sizes. A descriptive bibliometric study was then carried out based on the original studies published in the six journals with the highest impact factor in the field of health in 2009. All the arguments used to support the requirement of an explanation of sample sizes are feeble, and there are several reasons why they should not be endorsed. These instructions are neglected in most of the studies published in the current literature with the highest impact factor. In 56% (95%CI: 52-59) of the articles, the sample size used was not substantiated, and only 27% (95%CI: 23-30) met all the requirements contained in the guidelines adhered to by the journals studied. Based on this study, we conclude that there are no convincing arguments justifying the requirement for an explanation of how the sample size was reached in published articles. There is no sound basis for this requirement, which not only does not promote the transparency of research reports but rather contributes to undermining it. Copyright © 2011 SESPAS. Published by Elsevier Espana. All rights reserved.
Exploratory factor analysis with small sample sizes: a comparison of three approaches.
Jung, Sunho
2013-07-01
Exploratory factor analysis (EFA) has emerged in the field of animal behavior as a useful tool for determining and assessing latent behavioral constructs. Because the small-sample-size problem often occurs in this field, a traditional approach, unweighted least squares, has been considered the most feasible choice for EFA. Two new approaches were recently introduced in the statistical literature as viable alternatives to EFA when sample size is small: regularized exploratory factor analysis and generalized exploratory factor analysis. A simulation study is conducted to evaluate the relative performance of these three approaches in terms of factor recovery under various experimental conditions of sample size, degree of overdetermination, and level of communality. In this study, overdetermination and sample size were the conditions that meaningfully differentiated the performance of the three approaches in factor recovery. Specifically, when there are a relatively large number of factors, regularized exploratory factor analysis tends to recover the correct factor structure better than the other two approaches. Conversely, when few factors are retained, unweighted least squares tends to recover the factor structure better. Finally, generalized exploratory factor analysis exhibits very poor performance in factor recovery compared to the other approaches. This tendency is particularly prominent as sample size increases. Thus, generalized exploratory factor analysis may not be a good alternative to EFA. Regularized exploratory factor analysis is recommended over unweighted least squares unless a small number of factors is expected. Copyright © 2013 Elsevier B.V. All rights reserved.
Scott, Neil W; Fayers, Peter M; Aaronson, Neil K; Bottomley, Andrew; de Graeff, Alexander; Groenvold, Mogens; Gundy, Chad; Koller, Michael; Petersen, Morten A; Sprangers, Mirjam A G
2009-03-01
Differential item functioning (DIF) analyses are increasingly used to evaluate health-related quality of life (HRQoL) instruments, which often include relatively short subscales. Computer simulations were used to explore how various factors including scale length affect analysis of DIF by ordinal logistic regression. Simulated data, representative of HRQoL scales with four-category items, were generated. The power and type I error rates of the DIF method were then investigated when, respectively, DIF was deliberately introduced and when no DIF was added. The sample size, scale length, floor effects (FEs) and significance level were varied. When there was no DIF, type I error rates were close to 5%. Detecting moderate uniform DIF in a two-item scale required a sample size of 300 per group for adequate (>80%) power. For longer scales, a sample size of 200 was adequate. Considerably larger sample sizes were required to detect nonuniform DIF, when there were extreme FEs or when a reduced type I error rate was required. The impact of the number of items in the scale was relatively small. Ordinal logistic regression successfully detects DIF for HRQoL instruments with short scales. Sample size guidelines are provided.
Model for estimating of population abundance using line transect sampling
Abdulraqeb Abdullah Saeed, Gamil; Muhammad, Noryanti; Zun Liang, Chuan; Yusoff, Wan Nur Syahidah Wan; Zuki Salleh, Mohd
2017-09-01
Many studies today use nonparametric methods for estimating object abundance; for their simplicity, parametric methods are also widely used by biometricians. This paper presents a proposed model for estimating population abundance using the line transect technique. The proposed model is appealing because it is strictly monotonically decreasing with perpendicular distance and satisfies the shoulder condition. The statistical properties and inference of the proposed model are discussed. Theoretically, the proposed detection function satisfies the line transect assumptions, which leads us to study the performance of this model. We use this model as a reference for future research on density estimation. In this paper we also study the assumptions of the detection function and introduce the corresponding model in order to apply simulation in future work.
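The two requirements the abstract names, monotone decrease with perpendicular distance and a shoulder at zero, can be checked numerically. The half-normal function below is a standard textbook detection function used here as an example; the abstract does not give the authors' proposed functional form.

```python
from math import exp

def g(x, sigma=10.0):
    """Half-normal detection function (illustrative, not the paper's model):
    probability of detecting an object at perpendicular distance x."""
    return exp(-x * x / (2.0 * sigma * sigma))

# numerical check of the two line-transect requirements
xs = [0.5 * i for i in range(101)]               # distances 0..50
vals = [g(x) for x in xs]
is_monotone = all(a >= b for a, b in zip(vals, vals[1:]))
h = 1e-5
has_shoulder = abs((g(h) - g(0.0)) / h) < 1e-4   # derivative at 0 is ~0
```

g(0) = 1 means certain detection on the line itself; the flat derivative at zero is the "shoulder condition" the abstract refers to, ensuring detectability does not drop off immediately beside the line.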
Directory of Open Access Journals (Sweden)
Nadia Mushtaq
2017-03-01
Full Text Available In this article, a combined general family of estimators is proposed for estimating the finite population mean of a sensitive variable in stratified random sampling with a non-sensitive auxiliary variable, based on a randomized response technique. Under a stratified random sampling without replacement scheme, expressions for the bias and mean square error (MSE) up to first-order approximation are derived. Theoretical and empirical results through a simulation study show that the proposed class of estimators is more efficient than the existing estimators, i.e., the usual stratified random sample mean estimator and the Sousa et al. (2014) ratio and regression estimators of the sensitive variable in stratified sampling.
Dual to Ratio-Cum-Product Estimator in Simple and Stratified Random Sampling
Yunusa Olufadi
2013-01-01
New estimators for estimating the finite population mean using two auxiliary variables under simple and stratified sampling designs are proposed. Their properties (e.g., mean square error) are studied to the first order of approximation. Moreover, some existing estimators are shown to be particular members of this estimator. Furthermore, comparison of the proposed estimator with the usual unbiased estimator and other estimators considered in this paper reveals interesting results. These results are fur...
Stratified random sampling for estimating billing accuracy in health care systems.
Buddhakulsomsiri, Jirachai; Parthanadee, Parthana
2008-03-01
This paper presents a stratified random sampling plan for estimating the accuracy of bill processing performance for health care bills submitted to third-party payers in health care systems. Bill processing accuracy is estimated with two measures: percent accuracy and total dollar accuracy. Difficulties in constructing a sampling plan arise when the population strata structure is unknown, and when the two measures require different sampling schemes. To efficiently utilize sample resources, the sampling plan is designed to effectively estimate both measures from the same sample. The sampling plan features a simple but efficient strata construction method, called the rectangular method, and two accuracy estimation methods, one for each measure. The sampling plan is tested on actual populations from an insurance company. Accuracy estimates obtained are then used to compare the rectangular method to other potential clustering methods for strata construction, and to compare the accuracy estimation methods to other eligible methods. Computational study results show the effectiveness of the proposed sampling plan.
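The stratified percent-accuracy estimate described in this abstract follows the textbook stratified-sampling formulas; the sketch below illustrates the idea under invented strata (the bill strata, sizes, and accuracy flags are hypothetical, and the paper's rectangular strata-construction method itself is not reproduced):

```python
# Stratified estimate of a population proportion (e.g., percent of bills
# processed accurately), with the usual stratified-sampling variance.
# Strata weights and sample data below are hypothetical.

def stratified_proportion(strata):
    """strata: list of (N_h, accurate_flags) pairs, one per stratum."""
    N = sum(N_h for N_h, _ in strata)
    p_st = 0.0
    var_st = 0.0
    for N_h, flags in strata:
        n_h = len(flags)
        p_h = sum(flags) / n_h
        W_h = N_h / N
        p_st += W_h * p_h
        # finite-population-corrected variance contribution of this stratum
        var_st += W_h ** 2 * (1 - n_h / N_h) * p_h * (1 - p_h) / (n_h - 1)
    return p_st, var_st

strata = [
    (5000, [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]),   # small-dollar bills
    (1500, [1, 0, 1, 1, 0, 1, 1, 1]),         # mid-dollar bills
    (500,  [1, 1, 0, 1, 0, 1]),               # large-dollar bills
]
p_hat, var_hat = stratified_proportion(strata)
print(round(p_hat, 3))
```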
Directory of Open Access Journals (Sweden)
Sved, John A; Cameron, Emilie C; Gilchrist, A Stuart
2013-01-01
Full Text Available There is a substantial literature on the use of linkage disequilibrium (LD) to estimate effective population size using unlinked loci. The Ne estimates are extremely sensitive to the sampling process, and there is currently no theory to cope with the possible biases. We derive formulae for the analysis of idealised populations mating at random with multi-allelic (microsatellite) loci. The 'Burrows composite index' is introduced in a novel way with a 'composite haplotype table'. We show that in a sample of diploid size S, the mean value of x^2 or r^2 from the composite haplotype table is biased by a factor of 1-1/(2S-1)^2, rather than the usual factor 1+1/(2S-1) for a conventional haplotype table. But analysis of population data using these formulae leads to Ne estimates that are unrealistically low. We provide theory and simulation to show that this bias towards low Ne estimates is due to null alleles, and introduce a randomised permutation correction to compensate for the bias. We also consider the effect of introducing a within-locus disequilibrium factor to r^2, and find that this factor leads to a bias in the Ne estimate. However this bias can be overcome using the same randomised permutation correction, to yield an altered r^2 with lower variance than the original r^2, and one that is also insensitive to null alleles. The resulting formulae are used to provide Ne estimates on 40 samples of the Queensland fruit fly, Bactrocera tryoni, from populations with widely divergent Ne expectations. Linkage relationships are known for most of the microsatellite loci in this species. We find that there is little difference in the estimated Ne values from using known unlinked loci as compared to using all loci, which is important for conservation studies where linkage relationships are unknown.
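For orientation, the drift signal that such LD methods exploit can be sketched with the classic first-order approximation E[r^2] ~ 1/(3Ne) + 1/S for unlinked loci in a diploid sample of size S; this is the simple textbook relation, not the composite-haplotype corrections derived above, and the input values are illustrative:

```python
# First-order LD estimate of effective population size from the mean
# squared correlation r^2 between unlinked loci, using the classic
# approximation E[r^2] ~ 1/(3Ne) + 1/S. This is NOT the paper's
# composite-haplotype formula; numbers are made up for illustration.

def ld_ne(mean_r2, S):
    """mean_r2: mean r^2 over unlinked locus pairs; S: diploid sample size."""
    r2_adj = mean_r2 - 1.0 / S       # subtract the sampling contribution
    if r2_adj <= 0:
        return float('inf')          # sampling noise swamps the drift signal
    return 1.0 / (3.0 * r2_adj)

print(round(ld_ne(0.012, 100), 1))  # -> 166.7
```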
Baek, Ji Eun; Kim, Sung Hun; Lee, Ah Won
2014-08-01
To evaluate whether the degree of background parenchymal enhancement affects the accuracy of tumor size estimation based on breast MRI. Three hundred and twenty-two patients who had known breast cancer and underwent breast MRIs were recruited in our study. The total number of breast cancer cases was 339. All images were assessed retrospectively for the level of background parenchymal enhancement based on the BI-RADS criteria. Maximal lesion diameters were measured on the MRIs, and tumor types (mass vs. non-mass) were assessed. Tumor size differences between the MRI-based estimates and estimates based on pathological examinations were analyzed. The relationship between accuracy and tumor types and clinicopathologic features was also evaluated. The cases included minimal (47.5%), mild (28.9%), moderate (12.4%) and marked background parenchymal enhancement (11.2%). The tumors of patients with minimal or mild background parenchymal enhancement were more accurately estimated than those of patients with moderate or marked enhancement (72.1% vs. 56.8%; p=0.003). The tumors of women with mass-type lesions were estimated significantly more accurately than those of women with non-mass-type lesions (81.6% vs. 28.6%). A high degree of background parenchymal enhancement is related to the inaccurate estimation of tumor size based on MRI. Non-mass-type breast cancer and HER2-positive breast cancer are other factors that may cause inaccurate assessment of tumor size. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Xu Huijun; Gordon, J. James; Siebers, Jeffrey V. [Department of Radiation Oncology, Virginia Commonwealth University, Richmond, Virginia 23298 (United States)
2011-02-15
Purpose: A dosimetric margin (DM) is the margin in a specified direction between a structure and a specified isodose surface, corresponding to a prescription or tolerance dose. The dosimetric margin distribution (DMD) is the distribution of DMs over all directions. Given a geometric uncertainty model, representing inter- or intrafraction setup uncertainties or internal organ motion, the DMD can be used to calculate coverage Q, which is the probability that a realized target or organ-at-risk (OAR) dose metric D_v exceeds the corresponding prescription or tolerance dose. Postplanning coverage evaluation quantifies the percentage of uncertainties for which target and OAR structures meet their intended dose constraints. The goal of the present work is to evaluate coverage probabilities for 28 prostate treatment plans to determine DMD sampling parameters that ensure adequate accuracy for postplanning coverage estimates. Methods: Normally distributed interfraction setup uncertainties were applied to 28 plans for localized prostate cancer, with a prescribed dose of 79.2 Gy and 10 mm clinical target volume to planning target volume (CTV-to-PTV) margins. Using angular or isotropic sampling techniques, dosimetric margins were determined for the CTV, bladder, and rectum, assuming shift invariance of the dose distribution. For angular sampling, DMDs were sampled at fixed angular intervals ω (e.g., ω = 1°, 2°, 5°, 10°, 20°). Isotropic samples were uniformly distributed on the unit sphere, resulting in variable angular increments, but were calculated for the same number of sampling directions as angular DMDs, and accordingly characterized by the effective angular increment ω_eff. In each direction, the DM was calculated by moving the structure in radial steps of size δ (= 0.1, 0.2, 0.5, 1 mm) until the specified isodose was crossed. Coverage estimation accuracy ΔQ was quantified as a function of the sampling parameters ω or
Sequential sampling, magnitude estimation, and the wisdom of crowds
DEFF Research Database (Denmark)
Nash, Ulrik W.
2017-01-01
Sir Francis Galton (Galton, 1907) conjectured that the psychological process of magnitude estimation caused the curious distribution of judgments he observed at Plymouth in 1906. However, after he published Vox Populi, researchers narrowed their attention to the first moment of judgment distributions ...
Evaluation of sampling strategies to estimate crown biomass
Krishna P Poudel; Hailemariam Temesgen; Andrew N Gray
2015-01-01
Depending on tree and site characteristics, crown biomass accounts for a significant portion of the total aboveground biomass of a tree. Crown biomass estimation is useful for several purposes, including evaluating the economic feasibility of crown utilization for energy production or forest products, fuel load assessments and fire management strategies, and wildfire...
Estimating intergenerational schooling mobility on censored samples: consequences and remedies
de Haan, M.; Plug, E.
2011-01-01
In this paper we estimate the impact of parental schooling on child schooling, focus on the problem that children who are still in school constitute censored observations, and evaluate three solutions to it: replacement of observed with expected years of schooling, maximum likelihood approach, and
Completeness of the fossil record: Estimating losses due to small body size
Cooper, Roger A.; Maxwell, Phillip A.; Crampton, James S.; Beu, Alan G.; Jones, Craig M.; Marshall, Bruce A.
2006-04-01
Size bias in the fossil record limits its use for interpreting patterns of past biodiversity and ecological change. Using comparative size frequency distributions of exceptionally good regional records of New Zealand Holocene and Cenozoic Mollusca in museum archive collections, we derive first-order estimates of the magnitude of the bias against small body size and the effect of this bias on completeness of the fossil record. Our database of 3907 fossil species represents an original living pool of 9086 species, from which ~36% have been removed by size culling, 27% from the smallest size class (<5 mm). In contrast, non-size-related losses compose only 21% of the total. In soft rocks, the loss of small taxa can be reduced by nearly 50% through the employment of exhaustive collection and preparation techniques.
2010-07-01
... 40 Protection of Environment 5 2010-07-01 2010-07-01 false Estimated Mass Concentration... Concentration Measurement of PM2.5 for Idealized Coarse Aerosol Size Distribution Particle Aerodynamic Diameter (µm) Test Sampler Fractional Sampling Effectiveness Interval Mass Concentration (µg/m3) Estimated Mass...
2010-07-01
... 40 Protection of Environment 5 2010-07-01 2010-07-01 false Estimated Mass Concentration... Concentration Measurement of PM2.5 for Idealized Fine Aerosol Size Distribution Particle Aerodynamic Diameter (µm) Test Sampler Fractional Sampling Effectiveness Interval Mass Concentration (µg/m3) Estimated Mass...
Gridsampler – A Simulation Tool to Determine the Required Sample Size for Repertory Grid Studies
Directory of Open Access Journals (Sweden)
Mark Heckmann
2017-01-01
Full Text Available The repertory grid is a psychological data collection technique that is used to elicit qualitative data in the form of attributes as well as quantitative ratings. A common approach for evaluating multiple repertory grid data is sorting the elicited bipolar attributes (so-called constructs) into mutually exclusive categories by means of content analysis. An important question when planning this type of study is determining the sample size needed to (a) discover all attribute categories relevant to the field and (b) yield a predefined minimal number of attributes per category. For most applied researchers who collect multiple repertory grid data, programming a numeric simulation to answer these questions is not feasible. The gridsampler software facilitates determining the required sample size by providing a GUI for conducting the necessary numerical simulations. Researchers can supply a set of parameters suitable for the specific research situation, determine the required sample size, and easily explore the effects of changes in the parameter set.
[On the impact of sample size calculation and power in clinical research].
Held, Ulrike
2014-10-01
The aim of a clinical trial is to judge the efficacy of a new therapy or drug. In the planning phase of the study, the calculation of the necessary sample size is crucial in order to obtain a meaningful result. The study design, the expected treatment effect in outcome and its variability, power and level of significance are factors which determine the sample size. It is often difficult to fix these parameters prior to the start of the study, but related papers from the literature can be helpful sources for the unknown quantities. For scientific as well as ethical reasons it is necessary to calculate the sample size in advance in order to be able to answer the study question.
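As a concrete illustration of how the listed factors (expected treatment effect, variability, power, and level of significance) determine the sample size, here is the standard normal-approximation formula for comparing two means; the effect size and standard deviation below are invented:

```python
# Standard two-sample sample-size formula for comparing means:
#   n per group = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2
# Effect size delta and standard deviation sigma are illustrative.
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)   # two-sided significance level
    z_b = z(power)           # desired power
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Detect a difference of 0.5 SD units with 80% power at two-sided alpha 0.05:
print(n_per_group(delta=0.5, sigma=1.0))  # -> 63
```

Halving the detectable difference quadruples the required sample size, which is why fixing these parameters before the study starts matters so much.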
van Hassel, Daniël; van der Velden, Lud; de Bakker, Dinny; van der Hoek, Lucas; Batenburg, Ronald
2017-12-04
Our research is based on a technique for time sampling, an innovative method for measuring the working hours of Dutch general practitioners (GPs), which was deployed in an earlier study. In that study, 1051 GPs were questioned about their activities in real time by sending them one SMS text message every 3 h during 1 week. The required sample size for this method is important for health workforce planners to know if they want to apply it to target groups that are hard to reach or if fewer resources are available. In this time-sampling method, however, standard power analysis is not sufficient for calculating the required sample size, because it accounts only for sample fluctuation and not for the fluctuation of measurements taken from every participant. We investigated the impact of the number of participants and the frequency of measurements per participant on the confidence intervals (CIs) for the hours worked per week. Statistical analyses of the time-use data we obtained from GPs were performed. Ninety-five percent CIs were calculated, using equations and simulation techniques, for various numbers of GPs included in the dataset and for various frequencies of measurements per participant. Our results showed that the one-tailed CI, including sample and measurement fluctuation, decreased from 21 to 3 h as the number of GPs increased from one to 50. Beyond that point precision continued to improve, but the gain from each additional GP was smaller. Likewise, the analyses showed how the number of participants required decreased when more measurements per participant were taken. For example, one measurement per 3-h time slot during the week requires 300 GPs to achieve a CI of 1 h, while one measurement per hour requires 100 GPs to obtain the same result. The sample size needed for time-use research based on a time-sampling technique depends on the design and aim of the study. In this paper, we showed how the precision of the
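The two sources of fluctuation described above can be sketched with a simple variance-components calculation: the variance of the estimated mean has a between-participant term and a within-participant measurement term, so extra measurements per participant shrink only the second term. The variance values below are invented, not taken from the GP study:

```python
# Half-width of a 95% CI for mean weekly hours when both between-person
# variance (sampling fluctuation) and within-person variance (measurement
# fluctuation) matter:
#   Var(mean) = s2_between / n + s2_within / (n * m)
# for n participants each measured m times. Variance values are invented.
import math

def ci_half_width(n, m, s2_between, s2_within, z=1.96):
    var_mean = s2_between / n + s2_within / (n * m)
    return z * math.sqrt(var_mean)

for n in (10, 50, 300):
    print(n, round(ci_half_width(n, m=5, s2_between=60, s2_within=40), 2))
```

The printed widths shrink quickly up to a few dozen participants and slowly thereafter, matching the diminishing returns the abstract reports.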
Wang, Shunli; Tan, Qiaofeng; Cao, Liangcai; He, Qingsheng; Jin, Guofan
2009-11-23
Based on a volume holographic correlator, a multi-sample parallel estimation method is proposed to implement remote sensing image recognition with high accuracy. The essential steps of the method, including image preprocessing, estimation curve fitting, template image preparation, and estimation equation establishment, are discussed in detail. The experimental results show the validity of the multi-sample parallel estimation method; the recognition accuracy is improved by increasing the number of samples.
Predictors of Citation Rate in Psychology: Inconclusive Influence of Effect and Sample Size.
Hanel, Paul H P; Haase, Jennifer
2017-01-01
In the present article, we investigate predictors of how often a scientific article is cited. Specifically, we focus on the influence of two often neglected predictors of citation rate: effect size and sample size, using samples from two psychological topical areas. Both can be considered as indicators of the importance of an article and post hoc (or observed) statistical power, and should, especially in applied fields, predict citation rates. In Study 1, effect size did not have an influence on citation rates across a topical area, both with and without controlling for numerous variables that have been previously linked to citation rates. In contrast, sample size predicted citation rates, but only while controlling for other variables. In Study 2, sample and partly effect sizes predicted citation rates, indicating that the relations vary even between scientific topical areas. Statistically significant results had more citations in Study 2 but not in Study 1. The results indicate that the importance (or power) of scientific findings may not be as strongly related to citation rate as is generally assumed.
A New Estimator For Population Mean Using Two Auxiliary Variables in Stratified random Sampling
Singh, Rajesh; Malik, Sachin
2014-01-01
In this paper, we suggest an estimator using two auxiliary variables in stratified random sampling. The proposed estimator improves upon the mean per unit estimator as well as some other considered estimators. Expressions for the bias and MSE of the estimator are derived up to the first degree of approximation. Moreover, these theoretical findings are supported by a numerical example with original data. Key words: Study variable, auxiliary variable, stratified random sampling, bias and mean squa...
Willan, Andrew R
2008-01-01
Traditional sample size calculations for randomized clinical trials depend on somewhat arbitrarily chosen factors, such as type I and II errors. As an alternative, taking a societal perspective, and using the expected value of information based on Bayesian decision theory, a number of authors have recently shown how to determine the sample size that maximizes the expected net gain, i.e., the difference between the cost of the trial and the value of the information gained from the results. Other authors have proposed Bayesian methods to determine sample sizes from an industry perspective. The purpose of this article is to propose a Bayesian approach to sample size calculations from an industry perspective that attempts to determine the sample size that maximizes expected profit. A model is proposed for expected total profit that includes consideration of per-patient profit, disease incidence, time horizon, trial duration, market share, discount rate, and the relationship between the results and the probability of regulatory approval. The expected value of information provided by trial data is related to the increase in expected profit from increasing the probability of regulatory approval. The methods are applied to an example, including an examination of robustness. The model is extended to consider market share as a function of observed treatment effect. The use of methods based on the expected value of information can provide, from an industry perspective, robust sample size solutions that maximize the difference between the expected cost of the trial and the expected value of information gained from the results. The method is only as good as the model for expected total profit. Although the model probably has all the right elements, it assumes that market share, per-patient profit, and incidence are insensitive to trial results. The method relies on the central limit theorem which assumes that the sample sizes involved ensure that the relevant test statistics
Ching Chun Huang
2014-01-01
This paper develops two-state and three-state adaptive sample size control schemes based on the Max chart to simultaneously monitor the process mean and standard deviation. Since the Max chart is a single-variables control chart where only one plotting statistic is needed, the design and operation of adaptive sample size schemes for this chart will be simpler than those for the joint X̄ and S charts. Three types of processes including on-target initial, off-target initial and steady...
Bayesian sample size determination for cost-effectiveness studies with censored data.
Directory of Open Access Journals (Sweden)
Daniel P Beavers
Full Text Available Cost-effectiveness models are commonly utilized to determine the combined clinical and economic impact of one treatment compared to another. However, most methods for sample size determination of cost-effectiveness studies assume fully observed costs and effectiveness outcomes, which presents challenges for survival-based studies in which censoring exists. We propose a Bayesian method for the design and analysis of cost-effectiveness data in which costs and effectiveness may be censored, and the sample size is approximated for both power and assurance. We explore two parametric models and demonstrate the flexibility of the approach to accommodate a variety of modifications to study assumptions.
Karanth, K.Ullas; Chundawat, Raghunandan S.; Nichols, James D.; Kumar, N. Samba
2004-01-01
Tropical dry-deciduous forests comprise more than 45% of the tiger (Panthera tigris) habitat in India. However, in the absence of rigorously derived estimates of ecological densities of tigers in dry forests, critical baseline data for managing tiger populations are lacking. In this study tiger densities were estimated using photographic capture–recapture sampling in the dry forests of Panna Tiger Reserve in Central India. Over a 45-day survey period, 60 camera trap sites were sampled in a well-protected part of the 542-km2 reserve during 2002. A total sampling effort of 914 camera-trap-days yielded photo-captures of 11 individual tigers over 15 sampling occasions that effectively covered a 418-km2 area. The closed capture–recapture model Mh, which incorporates individual heterogeneity in capture probabilities, fitted these photographic capture history data well. The estimated capture probability/sample, p̂= 0.04, resulted in an estimated tiger population size and standard error (N̂(SÊN̂)) of 29 (9.65), and a density (D̂(SÊD̂)) of 6.94 (3.23) tigers/100 km2. The estimated tiger density matched predictions based on prey abundance. Our results suggest that, if managed appropriately, the available dry forest habitat in India has the potential to support a population size of about 9000 wild tigers.
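The study fits the closed-population heterogeneity model Mh, which has no simple closed form; as a minimal sketch of the same capture–recapture logic, here is the bias-corrected Chapman version of the two-occasion Lincoln–Petersen estimator, with hypothetical counts:

```python
# Bias-corrected Chapman estimator of closed-population size from two
# capture occasions. This illustrates the capture-recapture principle only;
# the paper's model Mh additionally allows individual heterogeneity in
# capture probabilities. Counts below are hypothetical, not the study's.

def chapman(n1, n2, m2):
    """n1: animals caught on occasion 1 (marked);
    n2: animals caught on occasion 2;
    m2: marked animals among the occasion-2 catch."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# 10 tigers photographed in session 1, 15 in session 2, 5 in both:
print(round(chapman(10, 15, 5), 1))  # -> 28.3
```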
Investigation of Bicycle Travel Time Estimation Using Bluetooth Sensors for Low Sampling Rates
Directory of Open Access Journals (Sweden)
Zhenyu Mei
2014-10-01
Full Text Available Filtering the data for bicycle travel time using Bluetooth sensors is crucial to the estimation of link travel times on a corridor. The current paper describes an adaptive filtering algorithm for estimating bicycle travel times using Bluetooth data, with consideration of low sampling rates. The data for bicycle travel time using Bluetooth sensors have two characteristics. First, the bicycle flow contains stable and unstable conditions. Second, the collected data have low sampling rates (less than 1%). To avoid erroneous inference, filters are introduced to “purify” multiple time series. The valid data are identified within a dynamically varying validity window with the use of a robust data-filtering procedure. The size of the validity window varies based on the number of preceding sampling intervals without a Bluetooth record. Applications of the proposed algorithm to the dataset from Genshan East Road and Moganshan Road in Hangzhou demonstrate its ability to track typical variations in bicycle travel time efficiently, while suppressing high-frequency noise signals.
Automated modal parameter estimation using correlation analysis and bootstrap sampling
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. The final step of the algorithm is a fuzzy c-means clustering procedure applied to
Willan, Andrew; Kowgier, Matthew
2008-01-01
Traditional sample size calculations for randomized clinical trials depend on somewhat arbitrarily chosen factors, such as Type I and II errors. An effectiveness trial (otherwise known as a pragmatic trial or management trial) is essentially an effort to inform decision-making, i.e., should treatment be adopted over standard? Taking a societal perspective and using Bayesian decision theory, Willan and Pinto (Stat. Med. 2005; 24:1791-1806 and Stat. Med. 2006; 25:720) show how to determine the sample size that maximizes the expected net gain, i.e., the difference between the cost of doing the trial and the value of the information gained from the results. These methods are extended to include multi-stage adaptive designs, with a solution given for a two-stage design. The methods are applied to two examples. As demonstrated by the two examples, substantial increases in the expected net gain (ENG) can be realized by using multi-stage adaptive designs based on expected value of information methods. In addition, the expected sample size and total cost may be reduced. Exact solutions have been provided for the two-stage design. Solutions for higher-order designs may prove to be prohibitively complex and approximate solutions may be required. The use of multi-stage adaptive designs for randomized clinical trials based on expected value of sample information methods leads to substantial gains in the ENG and reductions in the expected sample size and total cost.
A simulation-based sample size calculation method for pre-clinical tumor xenograft experiments.
Wu, Jianrong; Yang, Shengping
2017-04-07
Pre-clinical tumor xenograft experiments usually require a small sample size that is rarely greater than 20, and data generated from such experiments very often do not have censored observations. Many statistical tests can be used for analyzing such data, but most of them were developed based on large sample approximation. We demonstrate that the type-I error rates of these tests can substantially deviate from the designated rate, especially when the data to be analyzed has a skewed distribution. Consequently, the sample size calculated based on these tests can be erroneous. We propose a modified signed log-likelihood ratio test (MSLRT) to meet the type-I error rate requirement for analyzing pre-clinical tumor xenograft data. The MSLRT has a consistent and symmetric type-I error rate that is very close to the designated rate for a wide range of sample sizes. By simulation, we generated a series of sample size tables based on scenarios commonly expected in tumor xenograft experiments, and we expect that these tables can be used as guidelines for making decisions on the numbers of mice used in tumor xenograft experiments.
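The deviation of type-I error rates under skewed data that motivates the MSLRT can be demonstrated with a small simulation; the sketch below checks a nominal 5% two-sample z-test on exponential data (this is a generic illustration, not the authors' test or their simulation design):

```python
# Simulate the type-I error rate of a nominal 5% two-sample z-test when
# samples are small (n = 10) and the data are skewed (exponential), in the
# spirit of the simulation-based calibration described above. Both groups
# are drawn from the same distribution, so every rejection is a false
# positive. All parameters are illustrative.
import math
import random
from statistics import NormalDist

def type_i_error(n=10, reps=20000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        y = [rng.expovariate(1.0) for _ in range(n)]
        mx, my = sum(x) / n, sum(y) / n
        vx = sum((v - mx) ** 2 for v in x) / (n - 1)
        vy = sum((v - my) ** 2 for v in y) / (n - 1)
        z = (mx - my) / math.sqrt(vx / n + vy / n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

print(type_i_error())
```

With small skewed samples the empirical rate typically drifts away from the designated 5%, which is exactly the miscalibration the abstract warns about.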
Brief Report: Body Image in Autism: Evidence from Body Size Estimation.
Asada, Kosuke; Tojo, Yoshikuni; Hakarino, Koichiro; Saito, Atsuko; Hasegawa, Toshikazu; Kumagaya, Shinichiro
2018-02-01
Individuals with autism spectrum disorder (ASD) have difficulties with social interaction and communication. First-hand accounts written by individuals with ASD have shown the existence of other atypical characteristics such as difficulties with body awareness. However, few studies have examined whether such atypicalities are found more generally among individuals with ASD. We examined body image (i.e., self-body awareness) by asking individuals with ASD and typically developing (TD) individuals to estimate their own body size (shoulder width). Results show that TD individuals estimated their shoulder width more accurately than individuals with ASD. This study suggests that individuals with ASD often experience misperceptions in their body size.
DEFF Research Database (Denmark)
Nielsen, Jesper Kjær; Jensen, Tobias Lindstrøm; Jensen, Jesper Rindom
2016-01-01
In many spectral estimation and array processing problems, the process of finding estimates of model parameters often involves the optimisation of a cost function containing multiple peaks and dips. Such non-convex problems are hard to solve using traditional optimisation algorithms developed for convex problems, and computationally intensive grid searches are therefore often used instead. In this paper, we establish an analytical connection between the grid size and the parametrisation of the cost function so that the grid size can be selected as coarsely as possible to lower the computation...
Eguiarte, Luis E; Búrquez, Alberto; Rodríguez, Jorge; Martínez-Ramos, Miguel; Sarukhán, José; Pinero, Daniel
1993-02-01
To estimate the relative importance of genetic drift, the effective population size (Ne) can be used. Here we present estimates of the effective population size and related measures in Astrocaryum mexicanum, a tropical palm from the Los Tuxtlas rain forest, Veracruz, Mexico. Seed and pollen dispersal were measured. Seeds are primarily dispersed by gravity and secondarily dispersed by small mammals. Mean primary and secondary dispersal distances for seeds were found to be small (0.78 m and 2.35 m, respectively). A. mexicanum is beetle pollinated, and pollen movements were measured by different methods: (a) using fluorescent dyes, (b) as the minimum distance between active female and male inflorescences, and (c) using rare allozyme alleles as genetic markers. All three estimates of pollen dispersal were similar, with a mean of approximately 20 m. Using the seed and pollen dispersal data, the genetic neighborhood area (A) was estimated to be 2,551 m^2. To obtain the effective population size, three different overlapping generation methods were used to estimate an effective density with demographic data from six permanent plots. The effective density ranged from 0.040 to 0.351 individuals per m^2. The product of effective density and neighborhood area yields a direct estimate of the neighborhood effective population size (Nb). Nb ranged from 102 to 895 individuals. Indirect estimates of population size and migration rate (Nm) were obtained using Fst for five different allozymic loci for both adults and seeds. We obtained a range of Nm from 1.2 to 19.7 in adults and from 4.0 to 82.6 for seeds. We discuss possible causes of the smaller indirect estimates of Nm relative to the direct estimates and compare our estimates with values from other plant populations. Gene dispersal distances, neighborhood size, and effective population size in A. mexicanum are relatively high, suggesting that natural selection, rather than genetic drift, may play a dominant role in
A simple method for estimating the size of nuclei on fractal surfaces
Zeng, Qiang
2017-10-01
Determining the size of nuclei on complex surfaces remains a major challenge in biological, material and chemical engineering. Here the author reports a simple method to estimate the size of nuclei in contact with complex (fractal) surfaces. The approach rests on two assumptions: contact-area proportionality for determining nucleation density, and scaling congruence between nuclei and surfaces for identifying contact regimes. Three different regimes govern the equations for estimating the nucleation site density. Nuclei large enough eliminate the effect of the fractal structure, while nuclei small enough make the nucleation site density independent of the fractal parameters. Only when the nuclei match the fractal scales is the nucleation site density coupled to both the fractal parameters and the nucleus size. The method was validated against experimental data reported in the literature. It may provide an effective way to estimate the size of nuclei on fractal surfaces, with a number of promising applications in related fields.
Estimating search engine index size variability: a 9-year longitudinal study.
van den Bosch, Antal; Bogers, Toine; de Kunder, Maurice
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine's index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing's indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.
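The extrapolation idea can be sketched as follows: if a word occurs in a known fraction of a representative corpus, the engine's reported hit count for that word scales up to an index-size estimate. The numbers here are hypothetical; the actual study combines estimates over many words to reduce variance.

```python
def estimate_index_size(hit_count: float, corpus_df: int, corpus_size: int) -> float:
    """Extrapolate a search engine's index size from one word's document frequency.

    hit_count   -- documents the engine reports as containing the word
    corpus_df   -- documents containing the word in a static reference corpus
    corpus_size -- total documents in the reference corpus
    """
    relative_df = corpus_df / corpus_size  # fraction of documents with the word
    return hit_count / relative_df

# Hypothetical: a word occurring in 1% of a 1M-page corpus draws
# 50M hits from the engine, implying an index of ~5 billion pages.
est = estimate_index_size(50_000_000, 10_000, 1_000_000)
print(f"{est:.0f}")  # 5000000000
```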
Liu, Beiyi; Gui, Guan; Xu, Li
2015-01-01
Least mean square (LMS) type adaptive algorithms have attracted much attention due to their low computational complexity. For sparse channel estimation, zero-attracting LMS (ZA-LMS), reweighted ZA-LMS (RZA-LMS) and reweighted ℓ1-norm LMS (RL1-LMS) have been proposed to exploit channel sparsity. However, with only one step size, these algorithms may struggle to trade off convergence speed against estimation performance. To solve this problem, we propose three sparse i...
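Zero-attracting LMS, one of the sparsity-aware variants named above, adds a penalty term (-ρ·sign(w)) to the standard LMS update so that small taps are pulled toward zero. A minimal NumPy sketch with hypothetical step-size, attractor, and channel values:

```python
import numpy as np

def za_lms(x, d, num_taps, mu=0.01, rho=1e-4):
    """Zero-attracting LMS: standard LMS plus a zero attractor -rho*sign(w)."""
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        xn = x[n - num_taps + 1:n + 1][::-1]    # tap-input vector
        e = d[n] - w @ xn                       # a priori estimation error
        w = w + mu * e * xn - rho * np.sign(w)  # LMS step + sparsity attractor
    return w

# Hypothetical sparse channel: 16 taps, only two nonzero.
rng = np.random.default_rng(0)
h = np.zeros(16)
h[2], h[9] = 1.0, -0.5
x = rng.standard_normal(2000)
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w = za_lms(x, d, num_taps=16)   # w should approximate h
```

The single step size mu is exactly the limitation the abstract points at: a larger mu speeds convergence but raises steady-state error.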
Identifying grain-size dependent errors on global forest area estimates and carbon studies
Daolan Zheng; Linda S. Heath; Mark J. Ducey
2008-01-01
Satellite-derived coarse-resolution data are typically used for conducting global analyses. But the forest areas estimated from coarse-resolution maps (e.g., 1 km) inevitably differ from a corresponding fine-resolution map (such as a 30-m map) that would be closer to ground truth. A better understanding of changes in grain size on area estimation will improve our...
An importance sampling algorithm for estimating extremes of perpetuity sequences
DEFF Research Database (Denmark)
Collamore, Jeffrey F.
2012-01-01
In a wide class of problems in insurance and financial mathematics, it is of interest to study the extremal events of a perpetuity sequence. This paper addresses the problem of numerically evaluating these rare event probabilities. Specifically, an importance sampling algorithm is described which...
Estimates of the Sampling Distribution of Scalability Coefficient H
Van Onna, Marieke J. H.
2004-01-01
Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…
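The bootstrap alternative mentioned above amounts to resampling examinees with replacement and recomputing H each time. A sketch, assuming binary item scores and Loevinger's H (one minus the ratio of observed to expected Guttman errors over item pairs); the data are simulated:

```python
import numpy as np

def loevinger_H(X):
    """Loevinger's scalability coefficient H for a binary item matrix
    (rows = examinees, columns = items): H = 1 - F/E, where F counts
    observed Guttman errors over item pairs and E their expectation
    under marginal independence."""
    n, k = X.shape
    p = X.mean(axis=0)
    order = np.argsort(-p)          # sort items from easiest to hardest
    X, p = X[:, order], p[order]
    F = E = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):   # item i easier than item j
            F += np.sum((X[:, i] == 0) & (X[:, j] == 1))  # fail easy, pass hard
            E += n * (1 - p[i]) * p[j]
    return 1.0 - F / E

def bootstrap_H(X, n_boot=500, seed=0):
    """Percentile bootstrap of H: resample examinees with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = [loevinger_H(X[rng.integers(0, n, n)]) for _ in range(n_boot)]
    return np.percentile(stats, [2.5, 97.5])

# Simulated data: 300 examinees, 5 items driven by a common latent trait.
rng = np.random.default_rng(1)
theta = rng.standard_normal(300)
b = np.linspace(-1.0, 1.0, 5)  # item difficulties
X = (theta[:, None] + 0.5 * rng.standard_normal((300, 5)) > b).astype(int)
lo, hi = bootstrap_H(X)        # bootstrap 95% interval for H
```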
Cantarello, Elena; Steck, Claude E; Fontana, Paolo; Fontaneto, Diego; Marini, Lorenzo; Pautasso, Marco
2010-03-01
Recent large-scale studies have shown that biodiversity-rich regions also tend to be densely populated areas. The most obvious explanation is that biodiversity and human beings tend to match the distribution of energy availability, environmental stability and/or habitat heterogeneity. However, the species-people correlation can also be an artefact, as more populated regions could show more species because of a more thorough sampling. Few studies have tested this sampling bias hypothesis. Using a newly collated dataset, we studied whether Orthoptera species richness is related to human population size in Italy's regions (average area 15,000 km(2)) and provinces (2,900 km(2)). As expected, the observed number of species increases significantly with increasing human population size for both grain sizes, although the proportion of variance explained is minimal at the provincial level. However, variations in observed Orthoptera species richness are primarily associated with the available number of records, which is in turn well correlated with human population size (at least at the regional level). Estimated Orthoptera species richness (Chao2 and Jackknife) also increases with human population size both for regions and provinces. Both for regions and provinces, this increase is not significant when controlling for variation in area and number of records. Our study confirms the hypothesis that broad-scale human population-biodiversity correlations can in some cases be artefactual. More systematic sampling of less studied taxa such as invertebrates is necessary to ascertain whether biogeographical patterns persist when sampling effort is kept constant or included in models.
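Of the two richness estimators named above, Chao2 has a particularly simple closed form, extrapolating from the numbers of species found in exactly one (q1) and exactly two (q2) sampling units; the example values are hypothetical:

```python
def chao2(s_obs, q1, q2):
    """Chao2 incidence-based species richness estimate.

    s_obs -- observed number of species
    q1    -- species recorded in exactly one sampling unit ("uniques")
    q2    -- species recorded in exactly two sampling units ("duplicates")
    """
    if q2 > 0:
        return s_obs + (q1 * q1) / (2 * q2)
    # bias-corrected form used when no duplicates are present
    return s_obs + q1 * (q1 - 1) / 2

# Hypothetical example: 120 species observed, 30 uniques, 10 duplicates.
print(chao2(120, 30, 10))  # 165.0
```

Intuitively, many uniques relative to duplicates signal that sampling is far from complete, pushing the estimate above the observed count.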
Sampling strategy for estimating human exposure pathways to consumer chemicals
Directory of Open Access Journals (Sweden)
Eleni Papadopoulou
2016-03-01
Human exposure to consumer chemicals has become a worldwide concern. In this work, a comprehensive sampling strategy is presented, to our knowledge the first to study all relevant exposure pathways in a single cohort using multiple methods for assessment of exposure from each pathway. The selected groups of chemicals are consumer chemicals whose production and use are currently in a state of transition: per- and polyfluorinated alkyl substances (PFASs), traditional and "emerging" brominated flame retardants (BFRs and EBFRs), organophosphate esters (OPEs) and phthalate esters (PEs). Information about human exposure to these contaminants is needed due to existing data gaps on human exposure intakes from multiple exposure pathways and on relationships between internal and external exposure. Indoor environment, food and biological samples were collected from 61 participants and their households in the Oslo area (Norway) on two consecutive days during winter 2013-14. Air, dust, hand wipes, and duplicate diet (food and drink) samples were collected as indicators of external exposure, and blood, urine, blood spots, hair, nails and saliva as indicators of internal exposure. A food diary, food frequency questionnaire (FFQ) and indoor environment questionnaire were also implemented. Approximately 2000 samples were collected in total, and participant views on their experiences of this campaign were collected via questionnaire. While 91% of our participants were positive about future participation in a similar project, some tasks were viewed as problematic. Completing the food diary and collecting duplicate food/drink portions were the tasks most frequently reported as "hard"/"very hard". Nevertheless, a strong positive correlation between the reported total mass of food/drinks in the food record and the total weight of the food/drinks in the collection bottles was observed, an indication of accurate performance
Sample size bounding and context ranking as approaches to the HRA data problem
Energy Technology Data Exchange (ETDEWEB)
Reer, Bernhard
2004-02-01
This paper presents a technique denoted as sub sample size bounding (SSSB) useable for the statistical derivation of context-specific probabilities from data available in existing reports on operating experience. Applications for human reliability analysis (HRA) are emphasized in the presentation of the technique. Exemplified by a sample of 180 abnormal event sequences, it is outlined how SSSB can provide viable input for the quantification of errors of commission (EOCs)
Sample Size Bounding and Context Ranking as Approaches to the Human Error Quantification Problem
Energy Technology Data Exchange (ETDEWEB)
Reer, B
2004-03-01
The paper describes a technique denoted as Sub-Sample-Size Bounding (SSSB), which is usable for the statistical derivation of context-specific probabilities from data available in existing reports on operating experience. Applications to human reliability analysis (HRA) are emphasised in the presentation of this technique. Exemplified by a sample of 180 abnormal event sequences, the manner in which SSSB can provide viable input for the quantification of errors of commission (EOCs) is outlined. (author)
SAMPLE SIZE DETERMINATION IN NON-RANDOMIZED SURVIVAL STUDIES WITH NON-CENSORED AND CENSORED DATA
Faghihzadeh, S.; M. Rahgozar
2003-01-01
Introduction: In survival analysis, determination of a sufficient sample size to achieve suitable statistical power is important. In both parametric and non-parametric methods of classic statistics, random selection of samples is a basic condition. Practically, in most clinical trials and health surveys, random allocation is impossible. Fixed-effect multiple linear regression analysis covers this need, and this feature could be extended to survival regression analysis. This paper is the resul...
Charles T. Scott; William A. Bechtold; Gregory A. Reams; William D. Smith; James A. Westfall; Mark H. Hansen; Gretchen G. Moisen
2005-01-01
This chapter outlines prescribed core procedures for deriving population estimates from attributes measured in conjunction with the Phase 1 and Phase 2 samples. These estimation procedures also apply to those Phase 3 attributes in common with Phase 2. Given the sampling frame and plot design described in the previous two chapters, many estimation approaches can be...
The Effect of Childhood Family Size on Fertility in Adulthood: New Evidence From IV Estimation.
Cools, Sara; Kaldager Hart, Rannveig
2017-02-01
Although fertility is positively correlated across generations, the causal effect of children's experience with larger sibships on their own fertility in adulthood is poorly understood. With the sex composition of the two firstborn children as an instrumental variable, we estimate the effect of sibship size on adult fertility using high-quality data from Norwegian administrative registers. Our study sample is all firstborns or second-borns during the 1960s in Norwegian families with at least two children (approximately 110,000 men and 104,000 women). An additional sibling has a positive effect on male fertility, mainly causing them to have three children themselves, but has a negative effect on female fertility at the same margin. Investigation into mediators reveals that mothers of girls shift relatively less time from market to family work when an additional child is born. We speculate that this scarcity in parents' time makes girls aware of the strains of life in large families, leading them to limit their own number of children in adulthood.
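Instrumental-variable estimation of this kind is typically computed by two-stage least squares. A self-contained sketch with simulated data (the instrument z stands in for the sex-composition indicator; all coefficients are hypothetical):

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z.
    Stage 1: regress x on z; stage 2: regress y on the fitted values."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage fit
    X = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]     # IV slope estimate

# Simulated data: u is an unobserved confounder, so OLS of y on x is
# biased, but z shifts x without entering y directly (exclusion).
rng = np.random.default_rng(1)
n = 10_000
z = rng.integers(0, 2, n).astype(float)  # e.g. "first two children same sex"
u = rng.standard_normal(n)               # confounder
x = 2.0 + 0.3 * z + 0.5 * u + 0.5 * rng.standard_normal(n)  # sibship size
y = 1.0 + 0.4 * x + 0.8 * u + 0.5 * rng.standard_normal(n)  # adult fertility
beta_iv = two_stage_least_squares(y, x, z)  # close to the true effect 0.4
```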
Wu, Jiacheng; Crawford, Forrest W; Raag, Mait; Heimer, Robert; Uusküla, Anneli
2017-01-01
Estimating the size of key risk populations is essential for determining the resources needed to implement effective public health intervention programs. Several standard methods for population size estimation exist, but the statistical and practical assumptions required for their use may not be met when applied to HIV risk groups. We apply three approaches to estimate the number of people who inject drugs (PWID) in the Kohtla-Järve region of Estonia using data from a respondent-driven sampling (RDS) study: the standard "multiplier" estimate gives 654 people (95% CI 509-804), the "successive sampling" method gives estimates between 600 and 2500 people, and a network-based estimate that uses the RDS recruitment chain gives between 700 and 2800 people. We critically assess the strengths and weaknesses of these statistical approaches for estimating the size of hidden or hard-to-reach HIV risk groups.
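The "multiplier" method referenced above has the simplest form of the three: N = M / p, where M is the number of population members counted in an external benchmark (e.g. a treatment registry) and p is the proportion of the survey sample reporting membership in that benchmark. A sketch with hypothetical inputs chosen to reproduce the 654 reported above:

```python
def multiplier_estimate(benchmark_count, sample_in_benchmark, sample_size):
    """Multiplier population size estimate: N = M / p."""
    p = sample_in_benchmark / sample_size  # benchmark coverage in the sample
    return benchmark_count / p

# Hypothetical: 200 PWID appear in a service registry, and 100 of 327
# RDS respondents report using that service.
n_hat = multiplier_estimate(200, 100, 327)
print(round(n_hat))  # 654
```

The method's validity hinges on the sample proportion p being representative, which is exactly the assumption the abstract flags as questionable for RDS data.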