International Nuclear Information System (INIS)
Tsuji, Hirokazu; Yokoyama, Norio; Nakajima, Hajime; Kondo, Tatsuo
1993-05-01
Statistical analyses were conducted using the cyclic crack growth rate data for pressure vessel steels stored in the JAERI Material Performance Database (JMPD), and comparisons were made of the variability and/or reproducibility of data obtained by ΔK-increasing and by ΔK-constant type tests. Based on the results of the statistical analyses, it was concluded that ΔK-constant type tests are generally superior to the commonly used ΔK-increasing type from the viewpoint of variability and/or reproducibility of the data. This tendency was more pronounced in tests conducted in simulated LWR primary coolants than in those conducted in air. (author)
Santos, José António; Galante-Oliveira, Susana; Barroso, Carlos
2011-03-01
The current work presents an innovative statistical approach to model ordinal variables in environmental monitoring studies. An ordinal variable has values that can only be compared as "less", "equal" or "greater", and it is not possible to have information about the size of the difference between two particular values. The ordinal variable under study here is the vas deferens sequence (VDS) used in imposex (superimposition of male sexual characters onto prosobranch females) field assessment programmes for monitoring tributyltin (TBT) pollution. The statistical methodology presented here is the ordered logit regression model. It assumes that the VDS is an ordinal variable whose values match up with a process of imposex development that can be considered continuous in both biological and statistical senses and can be described by a latent non-observable continuous variable. This model was applied to the case study of Nucella lapillus imposex monitoring surveys conducted on the Portuguese coast between 2003 and 2008 to evaluate the temporal evolution of TBT pollution in this country. In order to produce more reliable conclusions, the proposed model includes covariates that may influence the imposex response besides TBT (e.g. shell size). The model also provides an analysis of the environmental risk associated with TBT pollution by estimating the probability of the occurrence of females with VDS ≥ 2 in each year, according to OSPAR criteria. We consider that the proposed application of this statistical methodology has great potential in environmental monitoring whenever there is the need to model variables that can only be assessed through an ordinal scale of values.
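A minimal sketch of the kind of ordered logit fit described above, using statsmodels' OrderedModel on simulated VDS-like data; the variable names, cutoffs and effect sizes are invented for illustration and are not taken from the study:

```python
# Illustrative sketch (not the authors' code): fit an ordered logit model to
# simulated VDS-like scores with survey year and shell size as covariates,
# then estimate P(VDS >= 2), as in the OSPAR-style risk assessment.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 300
year = rng.integers(0, 6, n)                 # years since 2003 (hypothetical)
shell = rng.normal(25.0, 4.0, n)             # shell size in mm (hypothetical)
latent = 2.0 - 0.35 * year + 0.1 * shell + rng.logistic(size=n)
vds = np.digitize(latent, [1.5, 2.5, 3.5, 4.5])   # ordinal stages 0..4

df = pd.DataFrame({"vds": vds, "year": year, "shell": shell})
res = OrderedModel(df["vds"], df[["year", "shell"]], distr="logit").fit(
    method="bfgs", disp=False)
print(res.params)

# Estimated probability that a female of mean shell size has VDS >= 2
# in the last survey year: sum the predicted class probabilities for 2..4.
probs = np.asarray(res.predict(pd.DataFrame({"year": [5], "shell": [25.0]})))
print("P(VDS >= 2):", probs[0, 2:].sum())
```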
Statistical analyses of extreme food habits
International Nuclear Information System (INIS)
Breuninger, M.; Neuhaeuser-Berthold, M.
2000-01-01
This report summarizes the results of the project ''Statistical analyses of extreme food habits'', which was commissioned by the National Office for Radiation Protection as a contribution to the amendment of the ''General Administrative Regulation to paragraph 45 of the Decree on Radiation Protection: determination of the radiation exposure caused by emission of radioactive substances from facilities of nuclear technology''. Its aim is to determine whether the calculation of the radiation exposure received through food intake by 95% of the population, as planned in a provisional draft, overestimates the true exposure. If such an overestimation exists, its magnitude should be determined. It was possible to prove the existence of this overestimation, but its magnitude could only be estimated roughly. To identify its real extent, it is necessary to include the specific activities of the nuclides, which were not available for this investigation. In addition, the report shows how the consumption amounts of different food groups influence each other and which relationships between these amounts should be taken into account in order to estimate the radiation exposure as precisely as possible. (orig.)
Statistical variability of hydro-meteorological variables as indicators ...
African Journals Online (AJOL)
Statistical variability of hydro-meteorological variables as indicators of climate change in north-east Sokoto-Rima basin, Nigeria. ... water resources development including water supply projects, agriculture and tourism in the study area. Key words: Climate change, Climatic variability, Actual evapotranspiration, Global warming ...
Applied statistics a handbook of BMDP analyses
Snell, E J
1987-01-01
This handbook is a realization of a long-term goal of BMDP Statistical Software. As the software supporting statistical analysis has grown in breadth and depth to the point where it can serve many of the needs of accomplished statisticians, it can also serve as an essential support to those needing to expand their knowledge of statistical applications. Statisticians should not be handicapped by heavy computation or by the lack of needed options. When Applied Statistics: Principles and Examples by Cox and Snell appeared, we at BMDP were impressed with the scope of the applications discussed and felt that many statisticians eager to expand their capabilities in handling such problems could profit from having the solutions carried further, to get them started and guided to a more advanced level in problem solving. Who would be better to undertake that task than the authors of Applied Statistics? A year or two later discussions with David Cox and Joyce Snell at Imperial College indicated that a wedding of the proble...
Statistical identification of effective input variables
International Nuclear Information System (INIS)
Vaurio, J.K.
1982-09-01
A statistical sensitivity analysis procedure has been developed for ranking the input data of large computer codes in the order of sensitivity-importance. The method is economical for large codes with many input variables, since it uses a relatively small number of computer runs. No prior judgemental elimination of input variables is needed. The screening method is based on stagewise correlation and extensive regression analysis of output values calculated with selected input value combinations. The regression process deals with multivariate nonlinear functions, and statistical tests are also available for identifying input variables that contribute to threshold effects, i.e., discontinuities in the output variables. A computer code SCREEN has been developed for implementing the screening techniques. The efficiency has been demonstrated by several examples and applied to a fast reactor safety analysis code (Venus-II). However, the methods and the coding are general and not limited to such applications
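The SCREEN code itself is not reproduced in the abstract; the sketch below only illustrates the underlying idea of ranking inputs of a black-box model by standardized regression coefficients estimated from a modest number of runs (the toy model, sample size and input ranges are assumptions):

```python
# Minimal input-screening sketch: sample random input combinations, fit a
# linear surrogate by least squares, and rank inputs by the magnitude of
# their standardized regression coefficients.
import numpy as np

def black_box(x):
    # placeholder for an expensive code; x has shape (n_inputs,)
    return 3.0 * x[0] + 0.5 * x[3] ** 2 + 0.01 * x[7] + np.sin(x[5])

rng = np.random.default_rng(1)
n_runs, n_inputs = 60, 10
X = rng.uniform(-1, 1, size=(n_runs, n_inputs))   # selected input combinations
y = np.array([black_box(x) for x in X])

# Standardize both sides, then fit the linear surrogate.
Xs = (X - X.mean(0)) / X.std(0)
ys = (y - y.mean()) / y.std()
beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

ranking = np.argsort(-np.abs(beta))
for i in ranking[:5]:
    print(f"input {i}: standardized coefficient {beta[i]:+.3f}")
```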
Methodology development for statistical evaluation of reactor safety analyses
International Nuclear Information System (INIS)
Mazumdar, M.; Marshall, J.A.; Chay, S.C.; Gay, R.
1976-07-01
In February 1975, Westinghouse Electric Corporation, under contract to Electric Power Research Institute, started a one-year program to develop methodology for statistical evaluation of nuclear-safety-related engineering analyses. The objectives of the program were to develop an understanding of the relative efficiencies of various computational methods which can be used to compute probability distributions of output variables due to input parameter uncertainties in analyses of design basis events for nuclear reactors and to develop methods for obtaining reasonably accurate estimates of these probability distributions at an economically feasible level. A series of tasks was set up to accomplish these objectives. Two of the tasks were to investigate the relative efficiencies and accuracies of various Monte Carlo and analytical techniques for obtaining such estimates for a simple thermal-hydraulic problem whose output variable of interest is given in a closed-form relationship of the input variables and to repeat the above study on a thermal-hydraulic problem in which the relationship between the predicted variable and the inputs is described by a short-running computer program. The purpose of the report presented is to document the results of the investigations completed under these tasks, giving the rationale for choices of techniques and problems, and to present interim conclusions
Hydrometeorological and statistical analyses of heavy rainfall in Midwestern USA
Thorndahl, S.; Smith, J. A.; Krajewski, W. F.
2012-04-01
During the last two decades the mid-western states of the United States of America have repeatedly been afflicted by heavy, flood-producing rainfall. Several of these storms seem to have similar hydrometeorological properties in terms of pattern, track, evolution, life cycle, clustering, etc., which raises the question of whether it is possible to derive general characteristics of the space-time structures of these heavy storms. This is important in order to understand hydrometeorological features, e.g. how storms evolve and with what frequency we can expect extreme storms to occur. In the literature, most studies of extreme rainfall are based on point measurements (rain gauges). However, with high-resolution, high-quality radar observation periods now exceeding two decades, it is possible to perform long-term spatio-temporal statistical analyses of extremes. This makes it possible to link return periods to distributed rainfall estimates and to study the precipitation structures which cause floods. However, statistical frequency analyses of rainfall based on radar observations introduce challenges, in converting radar reflectivity observations to "true" rainfall, that do not arise in traditional analyses of rain gauge data. It is, for example, difficult to distinguish the reflectivity of high-intensity rain from the reflectivity of other hydrometeors such as hail, especially using the single-polarization radars employed in this study. Furthermore, reflectivity from the bright band (melting layer) should be discarded and anomalous propagation should be corrected in order to produce valid statistics of extreme radar rainfall. Other challenges include combining observations from several radars into one mosaic, bias correction against rain gauges, range correction, Z-R relationships, etc. The present study analyzes radar rainfall observations from 1996 to 2011 based on the American NEXRAD network of radars over an area covering parts of Iowa, Wisconsin, Illinois, and
Statistical analyses of conserved features of genomic islands in bacteria.
Guo, F-B; Xia, Z-K; Wei, W; Zhao, H-L
2014-03-17
We performed statistical analyses of five conserved features of genomic islands of bacteria. Analyses were made based on 104 known genomic islands, which were identified by comparative methods. Four of these features, which are frequently investigated, are sequence size, abnormal G+C content, a flanking tRNA gene, and an embedded mobility gene. One relatively new feature, G+C homogeneity, was also investigated. Among the 104 known genomic islands, 88.5% were found to fall in the typical length range of 10-200 kb and 80.8% had G+C deviations with absolute values larger than 2%. For the 88 genomic islands whose hosts have been sequenced and annotated, 52.3% were found to have flanking tRNA genes and 64.7% had embedded mobility genes. For the homogeneity feature, 85% had an h homogeneity index less than 0.1, indicating that their G+C content is relatively uniform. Taking all five features into account, 87.5% of the 88 genomic islands had three of them. Only one genomic island had only one conserved feature and none of the genomic islands had zero features. These statistical results should help in understanding the general structure of known genomic islands. We found that larger genomic islands tend to have relatively small absolute G+C deviations. For example, the absolute G+C deviations of the 9 genomic islands longer than 100,000 bp were all less than 5%. This is a novel but reasonable result, given that larger genomic islands should face greater restrictions on their G+C content in order to maintain the stable G+C content of the recipient genome.
Statistical reliability analyses of two wood plastic composite extrusion processes
International Nuclear Information System (INIS)
Crookston, Kevin A.; Mark Young, Timothy; Harper, David; Guess, Frank M.
2011-01-01
Estimates of the reliability of wood plastic composites (WPC) are explored for two industrial extrusion lines. The goal of the paper is to use parametric and non-parametric analyses to examine potential differences in the WPC metrics of reliability for the two extrusion lines that may be helpful for use by the practitioner. A parametric analysis of the extrusion lines reveals some similarities and disparities in the best models; however, a non-parametric analysis reveals unique and insightful differences between Kaplan-Meier survival curves for the modulus of elasticity (MOE) and modulus of rupture (MOR) of the WPC industrial data. The distinctive non-parametric comparisons indicate the source of the differences in strength between the 10.2% and 48.0% fractiles [3,183-3,517 MPa] for MOE and for MOR between the 2.0% and 95.1% fractiles [18.9-25.7 MPa]. Distribution fitting as related to selection of the proper statistical methods is discussed with relevance to estimating the reliability of WPC. The ability to detect statistical differences in the product reliability of WPC between extrusion processes may benefit WPC producers in improving product reliability and safety of this widely used house-decking product. The approach can be applied to many other safety and complex system lifetime comparisons.
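To reproduce the flavor of the non-parametric comparison, the sketch below builds Kaplan-Meier curves and a log-rank test with the lifelines library, treating simulated MOR values as the "lifetime" variable as in the study; the numbers are stand-ins, not the paper's industrial data:

```python
# Sketch of a non-parametric comparison of two extrusion lines via
# Kaplan-Meier estimates, with the strength measurement (MOR, MPa)
# playing the role of the survival variable.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(2)
mor_line1 = rng.normal(22.0, 1.8, 80)   # hypothetical MOR, line 1
mor_line2 = rng.normal(23.0, 2.4, 80)   # hypothetical MOR, line 2

kmf = KaplanMeierFitter()
kmf.fit(mor_line1, label="line 1")
print(kmf.survival_function_.tail())
kmf.fit(mor_line2, label="line 2")
print(kmf.survival_function_.tail())

# Log-rank test for a difference between the two "survival" curves.
result = logrank_test(mor_line1, mor_line2)
print("log-rank p-value:", result.p_value)
```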
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
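G*Power's own routines are not reproduced here; the sketch below shows one computation of this kind, the approximate power of a two-sided test that a Pearson correlation is zero, via Fisher's z-transformation:

```python
# Approximate power of the two-sided test H0: rho = 0 using Fisher's z.
import numpy as np
from scipy.stats import norm

def correlation_power(r, n, alpha=0.05):
    """Approximate power to detect population correlation r with n pairs."""
    z_effect = np.arctanh(r) * np.sqrt(n - 3)   # Fisher z, scaled by sqrt(n-3)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

for n in (20, 50, 100, 200):
    print(n, round(correlation_power(0.3, n), 3))
```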
Non-Statistical Methods of Analysing of Bankruptcy Risk
Directory of Open Access Journals (Sweden)
Pisula Tomasz
2015-06-01
The article focuses on assessing the effectiveness of a non-statistical approach to bankruptcy modelling in enterprises operating in the logistics sector. In order to describe the issue more comprehensively, the aforementioned prediction of the possible negative results of business operations was carried out for companies functioning in the Polish region of Podkarpacie, and in Slovakia. The bankruptcy predictors selected for the assessment of companies operating in the logistics sector included 28 financial indicators characterizing these enterprises in terms of their financial standing and management effectiveness. The purpose of the study was to identify factors (models) describing the bankruptcy risk in enterprises in the context of their forecasting effectiveness over one-year and two-year time horizons. In order to assess their practical applicability, the models were carefully analysed and validated. The usefulness of the models was assessed in terms of their classification properties, the capacity to accurately identify enterprises at risk of bankruptcy and healthy companies, and the proper calibration of the models to the data from training sample sets.
A weighted U statistic for association analyses considering genetic heterogeneity.
Wei, Changshuai; Elston, Robert C; Lu, Qing
2016-07-20
Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7 h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.
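The authors' HWU implementation is not given in the abstract; the toy sketch below only illustrates the general idea of a similarity-weighted U statistic with a permutation p-value (the kernels, data and sizes are all invented):

```python
# Toy weighted U statistic: pairwise phenotype similarity weighted by a
# genetic-similarity kernel, with a permutation test for significance.
import numpy as np

def weighted_u(pheno, geno, n_perm=999, seed=3):
    rng = np.random.default_rng(seed)
    # Genetic similarity weights (a simple allele-sharing kernel).
    w = geno @ geno.T / geno.shape[1]

    def u_stat(y):
        # Phenotype similarity kernel: negative squared difference.
        k = -(y[:, None] - y[None, :]) ** 2
        iu = np.triu_indices_from(k, k=1)   # sum over unordered pairs
        return np.sum(w[iu] * k[iu])

    observed = u_stat(pheno)
    perms = [u_stat(rng.permutation(pheno)) for _ in range(n_perm)]
    p = (1 + sum(p_ >= observed for p_ in perms)) / (n_perm + 1)
    return observed, p

rng = np.random.default_rng(4)
geno = rng.integers(0, 3, size=(50, 20)).astype(float)  # 50 subjects, 20 SNPs
pheno = geno[:, 0] * 0.8 + rng.normal(size=50)          # SNP 0 has an effect
print(weighted_u(pheno, geno))
```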
Statistical Dependence of Pipe Breaks on Explanatory Variables
Directory of Open Access Journals (Sweden)
Patricia Gómez-Martínez
2017-02-01
Aging infrastructure is the main challenge currently faced by water suppliers. Estimating asset lifetimes requires reliable criteria to plan asset repair and renewal strategies. To this end, pipe break prediction is one of the most important inputs. This paper analyzes the statistical dependence of pipe breaks on explanatory variables, determining their optimal combination and quantifying their influence on failure prediction accuracy. A large set of registered data from the Madrid water supply network, managed by Canal de Isabel II, has been filtered, classified and studied. Several statistical Bayesian models have been built and validated from the available information with a technique that combines reference periods of time as well as geographical location. Statistical models of increasing complexity are built, from zero up to five explanatory variables, following two approaches: a set of independent variables, or a combination of two joint variables plus an additional number of independent variables. With the aim of finding the variable combination that provides the most accurate prediction, models are compared following an objective validation procedure based on the models' skill in predicting the number of pipe breaks in a large set of geographical locations. As expected, model performance improves as the number of explanatory variables increases. However, the rate of improvement is not constant. Performance metrics improve significantly up to three variables, but the tendency softens for higher-order models, especially in trunk mains, where performance is reduced. Slight differences are found between trunk mains and distribution lines when selecting the most influential variables and models.
Statistical and extra-statistical considerations in differential item functioning analyses
Directory of Open Access Journals (Sweden)
G. K. Huysamen
2004-10-01
This article briefly describes the main procedures for performing differential item functioning (DIF) analyses and points out some of the statistical and extra-statistical implications of these methods. Research findings on the sources of DIF, including those associated with translated tests, are reviewed. As DIF analyses take no account of correlations between a test and relevant criteria, the elimination of differentially functioning items does not necessarily improve predictive validity or reduce any predictive bias. The implications of the results of past DIF research for test development in the multilingual and multi-cultural South African society are considered.
Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations
Energy Technology Data Exchange (ETDEWEB)
Kleijnen, J.P.C.; Helton, J.C.
1999-04-01
The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from the analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
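A compact sketch of this battery of tests applied to a single input/output scatterplot, using scipy on simulated data; the threshold-type test function and the number of classes are assumptions:

```python
# Battery of scatterplot pattern tests: linear, monotonic, central tendency,
# variability, and deviation from randomness.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 200)                           # sampled input
y = (x > 0.5) * x**2 + rng.normal(0, 0.05, 200)      # output with a threshold

# (1) linear relationship
print("Pearson:", stats.pearsonr(x, y))
# (2) monotonic relationship
print("Spearman:", stats.spearmanr(x, y))
# (3) trends in central tendency: Kruskal-Wallis across x-classes
classes = np.digitize(x, np.quantile(x, [0.2, 0.4, 0.6, 0.8]))
groups = [y[classes == c] for c in range(5)]
print("Kruskal-Wallis:", stats.kruskal(*groups))
# (4) trends in variability: per-class spread
print("class variances:", np.round([g.var() for g in groups], 4))
# (5) deviation from randomness: chi-square on a 5x5 grid of (x, y) classes
y_classes = np.digitize(y, np.quantile(y, [0.2, 0.4, 0.6, 0.8]))
table = np.zeros((5, 5))
np.add.at(table, (classes, y_classes), 1)
chi2, p = stats.chi2_contingency(table)[:2]
print("chi-square:", chi2, "p:", p)
```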
Statistical screening of input variables in a complex computer code
International Nuclear Information System (INIS)
Krieger, T.J.
1982-01-01
A method is presented for ''statistical screening'' of input variables in a complex computer code. The object is to determine the ''effective'' or important input variables by estimating the relative magnitudes of their associated sensitivity coefficients. This is accomplished by performing a numerical experiment consisting of a relatively small number of computer runs with the code followed by a statistical analysis of the results. A formula for estimating the sensitivity coefficients is derived. Reference is made to an earlier work in which the method was applied to a complex reactor code with good results
Understanding and forecasting polar stratospheric variability with statistical models
Directory of Open Access Journals (Sweden)
C. Blume
2012-07-01
The variability of the north-polar stratospheric vortex is a prominent aspect of the middle atmosphere. This work investigates a wide class of statistical models with respect to their ability to model geopotential and temperature anomalies, representing variability in the polar stratosphere. Four partly nonstationary, nonlinear models are assessed: linear discriminant analysis (LDA); a cluster method based on finite elements (FEM-VARX); a neural network, namely the multi-layer perceptron (MLP); and support vector regression (SVR). These methods model time series by incorporating all significant external factors simultaneously, including ENSO, the QBO, the solar cycle and volcanoes, and then quantify their statistical importance. We show that variability in reanalysis data from 1980 to 2005 is successfully modeled. The period from 2005 to 2011 can be hindcast to a certain extent, with MLP performing significantly better than the remaining models. However, variability remains that cannot be statistically hindcast within the current framework, such as the unexpected major warming in January 2009. Finally, the statistical model with the best generalization performance is used to predict a winter 2011/12 with warm and weak vortex conditions. A vortex breakdown is predicted for late January or early February 2012.
Temporal scaling and spatial statistical analyses of groundwater level fluctuations
Sun, H.; Yuan, L., Sr.; Zhang, Y.
2017-12-01
Natural dynamics such as groundwater level fluctuations can exhibit multifractionality and/or multifractality, likely due to multi-scale aquifer heterogeneity and controlling factors, whose statistics require efficient quantification methods. This study explores multifractionality and non-Gaussian properties in groundwater dynamics, expressed by time series of daily level fluctuations at three wells located in the lower Mississippi valley, after removing the seasonal cycle, in both temporal scaling and spatial statistical analyses. First, using time-scale multifractional analysis, a systematic statistical method is developed to analyze groundwater level fluctuations quantified by the time-scale local Hurst exponent (TS-LHE). Results show that the TS-LHE does not remain constant, implying fractal-scaling behavior that changes with time and location. Hence, we can distinguish the potentially location-dependent scaling feature, which may characterize the hydrologic dynamic system. Second, spatial statistical analysis shows that the increments of groundwater level fluctuations exhibit a heavy-tailed, non-Gaussian distribution, which can be better quantified by a Lévy stable distribution. Monte Carlo simulations of the fluctuation process also show that the linear fractional stable motion model can well depict the transient dynamics (i.e., fractal non-Gaussian property) of groundwater level, while fractional Brownian motion is inadequate to describe natural processes with anomalous dynamics. Analysis of temporal scaling and spatial statistics may therefore provide useful information and quantification for further understanding the nature of complex dynamics in hydrology.
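The sketch below illustrates the distributional comparison step on simulated increments: fitting a Lévy stable law and a Gaussian with scipy and comparing log-likelihoods. The stability index used to generate the data is an assumption, and scipy's stable-law fit can be slow:

```python
# Compare a heavy-tailed Lévy stable fit against a Gaussian fit on
# simulated level increments via maximized log-likelihoods.
import numpy as np
from scipy.stats import levy_stable, norm

rng = np.random.default_rng(6)
increments = levy_stable.rvs(1.6, 0.0, size=500, random_state=rng)

alpha, beta, loc, scale = levy_stable.fit(increments)   # slow but automatic
ll_stable = np.sum(levy_stable.logpdf(increments, alpha, beta, loc, scale))
mu, sigma = norm.fit(increments)
ll_gauss = np.sum(norm.logpdf(increments, mu, sigma))

print(f"stable fit: alpha={alpha:.2f}, logL={ll_stable:.1f}")
print(f"gaussian fit: logL={ll_gauss:.1f}")
```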
The intervals method: a new approach to analyse finite element outputs using multivariate statistics
Directory of Open Access Journals (Sweden)
Jordi Marcé-Nogué
2017-10-01
Background: In this paper, we propose a new method, named the intervals' method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful for distinguishing and characterising biomechanical differences related to diet/ecomorphology. Methods: The intervals' method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results: Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals' method is a powerful tool to characterize biomechanical performance and how it relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion: We show that the proposed approach is a useful methodology, not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches.
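A minimal numerical sketch of the intervals' method as summarized above, with simulated stress and element-area values standing in for finite element output, followed by a plain SVD-based PCA:

```python
# Intervals' method sketch: convert per-model stress fields into
# "percentage of area per stress interval" variables, then run PCA.
import numpy as np

rng = np.random.default_rng(7)
n_models, n_elements = 8, 500
stress = rng.gamma(2.0, 5.0, size=(n_models, n_elements))  # e.g. von Mises
area = rng.uniform(0.5, 1.5, size=(n_models, n_elements))  # element areas

edges = np.linspace(0, stress.max(), 11)   # 10 stress intervals spanning data
X = np.zeros((n_models, len(edges) - 1))
for i in range(n_models):
    w, _ = np.histogram(stress[i], bins=edges, weights=area[i])
    X[i] = 100 * w / area[i].sum()         # % of total area in each interval

# PCA via SVD of the centered interval-variable matrix.
Xc = X - X.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s
print("explained variance ratios:", (s**2 / np.sum(s**2)).round(3))
print("PC1-PC2 scores:\n", scores[:, :2].round(2))
```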
The intervals method: a new approach to analyse finite element outputs using multivariate statistics
De Esteban-Trivigno, Soledad; Püschel, Thomas A.; Fortuny, Josep
2017-01-01
Background: In this paper, we propose a new method, named the intervals' method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful for distinguishing and characterising biomechanical differences related to diet/ecomorphology. Methods: The intervals' method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results: Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals' method is a powerful tool to characterize biomechanical performance and how it relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion: We show that the proposed approach is a useful methodology, not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches. PMID:29043107
Using statistical inference for decision making in best estimate analyses
International Nuclear Information System (INIS)
Sermer, P.; Weaver, K.; Hoppe, F.; Olive, C.; Quach, D.
2008-01-01
For broad classes of safety analysis problems, one needs to make decisions when faced with randomly varying quantities which are also subject to errors. The means for doing this involve a statistical approach which takes into account the nature of the physical problems and the statistical constraints they impose. We describe the methodology for doing this which has been developed at Nuclear Safety Solutions, and we draw some comparisons to other methods which are commonly used in Canada and internationally. Our methodology has the advantages of being robust and accurate and compares favourably to other best estimate methods. (author)
Additional methodology development for statistical evaluation of reactor safety analyses
International Nuclear Information System (INIS)
Marshall, J.A.; Shore, R.W.; Chay, S.C.; Mazumdar, M.
1977-03-01
The project described is motivated by the desire for methods to quantify uncertainties and to identify conservatisms in nuclear power plant safety analysis. The report examines statistical methods useful for assessing the probability distribution of output response from complex nuclear computer codes, considers sensitivity analysis and several other topics, and also sets the path for using the developed methods for realistic assessment of the design basis accident
A Statistical Analysis of Cointegration for I(2) Variables
DEFF Research Database (Denmark)
Johansen, Søren
1995-01-01
be conducted using the χ² distribution. It is shown to what extent inference on the cointegration ranks can be conducted using the tables already prepared for the analysis of cointegration of I(1) variables. New tables are needed for the test statistics to control the size of the tests. This paper contains a multivariate test for the existence of I(2) variables. This test is illustrated using a data set consisting of U.K. and foreign prices and interest rates as well as the exchange rate.
Variability aware compact model characterization for statistical circuit design optimization
Qiao, Ying; Qian, Kun; Spanos, Costas J.
2012-03-01
Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose an efficient variability-aware compact model characterization methodology based on the linear propagation of variance. Hierarchical spatial variability patterns of selected compact model parameters are directly calculated from transistor array test structures. This methodology has been implemented and tested using transistor I-V measurements and the EKV-EPFL compact model. Calculation results compare well to full-wafer direct model parameter extractions. Further studies are done on the proper selection of both the compact model parameters and the electrical measurement metrics used in the method.
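A minimal sketch of first-order (linear) propagation of variance with a finite-difference Jacobian; the drain-current function and the parameter covariance below are made-up placeholders, not the EKV-EPFL model:

```python
# Linear propagation of variance: approximate the output variance from the
# parameter covariance via a finite-difference Jacobian, then cross-check
# the linearization with Monte Carlo sampling.
import numpy as np

def drain_current(p):
    vth, beta, theta = p          # toy transistor expression, not EKV
    vgs = 1.0
    return beta * (vgs - vth) ** 2 / (1 + theta * (vgs - vth))

p0 = np.array([0.4, 2e-3, 0.3])               # nominal parameters (assumed)
cov_p = np.diag([0.02, 1e-4, 0.02]) ** 2      # parameter covariance (assumed)

eps = 1e-6
J = np.array([(drain_current(p0 + eps * e) - drain_current(p0 - eps * e))
              / (2 * eps) for e in np.eye(3)])

var_out = J @ cov_p @ J
print("output std from propagation:", np.sqrt(var_out))

rng = np.random.default_rng(8)
samples = rng.multivariate_normal(p0, cov_p, 20000)
mc_std = np.array([drain_current(p) for p in samples]).std()
print("output std from Monte Carlo:", mc_std)
```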
Statistical validity of using ratio variables in human kinetics research.
Liu, Yuanlong; Schutz, Robert W
2003-09-01
The purposes of this study were to investigate the validity of the simple ratio and three alternative deflation models and to examine how the variation of the numerator and denominator variables affects the reliability of a ratio variable. A simple ratio and three alternative deflation models were fitted to four empirical data sets, and common criteria were applied to determine the best model for deflation. Intraclass correlation was used to examine the component effect on the reliability of a ratio variable. The results indicate that the validity of a deflation model depends on the statistical characteristics of the particular component variables used, and an optimal deflation model for all ratio variables may not exist. Therefore, it is recommended that different models be fitted to each empirical data set to determine the best deflation model. It was found that the reliability of a simple ratio is affected by the coefficients of variation and the within- and between-trial correlations between the numerator and denominator variables. It is recommended that researchers compute the reliability of the derived ratio scores and not assume that strong reliabilities in the numerator and denominator measures automatically lead to high reliability in the ratio measures.
Theoretical statistics of zero-age cataclysmic variables
International Nuclear Information System (INIS)
Politano, M.J.
1988-01-01
The distribution of the white dwarf masses, the distribution of the mass ratios and the distribution of the orbital periods in cataclysmic variables which are forming at the present time are calculated. These systems are referred to as zero-age cataclysmic variables. The results show that 60% of the systems being formed contain helium white dwarfs and 40% contain carbon-oxygen white dwarfs. The mean white dwarf mass in those systems containing helium white dwarfs is 0.34 solar masses. The mean white dwarf mass in those systems containing carbon-oxygen white dwarfs is 0.75 solar masses. The orbital period distribution identifies four main classes of zero-age cataclysmic variables: (1) short-period systems containing helium white dwarfs, (2) systems containing carbon-oxygen white dwarfs whose secondaries are convectively stable against rapid mass transfer to the white dwarf, (3) systems containing carbon-oxygen white dwarfs whose secondaries are radiatively stable against rapid mass transfer to the white dwarf and (4) long-period systems with evolved secondaries. The white dwarf mass distribution in zero-age cataclysmic variables has direct application to the calculation of the frequency of outburst in classical novae as a function of the mass of the white dwarf. The method developed in this thesis to calculate the distributions of the orbital parameters in zero-age cataclysmic variables can be used to calculate theoretical statistics of any class of binary systems. This method provides a theoretical framework from which to investigate the statistical properties and the evolution of the orbital parameters of binary systems
Statistical conditional sampling for variable-resolution video compression.
Directory of Open Access Journals (Sweden)
Alexander Wong
In this study, we investigate a variable-resolution approach to video compression based on Conditional Random Field (CRF) modeling and statistical conditional sampling, in order to further improve the compression rate while maintaining high-quality video. In the proposed approach, representative key-frames within a video shot are identified and stored at full resolution. The remaining frames within the video shot are stored and compressed at a reduced resolution. At the decompression stage, a region-based dictionary is constructed from the key-frames and used to restore the reduced-resolution frames to the original resolution via statistical conditional sampling. The sampling approach is based on the conditional probability of the CRF model, by use of the constructed dictionary. Experimental results show that the proposed variable-resolution approach via statistical conditional sampling has potential for improving compression rates when compared to compressing the video at full resolution, while achieving higher video quality when compared to compressing the video at reduced resolution.
Statistical reporting errors and collaboration on statistical analyses in psychological science
Veldkamp, C.L.S.; Nuijten, M.B.; Dominguez Alvarez, L.; van Assen, M.A.L.M.; Wicherts, J.M.
2014-01-01
Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this ‘co-piloting’ currently occurs in psychology, we
Csillik, O.; Evans, I. S.; Drăguţ, L.
2015-03-01
Automated procedures are developed to alleviate long tails in frequency distributions of morphometric variables. They minimize the skewness of slope gradient frequency distributions, and modify the kurtosis of profile and plan curvature distributions toward that of the Gaussian (normal) model. Box-Cox (for slope) and arctangent (for curvature) transformations are tested on nine digital elevation models (DEMs) of varying origin and resolution, and different landscapes, and shown to be effective. Resulting histograms are illustrated and show considerable improvements over those for previously recommended slope transformations (sine, square root of sine, and logarithm of tangent). Unlike previous approaches, the proposed method evaluates the frequency distribution of slope gradient values in a given area and applies the most appropriate transform if required. Sensitivity of the arctangent transformation is tested, showing that Gaussian-kurtosis transformations are acceptable also in terms of histogram shape. Cube root transformations of curvatures produced bimodal histograms. The transforms are applicable to morphometric variables and many others with skewed or long-tailed distributions. By avoiding long tails and outliers, they permit parametric statistics such as correlation, regression and principal component analyses to be applied, with greater confidence that requirements for linearity, additivity and even scatter of residuals (constancy of error variance) are likely to be met. It is suggested that such transformations should be routinely applied in all parametric analyses of long-tailed variables. Our Box-Cox and curvature automated transformations are based on a Python script, implemented as an easy-to-use script tool in ArcGIS.
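A re-implementation sketch of the two transformations described above (the idea only, not the authors' ArcGIS script tool): a Box-Cox exponent chosen to minimize the skewness of simulated slope gradients, and an arctangent scale chosen to push simulated curvature kurtosis toward the Gaussian value:

```python
# Box-Cox transform for skewness, arctangent transform for kurtosis.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(9)
slope = rng.gamma(1.5, 4.0, 5000) + 0.01      # long-tailed slope gradients
curvature = rng.standard_t(3, 5000) * 0.02    # heavy-tailed curvatures

# Box-Cox exponent minimizing |skewness| of the transformed slopes.
res = optimize.minimize_scalar(
    lambda lmb: abs(stats.skew(stats.boxcox(slope, lmbda=lmb))),
    bounds=(-2, 2), method="bounded")
print("Box-Cox lambda:", round(res.x, 3),
      "| skewness:", round(stats.skew(stats.boxcox(slope, lmbda=res.x)), 4))

# Arctangent scale bringing curvature kurtosis to the Gaussian value
# (excess kurtosis 0 in scipy's Fisher convention).
res2 = optimize.minimize_scalar(
    lambda s: stats.kurtosis(np.arctan(curvature / s)) ** 2,
    bounds=(1e-4, 1.0), method="bounded")
print("arctan scale:", round(res2.x, 4),
      "| excess kurtosis:",
      round(stats.kurtosis(np.arctan(curvature / res2.x)), 4))
```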
PROCESS VARIABILITY REDUCTION THROUGH STATISTICAL PROCESS CONTROL FOR QUALITY IMPROVEMENT
Directory of Open Access Journals (Sweden)
B.P. Mahesh
2010-09-01
Quality has become one of the most important customer decision factors in the selection among competing products and services. Consequently, understanding and improving quality is a key factor leading to business success, growth and an enhanced competitive position. Hence a quality improvement program should be an integral part of the overall business strategy. According to TQM, the effective way to improve the quality of a product or service is to improve the process used to build the product. Hence, TQM focuses on the process rather than the results, as the results are driven by the processes. Many techniques are available for quality improvement. Statistical Process Control (SPC) is one such TQM technique which is widely accepted for analyzing quality problems and improving the performance of the production process. This article illustrates the step-by-step procedure adopted at a soap manufacturing company to improve quality by reducing process variability using Statistical Process Control.
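As an illustration of the SPC machinery mentioned above, the sketch below computes X-bar chart control limits for subgroups of size 5 using the standard A2 = 0.577 factor; the measurements are simulated stand-ins for a soap-quality characteristic:

```python
# X-bar and R chart sketch: control limits from subgroup means and ranges.
import numpy as np

rng = np.random.default_rng(10)
data = rng.normal(100.0, 2.0, size=(25, 5))   # 25 subgroups of 5 measurements

xbar = data.mean(axis=1)
subgroup_range = data.max(axis=1) - data.min(axis=1)
grand_mean, rbar = xbar.mean(), subgroup_range.mean()

A2 = 0.577                                    # control-chart constant for n = 5
ucl, lcl = grand_mean + A2 * rbar, grand_mean - A2 * rbar
print(f"CL={grand_mean:.2f}  UCL={ucl:.2f}  LCL={lcl:.2f}")
print("out-of-control subgroups:", np.where((xbar > ucl) | (xbar < lcl))[0])
```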
Directory of Open Access Journals (Sweden)
Sunando Roy
2009-10-01
Feline immunodeficiency virus (FIV) and human immunodeficiency virus (HIV) are recently identified lentiviruses that cause progressive immune decline and ultimately death in infected cats and humans. It is of great interest to understand how to prevent immune system collapse caused by these lentiviruses. We recently described that disease caused by a virulent FIV strain in cats can be attenuated if animals are first infected with a feline immunodeficiency virus derived from a wild cougar. The detailed temporal tracking of cat immunological parameters in response to two viral infections resulted in high-dimensional datasets containing variables that exhibit strong co-variation. Initial analyses of these complex data using univariate statistical techniques did not account for interactions among immunological response variables and therefore potentially obscured significant effects between infection state and immunological parameters. Here, we apply a suite of multivariate statistical tools, including Principal Component Analysis, MANOVA and Linear Discriminant Analysis, to temporal immunological data resulting from FIV superinfection in domestic cats. We investigated the co-variation among immunological responses, the differences in immune parameters among four groups of five cats each (uninfected, single and dual infected animals), and the "immune profiles" that discriminate among them over the first four weeks following superinfection. Dual infected cats mount an immune response by 24 days post superinfection that is characterized by elevated levels of CD8 and CD25 cells and increased expression of IL4, IFNgamma and FAS. This profile discriminates dual infected cats from cats infected with FIV alone, which show high IL-10 and lower numbers of CD8 and CD25 cells. Multivariate statistical analyses demonstrate both the dynamic nature of the immune response to FIV single and dual infection and the development of a unique immunological profile in dual
Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.
Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M
2014-01-01
Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.
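The automated procedure itself is not given in the abstract; the sketch below shows the core of such a check in the spirit of the statcheck approach, recomputing a two-sided p-value from a reported t statistic and degrees of freedom and flagging inconsistencies (the rounding tolerance is an assumption):

```python
# Recompute p from a reported t(df) and compare with the reported p-value.
from scipy import stats

def check_t_report(t, df, p_reported, tol=0.01):
    p = 2 * stats.t.sf(abs(t), df)     # two-sided p from the t distribution
    consistent = abs(p - p_reported) <= tol
    return p, consistent

# e.g. a hypothetical reported result "t(28) = 2.20, p = .04"
recomputed, ok = check_t_report(2.20, 28, 0.04)
print(f"recomputed p = {recomputed:.4f}, consistent: {ok}")
```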
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
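A minimal "what if" demonstration of the kind the paper advocates, here in Python rather than Excel or R: hold an observed correlation fixed and watch the p-value of its significance test change with the hypothetical sample size:

```python
# "What if" the same effect (r = 0.30) had been observed with more data?
from scipy import stats

r = 0.30
for n in (10, 20, 50, 100, 200, 500):
    t = r * ((n - 2) / (1 - r**2)) ** 0.5   # t statistic for H0: rho = 0
    p = 2 * stats.t.sf(abs(t), n - 2)
    print(f"n={n:4d}  t={t:5.2f}  p={p:.4f}")
```

The same fixed effect crosses the conventional 0.05 threshold purely as a function of sample size, which is the point such exercises are meant to convey.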
International Nuclear Information System (INIS)
Kleijnen, J.P.C.; Helton, J.C.
1999-01-01
Procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses are described and illustrated. These procedures attempt to detect increasingly complex patterns in scatterplots and involve the identification of (i) linear relationships with correlation coefficients, (ii) monotonic relationships with rank correlation coefficients, (iii) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (iv) trends in variability as defined by variances and interquartile ranges, and (v) deviations from randomness as defined by the chi-square statistic. A sequence of example analyses with a large model for two-phase fluid flow illustrates how the individual procedures can differ in the variables that they identify as having effects on particular model outcomes. The example analyses indicate that the use of a sequence of procedures is a good analysis strategy and provides some assurance that an important effect is not overlooked
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Energy Technology Data Exchange (ETDEWEB)
Udey, Ruth Norma [Michigan State Univ., East Lansing, MI (United States)
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Population activity statistics dissect subthreshold and spiking variability in V1.
Bányai, Mihály; Koman, Zsombor; Orbán, Gergő
2017-07-01
Response variability, as measured by fluctuating responses upon repeated performance of trials, is a major component of neural responses, and its characterization is key to interpreting high-dimensional population recordings. Response variability and covariability display predictable changes upon changes in stimulus and cognitive or behavioral state, providing an opportunity to test the predictive power of models of neural variability. Still, there is little agreement on which model to use as a building block for population-level analyses, and models of variability are often treated as a matter of choice. We investigate two competing models, the doubly stochastic Poisson (DSP) model, which assumes stochasticity at spike generation, and the rectified Gaussian (RG) model, which traces variability back to membrane potential variance, to analyze stimulus-dependent modulation of both single-neuron and pairwise response statistics. Using a pair of model neurons, we demonstrate that the two models predict similar single-cell statistics. However, the DSP and RG models make contradictory predictions about the joint statistics of spiking responses. To test the models against data, we build a population model to simulate stimulus change-related modulations in pairwise response statistics. We use single-unit data from the primary visual cortex (V1) of monkeys to show that while model predictions for variance are qualitatively similar to experimental data, only the RG model's predictions are compatible with the joint statistics. These results suggest that models using Poisson-like variability might fail to capture important properties of response statistics. We argue that membrane potential-level modeling of stochasticity provides an efficient strategy to model correlations. NEW & NOTEWORTHY Neural variability and covariability are puzzling aspects of cortical computations. For efficient decoding and prediction, models of information encoding in neural populations hinge on an appropriate model of
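A toy contrast between the two model classes (illustrative only, not the authors' population model): spike-count statistics for a neuron pair under a DSP model with a shared stochastic gain versus an RG model with correlated Gaussian "membrane potentials"; all rates, covariances and trial counts are invented:

```python
# Toy pair simulation: doubly stochastic Poisson (DSP) vs rectified
# Gaussian (RG) generation of spike counts, compared on count correlation
# and Fano factors.
import numpy as np

rng = np.random.default_rng(11)
n_trials, rate1, rate2 = 5000, 5.0, 8.0

# DSP: a shared multiplicative gain, then conditionally independent Poisson.
gain = rng.gamma(10.0, 0.1, n_trials)              # mean 1, shared by the pair
dsp1 = rng.poisson(gain * rate1)
dsp2 = rng.poisson(gain * rate2)

# RG: correlated Gaussian "membrane potentials", rectified to set the rate;
# in the RG view most variability enters through u, not spike generation.
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
u = rng.multivariate_normal([1.0, 1.5], cov, n_trials)
rates = np.maximum(u, 0.0) * [rate1, rate2]
rg1 = rng.poisson(rates[:, 0])
rg2 = rng.poisson(rates[:, 1])

print("DSP count correlation:", np.corrcoef(dsp1, dsp2)[0, 1].round(3))
print("RG  count correlation:", np.corrcoef(rg1, rg2)[0, 1].round(3))
print("DSP Fano factors:", (dsp1.var() / dsp1.mean()).round(2),
      (dsp2.var() / dsp2.mean()).round(2))
```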
THE ABSOLUTE MAGNITUDE OF RRc VARIABLES FROM STATISTICAL PARALLAX
International Nuclear Information System (INIS)
Kollmeier, Juna A.; Burns, Christopher R.; Thompson, Ian B.; Preston, George W.; Crane, Jeffrey D.; Madore, Barry F.; Morrell, Nidia; Prieto, José L.; Shectman, Stephen; Simon, Joshua D.; Villanueva, Edward; Szczygieł, Dorota M.; Gould, Andrew; Sneden, Christopher; Dong, Subo
2013-01-01
We present the first definitive measurement of the absolute magnitude of RR Lyrae c-type variable stars (RRc) determined purely from statistical parallax. We use a sample of 242 RRc variables selected from the All Sky Automated Survey for which high-quality light curves, photometry, and proper motions are available. We obtain high-resolution echelle spectra for these objects to determine radial velocities and abundances as part of the Carnegie RR Lyrae Survey. We find that M_V,RRc = 0.59 ± 0.10 at a mean metallicity of [Fe/H] = –1.59. This is to be compared with previous estimates for RRab stars (M_V,RRab = 0.76 ± 0.12) and the only direct measurement of an RRc absolute magnitude (RZ Cephei, M_V,RRc = 0.27 ± 0.17). We find the bulk velocity of the halo relative to the Sun to be (W_π, W_θ, W_z) = (12.0, –209.9, 3.0) km s^–1 in the radial, rotational, and vertical directions, with dispersions (σ_Wπ, σ_Wθ, σ_Wz) = (150.4, 106.1, 96.0) km s^–1. For the disk, we find (W_π, W_θ, W_z) = (13.0, –42.0, –27.3) km s^–1 relative to the Sun, with dispersions (σ_Wπ, σ_Wθ, σ_Wz) = (67.7, 59.2, 54.9) km s^–1. Finally, as a byproduct of our statistical framework, we are able to demonstrate that UCAC2 proper-motion errors are significantly overestimated, as verified by UCAC4
A variable thickness window: Thermal and structural analyses
International Nuclear Information System (INIS)
Wang, Zhibi; Kuzay, T.M.
1994-01-01
In this paper, the finite difference formulations for variable thickness thermal analysis and variable thickness plane stress analysis are presented. In heat transfer analysis, radiation effects and temperature-dependent thermal conductivity are taken into account. While in thermal stress analysis, the thermal expansion coefficient is considered as temperature dependent. An application of the variable thickness window to an Advanced Photon Source beamline is presented
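The paper's finite difference formulations are not reproduced in the abstract; the sketch below solves a loosely related toy problem, 1-D steady conduction through a plate of varying thickness with temperature-dependent conductivity, to show how variable thickness enters the discretization (all geometry and material numbers are assumptions):

```python
# 1-D steady conduction, d/dx( k(T) t(x) dT/dx ) = -q t(x), by finite
# differences with fixed-point iteration on the temperature-dependent k.
import numpy as np

n, L = 51, 0.1                       # nodes, window width (m), assumed
x = np.linspace(0, L, n)
dx = x[1] - x[0]
thick = 0.01 + 0.02 * (x / L) ** 2   # thickness profile t(x) (m), assumed
q = 5e5                              # volumetric heat load (W/m^3), assumed

def k(T):                            # mildly temperature-dependent conductivity
    return 400.0 * (1 - 1e-4 * (T - 300.0))

T = np.full(n, 300.0)                # edges held at 300 K
for _ in range(200):
    # face values of k*t between nodes i and i+1
    kA = k(0.5 * (T[:-1] + T[1:])) * 0.5 * (thick[:-1] + thick[1:])
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    rhs[0] = rhs[-1] = 300.0         # Dirichlet boundary conditions
    for i in range(1, n - 1):
        A[i, i - 1] = kA[i - 1]
        A[i, i + 1] = kA[i]
        A[i, i] = -(kA[i - 1] + kA[i])
        rhs[i] = -q * thick[i] * dx**2
    T_new = np.linalg.solve(A, rhs)
    delta = np.max(np.abs(T_new - T))
    T = T_new
    if delta < 1e-6:                 # converged in k(T)
        break
print("peak temperature rise (K):", (T - 300.0).max().round(2))
```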
Statistical analyses to support guidelines for marine avian sampling. Final report
Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris
2012-01-01
distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and to map p-values from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of the Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both short-term and long-term variability. We conclude by presenting a schematic "decision tree" showing how this power analysis approach would fit into a general framework for avian survey design, and discuss the implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.
THE ABSOLUTE MAGNITUDE OF RRc VARIABLES FROM STATISTICAL PARALLAX
Energy Technology Data Exchange (ETDEWEB)
Kollmeier, Juna A.; Burns, Christopher R.; Thompson, Ian B.; Preston, George W.; Crane, Jeffrey D.; Madore, Barry F.; Morrell, Nidia; Prieto, José L.; Shectman, Stephen; Simon, Joshua D.; Villanueva, Edward [Observatories of the Carnegie Institution of Washington, 813 Santa Barbara Street, Pasadena, CA 91101 (United States); Szczygieł, Dorota M.; Gould, Andrew [Department of Astronomy, The Ohio State University, 4051 McPherson Laboratory, Columbus, OH 43210 (United States); Sneden, Christopher [Department of Astronomy, University of Texas at Austin, TX 78712 (United States); Dong, Subo [Institute for Advanced Study, 500 Einstein Drive, Princeton, NJ 08540 (United States)
2013-09-20
We present the first definitive measurement of the absolute magnitude of RR Lyrae c-type variable stars (RRc) determined purely from statistical parallax. We use a sample of 242 RRc variables selected from the All Sky Automated Survey for which high-quality light curves, photometry, and proper motions are available. We obtain high-resolution echelle spectra for these objects to determine radial velocities and abundances as part of the Carnegie RR Lyrae Survey. We find that M_V,RRc = 0.59 ± 0.10 at a mean metallicity of [Fe/H] = –1.59. This is to be compared with previous estimates for RRab stars (M_V,RRab = 0.76 ± 0.12) and the only direct measurement of an RRc absolute magnitude (RZ Cephei, M_V,RRc = 0.27 ± 0.17). We find the bulk velocity of the halo relative to the Sun to be (W_π, W_θ, W_z) = (12.0, –209.9, 3.0) km s^–1 in the radial, rotational, and vertical directions, with dispersions (σ_Wπ, σ_Wθ, σ_Wz) = (150.4, 106.1, 96.0) km s^–1. For the disk, we find (W_π, W_θ, W_z) = (13.0, –42.0, –27.3) km s^–1 relative to the Sun, with dispersions (σ_Wπ, σ_Wθ, σ_Wz) = (67.7, 59.2, 54.9) km s^–1. Finally, as a byproduct of our statistical framework, we are able to demonstrate that UCAC2 proper-motion errors are significantly overestimated as verified by UCAC4.
Energy Technology Data Exchange (ETDEWEB)
Amidan, Brett G.; Pulsipher, Brent A.; Matzke, Brett D.
2009-12-17
number of zeros. QQ plots of these data show a lack of normality after contamination. Normality is improved when looking at log(CFU/cm2). Variance component analysis (VCA) and analysis of variance (ANOVA) were used to estimate the amount of variance due to each source and to determine which sources of variability were statistically significant. In general, the sampling methods interacted with the across-event variability and with the across-room variability. For this reason, it was decided to do analyses for each sampling method individually. The between-event variability and between-room variability were significant for each method, except for the between-event variability for the swabs. For both the wipes and vacuums, the within-room standard deviation was much larger (26.9 for wipes and 7.086 for vacuums) than the between-event standard deviation (6.552 for wipes and 1.348 for vacuums) and the between-room standard deviation (6.783 for wipes and 1.040 for vacuums). The swabs' between-room standard deviation was 0.151, while both the within-room and between-event standard deviations were less than 0.10 (all measurements in CFU/cm2).
International Nuclear Information System (INIS)
Carvajal Escobar Yesid; Munoz, Flor Matilde
2007-01-01
This project centres on a review of the state of the art of the ocean-atmospheric phenomena that affect Colombian hydrology, especially the ENSO phenomenon, which causes a socioeconomic impact of the first order in our country and has not been sufficiently studied. It is therefore important to address this topic by including the macroclimatic variables associated with ENSO in water planning analyses. The analyses include a review of statistical techniques for testing the consistency of hydrological data, with the objective of assembling a reliable and homogeneous database of monthly flows of the Cauca River. Statistical methods (multivariate data analysis), specifically principal component analysis, are used in the development of models for predicting monthly mean flows in the Cauca River, involving both linear approaches, namely the autoregressive models AR, ARX and ARMAX, and a nonlinear approach, artificial neural networks.
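A sketch of the linear-model side of such a study: fitting an autoregressive model to a synthetic monthly mean flow series with statsmodels and producing a short forecast. Real work would use the homogenized Cauca River series and add exogenous ENSO covariates for the ARX/ARMAX variants:

```python
# Fit an AR model with 12 lags (to absorb the annual cycle) to a synthetic
# monthly flow series and forecast the next 12 months.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(12)
months = np.arange(360)                                   # 30 years of data
seasonal = 200 + 80 * np.sin(2 * np.pi * months / 12)     # annual cycle
flow = seasonal + rng.normal(0, 25, months.size)          # synthetic m^3/s

res = AutoReg(flow, lags=12).fit()
forecast = res.predict(start=flow.size, end=flow.size + 11)
print("12-month forecast:", forecast.round(1))
```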
Energy Technology Data Exchange (ETDEWEB)
Huenicke, B. [GKSS-Forschungszentrum Geesthacht GmbH (Germany). Inst. fuer Kuestenforschung
2008-11-06
This study aims at estimating the impact of different atmospheric factors on past sea-level variations (up to 200 years) in the Baltic Sea by statistically analysing the relationship between Baltic Sea level records and observational and proxy-based reconstructed climatic data sets. The focus lies on the identification and possible quantification of the contribution of sea-level pressure (wind), air temperature and precipitation to the low-frequency (decadal and multi-decadal) variability of Baltic Sea level. It is known that wind forcing is the main factor explaining average Baltic Sea level variability at inter-annual to decadal timescales, especially in wintertime. In this thesis it is statistically estimated to what extent other regional climate factors contribute to the spatially heterogeneous Baltic Sea level variations around the isostatic trend at multi-decadal timescales. Although the statistical analysis cannot be completely conclusive, as the potential climate drivers are all statistically interrelated to some degree, the results indicate that precipitation should be taken into account as an explanatory variable for sea-level variations. On the one hand, it has been detected that the amplitude of the annual cycle of Baltic Sea level has increased throughout the 20th century, and precipitation seems to be the only factor among those analysed (wind through the SLP field, barometric effect, temperature and precipitation) that can account for this evolution. On the other hand, precipitation increases the ability to hindcast inter-annual variations of sea level in some regions and seasons, especially in the southern Baltic in summertime. The mechanism by which precipitation exerts its influence on Baltic Sea level is not ascertained in this statistical analysis due to the lack of long salinity time series. This result, however, represents a working hypothesis that can be confirmed or disproved by long simulations of the Baltic Sea system - ocean
Molecular variability analyses of Apple chlorotic leaf spot virus
Indian Academy of Sciences (India)
The highest degree of variability was observed in the middle portion with 9 amino acid substitutions in contrast to the N-terminal and C-terminal ends, which were maximally conserved with only 4 amino acid substitutions. In phylogenetic analysis no reasonable correlation between host species and/or geographic origin of ...
SOCR Analyses: Implementation and Demonstration of a New Graphical Statistics Educational Toolkit
Directory of Open Access Journals (Sweden)
Annie Chu
2009-04-01
Full Text Available The web-based, Java-written SOCR (Statistical Online Computational Resource) tools have been utilized in many undergraduate and graduate level statistics courses for seven years now (Dinov 2006; Dinov et al. 2008b). It has been proven that these resources can successfully improve students' learning (Dinov et al. 2008b). First published online in 2005, SOCR Analyses is a relatively new component that concentrates on data modeling for both parametric and non-parametric data analyses with graphical model diagnostics. One of the main purposes of SOCR Analyses is to facilitate statistical learning for high school and undergraduate students. Together with the already implemented SOCR Distributions and Experiments, SOCR Analyses and Charts fulfill the rest of a standard statistics curriculum. Currently, there are four core components of SOCR Analyses. Linear models included in SOCR Analyses are simple linear regression, multiple linear regression, and one-way and two-way ANOVA. Tests for sample comparisons include the t-test in the parametric category. Some examples of SOCR Analyses in the non-parametric category are the Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, Kolmogorov-Smirnov test and Fligner-Killeen test. Hypothesis testing models include the contingency table, Friedman's test and Fisher's exact test. The last component of Analyses is a utility for computing sample sizes for the normal distribution. In this article, we present the design framework, computational implementation and utilization of SOCR Analyses.
International Nuclear Information System (INIS)
Kleijnen, J.P.C.; Helton, J.C.
1999-01-01
The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (i) linear relationships with correlation coefficients, (ii) monotonic relationships with rank correlation coefficients, (iii) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (iv) trends in variability as defined by variances and interquartile ranges, and (v) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (i) Type I errors are unavoidable, (ii) Type II errors can occur when inappropriate analysis procedures are used, (iii) physical explanations should always be sought for why statistical procedures identify variables as being important, and (iv) the identification of important variables tends to be stable for independent Latin hypercube samples
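A minimal sketch of the detection sequence described above, applied to one synthetic input/output scatterplot. The data are invented; binning by quantiles of the input stands in for the class grouping, and Levene's test is used as a stand-in for the variance/interquartile-range trend statistic:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, 300)                      # sampled model input
    y = np.sin(3 * x) + rng.normal(0, 0.2, 300)     # model output

    # (i) linear relationship
    print("Pearson:", stats.pearsonr(x, y))
    # (ii) monotonic relationship
    print("Spearman:", stats.spearmanr(x, y))
    # (iii)/(iv) trends in central tendency and variability across x-bins
    edges = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
    idx = np.digitize(x, edges)
    groups = [y[idx == k] for k in range(5)]
    print("Kruskal-Wallis:", stats.kruskal(*groups))
    print("Levene (variability):", stats.levene(*groups))
    # (v) deviation from randomness: chi-square on a coarse (x, y) grid
    counts, _, _ = np.histogram2d(x, y, bins=5)
    chi2, p, dof, _ = stats.chi2_contingency(counts)
    print("Chi-square:", chi2, p)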
Directory of Open Access Journals (Sweden)
Christopher Rentsch
Full Text Available BACKGROUND: Variable selection is an important step in building a multivariate regression model, for which several methods and statistical packages are available. A comprehensive approach for variable selection in complex multivariate regression analyses within HIV cohorts is explored by utilizing both epidemiological and biostatistical procedures. METHODS: Three different methods for variable selection were illustrated in a study comparing survival time between subjects in the Department of Defense's Natural History Study and the Atlanta Veterans Affairs Medical Center's HIV Atlanta VA Cohort Study. The first two methods were stepwise selection procedures, based either on significance tests (Score test) or on information theory (Akaike Information Criterion), while the third method employed a Bayesian argument (Bayesian Model Averaging). RESULTS: All three methods resulted in a similar parsimonious survival model. Three of the covariates previously used in the multivariate model were not included in the final model suggested by the three approaches. When comparing the parsimonious model to the previously published model, there was evidence of less variance in the main survival estimates. CONCLUSIONS: The variable selection approaches considered in this study allowed building a model based on significance tests, on an information criterion, and on averaging models using their posterior probabilities. A parsimonious model that balanced these three approaches was found to provide a better fit than the previously reported model.
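For readers unfamiliar with stepwise selection, the sketch below shows backward elimination driven by the Akaike Information Criterion. It is a generic illustration on an ordinary linear model with invented data, not the survival model or the exact procedures used in the study:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def backward_aic(df, response, candidates):
        # Greedily drop the variable whose removal lowers the AIC most.
        selected = list(candidates)
        best = sm.OLS(df[response], sm.add_constant(df[selected])).fit().aic
        improved = True
        while improved and selected:
            improved = False
            for var in list(selected):
                trial = [v for v in selected if v != var]
                X = sm.add_constant(df[trial]) if trial else np.ones((len(df), 1))
                aic = sm.OLS(df[response], X).fit().aic
                if aic < best:
                    best, selected, improved = aic, trial, True
        return selected, best

    # Hypothetical data: y depends on x1 and x2 only; x3 is pure noise.
    rng = np.random.default_rng(2)
    df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
    df["y"] = 1.5 * df.x1 - 0.8 * df.x2 + rng.normal(size=200)
    print(backward_aic(df, "y", ["x1", "x2", "x3"]))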
Variables associated with achievement in higher education: A systematic review of meta-analyses.
Schneider, Michael; Preckel, Franzis
2017-06-01
The last two decades witnessed a surge in empirical studies on the variables associated with achievement in higher education, and a number of meta-analyses have synthesized these findings. In our systematic literature review, we included 38 meta-analyses investigating 105 correlates of achievement, based on 3,330 effect sizes from almost 2 million students. We provide a list of the 105 variables, ordered by effect size, and summary statistics for central research topics. The results highlight the close relation between social interaction in courses and achievement. Achievement is also strongly associated with the stimulation of meaningful learning by presenting information in a clear way, relating it to the students, and using conceptually demanding learning tasks. Instruction and communication technology has comparably weak effect sizes, which did not increase over time. Strong moderator effects are found for almost all instructional methods, indicating that how a method is implemented in detail strongly affects achievement. Teachers with high-achieving students invest time and effort in designing the microstructure of their courses, establish clear learning goals, and employ feedback practices. This emphasizes the importance of teacher training in higher education. Students with high achievement are characterized by high self-efficacy, high prior achievement and intelligence, conscientiousness, and the goal-directed use of learning strategies. Apart from the paucity of controlled experiments and the lack of meta-analyses on recent educational innovations, the variables associated with achievement in higher education are generally well investigated and well understood. By using these findings, teachers, university administrators, and policymakers can increase the effectiveness of higher education. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
CADDIS Volume 4. Data Analysis: Advanced Analyses - Controlling for Natural Variability
Methods for controlling natural variability, predicting environmental conditions from biological observations method, biological trait data, species sensitivity distributions, propensity scores, Advanced Analyses of Data Analysis references.
Scalar statistics in variable property turbulent channel flows
Patel, A.; Boersma, B.J.; Pecnik, R.
2017-01-01
Direct numerical simulation of fully developed, internally heated channel flows with isothermal walls is performed using the low-Mach-number approximation of the Navier-Stokes equations to investigate the influence of temperature-dependent properties on turbulent scalar statistics. Different constitutive
Finite-sample instrumental variables inference using an asymptotically pivotal statistic
Bekker, P.; Kleibergen, F.R.
2001-01-01
The paper considers the K-statistic, Kleibergen's (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic, this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,
Homeostasis and Gauss statistics: barriers to understanding natural variability.
West, Bruce J
2010-06-01
In this paper, the concept of knowledge is argued to be the top of a three-tiered system of science. The first tier is that of measurement and data, followed by information consisting of the patterns within the data, and ending with theory that interprets the patterns and yields knowledge. Thus, when a scientific theory ceases to be consistent with the database, the knowledge based on that theory must be re-examined and potentially modified. Consequently, all knowledge, like glory, is transient. Herein we focus on the non-normal statistics of physiologic time series and conclude that the empirical inverse power-law statistics and long-time correlations are inconsistent with the theoretical notion of homeostasis. We suggest replacing the notion of homeostasis with that of Fractal Physiology.
Kanda, Junya
2016-01-01
The Transplant Registry Unified Management Program (TRUMP) made it possible for members of the Japan Society for Hematopoietic Cell Transplantation (JSHCT) to analyze large sets of national registry data on autologous and allogeneic hematopoietic stem cell transplantation. However, as the processes used to collect transplantation information are complex and differed over time, the background of these processes should be understood when using TRUMP data. Previously, information on the HLA locus of patients and donors had been collected using a questionnaire-based free-description method, resulting in some input errors. To correct minor but significant errors and provide accurate HLA matching data, the use of a Stata or EZR/R script offered by the JSHCT is strongly recommended when analyzing HLA data in the TRUMP dataset. The HLA mismatch direction, mismatch counting method, and different impacts of HLA mismatches by stem cell source are other important factors in the analysis of HLA data. Additionally, researchers should understand the statistical analyses specific for hematopoietic stem cell transplantation, such as competing risk, landmark analysis, and time-dependent analysis, to correctly analyze transplant data. The data center of the JSHCT can be contacted if statistical assistance is required.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most
Sequi, Marco; Campi, Rita; Clavenna, Antonio; Bonati, Maurizio
2013-03-01
To evaluate the quality of data reporting and statistical methods performed in drug utilization studies in the pediatric population. Drug utilization studies evaluating all drug prescriptions to children and adolescents published between January 1994 and December 2011 were retrieved and analyzed. For each study, information on measures of exposure/consumption, the covariates considered, descriptive and inferential analyses, statistical tests, and methods of data reporting was extracted. An overall quality score was created for each study using a 12-item checklist that took into account the presence of outcome measures, covariates of measures, descriptive measures, statistical tests, and graphical representation. A total of 22 studies were reviewed and analyzed. Of these, 20 studies reported at least one descriptive measure. The mean was the most commonly used measure (18 studies), but only five of these also reported the standard deviation. Statistical analyses were performed in 12 studies, with the chi-square test being the most commonly performed test. Graphs were presented in 14 papers. Sixteen papers reported the number of drug prescriptions and/or packages, and ten reported the prevalence of the drug prescription. The mean quality score was 8 (median 9). Only seven of the 22 studies received a score of ≥10, while four studies received a score of statistical methods and reported data in a satisfactory manner. We therefore conclude that the methodology of drug utilization studies needs to be improved.
Harrigan, George G; Harrison, Jay M
2012-01-01
New transgenic (GM) crops are subjected to extensive safety assessments that include compositional comparisons with conventional counterparts as a cornerstone of the process. The influence of germplasm, location, environment, and agronomic treatments on compositional variability is, however, often obscured in these pair-wise comparisons. Furthermore, classical statistical significance testing can often provide an incomplete and over-simplified summary of highly responsive variables such as crop composition. In order to more clearly describe the influence of the numerous sources of compositional variation we present an introduction to two alternative but complementary approaches to data analysis and interpretation. These include i) exploratory data analysis (EDA) with its emphasis on visualization and graphics-based approaches and ii) Bayesian statistical methodology that provides easily interpretable and meaningful evaluations of data in terms of probability distributions. The EDA case-studies include analyses of herbicide-tolerant GM soybean and insect-protected GM maize and soybean. Bayesian approaches are presented in an analysis of herbicide-tolerant GM soybean. Advantages of these approaches over classical frequentist significance testing include the more direct interpretation of results in terms of probabilities pertaining to quantities of interest and no confusion over the application of corrections for multiple comparisons. It is concluded that a standardized framework for these methodologies could provide specific advantages through enhanced clarity of presentation and interpretation in comparative assessments of crop composition.
Finite-sample instrumental variables inference using an asymptotically pivotal statistic
Bekker, P; Kleibergen, F
2003-01-01
We consider the K-statistic, Kleibergen's (2002, Econometrica 70, 1781-1803) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Whereas Kleibergen (2002) especially analyzes the asymptotic behavior of the statistic, we focus on finite-sample properties in a
International Nuclear Information System (INIS)
Echeverría, J C; Solís, L I; Pérez, J E; Gaitán, M J; Mandujano, M; Sánchez, M C; González-Camarena, R; Rivera, I R
2009-01-01
The analysis of heart rate fluctuations, or heart rate variability (HRV), may be applied to explore children's neurodevelopment. However, previous studies have reported poor reliability (repeatability) of HRV measures in children at rest and during light exercise. Whether the reliability can be improved by controlling variables such as physical activity, breathing rate and tidal volume, or by selecting non-conventional techniques for analysing the data, remains an open question. We evaluated the short-term repeatability of RR-interval data from medicated children with congenital hypothyroidism (CH). The α1 exponents, obtained by detrended fluctuation analysis (DFA), from the data of 21 children collected at two different sessions were compared. The number of days between sessions was 59 ± 33, and data were obtained during 10 min while the children remained seated with restricted activity. We found statistical agreement between the means of the α1 exponents for each session (p = 0.94) and no bias, with a low coefficient of variation (9.1%); an intraclass correlation coefficient ri = 0.48 ([0.14, 0.72], 95% confidence interval) was also estimated. These findings, which were compared with results obtained by conventional time and frequency techniques, indicate agreement between the α1 exponents obtained at each session, thereby supporting the repeatability of HRV data as analysed by DFA in children with congenital hypothyroidism. Of particular interest was also the agreement found by using the central frequency of the high-frequency band and the parameter pNN20, both showing better or similar ri than α1 (0.77 [0.57, 0.89] and 0.51 [0.17, 0.74], respectively), yet considerably better repeatability than other conventional time and frequency parameters
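The α1 exponent used above comes from detrended fluctuation analysis over short scales. A compact sketch of the standard algorithm follows (not the authors' implementation; the RR series and the scale range 4-16 are illustrative):

    import numpy as np

    def dfa_alpha1(rr, scales=range(4, 17)):
        # alpha_1: slope of log fluctuation vs log scale over short scales.
        y = np.cumsum(rr - np.mean(rr))          # integrated RR series
        flucts = []
        for n in scales:
            m = len(y) // n
            segments = y[: m * n].reshape(m, n)
            t = np.arange(n)
            resid = [seg - np.polyval(np.polyfit(t, seg, 1), t)
                     for seg in segments]        # detrend each segment
            flucts.append(np.sqrt(np.mean(np.square(resid))))
        return np.polyfit(np.log(list(scales)), np.log(flucts), 1)[0]

    rr = np.random.default_rng(3).normal(800, 50, 600)  # invented RR data, ms
    print("alpha_1 =", dfa_alpha1(rr))  # ~0.5 expected for uncorrelated noise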
Statistical methods for analysing the relationship between bank profitability and liquidity
Boguslaw Guzik
2006-01-01
The article analyses the most popular methods for the empirical estimation of the relationship between bank profitability and liquidity. Owing to the fact that profitability depends on various factors (both economic and non-economic), a simple correlation coefficient, two-dimensional (profitability/liquidity) graphs or models where profitability depends only on liquidity variable do not provide good and reliable results. Quite good results can be obtained only when multifactorial profitabilit...
Statistical analyses in the study of solar wind-magnetosphere coupling
International Nuclear Information System (INIS)
Baker, D.N.
1985-01-01
Statistical analyses provide a valuable method for establishing initially the existence (or lack of existence) of a relationship between diverse data sets. Statistical methods also allow one to make quantitative assessments of the strengths of observed relationships. This paper reviews the essential techniques and underlying statistical bases for the use of correlative methods in solar wind-magnetosphere coupling studies. Techniques of visual correlation and time-lagged linear cross-correlation analysis are emphasized, but methods of multiple regression, superposed epoch analysis, and linear prediction filtering are also described briefly. The long history of correlation analysis in the area of solar wind-magnetosphere coupling is reviewed with the assessments organized according to data averaging time scales (minutes to years). It is concluded that these statistical methods can be very useful first steps, but that case studies and various advanced analysis methods should be employed to understand fully the average response of the magnetosphere to solar wind input. It is clear that many workers have not always recognized underlying assumptions of statistical methods and thus the significance of correlation results can be in doubt. Long-term averages (greater than or equal to 1 hour) can reveal gross relationships, but only when dealing with high-resolution data (1 to 10 min) can one reach conclusions pertinent to magnetospheric response time scales and substorm onset mechanisms
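As a concrete example of the time-lagged linear cross-correlation emphasized above, the sketch below scans lags between a toy solar-wind input and a smoothed, delayed response series; all data and the 30-point response kernel are invented:

    import numpy as np

    def lagged_xcorr(x, y, max_lag):
        # r(lag): correlation between input x(t) and response y(t + lag).
        return np.array([np.corrcoef(x[: len(x) - L], y[L:])[0, 1]
                         for L in range(max_lag + 1)])

    rng = np.random.default_rng(15)
    vb = rng.normal(size=2000)                          # toy coupling input
    response = np.convolve(vb, np.ones(30) / 30, "full")[:2000]
    ae = response + rng.normal(0, 0.5, 2000)            # toy geomagnetic index

    r = lagged_xcorr(vb, ae, 60)
    print("peak correlation", r.max().round(2), "at lag", r.argmax())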
Directory of Open Access Journals (Sweden)
Lee Tae-Hoon
2016-12-01
Full Text Available In many cases, an X̄ control chart based on a performance variable is used in industrial fields. Typically, the control chart monitors the measurements of the performance variable itself. However, if the performance variable is too costly or impossible to measure, and a less expensive surrogate variable is available, the process may be controlled more efficiently using surrogate variables. In this paper, we present a model for the economic statistical design of a VSI (variable sampling interval) X̄ control chart using a surrogate variable that is linearly correlated with the performance variable. We derive the total average profit model from an economic viewpoint, apply the model to a Very High Temperature Reactor (VHTR) nuclear fuel measurement system, and derive the optimal result using genetic algorithms. Compared with the control chart based on the performance variable, the proposed model gives a larger expected net income per unit of time in the long run if the correlation between the performance variable and the surrogate variable is relatively high. The proposed model was confined to the sample mean control chart under the assumption that a single assignable cause occurs according to a Poisson process. However, the model may also be extended to other types of control charts using single or multiple assignable cause assumptions, such as the VSS (variable sample size) X̄ control chart, EWMA and CUSUM charts, and so on.
International Nuclear Information System (INIS)
Clerc, F; Njiki-Menga, G-H; Witschger, O
2013-01-01
Most of the measurement strategies suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring airborne particle concentrations in real time (according to different metrics). Since none of the instruments used to measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time-resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature, ranging from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and the search for an appropriate and robust method is ongoing. In this context, this exploratory study investigates a statistical method to analyse time-resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, we used particle number concentration data from a workplace study that investigated the potential for inhalation exposure during clean-out operations, by sandpapering, of a reactor producing nanocomposite thin films. In this workplace study, the background issue was addressed through near-field and far-field approaches, and several size-integrated and time-resolved devices were used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters: one measuring at the source of the released particles, the other measuring far-field in parallel. The Bayesian probabilistic approach allows a probabilistic modelling of the data series, and the observed task is modelled in the form of probability distributions. The probability distributions issuing from time-resolved data obtained at the source can be compared with the probability distributions issuing from the time-resolved data obtained far-field, leading in a
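The comparison of source and far-field probability distributions can be illustrated with a deliberately simplified conjugate model: Poisson counts with Gamma priors, rather than whatever hierarchical model the study actually used. The counts below are invented:

    import numpy as np
    from scipy import stats

    # Invented 1-s particle counts from the two handheld CPCs.
    near = np.array([52, 61, 58, 70, 66, 73, 64, 59])   # at the source
    far = np.array([41, 38, 44, 40, 39, 43, 42, 37])    # far-field

    def posterior(counts, a0=1.0, b0=1.0):
        # Conjugate update: Poisson likelihood with a Gamma(a0, b0) prior
        # gives a Gamma(a0 + sum, b0 + n) posterior for the count rate.
        return stats.gamma(a0 + counts.sum(), scale=1.0 / (b0 + len(counts)))

    draws_near = posterior(near).rvs(100_000, random_state=0)
    draws_far = posterior(far).rvs(100_000, random_state=1)
    print("P(source rate > far-field rate) =", (draws_near > draws_far).mean())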
Statistical analyses of the data on occupational radiation exposure at JPDR
International Nuclear Information System (INIS)
Kato, Shohei; Anazawa, Yutaka; Matsuno, Kenji; Furuta, Toshishiro; Akiyama, Isamu
1980-01-01
In the statistical analyses of the data on occupational radiation exposure at JPDR, the following statistical features were obtained. (1) The individual doses followed a log-normal distribution. (2) In the distribution of doses from one job in the controlled area, the logarithm of the mean (μ) depended on the exposure rate r (mR/h), and σ correlated with the nature of the job and was normally distributed. These relations were: μ = 0.48 ln r − 0.24, σ = 1.2 ± 0.58. (3) For data containing different groups, the distribution of doses showed a polygonal line on log-normal probability paper. (4) Under dose limitation, the distribution of doses showed an asymptotic curve along the limit on log-normal probability paper. (author)
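Relations (1) and (2) can be checked numerically. The sketch below simulates doses from the reported log-normal form at an assumed exposure rate and recovers the parameters by fitting; the rate r = 50 mR/h and the sample size are arbitrary choices, not values from the study:

    import numpy as np
    from scipy import stats

    r = 50.0                                    # assumed exposure rate, mR/h
    mu, sigma = 0.48 * np.log(r) - 0.24, 1.2    # relation (2) from the abstract
    doses = np.random.default_rng(4).lognormal(mu, sigma, 500)

    # Refit and test log-normality, as in feature (1).
    shape, loc, scale = stats.lognorm.fit(doses, floc=0)
    print("recovered mu, sigma:", np.log(scale), shape)
    print(stats.kstest(doses, "lognorm", args=(shape, loc, scale)))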
arXiv Statistical Analyses of Higgs- and Z-Portal Dark Matter Models
Ellis, John; Marzola, Luca; Raidal, Martti
2018-06-12
We perform frequentist and Bayesian statistical analyses of Higgs- and Z-portal models of dark matter particles with spin 0, 1/2 and 1. Our analyses incorporate data from direct detection and indirect detection experiments, as well as LHC searches for monojet and monophoton events, and we also analyze the potential impacts of future direct detection experiments. We find acceptable regions of the parameter spaces for Higgs-portal models with real scalar, neutral vector, Majorana or Dirac fermion dark matter particles, and Z-portal models with Majorana or Dirac fermion dark matter particles. In many of these cases, there are interesting prospects for discovering dark matter particles in Higgs or Z decays, as well as dark matter particles weighing $\\gtrsim 100$ GeV. Negative results from planned direct detection experiments would still allow acceptable regions for Higgs- and Z-portal models with Majorana or Dirac fermion dark matter particles.
The Use of Statistical Process Control Tools for Analysing Financial Statements
Directory of Open Access Journals (Sweden)
Niezgoda Janusz
2017-06-01
Full Text Available This article presents a proposed application of a modified Shewhart control chart to the monitoring of changes in the aggregated level of financial ratios. The x̅ control chart is used as the basis of the analysis; the examined variable from the sample in this chart is the arithmetic mean. The author proposes to substitute it with a synthetic measure determined on the basis of selected ratios. As the ratios are expressed in different units and characters, standardisation is applied. The results of selected comparative analyses are presented for both bankrupts and non-bankrupts. They indicate the possibility of using control charts as an auxiliary tool in financial analyses.
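A minimal sketch of the proposed substitution: standardise the ratios, aggregate them into a synthetic measure (an unweighted mean is assumed here), and monitor that measure with Shewhart-style 3-sigma limits. Classical individuals charts would estimate sigma from moving ranges; the plain standard deviation is used below for brevity:

    import numpy as np

    rng = np.random.default_rng(5)
    ratios = rng.normal(size=(36, 4))     # 36 periods x 4 financial ratios

    # Standardise each ratio (different units), then aggregate.
    z = (ratios - ratios.mean(axis=0)) / ratios.std(axis=0, ddof=1)
    synthetic = z.mean(axis=1)

    # Shewhart-style individuals chart on the synthetic measure.
    center, sd = synthetic.mean(), synthetic.std(ddof=1)
    ucl, lcl = center + 3 * sd, center - 3 * sd
    signals = np.where((synthetic > ucl) | (synthetic < lcl))[0]
    print(f"CL = {center:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}, "
          f"signals at periods {signals}")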
Directory of Open Access Journals (Sweden)
H. Kojima
1999-01-01
Full Text Available We present the characteristics of the Electrostatic Solitary Waves (ESW) observed by the Geotail spacecraft in the plasma sheet boundary layer based on statistical analyses. We also discuss the results with reference to a model of ESW generation due to electron beams, which has been proposed on the basis of computer simulations. In this generation model, the nonlinear evolution of Langmuir waves excited by electron bump-on-tail instabilities leads to the formation of isolated electrostatic potential structures corresponding to "electron holes" in phase space. The statistical analyses of the Geotail data, which we conducted under the assumption that the polarity of ESW potentials is positive, show that most ESW propagate in the same direction as the electron beams observed simultaneously by the plasma instrument. Further, we find that the ESW potential energy is much smaller than the background electron thermal energy and that the ESW potential widths are typically shorter than 60 times the local electron Debye length when we assume that the ESW potentials travel at the same velocity as the electron beams. These results are very consistent with the ESW generation model, in which the nonlinear evolution of the electron bump-on-tail instability leads to the formation of electron holes in phase space.
Medyńska-Gulij, Beata; Cybulski, Paweł
2016-06-01
This paper analyses the use of visual variables in tables of statistical data on hospital beds as an important tool for revealing spatio-temporal dependencies. It is argued that some conclusions drawn from the data about public health and public expenditure on health have a spatio-temporal reference. Differently from previous studies, this article combines cartographic pragmatics and spatial visualization with conclusions previously reported in the public health literature. While significant conclusions about health care and economic factors have been highlighted in research papers, this article is the first to apply visual analysis to a statistical table together with maps, an approach called previsualisation.
Visualization of the variability of 3D statistical shape models by animation.
Lamecker, Hans; Seebass, Martin; Lange, Thomas; Hege, Hans-Christian; Deuflhard, Peter
2004-01-01
Models of the 3D shape of anatomical objects, and knowledge about their statistical variability, are of great benefit in many computer-assisted medical applications such as image analysis and therapy or surgery planning. Statistical shape models have been applied successfully to automate the task of image segmentation. The generation of 3D statistical shape models requires the identification of corresponding points on two shapes. This remains a difficult problem, especially for shapes of complicated topology. In order to interpret and validate variations encoded in a statistical shape model, visual inspection is of great importance. This work describes the generation and interpretation of statistical shape models of the liver and the pelvic bone.
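The variability encoded in such a model can be "animated" by sweeping the weight of one principal mode. A toy sketch, assuming point correspondence is already solved (which, as noted, is the hard part):

    import numpy as np

    rng = np.random.default_rng(14)
    mean_shape = rng.normal(size=(50, 3))                  # 50 corresponding points
    shapes = mean_shape + 0.1 * rng.normal(size=(30, 50, 3))  # 30 training shapes

    X = shapes.reshape(30, -1)                 # one row per shape
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    var = s ** 2 / (len(X) - 1)                # variance captured by each mode

    # "Animate" the first mode by sweeping its weight over +/- 3 sigma.
    for w in np.linspace(-3, 3, 7):
        instance = (mu + w * np.sqrt(var[0]) * Vt[0]).reshape(50, 3)
        print(f"weight {w:+.1f}: first landmark at {np.round(instance[0], 3)}")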
A weighted U-statistic for genetic association analyses of sequencing data.
Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing
2014-12-01
With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
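To make the idea concrete, here is a toy weighted U-statistic built from a rank-based phenotype kernel and a crude genotype-similarity kernel. This is a sketch in the spirit of WU-SEQ, not the published test; in practice significance would be assessed analytically or by permutation:

    import numpy as np

    def weighted_u(pheno, sim):
        # Sum over subject pairs of a rank-based phenotype kernel times a
        # genetic-similarity weight, scaled by the number of pairs.
        n = len(pheno)
        ranks = np.argsort(np.argsort(pheno)) / (n - 1)  # ranks mapped to [0, 1]
        u = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                u += (ranks[i] - 0.5) * (ranks[j] - 0.5) * sim[i, j]
        return 2.0 * u / (n * (n - 1))

    rng = np.random.default_rng(6)
    G = rng.integers(0, 3, size=(100, 20))   # toy rare-variant genotypes (0/1/2)
    sim = G @ G.T / G.shape[1]               # crude genetic similarity kernel
    y = rng.standard_t(df=3, size=100)       # heavy-tailed phenotype
    print("U =", weighted_u(y, sim))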
Emoto, K.; Saito, T.; Shiomi, K.
2017-12-01
Short-period (2 s) seismograms. We found that the energy of the coda of long-period seismograms shows a spatially flat distribution. This phenomenon is well known in short-period seismograms and results from the scattering by small-scale heterogeneities. We estimate the statistical parameters that characterize the small-scale random heterogeneity by modelling the spatiotemporal energy distribution of long-period seismograms. We analyse three moderate-size earthquakes that occurred in southwest Japan. We calculate the spatial distribution of the energy density recorded by a dense seismograph network in Japan at the period bands of 8-16 s, 4-8 s and 2-4 s and model them by using 3-D finite difference (FD) simulations. Compared to conventional methods based on statistical theories, we can calculate more realistic synthetics by using the FD simulation. It is not necessary to assume a uniform background velocity, body or surface waves and scattering properties considered in general scattering theories. By taking the ratio of the energy of the coda area to that of the entire area, we can separately estimate the scattering and the intrinsic absorption effects. Our result reveals the spectrum of the random inhomogeneity in a wide wavenumber range, including the intensity around the corner wavenumber, as P(m) = 8πε²a³/(1 + a²m²)², where ε = 0.05 and a = 3.1 km, even though past studies analysing higher-frequency records could not detect the corner. Finally, we estimate the intrinsic attenuation by modelling the decay rate of the energy. The method proposed in this study is suitable for quantifying the statistical properties of long-wavelength subsurface random inhomogeneity, which leads the way to characterizing a wider wavenumber range of spectra, including the corner wavenumber.
DEFF Research Database (Denmark)
Frandsen, Tove Faber; Nicolaisen, Jeppe
2017-01-01
Using statistical methods to analyse digital material for patterns makes it possible to detect patterns in big data that we would otherwise not be able to detect. This paper seeks to exemplify this fact by statistically analysing a large corpus of references in systematic reviews. The aim...
Poulos, M. J.; Pierce, J. L.; McNamara, J. P.; Flores, A. N.; Benner, S. G.
2015-12-01
Terrain aspect alters the spatial distribution of insolation across topography, driving eco-pedo-hydro-geomorphic feedbacks that can alter landform evolution and result in valley asymmetries for a suite of land surface characteristics (e.g. slope length and steepness, vegetation, soil properties, and drainage development). Asymmetric valleys serve as natural laboratories for studying how landscapes respond to climate perturbation. In the semi-arid montane granodioritic terrain of the Idaho batholith, Northern Rocky Mountains, USA, prior works indicate that reduced insolation on northern (pole-facing) aspects prolongs snow pack persistence, and is associated with thicker, finer-grained soils, that retain more water, prolong the growing season, support coniferous forest rather than sagebrush steppe ecosystems, stabilize slopes at steeper angles, and produce sparser drainage networks. We hypothesize that the primary drivers of valley asymmetry development are changes in the pedon-scale water-balance that coalesce to alter catchment-scale runoff and drainage development, and ultimately cause the divide between north and south-facing land surfaces to migrate northward. We explore this conceptual framework by coupling land surface analyses with statistical modeling to assess relationships and the relative importance of land surface characteristics. Throughout the Idaho batholith, we systematically mapped and tabulated various statistical measures of landforms, land cover, and hydroclimate within discrete valley segments (n=~10,000). We developed a random forest based statistical model to predict valley slope asymmetry based upon numerous measures (n>300) of landscape asymmetries. Preliminary results suggest that drainages are tightly coupled with hillslopes throughout the region, with drainage-network slope being one of the strongest predictors of land-surface-averaged slope asymmetry. When slope-related statistics are excluded, due to possible autocorrelation, valley
Linting, Marielle; van Os, Bart Jan; Meulman, Jacqueline J.
2011-01-01
In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by the use of permutation tests. We compare a new strategy to a strategy used in previous research consisting of permuting the columns (variables) of a data matrix…
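A sketch of the column-permutation strategy on invented data: permuting every variable independently destroys the variable-component associations, giving a null distribution for the absolute loadings. The helper names and the comparison rule are illustrative, not the paper's exact procedure:

    import numpy as np

    def abs_loadings(M, n_comp):
        # |loadings| of the top principal components of a centred matrix.
        _, _, vt = np.linalg.svd(M - M.mean(axis=0), full_matrices=False)
        return np.abs(vt[:n_comp])

    rng = np.random.default_rng(8)
    X = rng.normal(size=(100, 5))
    X[:, 0] += np.linspace(0, 3, 100)        # variable 0 carries real structure

    n_comp, n_perm = 2, 999
    obs = abs_loadings(X, n_comp)
    exceed = np.zeros_like(obs)
    for _ in range(n_perm):
        Xp = np.column_stack([rng.permutation(X[:, j])
                              for j in range(X.shape[1])])
        exceed += abs_loadings(Xp, n_comp) >= obs
    print("permutation p-values:\n", (exceed + 1) / (n_perm + 1))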
Xue, Yan; Balmaseda, Magdalena A.; Boyer, Tim; Ferry, Nicolas; Good, Simon; Ishikawa, Ichiro; Rienecker, Michele; Rosati, Tony; Yin, Yonghong; Kumar, Arun
2012-01-01
Upper ocean heat content (HC) is one of the key indicators of climate variability on many time-scales extending from seasonal to interannual to long-term climate trends. For example, HC in the tropical Pacific provides information on thermocline anomalies that is critical for the long-lead forecast skill of ENSO. Since HC variability is also associated with SST variability, a better understanding and monitoring of HC variability can help us understand and forecast SST variability associated with ENSO and other modes such as the Indian Ocean Dipole (IOD), Pacific Decadal Oscillation (PDO), Tropical Atlantic Variability (TAV) and Atlantic Multidecadal Oscillation (AMO). An accurate ocean initialization of HC anomalies in coupled climate models could also contribute to skill in decadal climate prediction. Errors, and/or uncertainties, in the estimation of HC variability can be affected by many factors, including uncertainties in surface forcings, ocean model biases, and deficiencies in data assimilation schemes. Changes in observing systems can also leave an imprint on the estimated variability. The availability of multiple operational ocean analyses (ORAs) that are routinely produced by operational and research centers around the world provides an opportunity to assess uncertainties in HC analyses, and to help identify gaps in observing systems as they impact the quality of ORAs and therefore climate model forecasts. A comparison of ORAs also gives an opportunity to identify deficiencies in data assimilation schemes, and can be used as a basis for the development of real-time multi-model ensemble HC monitoring products. The OceanObs09 Conference called for an intercomparison of ORAs and the use of ORAs for global ocean monitoring. As a follow-up, we intercompared HC variations from ten ORAs -- two objective analyses based on in-situ data only and eight model analyses based on ocean data assimilation systems. The mean, annual cycle, interannual variability and long-term trend of HC have
Statistical contact angle analyses; "slow moving" drops on a horizontal silicon-oxide surface.
Schmitt, M; Grub, J; Heib, F
2015-06-01
Sessile drop experiments on horizontal surfaces are commonly used to characterise surface properties in science and in industry. The advancing angle and the receding angle are measurable on every solid. Especially on horizontal surfaces, even the notions themselves are critically questioned by some authors. Building a standard, reproducible and valid method of measuring and defining specific (advancing/receding) contact angles is an important challenge of surface science. Recently we have developed two/three approaches, by sigmoid fitting and by independent and dependent statistical analyses, which are practicable for the determination of specific angles/slopes when inclining the sample surface. These approaches lead to contact angle data that are independent of the skills and subjectivity of the operator, which is also urgently needed for evaluating dynamic measurements of contact angles. We show in this contribution that the slightly modified procedures are also applicable for finding specific angles in experiments on horizontal surfaces. As an example, droplets on a flat, freshly cleaned silicon-oxide surface (wafer) are dynamically measured by the sessile drop technique while the volume of the liquid is increased/decreased. The triple points, the time and the contact angles during the advancing and the receding of the drop, obtained by high-precision drop shape analysis, are statistically analysed. As stated in the previous contribution, the procedure is called "slow movement" analysis due to the small covered distance and the dominance of data points with low velocity. Even the smallest variations in velocity, such as the minimal advancing motion during the withdrawal of the liquid, are identifiable, which confirms the flatness and the chemical homogeneity of the sample surface and the high sensitivity of the presented approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
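The sigmoid-fitting step mentioned above can be sketched as fitting a logistic step between two contact-angle plateaus; the synthetic angle-versus-time data and starting values are assumptions, not measurements from the study:

    import numpy as np
    from scipy.optimize import curve_fit

    def sigmoid(t, low, high, t0, k):
        # Logistic step between a lower and an upper contact-angle plateau.
        return low + (high - low) / (1.0 + np.exp(-k * (t - t0)))

    t = np.linspace(0, 10, 200)                        # frame time, s
    rng = np.random.default_rng(13)
    theta = sigmoid(t, 42, 68, 5, 2) + rng.normal(0, 0.6, 200)  # angle, deg

    popt, _ = curve_fit(sigmoid, t, theta,
                        p0=[theta.min(), theta.max(), 5, 1])
    print("plateau angles:", round(popt[0], 1), round(popt[1], 1))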
Statistical analyses of incidents on onshore gas transmission pipelines based on PHMSA database
International Nuclear Information System (INIS)
Lam, Chio; Zhou, Wenxing
2016-01-01
This article reports statistical analyses of the mileage and pipe-related incident data for onshore gas transmission pipelines in the US between 2002 and 2013, collected by the Pipeline and Hazardous Materials Safety Administration of the US Department of Transportation. The analysis indicates that there are approximately 480,000 km of gas transmission pipelines in the US, approximately 60% of them more than 45 years old as of 2013. Eighty percent of the pipelines are Class 1 pipelines, and about 20% are Classes 2 and 3 pipelines. It is found that third-party excavation, external corrosion, material failure and internal corrosion are the four leading failure causes, responsible for more than 75% of the total incidents. The 12-year average rate of rupture equals 3.1 × 10⁻⁵ per km-year due to all failure causes combined. External corrosion is the leading cause of ruptures: the 12-year average rupture rate due to external corrosion equals 1.0 × 10⁻⁵ per km-year and is twice the rupture rate due to third-party excavation or material failure. The study provides insights into the current state of gas transmission pipelines in the US and baseline failure statistics for quantitative risk assessments of such pipelines. - Highlights: • Analyze PHMSA pipeline mileage and incident data between 2002 and 2013. • Focus on gas transmission pipelines. • Leading causes for pipeline failures are identified. • Provide baseline failure statistics for risk assessments of gas transmission pipelines.
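The reported rupture rate can be given a rough uncertainty band by treating ruptures as Poisson events. The rupture count below is back-calculated from the quoted rate and mileage (ignoring year-to-year mileage changes), so it is an approximation, not data from the PHMSA records:

    import numpy as np
    from scipy import stats

    mileage_km, years = 480_000, 12
    exposure = mileage_km * years                  # km-years, roughly
    n_ruptures = round(3.1e-5 * exposure)          # implied count (~179)

    # Exact (Garwood) 95% confidence interval for a Poisson rate.
    lo = stats.chi2.ppf(0.025, 2 * n_ruptures) / 2 / exposure
    hi = stats.chi2.ppf(0.975, 2 * (n_ruptures + 1)) / 2 / exposure
    print(f"rate = {n_ruptures / exposure:.2e} per km-year, "
          f"95% CI = ({lo:.2e}, {hi:.2e})")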
[Hydrologic variability and sensitivity based on Hurst coefficient and Bartels statistic].
Lei, Xu; Xie, Ping; Wu, Zi Yi; Sang, Yan Fang; Zhao, Jiang Yan; Li, Bin Bin
2018-04-01
Due to global climate change and frequent human activities in recent years, the purely stochastic component of a hydrological sequence is mixed with one or several variation ingredients, including jump, trend, period and dependency. It is urgently necessary to clarify which indices should be used to quantify the degree of this variability. In this study, we defined hydrological variability based on the Hurst coefficient and the Bartels statistic, and used Monte Carlo statistical tests to analyse their sensitivity to different variants. When the hydrological sequence had jump or trend variation, both the Hurst coefficient and the Bartels statistic could reflect the variation, with the Hurst coefficient being more sensitive to weak jump or trend variation. When the sequence had a periodic component, only the Bartels statistic could detect the mutation of the sequence. When the sequence had a dependency, both the Hurst coefficient and the Bartels statistic could reflect the variation, with the latter detecting weaker dependent variations. For the four variations, both the Hurst variability and the Bartels variability increased as the variation range increased; thus, they can be used to measure the variation intensity of a hydrological sequence. We analysed the temperature series of different weather stations in the Lancang River basin. Results showed that the temperatures of all stations exhibited an upward trend or jump, indicating that the entire basin has experienced warming in recent years, and that the temperature variability in the upper and lower reaches was much higher. This case study demonstrates the practicability of the proposed method.
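Both statistics are straightforward to compute. The sketch below pairs a rescaled-range (R/S) estimate of the Hurst coefficient with the Bartels rank version of the von Neumann ratio, applied to an invented temperature-like series; the paper's variability definitions and Monte Carlo tests are not reproduced:

    import numpy as np
    from scipy import stats

    def hurst_rs(x, min_n=8):
        # Rescaled-range (R/S) estimate of the Hurst coefficient.
        n = len(x)
        sizes = [n // k for k in (1, 2, 4, 8) if n // k >= min_n]
        log_n, log_rs = [], []
        for m in sizes:
            rs = []
            for seg in np.array_split(x[: (n // m) * m], n // m):
                dev = np.cumsum(seg - seg.mean())
                s = seg.std(ddof=1)
                if s > 0:
                    rs.append((dev.max() - dev.min()) / s)
            log_n.append(np.log(m))
            log_rs.append(np.log(np.mean(rs)))
        return np.polyfit(log_n, log_rs, 1)[0]

    def bartels_ratio(x):
        # Bartels rank version of the von Neumann ratio: values near 2 are
        # consistent with randomness; smaller values suggest dependence/trend.
        r = stats.rankdata(x)
        return np.sum(np.diff(r) ** 2) / np.sum((r - r.mean()) ** 2)

    t = np.random.default_rng(9).normal(size=512).cumsum() * 0.1 + 15
    print("Hurst:", hurst_rs(t), "Bartels RVN:", bartels_ratio(t))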
Best Statistical Distribution of flood variables for Johor River in Malaysia
Salarpour Goodarzi, M.; Yusop, Z.; Yusof, F.
2012-12-01
A complex flood event is always characterized by a few characteristics, such as flood peak, flood volume and flood duration, which may be mutually correlated. This study explored the statistical distributions of peakflow, flood duration and flood volume at the Rantau Panjang gauging station on the Johor River in Malaysia. Hourly data were recorded for 45 years and analysed by water year (July-June). Five distributions, namely Log-normal, Generalized Pareto, Log Pearson, Normal and Generalized Extreme Value (GEV), were used to model the distribution of all three variables. Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests were used to evaluate the best fit. Goodness-of-fit tests at the 5% level of significance indicate that all the models can be used to model the distributions of peakflow, flood duration and flood volume. However, the Generalized Pareto distribution was found to be the most suitable model when tested with the Anderson-Darling test, while the Kolmogorov-Smirnov test suggested that GEV is best for peakflow. The results of this research can be used to improve flood frequency analysis; the cumulative distribution functions of peakflow were compared for the Generalized Extreme Value, Generalized Pareto and Log Pearson distributions.
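A sketch of the fitting-and-testing workflow with scipy, on an invented peakflow series. Log Pearson III is approximated by fitting Pearson III to log-flows, and the KS p-values are optimistic because the parameters are estimated from the same data (the paper's Anderson-Darling comparison is omitted):

    import numpy as np
    from scipy import stats

    # Hypothetical annual peakflow series (m^3/s); replace with gauged data.
    peak = np.random.default_rng(10).gumbel(800, 250, 45)

    candidates = {"Normal": stats.norm, "Log-normal": stats.lognorm,
                  "Generalized Pareto": stats.genpareto,
                  "GEV": stats.genextreme}
    for name, dist in candidates.items():
        params = dist.fit(peak)
        ks = stats.kstest(peak, dist.name, args=params)
        print(f"{name:18s} KS D = {ks.statistic:.3f}, p = {ks.pvalue:.3f}")

    # Log Pearson III, approximated as Pearson III fitted to log-flows.
    logp = np.log(peak)
    params = stats.pearson3.fit(logp)
    print("Log Pearson III:", stats.kstest(logp, "pearson3", args=params))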
Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples
Energy Technology Data Exchange (ETDEWEB)
Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J
2007-10-24
Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharides, pure proteins and protein mixture samples, all of LDA, PLSDA, and SIMCA were found to produce excellent classification given a sufficient number of compound variables calculated. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data that can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples
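The PCA-then-classify recommendation can be expressed in a few lines of scikit-learn. The spectra below are simulated stand-ins for ToF-SIMS peak intensities; cross-validated accuracy replaces the paper's classification assessments:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    # Simulated data: 60 spectra x 500 peak intensities, three classes
    # distinguished by small offsets on a handful of peaks.
    rng = np.random.default_rng(11)
    X = rng.normal(size=(60, 500))
    y = np.repeat([0, 1, 2], 20)
    X[y == 1, :10] += 0.8
    X[y == 2, 10:20] += 0.8

    # PCA first (as recommended above), then LDA on the reduced scores.
    clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
    print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())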
Curve fitting and modeling with splines using statistical variable selection techniques
Smith, P. L.
1982-01-01
The successful application of statistical variable selection techniques to fit splines is demonstrated. Major emphasis is given to knot selection, but order determination is also discussed. Two FORTRAN backward elimination programs, using the B-spline basis, were developed. The program for knot elimination is compared in detail with two other spline-fitting methods and several statistical software packages. An example is also given for the two-variable case using a tensor product basis, with a theoretical discussion of the difficulties of their use.
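A sketch of backward knot elimination in the spirit of the programs described, but using a truncated-power basis instead of the B-spline basis and a fixed |t| threshold; all details are illustrative:

    import numpy as np

    def spline_basis(x, knots, degree=3):
        # Truncated-power basis: polynomial terms plus one term per knot.
        cols = [x ** d for d in range(degree + 1)]
        cols += [np.clip(x - k, 0, None) ** degree for k in knots]
        return np.column_stack(cols)

    def backward_knot_elimination(x, y, knots, degree=3, t_crit=2.0):
        # Repeatedly drop the knot whose coefficient has the smallest |t|,
        # as long as that |t| falls below the retention threshold.
        knots = list(knots)
        while knots:
            X = spline_basis(x, knots, degree)
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = np.sum((y - X @ beta) ** 2)
            cov = rss / (len(y) - X.shape[1]) * np.linalg.pinv(X.T @ X)
            t = np.abs(beta[degree + 1:]) / np.sqrt(np.diag(cov)[degree + 1:])
            if t.min() >= t_crit:
                break
            del knots[int(t.argmin())]
        return knots

    x = np.linspace(0, 1, 200)
    y = np.sin(2 * np.pi * x) + np.random.default_rng(12).normal(0, 0.1, 200)
    print("retained knots:",
          backward_knot_elimination(x, y, np.linspace(0.1, 0.9, 9)))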
Aygunes, Gunes
2017-07-01
The objective of this paper is to survey and determine the macroeconomic factors affecting the level of venture capital (VC) investments in a country. The literature relates venture capitalists' quality to countries' venture capital investments. The aim of this paper is to characterise the relationship between venture capital investments and macroeconomic variables via a statistical computation method. We investigate a set of countries and macroeconomic variables and, using this method, derive the correlations between venture capital investments and the macroeconomic variables. According to a logistic regression model (logit model), the macroeconomic variables are correlated with each other in three groups. Venture capitalists regard these correlations as an indicator. Finally, we give the correlation matrix of our results.
International Nuclear Information System (INIS)
Edjabou, Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn; Petersen, Claus; Scheutz, Charlotte; Astrup, Thomas Fruergaard
2015-01-01
Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single
Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses.
Deng, Yangqing; Pan, Wei
2017-12-01
There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical and other reasons, typically only summary statistics of univariate SNP-trait associations are available, based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the…
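The intersection-union logic underlying the testing framework above can be conveyed with a minimal sketch: pleiotropy, in the sense of association with every trait, is declared only when each per-trait test rejects, so the combined p-value is the maximum of the per-trait p-values computed from GWAS summary Z-scores. This is only the basic idea; the paper's actual procedure adds conditional models and sequential testing, and the function name and toy numbers below are illustrative.

```python
import numpy as np
from scipy import stats

def pleiotropy_iut_pvalue(z_scores):
    """Intersection-union test from per-trait GWAS summary Z-scores.

    Association with *all* traits is declared only when every
    individual test rejects, so the combined p-value is the
    maximum of the per-trait two-sided p-values.
    """
    z = np.asarray(z_scores, dtype=float)
    p_each = 2.0 * stats.norm.sf(np.abs(z))  # two-sided p-value per trait
    return p_each.max()

# Toy example: one variant with summary Z-scores for LDL and TG
print(pleiotropy_iut_pvalue([4.2, 2.9]))  # small only if both are strong
```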
Schmitt, M; Groß, K; Grub, J; Heib, F
2015-06-01
Contact angle determination by the sessile drop technique is essential for characterising surface properties in science and in industry. Different specific angles can be observed on every solid; these are correlated with the advancing or the receding of the triple line. Different procedures and definitions for the determination of specific angles exist, which are often neither comprehensible nor reproducible. Therefore one of the most important tasks in this area is to establish standard, reproducible and valid methods for determining advancing/receding contact angles. This contribution introduces novel techniques for analysing dynamic contact angle measurements (sessile drop) in detail which are applicable to axisymmetric and non-axisymmetric drops. Not only the recently presented fit solution by sigmoid function and the independent analysis of the different parameters (inclination, contact angle, velocity of the triple point), but also the dependent analysis, is explained in detail for the first time. These approaches lead to contact angle data, and to different routes to specific contact angles, that are independent of user skill and the subjectivity of the operator. As an example, the motion behaviour of droplets on flat silicon-oxide surfaces after different surface treatments is measured dynamically by the sessile drop technique while inclining the sample plate. The triple points, the inclination angles, the downhill (advancing motion) and the uphill (receding motion) angles obtained by high-precision drop shape analysis are statistically analysed both independently and dependently. Due to the small distance covered, the dependent analyses are well suited for contact angle determination; they are characterised by small deviations of the computed values. In addition to the detailed introduction of these novel analytical approaches and the fit solution, special motion relations for the drop on inclined surfaces and detailed relations concerning the reactivity of the freshly cleaned silicon wafer surface, resulting in acceleration…
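The sigmoid fit mentioned above can be sketched with a standard nonlinear least-squares call; the logistic form, parameter names and synthetic data below are assumptions for illustration, not the authors' exact fit solution.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(alpha, theta0, dtheta, alpha0, k):
    """Logistic transition of contact angle with plate inclination alpha."""
    return theta0 + dtheta / (1.0 + np.exp(-k * (alpha - alpha0)))

# Synthetic inclination (deg) vs. contact angle (deg) data with noise
rng = np.random.default_rng(0)
alpha = np.linspace(0, 40, 80)
theta = sigmoid(alpha, 35, 25, 18, 0.4) + rng.normal(0, 0.5, alpha.size)

popt, _ = curve_fit(sigmoid, alpha, theta, p0=[30, 20, 15, 0.3])
theta0, dtheta, alpha0, k = popt
print(f"plateau angle ~ {theta0 + dtheta:.1f} deg, transition at {alpha0:.1f} deg")
```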
Directory of Open Access Journals (Sweden)
Sean Ekins
Full Text Available Dispensing and dilution processes may profoundly influence estimates of biological activity of compounds. Published data show Ephrin type-B receptor 4 IC50 values obtained via tip-based serial dilution and dispensing versus acoustic dispensing with direct dilution differ by orders of magnitude with no correlation or ranking of datasets. We generated computational 3D pharmacophores based on data derived by both acoustic and tip-based transfer. The computed pharmacophores differ significantly depending upon dispensing and dilution methods. The acoustic dispensing-derived pharmacophore correctly identified active compounds in a subsequent test set where the tip-based method failed. Data from acoustic dispensing generates a pharmacophore containing two hydrophobic features, one hydrogen bond donor and one hydrogen bond acceptor. This is consistent with X-ray crystallography studies of ligand-protein interactions and automatically generated pharmacophores derived from this structural data. In contrast, the tip-based data suggest a pharmacophore with two hydrogen bond acceptors, one hydrogen bond donor and no hydrophobic features. This pharmacophore is inconsistent with the X-ray crystallographic studies and automatically generated pharmacophores. In short, traditional dispensing processes are another important source of error in high-throughput screening that impacts computational and statistical analyses. These findings have far-reaching implications in biological research.
Directory of Open Access Journals (Sweden)
Zheng Xie
2013-01-01
Full Text Available The intrinsic variability of nanoscale VLSI technology must be taken into account when analysing circuit designs to predict likely yield. Monte Carlo (MC) and quasi-MC (QMC) based statistical techniques do this by analysing many randomised or quasirandomised copies of circuits. The randomisation must model forms of variability that occur in nano-CMOS technology, including “atomistic” effects without intradie correlation and effects with intradie correlation between neighbouring devices. A major problem is the computational cost of carrying out sufficient analyses to produce statistically reliable results. The use of principal components analysis, behavioural modeling, and an implementation of “Statistical Blockade” (SB) is shown to be capable of achieving significant reduction in the computational costs. A computation time reduction of 98.7% was achieved for a commonly used asynchronous circuit element. Replacing MC by QMC analysis can achieve further computation reduction, and this is illustrated for more complex circuits, with the results being compared with those of transistor-level simulations. The “yield prediction” analysis of SRAM arrays is taken as a case study, where the arrays contain up to 1536 transistors modelled using parameters appropriate to 35 nm technology. It is reported that savings of up to 99.85% in computation time were obtained.
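The MC-versus-QMC comparison described above can be sketched in a few lines: both estimate a pass/fail yield by sampling device-parameter variations, with QMC drawing from a scrambled Sobol sequence instead of pseudo-random numbers. The toy delay model and threshold below are invented stand-ins for a transistor-level simulation.

```python
import numpy as np
from scipy.stats import norm, qmc

def delay_ok(x):
    """Toy 'circuit' response: pass if delay stays under a limit.

    x holds per-device threshold-voltage variations (standard normal);
    the delay model is purely illustrative, not a SPICE surrogate.
    """
    delay = 1.0 + 0.08 * x.sum(axis=1) + 0.03 * (x ** 2).sum(axis=1)
    return delay < 1.35

d, n = 8, 4096  # 8 varying devices, 4096 analyses (power of 2 for Sobol)

# Plain Monte Carlo
mc = np.random.default_rng(0).standard_normal((n, d))
print("MC yield :", delay_ok(mc).mean())

# Quasi-Monte Carlo with a scrambled Sobol sequence mapped to Gaussians
sob = qmc.Sobol(d=d, scramble=True, seed=0).random(n)
print("QMC yield:", delay_ok(norm.ppf(sob)).mean())
```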
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data on seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 were analyzed jointly using both methods. For all periods, the temperature-related variables were highly correlated with one another, but all were negatively correlated with relative humidity. Multiple regression analysis was used to fit the ozone values using the meteorological variables as predictors. A variable selection method based on high loadings on varimax-rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model. In 1999, 2001 and 2002, ozone was weakly but predominantly influenced by the meteorological variables. However, the model indicated that ozone for the year 2000 was not influenced predominantly by the meteorological variables, which points to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
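The loading-based selection step described above can be sketched as follows: fit a rotated component model, keep the variable with the highest absolute loading on each component, and regress the response on that subset. The synthetic data, component count and library choice are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Columns mimic seven correlated meteorological variables; y mimics ozone
X = rng.standard_normal((300, 7))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(300)   # two correlated temperatures
y = 2.0 * X[:, 0] - 1.5 * X[:, 5] + rng.standard_normal(300)

# Varimax-rotated components; keep the top-loading variable per component
fa = FactorAnalysis(n_components=3, rotation="varimax").fit(X)
selected = np.unique(np.abs(fa.components_).argmax(axis=1))
print("selected predictor columns:", selected)

model = LinearRegression().fit(X[:, selected], y)
print("R^2 with selected subset:", model.score(X[:, selected], y))
```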
Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.
2017-01-01
Climate envelope models are widely used to describe the potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method, and there was low overlap between the variable sets; model performance was nevertheless high for both approaches (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. The difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable…
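The two performance measures quoted above are simple to compute for a presence/absence model; a minimal sketch (with made-up labels and scores) follows, using TSS = sensitivity + specificity − 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def true_skill_statistic(y_true, y_pred):
    """TSS = sensitivity + specificity - 1 for binary presence/absence maps."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])           # observed presences
scores = np.array([0.9, 0.8, 0.4, 0.2, 0.6, 0.3, 0.7, 0.5])  # model suitability

print("AUC:", roc_auc_score(y_true, scores))
print("TSS:", true_skill_statistic(y_true, scores >= 0.5))
```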
Karimi, Hamid; O'Brian, Sue; Onslow, Mark; Jones, Mark; Menzies, Ross; Packman, Ann
2013-01-01
Purpose: Stuttering varies between and within speaking situations. In this study, the authors used statistical process control charts with 10 case studies to investigate variability of stuttering frequency. Method: Participants were 10 adults who stutter. The authors counted the percentage of syllables stuttered (%SS) for segments of their speech…
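A statistical process control chart for %SS can be sketched as a p-chart: with the syllable count of each segment known, the centre line is the pooled proportion and the three-sigma limits follow from the binomial variance. The counts below are invented for illustration.

```python
import numpy as np

# Stuttered-syllable counts and syllable totals per speech segment
stuttered = np.array([6, 9, 4, 11, 7, 5, 13, 8])
syllables = np.array([200, 180, 210, 190, 205, 195, 185, 200])

p = stuttered / syllables
p_bar = stuttered.sum() / syllables.sum()   # centre line (pooled %SS)

# p-chart: binomial three-sigma limits, varying with segment size
sigma = np.sqrt(p_bar * (1 - p_bar) / syllables)
ucl, lcl = p_bar + 3 * sigma, np.clip(p_bar - 3 * sigma, 0, None)

for i, (pi, lo, hi) in enumerate(zip(p, lcl, ucl)):
    flag = "out of control" if (pi > hi or pi < lo) else "in control"
    print(f"segment {i}: %SS = {100 * pi:.1f} ({flag})")
```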
McKinley, C. C.; Scudder, R.; Thomas, D. J.
2016-12-01
The neodymium isotopic composition (Nd IC) of oxide coatings has been applied as a tracer of water mass composition and used to address fundamental questions about past ocean conditions. The leached authigenic oxide coating from marine sediment is widely assumed to reflect the dissolved trace metal composition of the bottom water interacting with sediment at the seafloor. However, recent studies have shown that readily reducible sediment components, in addition to trace metal fluxes from the pore water, are incorporated into the bottom water, influencing the trace metal composition of leached oxide coatings. This challenges the prevailing application of the authigenic oxide Nd IC as a proxy of seawater composition. Therefore, it is important to identify the component end-members that create sediments of different lithology and determine if, or how, they might contribute to the Nd IC of oxide coatings. To investigate lithologic influence on the results of sequential leaching, we selected two sites with complete bulk sediment statistical characterization. Site U1370, in the South Pacific Gyre, is predominantly composed of rhyolite (~60%) and has a distinguishable (~10%) Fe-Mn oxyhydroxide component (Dunlea et al., 2015). Site 1149, near the Izu-Bonin Arc, is predominantly composed of dispersed ash (~20-50%) and eolian dust from Asia (~50-80%) (Scudder et al., 2014). We perform a two-step leaching procedure: a 14 mL leach of 0.02 M hydroxylamine hydrochloride (HH) in 20% acetic acid buffered to pH 4 for one hour, targeting metals bound to the Fe- and Mn-oxide fractions, and a second HH leach for 12 hours, designed to remove any remaining oxides from the residual component. We analyze all three resulting fractions for a large suite of major, trace and rare earth elements; a subset of the samples is also analyzed for Nd IC. We use multivariate statistical analyses of the resulting geochemical data to identify how each component of the sediment partitions across the sequential…
Review of Statistical Analyses Resulting from Performance of HLDWD-DWPF-005
International Nuclear Information System (INIS)
Beck, R.S.
1997-01-01
The Engineering Department at the Defense Waste Processing Facility (DWPF) has reviewed two reports from the Statistical Consulting Section (SCS) involving the statistical analysis of test results for the analysis of small sample inserts (references 1 & 2). The test results cover two proposed analytical methods, a room temperature hydrofluoric acid preparation (Cold Chem) and a sodium peroxide/sodium hydroxide fusion modified for insert samples (Modified Fusion). The reports support implementation of the proposed small sample containers and analytical methods at DWPF. Hydragard sampler valve performance was typical of previous results (reference 3). Using an element from each major feed stream, lithium from the frit and iron from the sludge, the sampler was determined to deliver a uniform mixture in either sample container. The lithium to iron ratios were equivalent for the standard 15 ml vial and the 3 ml insert. The proposed methods provide analyses equivalent to the current methods. The biases associated with the proposed methods on a vitrified basis are less than 5% for major elements. The sum of oxides for the proposed method compares favorably with the sum of oxides for the conventional methods. However, the average sum of oxides for the Cold Chem method was 94.3%, which is below the minimum required recovery of 95%. Both proposed methods, Cold Chem and Modified Fusion, will at first be required to provide an accurate analysis which will routinely meet the 95% to 105% average sum of oxides limit for the Product Composition Control System (PCCS). Issues to be resolved during phased implementation are as follows: (1) determine the calcine/vitrification factor for radioactive feed; (2) evaluate the covariance matrix change against process operating ranges to determine optimum sample size; (3) evaluate sources of the low sum of oxides; and (4) improve remote operability of production versions of equipment and instruments for installation in 221-S. The specifics of…
ZnO crystals obtained by electrodeposition: Statistical analysis of most important process variables
International Nuclear Information System (INIS)
Cembrero, Jesus; Busquets-Mataix, David
2009-01-01
In this paper a comparative study, by means of a statistical analysis, of the main process variables affecting ZnO crystal electrodeposition is presented. ZnO crystals were deposited on two different substrates, silicon wafer and indium tin oxide. The control variables were substrate type, electrolyte concentration, temperature, exposure time and current density. The morphologies of the different substrates were observed using scanning electron microscopy. The percentage of substrate area covered by the ZnO deposit was calculated by computational image analysis. The design of the applied experiments was based on a two-level factorial analysis involving a series of 32 experiments and an analysis of variance. Statistical results reveal that the variables exerting a significant influence on the area covered by the ZnO deposit are electrolyte concentration, substrate type and time of deposition, together with a combined two-factor interaction between temperature and current density. However, morphology is also influenced by the surface roughness of the substrates.
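A two-level factorial design with an analysis of variance, as used above, can be sketched with statsmodels; the factors, effect sizes and replication below are synthetic stand-ins for the actual 32-run experiment.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
# Coded two-level factors (an illustrative subset of the five studied)
design = pd.DataFrame(
    [(c, t, j) for c in (-1, 1) for t in (-1, 1) for j in (-1, 1)] * 4,
    columns=["conc", "time", "current"],
)
# Synthetic response: % area covered, with two main effects and one interaction
design["coverage"] = (
    50 + 8 * design.conc + 5 * design.time
    + 3 * design.time * design.current + rng.normal(0, 2, len(design))
)

model = smf.ols("coverage ~ conc * time * current", data=design).fit()
print(sm.stats.anova_lm(model, typ=2))   # which effects are significant?
```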
Lowe, David J.; Pearce, Nicholas J. G.; Jorgensen, Murray A.; Kuehn, Stephen C.; Tryon, Christian A.; Hayward, Chris L.
2017-11-01
We define tephras and cryptotephras and their components (mainly ash-sized particles of glass ± crystals in distal deposits) and summarize the basis of tephrochronology as a chronostratigraphic correlational and dating tool for palaeoenvironmental, geological, and archaeological research. We then document and appraise recent advances in analytical methods used to determine the major, minor, and trace elements of individual glass shards from tephra or cryptotephra deposits to aid their correlation and application. Protocols developed recently for the electron probe microanalysis of major elements in individual glass shards help to improve data quality and standardize reporting procedures. A narrow electron beam (diameter ∼3-5 μm) can now be used to analyze smaller glass shards than previously attainable, making reliable analyses of 'microshards' possible, and compositional data can be compared using multivariate statistical tests (e.g., the Hotelling two-sample T2 test). Randomization tests can be used where distributional assumptions such as multivariate normality underlying parametric tests are doubtful. Compositional data may be transformed and scaled before being subjected to multivariate statistical procedures including calculation of distance matrices, hierarchical cluster analysis, and PCA. Such transformations may make the assumption of multivariate normality more appropriate. A sequential procedure using Mahalanobis distance and the Hotelling two-sample T2 test is illustrated using glass major element data from trachytic to phonolitic Kenyan tephras. All these methods require a broad range of high-quality compositional data which can be used to compare 'unknowns' with reference (training) sets that are sufficiently complete to account for all possible correlatives, including tephras with heterogeneous glasses that contain multiple compositional groups. Currently, incomplete databases are tending to limit correlation efficacy. The development of an open, online global database to facilitate progress towards integrated, high…
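The two-sample Hotelling T² test mentioned above is straightforward to implement from its F-distribution form; a minimal sketch follows, with synthetic stand-ins for glass major-element data (assuming multivariate normality and a common covariance matrix).

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(X, Y):
    """Two-sample Hotelling T^2 test for equal mean vectors.

    Returns (T^2, p-value) via the exact F transformation, assuming
    multivariate normality and a common covariance matrix.
    """
    n1, p = X.shape
    n2, _ = Y.shape
    diff = X.mean(axis=0) - Y.mean(axis=0)
    S_pooled = ((n1 - 1) * np.cov(X, rowvar=False)
                + (n2 - 1) * np.cov(Y, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S_pooled, diff)
    f_stat = t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
    p_val = stats.f.sf(f_stat, p, n1 + n2 - p - 1)
    return t2, p_val

# Toy shard compositions (columns could be SiO2, K2O, CaO in wt%)
rng = np.random.default_rng(3)
tephra_a = rng.multivariate_normal([72, 4.1, 1.2], np.eye(3) * 0.2, size=15)
tephra_b = rng.multivariate_normal([71, 4.4, 1.1], np.eye(3) * 0.2, size=12)
print(hotelling_two_sample(tephra_a, tephra_b))
```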
Papageorgiou, Spyridon N; Papadopoulos, Moschos A; Athanasiou, Athanasios E
2014-02-01
Ideally, meta-analyses (MAs) should consolidate the characteristics of orthodontic research in order to produce an evidence-based answer. However, severe flaws are frequently observed in most of them. The aim of this study was to evaluate the statistical methods, the methodology, and the quality characteristics of orthodontic MAs and to assess their reporting quality during the last years. Electronic databases were searched for MAs (with or without a proper systematic review) in the field of orthodontics, indexed up to 2011. The AMSTAR tool was used for quality assessment of the included articles. Data were analyzed with Student's t-test, one-way ANOVA, and generalized linear modelling. Risk ratios with 95% confidence intervals were calculated to represent changes during the years in the reporting of key items associated with quality. A total of 80 MAs with 1086 primary studies were included in this evaluation. Using the AMSTAR tool, 25 (27.3%) of the MAs were found to be of low quality, 37 (46.3%) of medium quality, and 18 (22.5%) of high quality. Specific characteristics like explicit protocol definition, extensive searches, and quality assessment of included trials were associated with a higher AMSTAR score. Model selection and dealing with heterogeneity or publication bias were often problematic in the identified reviews. The number of published orthodontic MAs is constantly increasing, while their overall quality is considered to range from low to medium. Although the number of MAs of medium and high level seems lately to be rising, several other aspects need improvement to increase their overall quality.
Farrell, S. L.; Kurtz, N. T.; Richter-Menge, J.; Harbeck, J. P.; Onana, V.
2012-12-01
Satellite-derived estimates of ice thickness and observations of ice extent over the last decade point to a downward trend in the basin-scale ice volume of the Arctic Ocean. This loss has broad-ranging impacts on the regional climate and ecosystems, as well as implications for regional infrastructure, marine navigation, national security, and resource exploration. New observational datasets at small spatial and temporal scales are now required to improve our understanding of physical processes occurring within the ice pack and to advance parameterizations in the next generation of numerical sea-ice models. High-resolution airborne and satellite observations of the sea ice are now available at meter-scale resolution or better, providing new details on the properties and morphology of the ice pack across basin scales. For example, the NASA IceBridge airborne campaign routinely surveys the sea ice of the Arctic and Southern Oceans with an advanced sensor suite, including laser and radar altimeters and digital cameras, that together provide high-resolution measurements of sea ice freeboard, thickness, snow depth and lead distribution. Here we present statistical analyses of the ice pack primarily derived from the following IceBridge instruments: the Digital Mapping System (DMS), a nadir-looking, high-resolution digital camera; the Airborne Topographic Mapper, a scanning lidar; and the University of Kansas snow radar, a novel instrument designed to estimate snow depth on sea ice. Together these instruments provide data from which a wide range of sea ice properties may be derived. We provide statistics on lead distribution and spacing, lead width and area, floe size and distance between floes, as well as ridge height, frequency and distribution. The goals of this study are to (i) identify unique statistics that can be used to describe the characteristics of specific ice regions, for example first-year/multi-year ice, diffuse ice edge/consolidated ice pack, and convergent…
Bounding the conservatism in flaw-related variables for pressure vessel integrity analyses
International Nuclear Information System (INIS)
Foulds, J.R.; Kennedy, E.L.
1993-01-01
The fracture mechanics-based integrity analysis of a pressure vessel, whether performed deterministically or probabilistically, requires the use of one or more flaw-related input variables, such as flaw size, number of flaws, flaw location, and flaw type. The specific values of these variables are generally selected with the intent to ensure conservative predictions of vessel integrity. These selected values, however, are largely independent of vessel-specific inspection results, or are, at best, deduced by ''conservative'' interpretation of vessel-specific inspection results without adequate consideration of the pertinent inspection system performance (reliability). In either case, the conservatism associated with the flaw-related variables chosen for analysis remains unquantified. The advancement of nondestructive examination (NDE) technology and the recently formulated ASME Code procedures for qualifying NDE system capability and performance (as applied to selected nuclear power plant components) now provide a systematic means of bounding the conservatism in flaw-related input variables for pressure vessel integrity analyses. This is essentially achieved by establishing probabilistic (risk)-based limits on the assigned variable values, dependent upon the vessel inspection results and on the inspection system unreliability. Described herein is this probabilistic method and its potential application to: (i) defining a vessel-specific ''reference'' flaw for calculating pressure-temperature limit curves in the deterministic evaluation of pressurized water reactor (PWR) reactor vessels, and (ii) limiting the flaw distribution input to a PWR reactor vessel-specific, probabilistic integrity analysis for pressurized thermal shock loads.
Variability analysis of AGN: a review of results using new statistical criteria
Zibecchi, L.; Andruchow, I.; Cellone, S. A.; Romero, G. E.; Combi, J. A.
We present here a re-analysis of the variability results of a sample of active galactic nuclei (AGN), which have been observed over several sessions with the 2.15 m "Jorge Sahade" telescope (CASLEO), San Juan, Argentina, and whose results are published (Romero et al. 1999, 2000, 2002; Cellone et al. 2000). The motivation for this new analysis is the implementation, during the last years, of improvements in the statistical criteria applied, taking quantitatively into account the incidence of the photometric errors (Cellone et al. 2007). This work is framed as a first step in a comprehensive study of the statistical estimators of AGN variability. This study is motivated by the great diversity of statistical tests that have been proposed to analyze the variability of these objects. Since we note that, in some cases, the variability results for an object depend on the test used, we attempt to make a comparative study of the various tests and to analyze, under the given conditions, which of them is the most efficient and reliable.
Wegner, Franz
2016-01-01
This text presents the mathematical concepts of Grassmann variables and the method of supersymmetry to a broad audience of physicists interested in applying these tools to disordered and critical systems, as well as related topics in statistical physics. Based on many courses and seminars held by the author, one of the pioneers in this field, the reader is given a systematic and tutorial introduction to the subject matter. The algebra and analysis of Grassmann variables is presented in part I. The mathematics of these variables is applied to a random matrix model, path integrals for fermions, dimer models and the Ising model in two dimensions. Supermathematics - the use of commuting and anticommuting variables on an equal footing - is the subject of part II. The properties of supervectors and supermatrices, which contain both commuting and Grassmann components, are treated in great detail, including the derivation of integral theorems. In part III, supersymmetric physical models are considered. While supersym...
Essentials of Excel, Excel VBA, SAS and Minitab for statistical and financial analyses
Lee, Cheng-Few; Chang, Jow-Ran; Tai, Tzu
2016-01-01
This introductory textbook for business statistics teaches statistical analysis and research methods via business case studies and financial data using Excel, MINITAB, and SAS. Every chapter in this textbook engages the reader with data of individual stock, stock indices, options, and futures. One studies and uses statistics to learn how to study, analyze, and understand a data set of particular interest. Some of the more popular statistical programs that have been developed to use statistical and computational methods to analyze data sets are SAS, SPSS, and MINITAB. Of those, we look at MINITAB and SAS in this textbook. One of the main reasons to use MINITAB is that it is the easiest to use among the popular statistical programs. We look at SAS because it is the leading statistical package used in industry. We also utilize the much less costly and ubiquitous Microsoft Excel to do statistical analysis, as the benefits of Excel have become widely recognized in the academic world and its analytical capabilities...
Energy Technology Data Exchange (ETDEWEB)
NONE
2011-07-01
This report is an assessment of the current model and presentation form of bioenergy statistics. It proposes revisions and enhancements to both data collection and data representation. In the context of market development, both for energy in general and for bioenergy in particular, and of government targets, good bioenergy statistics form the basis for following up the objectives and measures. (eb)
Jeffrey P. Prestemon
2009-01-01
Timber product markets are subject to large shocks deriving from natural disturbances and policy shifts. Statistical modeling of shocks is often done to assess their economic importance. In this article, I simulate the statistical power of univariate and bivariate methods of shock detection using time series intervention models. Simulations show that bivariate methods...
Yuan, Ke-Hai; Tian, Yubin; Yanagihara, Hirokazu
2015-06-01
Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of an SEM model is T_ML, a slight modification to the likelihood ratio statistic. Under the normality assumption, T_ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T_ML rejects the correct model too often when p is not too small. Various corrections to T_ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct T_ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that the empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T_ML, and they control type I errors reasonably well whenever N ≥ max(50, 2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T_ML as reported in the literature, and they perform well.
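The empirical Bartlett-type idea above can be illustrated with a small simulation: replicate the test statistic under a correct model, rescale it so its mean matches the nominal degrees of freedom, and check the type I error. The inflation factor below is an invented stand-in for the small-N, large-p behaviour of T_ML.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
df = 120  # nominal degrees of freedom of the SEM test

# Stand-in for T_ML replications under a correct model: statistics whose
# mean is inflated above the nominal df (here by a pure scale factor).
t_ml_sims = stats.chi2.rvs(df, size=2000, random_state=rng) * 1.15

# Empirical Bartlett-type factor: rescale so the corrected mean equals df
c = df / t_ml_sims.mean()
t_corrected = c * t_ml_sims

crit = stats.chi2.ppf(0.95, df)
print("mean before/after :", t_ml_sims.mean(), t_corrected.mean())
print("type I error before:", (t_ml_sims > crit).mean())
print("type I error after :", (t_corrected > crit).mean())
```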
Improta, Roberto; Vitagliano, Luigi; Esposito, Luciana
2015-11-01
The elucidation of the mutual influence between peptide bond geometry and local conformation has important implications for protein structure refinement, validation, and prediction. To gain insights into the structural determinants and the energetic contributions associated with protein/peptide backbone plasticity, we here report an extensive analysis of the variability of the peptide bond angles, combining statistical analyses of protein structures and quantum mechanics calculations on small model peptide systems. Our analyses demonstrate that all the backbone bond angles strongly depend on the peptide conformation and unveil the existence of regular trends as a function of ψ and/or φ. The excellent agreement of the quantum mechanics calculations with the statistical surveys of protein structures validates the computational scheme employed here and demonstrates that the valence geometry of the protein/peptide backbone is primarily dictated by local interactions. Notably, for the first time we show that the position of the H(α) hydrogen atom, which is an important parameter in NMR structural studies, is also dependent on the local conformation. Most of the trends observed may be satisfactorily explained by invoking steric repulsive interactions; in some specific cases the valence bond variability is also influenced by hydrogen-bond-like interactions. Moreover, we can provide a reliable estimate of the energies involved in the interplay between geometry and conformations. © 2015 Wiley Periodicals, Inc.
Buttigieg, Pier Luigi; Ramette, Alban Nicolas
2014-01-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynami...
Nam, Sungsik; Alouini, Mohamed-Slim; Yang, Hongchuan
2010-01-01
Order statistics find applications in various areas of communications and signal processing. In this paper, we introduce a unified analytical framework to determine the joint statistics of partial sums of ordered random variables (RVs)…
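Although the paper derives these joint statistics analytically, a quick simulation conveys what is being computed: draw ordered branch values, form the partial sum of the strongest k, and estimate its moments and its covariance with an individual order statistic. The exponential branch model (Rayleigh-faded power, typical in diversity combining) and all parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, k, n_sim = 8, 3, 200_000   # L diversity branches, combine the best k

# Branch SNRs sorted in descending order (exponential = Rayleigh-faded power)
snr = np.sort(rng.exponential(1.0, size=(n_sim, L)), axis=1)[:, ::-1]
partial_sum = snr[:, :k].sum(axis=1)   # GSC-style partial sum of order statistics

print("mean of partial sum:", partial_sum.mean())
print("var of partial sum :", partial_sum.var())
# A joint statistic: covariance between the largest RV and the partial sum
print("cov(max, sum)      :", np.cov(snr[:, 0], partial_sum)[0, 1])
```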
Langousis, Andreas; Deidda, Roberto; Marrocu, Marino; Kaleris, Vassilios
2014-05-01
Due to its intermittent and highly variable character, and the modeling parameterizations used, precipitation is one of the least well reproduced hydrologic variables by both Global Climate Models (GCMs) and Regional Climate Models (RCMs). This is especially the case at a regional level (where hydrologic risks are assessed) and at small temporal scales (e.g. daily) used to run hydrologic models. In an effort to remedy those shortcomings and assess the effect of climate change on rainfall statistics at hydrologically relevant scales, Langousis and Kaleris (2013) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables. The developed downscaling scheme was tested using atmospheric data from the ERA-Interim archive (http://www.ecmwf.int/research/era/do/get/index), and daily rainfall measurements from western Greece, and was proved capable of reproducing several statistical properties of actual rainfall records, at both annual and seasonal levels. This was done solely by conditioning rainfall simulation on a vector of atmospheric predictors, properly selected to reflect the relative influence of upper-air variables on ground-level rainfall statistics. In this study, we apply the developed framework for conditional rainfall simulation using atmospheric data from different GCM/RCM combinations. This is done using atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com), and daily rainfall measurements for an intermediate-sized catchment in Italy; i.e. the Flumendosa catchment. Since GCM/RCM products are suited to reproduce the local climatology in a statistical sense (i.e. in terms of relative frequencies), rather than ensuring a one-to-one temporal correspondence between observed and simulated fields (i.e. as is the case for ERA-interim reanalysis data), we proceed in three steps: a) we use statistical tools to establish a linkage between ERA-Interim upper-air atmospheric forecasts and
Jiang, H.; Lin, T.
2017-12-01
Rain-fed corn production systems are subject to sub-seasonal variations of precipitation and temperature during the growing season. As each growth phase has its own inherent physiological processes, plants require different optimal environmental conditions during each phase. However, this temporal heterogeneity in the response to climate variability along the crop lifecycle is often simplified and fixed as a constant response in large-scale statistical modeling analyses. To capture the time-variant growing requirements in large-scale statistical analysis, we develop and compare statistical models at various spatial and temporal resolutions to quantify the relationship between corn yield and weather factors for 12 corn belt states from 1981 to 2016. The study compares three spatial resolutions (county, agricultural district, and state scale) and three temporal resolutions (crop growth phase, monthly, and growing season) to characterize the effects of spatial and temporal variability. Our results show that the agricultural district model together with growth phase resolution can explain 52% of the variation in corn yield caused by temperature and precipitation variability. It provides a practical model structure, balancing the overfitting problem of county-specific models against the weak explanatory power of state-specific models. In the US corn belt, precipitation has a positive impact on corn yield throughout the growing season except for the vegetative stage, while sensitivity to extreme heat is highest from the silking to dough phases. The results show that the northern counties in the corn belt area are less affected by extreme heat but are more vulnerable to water deficiency.
Carter, Jeffrey R.; Simon, Wayne E.
1990-08-01
Neural networks are trained using Recursive Error Minimization (REM) equations to perform statistical classification. Using REM equations with continuous input variables reduces the required number of training experiences by factors of one to two orders of magnitude over standard back propagation. Replacing the continuous input variables with discrete binary representations reduces the number of connections by a factor proportional to the number of variables, reducing the required number of experiences by another order of magnitude. Undesirable effects of using recurrent experience to train neural networks for statistical classification problems are demonstrated, and nonrecurrent experience is used to avoid these undesirable effects. 1. THE 1-4I PROBLEM The statistical classification problem which we address is that of assigning points in d-dimensional space to one of two classes. The first class has a covariance matrix of I (the identity matrix); the covariance matrix of the second class is 4I. For this reason the problem is known as the 1-4I problem. Both classes have equal probability of occurrence, and samples from both classes may appear anywhere throughout the d-dimensional space. Most samples near the origin of the coordinate system will be from the first class, while most samples away from the origin will be from the second class. Since the two classes completely overlap, it is impossible to have a classifier with zero error. The minimum possible error is known as the Bayes error and…
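For the 1-4I problem the Bayes error can be written down exactly: equating the two Gaussian densities gives a spherical decision boundary, and the error follows from chi-square tail probabilities. A short sketch of this standard result (not taken from the paper itself):

```python
import numpy as np
from scipy.stats import chi2

def bayes_error_1_4I(d):
    """Exact Bayes error for equiprobable N(0, I) vs N(0, 4I) in d dimensions.

    Equating the two Gaussian densities gives the decision radius
    r^2 = (8/3) * d * ln 2: choose class 1 (cov I) when ||x||^2 is below it.
    Under class 1, ||x||^2 ~ chi2(d); under class 2, ||x||^2 / 4 ~ chi2(d).
    """
    t = (8.0 / 3.0) * d * np.log(2.0)
    return 0.5 * chi2.sf(t, d) + 0.5 * chi2.cdf(t / 4.0, d)

for d in (1, 2, 4, 8):
    print(f"d = {d}: Bayes error = {bayes_error_1_4I(d):.4f}")
```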
Steric sea level variability (1993-2010) in an ensemble of ocean reanalyses and objective analyses
Storto, Andrea; Masina, Simona; Balmaseda, Magdalena; Guinehut, Stéphanie; Xue, Yan; Szekely, Tanguy; Fukumori, Ichiro; Forget, Gael; Chang, You-Soon; Good, Simon A.; Köhl, Armin; Vernieres, Guillaume; Ferry, Nicolas; Peterson, K. Andrew; Behringer, David; Ishii, Masayoshi; Masuda, Shuhei; Fujii, Yosuke; Toyoda, Takahiro; Yin, Yonghong; Valdivieso, Maria; Barnier, Bernard; Boyer, Tim; Lee, Tony; Gourrion, Jérome; Wang, Ou; Heimback, Patrick; Rosati, Anthony; Kovach, Robin; Hernandez, Fabrice; Martin, Matthew J.; Kamachi, Masafumi; Kuragano, Tsurane; Mogensen, Kristian; Alves, Oscar; Haines, Keith; Wang, Xiaochun
2017-08-01
Quantifying the effect of the seawater density changes on sea level variability is of crucial importance for climate change studies, as the sea level cumulative rise can be regarded as both an important climate change indicator and a possible danger for human activities in coastal areas. In this work, as part of the Ocean Reanalysis Intercomparison Project, the global and regional steric sea level changes are estimated and compared from an ensemble of 16 ocean reanalyses and 4 objective analyses. These estimates are initially compared with a satellite-derived (altimetry minus gravimetry) dataset for a short period (2003-2010). The ensemble mean exhibits a significant high correlation at both global and regional scale, and the ensemble of ocean reanalyses outperforms that of objective analyses, in particular in the Southern Ocean. The reanalysis ensemble mean thus represents a valuable tool for further analyses, although large uncertainties remain for the inter-annual trends. Within the extended intercomparison period that spans the altimetry era (1993-2010), we find that the ensemble of reanalyses and objective analyses are in good agreement, and both detect a trend of the global steric sea level of 1.0 and 1.1 ± 0.05 mm/year, respectively. However, the spread among the products of the halosteric component trend exceeds the mean trend itself, questioning the reliability of its estimate. This is related to the scarcity of salinity observations before the Argo era. Furthermore, the impact of deep ocean layers is non-negligible on the steric sea level variability (22 and 12 % for the layers below 700 and 1500 m of depth, respectively), although the small deep ocean trends are not significant with respect to the products spread.
DEFF Research Database (Denmark)
Denwood, M.J.; McKendrick, I.J.; Matthews, L.
Introduction. There is an urgent need for a method of analysing FECRT data that is computationally simple and statistically robust. A method for evaluating the statistical power of a proposed FECRT study would also greatly enhance the current guidelines. Methods. A novel statistical framework has been developed that evaluates observed FECRT data against two null hypotheses: (1) the observed efficacy is consistent with the expected efficacy, and (2) the observed efficacy is inferior to the expected efficacy. The method requires only four simple summary statistics of the observed data. Results. The notional type 1 error rate of the new statistical test is shown to be accurate. Power calculations demonstrate a power of only 65% with a sample size of 20 treatment and control animals, which increases to 69% with 40 control animals or 79% with 40 treatment animals. Discussion. The method proposed is simple…
International Nuclear Information System (INIS)
Kaufmann, R.K.; Kauppi, H.; Stock, J.H.
2006-01-01
Comparing statistical estimates of the long-run temperature effect of doubled CO2 with those generated by climate models begs the question: is the long-run temperature effect of doubled CO2 that is estimated from the instrumental temperature record using statistical techniques consistent with the transient climate response, the equilibrium climate sensitivity, or the effective climate sensitivity? Here, we attempt to answer the question of what statistical analyses of the observational record measure by using these same statistical techniques to estimate the temperature effect of a doubling in the atmospheric concentration of carbon dioxide from seventeen simulations run for the Coupled Model Intercomparison Project 2 (CMIP2). The results indicate that the temperature effect estimated by the statistical methodology is consistent with the transient climate response and that this consistency is relatively unaffected by sample size or by the increase in radiative forcing in the sample.
International Nuclear Information System (INIS)
Buzzi, A.; Tosi, E.
1988-01-01
A statistical investigation is presented of the main variables characterizing the tropospheric general circulation in both hemispheres and both extreme seasons, winter and summer. This gives us the opportunity of comparing four distinct realizations of the planetary circulation, as a function of different orographic and thermal forcing conditions. Our approach is made possible by the availability of 6 years of global daily analyses prepared by ECMWF (European Centre for Medium-Range Weather Forecasts). The variables taken into account are the zonal geostrophic wind, the zonal thermal wind and various large-scale wave components, averaged over the tropospheric depth between 1000 and 200 hPa. The mean properties of the analysed quantities in each hemisphere and season are compared and their principal characteristics are discussed. The probability density estimates for the same variables, filtered in order to eliminate the seasonal cycle and the high-frequency 'noise', are then presented. The distributions are examined, in particular, with respect to their unimodal or multimodal nature and with reference to the recent discussion in the literature on the bimodality which has been found for some indicators of planetary wave activity in the Northern Hemisphere winter. Our results indicate the presence of non-unimodally distributed wave and zonal flow components in both hemispheres and both extreme seasons. The most frequent occurrence of non-unimodal behaviour is found for those wave components which exhibit an almost vanishing zonal phase speed and a larger 'response' to orographic forcing.
SOERP, Statistics and Second-Order Error Propagation for Functions of Random Variables
International Nuclear Information System (INIS)
Cox, N. D.; Miller, C. F.
1985-01-01
1 - Description of problem or function: SOERP computes second-order error propagation equations for the first four moments of a function of independently distributed random variables. SOERP was written for a rigorous second-order error propagation of any function which may be expanded in a multivariable Taylor series, the input variables being independently distributed. The required input consists of numbers directly related to the partial derivatives of the function, evaluated at the nominal values of the input variables and the central moments of the input variables from the second through the eighth. 2 - Method of solution: The development of equations for computing the propagation of errors begins by expressing the function of random variables in a multivariable Taylor series expansion. The Taylor series expansion is then truncated, and statistical operations are applied to the series in order to obtain equations for the moments (about the origin) of the distribution of the computed value. If the Taylor series is truncated after powers of two, the procedure produces second-order error propagation equations. 3 - Restrictions on the complexity of the problem: The maximum number of component variables allowed is 30. The IBM version will only process one set of input data per run
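The flavour of SOERP's moment propagation can be conveyed with the simplest possible case, f(x, y) = x·y with independent inputs, where the second-order Taylor result for the mean and variance happens to be exact and can be checked against Monte Carlo. SOERP itself handles general Taylor-expandable functions and input moments through the eighth; this sketch stops at means and variances.

```python
import numpy as np

# Second-order propagation for f(x, y) = x * y with independent inputs
mu_x, var_x = 10.0, 4.0
mu_y, var_y = 5.0, 1.0

# Taylor expansion about the nominal point: the only nonzero second
# derivative of x*y is the cross term, whose contribution to the mean
# vanishes for independent inputs, so the second-order mean is exact here.
mean_f = mu_x * mu_y
var_f = mu_y**2 * var_x + mu_x**2 * var_y + var_x * var_y  # exact for a product

# Monte Carlo cross-check
rng = np.random.default_rng(0)
f = (rng.normal(mu_x, np.sqrt(var_x), 1_000_000)
     * rng.normal(mu_y, np.sqrt(var_y), 1_000_000))
print("propagated:", mean_f, var_f)
print("simulated :", f.mean(), f.var())
```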
Statistical analyses of the magnet data for the advanced photon source storage ring magnets
International Nuclear Information System (INIS)
Kim, S.H.; Carnegie, D.W.; Doose, C.; Hogrefe, R.; Kim, K.; Merl, R.
1995-01-01
The statistics of the measured magnetic data of 80 dipole, 400 quadrupole, and 280 sextupole magnets of conventional resistive designs for the APS storage ring is summarized. In order to accommodate the vacuum chamber, the curved dipole has a C-type cross section and the quadrupole and sextupole cross sections have 180 degrees and 120 degrees symmetries, respectively. The data statistics include the integrated main fields, multipole coefficients, magnetic and mechanical axes, and roll angles of the main fields. The average and rms values of the measured magnet data meet the storage ring requirements
Ellis, Barbara G.; Dick, Steven J.
1996-01-01
Employs the statistics-documentation portion of a word-processing program's grammar-check feature together with qualitative analyses to determine that Henry Watterson, long-time editor of the "Louisville Courier-Journal," was probably the South's famed Civil War correspondent "Shadow." (TB)
International Nuclear Information System (INIS)
Beck, W.
1984-01-01
The complexity of computer programs for the solution of scientific and technical problems gives rise to many questions. Typical questions concern the strengths and weaknesses of computer programs, the propagation of uncertainties among the input data, the sensitivity of output data to input data, and the substitution of complex models by simpler ones which provide equivalent results in certain ranges. Such questions have a general practical relevance; answers of principle may be found by statistical methods based on the Monte Carlo method. In this report statistical methods are selected, described and evaluated. They are implemented in the modular program system STAR, which is a component of the program system RSYST. The design of STAR takes into account users with different levels of knowledge of data processing and statistics, the variety of statistical methods and of generating and evaluating procedures, the processing of large data sets in complex structures, the coupling to other components of RSYST and to programs outside RSYST, and the requirement that the system can easily be modified and enlarged. Four examples are given which demonstrate the application of STAR. (orig.) [de
International Nuclear Information System (INIS)
Pirson, A.S.; George, J.; Krug, B.; Vander Borght, T.; Van Laere, K.; Jamart, J.; D'Asseler, Y.; Minoshima, S.
2009-01-01
Fully automated analysis programs have been applied more and more to aid in the reading of regional cerebral blood flow SPECT studies. They are increasingly based on the comparison of the patient study with a normal database. In this study, we evaluate the ability of Three-Dimensional Stereotactic Surface Projection (3D-SSP) to isolate the effects of age and gender in a previously studied normal population. The results were also compared with those obtained using Statistical Parametric Mapping (SPM99). Methods: Eighty-nine 99mTc-ECD SPECT studies performed in carefully screened healthy volunteers (46 females, 43 males; age 20-81 years) were analysed using 3D-SSP. A multivariate analysis based on the general linear model was performed with regions as intra-subject factor, gender as inter-subject factor and age as covariate. Results: Both age and gender had a significant interaction effect with regional tracer uptake. An age-related decline (p < 0.001) was found in the anterior cingulate gyrus, left frontal association cortex and left insula. Bilateral occipital association and left primary visual cortical uptake showed a significant relative increase with age (p < 0.001). Concerning the gender effect, women showed higher uptake (p < 0.01) in the parietal and right sensorimotor cortices. An age by gender interaction (p < 0.01) was only found in the left medial frontal cortex. The results were consistent with those obtained with SPM99. Conclusion: 3D-SSP analysis of normal rCBF variability is consistent with the literature and with other automated voxel-based techniques, which highlight the effects of both age and gender. (authors)
Exact statistical results for binary mixing and reaction in variable density turbulence
Ristorcelli, J. R.
2017-02-01
We report a number of rigorous statistical results on binary active scalar mixing in variable density turbulence. The study is motivated by mixing between pure fluids with very different densities and whose density intensity is of order unity. Our primary focus is the derivation of exact mathematical results for mixing in variable density turbulence, and we point out potential fields of application of the results. A binary one-step reaction is invoked to derive a metric to assess the state of mixing. The mean reaction rate in variable density turbulent mixing can be expressed, in closed form, using the first-order Favre mean variables and the Reynolds-averaged density variance, ⟨ρ′²⟩. We show that the normalized density variance ⟨ρ′²⟩ reflects the reduction of the reaction due to mixing and is a mix metric. The result is mathematically rigorous. The result is the variable density analog of the normalized mass fraction variance ⟨c′²⟩ used in constant density turbulent mixing. As a consequence, we demonstrate that use of the analogous normalized Favre variance of the mass fraction, c̃″², as a mix metric is not theoretically justified in variable density turbulence. We additionally derive expressions relating various second-order moments of the mass fraction, specific volume, and density fields. The central role of the density-specific volume covariance ⟨ρ′v′⟩ is highlighted; it is a key quantity with considerable dynamical significance linking various second-order statistics. For laboratory experiments, we have developed exact relations between the Reynolds scalar variance ⟨c′²⟩, its Favre analog c̃″², and various second moments including ⟨ρ′v′⟩. For moment closure models that evolve ⟨ρ′v′⟩ and not ⟨ρ′²⟩, we provide a novel expression for ⟨ρ′²⟩ in terms of a rational function of ⟨ρ′v′⟩ that avoids recourse to Taylor series methods (which do not converge for large density differences). We have derived…
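The role of the scalar variance as a mix metric is easiest to see in the constant-density analog: for a one-step reaction with rate proportional to c(1−c), averaging gives the identity below, so the normalized variance θ measures how far the mean rate is reduced by unmixedness (θ = 0 fully mixed, θ = 1 fully segregated). This is the standard constant-density identity the abstract refers to, not the variable-density result itself.

```latex
\langle c(1-c)\rangle = \bar{c}\,(1-\bar{c}) - \langle c'^2\rangle ,
\qquad
\theta \equiv \frac{\langle c'^2\rangle}{\bar{c}\,(1-\bar{c})} \in [0,1] .
```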
Statistical Modelling of Synaptic Vesicles Distribution and Analysing their Physical Characteristics
DEFF Research Database (Denmark)
Khanmohammadi, Mahdieh
This Ph.D. thesis deals with mathematical and statistical modeling of synaptic vesicle distribution, shape, orientation and interactions. The first major part of the thesis treats the problem of determining the effect of stress on synaptic vesicle distribution and interactions. Serial section transmission electron microscopy is used to acquire images from two experimental groups of rats: 1) rats subjected to a behavioral model of stress and 2) rats subjected to sham stress as the control group. The synaptic vesicle distribution and interactions are modeled by employing a point process approach based on differences of statistical measures within a section and the same measures between sections. Three-dimensional (3D) datasets are reconstructed by using image registration techniques and estimated thicknesses. We distinguish the effect of stress by estimating the synaptic vesicle densities and modeling…
Statistic analyses of the color experience according to the age of the observer.
Hunjet, Anica; Parac-Osterman, Durdica; Vucaj, Edita
2013-04-01
Psychological experience of color is a real state of communication between the environment and color, and it depends on the source of the light, the angle of view, and particularly on the observer and his health condition. Hering's theory, or the theory of opponent processes, supposes that the cones situated in the retina of the eye are not sensitive to three separate chromatic domains (red, green and purple-blue), but rather produce a signal based on the principle of opposed pairs of colors. Support for this theory comes from the fact that certain disorders of color eyesight, which include blindness to certain colors, cause blindness to pairs of opponent colors. This paper presents a demonstration of the experience of blue and yellow tones according to the age of the observer. For testing the statistical significance of differences in the color experience according to the color of the background, we use the following statistical tests: Mann-Whitney U test, Kruskal-Wallis ANOVA and the median test. It was shown that the differences are statistically significant in elderly persons (older than 35 years).
Statistical analyses of the performance of Macedonian investment and pension funds
Directory of Open Access Journals (Sweden)
Petar Taleski
2015-10-01
Full Text Available The foundation of post-modern portfolio theory is creating a portfolio based on a desired target return. This specifically applies to the performance of investment and pension funds that provide a rate of return meeting payment requirements from investment funds. A desired target return is the goal of an investment or pension fund. It is the primary benchmark used to measure performance, for dynamic monitoring, and for evaluation of the risk-return ratio of investment funds. The analysis in this paper is based on monthly returns of Macedonian investment and pension funds (June 2011 - June 2014). Such analysis utilizes the basic, but highly informative, statistical characteristic moments like skewness and kurtosis, the Jarque-Bera test, and Chebyshev's inequality. The objective of this study is to perform a thorough analysis, utilizing the above-mentioned and other types of statistical techniques (Sharpe, Sortino, omega, upside potential, Calmar, Sterling), to draw relevant conclusions regarding the risks and characteristic moments of Macedonian investment and pension funds. Pension funds are the second largest segment of the financial system, with great potential for further growth due to constant inflows from pension insurance. The importance of investment funds for the financial system in the Republic of Macedonia is still small, although open-end investment funds have been the fastest growing segment of the financial system. Statistical analysis has shown that pension funds delivered a significantly positive volatility-adjusted risk premium in the analyzed period, more so than investment funds.
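The characteristic moments and risk-adjusted measures listed above can all be computed from a monthly return series in a few lines; the return series, risk-free rate and minimum acceptable return (MAR) below are invented stand-ins.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
returns = rng.normal(0.004, 0.02, 37)   # 37 monthly returns, stand-in series
rf, mar = 0.002, 0.0                    # assumed risk-free rate and MAR

excess = returns - rf
downside = np.minimum(returns - mar, 0.0)

print("skewness   :", stats.skew(returns))
print("kurtosis   :", stats.kurtosis(returns))     # excess kurtosis
print("Jarque-Bera:", stats.jarque_bera(returns))  # normality test
print("Sharpe     :", excess.mean() / excess.std(ddof=1))
print("Sortino    :", (returns.mean() - mar) / np.sqrt((downside**2).mean()))
```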
International Nuclear Information System (INIS)
Allen, Bruce; Creighton, Jolien D.E.; Flanagan, Eanna E.; Romano, Joseph D.
2003-01-01
In a previous paper (paper I), we derived a set of near-optimal signal detection techniques for gravitational wave detectors whose noise probability distributions contain non-Gaussian tails. The methods modify standard methods by truncating or clipping sample values which lie in those non-Gaussian tails. The methods were derived, in the frequentist framework, by minimizing false alarm probabilities at fixed false detection probability in the limit of weak signals. For stochastic signals, the resulting statistic consisted of a sum of an autocorrelation term and a cross-correlation term; it was necessary to discard 'by hand' the autocorrelation term in order to arrive at the correct, generalized cross-correlation statistic. In the present paper, we present an alternative derivation of the same signal detection techniques from within the Bayesian framework. We compute, for both deterministic and stochastic signals, the probability that a signal is present in the data, in the limit where the signal-to-noise ratio squared per frequency bin is small, where the signal is nevertheless strong enough to be detected (integrated signal-to-noise ratio large compared to 1), and where the total probability in the non-Gaussian tail part of the noise distribution is small. We show that, for each model considered, the resulting probability is to a good approximation a monotonic function of the detection statistic derived in paper I. Moreover, for stochastic signals, the new Bayesian derivation automatically eliminates the problematic autocorrelation term
Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses.
Liu, Ruijie; Holik, Aliaksei Z; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E; Asselin-Labat, Marie-Liesse; Smyth, Gordon K; Ritchie, Matthew E
2015-09-03
Variations in sample quality are frequently encountered in small RNA-sequencing experiments and pose a major challenge in differential expression analyses. Removing highly variable samples reduces noise, but at the cost of reduced power, thus limiting the ability to detect biologically meaningful changes. Conversely, retaining these samples in the analysis may fail to reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data but to down-weight the observations from the more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
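The method itself lives in the limma R package; purely to illustrate the core idea of multiplying sample-level quality weights with observation-level precision weights in a gene-wise weighted least-squares fit, here is a conceptual NumPy sketch with invented numbers (it is not the limma implementation, and the weight formulas are toy stand-ins).

```python
# Conceptual sketch: combined sample- and observation-level weights in WLS.
import numpy as np

rng = np.random.default_rng(2)
n_samples = 6
group = np.array([0, 0, 0, 1, 1, 1])                 # two-group design
X = np.column_stack([np.ones(n_samples), group])     # design matrix

logcpm = rng.normal(5.0, 1.0, size=n_samples)        # one gene's log-CPM values
sample_w = np.array([1.0, 1.0, 0.3, 1.0, 1.0, 1.0])  # down-weight one noisy sample
obs_w = 1.0 / (0.5 + np.exp(-logcpm))                # toy mean-variance precision weights
w = sample_w * obs_w                                 # combined weights

# Weighted least squares: beta = (X'WX)^{-1} X'Wy
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ logcpm)
print("intercept, group effect:", beta)
```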
Hayslett, H T
1991-01-01
Statistics covers the basic principles of statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses, and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population, are explained. The text the
The number of subjects per variable required in linear regression analyses.
Austin, Peter C; Steyerberg, Ewout W
2015-06-01
To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R² of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV was necessary to minimize bias in estimating the model R², although adjusted R² estimates behaved well. The bias in estimating the model R² statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
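A scaled-down Monte Carlo sketch of the same question is below (settings are invented and use far fewer replicates than the paper); with normally distributed predictors, OLS coefficient estimates remain nearly unbiased even at 2 SPV.

```python
# Minimal Monte Carlo sketch of the subjects-per-variable (SPV) question.
import numpy as np

rng = np.random.default_rng(3)
p, spv, n_sims = 10, 2, 2000
n = p * spv                       # 2 subjects per variable
true_beta = np.ones(p)

est = np.empty((n_sims, p))
for i in range(n_sims):
    X = rng.normal(size=(n, p))
    y = X @ true_beta + rng.normal(scale=2.0, size=n)
    est[i] = np.linalg.lstsq(X, y, rcond=None)[0]

rel_bias = (est.mean(axis=0) - true_beta) / true_beta
print("max |relative bias|:", np.abs(rel_bias).max())  # small even at 2 SPV
```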
International Nuclear Information System (INIS)
Zhang, Jinzhao; Segurado, Jacobo; Schneidesch, Christophe
2013-01-01
Since the 1980s, Tractebel Engineering (TE) has been developing and applying a multi-physics modelling and safety analysis capability based on a code package consisting of best-estimate 3D neutronics (PANTHER), system thermal-hydraulics (RELAP5), core sub-channel thermal-hydraulics (COBRA-3C), and fuel thermal-mechanics (FRAPCON/FRAPTRAN) codes. A series of methodologies has been developed to perform and license reactor safety analyses and core reload designs based on the deterministic bounding approach. Following recent trends in research and development as well as in industrial applications, TE has been working since 2010 towards applying statistical sensitivity and uncertainty analysis methods to multi-physics modelling and licensing safety analyses. In this paper, the TE multi-physics modelling and safety analysis capability is first described, followed by the proposed TE best estimate plus statistical uncertainty analysis method (BESUAM). The chosen statistical sensitivity and uncertainty analysis methods (non-parametric order statistic method or bootstrap) and tool (DAKOTA) are then presented, followed by preliminary results of their application to FRAPCON/FRAPTRAN simulation of the OECD RIA fuel rod code benchmark and RELAP5/MOD3.3 simulation of THTF tests. (authors)
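The abstract names the non-parametric order statistic method without giving details; the widely used first-order Wilks result behind such methods, quoted here as standard background rather than from the paper, fixes the number of code runs needed so that the sample maximum is a one-sided 95%-coverage/95%-confidence tolerance bound.

```python
# Minimal sketch of the first-order Wilks sample-size formula:
# smallest n with 1 - gamma**n >= beta (coverage gamma, confidence beta).
import math

gamma, beta = 0.95, 0.95
n = math.ceil(math.log(1 - beta) / math.log(gamma))
print(n)  # -> 59 code runs
```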
Rodriguez, Clemente; Gutierrez-Perez, Jose; Pozo, Teresa
2010-01-01
Introduction: This research seeks to determine the influence exercised by a set of presage and process variables (students' pre-existing opinion of statistics, their dedication to mastering statistics content, their assessment of the teaching materials, and the teacher's effort in teaching statistics) on students' resolution of activities…
Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani
2015-01-01
This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…
Meta-Statistics for Variable Selection: The R Package BioMark
Directory of Open Access Journals (Sweden)
Ron Wehrens
2012-11-01
Full Text Available Biomarker identification is an ever more important topic in the life sciences. With the advent of measurement methodologies based on microarrays and mass spectrometry, thousands of variables are routinely being measured on complex biological samples. Often, the question is what makes two groups of samples different. Classical hypothesis testing suffers from the multiple testing problem; however, correcting for this often leads to a lack of power. In addition, choosing α cutoff levels remains somewhat arbitrary. Also, in a regression context, a model depending on few but relevant variables will be more accurate and precise, and easier to interpret biologically. We propose an R package, BioMark, implementing two meta-statistics for variable selection. The first, higher criticism, presents a data-dependent selection threshold for significance, instead of a cookbook value of α = 0.05. It is applicable in all cases where two groups are compared. The second, stability selection, is more general and can also be applied in a regression context. This approach uses repeated subsampling of the data in order to assess the variability of the model coefficients and selects those that remain consistently important. It is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data. BioMark also contains functionality for simulating data with specific characteristics for algorithm development and testing.
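To make the stability-selection idea concrete, here is a minimal Python sketch (not the BioMark R implementation): repeatedly subsample the data, fit a sparse model, and keep the variables selected in a high fraction of subsamples. The penalty, subsample count and 0.8 threshold are assumptions for the example.

```python
# Minimal stability-selection sketch on simulated data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 60, 200
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.ones(5) + rng.normal(size=n)       # 5 truly relevant variables

n_sub = 100
freq = np.zeros(p)
for _ in range(n_sub):
    idx = rng.choice(n, size=n // 2, replace=False)  # random half-sample
    model = Lasso(alpha=0.2).fit(X[idx], y[idx])
    freq += (model.coef_ != 0)

stable = np.where(freq / n_sub >= 0.8)[0]            # assumed selection threshold
print("consistently selected variables:", stable)
```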
Dispersal of potato cyst nematodes measured using historical and spatial statistical analyses.
Banks, N C; Hodda, M; Singh, S K; Matveeva, E M
2012-06-01
Rates and modes of dispersal of potato cyst nematodes (PCNs) were investigated. Analysis of records from eight countries suggested that PCNs spread a mean distance of 5.3 km/year radially from the site of first detection, and spread 212 km over ≈40 years before detection. Data from four countries with more detailed histories of invasion were analyzed further, using distance from first detection, distance from previous detection, distance from nearest detection, straight line distance, and road distance. Linear distance from first detection was significantly related to the time since the first detection. Estimated rate of spread was 5.7 km/year, and did not differ statistically between countries. Time between the first detection and estimated introduction date varied between 0 and 20 years, and differed among countries. Road distances from nearest and first detection were statistically significantly related to time, and gave slightly higher estimates for rate of spread of 6.0 and 7.9 km/year, respectively. These results indicate that the original site of introduction of PCNs may act as a source for subsequent spread and that this may occur at a relatively constant rate over time regardless of whether this distance is measured by road or by a straight line. The implications of this constant radial rate of dispersal for biosecurity and pest management are discussed, along with the effects of control strategies.
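As a toy illustration of how such a radial spread rate can be estimated, the sketch below regresses invented detection distances on years since first detection with SciPy; the slope plays the role of the km/year rate reported above.

```python
# Minimal sketch: spread rate as the slope of distance vs. time (invented data).
import numpy as np
from scipy import stats

years = np.array([0, 3, 7, 12, 18, 25, 33, 40])
dist_km = np.array([0, 18, 41, 66, 101, 138, 186, 214])

res = stats.linregress(years, dist_km)
print(f"spread rate = {res.slope:.1f} km/year (r^2 = {res.rvalue**2:.2f})")
```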
Cechová, Jana; Lýsek, Jirí; Bartas, Martin; Brázda, Václav
2018-04-01
The NCBI database contains mitochondrial DNA (mtDNA) genomes from numerous species. We investigated the presence and locations of inverted repeat sequences (IRs) in these mtDNA sequences; IRs are known to be important for regulation in nuclear genomes. IRs were identified in the mtDNA of all species. IR lengths and frequencies correlate with evolutionary age: the greatest variability was detected in subgroups of plants and fungi and the lowest variability in mammals. IR presence is non-random and evolutionarily favoured. The frequency of IRs generally decreased with IR length, except for IRs 24 or 30 bp long, which are 1.5 times more abundant. IRs are enriched in sequences from the replication origin, followed by D-loop, stem-loop and miscellaneous sequences, pointing to the importance of IRs in the regulatory regions of mitochondrial DNA. Data were produced using Palindrome analyser, freely available on the web at http://bioinformatics.ibp.cz. Contact: vaclav@ibp.cz. Supplementary data are available at Bioinformatics online.
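For intuition, detecting a perfect inverted repeat amounts to finding an arm whose reverse complement reappears downstream within a bounded spacer; the toy scan below does exactly that (the real Palindrome analyser tool also allows mismatches, which this sketch does not).

```python
# Minimal sketch: perfect inverted repeats with fixed arm length, bounded spacer.
def revcomp(s: str) -> str:
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_inverted_repeats(seq: str, arm: int = 6, max_spacer: int = 10):
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        target = revcomp(seq[i:i + arm])          # right arm must match this
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, spacer))
    return hits

print(find_inverted_repeats("TTGAATTCAAAAGAATTCTT"))  # -> [(2, 4)]
```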
Dondeynaz, C.; Carmona Moreno, C.; Céspedes Lorente, J. J.
2012-10-01
The "Integrated Water Resources Management" principle was formally laid down at the International Conference on Water and Sustainable development in Dublin 1992. One of the main results of this conference is that improving Water and Sanitation Services (WSS), being a complex and interdisciplinary issue, passes through collaboration and coordination of different sectors (environment, health, economic activities, governance, and international cooperation). These sectors influence or are influenced by the access to WSS. The understanding of these interrelations appears as crucial for decision makers in the water sector. In this framework, the Joint Research Centre (JRC) of the European Commission (EC) has developed a new database (WatSan4Dev database) containing 42 indicators (called variables in this paper) from environmental, socio-economic, governance and financial aid flows data in developing countries. This paper describes the development of the WatSan4Dev dataset, the statistical processes needed to improve the data quality, and finally, the analysis to verify the database coherence is presented. Based on 25 relevant variables, the relationships between variables are described and organised into five factors (HDP - Human Development against Poverty, AP - Human Activity Pressure on water resources, WR - Water Resources, ODA - Official Development Aid, CEC - Country Environmental Concern). Linear regression methods are used to identify key variables having influence on water supply and sanitation. First analysis indicates that the informal urbanisation development is an important factor negatively influencing the percentage of the population having access to WSS. Health, and in particular children's health, benefits from the improvement of WSS. Irrigation is also enhancing Water Supply service thanks to multi-purpose infrastructure. Five country profiles are also created to deeper understand and synthetize the amount of information gathered. This new
Directory of Open Access Journals (Sweden)
Mohammad D. AL-Tahat
2012-01-01
Full Text Available This paper provides a review of and introduction to agile manufacturing. Tactics of agile manufacturing are mapped into different production areas (eight latent constructs: manufacturing equipment and technology, process technology and know-how, quality and productivity improvement, production planning and control, shop floor management, product design and development, supplier relationship management, and customer relationship management). The implementation level of agile manufacturing tactics is investigated in each area. A structural equation model is proposed and hypotheses are formulated. Feedback from 456 firms is collected using a five-point Likert-scale questionnaire. Statistical analysis is carried out using IBM SPSS and AMOS. Multicollinearity, content validity, consistency, construct validity, ANOVA, and relationships between agile components are tested. The results of this study show that agile manufacturing tactics have a positive effect on the overall agility level. This conclusion can be used by manufacturing firms to manage challenges when trying to be agile.
DEFF Research Database (Denmark)
Edjabou, Vincent Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona
2015-01-01
Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in the literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single- and multi-family house areas). In total, 17 tonnes of waste were sorted into 10-50 waste fractions, organised according to a three-level (tiered) approach facilitating comparison of the waste data between individual sub-areas with different fractionation (waste...
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
International Nuclear Information System (INIS)
Bithell, J.F.; Stone, R.A.
1989-01-01
This paper sets out to show that the epidemiological methods most commonly used can be improved. When analysing geographical data it is necessary to consider location. The most obvious quantification of location is ranked distance, though other measures that may be more meaningful in relation to aetiology may be substituted. A test based on distance ranks, the ''Poisson maximum test'', depends on the maximum of the observed relative risk in regions of increasing size, with the significance level adjusted for selection. Applying this test to data from Sellafield and Sizewell shows that the excess of leukaemia incidence observed at Seascale, near Sellafield, is not an artefact due to data selection by region, and that the excess probably results from a genuine, if as yet unidentified, cause (there being little evidence of any other locational association once the Seascale cases have been removed). So far as Sizewell is concerned, geographical proximity to the nuclear power station does not seem particularly important. (author)
Ma, Ning; Yu, Angela J
2015-01-01
Response time (RT) is an oft-reported behavioral measure in psychological and neurocognitive experiments, but the high level of observed trial-to-trial variability in this measure has often limited its usefulness. Here, we combine computational modeling and psychophysics to examine the hypothesis that fluctuations in this noisy measure reflect dynamic computations in human statistical learning and corresponding cognitive adjustments. We present data from the stop-signal task (SST), in which subjects respond to a go stimulus on each trial, unless instructed not to by a subsequent, infrequently presented stop signal. We model across-trial learning of stop signal frequency, P(stop), and stop-signal onset time, SSD (stop-signal delay), with a Bayesian hidden Markov model, and within-trial decision-making with an optimal stochastic control model. The combined model predicts that RT should increase with both expected P(stop) and SSD. The human behavioral data (n = 20) bear out this prediction, showing P(stop) and SSD both to be significant, independent predictors of RT, with P(stop) being a more prominent predictor in 75% of the subjects, and SSD being more prominent in the remaining 25%. The results demonstrate that humans indeed readily internalize environmental statistics and adjust their cognitive/behavioral strategy accordingly, and that subtle patterns in RT variability can serve as a valuable tool for validating models of statistical learning and decision-making. More broadly, the modeling tools presented in this work can be generalized to a large body of behavioral paradigms, in order to extract insights about cognitive and neural processing from apparently quite noisy behavioral measures. We also discuss how this behaviorally validated model can then be used to conduct model-based analysis of neural data, in order to help identify specific brain areas for representing and encoding key computational quantities in learning and decision-making.
Autonomic Differentiation Map: A Novel Statistical Tool for Interpretation of Heart Rate Variability
Directory of Open Access Journals (Sweden)
Daniela Lucini
2018-04-01
Full Text Available In spite of the large body of evidence suggesting heart rate variability (HRV), alone or combined with blood pressure variability (providing an estimate of baroreflex gain), as a useful technique to assess the autonomic regulation of the cardiovascular system, there is still an ongoing debate about methodology, interpretation, and clinical applications. In the present investigation, we hypothesize that non-parametric and multivariate exploratory statistical manipulation of HRV data could provide a novel informational tool useful to differentiate normal controls from clinical groups, such as athletes, or subjects affected by obesity, hypertension, or stress. With a data-driven protocol in 1,352 ambulant subjects, we compute HRV and baroreflex indices from short-term data series as proxies of autonomic (ANS) regulation. We apply a three-step statistical procedure, first removing age and gender effects. Subsequently, by factor analysis, we extract four ANS latent domains that retain the large majority of the information (86.94%), subdivided into oscillatory (40.84%), amplitude (18.04%), pressure (16.48%), and pulse (11.58%) domains. Finally, we test the overall capacity to differentiate clinical groups vs. controls. To add practical value and improve readability, statistical results concerning individual discriminant ANS proxies and ANS differentiation profiles are displayed through dedicated graphical tools, i.e., the significance diagram and the ANS differentiation map, respectively. This approach, which simultaneously uses all available information about the system, shows which domains make up the difference in ANS discrimination: e.g., athletes differ from controls in all domains, but with a graded strength: maximal in the (normalized) oscillatory and pulse domains, slightly less in the pressure domain and minimal in the amplitude domain. The application of multiple (non-parametric and exploratory) statistical and graphical tools to ANS proxies defines
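A schematic NumPy/scikit-learn sketch of the three-step procedure on invented data follows (residualize on age and gender, extract latent domains by factor analysis, compare groups non-parametrically); the group split, dimensions and effect sizes are all assumptions.

```python
# Schematic sketch of the three-step procedure on invented HRV proxies.
import numpy as np
from scipy import stats
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n, n_proxies = 300, 12
covars = np.column_stack([rng.normal(45, 12, n),     # age
                          rng.integers(0, 2, n)])    # gender
hrv = rng.normal(size=(n, n_proxies)) + 0.02 * covars[:, :1]

# Step 1: residualize each HRV proxy on age and gender.
resid = hrv - LinearRegression().fit(covars, hrv).predict(covars)

# Step 2: extract four latent ANS domains.
scores = FactorAnalysis(n_components=4, random_state=0).fit_transform(resid)

# Step 3: nonparametric group comparison per domain (first 50 = "clinical").
for k in range(4):
    _, p = stats.mannwhitneyu(scores[:50, k], scores[50:, k])
    print(f"domain {k + 1}: p = {p:.3f}")
```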
Siegel, Joshua S; Power, Jonathan D; Dubis, Joseph W; Vogel, Alecia C; Church, Jessica A; Schlaggar, Bradley L; Petersen, Steven E
2014-05-01
Subject motion degrades the quality of task functional magnetic resonance imaging (fMRI) data. Here, we test two classes of methods to counteract the effects of motion in task fMRI data: (1) a variety of motion regressions and (2) motion censoring ("motion scrubbing"). In motion regression, various regressors based on realignment estimates were included as nuisance regressors in general linear model (GLM) estimation. In motion censoring, volumes in which head motion exceeded a threshold were withheld from GLM estimation. The effects of each method were explored in several task fMRI data sets and compared using indicators of data quality and signal-to-noise ratio. Motion censoring decreased variance in parameter estimates within- and across-subjects, reduced residual error in GLM estimation, and increased the magnitude of statistical effects. Motion censoring performed better than all forms of motion regression and also performed well across a variety of parameter spaces, in GLMs with assumed or unassumed response shapes. We conclude that motion censoring improves the quality of task fMRI data and can be a valuable processing step in studies involving populations with even mild amounts of head movement. Copyright © 2013 Wiley Periodicals, Inc.
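The censoring idea reduces to masking high-motion volumes before the GLM fit; a minimal sketch with invented data and an assumed framewise-displacement threshold is below.

```python
# Minimal sketch of motion censoring before a GLM fit (all data invented).
import numpy as np

rng = np.random.default_rng(7)
n_vols = 200
fd = np.abs(rng.normal(0.15, 0.15, n_vols))       # hypothetical FD trace (mm)
y = rng.normal(size=n_vols)                       # one voxel's time series
X = np.column_stack([np.ones(n_vols),
                     rng.normal(size=n_vols)])    # toy design matrix

keep = fd < 0.5                                   # assumed censoring threshold
beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
print(f"kept {keep.sum()}/{n_vols} volumes; beta = {beta.round(3)}")
```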
Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies.
Taylor, Sandra L; Leiserowitz, Gary S; Kim, Kyoungmi
2013-12-01
Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Commonly, the data generated by mass spectrometry have many missing values, which arise when a compound is absent from a sample or is present at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes that all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound. We compare the power and estimation properties of a mixture model with those of an AFT model. Based on simulated data, we found the AFT model to have greater power to detect differences in means and point mass proportions between groups. However, the AFT model yielded biased estimates, with the bias increasing as the proportion of observations in the point mass increased, whereas estimates from the mixture model were unbiased unless all missing observations came from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrate this approach through an application to glycomics data from serum samples of women with ovarian cancer and matched controls.
Lunt, Mark
2015-07-01
In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors, assuming that the predictive factors were measured on an interval scale. This article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing the hypothesis that the outcome differs between groups to be tested. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing whether the effect of a given predictor differs between groups, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
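A minimal statsmodels sketch of the recommended approach follows (data invented): encode the group as dummy variables and test the slope difference directly through the interaction term, rather than comparing per-group p-values.

```python
# Minimal sketch: categorical predictor plus interaction in linear regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 200
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": rng.choice(["A", "B"], size=n),
})
slope = np.where(df["group"] == "A", 1.0, 2.0)   # effect of x differs by group
df["y"] = slope * df["x"] + rng.normal(size=n)

# C(group) creates dummy variables; the x:C(group) term tests the slope difference.
fit = smf.ols("y ~ x * C(group)", data=df).fit()
print(fit.summary().tables[1])
```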
Statistical analysis of nuclear power plant pump failure rate variability: some preliminary results
International Nuclear Information System (INIS)
Martz, H.F.; Whiteman, D.E.
1984-02-01
In-Plant Reliability Data System (IPRDS) pump failure data on over 60 selected pumps in four nuclear power plants are statistically analyzed using the Failure Rate Analysis Code (FRAC). A major purpose of the analysis is to determine which environmental, system, and operating factors adequately explain the variability in the failure data. Catastrophic, degraded, and incipient failure severity categories are considered for both demand-related and time-dependent failures. For catastrophic demand-related pump failures, the variability is explained by the following factors, listed in order of importance: system application, pump driver, operating mode, reactor type, pump type, and unidentified plant-specific influences. Quantitative failure rate adjustments are provided for the effects of these factors. In the case of catastrophic time-dependent pump failures, the failure rate variability is explained by three factors: reactor type, pump driver, and unidentified plant-specific influences. Finally, point and confidence interval failure rate estimates are provided for each selected pump by considering the influential factors. Both types of estimates represent an improvement over estimates computed exclusively from the data on each pump.
Directory of Open Access Journals (Sweden)
A. V. Nikitenko
2014-04-01
Full Text Available Purpose. To define and analyze the probabilistic and spectral characteristics of the random current in the regenerative braking mode of DC electric rolling stock. Methodology. Elements and methods of probability theory (particularly the theory of stationary and non-stationary processes) and methods of sampling theory are used for PC processing of the regenerated-current data arrays. Findings. The regenerated-current records were obtained from locomotives and trains on Ukrainian railways and from trams in Poland. It was established that the current exhibits both continuous and jump-like variations in time (especially in trams). For the random current in the regenerative braking mode, the functions of mathematical expectation, dispersion and standard deviation are calculated. Histograms, probabilistic characteristics and correlation functions are also calculated and plotted for this current. It was established that the current in the regenerative braking mode can be considered a stationary but non-ergodic process. Spectral analysis of these records and of the "tail part" of the correlation function revealed weak periodic (or low-frequency) components, known as interharmonics. Originality. Firstly, the theory of non-stationary random processes was adapted for the analysis of the recuperated current, which exhibits both continuous and jump-like variations in time. Secondly, the presence of interharmonics in the stochastic process of regenerated current was identified for the first time. Finally, the patterns of temporal change of the current correlation function are defined, which allows the correlation function method to be applied soundly in the identification of electric traction system devices. Practical value. The results of the probabilistic and statistical analysis of the recuperated current allow estimation of the quality of recovered energy and the energy quality indices of electric rolling stock in the
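A minimal NumPy sketch of the basic probabilistic characteristics named above (mean, standard deviation, sample autocorrelation) is given below, computed for a simulated current record containing a weak periodic component; all signal parameters are invented.

```python
# Minimal sketch: mean, std and sample autocorrelation of a simulated current.
import numpy as np

rng = np.random.default_rng(9)
t = np.arange(5000)
# Toy record: DC level + weak periodic (interharmonic-like) component + noise.
current = 100 + 5 * np.sin(2 * np.pi * t / 400) + rng.normal(0, 3, t.size)

mean, std = current.mean(), current.std(ddof=1)
x = current - mean
acf = np.correlate(x, x, mode="full")[x.size - 1:] / (x.var() * x.size)
print(f"mean={mean:.1f} A, std={std:.2f} A, acf at lag 400 = {acf[400]:.2f}")
```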
Measurements and statistical analyses of indoor radon concentrations in Tokyo and surrounding areas
International Nuclear Information System (INIS)
Sugiura, Shiroharu; Suzuki, Takashi; Inokoshi, Yukio
1995-01-01
Since the UNSCEAR report published in 1982, radiation exposure of the respiratory tract due to radon and its progeny has been regarded as the single largest contributor to the natural radiation exposure of the general public. In Japan, radon gas concentrations in many types of buildings have been surveyed by national and private institutes. We also measured radon gas concentrations in different types of residential buildings in Tokyo and its adjoining prefectures from October 1988 to September 1991 to evaluate the potential radiation risk to the people living there. One or two simplified passive radon monitors were set up in each of 34 residential buildings located in the above-mentioned area for exposure periods of 3 months each. Comparing the average concentrations in buildings of different materials and structures, those in concrete-steel buildings were always higher than those in wooden and prefabricated mortared buildings. Radon concentrations proved to be higher in autumn and winter and lower in spring and summer. Radon concentrations in an underground room of a concrete-steel building showed the highest values throughout our investigation, and statistically significant seasonal variation was detected by the X-11 method developed by the U.S. Bureau of the Census. The values measured in a first-floor room of the same concrete-steel building also showed seasonal variation, but with a different phase. A further multivariate analysis suggested that building material and structure are the most important factors affecting radon concentration levels, ahead of other factors such as the age of the building and the use of ventilators. (author)
Directory of Open Access Journals (Sweden)
Le Bao
2014-11-01
Full Text Available Endogenous retroviruses (ERVs are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species’ genome and determine which individuals have acquired each ERV by descent. Because well annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that are insensitive to differences between query and reference and that are amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations.
International Nuclear Information System (INIS)
2005-01-01
For the years 2004 and 2005 the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics published in the Energy Review are presented in more detail in a publication called Energy Statistics that comes out yearly. Energy Statistics also includes historical time series over a longer period of time (see e.g. Energy Statistics, Statistics Finland, Helsinki 2004). The applied energy units and conversion coefficients are shown in the back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supplies and total consumption of electricity (GWh), Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes, precautionary stock fees and oil pollution fees
Schulz, Marcus; Neumann, Daniel; Fleet, David M; Matthies, Michael
2013-12-01
During the last decades, marine pollution with anthropogenic litter has become a major worldwide environmental concern. Standardized monitoring of litter since 2001 on 78 beaches selected within the framework of the Convention for the Protection of the Marine Environment of the North-East Atlantic (OSPAR) has been used to identify temporal trends of marine litter. Based on statistical analyses of this dataset, a two-part multi-criteria evaluation system for beach litter pollution of the North-East Atlantic and the North Sea is proposed. Canonical correlation analyses, linear regression analyses, and non-parametric analyses of variance were used to identify different temporal trends. A classification of beaches was derived from cluster analyses and served to define different states of beach quality according to the abundances of 17 input variables. The evaluation system is easily applicable and relies on the above-mentioned classification and on significant temporal trends implied by significant rank correlations. Copyright © 2013 Elsevier Ltd. All rights reserved.
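The rank-correlation trend criterion reduces to something like the following SciPy sketch (litter counts invented): a significant Spearman correlation between year and abundance indicates a monotonic temporal trend.

```python
# Minimal sketch: monotonic trend detection via Spearman rank correlation.
from scipy import stats

years = list(range(2001, 2013))
counts = [310, 295, 330, 280, 260, 255, 270, 230, 220, 215, 200, 190]  # invented

rho, p = stats.spearmanr(years, counts)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")  # negative rho: decreasing trend
```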
International Nuclear Information System (INIS)
2001-01-01
For the year 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity consumption, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions from the use of fossil fuels, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in 2000, Energy exports by recipient country in 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes and precautionary stock fees on oil products
International Nuclear Information System (INIS)
2000-01-01
For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity consumption, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-March 2000, Energy exports by recipient country in January-March 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes and precautionary stock fees on oil products
International Nuclear Information System (INIS)
1999-01-01
For the years 1998 and 1999, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity consumption, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-June 1999, Energy exports by recipient country in January-June 1999, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes and precautionary stock fees on oil products
Jäger, Jörg M; Schöllhorn, Wolfgang I
2012-04-01
Offensive and defensive systems of play represent important aspects of team sports. They include the players' positions in certain situations during a match, i.e., when players have to be at specific positions on the court. Patterns of play emerge from the formations of the players on the court. Recognition of these patterns is important in order to react adequately and to adjust one's own strategy to the opponent. Furthermore, the ability to apply variable patterns of play seems promising, since variability makes it harder for the opponent to adjust. The purpose of this study is to identify different team tactical patterns in volleyball and to analyze differences in their variability. Overall, 120 standard situations of six national teams in women's volleyball are analyzed during a world championship tournament. Twenty situations from each national team are chosen, including the base defence position (start configuration) and the two-player block with middle back deep (end configuration). The shapes of the defence formations at the start and end configurations for each national team, as well as the variability of these defence formations, are statistically analyzed. Furthermore, these shape data are used to train multilayer perceptrons in order to test whether artificial neural networks can recognize the teams by their tactical patterns. Results show significant differences between the national teams in both the base defence position at the start and the two-player block with middle back deep at the end of the standard defence situation. Furthermore, the national teams show significant differences in the variability of their defence systems, and start positions are more variable than end positions. Multilayer perceptrons recognize the teams with an average accuracy of 98.5%. It is concluded that defence systems in team sports are highly individual at a competitive level and variable even in standard situations. Artificial neural networks can be used to recognize
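A toy scikit-learn sketch of the classification step follows; the real study used formation-shape descriptors, whereas here the 12-dimensional feature vectors and the team separation are invented for illustration.

```python
# Toy sketch: recognizing teams from formation feature vectors with an MLP.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
n_per_team, n_teams = 20, 6
# Each situation: x/y court coordinates of 6 players, flattened to 12 features.
X = np.vstack([rng.normal(loc=team, scale=0.3, size=(n_per_team, 12))
               for team in range(n_teams)])
y = np.repeat(np.arange(n_teams), n_per_team)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
print("accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))
```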
Analysing the Severity and Frequency of Traffic Crashes in Riyadh City Using Statistical Models
Directory of Open Access Journals (Sweden)
Saleh Altwaijri
2012-12-01
Full Text Available Traffic crashes in Riyadh city cause losses in the form of deaths, injuries and property damage, in addition to the pain and social tragedy affecting the victims' families. In 2005, a total of 47,341 injury traffic crashes occurred in Riyadh city (19% of all KSA crashes), and 9% of those crashes were severe. Road safety in Riyadh city may have been adversely affected by: high car ownership, migration of people to Riyadh city, about 6 million daily trips, a high rate of income, low-cost petrol, drivers of different nationalities, young drivers, and tremendous growth in population, which creates a high level of mobility and transport activity in the city. The primary objective of this paper is therefore to explore factors affecting the severity and frequency of road crashes in Riyadh city using appropriate statistical models, aiming to establish effective safety policies ready to be implemented to reduce the severity and frequency of road crashes in the city. Crash data for Riyadh city were collected from the Higher Commission for the Development of Riyadh (HCDR) for a period of five years, from 1425H to 1429H (roughly corresponding to 2004-2008). Crash data were classified into three categories: fatal, serious-injury and slight-injury. Two nominal response models have been developed: a standard multinomial logit model (MNL) and a mixed logit model applied to injury-related crash data. Due to a severe underreporting problem for slight-injury crashes, binary and mixed binary logistic regression models were also estimated for two severity categories: fatal and serious crashes. For frequency, two count models such as Negative Binomial (NB) models were employed, and the unit of analysis was 168 HAIs (wards) in Riyadh city. Ward-level crash data are disaggregated by severity of the crash (such as fatal and serious-injury crashes). The results from both multinomial and binary response models are found to be fairly consistent but
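The two model families can be sketched with statsmodels as below; all covariates, probabilities and counts are invented, and the mixed-logit extension is omitted from this simple illustration.

```python
# Minimal sketch: multinomial logit for severity, negative binomial for counts.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 500
df = pd.DataFrame({"speed": rng.normal(80, 15, n),
                   "night": rng.integers(0, 2, n)})

# Severity: 0 = slight, 1 = serious, 2 = fatal (probabilities are arbitrary).
severity = rng.choice([0, 1, 2], size=n, p=[0.7, 0.2, 0.1])
mnl = sm.MNLogit(severity, sm.add_constant(df)).fit(disp=False)

# Ward-level frequency: negative binomial GLM over 168 invented areas.
wards = pd.DataFrame({"pop": rng.uniform(5, 50, 168)})
counts = rng.poisson(wards["pop"] * 2)
nb = sm.GLM(counts, sm.add_constant(wards),
            family=sm.families.NegativeBinomial()).fit()

print(mnl.params.shape, nb.params.round(3).tolist())
```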
Cardiac arrhythmia detection using combination of heart rate variability analyses and PUCK analysis.
Mahananto, Faizal; Igasaki, Tomohiko; Murayama, Nobuki
2013-01-01
This paper presents cardiac arrhythmia detection using the combination of a heart rate variability (HRV) analysis and a "potential of unbalanced complex kinetics" (PUCK) analysis. Detection performance was improved by adding features extracted from the PUCK analysis. Initially, R-R interval data were extracted from the original electrocardiogram (ECG) recordings, cut into small segments and marked as either normal or arrhythmia. HRV analyses were then conducted using the segmented R-R interval data, including a time-domain analysis, a frequency-domain analysis, and a nonlinear analysis. In addition to the HRV analysis, a PUCK analysis, which has previously been applied successfully to foreign exchange market series to characterize change, was employed. A decision-tree algorithm was applied to all of the obtained features for classification. The proposed method was tested using the MIT-BIH arrhythmia database and had an overall classification accuracy of 91.73%. After combining features obtained from the PUCK analysis, the overall accuracy increased to 92.91%. Therefore, we suggest that the use of a PUCK analysis in conjunction with HRV analysis might improve performance accuracy for the detection of cardiac arrhythmia.
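The final classification step amounts to a decision tree over segment-level features; a minimal scikit-learn sketch with stand-in features (random numbers, not real HRV or PUCK quantities) and a toy labelling rule is below.

```python
# Minimal sketch: decision tree over segment-level features (stand-in data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(12)
n_segments = 400
hrv_features = rng.normal(size=(n_segments, 8))   # time/frequency/nonlinear HRV
puck_feature = rng.normal(size=(n_segments, 1))   # PUCK-derived feature
X = np.hstack([hrv_features, puck_feature])
# Toy label rule so the tree has something to learn (0 = normal, 1 = arrhythmia).
y = (X[:, 0] + X[:, 8] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=5, random_state=0)
print("CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
```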
Robust Machine Learning Variable Importance Analyses of Medical Conditions for Health Care Spending.
Rose, Sherri
2018-03-11
To propose nonparametric double robust machine learning in variable importance analyses of medical conditions for health spending. 2011-2012 Truven MarketScan database. I evaluate how much more, on average, commercially insured enrollees with each of 26 of the most prevalent medical conditions cost per year after controlling for demographics and other medical conditions. This is accomplished within the nonparametric targeted learning framework, which incorporates ensemble machine learning. Previous literature studying the impact of medical conditions on health care spending has almost exclusively focused on parametric risk adjustment; thus, I compare my approach to parametric regression. My results demonstrate that multiple sclerosis, congestive heart failure, severe cancers, major depression and bipolar disorders, and chronic hepatitis are the most costly medical conditions on average per individual. These findings differed from those obtained using parametric regression. The literature may be underestimating the spending contributions of several medical conditions, which is a potentially critical oversight. If current methods are not capturing the true incremental effect of medical conditions, undesirable incentives related to care may remain. Further work is needed to directly study these issues in the context of federal formulas. © Health Research and Educational Trust.
International Nuclear Information System (INIS)
2003-01-01
For the year 2002, part of the figures shown in the tables of the Energy Review are preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 2001, Statistics Finland, Helsinki 2002). The applied energy units and conversion coefficients are shown in the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supply and total consumption of electricity (GWh), Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Excise taxes, precautionary stock fees and oil pollution fees on energy products
International Nuclear Information System (INIS)
2004-01-01
For the years 2003 and 2004, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot, Statistics Finland, Helsinki 2003, ISSN 0785-3165). The applied energy units and conversion coefficients are shown in the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supplies and total consumption of electricity (GWh), Energy imports by country of origin in January-March 2004, Energy exports by recipient country in January-March 2004, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Excise taxes, precautionary stock fees and oil pollution fees
International Nuclear Information System (INIS)
2000-01-01
For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity consumption, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-June 2000, Energy exports by recipient country in January-June 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes and precautionary stock fees on oil products
Forbes, Valery E; Aufderheide, John; Warbritton, Ryan; van der Hoeven, Nelly; Caspers, Norbert
2007-03-01
This study presents results of the effects of bisphenol A (BPA) on adult egg production, egg hatchability, egg development rates and juvenile growth rates in the freshwater gastropod, Marisa cornuarietis. We observed no adult mortality, substantial inter-snail variability in reproductive output, and no effects of BPA on reproduction during 12 weeks of exposure to 0, 0.1, 1.0, 16, 160 or 640 microg/L BPA. We observed no effects of BPA on egg hatchability or timing of egg hatching. Juveniles showed good growth in the control and all treatments, and there were no significant effects of BPA on this endpoint. Our results do not support previous claims of enhanced reproduction in Marisa cornuarietis in response to exposure to BPA. Statistical power analysis indicated high levels of inter-snail variability in the measured endpoints and highlighted the need for sufficient replication when testing treatment effects on reproduction in M. cornuarietis with adequate power.
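The replication point can be made concrete with a standard power calculation; the sketch below uses statsmodels, and the effect size, power and α are assumptions for illustration, not values from the study.

```python
# Minimal sketch: sample size per group for a two-sample t-test at 80% power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(f"required subjects per treatment group: {n_per_group:.0f}")  # about 64
```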
Landon, Matthew K.; Burton, Carmen A.; Davis, Tracy A.; Belitz, Kenneth; Johnson, Tyler D.
2014-01-01
The variables affecting the occurrence of hydrocarbons in aquifers used for public supply in California were assessed based on statistical evaluation of three large statewide datasets; gasoline oxygenates also were analyzed for comparison with hydrocarbons. Benzene is the most frequently detected (1.7%) compound among 17 hydrocarbons analyzed at generally low concentrations (median detected concentration 0.024 μg/l) in groundwater used for public supply in California; methyl tert-butyl ether (MTBE) is the most frequently detected (5.8%) compound among seven oxygenates analyzed (median detected concentration 0.1 μg/l). At aquifer depths used for public supply, hydrocarbons and MTBE rarely co-occur and are generally related to different variables; in shallower groundwater, co-occurrence is more frequent and there are similar relations to the density or proximity of potential sources. Benzene concentrations are most strongly correlated with reducing conditions, regardless of groundwater age and depth. Multiple lines of evidence indicate that benzene and other hydrocarbons detected in old, deep, and/or brackish groundwater result from geogenic sources of oil and gas. However, in recently recharged (since ~1950), generally shallower groundwater, higher concentrations and detection frequencies of benzene and hydrocarbons were associated with a greater proportion of commercial land use surrounding the well, likely reflecting effects of anthropogenic sources, particularly in combination with reducing conditions.
Multivariate statistical analysis of radioactive variables in two phosphate ores from Sudan
International Nuclear Information System (INIS)
Adam, Abdel Majid A.; Eltayeb, Mohamed Ahmed H.
2012-01-01
Multivariate statistical techniques are efficient ways to display complex relationships among many objects. An attempt was made to study the radioactive data in two types of Sudanese phosphate deposits, Kurun and Uro phosphate, using several multivariate statistical methods. The Pearson correlation coefficient revealed that the U-238 distribution in Kurun phosphate is controlled by the variation of K-40 concentration, whereas in Uro phosphate it is controlled by the variation of U-235 and U-234 concentration. Histograms and normal Q–Q plots clearly show that the radioactive variables did not follow a normal distribution. This non-normality may be attributed to the complicating influence of geological factors. Principal components analysis (PCA) gives a model of five components for representing the acquired data from Kurun phosphate, where 89.5% of the total variance is explained. A model of four components was sufficient to represent the acquired data from Uro phosphate, where 87.5% of the total data variance is explained. Hierarchical cluster analysis (HCA) indicates that U-238 behaves in the same manner in the two types of phosphate; it is associated with a group of four radionuclides (U-234, Po-210, Ra-226 and Th-230), which are the most abundant radionuclides and all belong to the uranium-238 decay series. Two parameters have been adopted for direct differentiation between the two phosphates. Firstly, U-238 in Uro phosphate shows a higher degree of mobility (CV% = 82.6) than that in Kurun phosphate (CV% = 64.7), and secondly, the activity ratio of Th-230/Th-232 in Uro phosphate is nine times that in Kurun phosphate. - Highlights: ► Multivariate statistical techniques were used to characterize radioactive data. ► U-238 in Uro phosphate shows higher degree of mobility (CV% = 82.6). ► U-238 in Kurun phosphate shows lower degree of mobility (CV% = 64.7). ► The radioactive variables did not follow a normal distribution. ► The ratio of Th
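A minimal sketch of the PCA step described above, assuming a hypothetical activity matrix with samples as rows and radionuclides as columns (the dimensions and values are illustrative, not the study's data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=0.5, size=(40, 9))  # 40 samples, 9 radionuclides

# Standardize first, since activities span very different scales.
Xs = StandardScaler().fit_transform(X)

pca = PCA(n_components=5)
scores = pca.fit_transform(Xs)
print("variance explained by 5 components:",
      round(pca.explained_variance_ratio_.sum(), 3))
```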
Lesmerises, Rémi; Rebouillat, Lucie; Dussault, Claude; St-Laurent, Martin-Hugues
2015-01-01
Studying diet is fundamental to animal ecology, and scat analysis, a widespread approach, is considered a reliable dietary proxy. Nonetheless, this method has weaknesses such as non-random sampling of habitats and individuals, inaccurate evaluation of excretion date, and lack of assessment of inter-individual dietary variability. We coupled GPS telemetry and scat analyses of black bears Ursus americanus Pallas to relate diet to individual characteristics and habitat use patterns while foraging. We captured 20 black bears (6 males and 14 females) and fitted them with GPS/Argos collars. We then surveyed GPS locations shortly after individual bear visits and collected 139 feces in 71 different locations. Fecal content (relative dry matter biomass of ingested items) was subsequently linked to individual characteristics (sex, age, reproductive status) and to habitats visited during foraging bouts using Brownian bridges based on GPS locations prior to feces excretion. At the population level, diet composition was similar to what was previously described in studies on black bears. However, our individual-based method allowed us to highlight different intra-population patterns, showing that sex and female reproductive status had a significant influence on individual diet. For example, in the same habitats, females with cubs did not use the same food sources as lone bears. Linking fecal content (i.e., food sources) to habitat previously visited by different individuals, we demonstrated a potential differential use of similar habitats dependent on individual characteristics. Females with cubs-of-the-year tended to use old forest clearcuts (6-20 years old) to feed on bunchberry, whereas females with yearlings foraged for blueberry and lone bears for ants. Coupling GPS telemetry and scat analyses allows for efficient detection of inter-individual or inter-group variations in foraging strategies and of linkages between previous habitat use and food consumption, even for cryptic
Energetic and exergetic analyses of a variable compression ratio spark ignition gas engine
International Nuclear Information System (INIS)
Javaheri, A.; Esfahanian, V.; Salavati-Zadeh, A.; Darzi, M.
2014-01-01
Highlights: • Effects of CR and λ on CNG SI ICE 1st and 2nd law analyses are experimentally studied. • The performance of pure methane and a real CNG are observed and compared. • The ratio of actual to Otto cycle thermal efficiencies is 0.78 for all cases. • At least 25.5% of destructed availability is due to combustion irreversibility. • With decrease in methane content, CNG shows more combustion irreversibility. - Abstract: Considering the significance of obtaining higher efficiencies from internal combustion engines (ICE) along with the growing role of natural gas as a fuel, the present work explores the effects of compression ratio (CR hereafter) and air/fuel equivalence ratio (AFER hereafter) on the energy and exergy potentials in a gas-fueled spark ignition internal combustion engine. Experiments are carried out using a single cylinder, port injection, water cooled, variable compression ratio (VCR hereafter), spark ignition engine at a constant engine speed of 2000 rpm. The study involves CRs of 12, 14 and 16 and 10 AFERs between 0.8 and 1.25. Pure methane is utilized for the analysis. In addition, a natural gas blend with the minimum methane content among Iranian gas sources is also tested in order to investigate the effect of real natural gas on the findings. The energy analysis involves input fuel power, indicated power, and losses due to the high temperature of exhaust gases and their unburned content, blow-by and heat loss. The exergy analysis covers availability input and piston, exhaust, and loss availabilities along with destructed entropy. The analysis indicates an increase in the ratio of thermo-mechanical exhaust availability to fuel availability with CR, with a maximum near stoichiometry, whereas chemical exhaust exergy is shown to be independent of CR and to decrease with AFER. In addition, the ratio of actual cycle to Otto cycle thermal efficiencies is shown to be roughly constant (about 0.784) with changing CR
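The reported ratio of actual to air-standard Otto cycle efficiency (~0.78) can be turned into a quick estimate of actual thermal efficiency at each tested CR; the sketch below assumes γ = 1.4 for air, which is not a value from the study.

```python
# Air-standard Otto cycle efficiency: eta = 1 - CR**(1 - gamma).
gamma = 1.4                  # assumed specific heat ratio for air
ratio_actual_to_otto = 0.78  # reported in the highlights above

for cr in (12, 14, 16):
    eta_otto = 1.0 - cr ** (1.0 - gamma)
    print(f"CR={cr}: Otto eta={eta_otto:.3f}, "
          f"estimated actual eta={ratio_actual_to_otto * eta_otto:.3f}")
```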
Mendez, F. J.; Rueda, A.; Barnard, P.; Mori, N.; Nakajo, S.; Espejo, A.; del Jesus, M.; Diez Sierra, J.; Cofino, A. S.; Camus, P.
2016-02-01
Hurricanes hitting California have a very low occurrence probability due to typically cool ocean temperatures and westward tracks. However, the damage associated with these improbable events would be dramatic in Southern California, and understanding the oceanographic and atmospheric drivers is of paramount importance for coastal risk management for present and future climates. A statistical analysis of the historical events is very difficult due to the limited resolution of the available atmospheric and oceanographic forcing data. In this work, we propose a combination of: (a) statistical downscaling methods (Espejo et al, 2015); and (b) a synthetic stochastic tropical cyclone (TC) model (Nakajo et al, 2014). To build the statistical downscaling model, Y=f(X), we apply a combination of principal component analysis and the k-means classification algorithm to find representative patterns from a potential TC index derived from large-scale SST fields in the Eastern Central Pacific (predictor X) and the associated tropical cyclone occurrence (predictand Y). SST data come from NOAA Extended Reconstructed SST V3b, providing information from 1854 to 2013 on a 2.0 degree x 2.0 degree global grid. As data for the historical occurrence and paths of tropical cyclones are scarce, we apply a stochastic TC model based on a Monte Carlo simulation of the joint distribution of track, minimum sea level pressure and translation speed of the historical events in the Eastern Central Pacific Ocean. Results will show the ability of the approach to explain seasonal-to-interannual variability of the predictor X, which is clearly related to the El Niño Southern Oscillation. References Espejo, A., Méndez, F.J., Diez, J., Medina, R., Al-Yahyai, S. (2015) Seasonal probabilistic forecasting of tropical cyclone activity in the North Indian Ocean, Journal of Flood Risk Management, DOI: 10.1111/jfr3.12197 Nakajo, S., N. Mori, T. Yasuda, and H. Mase (2014) Global Stochastic Tropical Cyclone Model Based on
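A hedged sketch of the predictor construction described above (PCA to compress SST fields, then k-means to extract representative patterns); the SST array is synthetic, whereas the real input would be the ERSST V3b monthly fields flattened to a (time, grid points) matrix:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
sst = rng.normal(size=(1900, 500))  # placeholder for ~160 years of monthly fields

pcs = PCA(n_components=10).fit_transform(sst)  # leading modes of variability
labels = KMeans(n_clusters=9, n_init=10, random_state=0).fit_predict(pcs)
print("pattern occupancy:", np.bincount(labels))
```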
A Stochastic Model of Space-Time Variability of Tropical Rainfall: I. Statistics of Spatial Averages
Kundu, Prasun K.; Bell, Thomas L.; Lau, William K. M. (Technical Monitor)
2002-01-01
Global maps of rainfall are of great importance in connection with modeling of the earth's climate. Comparison between the maps of rainfall predicted by computer-generated climate models and observation provides a sensitive test for these models. To make such a comparison, one typically needs the total precipitation amount over a large area, which could be hundreds of kilometers in size, over extended periods of time of order days or months. This presents a difficult problem since rain varies greatly from place to place as well as in time. Remote sensing methods using ground radar or satellites detect rain over a large area by essentially taking a series of snapshots at infrequent intervals and indirectly deriving the average rain intensity within a collection of pixels, usually several kilometers in size. They measure the area average of rain at a particular instant. Rain gauges, on the other hand, record rain accumulation continuously in time but only over a very small area tens of centimeters across, say, the size of a dinner plate. They measure only a time average at a single location. In making use of either method one needs to fill in the gaps in the observation - either the gaps in the area covered or the gaps in time of observation. This involves using statistical models to obtain information about the rain that is missed from what is actually detected. This paper investigates such a statistical model and validates it with rain data collected over the tropical Western Pacific from ship-borne radars during TOGA COARE (Tropical Ocean Global Atmosphere Coupled Ocean-Atmosphere Response Experiment). The model incorporates a number of commonly observed features of rain. While rain varies rapidly with location and time, the variability diminishes when averaged over larger areas or longer periods of time. Moreover, rain is patchy in nature - at any instant on the average only a certain fraction of the observed pixels contain rain. The fraction of area covered by
The number of subjects per variable required in linear regression analyses
P.C. Austin (Peter); E.W. Steyerberg (Ewout)
2015-01-01
Objectives To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression
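A minimal Monte Carlo sketch in the spirit of the study design: vary the subjects-per-variable (SPV) ratio and track the accuracy of estimated regression coefficients. The true model, noise level, and SPV values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
p = 10
beta = np.ones(p)  # assumed true coefficients

for spv in (2, 5, 10, 50):
    n = spv * p
    errs = []
    for _ in range(500):
        X = rng.normal(size=(n, p))
        y = X @ beta + rng.normal(scale=2.0, size=n)
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS estimate
        errs.append(np.mean((bhat - beta) ** 2))
    print(f"SPV={spv:3d}: mean squared coefficient error = {np.mean(errs):.3f}")
```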
Energy Technology Data Exchange (ETDEWEB)
Yang, Jinzhong, E-mail: jyang4@mdanderson.org [Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas (United States); Woodward, Wendy A.; Reed, Valerie K.; Strom, Eric A.; Perkins, George H.; Tereffe, Welela; Buchholz, Thomas A. [Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas (United States); Zhang, Lifei; Balter, Peter; Court, Laurence E. [Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas (United States); Li, X. Allen [Department of Radiation Oncology, Medical College of Wisconsin, Milwaukee, Wisconsin (United States); Dong, Lei [Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas (United States); Scripps Proton Therapy Center, San Diego, California (United States)
2014-05-01
Purpose: To develop a new approach for interobserver variability analysis. Methods and Materials: Eight radiation oncologists specializing in breast cancer radiation therapy delineated a patient's left breast “from scratch” and from a template that was generated using deformable image registration. Three of the radiation oncologists had previously received training in the Radiation Therapy Oncology Group (RTOG) consensus contouring atlas for breast cancer. The simultaneous truth and performance level estimation (STAPLE) algorithm was applied to the 8 contours delineated “from scratch” to produce a group consensus contour. Individual Jaccard scores were fitted to a beta distribution model. We also applied this analysis to 2 more patients, who were contoured by 9 breast radiation oncologists from 8 institutions. Results: The beta distribution model had a mean of 86.2%, standard deviation (SD) of ±5.9%, a skewness of −0.7, and excess kurtosis of 0.55, exemplifying broad interobserver variability. The 3 RTOG-trained physicians had higher agreement scores than average, indicating that their contours were close to the group consensus contour. One physician had high sensitivity but lower specificity than the others, which implies that this physician tended to contour a structure larger than those of the others. Two other physicians had low sensitivity but specificity similar to the others, which implies that they tended to contour a structure smaller than the others. With this information, they could adjust their contouring practice to be more consistent with others if desired. When contouring from the template, the beta distribution model had a mean of 92.3%, SD of ±3.4%, skewness of −0.79, and excess kurtosis of 0.83, which indicated a much better consistency among individual contours. Similar results were obtained for the analysis of 2 additional patients. Conclusions: The proposed statistical approach was able to measure interobserver variability quantitatively
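Fitting a beta distribution to individual Jaccard agreement scores, as done above, can be sketched as follows; the scores are hypothetical stand-ins for the eight physicians' values.

```python
import numpy as np
from scipy import stats

jaccard = np.array([0.91, 0.88, 0.79, 0.86, 0.90, 0.84, 0.82, 0.89])  # hypothetical

# Fix location and scale so the support is exactly [0, 1].
a, b, loc, scale = stats.beta.fit(jaccard, floc=0, fscale=1)

mean = a / (a + b)
sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(f"fitted beta: a={a:.2f}, b={b:.2f}, mean={mean:.3f}, SD={sd:.3f}")
```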
Sea Surface Height Variability and Eddy Statistical Properties in the Red Sea
Zhan, Peng
2013-05-01
Satellite sea surface height (SSH) data over 1992-2012 are analyzed to study the spatial and temporal variability of sea level in the Red Sea. Empirical orthogonal function (EOF) analysis suggests a remarkable seasonality of SSH in the Red Sea, and a significant correlation is found between SSH variation and the seasonal wind cycle. A winding-angle based eddy identification algorithm is employed to derive mesoscale eddy information from the SSH data. In total, more than 5500 eddies were detected, belonging to 2583 eddy tracks. Statistics suggest that eddies are generated over the entire Red Sea, with two regions of high eddy frequency in the central basin. 76% of the detected eddies have a radius ranging from 40 km to 100 km, and both intensity and absolute vorticity decrease with eddy radius. The average eddy lifespan is about 5 weeks, and eddies with longer lifespans tend to have larger radii but less intensity. Deformation rates differ between anticyclonic eddies (AEs) and cyclonic eddies (CEs); eddies with higher intensity appear to be less deformed and more circular. Inspection of the 84 long-lived eddies suggests that AEs tend to move a little more northward than CEs. AE generation during summer is clearly lower than during other seasons, while CE generation is higher during spring and summer. Other features of AEs and CEs are similar, with both vorticity and intensity reaching summer peaks in August and winter peaks in January. Inter-annual variability reveals that the eddies in the Red Sea are isolated from global events. The eddy property tendencies differ between the southern and northern basins, both of which exhibit a two-year cycle. With a correlation coefficient of -0.91, the Brunt–Väisälä frequency is negatively correlated with eddy kinetic energy (EKE), which results from AE activities in the high eddy frequency region. Climatological vertical velocity shear variation is identical with EKE except in the autumn, suggesting the
Laurinaviciene, Aida; Plancoulaine, Benoit; Baltrusaityte, Indra; Meskauskas, Raimundas; Besusparis, Justinas; Lesciute-Krilaviciene, Daiva; Raudeliunas, Darius; Iqbal, Yasir; Herlin, Paulette; Laurinavicius, Arvydas
2014-01-01
Digital immunohistochemistry (IHC) is one of the most promising applications brought by new generation image analysis (IA). While conventional IHC staining quality is monitored by semi-quantitative visual evaluation of tissue controls, IA may require more sensitive measurement. We designed an automated system to digitally monitor IHC multi-tissue controls, based on SQL-level integration of the laboratory information system (LIS) with image and statistical analysis tools. Consecutive sections of a TMA containing 10 cores of breast cancer tissue were used as tissue controls in routine Ki67 IHC testing. The Ventana slide label barcode ID was sent to the LIS to register the serial section sequence. The slides were stained and scanned (Aperio ScanScope XT), and IA was performed with the Aperio/Leica Colocalization and Genie Classifier/Nuclear algorithms. SQL-based integration ensured automated statistical analysis of the IA data by the SAS Enterprise Guide project. Factor analysis and plot visualizations were performed to explore slide-to-slide variation of the Ki67 IHC staining results in the control tissue. Slide-to-slide intra-core IHC staining analysis revealed rather significant variation in the variables reflecting the sample size, while Brown and Blue Intensity were relatively stable. To further investigate this variation, the IA results from the 10 cores were aggregated to minimize tissue-related variance. Factor analysis revealed an association between the variables reflecting the sample size detected by IA and Blue Intensity. Since the main feature to be extracted from the tissue controls was staining intensity, we further explored the variation of the intensity variables in the individual cores. MeanBrownBlue Intensity ((Brown+Blue)/2) and DiffBrownBlue Intensity (Brown−Blue) were introduced to better contrast the absolute intensity and the colour balance variation in each core; relevant factor scores were extracted. Finally, tissue-related factors of IHC staining variance were
Palazón, L; Navas, A
2017-06-01
Information on sediment contribution and transport dynamics from contributing catchments is needed to develop management plans that tackle environmental problems related to the effects of fine sediment, such as reservoir siltation. In this respect, the fingerprinting technique is an indirect technique known to be valuable and effective for sediment source identification in river catchments. Large variability in sediment delivery was found in previous studies in the Barasona catchment (1509 km², Central Spanish Pyrenees). Simulation results with SWAT and fingerprinting approaches identified badlands and agricultural uses as the main contributors to sediment supply in the reservoir. In this study, three statistical procedures for selecting the optimum composite fingerprint were assessed: (1) the Kruskal-Wallis H-test alone, (2) the Kruskal-Wallis H-test followed by discriminant function analysis, and (3) principal components analysis followed by discriminant function analysis. Source contribution results differed between the assessed options, with the greatest differences observed for option #3, the two-step process of principal components analysis and discriminant function analysis. The characteristics of the solutions of the applied mixing model and the conceptual understanding of the catchment showed that the most reliable solution was achieved using option #2, the two-step process of the Kruskal-Wallis H-test and discriminant function analysis. The assessment showed the importance of the statistical procedure used to define the optimum composite fingerprint for sediment fingerprinting applications. Copyright © 2016 Elsevier Ltd. All rights reserved.
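The first selection step named above can be sketched with a per-tracer Kruskal-Wallis H-test: a tracer property is retained for the composite fingerprint only if it discriminates among candidate sources. The source groups and values are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# One tracer property measured in three hypothetical source groups.
badlands = rng.normal(30, 4, 12)
agricultural = rng.normal(22, 4, 12)
forest = rng.normal(21, 4, 12)

h, p = stats.kruskal(badlands, agricultural, forest)
print(f"H={h:.2f}, p={p:.4f} ->", "keep tracer" if p < 0.05 else "drop tracer")
```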
International Nuclear Information System (INIS)
Darcel, C.; Davy, P.; Le Goc, R.; Dreuzy, J.R. de; Bour, O.
2009-11-01
Investigations led for several years at Laxemar and Forsmark reveal the large heterogeneity of geological formations and associated fracturing. This project aims at reinforcing the statistical DFN modeling framework adapted to a site scale. It therefore leads to developing quantitative methods of characterization adapted to the nature of fracturing and data availability. We start with the hypothesis that the maximum likelihood DFN model is a power-law model with a density term depending on orientations. This is supported both by the literature and, specifically here, by former analyses of the SKB data. This assumption is nevertheless thoroughly tested by analyzing the fracture trace and lineament maps. Fracture traces range roughly between 0.5 m and 10 m, i.e. the usual extension of the sample outcrops. Between the raw data and the final data used to compute the fracture size distribution from which the size distribution model will arise, several steps are necessary in order to correct the data for finite-size, topographical and sampling effects. More precisely, particular attention is paid to fracture segmentation status and fracture linkage consistent with the expected DFN model. The fracture scaling trend observed over both sites finally displays a shape parameter k_t close to 1.2 with a density term (α_2d) between 1.4 and 1.8. Only two outcrops clearly display a different trend, with k_t close to 3 and a density term (α_2d) between 2 and 3.5. The fracture lineaments spread over the range between 100 meters and a few kilometers. When compared with the fracture trace maps, these datasets are already interpreted and the linkage process developed previously does not have to be done. Except for the subregional lineament map from Forsmark, lineaments display a clear power-law trend with a shape parameter k_t equal to 3 and a density term between 2 and 4.5. The apparent variation in scaling exponent, from the outcrop scale (k_t = 1.2) on one side, to the lineament scale (k_t = 2) on
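A sketch of estimating a power-law size exponent from trace lengths above a cutoff, using the continuous maximum-likelihood (Hill) estimator; the synthetic lengths and 0.5 m cutoff mimic the outcrop censoring described above, and the estimator is a generic one, not necessarily the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
l_min = 0.5   # lower cutoff (m), as for the outcrop traces
a_true = 2.2  # assumed true exponent of n(l) ~ l**-a

# Inverse-CDF sampling from the truncated power law.
traces = l_min * (1.0 - rng.uniform(size=2000)) ** (-1.0 / (a_true - 1.0))

# Continuous MLE: a_hat = 1 + n / sum(ln(l_i / l_min)).
a_hat = 1.0 + traces.size / np.sum(np.log(traces / l_min))
print(f"estimated exponent a = {a_hat:.2f} (true {a_true})")
```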
Statistical modelling of variability in sediment-water nutrient and oxygen fluxes
Serpetti, Natalia; Witte, Ursula; Heath, Michael
2016-06-01
Organic detritus entering, or produced in, the marine environment is re-mineralised to inorganic nutrient in the seafloor sediments. The flux of dissolved inorganic nutrient between the sediment and overlying water column is a key process in the marine ecosystem, which binds the biogeochemical sub-system to the living food web. These fluxes are potentially affected by a wide range of physical and biological factors, and disentangling these is a significant challenge. Here we develop a set of generalized additive models (GAMs) of nitrate, nitrite, ammonia, phosphate, silicate and oxygen fluxes, based on a year-long campaign of field measurements off the north-east coast of Scotland. We show that sediment grain size, turbidity due to sediment re-suspension, temperature, and biogenic matter content were the key factors affecting oxygen consumption, ammonia and silicate fluxes. However, phosphate fluxes were only related to suspended sediment concentrations, whilst nitrate fluxes showed no clear relationship to any of the expected drivers of change, probably due to the effects of denitrification. Our analyses show that the stoichiometry of nutrient regeneration in the ecosystem is not necessarily constant and may be affected by combinations of processes. We anticipate that our statistical modelling results will form the basis for testing the functionality of process-based mathematical models of whole-sediment biogeochemistry.
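A hedged sketch of the GAM structure described above, written with the pygam package; the predictors, data, and response are placeholders, not the study's measurements.

```python
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([
    rng.uniform(50, 500, n),  # hypothetical median grain size (um)
    rng.uniform(5, 20, n),    # hypothetical temperature (deg C)
    rng.uniform(0, 30, n),    # hypothetical turbidity proxy
])
y = 0.01 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=1.0, size=n)  # e.g., O2 flux

# One smooth term per predictor, as in an additive model of fluxes.
gam = LinearGAM(s(0) + s(1) + s(2)).fit(X, y)
gam.summary()
```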
Energy Technology Data Exchange (ETDEWEB)
Berthier, G [Commissariat a l' Energie Atomique, Saclay (France). Centre d' Etudes Nucleaires
1965-07-01
By analysing the transient thermal regime produced by quenching a sample in a furnace maintained at very high temperature, it is possible to study the thermal diffusivity of certain materials and solid-state structural transformations, from a qualitative as well as a quantitative standpoint. For instance, the transformation energy of α-quartz into β-quartz and the Wigner energy stored within neutron-irradiated beryllium oxide have been measured. (author)
Statistical variability and confidence intervals for planar dose QA pass rates
Energy Technology Data Exchange (ETDEWEB)
Bailey, Daniel W.; Nelms, Benjamin E.; Attwood, Kristopher; Kumaraswamy, Lalith; Podgorsak, Matthew B. [Department of Physics, State University of New York at Buffalo, Buffalo, New York 14260 (United States) and Department of Radiation Medicine, Roswell Park Cancer Institute, Buffalo, New York 14263 (United States); Canis Lupus LLC, Merrimac, Wisconsin 53561 (United States); Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, New York 14263 (United States); Department of Radiation Medicine, Roswell Park Cancer Institute, Buffalo, New York 14263 (United States); Department of Radiation Medicine, Roswell Park Cancer Institute, Buffalo, New York 14263 (United States); Department of Molecular and Cellular Biophysics and Biochemistry, Roswell Park Cancer Institute, Buffalo, New York 14263 (United States) and Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214 (United States)
2011-11-15
Purpose: The most common metric for comparing measured to calculated dose, such as for pretreatment quality assurance of intensity-modulated photon fields, is a pass rate (%) generated using percent difference (%Diff), distance-to-agreement (DTA), or some combination of the two (e.g., gamma evaluation). For many dosimeters, the grid of analyzed points corresponds to an array with a low areal density of point detectors. In these cases, the pass rates for any given comparison criteria are not absolute but exhibit statistical variability that is a function, in part, of the detector sampling geometry. In this work, the authors analyze the statistics of various methods commonly used to calculate pass rates and propose methods for establishing confidence intervals for pass rates obtained with low-density arrays. Methods: Dose planes were acquired for 25 prostate and 79 head and neck intensity-modulated fields via diode array and electronic portal imaging device (EPID), and matching calculated dose planes were created via a commercial treatment planning system. Pass rates for each dose plane pair (both centered to the beam central axis) were calculated with several common comparison methods: %Diff/DTA composite analysis and gamma evaluation, using absolute dose comparison with both local and global normalization. Specialized software was designed to selectively sample the measured EPID response (very high data density) down to discrete points to simulate low-density measurements. The software was used to realign the simulated detector grid at many simulated positions with respect to the beam central axis, thereby altering the low-density sampled grid. Simulations were repeated with 100 positional iterations using a 1 detector/cm² uniform grid, a 2 detector/cm² uniform grid, and similar random detector grids. For each simulation, %/DTA composite pass rates were calculated with various %Diff/DTA criteria and for both local and global %Diff normalization
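As a simple point of comparison for the confidence intervals discussed above, a pass rate can be treated as a binomial proportion over analyzed points; this ignores the spatial correlation and grid-position effects that the paper is precisely concerned with, so it is only a naive baseline.

```python
from statsmodels.stats.proportion import proportion_confint

n_points = 400   # hypothetical points analyzed on a low-density array
n_passing = 372  # hypothetical points meeting the %Diff/DTA criteria

rate = n_passing / n_points
ci_lo, ci_hi = proportion_confint(n_passing, n_points, alpha=0.05, method="wilson")
print(f"pass rate {100 * rate:.1f}%, "
      f"naive 95% CI [{100 * ci_lo:.1f}%, {100 * ci_hi:.1f}%]")
```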
Santos, W B; Matoso, J M D; Maltez, M; Gonçalves, T; Casanova, M; Moreira, I F H; Lourenço, R A; Monteiro, W D; Farinatti, P T V; Soares, P P; Oigman, W; Neves, M F T; Correia, M L G
2015-08-01
Systolic hypertension is associated with cognitive decline in the elderly. Altered blood pressure (BP) variability is a possible mechanism of reduced cognitive performance in elderly hypertensives. We hypothesized that altered beat-to-beat systolic BP variability is associated with reduced global cognitive performance in elderly hypertensive subjects. In exploratory analyses, we also studied the correlation between diverse discrete cognitive domains and indices of systolic BP and heart rate variability. Disproving our initial hypothesis, we have shown that hypertension and low education, but not indices of systolic BP and heart rate variability, were independent predictors of lower global cognitive performance. However, exploratory analyses showed that systolic BP variability in the semi-upright position was an independent predictor of matrix reasoning (B = 0.08 ± 0.03, P-value = 0.005), whereas heart rate variability in the semi-upright position was an independent predictor of the executive function score (B = -6.36 ± 2.55, P-value = 0.02). We conclude that myogenic vascular and sympathetic modulation of systolic BP do not contribute to reduced global cognitive performance in treated hypertensive subjects. Nevertheless, our results suggest that both systolic BP and heart rate variability might be associated with modulation of frontal lobe cognitive domains, such as executive function and matrix reasoning.
Hopewell, Sally; Witt, Claudia M; Linde, Klaus; Icke, Katja; Adedire, Olubusola; Kirtley, Shona; Altman, Douglas G
2018-01-11
Selective reporting of outcomes in clinical trials is a serious problem. We aimed to investigate the influence of the peer review process within biomedical journals on the reporting of primary outcome(s) and statistical analyses within reports of randomised trials. Each month, PubMed (May 2014 to April 2015) was searched to identify primary reports of randomised trials published in six high-impact general and 12 high-impact specialty journals. The corresponding author of each trial was invited to complete an online survey asking about changes made to their manuscript as part of the peer review process. Our main outcomes were to assess: (1) the nature and extent of changes made as part of the peer review process in relation to the reporting of the primary outcome(s) and/or primary statistical analysis; (2) how often authors followed these requests; and (3) whether this was related to specific journal or trial characteristics. Of 893 corresponding authors who were invited to take part in the online survey, 258 (29%) responded. The majority of trials were multicentre (n = 191; 74%); the median sample size was 325 (IQR 138 to 1010). The primary outcome was clearly defined in 92% (n = 238), of which the direction of treatment effect was statistically significant in 49%. The majority responded (1-10 Likert scale) that they were satisfied with the overall handling (mean 8.6, SD 1.5) and quality of peer review (mean 8.5, SD 1.5) of their manuscript. Only 3% (n = 8) said that the editor or peer reviewers had asked them to change or clarify the trial's primary outcome. However, 27% (n = 69) reported that they were asked to change or clarify the statistical analysis of the primary outcome; most had fulfilled the request, the main motivation being to improve the statistical methods (n = 38; 55%) or avoid rejection (n = 30; 44%). Overall, there was little association between authors being asked to make this change and the type of journal, intervention, significance of the
Directory of Open Access Journals (Sweden)
Viviane Moura Rocha
2015-04-01
Full Text Available This paper presents a topographic analysis of the fields of consumer loyalty and loyalty programs, widely studied in recent decades and still relevant in the marketing literature. After the identification of 250 scientific papers published in the last ten years in indexed journals, a subset of 76 was chosen and their 3223 references were extracted. The journals in which these papers were published, their keywords, abstracts, authors, institutions of origin and citation patterns were identified and analyzed using bibliometrics, spatial statistics techniques and network analyses. The results allow the identification of the central components of the field, as well as its main authors, journals, institutions and countries that intermediate the diffusion of knowledge, which contributes to the understanding of the constitution of the field by researchers and students.
A new efficient statistical test for detecting variability in the gene expression data.
Mathur, Sunil; Dolo, Samuel
2008-08-01
DNA microarray technology allows researchers to monitor the expressions of thousands of genes under different conditions. The detection of differential gene expression under two different conditions is very important in microarray studies. Microarray experiments are multi-step procedures, and each step is a potential source of variance. This makes the measurement of variability difficult, because an approach based on gene-by-gene estimation of variance will have few degrees of freedom. It is highly possible that the assumption of equal variance for all the expression levels may not hold. Also, the assumption of normality of gene expressions may not hold. Thus it is essential to have a statistical procedure that is not based on the normality assumption and that can detect genes with differential variance efficiently. The detection of differential gene expression variance will allow us to identify experimental variables that affect different biological processes and the accuracy of DNA microarray measurements. In this article, a new nonparametric test for scale is developed based on the arctangent of the ratio of two expression levels. Most of the tests available in the literature require the assumption of a normal distribution, which makes them inapplicable in many situations, and it is also hard to verify the suitability of the normal distribution assumption for the given data set. The proposed test does not require an assumption about the distribution of the underlying population, which makes it more practical and widely applicable. The asymptotic relative efficiency is calculated under different distributions, which shows that the proposed test is very powerful when the assumption of normality breaks down. Monte Carlo simulation studies are performed to compare the power of the proposed test with some of the existing procedures. It is found that the proposed test is more powerful than commonly used tests under almost all the distributions considered in the study. A
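The abstract does not give the arctangent statistic itself, so as a stand-in the sketch below runs the kind of Monte Carlo power comparison it describes, using the classical nonparametric Ansari-Bradley scale test in place of the proposed test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def power(sampler, scale_ratio, n=30, reps=500, alpha=0.05):
    """Fraction of simulations in which a scale difference is detected."""
    hits = 0
    for _ in range(reps):
        x = sampler(size=n)
        y = scale_ratio * sampler(size=n)
        if stats.ansari(x, y).pvalue < alpha:
            hits += 1
    return hits / reps

print("normal tails:", power(rng.standard_normal, 2.0))
print("cauchy tails:", power(rng.standard_cauchy, 2.0))
```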
Methods for Clustering Variables and the Use of them in Statistical Packages
Czech Academy of Sciences Publication Activity Database
Řezanková, H.; Húsek, Dušan
2002-01-01
Roč. 10, - (2002), s. 153-160 ISSN 1210-809X. [Applications of Mathematics and Statistics in Economy. Zadov, 13.09.2001-14.09.2001] R&D Projects: GA ČR GA201/01/1192 Institutional research plan: AV0Z1030915 Keywords : factor analysis * cluster analysis * multidimensional * statistical packages Subject RIV: BB - Applied Statistics, Operational Research
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Assessing Regional Scale Variability in Extreme Value Statistics Under Altered Climate Scenarios
Energy Technology Data Exchange (ETDEWEB)
Brunsell, Nathaniel [Univ. of Kansas, Lawrence, KS (United States); Mechem, David [Univ. of Kansas, Lawrence, KS (United States); Ma, Chunsheng [Wichita State Univ., KS (United States)
2015-02-20
Recent studies have suggested that low-frequency modes of climate variability can significantly influence regional climate. The climatology associated with extreme events has been shown to be particularly sensitive. This has profound implications for droughts, heat waves, and food production. We propose to examine regional climate simulations conducted over the continental United States by applying a recently developed technique which combines wavelet multi-resolution analysis with information theory metrics. This research is motivated by two fundamental questions concerning the spatial and temporal structure of extreme events. These questions are 1) what temporal scales of the extreme value distributions are most sensitive to alteration by low-frequency climate forcings and 2) what is the nature of the spatial structure of variation in these timescales? The primary objective is to assess to what extent information theory metrics can be useful in characterizing the nature of extreme weather phenomena. Specifically, we hypothesize that (1) changes in the nature of extreme events will impact the temporal probability density functions and that information theory metrics will be sensitive to these changes and (2) via a wavelet multi-resolution analysis, we will be able to characterize the relative contribution of different timescales to the stochastic nature of extreme events. In order to address these hypotheses, we propose a unique combination of an established regional climate modeling approach and advanced statistical techniques to assess the effects of low-frequency modes on climate extremes over North America. The behavior of climate extremes in RCM simulations for the 20th century will be compared with statistics calculated from the United States Historical Climatology Network (USHCN) and simulations from the North American Regional Climate Change Assessment Program (NARCCAP). This effort will serve to establish the baseline behavior of climate extremes, the
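One way to combine the two ingredients named above is to take a wavelet multi-resolution decomposition of a climate series and compute the Shannon entropy of the coefficient distribution at each timescale; the series below is synthetic and the choice of wavelet is an assumption.

```python
import numpy as np
import pywt
from scipy import stats

rng = np.random.default_rng(2)
t = np.arange(4096)
series = np.sin(2 * np.pi * t / 365.0) + 0.5 * rng.standard_normal(t.size)

coeffs = pywt.wavedec(series, "db4", level=6)  # multi-resolution analysis
for level, c in enumerate(coeffs[1:], start=1):
    hist, _ = np.histogram(c, bins=30, density=True)
    print(f"detail level {level}: Shannon entropy = {stats.entropy(hist):.2f}")
```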
Heart rate variability analysed by Poincaré plot in patients with metabolic syndrome
Czech Academy of Sciences Publication Activity Database
Kubíčková, A.; Kozumplík, J.; Nováková, Z.; Plachý, M.; Jurák, Pavel; Lipoldová, J.
2016-01-01
Roč. 49, č. 1 (2016), s. 23-28 ISSN 0022-0736 R&D Projects: GA ČR GAP102/12/2034 Institutional support: RVO:68081731 Keywords : heart rate variability * metabolic syndrome * Poincaré plot * tilt table test * controlled breathing Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 1.514, year: 2016
Fatigue Crack Propagation Under Variable Amplitude Loading Analyses Based on Plastic Energy Approach
Directory of Open Access Journals (Sweden)
Sofiane Maachou
2014-04-01
Full Text Available Plasticity effects at the crack tip have been recognized as the "motor" of crack propagation: the growth of cracks is related to the existence of a crack-tip plastic zone, whose formation and intensification is accompanied by energy dissipation. In the current state of knowledge, fatigue crack propagation is modeled using the crack closure concept. The fatigue crack growth behavior under constant amplitude and variable amplitude loading of the aluminum alloy 2024 T351 is analyzed in terms of energy parameters. In the case of VAL (variable amplitude loading) tests, the evolution of the hysteretic energy dissipated per block is similar to that observed under constant amplitude loading. A linear relationship between the crack growth rate and the hysteretic energy dissipated per block is obtained at high growth rates. For lower growth rates, the relationship between crack growth rate and hysteretic energy dissipated per block can be represented by a power law. In this paper, an analysis of fatigue crack propagation under variable amplitude loading based on an energetic approach is proposed.
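The power-law regime described above can be recovered from data by a log-log linear fit of growth rate against hysteretic energy per block; the points below are synthetic placeholders, not measurements from the 2024 T351 tests.

```python
import numpy as np

rng = np.random.default_rng(4)
q = np.logspace(0, 2, 25)                                # energy per block
dadn = 1e-6 * q ** 1.8 * rng.lognormal(0, 0.1, q.size)   # synthetic growth rate

# Fit da/dN = C * Q**m by linear regression in log-log space.
m, log_c = np.polyfit(np.log(q), np.log(dadn), 1)
print(f"exponent m = {m:.2f}, C = {np.exp(log_c):.2e}")
```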
On the Integrity of Online Testing for Introductory Statistics Courses: A Latent Variable Approach
Directory of Open Access Journals (Sweden)
Alan Fask
2015-04-01
Full Text Available There has been a remarkable growth in distance learning courses in higher education. Despite indications that distance learning courses are more vulnerable to cheating behavior than traditional courses, there has been little research studying whether online exams facilitate a relatively greater level of cheating. This article examines this issue by developing an approach using a latent variable to measure student cheating. This latent variable is linked to both known student mastery related variables and variables unrelated to student mastery. Grade scores from a proctored final exam and an unproctored final exam are used to test for increased cheating behavior in the unproctored exam
Porta, Alberto; Bari, Vlasta; Marchi, Andrea; De Maria, Beatrice; Cysarz, Dirk; Van Leeuwen, Peter; Takahashi, Anielle C. M.; Catai, Aparecida M.; Gnecchi-Ruscone, Tomaso
2015-01-01
Two diverse complexity metrics quantifying time irreversibility and local prediction, in connection with a surrogate data approach, were utilized to detect nonlinear dynamics in short heart period (HP) variability series recorded in fetuses, as a function of the gestational period, and in healthy humans, as a function of the magnitude of the orthostatic challenge. The metrics indicated the presence of two distinct types of nonlinear HP dynamics characterized by diverse ranges of time scales. These findings stress the need to render more specific the analysis of nonlinear components of HP dynamics by accounting for different temporal scales. PMID:25806002
Kumari, K.; Oberheide, J.
2017-12-01
Nonmigrating tidal diagnostics of SABER temperature observations in the ionospheric dynamo region reveal a large amount of variability on time-scales of a few days to weeks. In this paper, we discuss the physical reasons for the observed short-term tidal variability using a novel approach based on information theory and Bayesian statistics. We diagnose short-term tidal variability as a function of season, QBO, ENSO, solar cycle and other drivers using time-dependent probability density functions, Shannon entropy and Kullback-Leibler divergence. The statistical significance of the approach and its predictive capability are exemplified using SABER tidal diagnostics, with emphasis on the responses to the QBO and solar cycle. Implications for F-region plasma density will be discussed.
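A small sketch of the Kullback-Leibler diagnostic mentioned above: compare the empirical density of a tidal amplitude under two climate states. The two samples are synthetic placeholders for, e.g., QBO east- and west-phase composites.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
amp_east = rng.normal(10.0, 2.0, 5000)  # hypothetical amplitudes, QBO east (K)
amp_west = rng.normal(12.0, 2.5, 5000)  # hypothetical amplitudes, QBO west (K)

bins = np.linspace(0, 25, 60)
p, _ = np.histogram(amp_east, bins=bins, density=True)
q, _ = np.histogram(amp_west, bins=bins, density=True)

eps = 1e-12  # guard against empty bins
print(f"KL(P||Q) = {stats.entropy(p + eps, q + eps):.3f} nats")
```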
Wang, W. L.; Tsui, K. L.; Lo, S. M.; Liu, S. B.
2018-01-01
Crowded transportation hubs such as metro stations are thought to be ideal places for the development and spread of epidemics. However, owing to their complex spatial layouts and confined environments with large numbers of highly mobile individuals, it is difficult to quantify human contacts in such settings, and disease-spreading dynamics there have been little explored in previous studies. Due to the heterogeneity and dynamic nature of human interactions, a growing number of studies have demonstrated the importance of contact distance and length of contact in transmission probabilities. In this study, we show how detailed information on contact and exposure patterns can be obtained by statistical analyses of microscopic crowd simulation data. Specifically, a pedestrian simulation model, CityFlow, was employed to reproduce individuals' movements in a metro station based on site survey data, and the values and distributions of individual contact rate and exposure in different simulation cases were obtained and analyzed. Interestingly, a Weibull distribution fitted the histogram values of individual-based exposure in each case very well. Moreover, we found that both individual contact rate and exposure had a linear relationship with the average crowd density of the environment. The results obtained in this paper can provide a reference for epidemic studies in complex and confined transportation hubs and help refine existing disease spreading models.
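Fitting a Weibull distribution to simulated individual exposures, as reported above, can be sketched as follows; the exposure sample is synthetic, not CityFlow output.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
exposure = rng.weibull(1.6, 1000) * 12.0  # placeholder exposure values

# Fix the location at zero, since exposure cannot be negative.
shape, loc, scale = stats.weibull_min.fit(exposure, floc=0)
print(f"fitted Weibull: shape k = {shape:.2f}, scale = {scale:.2f}")
```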
Harris, Joshua D; Brand, Jefferson C; Cote, Mark P; Dhawan, Aman
2017-08-01
Within the health care environment, there has been a recent and appropriate trend towards emphasizing the value of care provision. Reduced cost and higher quality improve the value of care. Quality is a challenging, heterogeneous, variably defined concept. At the core of quality is the patient's outcome, quantified by a vast assortment of subjective and objective outcome measures. There has been a recent evolution towards evidence-based medicine in health care, clearly elucidating the role of high-quality evidence across groups of patients and studies. Synthetic studies, such as systematic reviews and meta-analyses, are at the top of the evidence-based medicine hierarchy. Thus, these investigations may be the best potential source of guiding diagnostic, therapeutic, prognostic, and economic medical decision making. Systematic reviews critically appraise and synthesize the best available evidence to provide a conclusion statement (a "take-home point") in response to a specific answerable clinical question. A meta-analysis uses statistical methods to quantitatively combine data from single studies. Meta-analyses should be performed on homogeneous studies of high methodological quality (Level I or II evidence; randomized studies) to minimize confounding variable bias. When it is known that the literature is inadequate, or a recent systematic review has already been performed with a demonstration of insufficient data, then a new systematic review does not add anything meaningful to the literature. PROSPERO registration and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines assist authors in the design and conduct of systematic reviews and should always be used. Complete transparency of the conduct of the review permits reproducibility and improves the fidelity of the conclusions. Pooling of data from overly dissimilar investigations should be avoided. This particularly applies to Level IV evidence, that is, noncomparative investigations
Statistics for Ratios of Rayleigh, Rician, Nakagami-m, and Weibull Distributed Random Variables
Directory of Open Access Journals (Sweden)
Dragana Č. Pavlović
2013-01-01
Full Text Available The distributions of ratios of random variables are of interest in many areas of the sciences. In this brief paper, we present the joint probability density function (PDF) and the PDF of the maximum of the ratios μ1=R1/r1 and μ2=R2/r2 for the cases where R1, R2, r1, and r2 are Rayleigh, Rician, Nakagami-m, or Weibull distributed random variables. Random variables R1 and R2, as well as random variables r1 and r2, are correlated. Given the suitability of the Weibull distribution for describing fading in both indoor and outdoor environments, special attention is dedicated to the case of Weibull random variables. For this case, analytical expressions for the joint PDF, the PDF of the maximum, the PDF of the minimum, and the product moments of an arbitrary number of ratios μi=Ri/ri, i=1,…,L are obtained. Random variables in the numerator, Ri, as well as random variables in the denominator, ri, are exponentially correlated. To the best of the authors' knowledge, the analytical expressions for the PDF of the minimum and the product moments of {μi}, i=1,…,L are novel in the open technical literature. The proposed mathematical analysis is complemented by various numerical results. An application of the presented theoretical results is illustrated with respect to the performance assessment of wireless systems.
Nam, Sungsik
2010-11-01
Order statistics find applications in various areas of communications and signal processing. In this paper, we introduce a unified analytical framework to determine the joint statistics of partial sums of ordered random variables (RVs). With the proposed approach, we can systematically derive the joint statistics of any partial sums of ordered statistics, in terms of the moment generating function (MGF) and the probability density function (PDF). Our MGF-based approach applies not only when all K ordered RVs are involved but also when only the Ks (Ks < K) best RVs are considered. In addition, we present closed-form expressions for the exponential RV special case. These results apply to the performance analysis of various wireless communication systems over fading channels. © 2006 IEEE.
Cafri, Guy; Kromrey, Jeffrey D.; Brannick, Michael T.
2010-01-01
This article uses meta-analyses published in "Psychological Bulletin" from 1995 to 2005 to describe meta-analyses in psychology, including examination of statistical power, Type I errors resulting from multiple comparisons, and model choice. Retrospective power estimates indicated that univariate categorical and continuous moderators, individual…
Heart rate variability analysed by Poincaré plot in patients with metabolic syndrome.
Kubičková, Alena; Kozumplík, Jiří; Nováková, Zuzana; Plachý, Martin; Jurák, Pavel; Lipoldová, Jolana
2016-01-01
The SD1 and SD2 indexes (standard deviations in two orthogonal directions of the Poincaré plot) carry similar information to the spectral power of the high and low frequency bands but have the advantages of easier calculation and lesser dependence on stationarity. ECG signals from metabolic syndrome (MetS) and control group patients were obtained during a tilt table test under controlled breathing (20 breaths/minute). SD1, SD2, SDRR (standard deviation of RR intervals) and RMSSD (root mean square of successive differences of RR intervals) were evaluated for 31 control group and 33 MetS subjects. Statistically significantly lower values were observed in MetS patients in the supine position (SD1: p=0.03, SD2: p=0.002, SDRR: p=0.006, RMSSD: p=0.01) and during tilt (SD2: p=0.004, SDRR: p=0.007). SD1 and SD2, combining the advantages of time and frequency domain methods, successfully distinguish between MetS and control subjects. Copyright © 2016 Elsevier Inc. All rights reserved.
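The Poincaré indices above follow from two standard identities, SD1² = var(ΔRR)/2 and SD2² = 2·SDRR² − SD1², sketched here on a synthetic RR series (the data are not from the study).

```python
import numpy as np

rng = np.random.default_rng(6)
rr = 800 + rng.normal(0, 40, 300)  # synthetic RR intervals (ms)

sdrr = np.std(rr, ddof=1)
rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))
sd1 = np.sqrt(np.var(np.diff(rr), ddof=1) / 2.0)
sd2 = np.sqrt(2.0 * sdrr ** 2 - sd1 ** 2)
print(f"SDRR={sdrr:.1f}  RMSSD={rmssd:.1f}  SD1={sd1:.1f}  SD2={sd2:.1f} ms")
```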
Tree Ring Analyses Unlock a Century of Hydroclimatic Variability Across the Himalayas
Brunello, C. F.; Andermann, C.; Helle, G.; Comiti, F.; Tonon, G.; Hovius, N.
2017-12-01
Climate change has altered precipitation patterns and impacted the spatio-temporal distribution and availability of water in high mountain environments. For example, intensification of the Indian Summer Monsoon (ISM) increases the potential for moisture-laden air to breach the Himalayan orographic barrier and penetrate into the arid, elevated southern Tibetan Plateau, with geomorphological and hydrological consequences. Such trends should be considered against a solid background, but a consistent record of centennial monsoon dynamics in the trans-Himalayan region has never been developed. Instrumental data are sparse and cover only a limited time period, as does remotely sensed information. Meanwhile, models have major systematic biases and substantial uncertainty in reproducing ISM interannual variability. In this context, hydro-climatic proxies, such as oxygen stable isotope ratios in the cellulose of tree rings, are a valuable source of data, especially because isotope mass spectrometry can unlock yearly resolved information by tracing the isotopic signature (δ18O) stored within each growth ring. Here we present three centennial records of monsoon dynamics along a latitudinal transect spanning a pronounced precipitation gradient across the Himalayan orogen. Three sites were selected along the Kali Gandaki valley in the central Himalayas (Nepal); this valley connects the wet, monsoon-dominated Gangetic plain with the arid Tibetan Plateau. Our transect covers the sensitive northern end of the precipitation gradient, located in the upper part of the catchment. Our results show that inter-annual variation of monsoon strength can be reconstructed from tree ring δ18O. The inferred monsoon dynamics are compared against independent constraints on precipitation, snow cover and river discharge. Different water sources contribute disproportionally at the three sites, reflecting spatial and temporal shifts of the westerlies and the Indian summer monsoon. These two dominant
Ke, L.; Ding, X.; Song, C.; Sheng, Y.
2016-12-01
Temperate glaciers can be highly sensitive to global climate change due to the relatively humid and warm local climate. Numerous temperate glaciers are distributed in the southeastern Tibetan Plateau (SETP), and their changes are still poorly characterized. Based on a recent glacier inventory and ICESat altimetry measurements, we examine the spatial heterogeneity of glacier change in the SETP (including the central and eastern Nyainqêntanglha ranges) and further analyze its relation to climate change using station-based and gridded meteorological data. Our results show that SETP glaciers experienced drastic surface lowering of about -0.84±0.26 m a-1 on average over 2003-2008. Debris-covered ice thinned at an average rate of -1.13±0.32 m a-1, in comparison with -0.92±0.17 m a-1 over debris-free ice areas. The thinning rate is strongest in the southeastern sub-region (up to -1.24 m a-1) and moderate (-0.45 m a-1) in the central and northwestern parts, in general agreement with the pattern of surface mass changes based on GRACE gravimetry observations. Long-term climate data at weather stations show that, in comparison with the period 1992-2002, mean temperature increased by 0.46-0.59 °C in the recent decade (2003-2013), while the change in summer precipitation exhibited remarkable spatial variability, following a contrasting southeast-northwest pattern (a decrease of over 10% in the southeast, stable levels in the central region, and an increase of up to 10% in the northwest). This spatially variable precipitation change is consistent with results from the CN05 gridded data and ERA re-analysis data, and agrees well with the spatial pattern of glacier surface elevation changes. The results suggest that the overall negative glacier mass balances in the SETP are governed by rising temperature, while the differing precipitation changes could contribute to the inconsistent glacier thinning rates. The spatial pattern of precipitation decrease and mass loss might
International Nuclear Information System (INIS)
Sheng, Xianjie; Lin, Duanmu
2016-01-01
Highlights: • The mathematical model of the economic frictional factor for a DVFSP DHS is established. • Influence factors of the economic frictional factor are analyzed. • Energy saving in a DVFSP district heating system is presented and analyzed. - Abstract: Optimization of the district heating (DH) piping network is of vital importance to the economics of the whole DH system. The application of distributed variable frequency speed pumps (DVFSP) in the district heating network has been considered a technological improvement with potential for saving energy compared to the conventional central circulating pump (CCCP) district heating system (DHS). The economic frictional factor is a common design parameter in DH pipe network design. In this paper, a mathematical model of the economic frictional factor for a DVFSP-based DHS is established and its influence factors are analyzed, providing a reference for engineering design of such systems. Based on the analysis results, the energy efficiency of the DH system with DVFSP is compared with that of the conventional CCCP system, using a case study of a district heating network in Dalian, China. The case study results show that the DVFSP system reduces average electrical energy consumption in the primary network by 49.41% compared with the conventional CCCP system.
Van ‘t Hoff global analyses of variable temperature isothermal titration calorimetry data
International Nuclear Information System (INIS)
Freiburger, Lee A.; Auclair, Karine; Mittermaier, Anthony K.
2012-01-01
Highlights: ▶ We developed a global fitting strategy for ITC data collected at multiple temperatures. ▶ This method does not require prior knowledge of the binding mechanism. ▶ Monte Carlo simulations show that the approach improves the accuracy of extracted thermodynamic parameters. ▶ The method is used to study coupled folding/binding in aminoglycoside 6′-N-acetyltransferase-Ii. - Abstract: Isothermal titration calorimetry (ITC) can provide detailed information on the thermodynamics of biomolecular interactions in the form of equilibrium constants, K_A, and enthalpy changes, ΔH_A. A powerful application of this technique involves analyzing the temperature dependences of ITC-derived K_A and ΔH_A values to gain insight into thermodynamic linkage between binding and additional equilibria, such as protein folding. We recently developed a general method for global analysis of variable temperature ITC data that significantly improves the accuracy of extracted thermodynamic parameters and requires no prior knowledge of the coupled equilibria. Here we report detailed validation of this method using Monte Carlo simulations and an application to study coupled folding and binding in an aminoglycoside acetyltransferase enzyme.
International Nuclear Information System (INIS)
Kolluri, Srinivas Sahan; Esfahani, Iman Janghorban; Garikiparthy, Prithvi Sai Nadh; Yoo, Chang Kyoo
2015-01-01
Our aim was to analyze, monitor, and predict the outcomes of processes in a full-scale seawater reverse osmosis (SWRO) desalination plant using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was used to investigate the performance and efficiencies of two SWRO processes, namely, pore controllable fiber filter-reverse osmosis (PCF-SWRO) and sand filtration-ultra filtration-reverse osmosis (SF-UF-SWRO). Principal component analysis (PCA) was applied to monitor the two SWRO processes. PCA monitoring revealed that the SF-UF-SWRO process could be analyzed reliably, with a low number of outliers and disturbances. Partial least squares (PLS) analysis was then conducted to predict which of the seven input parameters of feed flow rate, PCF/SF-UF filtrate flow rate, temperature of feed water, feed turbidity, pH, reverse osmosis (RO) flow rate, and pressure had a significant effect on the outcome variables of permeate flow rate and concentration. Root mean squared errors (RMSEs) of the PLS models for permeate flow rates were 31.5 and 28.6 for the PCF-SWRO and SF-UF-SWRO processes, respectively, while RMSEs of permeate concentrations were 350.44 and 289.4, respectively. These results indicate that the SF-UF-SWRO process can be modeled more accurately than the PCF-SWRO process, because the RMSE values of permeate flow rate and concentration obtained using a PLS regression model of the SF-UF-SWRO process were lower than those obtained for the PCF-SWRO process.
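As a rough illustration of the PLS step described in this abstract, the sketch below fits a two-output PLS model and reports a per-output RMSE. All arrays are synthetic stand-ins for the seven plant inputs and the two permeate outputs; scikit-learn is assumed to be available:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic stand-ins for the seven input parameters (feed flow, filtrate flow,
# feed temperature, turbidity, pH, RO flow rate, pressure)
X = rng.normal(size=(200, 7))
# Two outputs standing in for permeate flow rate and permeate concentration
y = X @ rng.normal(size=(7, 2)) + rng.normal(scale=0.5, size=(200, 2))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, pls.predict(X_te), multioutput="raw_values"))
print("RMSE per output:", rmse)
```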
Analyses of heart rate variability in young soccer players: the effects of sport activity.
Bricout, Véronique-Aurélie; Dechenaud, Simon; Favre-Juvin, Anne
2010-04-19
The use of heart rate variability (HRV) in the management of sport training is becoming widespread, especially as a means of preventing the occurrence of states of fatigue. The aims were to estimate the HRV parameters obtained from heart rate recordings under different sporting activity loads, and to relate them to the appearance of fatigue. Eight young football players, aged 14.6 years ± 2 months, playing at league level in Rhône-Alpes and training 10 to 20 h per week, were followed over a period of 5 months, yielding 54 HRV recordings in three different conditions: (i) after rest, (ii) after a day with training, and (iii) after a day with a competitive match. Under the effect of a competitive match, the HRV temporal indicators (heart rate, RR interval, and pNN50) were significantly altered compared to the rest day. The sympathovagal balance rose significantly as a result of the competitive constraint (0.72±0.17 vs. 0.90±0.20; p<0.05). HRV is an objective and non-invasive means of monitoring the training of young sportsmen. HRV analysis highlighted the neurovegetative adjustments according to the physical loads. Thus, under the increased physical and psychological constraints that a football match represents, the LF/HF ratio rises significantly, reflecting increased sympathetic stimulation, which beyond certain limits could be relevant to prevent the emergence of a state of fatigue. 2009 Elsevier B.V. All rights reserved.
Leti, Thomas; Bricout, Véronique A
2013-01-01
The use of heart rate variability (HRV) in the management of sport training is becoming widespread, especially as a means of preventing the occurrence of fatigue states. The aims were to estimate the HRV parameters obtained from heart rate recordings under different exercise impacts, and to relate them to the appearance of subjective fatigue. Ten senior runners, aged 51±5 years, were each monitored over a period of 12 weeks in different conditions: (i) after a resting period, (ii) after a day with training, (iii) after a day of competition and (iv) after a rest day. They also completed three questionnaires to assess fatigue (SFMS), profile of mood states (POMS) and quality of sleep. The HRV indices (heart rate, LF (n.u.), HF (n.u.) and LF/HF) were significantly altered by the competitive impact, shifting toward a sympathetic predominance. After rest and recovery nights, the LF (n.u.) increased significantly with the competitive impact (62.1±15.2 and 66.9±11.6 vs. 76.0±10.7; p<0.05, respectively) whereas the HF (n.u.) decreased significantly (37.9±15.2 and 33.1±11.6 vs. 24.0±10.7; p<0.05, respectively). Positive correlations were found between fatigue and frequency-domain indices, and between fatigue and training impact. The relationship between autonomic nervous system modulation and fatigue was significant, suggesting the potential use of HRV in the follow-up and control of training. Furthermore, the questionnaires constitute complementary tools that provide greater relevance and accuracy in assessing the athletes' fitness and results. Copyright © 2012 Elsevier B.V. All rights reserved.
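Both HRV studies above rest on the frequency-domain indices LF, HF and LF/HF. A minimal sketch of a common way to compute them: resample the RR tachogram evenly, take a Welch periodogram, and integrate the conventional 0.04-0.15 Hz and 0.15-0.40 Hz bands (band limits are the usual standards, not values taken from these papers):

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def lf_hf_ratio(rr_ms, fs=4.0):
    """LF/HF from RR intervals (ms): resample the tachogram evenly, then Welch PSD."""
    rr = np.asarray(rr_ms, dtype=float)
    t = np.cumsum(rr) / 1000.0                      # beat times in seconds
    t_even = np.arange(t[0], t[-1], 1.0 / fs)
    rr_even = interp1d(t, rr, kind="cubic")(t_even)
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs, nperseg=256)
    lf_band = (f >= 0.04) & (f < 0.15)              # conventional LF band
    hf_band = (f >= 0.15) & (f < 0.40)              # conventional HF band
    lf = np.trapz(pxx[lf_band], f[lf_band])
    hf = np.trapz(pxx[hf_band], f[hf_band])
    return lf, hf, lf / hf

rng = np.random.default_rng(1)
rr = 800 + 50 * np.sin(0.3 * np.arange(600)) + rng.normal(0, 20, 600)  # synthetic beats
print(lf_hf_ratio(rr))
```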
Statistical Analysis for Multisite Trials Using Instrumental Variables with Random Coefficients
Raudenbush, Stephen W.; Reardon, Sean F.; Nomi, Takako
2012-01-01
Multisite trials can clarify the average impact of a new program and the heterogeneity of impacts across sites. Unfortunately, in many applications, compliance with treatment assignment is imperfect. For these applications, we propose an instrumental variable (IV) model with person-specific and site-specific random coefficients. Site-specific IV…
Sea Surface Height Variability and Eddy Statistical Properties in the Red Sea
Zhan, Peng
2013-01-01
Satellite sea surface height (SSH) data over 1992-2012 are analyzed to study the spatial and temporal variability of sea level in the Red Sea. Empirical orthogonal functions (EOF) analysis suggests the remarkable seasonality of SSH in the Red Sea
Kronholm, Scott C.; Capel, Paul D.; Terziotti, Silvia
2016-01-01
Accurate estimation of total nitrogen loads is essential for evaluating conditions in the aquatic environment. Extrapolation of estimates beyond measured streams will greatly expand our understanding of total nitrogen loading to streams. Recursive partitioning and random forest regression were used to assess 85 geospatial, environmental, and watershed variables across 636 small watersheds; the results indicate where additional monitoring may be beneficial.
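A minimal sketch of the random forest regression step described above, using synthetic stand-ins for the 85 candidate watershed variables; the out-of-bag score and impurity-based importances are generic scikit-learn features, not the paper's exact workflow:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
# Synthetic stand-in for 85 geospatial/environmental variables at 636 watersheds
X = rng.normal(size=(636, 85))
y = 2.0 * X[:, 0] + X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=636)

rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]   # most informative variables first
print("OOB R^2:", round(rf.oob_score_, 3))
print("Top variables:", ranking[:5])
```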
Directory of Open Access Journals (Sweden)
Deverick J Anderson
Full Text Available The rate of community-acquired Clostridium difficile infection (CA-CDI) is increasing. While receipt of antibiotics remains an important risk factor for CDI, studies related to acquisition of C. difficile outside of hospitals are lacking. As a result, risk factors for exposure to C. difficile in community settings have been inadequately studied. To identify novel environmental risk factors for CA-CDI, we performed a population-based retrospective cohort study of patients with CA-CDI from 1/1/2007 through 12/31/2014 in a 10-county area in central North Carolina. 360 Census Tracts in these 10 counties were used as the demographic Geographic Information System (GIS) base-map. Longitude and latitude (X, Y) coordinates were generated from patient home addresses and overlaid to Census Tract polygons using ArcGIS; ArcView was used to assess "hot spots" or clusters of CA-CDI. We then constructed a mixed hierarchical model to identify environmental variables independently associated with increased rates of CA-CDI. A total of 1,895 unique patients met our criteria for CA-CDI. The mean patient age was 54.5 years; 62% were female and 70% were Caucasian. 402 (21%) patient addresses were located in "hot spots" or clusters of CA-CDI (p < 0.001). "Hot spot" census tracts were scattered throughout the 10 counties. After adjusting for clustering and population density, age ≥ 60 years (p = 0.03), race (p < 0.001), proximity to a livestock farm (p = 0.01), proximity to farming raw materials services (p = 0.02), and proximity to a nursing home (p = 0.04) were independently associated with increased rates of CA-CDI. Our study is the first to use spatial statistics and mixed models to identify important environmental risk factors for acquisition of C. difficile and adds to the growing evidence that farm practices may put patients at risk for important drug-resistant infections.
Melsen, W G; Rovers, M M; Bonten, M J M; Bootsma, M C J
Variance between studies in a meta-analysis will exist. This heterogeneity may be of clinical, methodological or statistical origin. The last of these is quantified by the I² statistic. We investigated, using simulated studies, the accuracy of I² in the assessment of heterogeneity and the
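For reference, I² is computed from Cochran's Q; a minimal sketch of the standard Higgins-Thompson formula, with made-up effect sizes and within-study variances:

```python
import numpy as np

def i_squared(effects, variances):
    """Cochran's Q and the Higgins-Thompson I^2 for a fixed-effect meta-analysis."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)   # inverse-variance weights
    theta = np.sum(w * y) / np.sum(w)              # pooled effect estimate
    q = np.sum(w * (y - theta) ** 2)               # Cochran's Q
    df = y.size - 1
    return max(0.0, 100.0 * (q - df) / q)          # I^2 in percent

print(i_squared([0.20, 0.50, 0.35, 0.80], [0.02, 0.03, 0.025, 0.04]))
```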
Orellana, Andrea; Laurenzi, Ian J; MacLean, Heather L; Bergerson, Joule A
2018-02-06
Greenhouse gas (GHG) emissions associated with extraction of bitumen from oil sands can vary from project to project and over time. However, the nature and magnitude of this variability have yet to be incorporated into life cycle studies. We present a statistically enhanced life cycle based model (GHOST-SE) for assessing variability of GHG emissions associated with the extraction of bitumen using in situ techniques in Alberta, Canada. It employs publicly available, company-reported operating data, facilitating assessment of inter- and intraproject variability as well as the time evolution of GHG emissions from commercial in situ oil sands projects. We estimate the median GHG emissions associated with bitumen production via cyclic steam stimulation (CSS) to be 77 kg CO2eq/bbl bitumen (80% CI: 61-109 kg CO2eq/bbl), and via steam assisted gravity drainage (SAGD) to be 68 kg CO2eq/bbl bitumen (80% CI: 49-102 kg CO2eq/bbl). We also show that the median emissions intensity of Alberta's CSS and SAGD projects has been relatively stable from 2000 to 2013, despite greater than 6-fold growth in production. Variability between projects is the single largest source of variability (driven in part by reservoir characteristics), but intraproject variability (e.g., startups, interruptions) is also important and must be considered in order to inform research or policy priorities.
DEFF Research Database (Denmark)
Tuyls, Damian Murla; Nielsen, Rasmus; Thorndahl, Søren Liedtke
2017-01-01
A return period assessment of urban flooding has been performed, and the impact of rainfall variability on it studied, over an urban drainage catchment in Aalborg, Denmark. Recorded rainfall from 7 rain gauges has been used, located within a range of 7.5 km and covering periods varying from 18-37 years....... Return periods of rainfall and flood at catchment and local scale have been estimated, the derived ambiguities analysed, and the variability of rain gauge based rainfall investigated with regard to flood estimation results. Results show a clear contrast between rainfall and flood return period estimates...
Directory of Open Access Journals (Sweden)
Mabaso Musawenkosi LH
2007-09-01
Full Text Available Abstract Background Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains unfeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana. Results Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Those correlated with higher-ranking variables of the same environmental theme were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data (see the sketch below). A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion, using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country. Conclusion We have
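A minimal sketch of the bootstrap selection-frequency idea described above (forward selection of a logistic model on resampled data, then counting how often each variable is chosen); synthetic data and 50 rather than 1000 replicates keep the demo fast:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Synthetic stand-in for 14 candidate environmental predictors
X, y = make_classification(n_samples=300, n_features=14, n_informative=3, random_state=0)

n_boot = 50                                  # the paper used 1000 bootstrap samples
counts = np.zeros(X.shape[1])
for b in range(n_boot):
    Xb, yb = resample(X, y, random_state=b)  # bootstrap resample of the data
    sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                    n_features_to_select=3).fit(Xb, yb)
    counts += sfs.get_support()              # tally which variables were selected

print("Selection frequency per variable:", counts / n_boot)
```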
Energy Technology Data Exchange (ETDEWEB)
Mehli, H
1996-05-01
Since 1987, sheep grazing in the areas of Norway that received Chernobyl fallout have been monitored before slaughter. These monitoring data formed the basis for the development of a model describing the long-term behaviour of radiocesium in unimproved pasture, which shows that in years with good mushroom abundance, 70-80% of the radiocesium concentration in sheep is due to fungi consumption. A study of sampling strategy and of the variability of radiocesium concentration within flocks was also performed. 55 refs., 31 figs., 15 tabs.
A New Statistical Approach to the Optical Spectral Variability in Blazars
Directory of Open Access Journals (Sweden)
Jose A. Acosta-Pulido
2016-12-01
Full Text Available We present a spectral variability study of a sample of about 25 bright blazars, based on optical spectroscopy. Observations cover the period from the end of 2008 to mid 2015, with an approximately monthly cadence. Emission lines have been identified and measured in the spectra, which permits us to classify the sources into BL Lac-type objects or FSRQs, according to the commonly used equivalent width (EW) limit. We have obtained synthetic photometry and produced colour-magnitude diagrams which show different trends associated with the object classes: generally, BL Lacs tend to become bluer when brighter and FSRQs redder when brighter, although several objects exhibit both trends, depending on brightness. We have also applied a pattern recognition algorithm to obtain the minimum number of physical components which can explain the variability of the optical spectrum. We have used NMF (Non-Negative Matrix Factorization) instead of PCA (Principal Component Analysis) to avoid unrealistic negative components. For most targets we found that 2 or 3 meta-components are enough to explain the observed spectral variability.
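A minimal sketch of using NMF to find the smallest number of non-negative meta-components that explain a set of spectra; the spectra here are synthetic two-component mixtures, and scikit-learn's NMF is assumed as the implementation:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(3)
# Synthetic stand-in: 60 epochs x 500 wavelength bins, built from 2 positive components
basis = np.abs(rng.normal(size=(2, 500)))
weights = np.abs(rng.normal(size=(60, 2)))
spectra = weights @ basis + 0.01 * np.abs(rng.normal(size=(60, 500)))

# The reconstruction error should drop sharply at the true number of components
for k in (1, 2, 3):
    model = NMF(n_components=k, init="nndsvda", max_iter=1000).fit(spectra)
    print(k, "components -> reconstruction error:", round(model.reconstruction_err_, 3))
```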
Khan, Firdos; Pilz, Jürgen
2016-04-01
South Asia is under severe impacts of changing climate and global warming. The last two decades showed that climate change and global warming are happening, and the first decade of the 21st century was the warmest decade on record over Pakistan, where the temperature reached 53 °C in 2010. Consequently, the spatio-temporal distribution and intensity of precipitation are adversely affected, causing floods, cyclones and hurricanes in the region, which in turn have impacts on agriculture, water, health, etc. To cope with the situation, it is important to conduct impact assessment studies and adopt adaptation and mitigation measures. For impact assessment studies, climate variables are needed at higher resolution. Downscaling techniques are used to produce climate variables at higher resolution; these techniques are broadly divided into two types, statistical downscaling and dynamical downscaling. The target location of this study is the monsoon-dominated region of Pakistan, chosen in part because the contribution of monsoon rains in this area is more than 80% of the total rainfall. This study evaluates a statistical downscaling technique which can then be used for downscaling climatic variables. Two statistical techniques, quantile regression and copula modeling, are combined in order to produce realistic results for climate variables in the area under study. To reduce the dimension of the input data and deal with multicollinearity problems, empirical orthogonal functions will be used. Advantages of this new method are: (1) it is more robust to outliers than ordinary least squares estimates and other estimation methods based on central tendency and dispersion measures; (2) it preserves the dependence among variables and among sites; and (3) it can be used to combine different types of distributions. This is important in our case because we are dealing with climatic variables having different distributions over different meteorological
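A minimal sketch of the quantile regression building block named above, regressing a synthetic precipitation series on a hypothetical leading EOF score (`eof1`) with statsmodels; the copula step and the real predictors are not reproduced here:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"eof1": rng.normal(size=300)})   # hypothetical leading EOF score
# Skewed synthetic precipitation whose spread grows with the predictor
df["precip"] = 5 + 2 * df["eof1"] + rng.gamma(2, 1 + 0.5 * np.abs(df["eof1"]))

# Fit several conditional quantiles; for monsoon extremes the tails matter more
# than the conditional mean
for q in (0.1, 0.5, 0.9):
    res = smf.quantreg("precip ~ eof1", df).fit(q=q)
    print(f"q={q}: intercept={res.params['Intercept']:.2f}, slope={res.params['eof1']:.2f}")
```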
DEFF Research Database (Denmark)
Jensen, P; Krogsgaard, M R; Christiansen, J
1995-01-01
… of adenomas were assessed twice by three experienced pathologists, with an interval of two months. Results were analyzed using kappa statistics. RESULTS: For agreement between first and second assessment (both type and grade of dysplasia), kappa values for the three specialists were 0.5345, 0.9022, and 0…. The kappa values for Observer A vs. B and Observer C vs. B were 0.3480 and 0.3770, respectively (both type and dysplasia). Values for type were better than for dysplasia, but agreement was only fair to moderate. CONCLUSION: The intraobserver agreement was moderate to almost perfect, but the interobserver agreement was only fair to moderate. A simpler classification system or a centralization of assessments would probably increase kappa values.
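For reference, chance-corrected agreement of the kind reported above can be computed with Cohen's kappa; a minimal sketch with hypothetical ratings from two observers:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical classifications of the same 12 adenomas by two observers (types 1-3)
obs_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 2, 1]
obs_b = [1, 2, 3, 3, 1, 1, 3, 2, 1, 2, 2, 2]

# Kappa corrects raw agreement for the agreement expected by chance
print("kappa:", round(cohen_kappa_score(obs_a, obs_b), 4))
```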
Kennedy, R R; Merry, A F
2011-09-01
Anaesthesia involves processing large amounts of information over time. One task of the anaesthetist is to detect substantive changes in physiological variables promptly and reliably. It has previously been demonstrated that a graphical trend display of historical data leads to more rapid detection of such changes. We examined the effect of a graphical indication of the magnitude of Trigg's Tracking Variable, a simple statistically based trend detection algorithm, on the accuracy and latency of the detection of changes in a micro-simulation. Ten anaesthetists each viewed 20 simulations with four variables displayed as the current value with a simple graphical trend display. Values for these variables were generated by a computer model and updated every second; after a period of stability, a change occurred to a new random value at least 10 units from baseline. In 50% of the simulations, an indication of the rate of change was given by a five-level graphical representation of the value of Trigg's Tracking Variable. Participants were asked to indicate when they thought a change was occurring. Changes were detected 10.9% faster with the trend indicator present (mean 13.1 [SD 3.1] cycles vs 14.6 [SD 3.4] cycles; 95% confidence interval 0.4 to 2.5 cycles; P = 0.013). There was no difference in accuracy of detection (median with trend detection 97% [interquartile range 95 to 100%], without trend detection 100% [98 to 100%]; P = 0.8). We conclude that simple statistical trend detection may speed detection of changes during routine anaesthesia, even when a graphical trend display is present.
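Trigg's Tracking Variable is the ratio of the exponentially smoothed forecast error to the smoothed absolute error, so |T| approaches 1 during a sustained change. A minimal sketch under that textbook definition (the five-level display and the study's simulation model are not reproduced):

```python
def triggs_tracking_variable(xs, alpha=0.2):
    """Trigg's tracking variable for a stream of values.

    T = (smoothed error) / (smoothed absolute error); |T| near 1 signals a
    sustained change in the level of the monitored variable.
    """
    forecast, e_smooth, a_smooth = xs[0], 0.0, 1e-9
    out = []
    for x in xs[1:]:
        err = x - forecast
        e_smooth = alpha * err + (1 - alpha) * e_smooth
        a_smooth = alpha * abs(err) + (1 - alpha) * a_smooth
        out.append(e_smooth / a_smooth)
        forecast += alpha * err            # simple exponential smoothing forecast
    return out

vals = [100] * 30 + [112] * 30             # step change in a "physiological variable"
print(max(abs(t) for t in triggs_tracking_variable(vals)))
```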
This tool allows users to animate cancer trends over time by cancer site and cause of death, race, and sex. Provides access to incidence, mortality, and survival. Select the type of statistic, variables, format, and then extract the statistics in a delimited format for further analyses.
Baron, R M; Kenny, D A
1986-12-01
In this article, we attempt to distinguish between the properties of moderator and mediator variables at a number of levels. First, we seek to make theorists and researchers aware of the importance of not using the terms moderator and mediator interchangeably by carefully elaborating, both conceptually and strategically, the many ways in which moderators and mediators differ. We then go beyond this largely pedagogical function and delineate the conceptual and strategic implications of making use of such distinctions with regard to a wide range of phenomena, including control and stress, attitudes, and personality traits. We also provide a specific compendium of analytic procedures appropriate for making the most effective use of the moderator and mediator distinction, both separately and in terms of a broader causal system that includes both moderators and mediators.
International Nuclear Information System (INIS)
Mizokami, Shinya; Hotta, Akitoshi; Kudo, Yoshiro; Yonehara, Tadashi; Watada, Masayuki; Sakaba, Hiroshi
2009-01-01
Current licensing practice in Japan consists of using conservative boundary and initial conditions (BIC), assumptions and analytical codes. The safety analyses for licensing purposes are inherently deterministic. Therefore, conservative BIC and assumptions, such as single failure, must be employed for the analyses. However, using conservative analytical codes is not considered essential. The standard committee of the Atomic Energy Society of Japan (AESJ) drew up a standard for using best estimate codes for safety analyses in 2008, after three years of discussions reflecting recent domestic and international findings. (author)
Schaarup-Jensen, K; Rasmussen, M R; Thorndahl, S
2009-01-01
In urban drainage modelling, long-term extreme statistics have become an important basis for decision-making, e.g. in connection with renovation projects. It is therefore of great importance to minimize the uncertainties in long-term predictions of maximum water levels and combined sewer overflow (CSO) in drainage systems. These uncertainties originate from large uncertainties regarding rainfall inputs, parameters, and assessment of return periods. This paper investigates how the choice of rainfall time series influences the extreme event statistics of maximum water levels in manholes and CSO volumes. Traditionally, long-term rainfall series from a local rain gauge are unavailable. In the present case study, however, long and local rain series are available: 2 rainfall gauges have recorded events for approximately 9 years at 2 locations within the catchment, and another 7 gauges are located at a distance of at most 20 kilometres from the catchment. All gauges are included in the Danish national rain gauge system, which was launched in 1976. The paper describes to what extent the extreme event statistics based on these 9 series diverge from each other and how this diversity can be handled, e.g. by introducing an "averaging procedure" based on the variability within the set of statistics. All simulations are performed by means of the MOUSE LTS model.
Statistical analysis of morphometric indicators and physical readiness variability of students
Directory of Open Access Journals (Sweden)
R.A. Gainullin
2017-10-01
Full Text Available Aim: To evaluate the interaction of morphometric characteristics with cardiorespiratory system responses and physical fitness indices during physical exercise training at the university. Material: First-year students (n = 91, aged 17-18) took part in the survey. The students were divided into 6 groups, and all were engaged in physical training. The studied indicators were conditionally divided into two groups: the first comprised indicators of physical fitness, the second morphofunctional indices. Results: The indicators of students' physical fitness demonstrate a wide range and heterogeneity, which should be taken into account when staffing training groups. When the technique for developing local regional muscular endurance is used, the values of the orthostatic test and the Skibinski index show significant variability. High and significant correlations are also shown by manual dynamometry, strength endurance, and the Skibinski index, and in the orthostatic test the same effect was observed for age, body length, and heart rate. A similar analysis of the morphofunctional indices shows significant correlations between the Skibinski index and the orthostatic test, between age and the Skibinski index, and between weight and body length. Conclusions: In terms of physical fitness, the sports training group (the second group) and the hypertensive group (group 5) proved to be the most stable, while a group of volunteers turned out to be the most stable with respect to the morphofunctional indicators.
Tay, Louis; Drasgow, Fritz
2012-01-01
Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted χ²/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…
Botbol, Joseph Moses; Evenden, Gerald Ian
1989-01-01
Tables, graphs, and maps are used to portray the frequency characteristics and spatial distribution of manganese oxide-rich phase geochemical data, to characterize the northern Pacific in terms of publicly available nodule geochemical data, and to develop data portrayal methods that will facilitate data analysis. Source data are a subset of the Scripps Institution of Oceanography's Sediment Data Bank. The study area is bounded by 0° N., 40° N., 120° E., and 100° W. and is arbitrarily subdivided into fourteen 20°×20° geographic subregions. Frequency distributions of trace metals characterized in the original raw data are graphed as ogives, and salient parameters are tabulated. All variables are transformed to enrichment values relative to the median concentration within their host subregions. Scatter plots of all pairs of original variables and their enrichment transforms are provided as an aid to the interpretation of correlations between variables. Gridded spatial distributions of all variables are portrayed as gray-scale maps. The use of tables and graphs to portray frequency statistics and gray-scale maps to portray spatial distributions is an effective way to prepare for and facilitate multivariate data analysis.
Directory of Open Access Journals (Sweden)
Pesch, Beate
2013-03-01
Full Text Available [english] In some applications statisticians are confronted with values which are reported to be below a limit of detection or quantitation. These left-censored variables are a challenge in statistical analysis. In a simulation study, we compare different methods of dealing with this type of data in statistical applications, including measures of location, dispersion, and association, and statistical modeling. Our simulation study showed that the multiple imputation approach and the Tobit regression lead to unbiased estimates, whereas the naïve methods, including simple substitution of non-detects, lead to unreliable estimates. We illustrate the application of the multiple imputation approach and the Tobit regression with an example from occupational epidemiology. [german] In statistical practice, variables with values below a limit of detection or quantitation occur time and again. These are left-censored and therefore pose a challenge for statistical analysis. In a simulation study, we compare estimation methods for measures of location and dispersion, correlations, and regression parameters for such variables. Our results show that the multiple imputation approach and Tobit regression lead to unbiased estimates, whereas naïve methods, including simple substitution of censored observations, yield unreliable estimates. We illustrate the application of the multiple imputation approach and Tobit regression with an example from occupational epidemiology.
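A minimal sketch of a left-censored (Tobit) regression fitted by maximum likelihood, of the kind compared in the study above; the detection limit, data, and starting values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_left(y, X, limit):
    """ML fit of a left-censored (Tobit) regression: y* = Xb + e, y = max(y*, limit)."""
    cens = y <= limit
    def negll(params):
        b, log_s = params[:-1], params[-1]
        s = np.exp(log_s)
        xb = X @ b
        ll = norm.logpdf(y[~cens], xb[~cens], s).sum()    # observed values
        ll += norm.logcdf((limit - xb[cens]) / s).sum()   # probability mass below limit
        return -ll
    start = np.append(np.linalg.lstsq(X, y, rcond=None)[0], 0.0)
    res = minimize(negll, start, method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])                  # coefficients, sigma

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y_star = X @ np.array([1.0, 0.8]) + rng.normal(scale=0.5, size=500)
y = np.maximum(y_star, 0.3)                               # detection limit at 0.3
print(tobit_left(y, X, 0.3))
```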
Ben Ayed, Rayda; Ennouri, Karim; Ercişli, Sezai; Ben Hlima, Hajer; Hanana, Mohsen; Smaoui, Slim; Rebai, Ahmed; Moreau, Fabienne
2018-04-10
Virgin olive oil is appreciated for its particular aroma and taste and is recognized worldwide for its nutritional value and health benefits. Olive oil contains a vast range of healthy compounds, such as monounsaturated free fatty acids, especially oleic acid. The SAD.1 polymorphism, localized in the stearoyl-acyl carrier protein desaturase gene (SAD), was genotyped and shown to be associated with the oleic acid composition of olive oil samples. However, the effect of polymorphisms in fatty acid-related genes on the distribution of monounsaturated and saturated fatty acids in Tunisian olive oil varieties is not understood. Seventeen Tunisian olive-tree varieties were selected for fatty acid content analysis by gas chromatography. The association of SAD.1 genotypes with fatty acid composition was studied by statistical and Bayesian modeling analyses. Fatty acid content analysis showed, interestingly, that some Tunisian virgin olive oil varieties could be classified as functional foods and nutraceuticals due to their particular richness in oleic acid. In fact, the TT-SAD.1 genotype was found to be associated with a higher proportion of mono-unsaturated fatty acids (MUFA), mainly oleic acid (C18:1) (r = –0.79, p < 0.05). The SAD.1 association with the oleic acid composition of olive oil was identified among the studied varieties. This correlation fluctuated between the studied varieties, which might elucidate the variability in lipidic composition among them, reflecting genetic diversity through differences in gene expression and biochemical pathways. The SAD locus would represent an excellent marker for identifying interesting virgin olive oil lipidic compositions.
Directory of Open Access Journals (Sweden)
Maki Cristina Sayuri
2001-01-01
Full Text Available The growth of thirty-four Lentinula edodes strains submitted to different mycelial cultivation conditions (pH and temperature) was evaluated, and strain variability was assessed by RAPD molecular markers. Growth at three pH values (5, 6 and 7) and four different temperatures (16, 25, 28 and 37 ºC) was measured using the in vitro mycelial development rate and water retention as parameters. Mycelial cultivation was successful at all pH values tested, while the ideal temperature for mycelial cultivation ranged between 25 and 28 ºC. The water content was lower in strains grown at 37 ºC. Among 20 OPA primers (Operon Technologies, Inc.) used for the RAPD analyses, seventeen presented good polymorphism (OPA01 to OPA05, OPA07 to OPA14, OPA17 to OPA20). The clustering based on similarity coefficients allowed the separation of strains into two groups with different geographic origins.
Mercier, Lény; Darnaude, Audrey M; Bruguier, Olivier; Vasconcelos, Rita P; Cabral, Henrique N; Costa, Maria J; Lara, Monica; Jones, David L; Mouillot, David
2011-06-01
Reliable assessment of fish origin is of critical importance for exploited species, since nursery areas must be identified and protected to maintain recruitment to the adult stock. During the last two decades, otolith chemical signatures (or "fingerprints") have been increasingly used as tools to discriminate between coastal habitats. However, correct assessment of fish origin from otolith fingerprints depends on various environmental and methodological parameters, including the choice of the statistical method used to assign fish of unknown origin. Among the available classification methods, Linear Discriminant Analysis (LDA) is the most frequently used, although it assumes that data are multivariate normal with homogeneous within-group dispersions, conditions that are not always met by otolith chemical data, even after transformation. Other less constrained classification methods are available, but comparative analyses of their application to otolith microchemistry are currently lacking. Here, we assessed stock identification accuracy for four classification methods (LDA, Quadratic Discriminant Analysis [QDA], Random Forests [RF], and Artificial Neural Networks [ANN]) using three distinct data sets. In each case, all possible combinations of chemical elements were examined to identify the elements giving optimal accuracy in assigning fish to their actual origin. Our study shows that accuracy differs according to the model and the number of elements considered. The best combinations did not include all the elements measured, and it was not possible to define an ad hoc multielement combination for accurate site discrimination. Among all the models tested, RF and ANN performed best, especially for complex data sets (e.g., with numerous fish species and/or chemical elements involved). However, for these data, RF was less time-consuming and more interpretable than ANN, and far more efficient and less demanding in terms of assumptions than LDA or QDA
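A minimal sketch of the four-way model comparison described above, cross-validating LDA, QDA, RF and an ANN on a synthetic stand-in for multi-class otolith signatures (scikit-learn assumed; the element-combination search is omitted):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for otolith element signatures from 4 nursery habitats
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

models = {"LDA": LinearDiscriminantAnalysis(),
          "QDA": QuadraticDiscriminantAnalysis(),
          "RF":  RandomForestClassifier(n_estimators=300, random_state=0),
          "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5)   # 5-fold assignment accuracy
    print(f"{name}: {acc.mean():.3f} +/- {acc.std():.3f}")
```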
Souza, Naiara M; Giacon, Thais R; Pacagnelli, Francis L; Barbosa, Marianne P C R; Valenti, Vitor E; Vanderlei, Luiz C M
2016-10-01
Autonomic diabetic neuropathy is one of the most common complications of type 1 diabetes mellitus, and studies using heart rate variability to investigate these individuals have shown inconclusive results regarding autonomic nervous system activation. Aims: To investigate the dynamics of heart rate in young subjects with type 1 diabetes mellitus through nonlinear and linear methods of heart rate variability. We evaluated 20 subjects with type 1 diabetes mellitus and 23 healthy control subjects. We obtained the following nonlinear indices from the recurrence plot: recurrence rate (REC), determinism (DET), and Shannon entropy (ES), and we analysed indices in the frequency domain (LF and HF in ms² and in normalised units, nu, and the LF/HF ratio) and time domain (SDNN and RMSSD), through analysis of 1000 R-R intervals captured by a heart rate monitor. There were reduced values (p<0.05) for individuals with type 1 diabetes mellitus compared with healthy subjects in the following indices: DET, REC, ES, RMSSD, SDNN, LF (ms²), and HF (ms²). In relation to the recurrence plot, subjects with type 1 diabetes mellitus demonstrated lower recurrence and greater variation in their plot, inter-group and intra-group, respectively. Young subjects with type 1 diabetes mellitus have autonomic nervous system behaviour that tends toward randomness compared with healthy young subjects. Moreover, this behaviour is related to reduced sympathetic and parasympathetic activity of the autonomic nervous system.
Monaco, E.; Memmolo, V.; Ricci, F.; Boffa, N. D.; Maio, L.
2015-03-01
Maintenance approaches based on sensorised structures and Structural Health Monitoring (SHM) systems have for many years been among the most promising innovations in the field of aerostructures, especially where composite materials (fibre-reinforced resins) are concerned. Layered materials still suffer today from drastic reductions of maximum allowable stress values during the design phase, as well as costly and recurrent inspections during the life cycle, which prevent their structural and economic potential from being fully exploited in today's aircraft. These penalizing measures are necessary mainly to account for the presence of undetected hidden flaws within the layered sequence (delaminations) or in bonded areas (partial disbonding). To relax design and maintenance constraints, a system based on sensors permanently installed on the structure to detect and locate eventual flaws (an SHM system) can be considered, once its effectiveness and reliability have been statistically demonstrated via a rigorous Probability of Detection (POD) function definition and evaluation. This paper presents an experimental approach with a statistical procedure for evaluating the detection threshold of a guided-wave-based SHM system oriented to delamination detection on a typical composite layered wing panel. The experimental tests are mostly oriented to characterizing the statistical distribution of measurements and damage metrics, as well as the system's detection capability using this approach. Numerically, it is not possible to substitute the part of the experimental tests aimed at POD in which the noise in the system response is crucial. Results of the experiments are presented and analyzed in the paper.
Alexandrov, Mikhail Dmitrievic; Geogdzhayev, Igor V.; Tsigaridis, Konstantinos; Marshak, Alexander; Levy, Robert; Cairns, Brian
2016-01-01
A novel model for the variability in aerosol optical thickness (AOT) is presented. This model is based on the consideration of AOT fields as realizations of a stochastic process, namely the exponent of an underlying Gaussian process with a specific autocorrelation function. In this approach AOT fields have lognormal PDFs and structure functions with the correct asymptotic behavior at large scales. The latter is an advantage compared with fractal (scale-invariant) approaches. The simple analytical form of the structure function in the proposed model facilitates its use for the parameterization of AOT statistics derived from remote sensing data. The new approach is illustrated using a month-long global MODIS AOT dataset (over ocean) with 10 km resolution. It was used to compute AOT statistics for sample cells forming a grid with 5° spacing. The observed shapes of the structure functions indicated that in a large number of cases the AOT variability is split into two regimes that exhibit different patterns of behavior: small-scale stationary processes and trends reflecting variations at larger scales. The small-scale patterns are suggested to be generated by local aerosols within the marine boundary layer, while the large-scale trends are indicative of elevated aerosols transported from remote continental sources. This assumption is evaluated by comparing the geographical distributions of these patterns derived from MODIS data with those obtained from the GISS GCM. This study shows considerable potential to enhance comparisons between remote sensing datasets and climate models beyond regional mean AOTs.
Energy Technology Data Exchange (ETDEWEB)
Glick, D.C.; Davis, A.
1984-07-01
The multivariate statistical techniques of correlation coefficients, factor analysis, and cluster analysis, implemented by computer programs, can be used to process a large data set and produce a summary of relationships between variables and between samples. These techniques were used to find relationships for data on the inorganic constituents of US coals. Three hundred thirty-five whole-seam channel samples from six US coal provinces were analyzed for inorganic variables. After consideration of the attributes of data expressed on ash basis and whole-coal basis, it was decided to perform complete statistical analyses on both data sets. Thirty variables expressed on whole-coal basis and twenty-six variables expressed on ash basis were used. For each inorganic variable, a frequency distribution histogram and a set of summary statistics was produced. These were subdivided to reveal the manner in which concentrations of inorganic constituents vary between coal provinces and between coal regions. Data collected on 124 samples from three stratigraphic groups (Pottsville, Monongahela, Allegheny) in the Appalachian region were studied using analysis of variance to determine degree of variability between stratigraphic levels. Most variables showed differences in mean values between the three groups. 193 references, 71 figures, 54 tables.
Directory of Open Access Journals (Sweden)
Rafdzah Zaki
2013-06-01
Full Text Available Objective(s): Reliability measures precision, or the extent to which test results can be replicated. This is the first systematic review to identify statistical methods used to measure the reliability of equipment measuring continuous variables. The study also aims to highlight inappropriate statistical methods used in reliability analyses and their implications for medical practice. Materials and Methods: In 2010, five electronic databases were searched for reliability studies published between 2007 and 2009. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and 42 finally fitted the inclusion criteria. Results: The Intra-class Correlation Coefficient (ICC) is the most popular method, used in 25 (60%) studies, followed by the comparison of means (8, or 19%). Of the 25 studies using the ICC, only 7 (28%) reported the confidence intervals and type of ICC used. Most studies (71%) also tested the agreement of instruments. Conclusion: This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue and to be able to correctly perform analyses in reliability studies.
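For reference, a common single-measurement ICC, ICC(2,1) in the Shrout-Fleiss terminology, can be computed from the two-way ANOVA mean squares; a minimal sketch (the example ratings are the classic Shrout-Fleiss illustration, not data from the review):

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is an (n subjects x k raters) array (Shrout & Fleiss, 1979).
    """
    r = np.asarray(ratings, dtype=float)
    n, k = r.shape
    grand = r.mean()
    ms_r = k * ((r.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between subjects
    ms_c = n * ((r.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between raters
    sse = ((r - r.mean(axis=1, keepdims=True)
              - r.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    ms_e = sse / ((n - 1) * (k - 1))                             # residual
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

ratings = np.array([[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]])
print(round(icc_2_1(ratings), 3))   # ~0.29 for this classic example
```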
Directory of Open Access Journals (Sweden)
Putnik-Delić Marina I.
2010-01-01
Full Text Available Eleven sugar beet genotypes were tested for their capacity to tolerate drought. Plants were grown in semi-controlled conditions in the greenhouse and watered daily. After 90 days, water deficit was imposed by the cessation of watering, while the control plants continued to be watered up to 80% of FWC. Five days later, the concentration of free proline in leaves was determined. The analysis was done in three replications. Statistical analysis was performed using STATISTICA 9.0, Minitab 15, and R 2.11.1. Differences between genotypes were statistically processed by Duncan's test. Because of the non-normality of the data distribution and the heterogeneity of variances in different groups, two types of transformation of the raw data were applied. For this type of data, the Johnson transformation was more appropriate for eliminating non-normality than the Box-Cox transformation. Based on both transformations, it may be concluded that in all genotypes except genotype 10, the concentration of free proline differs significantly between the treatment (drought) and the control.
Energy Technology Data Exchange (ETDEWEB)
Reddy, T.A. (Energy Systems Lab., Texas A and M Univ., College Station, TX (United States)); Claridge, D.E. (Energy Systems Lab., Texas A and M Univ., College Station, TX (United States))
1994-01-01
Multiple regression modeling of monitored building energy use data is often faulted as a reliable means of predicting energy use, on the grounds that multicollinearity between the regressor variables can lead both to improper interpretation of the relative importance of the various physical regressor parameters and to a model with unstable regressor coefficients. Principal component analysis (PCA) has the potential to overcome such drawbacks. While a few case studies have already attempted to apply this technique to building energy data, the objectives of this study were to make a broader evaluation of PCA and multiple regression analysis (MRA) and to establish guidelines under which one approach is preferable to the other. Four geographic locations in the US with different climatic conditions were selected, and synthetic data sequences representative of daily energy use in large institutional buildings were generated in each location using a linear model with outdoor temperature, outdoor specific humidity and solar radiation as the three regression variables. MRA and PCA approaches were then applied to these data sets and their relative performances were compared. Conditions under which PCA seems to perform better than MRA were identified, and preliminary recommendations on the use of either modeling approach were formulated. (orig.)
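A minimal sketch of the principal component regression alternative evaluated above: deliberately collinear synthetic regressors (temperature and humidity) are orthogonalized by PCA before the regression step (scikit-learn assumed; not the study's datasets):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n = 365
temp = rng.normal(15, 8, n)
humidity = 0.6 * temp + rng.normal(0, 2, n)       # deliberately collinear with temp
solar = rng.normal(200, 50, n)
X = np.column_stack([temp, humidity, solar])
energy = 50 + 3 * temp + 5 * humidity + 0.1 * solar + rng.normal(0, 5, n)

# Regressing on orthogonal principal components sidesteps the multicollinearity
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
print("PCR R^2:", round(pcr.fit(X, energy).score(X, energy), 3))
```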
Directory of Open Access Journals (Sweden)
Carles Comas
2015-04-01
Full Text Available Aim of study: Understanding inter- and intra-specific competition for water is crucial in drought-prone environments. However, little is known about the spatial interdependencies for water uptake among individuals in mixed stands. The aim of this work was to compare water uptake patterns during a drought episode in two common Mediterranean tree species, Quercus ilex L. and Pinus halepensis Mill., using the isotope composition of xylem water (δ18O, δ2H) as hydrological marker. Area of study: The study was performed in a mixed stand, sampling a total of 33 oaks and 78 pines (plot area = 888 m²). We tested the hypothesis that the two species take up water differentially along the soil profile, thus showing different levels of tree-to-tree interdependency depending on whether neighbouring trees belong to one species or the other. Material and methods: We used pair-correlation functions to study intra-specific point-tree configurations and the bivariate pair-correlation function to analyse the inter-specific spatial configuration. Moreover, the isotopic composition of xylem water was analysed as a mark point pattern. Main results: Values for Q. ilex (δ18O = –5.3 ± 0.2‰, δ2H = –54.3 ± 0.7‰) were significantly lower than for P. halepensis (δ18O = –1.2 ± 0.2‰, δ2H = –25.1 ± 0.8‰), pointing to a greater contribution of deeper soil layers to water uptake by Q. ilex. Research highlights: Point-process analyses revealed spatial intra-specific dependencies among neighbouring pines, showing neither oak-oak nor oak-pine interactions. This supports niche segregation for water uptake between the two species.
Garces, E. L.; Garces, M. A.; Christe, A.
2017-12-01
The RedVox infrasound recorder app uses the microphones and barometers in smartphones to record infrasound, low-frequency sound below the threshold of human hearing. We study a device's metadata, which include position, latency time, the differences between the device's internal times and the server times, and the machine time, searching for patterns and possible errors or discontinuities in these scaled parameters. We highlight metadata variability through scaled multivariate displays (histograms, distribution curves, scatter plots), all created and organized through software development in Python. This project is helpful in ascertaining the variability and honing the accuracy of smartphones, aiding the emergence of portable devices as viable geophysical data collection instruments. It can also improve the app and cloud service by increasing efficiency and accuracy, allowing better documentation and anticipation of drastic natural events such as tsunamis, earthquakes, volcanic eruptions, storms, rocket launches, and meteor impacts; recorded data can later be used for studies and analysis by a variety of professions. We expect our final results to produce insight into how to counteract problematic issues in data mining and improve accuracy in smartphone data collection. By eliminating lurking variables and minimizing the effect of confounding variables, we hope to discover efficient processes to reduce superfluous precision, unnecessary errors, and data artifacts. These methods should be transferable to other areas of software development, data analytics, and statistics-based experiments, contributing a precedent of smartphone metadata studies based on geophysical rather than societal data. The results should facilitate the rise of civilian-accessible, hand-held, data-gathering mobile sensor networks and yield more straightforward data mining techniques.
Ortolano, Gaetano; Visalli, Roberto; Godard, Gaston; Cirrincione, Rosolino
2018-06-01
We present a new ArcGIS®-based tool developed in the Python programming language for calibrating EDS/WDS X-ray element maps, with the aim of acquiring quantitative information of petrological interest. The calibration procedure is based on a multiple linear regression technique that takes into account interdependence among elements and is constrained by the stoichiometry of minerals. The procedure requires an appropriate number of spot analyses for use as internal standards and provides several test indexes for a rapid check of calibration accuracy. The code is based on an earlier image-processing tool designed primarily for classifying minerals in X-ray element maps; the original Python code has now been enhanced to yield calibrated maps of mineral end-members or the chemical parameters of each classified mineral. The semi-automated procedure can be used to extract a dataset that is automatically stored within queryable tables. As a case study, the software was applied to an amphibolite-facies garnet-bearing micaschist. The calibrated images obtained for both anhydrous (i.e., garnet and plagioclase) and hydrous (i.e., biotite) phases show a good fit with corresponding electron microprobe analyses. This new GIS-based tool package can thus find useful application in petrology and materials science research. Moreover, the huge quantity of data extracted opens new opportunities for the development of a thin-section microchemical database that, using a GIS platform, can be linked with other major global geoscience databases.
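A minimal sketch of the calibration idea described above: a multiple linear mapping from raw X-ray counts to concentrations, estimated from spot analyses used as internal standards and then applied to every map pixel. All numbers are synthetic, and the real tool's stoichiometric constraints and test indexes are omitted:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical internal standards: raw counts at 30 spot-analysis locations
# (columns = channels for, e.g., Fe, Mg, Ca) and their known concentrations (wt%)
counts = rng.uniform(100, 5000, size=(30, 3))
true_coef = np.array([[0.004, 0.0002, 0.0],
                      [0.0001, 0.006, 0.0],
                      [0.0, 0.0003, 0.005]])
conc = counts @ true_coef + rng.normal(0, 0.05, size=(30, 3))

# Multiple linear calibration: each concentration is a combination of ALL channels,
# which accounts for the interdependence (peak overlap) among elements
coef, *_ = np.linalg.lstsq(counts, conc, rcond=None)

pixels = rng.uniform(100, 5000, size=(512 * 512, 3))   # flattened element maps
calibrated = pixels @ coef                              # quantitative maps (wt%)
print(calibrated.shape)
```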
Directory of Open Access Journals (Sweden)
S. Ye
2012-11-01
Full Text Available The flow duration curve (FDC) is a classical method used to graphically represent the relationship between the frequency and magnitude of streamflow. In this sense it represents a compact signature of temporal runoff variability that can also be used to diagnose catchment rainfall-runoff responses, including similarity and differences between catchments. This paper is aimed at extracting regional patterns of the FDCs from observed daily flow data and elucidating the physical controls underlying these patterns, as a way to aid towards their regionalization and predictions in ungauged basins. The FDCs of total runoff (TFDC), using multi-decadal streamflow records for 197 catchments across the continental United States, are separated into the FDCs of two runoff components, i.e., fast flow (FFDC) and slow flow (SFDC). In order to compactly display these regional patterns, the 3-parameter mixed gamma distribution is employed to characterize the shapes of the normalized FDCs (i.e., TFDC, FFDC and SFDC) over the entire data record. This is repeated to also characterize the between-year variability of "annual" FDCs for 8 representative catchments chosen across a climate gradient. Results show that the mixed gamma distribution can adequately capture the shapes of the FDCs and their variation between catchments and also between years. Comparison between the between-catchment and between-year variability of the FDCs revealed significant space-time symmetry. Possible relationships between the parameters of the fitted mixed gamma distribution and catchment climatic and physiographic characteristics are explored in order to decipher and point to the underlying physical controls. The baseflow index (a surrogate for the collective impact of geology, soils, topography and vegetation, as well as climate) is found to be the dominant control on the shapes of the normalized TFDC and SFDC, whereas the product of maximum daily precipitation and the fraction of non-rainy days
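A minimal sketch of constructing an FDC and fitting a gamma model to normalized flows; a plain two-parameter gamma stands in for the paper's 3-parameter mixed gamma distribution, and the daily flows are synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
q = rng.gamma(shape=1.5, scale=2.0, size=3650)            # synthetic daily flows

# Flow duration curve: flows sorted descending against exceedance probability
q_sorted = np.sort(q)[::-1]
exceed = np.arange(1, q_sorted.size + 1) / (q_sorted.size + 1)  # Weibull plotting position
print("Q5, Q50, Q95:", [round(float(q_sorted[np.searchsorted(exceed, p)]), 2)
                        for p in (0.05, 0.50, 0.95)])

# Fit a gamma distribution to normalized flows and compare quantiles with the FDC
shape, loc, scale = stats.gamma.fit(q / q.mean(), floc=0)
for p in (0.05, 0.50, 0.95):
    print(f"exceedance {p}: empirical {np.quantile(q / q.mean(), 1 - p):.2f}, "
          f"gamma {stats.gamma.ppf(1 - p, shape, loc, scale):.2f}")
```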
Directory of Open Access Journals (Sweden)
Jing-Chzi Hsieh
2016-05-01
Full Text Available Recycled Kevlar®/polyester/low-melting-point polyester (recycled Kevlar®/PET/LPET) nonwoven geotextiles are immersed in neutral, strongly acidic, and strongly alkaline solutions, respectively, at different temperatures for four months. Their tensile strength is then tested after various immersion periods at various temperatures, in order to determine their chemical durability. To analyze the possible factors that influence the mechanical properties of geotextiles under diverse environmental conditions, the experimental results are combined with statistical analyses. The influences of recycled Kevlar® fiber content, thermal treatment, and immersion period on the tensile strength of recycled Kevlar®/PET/LPET nonwoven geotextiles are examined, after which their levels of influence are determined statistically by multiple regression analyses. According to the results, the tensile strength of the nonwoven geotextiles can be enhanced by adding recycled Kevlar® fibers and applying thermal treatment.
Khalil, Abedalrazq F.; Kwon, Hyun-Han; Lall, Upmanu; Miranda, Mario J.; Skees, Jerry
2007-10-01
Index insurance has recently been advocated as a useful risk transfer tool for disaster management situations where rapid fiscal relief is desirable and where estimating insured losses may be difficult, time-consuming, or subject to manipulation and falsification. For climate-related hazards, a rainfall or temperature index may be proposed. However, rainfall may be highly spatially variable relative to the gauge network, and in many locations data are inadequate to develop an index because of short time series and the spatial dispersion of stations. In such cases, it may be helpful to consider a climate proxy index as a regional rainfall index. This is particularly useful if a long record is available for the climate index through an independent source and it is well correlated with the regional rainfall hazard. Here, El Niño-Southern Oscillation (ENSO)-related climate indices are explored for use as a proxy for extreme rainfall in Piura, one of the districts of Peru. The ENSO index insurance product may be purchased by banks or microfinance institutions to aid agricultural damage relief in Peru. Crop losses in the region are highly correlated with floods but are difficult to assess directly. Beyond agriculture, many other sectors suffer as well. Basic infrastructure is destroyed during the most severe events. This disrupts trade for many microenterprises. The reliability and quality of the local rainfall data are variable. Averaging the financial risk across the region is desirable. Some issues with the implementation of the proxy ENSO index are identified and discussed. Specifically, we explore (1) the reliability of the index at different levels of probability of exceedance of maximum seasonal rainfall, (2) the effect of sampling uncertainties and the strength of the proxy's association with the local outcome, (3) the potential for clustering of payoffs, (4) the potential that the index could be predicted with some lead time prior to the flood season, and (5) evidence
González-Rodríguez, M L; Barros, L B; Palma, J; González-Rodríguez, P L; Rabasco, A M
2007-06-07
In this paper, we used statistical experimental design to investigate the effect of several factors on the coating of lidocaine hydrochloride (LID) liposomes with a biodegradable polymer (chitosan, CH). These variables were the concentration of the CH coating solution, the dripping rate of this solution onto the liposome colloidal dispersion, the stirring rate, the time elapsed between liposome production and coating, and the amount of drug entrapped in the liposomes. The selected response variables were drug encapsulation efficiency (EE, %), coating efficiency (CE, %) and zeta potential. Liposomes were obtained by the thin-layer evaporation method. They were subsequently coated with CH according to the experimental plan provided by a fractional factorial 2^(5-1) screening matrix. We used spectroscopic methods to determine the zeta potential values. The EE (%) assay was carried out in dialysis bags, and the brilliant red probe was used to determine CE (%) owing to its property of forming molecular complexes with CH. Graphic analysis of the effects allowed identification of the main formulation and technological factors through analysis of the selected responses, and permitted determination of the proper level of these factors for response improvement. Moreover, the fractional design allowed the interactions between factors to be quantified, which will be considered in subsequent experiments. The results obtained pointed out that the LID amount was the predominant factor that increased the drug entrapment capacity (EE). The CE (%) response was mainly affected by the concentration of the CH solution and the stirring rate, although all the interactions between the main factors were statistically significant.
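The screening design itself is easy to reproduce; the factor labels and the generator E = ABCD below are illustrative assumptions, since the paper's exact generator is not stated here:

```python
from itertools import product

# Hypothetical factor order: A = CH concentration, B = dripping rate,
# C = stirring rate, D = time before coating, E = drug amount.
runs = []
for a, b, c, d in product([-1, 1], repeat=4):
    e = a * b * c * d            # generator E = ABCD -> 2^(5-1), 16 runs
    runs.append((a, b, c, d, e))

print("run  A  B  C  D  E")
for i, r in enumerate(runs, 1):
    print(f"{i:3d} " + " ".join(f"{x:+d}" for x in r))
```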
Yokoyama, Shozo; Takenaka, Naomi
2005-04-01
Red-green color vision is strongly suspected to enhance the survival of its possessors. Despite being red-green color blind, however, many species have successfully competed in nature, which brings into question the evolutionary advantage of achieving red-green color vision. Here, we propose a new method of identifying positive selection at individual amino acid sites with the premise that if positive Darwinian selection has driven the evolution of the protein under consideration, then it should be found mostly at the branches in the phylogenetic tree where its function had changed. These statistical and molecular methods were applied to 29 visual pigments with wavelengths of maximal absorption at approximately 510-540 nm (green- or middle wavelength-sensitive [MWS] pigments) and at approximately 560 nm (red- or long wavelength-sensitive [LWS] pigments), sampled from a diverse range of vertebrate species. The results show that the MWS pigments are positively selected through amino acid replacements S180A, Y277F, and T285A and that the LWS pigments have been subjected to strong evolutionary conservation. The fact that these positively selected M/LWS pigments are found not only in animals with red-green color vision but also in those with red-green color blindness strongly suggests that both red-green color vision and color blindness have undergone adaptive evolution independently in different species.
Wafer, Lucas; Kloczewiak, Marek; Luo, Yin
2016-07-01
Analytical ultracentrifugation-sedimentation velocity (AUC-SV) is often used to quantify high molar mass species (HMMS) present in biopharmaceuticals. Although these species are often present in trace quantities, they have received significant attention due to their potential immunogenicity. Commonly, AUC-SV data is analyzed as a diffusion-corrected, sedimentation coefficient distribution, or c(s), using SEDFIT to numerically solve Lamm-type equations. SEDFIT also utilizes maximum entropy or Tikhonov-Phillips regularization to further allow the user to determine relevant sample information, including the number of species present, their sedimentation coefficients, and their relative abundance. However, this methodology has several, often unstated, limitations, which may impact the final analysis of protein therapeutics. These include regularization-specific effects, artificial "ripple peaks," and spurious shifts in the sedimentation coefficients. In this investigation, we experimentally verified that an explicit Bayesian approach, as implemented in SEDFIT, can largely correct for these effects. Clear guidelines on how to implement this technique and interpret the resulting data, especially for samples containing micro-heterogeneity (e.g., differential glycosylation), are also provided. In addition, we demonstrated how the Bayesian approach can be combined with F statistics to draw more accurate conclusions and rigorously exclude artifactual peaks. Numerous examples with an antibody and an antibody-drug conjugate were used to illustrate the strengths and drawbacks of each technique.
Gregoire, Alexandre David
2011-07-01
The goal of this research was to accurately predict the ultimate compressive load of impact-damaged graphite/epoxy coupons using a Kohonen self-organizing map (SOM) neural network and multivariate statistical regression analysis (MSRA). An optimized use of these data treatment tools allowed the generation of a simple, physically understandable equation that predicts the ultimate failure load of an impact-damaged coupon based solely on the acoustic emissions it emits at low proof loads. Acoustic emission (AE) data were collected using two 150 kHz resonant transducers which detected and recorded the AE activity given off during compression to failure of thirty-four impacted 24-ply bidirectional woven cloth laminate graphite/epoxy coupons. The AE quantification parameters duration, energy and amplitude for each AE hit were input to the Kohonen SOM neural network to accurately classify the material failure mechanisms present in the low proof load data. The numbers of failure mechanisms from the first 30% of the loading for twenty-four coupons were used to generate a linear prediction equation which yielded a worst-case ultimate load prediction error of 16.17%, just outside the +/-15% B-basis allowables that were the goal for this research. Particular emphasis was placed upon the noise removal process, which was largely responsible for the accuracy of the results.
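The classification step can be sketched with a hand-rolled SOM on synthetic AE features; the grid size and training schedule are assumptions, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic AE hits: columns = duration, energy, amplitude (standardized).
X = rng.normal(size=(500, 3))

# A 6x6 self-organizing map: one weight vector per grid node.
grid = np.array([(i, j) for i in range(6) for j in range(6)], float)
W = rng.normal(size=(36, 3))

n_iter = 2000
for t in range(n_iter):
    x = X[rng.integers(len(X))]
    bmu = np.argmin(((W - x) ** 2).sum(axis=1))    # best-matching unit
    lr = 0.5 * (1 - t / n_iter)                    # decaying learning rate
    sigma = 3.0 * (1 - t / n_iter) + 0.5           # shrinking neighbourhood
    d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
    h = np.exp(-d2 / (2 * sigma ** 2))             # neighbourhood weights
    W += lr * h[:, None] * (x - W)

# Each AE hit is then assigned to (classified by) its best-matching unit.
labels = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
print("map units in use:", len(np.unique(labels)))
```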
Soni, Kirti; Parmar, Kulwinder Singh; Kapoor, Sangeeta; Kumar, Nishant
2016-05-15
Many studies of Aerosol Optical Depth (AOD) have used data derived from the Moderate Resolution Imaging Spectroradiometer (MODIS), but the accuracy of satellite data relative to ground data from the AErosol RObotic NETwork (AERONET) has always been questionable. To address this, a comparative study of comprehensive ground-based and satellite data for the period 2001-2012 is modeled. A time series model is used for accurate prediction of AOD, and the statistical variability is compared to assess the performance of the model in both cases. Root mean square error (RMSE), mean absolute percentage error (MAPE), stationary R-squared, R-squared, maximum absolute percentage error (MaxAPE), normalized Bayesian information criterion (NBIC) and the Ljung-Box test are used to check the applicability and validity of the developed ARIMA models, revealing significant precision in model performance. It was found that it is possible to predict AOD by statistical modeling using time series obtained from past MODIS and AERONET data as input. Moreover, the results show that MODIS data can be derived from AERONET data by adding 0.251627 ± 0.133589, and vice versa by subtracting. From the forecast of AOD for the next four years (2013-2017) using the developed ARIMA model, it is concluded that the forecasted ground AOD shows an increasing trend. Copyright © 2016 Elsevier B.V. All rights reserved.
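The modeling step can be sketched with statsmodels; the synthetic series, the (1, 0, 1) order and the forecast horizon are placeholders rather than the paper's fitted model:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)

# Synthetic monthly AOD series standing in for the 2001-2012 record.
t = pd.date_range("2001-01", periods=144, freq="MS")
aod = (0.4 + 0.1 * np.sin(2 * np.pi * np.arange(144) / 12)
           + rng.normal(0, 0.05, 144))
series = pd.Series(aod, index=t)

# The paper selects orders via criteria such as normalized BIC;
# (1, 0, 1) is only a placeholder here.
fit = ARIMA(series, order=(1, 0, 1)).fit()
forecast = fit.forecast(steps=48)            # four further years
print(f"in-sample RMSE: {np.sqrt(np.mean(fit.resid ** 2)):.4f}")
print(forecast.head())
```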
Yahya, S M; Anwer, S F; Sanghi, S
2013-10-01
In this work, Thermal Large Eddy Simulation (TLES) is performed to study the behavior of weakly compressible Newtonian fluids with anisotropic temperature-dependent viscosity in forced-convection turbulent flow. A systematic analysis of variable-viscosity effects, isolated from gravity, with relevance to industrial cooling/heating applications is carried out. An LES of a planar channel flow with significant heat transfer at a low Mach number was performed to study the effects of fluid property variation on the near-wall turbulence structure. In this flow configuration the top wall is maintained at a higher temperature (T_hot) than the bottom wall (T_cold). The temperature ratio (R_θ = T_hot/T_cold) is fixed at 1.01, 2 and 3 to study the effects of property variations at low Mach number. Results indicate that the average and turbulent fields undergo significant changes. Compared with isothermal flow with constant viscosity, we observe that turbulence is enhanced on the cold side of the channel, characterized by locally lower viscosity, whereas a decrease of turbulent kinetic energy is found at the hot wall. The turbulent structures near the cold wall are very short and densely populated vortices, but near the hot wall there is a long streaky structure of large elongated vortices. Spectral study reveals that turbulence is completely suppressed at the hot side of the channel at a large temperature ratio, because no inertial zone is obtained (i.e. the index of the Kolmogorov scaling law is zero) from the spectra in this region.
Marković, Snežana; Kerč, Janez; Horvat, Matej
2017-03-01
We present a new approach to identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of an enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields results comparable to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during the manufacturing process, especially in cases where historical process data are not readily available. In the presented case, changes in lactose characteristics influenced the performance of the extrusion/spheronization process step. The pellet cores produced using one (considered less suitable) lactose source were on average larger and more fragile, leading to breakage of the cores during subsequent fluid-bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process, which leads to compromised film coating.
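A common MSPC ingredient is a PCA-based Hotelling T-squared chart; the sketch below on synthetic batch data illustrates the general approach, not the authors' exact model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Rows: batches; columns: NIR-derived process variables (synthetic).
X = rng.normal(size=(40, 10))
Xc = X - X.mean(axis=0)

# PCA via SVD; keep the first k principal components.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
scores = Xc @ Vt[:k].T
evar = S[:k] ** 2 / (len(X) - 1)             # component variances

# Hotelling T^2 per batch and a conventional F-based control limit.
T2 = np.sum(scores ** 2 / evar, axis=1)
n = len(X)
ucl = k * (n - 1) * (n + 1) / (n * (n - k)) * stats.f.ppf(0.99, k, n - k)
print("batches beyond the 99% limit:", np.where(T2 > ucl)[0])
```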
Deng, Xueliang; Nie, Suping; Deng, Weitao; Cao, Weihua
2018-04-01
In this study, we compared the following four different gridded monthly precipitation products: the National Centers for Environmental Prediction version 2 (NCEP-2) reanalysis data, the satellite-based Climate Prediction Center Morphing technique (CMORPH) data, the merged satellite-gauge Global Precipitation Climatology Project (GPCP) data, and the merged satellite-gauge-model data from the Beijing Climate Center Merged Estimation of Precipitation (BMEP). We evaluated the performances of these products using monthly precipitation observations spanning the period of January 2003 to December 2013 from a dense, national, rain gauge network in China. Our assessment involved several statistical techniques, including spatial pattern, temporal variation, bias, root-mean-square error (RMSE), and correlation coefficient (CC) analysis. The results show that NCEP-2, GPCP, and BMEP generally overestimate monthly precipitation at the national scale and CMORPH underestimates it. However, all of the datasets successfully characterized the northwest to southeast increase in the monthly precipitation over China. Because they include precipitation gauge information from the Global Telecommunication System (GTS) network, GPCP and BMEP have much smaller biases, lower RMSEs, and higher CCs than NCEP-2 and CMORPH. When the seasonal and regional variations are considered, NCEP-2 has a larger error over southern China during the summer. CMORPH poorly reproduces the magnitude of the precipitation over southeastern China and the temporal correlation over western and northwestern China during all seasons. BMEP has a lower RMSE and higher CC than GPCP over eastern and southern China, where the station network is dense. In contrast, BMEP has a lower CC than GPCP over western and northwestern China, where the gauge network is relatively sparse.
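The evaluation metrics are straightforward to compute; a sketch with synthetic gauge and product series (the 15% overestimation is an invented illustration):

```python
import numpy as np

def skill(product, gauge):
    # bias, RMSE and correlation coefficient of a product vs. gauge data
    product, gauge = np.asarray(product), np.asarray(gauge)
    bias = np.mean(product - gauge)
    rmse = np.sqrt(np.mean((product - gauge) ** 2))
    cc = np.corrcoef(product, gauge)[0, 1]
    return bias, rmse, cc

rng = np.random.default_rng(5)
gauge = rng.gamma(2.0, 40.0, size=132)            # mm/month, 2003-2013
product = gauge * 1.15 + rng.normal(0, 15, 132)   # mimics overestimation
print("bias=%.1f  RMSE=%.1f  CC=%.2f" % skill(product, gauge))
```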
Hoover, F. A.; Bowling, L. C.; Prokopy, L. S.
2015-12-01
Urban stormwater is an ongoing management concern in municipalities of all sizes. In both combined and separated sewer systems, pollutants from stormwater runoff enter the natural waterway system during heavy rain events. Urban flooding during frequent and more intense storms is also a growing concern. Therefore, stormwater best-management practices (BMPs) are being implemented in efforts to reduce and manage stormwater pollution and overflow. The majority of BMP water quality studies focus on the small-scale, individual effects of the BMP and the change in water quality directly from the runoff of these infrastructures. At the watershed scale, it is difficult to establish statistically whether or not these BMPs are making a difference in water quality, given that watershed-scale monitoring is often costly and time-consuming, relying on significant sources of funds that a city may not have. Hence, there is a need to quantify the level of sampling needed to detect the water quality impact of BMPs at the watershed scale. In this study, a power analysis was performed on data from an urban watershed in Lafayette, Indiana, to determine the frequency of sampling required to detect a significant change in water quality measurements. Using the R platform, results indicate that detecting a significant change in watershed-level water quality would require hundreds of weekly measurements, even when improvement is present. The second part of this study investigates whether the difficulty in demonstrating water quality change represents a barrier to adoption of stormwater BMPs. Semi-structured interviews of community residents and organizations in Chicago, IL, are being used to investigate residents' understanding of water quality and best-management practices, and to identify their attitudes and perceptions towards stormwater BMPs. Second-round interviews will examine how information on uncertainty in water quality improvements influences their BMP attitudes and perceptions.
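The flavor of the power calculation can be reproduced in Python (the study itself used R); the effect size d = 0.2 is an assumed stand-in for a modest BMP-driven improvement:

```python
from statsmodels.stats.power import TTestIndPower

# Samples per group needed to detect a small shift in a water quality
# metric at alpha = 0.05 with 80% power.
n = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(f"required samples per group: {n:.0f}")   # ~394, i.e. hundreds
```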
Directory of Open Access Journals (Sweden)
Wang Xiaoqiang
2012-04-01
Full Text Available Abstract Background Quantitative trait loci (QTL) detection on a huge number of phenotypes, like eQTL detection on transcriptomic data, can be dramatically impaired by the statistical properties of interval mapping methods. One major outcome is the high number of QTL detected at marker locations. The present study aims at identifying and specifying the sources of this bias, in particular in the case of analyses of data derived from outbred populations. Analytical developments were carried out in a backcross situation in order to specify the bias and to propose an algorithm to control it. The outbred population context was studied through simulated data sets in a wide range of situations. The likelihood ratio test was first analyzed under the "one QTL" hypothesis in a backcross population. Designs of sib families were then simulated and analyzed using the QTLMap software. On the basis of the theoretical results in backcross, parameters such as the population size, the density of the genetic map, the QTL effect and the true location of the QTL were taken into account under the "no QTL" and the "one QTL" hypotheses. A combination of two non-parametric tests - the Kolmogorov-Smirnov test and the Mann-Whitney-Wilcoxon test - was used in order to identify the parameters that affected the bias and to specify how much they influenced the estimation of QTL location. Results A theoretical expression of the bias of the estimated QTL location was obtained for a backcross-type population. We demonstrated a common source of bias under the "no QTL" and the "one QTL" hypotheses and qualified the possible influence of several parameters. Simulation studies confirmed that the bias exists in outbred populations under both the hypotheses of "no QTL" and "one QTL" on a linkage group. The QTL location was systematically closer to marker locations than expected, particularly in the case of low QTL effect, small population size or low density of markers, i
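Both non-parametric tests are available in scipy; synthetic samples of estimated QTL locations stand in for the simulation outputs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Estimated QTL locations (cM) under two simulated settings, e.g. small
# vs. large population size (synthetic stand-ins for QTLMap outputs).
loc_small = rng.normal(50, 8, 200).round()   # coarser, piles up on values
loc_large = rng.normal(50, 3, 200)

ks = stats.ks_2samp(loc_small, loc_large)
mw = stats.mannwhitneyu(loc_small, loc_large, alternative="two-sided")
print(f"KS:  D={ks.statistic:.3f}  p={ks.pvalue:.3g}")
print(f"MWW: U={mw.statistic:.0f}  p={mw.pvalue:.3g}")
```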
International Nuclear Information System (INIS)
Bennett, J.T.; Crowder, C.A.; Connolly, M.J.
1994-01-01
Gas samples from drums of radioactive waste at the Department of Energy (DOE) Idaho National Engineering Laboratory are being characterized for 29 volatile organic compounds to determine the feasibility of storing the waste in DOE's Waste Isolation Pilot Plant (WIPP) in Carlsbad, New Mexico. Quality requirements for the gas chromatography (GC) and GC/mass spectrometry chemical methods used to analyze the waste are specified in the Quality Assurance Program Plan for the WIPP Experimental Waste Characterization Program. Quality requirements consist of both objective criteria (data quality objectives, DQOs) and statistical criteria (process control). The DQOs apply to routine sample analyses, while the statistical criteria serve to determine and monitor the precision and accuracy (P&A) of the analysis methods, and are also used to assign upper confidence limits to measurement results close to action levels. After over two years and more than 1000 sample analyses, there are two general conclusions concerning the two approaches to quality control: (1) Objective criteria (e.g., ±25% precision, ±30% accuracy) based on customer needs and on the criteria usually prescribed for similar EPA-approved methods are consistently attained during routine analyses. (2) Statistical criteria based on short-term method performance are almost an order of magnitude more stringent than the objective criteria and are difficult to satisfy following the same routine laboratory procedures that satisfy the objective criteria. A more cost-effective and representative approach to establishing statistical method performance criteria would be either to utilize a moving average of P&A from control samples over a several-month period, or to determine within-sample variation by one-way analysis of variance of several months' replicate sample analysis results, or both. Confidence intervals for results near action levels could also be determined by replicate analysis of the sample in
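Both suggested alternatives are simple to compute; synthetic monthly control-sample recoveries illustrate the idea:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Six months of control-sample recoveries (%) for one VOC, 5 reps/month.
monthly = [rng.normal(100, 5, 5) for _ in range(6)]

# (a) Moving average of accuracy over a three-month window:
means = np.array([m.mean() for m in monthly])
moving = np.convolve(means, np.ones(3) / 3, mode="valid")
print("3-month moving accuracy:", np.round(moving, 1))

# (b) Within-sample variation by one-way ANOVA across the months:
f, p = stats.f_oneway(*monthly)
print(f"ANOVA: F={f:.2f}, p={p:.3f}")
```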
Wang, Xingsheng; Reid, Dave; Wang, Liping; Millar, Campbell; Burenkov, Alex; Evanschitzky, Peter; Baer, Eberhard; Lorenz, Juergen; Asenov, Asen
2016-01-01
This paper presents a TCAD based design technology co-optimization (DTCO) process for 14nm SOI FinFET based SRAM, which employs an enhanced variability aware compact modeling approach that fully takes process and lithography simulations and their impact on 6T-SRAM layout into account. Realistic double patterned gates and fins and their impacts are taken into account in the development of the variability-aware compact model. Finally, global process induced variability and local statistical var...
International Nuclear Information System (INIS)
Lee, Tae-Hong; Kim, Seong-Jang; Kim, In-Ju; Kim, Yong-Ki; Kim, Dong-Soo; Park, Kyung-Pil
2007-01-01
Statistical parametric mapping (SPM) and statistical probabilistic anatomical mapping (SPAM) were applied to basal/acetazolamide Tc-99m ECD brain perfusion SPECT images in patients with middle cerebral artery (MCA) stenosis to assess the efficacy of endovascular stenting of the MCA. Enrolled in the study were 11 patients (8 men and 3 women, mean age 54.2 ± 6.2 years) who had undergone endovascular stent placement for MCA stenosis. Using SPM and SPAM analyses, we compared the number of significant voxels and cerebral counts in basal and acetazolamide SPECT images before and after stenting, and assessed the perfusion changes and cerebral vascular reserve index (CVRI). The numbers of hypoperfusion voxels in SPECT images decreased from 10,083 ± 8,326 to 4,531 ± 5,091 in basal images (P = 0.0317) and from 13,398 ± 14,222 to 7,699 ± 10,199 in acetazolamide images (P = 0.0142) after MCA stenting. On SPAM analysis, the increases in cerebral counts were significant in acetazolamide images (90.9 ± 2.2 to 93.5 ± 2.3, P = 0.0098) but not in basal images (91 ± 2.7 to 92 ± 2.6, P = 0.1602). The CVRI also showed a statistically significant increase from before stenting (median 0.32; 95% CI -2.19 to 2.37) to after stenting (median 1.59; 95% CI -0.85 to 4.16; P = 0.0068). This study revealed the usefulness of voxel-based analysis of basal/acetazolamide brain perfusion SPECT after MCA stent placement, showing that SPM and SPAM analyses of basal/acetazolamide Tc-99m brain SPECT can be used to evaluate the short-term hemodynamic efficacy of successful MCA stent placement. (orig.)
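For paired before/after measures in a sample this small (n = 11), a non-parametric paired test is the natural check; the CVRI values below are synthetic stand-ins, not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

# Hypothetical per-patient CVRI before and after stenting (n = 11).
before = np.array([0.3, -1.2, 0.9, 0.1, 1.5, -0.4, 0.6, 2.0, -0.8, 0.5, 0.2])
after = before + np.abs(rng.normal(1.2, 0.6, 11))

w, p = stats.wilcoxon(before, after)
print(f"Wilcoxon signed-rank: W={w:.0f}, p={p:.4f}")
```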
Jouve, Guillaume; Vidal, Laurence; Adallal, Rachid; Bard, Edouard; Benkaddour, Abdel; Chapron, Emmanuel; Courp, Thierry; Dezileau, Laurent; Hébert, Bertil; Rhoujjati, Ali; Simonneau, Anaelle; Sonzogni, Corinne; Sylvestre, Florence; Tachikawa, Kazuyo; Viry, Elisabeth
2016-04-01
Since the 1990s, the Mediterranean basin has undergone an increase in precipitation events and extreme droughts that is likely to intensify in the 21st century, and whose origin is attributable to human activities since 1850 (IPCC, 2013). Regional climate models indicate a strengthening of flood episodes at the end of the 21st century in Morocco (Tramblay et al., 2012). To understand recent hydrological and paleohydrological variability in North Africa, our study focuses on the macro- and micro-scale analysis of sedimentary sequences from Lake Azigza (Moroccan Middle Atlas Mountains) covering the last few centuries. This lake is relevant because local site monitoring revealed that lake water table levels are correlated with the precipitation regime (Adallal R., PhD thesis in progress). The aim of our study is to distinguish sedimentary facies characteristic of low and high lake levels, in order to reconstruct past dry and wet periods during the last two hundred years. Here, we present results from sedimentological (lithology, grain size, microstructures under thin sections), geochemical (XRF) and physical (radiography) analyses of short sedimentary cores (64 cm long) taken in the deep basin of Lake Azigza (30 m water depth). Cores have been dated (210Pb and 137Cs radionuclides, and 14C dating). Two main facies were distinguished: one organic-rich facies composed of wood fragments and several reworked layers, characterized by Mn peaks; and a second facies composed of terrigenous clastic sediments, without wood fragments or reworked layers, characterized by Fe, Ti, Si and K peaks. The first facies is interpreted as a high lake level stand. Indeed, the highest paleoshoreline is close to the vegetation, and steeper banks can increase the current velocity, allowing the transport of wood fragments in case of extreme precipitation events. Mn peaks are interpreted as Mn oxide precipitation under well-oxygenated deep waters after runoff events. The second facies is linked to periods of
Austin, Peter C; Steyerberg, Ewout W
2012-06-20
When outcomes are binary, the c-statistic (equivalent to the area under the receiver operating characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression for the c-statistic was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
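The binormal result is easy to verify numerically; a sketch comparing the analytic expression with the empirical c-statistic obtained via the Mann-Whitney relation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

mu0, mu1, sd0, sd1 = 0.0, 1.0, 1.0, 1.5
x0 = rng.normal(mu0, sd0, 5000)     # without the condition
x1 = rng.normal(mu1, sd1, 5000)     # with the condition

# Analytic binormal c-statistic (unequal variances):
c_analytic = stats.norm.cdf((mu1 - mu0) / np.hypot(sd0, sd1))

# Empirical c-statistic: AUC = U / (n0 * n1), with U the Mann-Whitney
# statistic counting concordant pairs.
u = stats.mannwhitneyu(x1, x0, alternative="greater").statistic
c_empirical = u / (len(x0) * len(x1))

print(f"analytic c = {c_analytic:.4f}, empirical c = {c_empirical:.4f}")
```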
Directory of Open Access Journals (Sweden)
Ian T. Kracalik
2012-11-01
Full Text Available We compared a local clustering statistic and a cluster morphology statistic using anthrax outbreaks in large (cattle) and small (sheep and goats) domestic ruminants across Kazakhstan. The Getis-Ord (Gi*) statistic and a multidirectional optimal ecotope algorithm (AMOEBA) were compared using 1st, 2nd and 3rd order Rook contiguity matrices. Multivariate statistical tests were used to evaluate the environmental signatures between clusters and non-clusters from the AMOEBA and Gi* tests. A logistic regression was used to define a risk surface for anthrax outbreaks and to compare agreement between clustering methodologies. Tests revealed differences in the spatial distribution of clusters as well as in the total number of clusters: AMOEBA detected more clusters in large ruminants (n = 149) than in small ruminants (n = 9). In contrast, Gi* revealed fewer large ruminant clusters (n = 122) and more small ruminant clusters (n = 61). Significant environmental differences were found between groups using the Kruskal-Wallis and Mann-Whitney U tests. Logistic regression was used to model the presence/absence of anthrax outbreaks and define a risk surface for large ruminants to compare with the cluster analyses. The model predicted 32.2% of the landscape as high risk. Approximately 75% of AMOEBA clusters corresponded to predicted high risk, compared with ~64% of Gi* clusters. In general, AMOEBA predicted more irregularly shaped clusters of outbreaks in both livestock groups, while Gi* tended to predict larger, circular clusters. Here we provide an evaluation of both tests and a discussion of the use of each to detect environmental conditions associated with anthrax outbreak clusters in domestic livestock. These findings illustrate important differences in spatial statistical methods for defining local clusters and highlight the importance of selecting appropriate levels of data aggregation.
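A hand-rolled Gi*-style z-score on a synthetic raster illustrates the local clustering idea with binary rook weights; real analyses would use the study's contiguity matrices and a spatial statistics package:

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.poisson(2.0, size=(20, 20)).astype(float)   # outbreak counts/cell
n, xbar, s = x.size, x.mean(), x.std()

gi = np.zeros_like(x)
rows, cols = x.shape
for i in range(rows):
    for j in range(cols):
        # 1st-order rook neighbourhood, focal cell included (w = 1)
        nbrs = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        vals = [x[a, b] for a, b in nbrs if 0 <= a < rows and 0 <= b < cols]
        w = len(vals)
        num = sum(vals) - xbar * w
        den = s * np.sqrt((n * w - w ** 2) / (n - 1))
        gi[i, j] = num / den

# Values above ~1.96 flag local hot spots of outbreak counts.
print("hot-spot cells:", int((gi > 1.96).sum()))
```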
Mihajilov-Krstev, Tatjana M; Denić, Marija S; Zlatković, Bojan K; Stankov-Jovanović, Vesna P; Mitić, Violeta D; Stojanović, Gordana S; Radulović, Niko S
2015-04-01
In Serbia, delicatessen fruit alcoholic drinks are produced from autochthonous fruit-bearing species such as cornelian cherry, blackberry, elderberry, wild strawberry, European wild apple, European blueberry and blackthorn fruits. There are no chemical data on many of these and herein we analysed volatile minor constituents of these rare fruit distillates. Our second goal was to determine possible chemical markers of these distillates through a statistical/multivariate treatment of the herein obtained and previously reported data. Detailed chemical analyses revealed a complex volatile profile of all studied fruit distillates with 371 identified compounds. A number of constituents were recognised as marker compounds for a particular distillate. Moreover, 33 of them represent newly detected flavour constituents in alcoholic beverages or, in general, in foodstuffs. With the aid of multivariate analyses, these volatile profiles were successfully exploited to infer the origin of raw materials used in the production of these spirits. It was also shown that all fruit distillates possessed weak antimicrobial properties. It seems that the aroma of these highly esteemed wild-fruit spirits depends on the subtle balance of various minor volatile compounds, whereby some of them are specific to a certain type of fruit distillate and enable their mutual distinction. © 2014 Society of Chemical Industry.
Bowden, Jack; Del Greco M, Fabiola; Minelli, Cosetta; Davey Smith, George; Sheehan, Nuala A; Thompson, John R
2016-12-01
MR-Egger regression has recently been proposed as a method for Mendelian randomization (MR) analyses incorporating summary data estimates of causal effect from multiple individual variants, which is robust to invalid instruments. It can be used to test for directional pleiotropy and provides an estimate of the causal effect adjusted for its presence. MR-Egger regression provides a useful additional sensitivity analysis to the standard inverse variance weighted (IVW) approach that assumes all variants are valid instruments. Both methods use weights that consider the single nucleotide polymorphism (SNP)-exposure associations to be known, rather than estimated. We call this the 'NO Measurement Error' (NOME) assumption. Causal effect estimates from the IVW approach exhibit weak instrument bias whenever the genetic variants utilized violate the NOME assumption, which can be reliably measured using the F-statistic. The effect of NOME violation on MR-Egger regression has yet to be studied. An adaptation of the I2 statistic from the field of meta-analysis is proposed to quantify the strength of NOME violation for MR-Egger. It lies between 0 and 1, and indicates the expected relative bias (or dilution) of the MR-Egger causal estimate in the two-sample MR context. We call it I2GX. The method of simulation extrapolation is also explored to counteract the dilution. Their joint utility is evaluated using simulated data and applied to a real MR example. In simulated two-sample MR analyses we show that, when a causal effect exists, the MR-Egger estimate of causal effect is biased towards the null when NOME is violated, and the stronger the violation (as indicated by lower values of I2GX), the stronger the dilution. When additionally all genetic variants are valid instruments, the type I error rate of the MR-Egger test for pleiotropy is inflated and the causal effect underestimated. Simulation extrapolation is shown to substantially mitigate these adverse effects. We
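A sketch of the dilution metric, reading I2GX as a Cochran-Q-based statistic computed from the SNP-exposure estimates (an interpretation of the paper's definition, on synthetic inputs):

```python
import numpy as np

def i2_gx(gamma_hat, se_gamma):
    # I2GX = (Q - (k - 1)) / Q, with Q the heterogeneity statistic of
    # the SNP-exposure association estimates gamma_hat with SEs se_gamma.
    w = 1.0 / se_gamma ** 2
    gbar = np.sum(w * gamma_hat) / np.sum(w)
    q = np.sum(w * (gamma_hat - gbar) ** 2)
    k = len(gamma_hat)
    return max(0.0, (q - (k - 1)) / q)

rng = np.random.default_rng(11)
gamma = rng.normal(0.08, 0.03, 25)    # synthetic SNP-exposure estimates
se = np.full(25, 0.01)                # precise estimates -> I2GX near 1
print(f"I2GX = {i2_gx(gamma, se):.2f}")
```

Values of I2GX well below 1 would signal substantial expected dilution of the MR-Egger estimate.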
Directory of Open Access Journals (Sweden)
Elizabeth eMilne
2011-03-01
Full Text Available Intra-participant variability in clinical conditions such as autistic spectrum disorder (ASD) is an important indicator of pathophysiological processing. The data reported here illustrate that trial-by-trial variability can be reliably measured from EEG, and that intra-participant EEG variability is significantly greater in those with ASD than in neuro-typical matched controls. EEG recorded at the scalp is a linear mixture of activity arising from muscle artifacts and numerous concurrent brain processes. To minimise these additional sources of variability, EEG data were subjected to two different methods of spatial filtering. (i) The data were decomposed using infomax Independent Component Analysis (ICA), a method of blind source separation which un-mixes the EEG signal into components with maximally independent time-courses, and (ii) a surface Laplacian transform was performed (Current Source Density interpolation) in order to reduce the effects of volume conduction. Data are presented from thirteen high-functioning adolescents with ASD without co-morbid ADHD, and twelve neuro-typical age-, IQ- and gender-matched controls. Comparison of variability between the ASD and neuro-typical groups indicated that intra-participant variability of P1 latency and P1 amplitude was greater in the participants with ASD, and inter-trial α-band phase coherence was lower in the participants with ASD. These data support the suggestion that individuals with ASD are less able to synchronise the activity of stimulus-related cell assemblies than neuro-typical individuals, and provide empirical evidence in support of theories of increased neural noise in ASD.
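Inter-trial phase coherence is computed from the phase angles of single trials; a numpy sketch at an assumed 10 Hz component, with synthetic trials:

```python
import numpy as np

rng = np.random.default_rng(12)
fs, n_trials, n_samp = 250, 100, 250          # 1 s epochs at 250 Hz
t = np.arange(n_samp) / fs

# Synthetic trials: 10 Hz activity with trial-to-trial phase jitter
# (larger jitter mimics lower coherence, as reported for the ASD group).
jitter = rng.normal(0, 0.8, n_trials)
trials = np.array([np.sin(2 * np.pi * 10 * t + ph) for ph in jitter])
trials += rng.normal(0, 0.5, (n_trials, n_samp))

# Phase of each trial at the 10 Hz FFT bin, then coherence across trials.
bin10 = int(10 * n_samp / fs)
phases = np.angle(np.fft.rfft(trials, axis=1)[:, bin10])
itc = np.abs(np.mean(np.exp(1j * phases)))
print(f"inter-trial phase coherence at 10 Hz: {itc:.2f}")
```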
Glover, K. C.; MacDonald, G. M.; Kirby, M.
2016-12-01
Hydroclimatic variability is especially important in California, a water-stressed and increasingly populous region. We assess the range of past hydroclimatic sensitivity and variability in the San Bernardino Mountains of Southern California based on 125 ka of lacustrine sediment records. Geochemistry, charcoal and pollen highlight periods of sustained moisture, aridity and sudden variability driven by orbital and oceanic variations. Marine Isotope Stage 3 (MIS 3) is one such period of greater moisture availability that lasted c. 30 kyr, with smaller-scale perturbations likely reflecting North Atlantic Dansgaard-Oeschger events. Past glacial periods, MIS 4 and MIS 2, display high-amplitude changes. These include periods of reduced forest cover that span millennia, indicating long-lasting aridity. Rapid forest expansion also occurs, marking sudden shifts towards wet conditions. Fire regimes have also changed in tandem with hydroclimate and vegetation. Higher-resolution analysis of the past 10 ka shows that Southern California hydroclimate was broadly similar to that of other regions of the Southwest and Great Basin, including an orbitally and oceanically driven wet Early Holocene, dry Mid-Holocene, and highly variable Late Holocene. Shorter-term pluvial conditions occur throughout the Holocene, with episodic moisture likely derived from a Pacific source.
Bruntink, M.; Van Deursen, A.; dHondt, M.; Tourwé, T.
2007-01-01
This paper describes a method for studying idioms-based implementations of crosscutting concerns, and our experiences with it in the context of a real-world, large-scale embedded software system. In particular, we analyse a seemingly simple concern, tracing, and show that it exhibits significant
Glaser, Barbara; Antonelli, Marta; Pfister, Laurent; Klaus, Julian
2017-04-01
Surface saturated areas are important for the on- and offset of hydrological connectivity within the hillslope-riparian-stream continuum. This is reflected in concepts such as variable contributing areas or critical source areas. However, we still lack a standardized method for areal mapping of surface saturation and for observing its spatiotemporal variability. Proof-of-concept studies in recent years have shown the potential of thermal infrared (TIR) imagery to record surface saturation dynamics at various temporal and spatial scales. Thermal infrared imagery is thus a promising alternative to conventional approaches, such as the squishy boot method or the mapping of vegetation. In this study we use TIR images to investigate the variability of surface saturated areas at different temporal and spatial scales in the forested Weierbach catchment (0.45 km2) in western Luxembourg. We took TIR images of the riparian zone with a hand-held FLIR infrared camera at fortnightly intervals over 18 months at nine different locations distributed over the catchment. Not all of the acquired images were suitable for a derivation of the surface saturated areas, as various factors influence the usability of the TIR images (e.g. temperature contrasts, shadows, fog). Nonetheless, we obtained a large number of usable images that provided a good insight into the dynamic behaviour of surface saturated areas at different scales. The images revealed how diverse the evolution of surface saturated areas can be throughout the hydrologic year. For some locations with similar morphology or topography we identified diverging saturation dynamics, while other locations with different morphology/topography showed more similar behaviour. Moreover, we were able to assess the variability of the dynamics of expansion/contraction of saturated areas within the single locations, which can help to better understand the mechanisms behind surface saturation development.
Directory of Open Access Journals (Sweden)
Michael Robert Cunningham
2016-10-01
Full Text Available The limited resource model states that self-control is governed by a relatively finite set of inner resources on which people draw when exerting willpower. Once self-control resources have been used up or depleted, they are less available for other self-control tasks, leading to a decrement in subsequent self-control success. The depletion effect has been studied for over 20 years, tested or extended in more than 600 studies, and supported in an independent meta-analysis (Hagger, Wood, Stiff, and Chatzisarantis, 2010). Meta-analyses are supposed to reduce bias in literature reviews. Carter, Kofler, Forster, and McCullough's (2015) meta-analysis, by contrast, included a series of questionable decisions involving sampling, methods, and data analysis. We provide quantitative analyses of key sampling issues: exclusion of many of the best depletion studies based on idiosyncratic criteria and the emphasis on mini meta-analyses with low statistical power as opposed to the overall depletion effect. We discuss two key methodological issues: failure to code for research quality, and the quantitative impact of weak studies by novice researchers. We discuss two key data analysis issues: questionable interpretation of the results of trim-and-fill and funnel plot asymmetry test procedures, and the use and misinterpretation of the untested Precision Effect Test (PET) and Precision Effect Estimate with Standard Error (PEESE) procedures. Despite these serious problems, the Carter et al. meta-analysis results actually indicate that there is a real depletion effect, contrary to their title.
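PET and PEESE are weighted regressions of effect size on the standard error and on its square; a sketch on synthetic, bias-free data (a hypothetical true effect of 0.3):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)

# Synthetic meta-analytic data: 60 studies with effects d and SEs se.
k = 60
se = rng.uniform(0.05, 0.45, k)
d = 0.3 + rng.normal(0, se)                  # true effect 0.3, no bias

# PET: WLS of d on SE; the intercept estimates the bias-corrected effect.
pet = sm.WLS(d, sm.add_constant(se), weights=1 / se ** 2).fit()
# PEESE: same regression on SE^2.
peese = sm.WLS(d, sm.add_constant(se ** 2), weights=1 / se ** 2).fit()

print(f"PET intercept:   {pet.params[0]:.3f}")
print(f"PEESE intercept: {peese.params[0]:.3f}")
```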
DEFF Research Database (Denmark)
Ranaboldo, Matteo; Giebel, Gregor; Codina, Bernat
2013-01-01
A combination of physical and statistical treatments to post‐process numerical weather predictions (NWP) outputs is needed for successful short‐term wind power forecasts. One of the most promising and effective approaches for statistical treatment is the Model Output Statistics (MOS) technique....... The proposed MOS performed well in both wind farms, and its forecasts compare positively with an actual operative model in use at Risø DTU and other MOS types, showing minimum BIAS and improving NWP power forecast of around 15% in terms of root mean square error. Further improvements could be obtained...
Goodman, J. W.
This book is based on the thesis that some training in the area of statistical optics should be included as a standard part of any advanced optics curriculum. Random variables are discussed, taking into account definitions of probability and random variables, distribution functions and density functions, an extension to two or more random variables, statistical averages, transformations of random variables, sums of real random variables, Gaussian random variables, complex-valued random variables, and random phasor sums. Other subjects examined are related to random processes, some first-order properties of light waves, the coherence of optical waves, some problems involving high-order coherence, effects of partial coherence on imaging systems, imaging in the presence of randomly inhomogeneous media, and fundamental limits in photoelectric detection of light. Attention is given to deterministic versus statistical phenomena and models, the Fourier transform, and the fourth-order moment of the spectrum of a detected speckle image.
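The random phasor sums covered in the book are easy to simulate; a sketch showing the resultant amplitude approaching Rayleigh statistics, as expected for fully developed speckle:

```python
import numpy as np

rng = np.random.default_rng(14)

# Sum N unit-amplitude phasors with uniformly random phases, many times.
N, trials = 100, 20000
phases = rng.uniform(0, 2 * np.pi, (trials, N))
amp = np.abs(np.exp(1j * phases).sum(axis=1)) / np.sqrt(N)

# For large N the normalized amplitude is Rayleigh with mean sqrt(pi)/2.
print(f"mean amplitude: {amp.mean():.3f} "
      f"(Rayleigh prediction {np.sqrt(np.pi) / 2:.3f})")
```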
Directory of Open Access Journals (Sweden)
Ely Rosas
2018-04-01
Full Text Available The main objective of this study was to determine associations between categorical variables pertaining to students and their school performance at governmental schools in the Gómez and Marcano municipalities of Nueva Esparta state, by fitting a column-effects association model. The investigation was correlational in nature, with a field design, based on application to a real context in education. Among the main results obtained by fitting the model, the variables associated with school performance in the Gómez municipality were: recreational activities, frequent use of a computer at home, and use of the Internet outside the home to do homework. In the Marcano municipality they were: having Internet access at home, the place where the student plays video games, and the number of meals the student eats per day. In both municipalities, the student's positive feelings about going to school and mastery of mathematical operations were also linked to school performance.
da Costa Lobato, Tarcísio; Hauser-Davis, Rachel Ann; de Oliveira, Terezinha Ferreira; Maciel, Marinalva Cardoso; Tavares, Maria Regina Madruga; da Silveira, Antônio Morais; Saraiva, Augusto Cesar Fonseca
2015-02-15
The Amazon area has been increasingly suffering from anthropogenic impacts, especially due to the construction of hydroelectric power plant reservoirs. The analysis and categorization of the trophic status of these reservoirs are of interest to indicate man-made changes in the environment. In this context, the present study aimed to categorize the trophic status of a hydroelectric power plant reservoir located in the Brazilian Amazon by constructing a novel Water Quality Index (WQI) and Trophic State Index (TSI) for the reservoir, using major ion concentrations and physico-chemical water parameters determined in the area and taking into account the sampling locations and the local hydrological regimes. After applying statistical analyses (factor analysis and cluster analysis) and establishing a rule base of a fuzzy system for these indicators, the results obtained by the proposed method were compared to the generally applied Carlson TSI and a modified Lamparelli TSI, which is specific for tropical regions. The categorization of the trophic status by the proposed fuzzy method was shown to be more reliable, since it takes into account the specificities of the study area, which the Carlson and Lamparelli TSI do not, and thus tend to over- or underestimate the trophic status of these ecosystems. The statistical techniques proposed and applied in the present study are therefore relevant to environmental management and policy decision-making processes, aiding in the identification of the ecological status of water bodies. With this, it is possible to identify which factors should be further investigated and/or adjusted in order to attempt the recovery of degraded water bodies. Copyright © 2014 Elsevier B.V. All rights reserved.
International Nuclear Information System (INIS)
Kwon, Young Joo
2006-03-01
This work summarizes the research and development made for the design and dimensioning of the spent nuclear fuel disposal canister. Since spent nuclear fuel emits high-temperature heat and intense radiation, careful treatment is required, and a long-term (usually 10,000 years) safe repository for the spent nuclear fuel should be secured. Usually this repository is expected to be located at a depth of 500 m underground. Many various analyses should be performed to secure the structural safety of the canister. In past years, these analyses have been performed to develop the canister model (the so-called KDC-1 model). The diameter of the designed KDC-1 canister model is D = 102 cm. However, there still remain some structural evaluations to ensure the structural safety of the designed KDC-1 canister model. One is the structural safety evaluation of the canister for falling accidents in the repository while handling the canister. Two typical falling accidents may occur in the repository: the canister may fall onto the surface of the ground while being handled, or fall into the borehole while being deposited. In these falling accidents the collision impact force between the canister and the surface of the ground or the bottom of the borehole may cause structural damage to the canister. The canister should be designed to withstand this impact force; hence, a structural analysis of the canister under this impact force is required to guarantee its structural safety in a falling accident. Therefore, in this report, structural analyses of the KDC-1 canister model with a diameter of 102 cm are carried out for the impact forces arising in the two types of falling accidents, when the canister collides with the surface of the ground or the bottom of the borehole. Nonlinear structural analyses are performed to obtain accurate results, assuming the materials composing the canister parts are elasto
Directory of Open Access Journals (Sweden)
Todd C. Pataky
2016-11-01
Full Text Available One-dimensional (1D) kinematic, force, and EMG trajectories are often analyzed using zero-dimensional (0D) metrics like local extrema. Recently, whole-trajectory 1D methods have emerged in the literature as alternatives. Since 0D and 1D methods can yield qualitatively different results, the two approaches may appear to be theoretically distinct. The purposes of this paper were (a) to clarify that 0D and 1D approaches are actually just special cases of a more general region-of-interest (ROI) analysis framework, and (b) to demonstrate how ROIs can augment statistical power. We first simulated millions of smooth, random 1D datasets to validate theoretical predictions of the 0D, 1D and ROI approaches and to emphasize how ROIs provide a continuous bridge between 0D and 1D results. We then analyzed a variety of public datasets to demonstrate potential effects of ROIs on biomechanical conclusions. Results showed, first, that a priori ROI particulars can qualitatively affect the biomechanical conclusions that emerge from analyses and, second, that ROIs derived from exploratory/pilot analyses can detect smaller biomechanical effects than are detectable using full 1D methods. We recommend regarding ROIs, like data filtering particulars and Type I error rate, as parameters which can affect hypothesis testing results, and thus as sensitivity analysis tools to ensure arbitrary decisions do not influence scientific interpretations. Last, we describe open-source Python and MATLAB implementations of 1D ROI analysis for arbitrary experimental designs ranging from one-sample t tests to MANOVA.
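A toy numpy illustration of why an a priori ROI boosts power (fewer nodes mean a lower critical threshold); this is not the authors' spm1d implementation:

```python
import numpy as np

rng = np.random.default_rng(15)

# Two groups of 1D trajectories (101 nodes) with a small true difference
# confined to nodes 60-80.
n, q = 12, 101
base = np.sin(np.linspace(0, np.pi, q))
A = base + rng.normal(0, 0.15, (n, q))
B = base + rng.normal(0, 0.15, (n, q))
B[:, 60:81] += 0.12

# Two-sample t trajectory:
d = A.mean(0) - B.mean(0)
se = np.sqrt((A.var(0, ddof=1) + B.var(0, ddof=1)) / n)
t = np.abs(d / se)

# Full-1D inference must control error over all 101 nodes; an ROI only
# over 21 nodes, so the same |t| maximum is easier to declare significant.
print("max |t| over all nodes:", t.max().round(2))
print("max |t| within ROI [60, 80]:", t[60:81].max().round(2))
```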
DEFF Research Database (Denmark)
Schaarup-Jensen, Kjeld; Rasmussen, Michael R.; Thorndahl, Søren
2008-01-01
In urban drainage modeling long term extreme statistics has become an important basis for decision-making e.g. in connection with renovation projects. Therefore it is of great importance to minimize the uncertainties concerning long term prediction of maximum water levels and combined sewer...... overflow (CSO) in drainage systems. These uncertainties originate from large uncertainties regarding rainfall inputs, parameters, and assessment of return periods. This paper investigates how the choice of rainfall time series influences the extreme events statistics of max water levels in manholes and CSO...... gauges are located at a distance of max 20 kilometers from the catchment. All gauges are included in the Danish national rain gauge system which was launched in 1976. The paper describes to what extent the extreme events statistics based on these 9 series diverge from each other and how this diversity...
Lee, Sangchul; Yeo, In-Young; Sadeghi, Ali M.; McCarty, Gregory W.; Hively, Wells D.; Lang, Megan W.; Sharifi, Amir
2018-01-01
Water quality problems in the Chesapeake Bay Watershed (CBW) are expected to be exacerbated by climate variability and change. However, climate impacts on agricultural lands and resultant nutrient loads into surface water resources are largely unknown. This study evaluated the impacts of climate variability and change on two adjacent watersheds in the Coastal Plain of the CBW, using the Soil and Water Assessment Tool (SWAT) model. We prepared six climate sensitivity scenarios to assess the individual impacts of variations in CO2 concentration (590 and 850 ppm), precipitation increase (11 and 21 %), and temperature increase (2.9 and 5.0 °C), based on regional general circulation model (GCM) projections. Further, we considered the ensemble of five GCM projections (2085-2098) under the Representative Concentration Pathway (RCP) 8.5 scenario to evaluate simultaneous changes in CO2, precipitation, and temperature. Using SWAT model simulations from 2001 to 2014 as a baseline scenario, predicted hydrologic outputs (water and nitrate budgets) and crop growth were analyzed. Compared to the baseline scenario, a precipitation increase of 21 % and elevated CO2 concentration of 850 ppm significantly increased streamflow and nitrate loads by 50 and 52 %, respectively, while a temperature increase of 5.0 °C reduced streamflow and nitrate loads by 12 and 13 %, respectively. Crop biomass increased with elevated CO2 concentrations due to enhanced radiation- and water-use efficiency, while it decreased with precipitation and temperature increases. Over the GCM ensemble mean, annual streamflow and nitrate loads showed an increase of ~70 % relative to the baseline scenario, due to elevated CO2 concentrations and precipitation increase. Different hydrological responses to climate change were observed from the two watersheds, due to contrasting land use and soil characteristics. The watershed with a larger percent of croplands demonstrated a greater increased rate of 5.2 kg N ha-1 in
Lind, Mads V; Savolainen, Otto I; Ross, Alastair B
2016-08-01
Data quality is critical for epidemiology, and as scientific understanding expands, the range of data available for epidemiological studies and the types of tools used for measurement have also expanded. It is essential for the epidemiologist to have a grasp of the issues involved with different measurement tools. One tool that is increasingly being used for measuring biomarkers in epidemiological cohorts is mass spectrometry (MS), because of the high specificity and sensitivity of MS-based methods and the expanding range of biomarkers that can be measured. Further, the ability of MS to quantify many biomarkers simultaneously is advantageous compared to single biomarker methods. However, as with all methods used to measure biomarkers, there are a number of pitfalls to consider which may have an impact on results when used in epidemiology. In this review we discuss the use of MS for biomarker analyses, focusing on metabolites and their application and potential issues related to large-scale epidemiology studies, the use of MS "omics" approaches for biomarker discovery and how MS-based results can be used for increasing biological knowledge gained from epidemiological studies. Better understanding of the possibilities and possible problems related to MS-based measurements will help the epidemiologist in their discussions with analytical chemists and lead to the use of the most appropriate statistical tools for these data.
Energy Technology Data Exchange (ETDEWEB)
Wilson, William; Krakowiak, Konrad J.; Ulm, Franz-Josef, E-mail: ulm@mit.edu
2014-01-15
According to recent developments in cement clinker engineering, the optimization of chemical substitutions in the main clinker phases offers a promising approach to improve both reactivity and grindability of clinkers. Thus, monitoring the chemistry of the phases may become part of the quality control at the cement plants, along with the usual measurements of the abundance of the mineralogical phases (quantitative X-ray diffraction) and the bulk chemistry (X-ray fluorescence). This paper presents a new method to assess these three complementary quantities with a single experiment. The method is based on electron microprobe spot analyses, performed over a grid located on a representative surface of the sample and interpreted with advanced statistical tools. This paper describes the method and the experimental program performed on industrial clinkers to establish the accuracy in comparison to conventional methods. -- Highlights: •A new method of clinker characterization •Combination of electron probe technique with cluster analysis •Simultaneous assessment of phase abundance, composition and bulk chemistry •Experimental validation performed on industrial clinkers.
International Nuclear Information System (INIS)
Wilson, William; Krakowiak, Konrad J.; Ulm, Franz-Josef
2014-01-01
According to recent developments in cement clinker engineering, the optimization of chemical substitutions in the main clinker phases offers a promising approach to improve both reactivity and grindability of clinkers. Thus, monitoring the chemistry of the phases may become part of the quality control at the cement plants, along with the usual measurements of the abundance of the mineralogical phases (quantitative X-ray diffraction) and the bulk chemistry (X-ray fluorescence). This paper presents a new method to assess these three complementary quantities with a single experiment. The method is based on electron microprobe spot analyses, performed over a grid located on a representative surface of the sample and interpreted with advanced statistical tools. This paper describes the method and the experimental program performed on industrial clinkers to establish the accuracy in comparison to conventional methods. -- Highlights: •A new method of clinker characterization •Combination of electron probe technique with cluster analysis •Simultaneous assessment of phase abundance, composition and bulk chemistry •Experimental validation performed on industrial clinkers
International Nuclear Information System (INIS)
Wenrich-Verbeek, K.J.; Suits, V.J.
1979-01-01
This report presents the chemical analyses and statistical evaluation of 62 water samples collected in the north-central part of New Mexico near Rio Ojo Caliente. Both spring and surface-water samples were taken throughout the Rio Ojo Caliente drainage basin above and a few miles below the town of La Madera. A high U concentration (15 μg/l) found in the water of the Rio Ojo Caliente near La Madera, Rio Arriba County, New Mexico, during a regional sampling-technique study in August 1975 by the senior author, was investigated further in May 1976 to determine whether stream waters could be effectively used to trace the source of a U anomaly. A detailed study of the tributaries to the Rio Ojo Caliente, involving 29 samples, was conducted during a moderate discharge period, May 1976, so that small tributaries would contain water. This study isolated Canada de la Cueva as the tributary contributing the anomalous U, so that in May 1977, an extremely low discharge period due to the 1977 drought, an additional 33 samples were taken to further define the anomalous area. 6 references, 3 figures, 6 tables
DEFF Research Database (Denmark)
Nielsen, Martin Krarup; Vidyashankar, Anand N.; Hanlon, Bret
statistical model was therefore developed for analysis of FECRT data from multiple farms. Horse age, gender, zip code and pre-treatment egg count were incorporated into the model. Horses and farms were kept as random effects. Resistance classifications were based on model-based 95% lower confidence limit (LCL...
Helmer, K G; Chou, M-C; Preciado, R I; Gimi, B; Rollins, N K; Song, A; Turner, J; Mori, S
2016-02-27
It is now common for magnetic-resonance-imaging (MRI) based multi-site trials to include diffusion-weighted imaging (DWI) as part of the protocol. It is also common for these sites to possess MR scanners of different manufacturers, different software and hardware, and different software licenses. These differences mean that scanners may not be able to acquire data with the same number of gradient amplitude values and number of available gradient directions. Variability can also occur in achievable b-values and minimum echo times. The challenge of a multi-site study, then, is to create a common protocol by understanding and then minimizing the effects of scanner variability and identifying reliable and accurate diffusion metrics. This study describes the effect of site, scanner vendor, field strength, and TE on two diffusion metrics: the first moment of the diffusion tensor field (mean diffusivity, MD), and the fractional anisotropy (FA) using two common analyses (region-of-interest and mean-bin value of whole brain histograms). The goal of the study was to identify sources of variability in diffusion-sensitized imaging and their influence on commonly reported metrics. The results demonstrate that the site, vendor, field strength, and echo time all contribute to variability in FA and MD, though to different extents. We conclude that characterization of the variability of DTI metrics due to site, vendor, field strength, and echo time is a worthwhile step in the construction of multi-center trials.
Kronholm, Scott C.; Capel, Paul D.
2016-01-01
Mixing models are a commonly used method for hydrograph separation, but can be hindered by the subjective choice of the end-member tracer concentrations. This work tests a new variant of mixing model that uses high-frequency measures of two tracers and streamflow to separate total streamflow into water from slowflow and fastflow sources. The ratio between the concentrations of the two tracers is used to create a time-variable estimate of the concentration of each tracer in the fastflow end-member. Multiple synthetic data sets, and data from two hydrologically diverse streams, are used to test the performance and limitations of the new model (two-tracer ratio-based mixing model: TRaMM). When applied to the synthetic streams under many different scenarios, the TRaMM produces results that were reasonable approximations of the actual values of fastflow discharge (±0.1% of maximum fastflow) and fastflow tracer concentrations (±9.5% and ±16% of maximum fastflow nitrate concentration and specific conductance, respectively). With real stream data, the TRaMM produces high-frequency estimates of slowflow and fastflow discharge that align with expectations for each stream based on their respective hydrologic settings. The use of two tracers with the TRaMM provides an innovative and objective approach for estimating high-frequency fastflow concentrations and contributions of fastflow water to the stream. This provides useful information for tracking chemical movement to streams and allows for better selection and implementation of water quality management strategies.
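The sketch below is a plain two-component, single-tracer mixing model, the building block that TRaMM generalises with its two-tracer, time-variable fastflow end-member; the function name and all numbers are illustrative, not values from the paper.

```python
import numpy as np

def hydrograph_separation(q_total, c_stream, c_slow, c_fast):
    """Two-component mixing model: split total streamflow into fastflow
    and slowflow using one conservative tracer.

    q_total  : total streamflow (m3/s)
    c_stream : tracer concentration measured in the stream
    c_slow   : tracer concentration of the slowflow end-member
    c_fast   : tracer concentration of the fastflow end-member
    """
    f_fast = (c_stream - c_slow) / (c_fast - c_slow)  # fastflow fraction
    f_fast = np.clip(f_fast, 0.0, 1.0)                # keep fractions physical
    return f_fast * q_total, (1.0 - f_fast) * q_total

# Hourly example: a baseflow-dominated stream diluted by a storm pulse
q = np.array([1.0, 1.2, 3.5, 6.0, 4.0, 2.0])          # streamflow
c = np.array([250., 240., 150., 90., 130., 200.])     # specific conductance
q_fast, q_slow = hydrograph_separation(q, c, c_slow=260., c_fast=40.)
print(np.round(q_fast, 2))
```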
Energy Technology Data Exchange (ETDEWEB)
Heyen, H. [GKSS-Forschungszentrum Geesthacht GmbH (Germany). Inst. fuer Gewaesserphysik
1998-12-31
A multivariate statistical approach is presented that allows a systematic search for relationships between the interannual variability in climate records and ecological time series. Statistical models are built between climatological predictor fields and the variables of interest. Relationships are sought on different temporal scales and for different seasons and time lags. The possibilities and limitations of this approach are discussed in four case studies dealing with salinity in the German Bight, abundance of zooplankton at Helgoland Roads, macrofauna communities off Norderney and the arrival of migratory birds on Helgoland. (orig.) [Translated from the German original:] A statistical multivariate model is presented that allows a systematic search for potential relationships between variability in climate and ecological time series. Four application examples examine the influence of climate on salinity in the German Bight, zooplankton off Helgoland, macrofauna off Norderney, and the arrival of migratory birds on Helgoland. (orig.)
Vos, Pauline
2009-01-01
When studying correlations, how do the three bivariate correlation coefficients between three variables relate? After transforming Pearson's correlation coefficient r into a Euclidean distance, undergraduate students can tackle this problem using their secondary school knowledge of geometry (Pythagoras' theorem and similarity of triangles).…
International Nuclear Information System (INIS)
Lovius, L.; Norman, S.; Kjellbert, N.
1990-02-01
An assessment has been made of the impact of spatial variability on the performance of a KBS-3 type repository. The uncertainties in geohydrologically related performance measures have been investigated using conductivity data from one of the Swedish study sites. The analysis was carried out with the PROPER code and the FSCF10 submodel. (authors)
Nam, Sungsik
2014-08-01
The joint statistics of partial sums of ordered random variables (RVs) are often needed for the accurate performance characterization of a wide variety of wireless communication systems. A unified analytical framework to determine the joint statistics of partial sums of ordered independent and identically distributed (i.i.d.) random variables was recently presented. However, the identical distribution assumption may not be valid in several real-world applications. With this motivation in mind, we consider in this paper the more general case in which the random variables are independent but not necessarily identically distributed (i.n.d.). More specifically, we extend the previous analysis and introduce a new more general unified analytical framework to determine the joint statistics of partial sums of ordered i.n.d. RVs. Our mathematical formalism is illustrated with an application on the exact performance analysis of the capture probability of generalized selection combining (GSC)-based RAKE receivers operating over frequency-selective fading channels with a non-uniform power delay profile. © 1991-2012 IEEE.
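The joint statistics described here are easy to probe numerically. A minimal Monte Carlo sketch, assuming exponentially distributed branch SNRs with distinct means as a stand-in for the i.n.d. case; the means and thresholds are arbitrary illustrations, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.n.d. exponential branch SNRs with distinct means (non-identical laws)
means = np.array([1.0, 0.8, 0.5, 0.3, 0.2])
n_trials = 200_000
x = rng.exponential(means, size=(n_trials, means.size))

# Order each draw in decreasing order and form partial sums:
# S_k = sum of the k largest RVs (e.g., the output of a GSC combiner).
x_sorted = np.sort(x, axis=1)[:, ::-1]
partial = np.cumsum(x_sorted, axis=1)

# Empirical joint statistic: P(S_2 > a, S_5 > b)
a, b = 1.5, 2.5
p_joint = np.mean((partial[:, 1] > a) & (partial[:, 4] > b))
print(f"P(S2 > {a}, S5 > {b}) ~ {p_joint:.4f}")
```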
Department of Homeland Security — Accident statistics available on the Coast Guard’s website by state, year, and one variable to obtain tables and/or graphs. Data from reports has been loaded for...
Energy Technology Data Exchange (ETDEWEB)
Bassant, Marie-Helene
1971-01-15
The aim of this work was to study the statistical properties of the amplitude of the electroencephalographic signal. The experimental method is described (implantation of electrodes, acquisition and treatment of data). The program of the mathematical analysis is given (calculation of probability density functions, study of stationarity) and the validity of the tests discussed. The results concern ten rabbits. Sequences of 40 s of EEG were sampled at very short intervals (500 μs). The probability density functions established for different brain structures (especially the dorsal hippocampus) and areas were compared during sleep, arousal and visual stimulus. Using a χ² test, it was found that the Gaussian distribution assumption was rejected in 96.7 per cent of the cases. For a given physiological state, there was no mathematical reason to reject the assumption of stationarity (in 96 per cent of the cases). (author) [Translated from the French original:] The aim of this work is to study the statistical properties of the amplitudes of the electroencephalographic signal. The experimental method is described (implantation of electrodes, acquisition and processing of the data). The mathematical analysis program is specified (calculation of the statistical distribution curves, study of the stationarity of the signal) and the validity of the tests discussed. The results of the study concern 10 rabbits. Sequences of 40 s of EEG are sampled; the voltage is read at a sampling interval of 500 μs. The statistical distribution curves are compared from one region of the brain to another (the dorsal hippocampus was analysed in particular) during sleep, arousal and visual stimulation. The χ² test rejects the normal distribution hypothesis in 97 per cent of the cases. For a given physiological state, there is no mathematical reason to reject the stationarity hypothesis, in 96.7 per cent of the cases. (author)
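The Gaussianity check described above can be reproduced with a chi-square goodness-of-fit test. A sketch on assumed synthetic data (a heavy-tailed Student's t stand-in for EEG amplitudes; bin edges and sample size are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
amplitude = rng.standard_t(df=5, size=2000)   # heavy-tailed stand-in for EEG voltage

# Bin the samples and compare observed counts with those expected
# under a Gaussian fitted by mean and standard deviation.
edges = np.linspace(-4, 4, 17)
obs, _ = np.histogram(amplitude, bins=edges)
mu, sigma = amplitude.mean(), amplitude.std(ddof=1)
cdf = stats.norm.cdf(edges, mu, sigma)
exp = np.diff(cdf) * amplitude.size
exp *= obs.sum() / exp.sum()                  # renormalise so totals match

chi2, p = stats.chisquare(obs, exp, ddof=2)   # 2 parameters were estimated
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")      # small p: reject normality
```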
Shin, Yong Beom; Kim, Seong-Jang; Kim, In-Ju; Kim, Yong-Ki; Kim, Dong-Soo; Park, Jae Heung; Yeom, Seok-Ran
2006-06-01
Statistical parametric mapping (SPM) was applied to brain perfusion single photon emission computed tomography (SPECT) images in patients with traumatic brain injury (TBI) to investigate regional cerebral abnormalities compared to age-matched normal controls. Thirteen patients with TBI who underwent brain perfusion SPECT were included in this study (10 males, three females; mean age 39.8 ± 18.2 years, range 21-74). SPM2 software implemented in MATLAB 5.3 was used for spatial pre-processing and analysis and to determine the quantitative differences between TBI patients and age-matched normal controls. Three large voxel clusters of significantly decreased cerebral blood perfusion were found in patients with TBI. The largest cluster was the area including the medial frontal gyrus (voxel number 3642, peak Z-value = 4.31, 4.27, p = 0.000) in both hemispheres. The second largest cluster comprised the cingulate gyrus and anterior cingulate gyrus of the left hemisphere (voxel number 381, peak Z-value = 3.67, 3.62, p = 0.000). Other clusters were the parahippocampal gyrus (voxel number 173, peak Z-value = 3.40, p = 0.000) and hippocampus (voxel number 173, peak Z-value = 3.23, p = 0.001) in the left hemisphere. The false discovery rate (FDR) was less than 0.04. From this study, group and individual analyses of SPM2 could clearly identify the perfusion abnormalities of brain SPECT in patients with TBI. Group analysis of SPM2 showed a hypoperfusion pattern in the areas including the medial frontal gyrus of both hemispheres and the cingulate gyrus, anterior cingulate gyrus, parahippocampal gyrus and hippocampus in the left hemisphere compared to age-matched normal controls. Also, the left parahippocampal gyrus and left hippocampus were additional hypoperfusion areas. However, these findings deserve further investigation on a larger number of patients to allow better validation of objective SPM analysis in patients with TBI.
Sun, Fei; Xu, Bing; Zhang, Yi; Dai, Shengyun; Yang, Chan; Cui, Xianglong; Shi, Xinyuan; Qiao, Yanjiang
2016-01-01
The quality of Chinese herbal medicine tablets suffers from batch-to-batch variability due to a lack of manufacturing process understanding. In this paper, the Panax notoginseng saponins (PNS) immediate release tablet was taken as the research subject. By defining the dissolution of five active pharmaceutical ingredients and the tablet tensile strength as critical quality attributes (CQAs), influences of both the manipulated process parameters introduced by an orthogonal experiment design and the intermediate granules' properties on the CQAs were fully investigated by different chemometric methods, such as the partial least squares, the orthogonal projection to latent structures, and the multiblock partial least squares (MBPLS). By analyzing the loadings plots and variable importance in the projection indexes, the granule particle sizes and the minimal punch tip separation distance in tableting were identified as critical process parameters. Additionally, the MBPLS model suggested that the lubrication time in the final blending was also important in predicting tablet quality attributes. From the calculated block importance in the projection indexes, the tableting unit was confirmed to be the critical process unit of the manufacturing line. The results demonstrated that the combinatorial use of different multivariate modeling methods could help in understanding the complex process relationships as a whole. The output of this study can then be used to define a control strategy to improve the quality of the PNS immediate release tablet.
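For readers unfamiliar with the chemometric tools named here, the following sketch fits a PLS model and computes variable importance in the projection (VIP) scores on synthetic process data; the data, dimensions, and the conventional VIP > 1 cut-off are illustrative, not the paper's.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)

# Hypothetical process data: rows = batches, columns = process parameters
# and granule properties; y = a tablet CQA such as tensile strength.
X = rng.normal(size=(40, 8))
true_w = np.array([0.0, 1.5, 0.0, 0.0, -2.0, 0.0, 0.5, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=40)

pls = PLSRegression(n_components=3).fit(X, y)
print("R^2 =", round(pls.score(X, y), 3))

# VIP scores: variables with VIP > 1 are conventionally flagged as
# influential for the response.
W, T, Q = pls.x_weights_, pls.x_scores_, pls.y_loadings_
ssy = (T ** 2).sum(axis=0) * (Q ** 2).ravel()      # explained y-variance per component
vip = np.sqrt(X.shape[1] * ((W / np.linalg.norm(W, axis=0)) ** 2 @ ssy) / ssy.sum())
print("VIP:", np.round(vip, 2))
```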
Sun, Fei; Xu, Bing; Zhang, Yi; Dai, Shengyun; Yang, Chan; Cui, Xianglong; Shi, Xinyuan; Qiao, Yanjiang
2016-01-01
The quality of Chinese herbal medicine tablets suffers from batch-to-batch variability due to a lack of manufacturing process understanding. In this paper, the Panax notoginseng saponins (PNS) immediate release tablet was taken as the research subject. By defining the dissolution of five active pharmaceutical ingredients and the tablet tensile strength as critical quality attributes (CQAs), influences of both the manipulated process parameters introduced by an orthogonal experiment design and the intermediate granules’ properties on the CQAs were fully investigated by different chemometric methods, such as the partial least squares, the orthogonal projection to latent structures, and the multiblock partial least squares (MBPLS). By analyzing the loadings plots and variable importance in the projection indexes, the granule particle sizes and the minimal punch tip separation distance in tableting were identified as critical process parameters. Additionally, the MBPLS model suggested that the lubrication time in the final blending was also important in predicting tablet quality attributes. From the calculated block importance in the projection indexes, the tableting unit was confirmed to be the critical process unit of the manufacturing line. The results demonstrated that the combinatorial use of different multivariate modeling methods could help in understanding the complex process relationships as a whole. The output of this study can then be used to define a control strategy to improve the quality of the PNS immediate release tablet. PMID:27932865
Liu, Yang; Chiaromonte, Francesca; Ross, Howard; Malhotra, Raunaq; Elleder, Daniel; Poss, Mary
2015-06-30
Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3' half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology. Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies - and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual and single infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. All together, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase
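The ANOVA step based on a linear mixed model can be sketched with statsmodels; the data frame below is synthetic and the formula is only a plausible reading of the design (infection status by tissue as fixed effects, animal as a random effect), not the authors' exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Hypothetical long-format table of error-corrected minor allele
# frequencies per genome site, with tissue, infection status, and
# animal identity (the grouping factor for the random intercept).
n = 300
df = pd.DataFrame({
    "freq":   rng.beta(1.5, 50, size=n),
    "tissue": rng.choice(["thymus", "lymph_node", "bone_marrow"], size=n),
    "status": rng.choice(["single", "dual"], size=n),
    "cat":    rng.choice([f"cat{i}" for i in range(6)], size=n),
})

# Linear mixed model: fixed effects for status, tissue and their
# interaction; random intercept per animal.
model = smf.mixedlm("freq ~ status * tissue", df, groups=df["cat"])
fit = model.fit()
print(fit.summary())
```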
Petocz, Peter; Sowey, Eric
2012-01-01
The term "data snooping" refers to the practice of choosing which statistical analyses to apply to a set of data after having first looked at those data. Data snooping contradicts a fundamental precept of applied statistics, that the scheme of analysis is to be planned in advance. In this column, the authors shall elucidate the…
Lyons, L.
2016-01-01
Accelerators and detectors are expensive, both in terms of money and human effort. It is thus important to invest effort in performing a good statistical analysis of the data, in order to extract the best information from it. This series of five lectures deals with practical aspects of statistical issues that arise in typical High Energy Physics analyses.
Leena, P. P.; Vijayakumar, K.; Anilkumar, V.; Pandithurai, G.
2017-11-01
Airborne particulate matter (PM) plays a vital role in climate change as well as human health. In the present study, temporal variability associated with mass concentrations of PM10, PM2.5, and PM1.0 was analysed using ground observations from Mahabaleswar (1348 m AMSL, 17.56° N, 73.4° E), a high-altitude station in the Western Ghats, India, from June 2012 to May 2013. Concentrations of PM10, PM2.5, and PM1.0 showed strong diurnal, monthly, seasonal and weekday-weekend trends. The seasonal variation of PM1.0 and PM2.5 showed the highest concentrations during the winter season compared to the monsoon and pre-monsoon, whereas PM10 showed the highest concentrations in the pre-monsoon season. Similarly, slightly higher PM concentrations were observed during weekends compared to weekdays. In addition, possible contributing factors to this temporal variability have been analysed based on the variation of secondary pollutants such as NO2, SO2, CO and O3 and long-range transport of dust.
Directory of Open Access Journals (Sweden)
J.-C. Calvet
2012-01-01
Full Text Available In order to verify the interannual variability of the above-ground biomass of herbaceous vegetation simulated by the ISBA-A-gs land surface model, within the SURFEX modelling platform, French agricultural statistics for C3 crops and grasslands were compared with the simulations for the 1994–2008 period. While excellent correlations are obtained for grasslands, representing the interannual variability of crops is more difficult. It is shown that the Maximum Available soil Water Capacity (MaxAWC) has a large influence on the correlation between the model and the agricultural statistics. In particular, high values of MaxAWC tend to reduce the impact of the climate interannual variability on the simulated biomass. Also, high values of MaxAWC allow the simulation of a negative trend in biomass production, in relation to a marked warming trend, of about 0.12 K yr−1 on average, affecting the daily maximum air temperature during the growing period (April–June). This trend is particularly acute in Northern France. The estimates of MaxAWC for C3 crops and grasslands, currently used in SURFEX, are about 129 mm and do not vary much. Therefore, more accurate grid-cell values of this parameter are needed.
Directory of Open Access Journals (Sweden)
Sun F
2016-11-01
Full Text Available Fei Sun,1 Bing Xu,1,2 Yi Zhang,1 Shengyun Dai,1 Chan Yang,1 Xianglong Cui,1 Xinyuan Shi,1,2 Yanjiang Qiao1,2 1Research Center of Traditional Chinese Medicine Information Engineering, School of Chinese Materia Medica, Beijing University of Chinese Medicine, 2Key Laboratory of Manufacture Process Control and Quality Evaluation of Chinese Medicine, Beijing, People’s Republic of China Abstract: The quality of Chinese herbal medicine tablets suffers from batch-to-batch variability due to a lack of manufacturing process understanding. In this paper, the Panax notoginseng saponins (PNS) immediate release tablet was taken as the research subject. By defining the dissolution of five active pharmaceutical ingredients and the tablet tensile strength as critical quality attributes (CQAs), influences of both the manipulated process parameters introduced by an orthogonal experiment design and the intermediate granules’ properties on the CQAs were fully investigated by different chemometric methods, such as the partial least squares, the orthogonal projection to latent structures, and the multiblock partial least squares (MBPLS). By analyzing the loadings plots and variable importance in the projection indexes, the granule particle sizes and the minimal punch tip separation distance in tableting were identified as critical process parameters. Additionally, the MBPLS model suggested that the lubrication time in the final blending was also important in predicting tablet quality attributes. From the calculated block importance in the projection indexes, the tableting unit was confirmed to be the critical process unit of the manufacturing line. The results demonstrated that the combinatorial use of different multivariate modeling methods could help in understanding the complex process relationships as a whole. The output of this study can then be used to define a control strategy to improve the quality of the PNS immediate release tablet. Keywords: Panax
International Nuclear Information System (INIS)
Williams, D. G.
2003-01-01
This paper explores the trends over 1997-2001 in my baseline simulation analysis of the sufficiency of electric utilities' funds to eventually decommission the nation's nuclear power plants. Further, for 2001, I describe the utilities' funding adequacy results obtained using scenario and sensitivity analyses, respectively. In this paper, I focus more on the wide variability observed in these adequacy measures among utilities than on the results for the "average" utility in the nuclear industry. Only individual utilities, not average utilities -- often used by the nuclear industry to represent its funding adequacy -- will decommission their nuclear plants. Industry-wide results tend to mask the varied results for individual utilities. This paper shows that over 1997-2001, the variability of my baseline decommissioning funding adequacy measures (in percentages) for both utility fund balances and current contributions has remained very large, reflected in the sizable ranges and frequency distributions of these percentages. The relevance of this variability for nuclear decommissioning funding adequacy is, of course, focused more on those utilities that show below ideal balances and contribution levels. Looking backward, 42 of 67 utility fund (available) balances, in 2001, were above (and 25 below) their ideal baseline levels; in 1997, 42 of 76 were above (and 34 below) ideal levels. Of these, many utility balances were far above, and many far below, such ideal levels. The problem of certain utilities continuing to show balances much below ideal persists even with increases in the adequacy of "average" utility balances.
Sommer, Philipp; Kaplan, Jed
2016-04-01
Accurate modelling of large-scale vegetation dynamics, hydrology, and other environmental processes requires meteorological forcing on daily timescales. While meteorological data with high temporal resolution is becoming increasingly available, simulations for the future or distant past are limited by lack of data and poor performance of climate models, e.g., in simulating daily precipitation. To overcome these limitations, we may temporally downscale monthly summary data to a daily time step using a weather generator. Parameterization of such statistical models has traditionally been based on a limited number of observations. Recent developments in the archiving, distribution, and analysis of "big data" datasets provide new opportunities for the parameterization of a temporal downscaling model that is applicable over a wide range of climates. Here we parameterize a WGEN-type weather generator using more than 50 million individual daily meteorological observations, from over 10,000 stations covering all continents, based on the Global Historical Climatology Network (GHCN) and Synoptic Cloud Reports (EECRA) databases. Using the resulting "universal" parameterization and driven by monthly summaries, we downscale mean temperature (minimum and maximum), cloud cover, and total precipitation, to daily estimates. We apply a hybrid gamma-generalized Pareto distribution to calculate daily precipitation amounts, which overcomes much of the inability of earlier weather generators to simulate high amounts of daily precipitation. Our globally parameterized weather generator has numerous applications, including vegetation and crop modelling for paleoenvironmental studies.
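A hybrid gamma-generalized Pareto sampler for daily precipitation can be sketched as follows; all parameter values are illustrative, and a production generator would also match the GP scale to the gamma density at the threshold so the hybrid PDF is continuous.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def daily_precip(n, p_wet=0.35, shape=0.7, scale=6.0, q_thresh=0.95, xi=0.15):
    """Sample daily precipitation (mm): gamma body below a high quantile,
    generalized Pareto tail above it. Parameter values are illustrative."""
    wet = rng.random(n) < p_wet                # wet-day occurrence
    amounts = np.zeros(n)
    u = rng.random(wet.sum())
    thresh = stats.gamma.ppf(q_thresh, shape, scale=scale)  # body/tail split
    body = u < q_thresh
    amounts_wet = np.empty(u.size)
    # Body: inverse-CDF sampling from the gamma, restricted below the threshold
    amounts_wet[body] = stats.gamma.ppf(u[body], shape, scale=scale)
    # Tail: GP excesses above the threshold for the upper (1 - q_thresh) mass
    amounts_wet[~body] = thresh + stats.genpareto.rvs(
        xi, scale=scale, size=(~body).sum(), random_state=rng)
    amounts[wet] = amounts_wet
    return amounts

p = daily_precip(365 * 30)
print(f"wet days: {(p > 0).mean():.2f}, max daily: {p.max():.1f} mm")
```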
Nam, Sung Sik
2017-06-19
Complex wireless transmission systems require multi-dimensional joint statistical techniques for performance evaluation. Here, we first present exact closed-form results on order statistics of any arbitrary partial sums of Gamma random variables, with closed-form results for the core functions specialized for independent and identically distributed Nakagami-m fading channels, based on a moment generating function-based unified analytical framework. Both of these exact closed-form results have never been published in the literature. In addition, a feasible application example in which our newly derived closed-form results can be applied is presented. In particular, we analyze the outage performance of finger replacement schemes over Nakagami fading channels as an application of our method. Note that these analysis results are directly applicable to several applications, such as millimeter-wave communication systems in which an antenna diversity scheme operates using a finger replacement scheme-like combining scheme, and other fading scenarios. Note also that the statistical results can provide potential solutions for ordered statistics in any other research topics based on Gamma distributions or other advanced wireless communications research topics in the presence of Nakagami fading.
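As a numerical cross-check of the kind of quantity derived here, the outage probability of generalized selection combining over Nakagami-m fading reduces to an order-statistics computation on Gamma variates; the parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def gsc_outage(m, mean_snr, L, Lc, gamma_th, n_trials=200_000):
    """Monte Carlo outage probability of generalized selection combining:
    combine the Lc strongest of L i.i.d. Nakagami-m branches, whose SNRs
    are Gamma(m, mean_snr/m) distributed."""
    snr = rng.gamma(m, mean_snr / m, size=(n_trials, L))
    best = np.sort(snr, axis=1)[:, -Lc:]       # the Lc largest order statistics
    return np.mean(best.sum(axis=1) < gamma_th)

# 3/5-GSC over Nakagami-2 fading, unit average branch SNR, threshold 2.0
print(gsc_outage(m=2, mean_snr=1.0, L=5, Lc=3, gamma_th=2.0))
```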
Nick, Todd G
2007-01-01
Statistics is defined by the Medical Subject Headings (MeSH) thesaurus as the science and art of collecting, summarizing, and analyzing data that are subject to random variation. The two broad categories of summarizing and analyzing data are referred to as descriptive and inferential statistics. This chapter considers the science and art of summarizing data where descriptive statistics and graphics are used to display data. In this chapter, we discuss the fundamentals of descriptive statistics, including describing qualitative and quantitative variables. For describing quantitative variables, measures of location and spread, for example the standard deviation, are presented along with graphical presentations. We also discuss distributions of statistics, for example the variance, as well as the use of transformations. The concepts in this chapter are useful for uncovering patterns within the data and for effectively presenting the results of a project.
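As a minimal illustration of these measures of location and spread (all numbers invented), with a log transformation shown as one way to symmetrise a right-skewed sample:

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 4.7, 5.0, 5.2, 5.6, 6.1, 6.3, 7.0, 9.8])  # toy sample

print("mean   :", x.mean())                  # location, sensitive to outliers
print("median :", np.median(x))              # robust location
print("sd     :", x.std(ddof=1))             # spread about the mean
print("IQR    :", np.subtract(*np.percentile(x, [75, 25])))  # robust spread
print("skew   :", stats.skew(x))             # asymmetry; > 0 here (9.8 pulls right)

# A log transformation often symmetrises right-skewed data:
print("skew(log):", stats.skew(np.log(x)))
```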
Wang, Kezhi
2015-06-01
Exact results for the probability density function (PDF) and cumulative distribution function (CDF) of the sum of ratios of products (SRP) and the sum of products (SP) of independent α-μ random variables (RVs) are derived. They are in the form of a 1-D integral based on the existing works on the products and ratios of α-μ RVs. In the derivation, generalized Gamma (GG) ratio approximation (GGRA) is proposed to approximate SRP. Gamma ratio approximation (GRA) is proposed to approximate SRP and the ratio of sums of products (RSP). GG approximation (GGA) and Gamma approximation (GA) are used to approximate SP. The proposed results of the SRP can be used to calculate the outage probability (OP) for wireless multihop relaying systems or multiple scattering channels with interference. The proposed results of the SP can be used to calculate the OP for these systems without interference. In addition, the proposed approximate result of the RSP can be used to calculate the OP of the signal-to-interference ratio (SIR) in a multiple scattering system with interference. © 1967-2012 IEEE.
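α-μ variates are straightforward to simulate, since the envelope raised to the power α (suitably scaled) is Gamma distributed, so the SP statistics can be checked by Monte Carlo; the branch parameters and threshold below are illustrative, and the helper name is ours.

```python
import numpy as np

rng = np.random.default_rng(6)

def alpha_mu(alpha, mu, r_hat, size):
    """Draw α-μ envelopes: R**alpha / r_hat**alpha is Gamma(mu, 1/mu)."""
    return r_hat * rng.gamma(mu, 1.0 / mu, size=size) ** (1.0 / alpha)

# Sum of products (SP): e.g., two double-scattering branches added up.
n = 500_000
prod1 = alpha_mu(2.0, 1.5, 1.0, n) * alpha_mu(2.5, 2.0, 1.0, n)
prod2 = alpha_mu(2.0, 1.5, 1.0, n) * alpha_mu(2.5, 2.0, 1.0, n)
sp = prod1 + prod2

# Empirical CDF at a threshold = outage probability without interference
x_th = 0.5
print(f"P(SP < {x_th}) ~ {(sp < x_th).mean():.4f}")
```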
Wang, Kezhi; Wang, Tian; Chen, Yunfei; Alouini, Mohamed-Slim
2015-01-01
Exact results for the probability density function (PDF) and cumulative distribution function (CDF) of the sum of ratios of products (SRP) and the sum of products (SP) of independent α-μ random variables (RVs) are derived. They are in the form of a 1-D integral based on the existing works on the products and ratios of α-μ RVs. In the derivation, generalized Gamma (GG) ratio approximation (GGRA) is proposed to approximate SRP. Gamma ratio approximation (GRA) is proposed to approximate SRP and the ratio of sums of products (RSP). GG approximation (GGA) and Gamma approximation (GA) are used to approximate SP. The proposed results of the SRP can be used to calculate the outage probability (OP) for wireless multihop relaying systems or multiple scattering channels with interference. The proposed results of the SP can be used to calculate the OP for these systems without interference. In addition, the proposed approximate result of the RSP can be used to calculate the OP of the signal-to-interference ratio (SIR) in a multiple scattering system with interference. © 1967-2012 IEEE.
Directory of Open Access Journals (Sweden)
Dongkyun Kim
2014-01-01
Full Text Available A novel approach for a Poisson cluster stochastic rainfall generator was validated in its ability to reproduce important rainfall and watershed response characteristics at 104 locations in the United States. The suggested novel approach, The Hybrid Model (THM), as compared to the traditional Poisson cluster rainfall modeling approaches, has an additional capability to account for the interannual variability of rainfall statistics. THM and a traditional approach of Poisson cluster rainfall model (modified Bartlett-Lewis rectangular pulse model) were compared in their ability to reproduce the characteristics of extreme rainfall and watershed response variables such as runoff and peak flow. The results of the comparison indicate that THM generally outperforms the traditional approach in reproducing the distributions of peak rainfall, peak flow, and runoff volume. In addition, THM significantly outperformed the traditional approach in reproducing extreme rainfall by 2.3% to 66% and extreme flow values by 32% to 71%.
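A Poisson cluster rainfall generator of the general family discussed here (a simplified Neyman-Scott rectangular-pulse scheme, not THM or the modified Bartlett-Lewis model themselves) can be sketched in a few lines; all rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def poisson_cluster_rain(t_end, lam=0.02, mean_cells=4, beta=0.5,
                         mean_dur=3.0, mean_int=2.0, dt=1.0):
    """Minimal Neyman-Scott rectangular-pulse rainfall sketch (hours):
    storms arrive as a Poisson process (rate lam); each spawns a random
    number of rain cells displaced exponentially (rate beta) from the
    storm origin, each cell a rectangular pulse with exponential
    duration and intensity."""
    t = np.arange(0.0, t_end, dt)
    rain = np.zeros(t.size)
    n_storms = rng.poisson(lam * t_end)
    for origin in rng.uniform(0, t_end, n_storms):
        for _ in range(1 + rng.poisson(mean_cells - 1)):   # at least one cell
            start = origin + rng.exponential(1.0 / beta)
            dur = rng.exponential(mean_dur)
            intensity = rng.exponential(mean_int)
            rain[(t >= start) & (t < start + dur)] += intensity
    return t, rain

t, rain = poisson_cluster_rain(t_end=24 * 365)
print(f"mean {rain.mean():.2f} mm/h, wet fraction {(rain > 0).mean():.2f}")
```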
International Nuclear Information System (INIS)
Park, Subok; Jennings, Robert; Liu Haimo; Badano, Aldo; Myers, Kyle
2010-01-01
Purpose: For the last few years, development and optimization of three-dimensional (3D) x-ray breast imaging systems, such as digital breast tomosynthesis (DBT) and computed tomography, have drawn much attention from the medical imaging community, whether academia or industry. However, there is still much room for understanding how to best optimize and evaluate the devices over a large space of many different system parameters and geometries. Current evaluation methods, which work well for 2D systems, do not incorporate the depth information from the 3D imaging systems. Therefore, it is critical to develop a statistically sound evaluation method to investigate the usefulness of inclusion of depth and background-variability information into the assessment and optimization of the 3D systems. Methods: In this paper, we present a mathematical framework for a statistical assessment of planar and 3D x-ray breast imaging systems. Our method is based on statistical decision theory, in particular, making use of the ideal linear observer called the Hotelling observer. We also present a physical phantom that consists of spheres of different sizes and materials for producing an ensemble of randomly varying backgrounds to be imaged for a given patient class. Lastly, we demonstrate our evaluation method in comparing laboratory mammography and three-angle DBT systems for signal detection tasks using the phantom's projection data. We compare the variable phantom case to that of a phantom of the same dimensions filled with water, which we call the uniform phantom, based on the performance of the Hotelling observer as a function of signal size and intensity. Results: Detectability trends calculated using the variable and uniform phantom methods are different from each other for both mammography and DBT systems. Conclusions: Our results indicate that measuring the system's detection performance with consideration of background variability may lead to differences in system performance.
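The Hotelling observer at the core of this framework is a linear template w = K⁻¹s, whose detectability is SNR² = sᵀK⁻¹s, with K the background covariance and s the known signal. A toy sketch on a synthetic ensemble of correlated backgrounds (dimensions, noise model, and signal are invented):

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy ensemble of variable backgrounds (flattened ROIs) plus a known
# low-contrast signal.
n_pix, n_imgs = 64, 5000
A = rng.normal(size=(n_pix, n_pix)) * 0.05
backgrounds = rng.normal(size=(n_imgs, n_pix)) @ A.T    # correlated noise
signal = np.zeros(n_pix)
signal[27:37] = 0.02                                    # small central blob

K = np.cov(backgrounds, rowvar=False) + 1e-8 * np.eye(n_pix)  # regularised
w = np.linalg.solve(K, signal)                          # Hotelling template
snr2 = signal @ w                                       # detectability SNR^2
print(f"Hotelling detectability SNR = {np.sqrt(snr2):.2f}")
```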
Del Gallego, R; Sadeghi, S; Blasco, E; Soler, C; Yániz, J L; Silvestre, M A
2017-02-01
Several factors unrelated to the semen samples could influence the sperm motility analysis. The aim of the present research was to study the effect on sperm motility of four chambers with different characteristics, namely slide-coverslip, Spermtrack, ISAS D4C10, and ISAS D4C20. The effects of the filling procedure (drop or capillarity), analysis time (0, 120 and 240 s), depth of chamber (10 or 20 μm) and field on motility variables were analysed by use of the CASA-mot system in goat sperm. Use of the drop-filling chambers resulted in greater values than capillarity-filling chambers for all sperm motility and kinetic variables, except for LIN (64.5% compared with 56.3% of motility for drop- and capillarity-filling chambers, respectively, P [value missing]). Overall, the results favour analysing samples with the CASA-mot system using a drop-loaded chamber within 2 min after filling the chamber. Copyright © 2016 Elsevier B.V. All rights reserved.
Multivariate statistical methods a first course
Marcoulides, George A
2014-01-01
Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin
Deal, Eric; Braun, Jean
2015-04-01
A current challenge in landscape evolution modelling is to integrate realistic precipitation patterns and behaviour into long-term fluvial erosion models. The effect of precipitation on fluvial erosion can be subtle as well as nonlinear, implying that changes in climate (e.g. precipitation magnitude or storminess) may have unexpected outcomes in terms of erosion rates. For example Tucker and Bras (2000) show theoretically that changes in the variability of precipitation (storminess) alone can influence erosion rate across a landscape. To complicate the situation further, topography, ultimately driven by tectonic uplift but shaped by erosion, has a major influence on the distribution and style of precipitation. Therefore, in order to untangle the coupling between climate, erosion and tectonics in an actively uplifting orogen where fluvial erosion is dominant it is important to understand how the 'rain dial' used in a landscape evolution model (LEM) corresponds to real precipitation patterns. One issue with the parameterisation of rainfall for use in an LEM is the difference between the timescales for precipitation (≤ 1 year) and landscape evolution (> 10³ years). As a result, precipitation patterns must be upscaled before being integrated into a model. The relevant question then becomes: What is the most appropriate measure of precipitation on a millennial timescale? Previous work (Tucker and Bras, 2000; Lague, 2005) has shown that precipitation can be properly upscaled by taking into account its variable nature, along with its average magnitude. This captures the relative size and frequency of extreme events, ensuring a more accurate characterisation of the integrated effects of precipitation on erosion over long periods of time. In light of this work, we present a statistical parameterisation that accurately models the mean and daily variability of ground based (APHRODITE) and remotely sensed (TRMM) precipitation data in the Himalayan orogen with only a few
Directory of Open Access Journals (Sweden)
L. Bressan
2016-01-01
reconstructed sea level (RSL), the background slope (BS) and the control function (CF). These functions are examined through a traditional spectral fast Fourier transform (FFT) analysis and also through a statistical analysis, showing that they can be characterised by probability distribution functions (PDFs) such as the Student's t distribution (IS and RSL) and the beta distribution (CF). As an example, the method has been applied to data from the tide-gauge station of Siracusa, Italy.
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Singh, Yadvir; Kohli, Sakshi; Ahmad, Javeed; Ehtesham, Nasreen Z.; Tyagi, Anil K.
2014-01-01
Mycobacterial evolution involves various processes, such as genome reduction, gene cooption, and critical gene acquisition. Our comparative genome size analysis of 44 mycobacterial genomes revealed that the nonpathogenic (NP) genomes were bigger than those of opportunistic (OP) or totally pathogenic (TP) mycobacteria, with the TP genomes being smaller yet variable in size—their genomic plasticity reflected their ability to evolve and survive under various environmental conditions. From the 44 mycobacterial species, 13 species, representing TP, OP, and NP, were selected for genomic-relatedness analyses. Analysis of homologous protein-coding genes shared between Mycobacterium indicus pranii (NP), Mycobacterium intracellulare ATCC 13950 (OP), and Mycobacterium tuberculosis H37Rv (TP) revealed that 4,995 (i.e., ~95%) M. indicus pranii proteins have homology with M. intracellulare, whereas the homologies among M. indicus pranii, M. intracellulare ATCC 13950, and M. tuberculosis H37Rv were significantly lower. A total of 4,153 (~79%) M. indicus pranii proteins and 4,093 (~79%) M. intracellulare ATCC 13950 proteins exhibited homology with the M. tuberculosis H37Rv proteome, while 3,301 (~82%) and 3,295 (~82%) M. tuberculosis H37Rv proteins showed homology with M. indicus pranii and M. intracellulare ATCC 13950 proteomes, respectively. Comparative metabolic pathway analyses of TP/OP/NP mycobacteria showed enzymatic plasticity between M. indicus pranii (NP) and M. intracellulare ATCC 13950 (OP), Mycobacterium avium 104 (OP), and M. tuberculosis H37Rv (TP). Mycobacterium tuberculosis seems to have acquired novel alternate pathways with possible roles in metabolism, host-pathogen interactions, virulence, and intracellular survival, and by implication some of these could be potential drug targets. PMID:25370496
Norris, Peter M.; da Silva, Arlindo M.
2018-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847
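The Metropolis step discussed at the end of the article can be illustrated on a one-dimensional toy posterior for a bounded, moisture-like variable; the prior, pseudo-observation likelihood, and step size are invented for the sketch, which also shows why the sampler can jump into non-zero-probability regions that a gradient-based linearised update cannot reach.

```python
import numpy as np

rng = np.random.default_rng(9)

def log_post(x):
    """Toy log-posterior on (0, 1): Beta(2, 5) prior times a Gaussian
    pseudo-observation likelihood centred at 0.6."""
    if not 0.0 < x < 1.0:
        return -np.inf
    return np.log(x) + 4.0 * np.log1p(-x) - 0.5 * ((x - 0.6) / 0.1) ** 2

def metropolis(n, x0=0.5, step=0.1):
    chain, x, lp = np.empty(n), x0, log_post(x0)
    for i in range(n):
        prop = x + step * rng.normal()            # symmetric random-walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept with min(1, ratio)
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

samples = metropolis(20_000)[5_000:]              # drop burn-in
print(f"posterior mean ~ {samples.mean():.3f}")
```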
Norris, Peter M.; Da Silva, Arlindo M.
2016-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
Crown, William H
2014-02-01
This paper examines the use of propensity score matching in economic analyses of observational data. Several excellent papers have previously reviewed practical aspects of propensity score estimation and other aspects of the propensity score literature. The purpose of this paper is to compare the conceptual foundation of propensity score models with alternative estimators of treatment effects. References are provided to empirical comparisons among methods that have appeared in the literature. These comparisons are available for a subset of the methods considered in this paper. However, in some cases, no pairwise comparisons of particular methods are yet available, and there are no examples of comparisons across all of the methods surveyed here. Irrespective of the availability of empirical comparisons, the goal of this paper is to provide some intuition about the relative merits of alternative estimators in health economic evaluations where nonlinearity, sample size, availability of pre/post data, heterogeneity, and missing variables can have important implications for choice of methodology. Also considered is the potential combination of propensity score matching with alternative methods such as differences-in-differences and decomposition methods that have not yet appeared in the empirical literature.
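A bare-bones version of the propensity score matching workflow compared in this paper, on synthetic data with a known treatment effect; the model choices and sample sizes are illustrative, and a real analysis would add caliper restrictions and covariate balance diagnostics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)

# Hypothetical observational data: covariates X drive both treatment
# assignment and outcome; the true treatment effect is 2.0.
n = 4000
X = rng.normal(size=(n, 3))
p_treat = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
treat = rng.random(n) < p_treat
y = X @ np.array([1.0, 0.5, -0.3]) + 2.0 * treat + rng.normal(size=n)

# 1. Estimate propensity scores with logistic regression.
ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# 2. 1-nearest-neighbour matching on the propensity score (with replacement).
ps_c, y_c = ps[~treat], y[~treat]
idx = np.abs(ps[treat][:, None] - ps_c[None, :]).argmin(axis=1)

# 3. Average treatment effect on the treated = mean matched difference.
att = (y[treat] - y_c[idx]).mean()
print(f"ATT estimate ~ {att:.2f} (truth 2.0)")
```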
Directory of Open Access Journals (Sweden)
Susanne Unverzagt
Full Text Available This study is an in-depth analysis to explain statistical heterogeneity in a systematic review of implementation strategies to improve guideline adherence of primary care physicians in the treatment of patients with cardiovascular diseases. The systematic review included randomized controlled trials from a systematic search in MEDLINE, EMBASE, CENTRAL, conference proceedings and registers of ongoing studies. Implementation strategies were shown to be effective with substantial heterogeneity of treatment effects across all investigated strategies. The primary aim of this study was to explain different effects of eligible trials and to identify methodological and clinical effect modifiers. Random effects meta-regression models were used to simultaneously assess the influence of multimodal implementation strategies and effect modifiers on physician adherence. Effect modifiers included the staff responsible for implementation, level of prevention and definition of the primary outcome, unit of randomization, duration of follow-up and risk of bias. Six clinical and methodological factors were investigated as potential effect modifiers of the efficacy of different implementation strategies on guideline adherence in primary care practices on the basis of information from 75 eligible trials. Five effect modifiers were able to explain a substantial amount of statistical heterogeneity. Physician adherence was improved by 62% (95% confidence interval (CI) 29 to 104%) or 29% (95% CI 5 to 60%) in trials where other non-medical professionals or nurses were included in the implementation process. Improvement of physician adherence was more successful in primary and secondary prevention of cardiovascular diseases, by around 30% (30%, 95% CI −2 to 71%, and 31%, 95% CI 9 to 57%, respectively) compared to tertiary prevention. This study aimed to identify effect modifiers of implementation strategies on physician adherence. Especially the cooperation of different health
Energy Technology Data Exchange (ETDEWEB)
Pirson, A.S.; George, J.; Krug, B.; Vander Borght, T. [Universite Catholique de Louvain, Service de Medecine Nucleaire, Cliniques Universitaires de Mont-Godinne, Yvoir (Belgium); Van Laere, K. [Leuven Univ. Hospital, Nuclear Medicine Div. (Belgium); Jamart, J. [Universite Catholique de Louvain, Dept. de Biostatistiques, Cliniques Universitaires de Mont-Godinne, Yvoir (Belgium); D'Asseler, Y. [Ghent Univ., Medical Signal and Image Processing Dept. (MEDISIP), Faculty of applied sciences (Belgium); Minoshima, S. [Washington Univ., Dept. of Radiology, Seattle (United States)
2009-10-15
Fully automated analysis programs have been applied more and more to aid the reading of regional cerebral blood flow SPECT studies. They are increasingly based on the comparison of the patient study with a normal database. In this study, we evaluate the ability of Three-Dimensional Stereotactic Surface Projection (3D-SSP) to isolate effects of age and gender in a previously studied normal population. The results were also compared with those obtained using Statistical Parametric Mapping (SPM99). Methods: Eighty-nine 99mTc-ECD SPECT studies performed in carefully screened healthy volunteers (46 females, 43 males; age 20-81 years) were analysed using 3D-SSP. A multivariate analysis based on the general linear model was performed with regions as intra-subject factor, gender as inter-subject factor and age as co-variate. Results: Both age and gender had a significant interaction effect with regional tracer uptake. An age-related decline (p < 0.001) was found in the anterior cingulate gyrus, left frontal association cortex and left insula. Bilateral occipital association and left primary visual cortical uptake showed a significant relative increase with age (p < 0.001). Concerning the gender effect, women showed higher uptake (p < 0.01) in the parietal and right sensorimotor cortices. An age by gender interaction (p < 0.01) was only found in the left medial frontal cortex. The results were consistent with those obtained with SPM99. Conclusion: 3D-SSP analysis of normal rCBF variability is consistent with the literature and other automated voxel-based techniques, which highlight the effects of both age and gender. (authors)
DEFF Research Database (Denmark)
Edberg, Anna; Freyhult, Eva; Sand, Salomon
- and inter-national data excerpts. For example, major PCA loadings helped deciphering both shared and disparate features, relating to food groups, across Danish and Swedish preschool consumers. Data interrogation, reliant on the above-mentioned composite techniques, disclosed one outlier dietary prototype...... prototype with the latter property was identified also in the Danish data material, but without low consumption of Vegetables or Fruit & berries. The second MDA-type of data interrogation involved Supervised Learning, also known as Predictive Modelling. These exercises involved the Random Forest (RF...... not elaborated on in-depth, output from several analyses suggests a preference for energy-based consumption data for Cluster Analysis and Predictive Modelling, over those appearing as weight....
International Nuclear Information System (INIS)
El-Arabi, A.M.Abd El-Gabar M.; Khalifa, Ibrahim H.
2002-01-01
Factor and cluster analyses as well as the Pearson correlation coefficient have been applied to geochemical data obtained from phosphorite and phosphatic rocks of the Duwi Formation exposed at the Red Sea coast, Nile Valley and Western Desert. Sixty-six samples from a total of 71 collected samples were analysed for SiO2, TiO2, Al2O3, Fe2O3, CaO, MgO, Na2O, K2O, P2O5, Sr, U and Pb by XRF, and their mineral constituents were determined by the use of XRD techniques. In addition, the natural radioactivity of the phosphatic samples due to their uranium, thorium and potassium contents was measured by gamma-spectrometry. The uranium content in the phosphate rocks with P2O5 > 15% (average of 106.6 ppm) is higher than in rocks with P2O5 < 15%. Uranium content varies with P2O5 and CaO, whereas it is not related to changes in SiO2, TiO2, Al2O3, Fe2O3, MgO, Na2O and K2O concentrations. Factor analysis and the Pearson correlation coefficient revealed that uranium behaves geochemically in different ways in the phosphatic sediments and phosphorites of the Red Sea, Nile Valley and Western Desert. In the Red Sea and Western Desert phosphorites, uranium occurs mainly in the oxidized U6+ state, where it seems to be fixed by the phosphate ion, forming secondary uranium phosphate minerals such as phosphuranylite. In the Nile Valley phosphorites, ionic substitution of Ca2+ by U4+ is the main controlling factor in the concentration of uranium in phosphate rocks. Moreover, fixation of U6+ by the phosphate ion and adsorption of uranium on phosphate minerals play subordinate roles.
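The factor/cluster/correlation toolchain applied here can be sketched on synthetic compositional data; the mimicked relationships (U tracking P2O5 and CaO in the high-phosphate group) follow the abstract, but all numbers below are invented, and PCA stands in for the factor analysis step.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(11)

# Hypothetical oxide/element table (rows = samples): U co-varies with
# P2O5 and CaO in the phosphate-rich group but not in the lean group.
n = 66
p2o5 = np.concatenate([rng.normal(25, 3, n // 2), rng.normal(8, 2, n - n // 2)])
cao = 1.3 * p2o5 + rng.normal(0, 2, n)
u = np.where(p2o5 > 15, 4 * p2o5, 20) + rng.normal(0, 5, n)
sio2 = rng.normal(10, 3, n)
data = np.column_stack([p2o5, cao, u, sio2])

z = StandardScaler().fit_transform(data)
scores = PCA(n_components=2).fit_transform(z)     # factor-style reduction
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
r, p = pearsonr(p2o5, u)
print(f"clusters: {np.bincount(labels)}, Pearson r(P2O5, U) = {r:.2f}")
```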
International Nuclear Information System (INIS)
Roach, J.F.
1992-01-01
The electrical insulation system of the SSC long dipole magnets is reviewed and potential dielectric failure modes discussed. Electrical insulation fabrication and assembly issues with respect to rate production manufacturability are addressed. The automation required for rate assembly of electrical insulation components will require critical online visual and dielectric screening tests to insure production quality. Storage and assembly areas must be designed to prevent foreign particles from becoming entrapped in the insulation during critical coil winding, molding, and collaring operations. All hand assembly procedures involving dielectrics must be performed with rigorous attention to their impact on insulation integrity. Individual dipole magnets must have a sufficiently low probability of electrical insulation failure under all normal and fault mode voltage conditions such that the series of magnets in the SSC rings have acceptable Mean Time Between Failure (MTBF) with respect to dielectric mode failure events. Statistical models appropriate for large electrical system breakdown failure analysis are applied to the SSC magnet rings. The MTBF of the SSC system is related to the failure database for individual dipole magnet samples.
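The series-system logic behind the ring MTBF argument can be illustrated numerically: with N magnets electrically in series, the ring fails at the first insulation failure, so system MTBF falls rapidly with N. The Weibull parameters below are invented for the sketch, not SSC data.

```python
import numpy as np
from math import gamma

def system_mtbf(n_magnets, shape=1.5, scale_years=5000.0):
    """Mean time to first failure of n_magnets independent Weibull
    components in series: the minimum of i.i.d. Weibull(k, lam) variates
    is Weibull(k, lam * n**(-1/k)), so MTBF = lam * n**(-1/k) * Gamma(1 + 1/k)."""
    return scale_years * n_magnets ** (-1.0 / shape) * gamma(1.0 + 1.0 / shape)

# Monte Carlo spot-check of the closed form for a small ring
rng = np.random.default_rng(12)
t = 5000.0 * rng.weibull(1.5, size=(100_000, 10))
print(f"MC (N=10):   {t.min(axis=1).mean():8.1f} years")
print(f"closed form: {system_mtbf(10):8.1f} years")
for n in (1, 100, 4000):
    print(f"N = {n:5d}: MTBF ~ {system_mtbf(n):8.1f} years")
```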
Kauweloa, Kevin I; Gutierrez, Alonso N; Stathakis, Sotirios; Papanikolaou, Niko; Mavroidis, Panayiotis
2016-07-01
A toolkit has been developed for calculating the 3-dimensional biological effective dose (BED) distributions in multi-phase, external beam radiotherapy treatments such as those applied in liver stereotactic body radiation therapy (SBRT) and in multi-prescription treatments. This toolkit also provides a wide range of statistical results related to dose and BED distributions. MATLAB 2010a, version 7.10 was used to create this GUI toolkit. The input data consist of the dose distribution matrices, organ contour coordinates, and treatment planning parameters from the treatment planning system (TPS). The toolkit has the capability of calculating the multi-phase BED distributions using different formulas (denoted as true and approximate). Following the calculations of the BED distributions, the dose and BED distributions can be viewed in different projections (e.g. coronal, sagittal and transverse). The different elements of this toolkit are presented and the important steps for the execution of its calculations are illustrated. The toolkit is applied on brain, head & neck and prostate cancer patients, who received primary and boost phases in order to demonstrate its capability in calculating BED distributions, as well as measuring the inaccuracy and imprecision of the approximate BED distributions. Finally, the clinical situations in which the use of the present toolkit would have a significant clinical impact are indicated. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
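For orientation, the standard linear-quadratic BED formula, summed over treatment phases, is BED = Σᵢ Dᵢ(1 + dᵢ/(α/β)). The sketch below applies it voxel-wise to a multi-phase plan; it is a minimal stand-in and does not reproduce the toolkit's "true" versus "approximate" multi-phase formulas.

```python
# Sketch of voxel-wise multi-phase BED under the standard linear-quadratic
# model; the toolkit's exact formulas are not reproduced here.
import numpy as np

def bed_multiphase(doses_per_phase, fractions_per_phase, alpha_beta=3.0):
    """doses_per_phase: list of 3-D total-dose arrays, one per treatment phase.
    fractions_per_phase: number of fractions delivered in each phase.
    Returns the summed BED array, treating each phase independently."""
    bed = np.zeros_like(doses_per_phase[0])
    for total_dose, n_frac in zip(doses_per_phase, fractions_per_phase):
        d = total_dose / n_frac                 # dose per fraction, per voxel
        bed += total_dose * (1.0 + d / alpha_beta)
    return bed

# Example: primary phase (50 Gy / 25 fx) plus a boost phase (16 Gy / 8 fx)
primary = np.full((4, 4, 4), 50.0)
boost = np.full((4, 4, 4), 16.0)
print(bed_multiphase([primary, boost], [25, 8])[0, 0, 0])  # ~110 Gy (a/b = 3)
```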
International Nuclear Information System (INIS)
Riccardella, P.C.; Staples, J.F.; Kandra, J.T.
2009-01-01
Inspections of steam generator tubing are performed in U.S. PWRs as part of the Steam Generator Management Program. Westinghouse has recently completed a technical justification demonstrating that in steam generators with thermally treated Ni-Cr alloy (Alloy 600TT) tubes that are hydraulically expanded into low alloy steel (SA-508) tubesheets, flaws in the region of the tubes below a certain distance from the top of the tubesheet, denoted H*, will not result in a reactor coolant pressure boundary breach nor in unacceptable primary-to-secondary leakage. This is because, even if a flaw in this region were to result in complete tube severance, if the length of undegraded tube in the tubesheet exceeds H*, neither operating nor accident loadings create sufficient pull-out forces to overcome the frictional forces between the tube and tubesheet. One key component of this technical justification is the differential thermal expansion between the tube and tubesheet, since a significant portion of the pullout strength of the hydraulically expanded tube-to-tubesheet joint is due to mechanical interference resulting from the larger expansion of the tubing relative to the tubesheet at a given temperature. To address this phenomenon, a detailed statistical evaluation of coefficient of thermal expansion (CTE) data for the tubesheet material (SA-508) and the tube material (thermally treated Alloy 600) was performed. Data used in the evaluation included existing test results obtained from a number of sources as well as extensive new laboratory data developed specifically for this purpose. The evaluation resulted in recommended statistical distributions of this property for the two materials, including their means and probabilistic variability. In addition, it was determined that the CTE values reported in the ASME Code (Section II) represent reasonably conservative mean values for both the tubesheet and tubing material. (author)
International Nuclear Information System (INIS)
Gilbert, R.O.; Klover, W.J.
1988-09-01
Radiation detection surveys are used at the US Department of Energy's Hanford Reservation near Richland, Washington, to determine areas that need posting as radiation zones or to measure dose rates in the field. The relationship between measurements made by sodium iodide (NaI) detectors mounted on the mobile Road Monitor vehicle and those made by hand-held GM P-11 probes and Micro-R meters is of particular interest because the Road Monitor can survey land areas in much less time than hand-held detectors. Statistical regression methods are used here to develop simple equations to predict GM P-11 probe gross gamma count-per-minute (cpm) and Micro-R-Meter μR/h measurements on the basis of NaI gross gamma count-per-second (cps) measurements obtained using the Road Monitor. These equations were estimated using data collected near the 116-K-2 Trench in the 100-K area on the Hanford Reservation. Equations are also obtained for estimating upper and lower limits within which the GM P-11 or Micro-R-Meter measurement corresponding to a given NaI Road Monitor measurement at a new location is expected to fall with high probability. An equation and limits for predicting GM P-11 measurements on the basis of Micro-R-Meter measurements are also estimated. Also, we estimate an equation that may be useful for approximating the ⁹⁰Sr measurement of a surface soil sample on the basis of a spectroscopy measurement for ¹³⁷Cs on that sample. 3 refs., 16 figs., 44 tabs
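A minimal sketch of the report's regression-with-prediction-limits approach, using synthetic numbers in place of the 116-K-2 Trench data; the `obs_ci` columns give the limits within which a new hand-held measurement at a given Road Monitor reading is expected to fall.

```python
# Sketch: predicting GM P-11 cpm from NaI Road Monitor cps with simple linear
# regression and 95% prediction limits (synthetic data, not the Hanford set).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
nai_cps = rng.uniform(100, 5000, 50)
gm_cpm = 30 + 0.8 * nai_cps + rng.normal(0, 100, 50)

X = sm.add_constant(nai_cps)
fit = sm.OLS(gm_cpm, X).fit()

# Prediction intervals for new NaI readings; "obs" intervals bound a new
# observation, matching the report's upper/lower limits at a new location.
new_x = sm.add_constant(np.array([1000.0, 2000.0]))
pred = fit.get_prediction(new_x)
print(pred.summary_frame(alpha=0.05)[["mean", "obs_ci_lower", "obs_ci_upper"]])
```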
Rohatgi, Vijay K
2003-01-01
Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth
Understanding Statistics - Cancer Statistics
Annual reports of U.S. cancer statistics including new cases, deaths, trends, survival, prevalence, lifetime risk, and progress toward Healthy People targets, plus statistical summaries for a number of common cancer types.
Directory of Open Access Journals (Sweden)
David Pang
2018-06-01
Long-term heart rate variability (HRV) analysis is useful as a noninvasive technique for autonomic nervous system activity assessment. It provides a method for assessing many physiological and pathological factors that modulate the normal heartbeat. The performance of HRV analysis systems heavily depends on reliable and accurate detection of the R peak of the QRS complex. Ectopic beats caused by misdetection or arrhythmic events can introduce bias into HRV results, resulting in significant problems in their interpretation. This study presents a novel method for long-term detection of normal R peaks (which represent the normal heartbeat) in electrocardiographic signals, intended specifically for HRV analysis. The very low computational complexity of the proposed method, which combines and exploits the advantages of syntactic and statistical approaches, enables real-time applications. The approach was validated using the Massachusetts Institute of Technology–Beth Israel Hospital Normal Sinus Rhythm and the Fantasia databases, and has a sensitivity, positive predictivity, detection error rate, and accuracy of 99.998%, 99.999%, 0.003%, and 99.996%, respectively.
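A greatly simplified stand-in for such a detector: peak finding with a refractory-period constraint on a synthetic trace. The paper's combined syntactic/statistical method is far more elaborate; the sampling rate, thresholds, and signal below are assumptions.

```python
# Simplified R-peak detection sketch on a synthetic ECG-like trace.
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(3)
fs = 250  # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)
ecg = np.zeros_like(t)
ecg[::fs] = 1.0                              # one sharp "R peak" per second
ecg = np.convolve(ecg, np.hanning(9), mode="same") \
      + 0.05 * rng.standard_normal(t.size)  # smooth peaks, add noise

# Threshold plus a ~0.4 s minimum distance (refractory period) between peaks
peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.4 * fs))
rr = np.diff(peaks) / fs                     # RR intervals (s): the HRV input
print(len(peaks), rr[:5])
```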
Lauterbach, S.; Plessen, B.; Dulski, P.; Mingram, J.; Prasad, S.
2013-12-01
A pronounced trend from a predominantly wet climate during the early Holocene towards significantly drier conditions since the mid-Holocene, mainly attributed to the weakening of the Asian summer monsoon (ASM), is documented in numerous palaeoclimate records from the monsoon-influenced parts of Asia, e.g. the Tibetan Plateau and north- and southeastern China. In contrast, climate in the adjacent regions of mid-latitude arid Central Asia, located north and northwest of the Tibetan Plateau, is supposed to have been characterized by pronounced dry conditions during the early Holocene, wet conditions during the mid-Holocene and a rather moderate drying during the late Holocene, which is mainly attributed to the complex interplay between the mid-latitude Westerlies and the ASM. However, although mid-latitude Central Asia thus might represent a key region for the understanding of teleconnections between the ASM system and the Westerlies, knowledge about past climate development in this region is still ambiguous due to the limited number of high-resolution palaeoclimate records. Hence, new well-dated and highly resolved palaeoclimate records from this region are expected to provide important information about spatio-temporal changes in the regional interplay between Westerlies and ASM and thus aid the understanding of global climate teleconnections. As a part of the project CADY (Central Asian Climate Dynamics), aiming at reconstructing past climatic and hydrological variability in Central Asia, a sediment core of about 6.25 m length has been recovered from alpine Lake Chatyr Kol (40°36' N, 75°14' E, 3530 m a. s. l., surface area ~170 km2, maximum depth ~20 m), located in the Central Tian Shan of Kyrgyzstan. Sediment microfacies analysis on large-scale petrographic thin sections reveals continuously sub-mm scale laminated sediments throughout the record except for the uppermost ca. 60 cm. Microsedimentological characterization of these laminae, which are most probably
Liu, Bingxuan; Liu, Haiquan; Pan, Yingjie; Xie, Jing; Zhao, Yong
2016-01-01
Microbial growth variability plays an important role in food safety risk assessment. In this study, the growth kinetic characteristics, in terms of the maximum specific growth rate (μmax), of 50 V. parahaemolyticus isolates from different sources and genotypes were evaluated at different temperatures (10, 20, 30, and 37°C) and salinities (0.5, 3, 5, 7, and 9%) using the automated turbidimetric system Bioscreen C. The results demonstrated that strain growth variability increased as the growth conditions became more stressful, both in terms of temperature and salinity. The coefficient of variation (CV) of μmax for temperature was larger than that for salinity, indicating that the impact of temperature on strain growth variability was greater than that of salinity. The strains isolated from freshwater aquatic products showed more conspicuous growth variation than those from seawater. Moreover, strains with the tlh(+)/tdh(+)/trh(-) genotype exhibited higher growth variability than tlh(+)/tdh(-)/trh(-) or tlh(+)/tdh(-)/trh(+) strains, suggesting that genotype heterogeneity may be related to growth variability. This research illustrates that growth environments, strain sources, and genotypes all have an impact on the strain growth variability of V. parahaemolyticus, which can be helpful for incorporating strain variability into predictive microbiology and microbial risk assessment.
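The variability metric at the core of the study, the coefficient of variation of μmax across strains per condition, is straightforward to compute; the sketch below uses simulated growth rates in place of the Bioscreen C estimates.

```python
# Sketch: CV of mu_max across strains per (temperature, salinity) condition,
# with made-up growth rates standing in for the turbidimetric estimates.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
rows = []
for temp in (10, 20, 30, 37):
    for sal in (0.5, 3, 5, 7, 9):
        for strain in range(50):
            # Assumption: stress (low temp, extreme salinity) widens the spread
            spread = 0.05 + 0.01 * abs(sal - 3) + (0.10 if temp == 10 else 0.0)
            rows.append({"temp": temp, "salinity": sal,
                         "mu_max": max(rng.normal(0.5, spread), 0.01)})
df = pd.DataFrame(rows)

cv = df.groupby(["temp", "salinity"])["mu_max"].agg(lambda x: x.std() / x.mean())
print(cv.sort_values(ascending=False).head())  # most stressful conditions first
```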
Norris, Peter M.; da Silva, Arlindo M.
2018-01-01
Part 1 of this series presented a Monte Carlo Bayesian method for constraining a complex statistical model of global circulation model (GCM) sub-gridcolumn moisture variability using high-resolution Moderate Resolution Imaging Spectroradiometer (MODIS) cloud data, thereby permitting parameter estimation and cloud data assimilation for large-scale models. This article performs some basic testing of this new approach, verifying that it does indeed reduce mean and standard deviation biases significantly with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud-top pressure and that it also improves the simulated rotational–Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the Ozone Monitoring Instrument (OMI). Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows non-gradient-based jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast, where the background state has a clear swath. This article also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in passive-radiometer-retrieved cloud observables on cloud vertical structure, beyond cloud-top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification from Riishojgaard provides some help in this respect, by
da Silva, Arlindo M.; Norris, Peter M.
2013-01-01
Part I presented a Monte Carlo Bayesian method for constraining a complex statistical model of GCM sub-gridcolumn moisture variability using high-resolution MODIS cloud data, thereby permitting large-scale model parameter estimation and cloud data assimilation. This part performs some basic testing of this new approach, verifying that it does indeed significantly reduce mean and standard deviation biases with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud top pressure, and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the OMI instrument. Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows finite jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. This paper also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in the cloud observables on cloud vertical structure, beyond cloud top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard (1998) provides some help in this respect, by better honoring inversion structures in the background state.
Norris, Peter M.; da Silva, Arlindo M.
2016-01-01
Part 1 of this series presented a Monte Carlo Bayesian method for constraining a complex statistical model of global circulation model (GCM) sub-gridcolumn moisture variability using high-resolution Moderate Resolution Imaging Spectroradiometer (MODIS) cloud data, thereby permitting parameter estimation and cloud data assimilation for large-scale models. This article performs some basic testing of this new approach, verifying that it does indeed reduce mean and standard deviation biases significantly with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud-top pressure and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the Ozone Monitoring Instrument (OMI). Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows non-gradient-based jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast, where the background state has a clear swath. This article also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in passive-radiometer-retrieved cloud observables on cloud vertical structure, beyond cloud-top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification from Riishojgaard provides some help in this respect, by
Lim, T. C.
2016-12-01
Empirical evidence has shown linkages between urbanization, hydrological regime change, and degradation of water quality and aquatic habitat. Percent imperviousness has long been suggested as the dominant source of these negative changes. However, recent research identifying alternative pathways of runoff production at the watershed scale has called into question percent impervious surface area's primacy in urban runoff production compared to other aspects of urbanization, including change in vegetative cover, imported water and water leakages, and the presence of drainage infrastructure. In this research I show how a robust statistical methodology can detect evidence of variable source area (VSA)-type hydrologic response associated with incremental hydraulic connectivity in watersheds. I then use logistic regression to explore how evidence of VSA-type response relates to the physical and meteorological characteristics of the watershed. I find that impervious surface area is highly correlated with development, but does not add significant explanatory power beyond percent developed in predicting VSA-type response. Other aspects of development morphology, including percent developed open space and type of drainage infrastructure, also do not add to the explanatory power of undeveloped land in predicting VSA-type response. Within only developed areas, the effect of developed open space was found to be more similar to that of total impervious area than to undeveloped land. These findings were consistent when tested across a national cross-section of urbanized watersheds, a higher-resolution dataset of Baltimore Metropolitan Area watersheds, and a subsample of watersheds confirmed not to be served by combined sewer systems. These findings suggest that land development policies that focus on lot coverage should be revisited, and more focus should be placed on preserving native vegetation and soil conditions alongside development.
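A sketch of the logistic-regression step described above: testing whether impervious cover adds explanatory power beyond percent developed land via a likelihood-ratio test. Data and effect sizes are simulated, not the study's watersheds.

```python
# Sketch: does impervious cover add explanatory power beyond percent
# developed land? Variable names and the simulated effects are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 300
developed = rng.uniform(0, 100, n)
impervious = 0.6 * developed + rng.normal(0, 5, n)  # highly collinear, as found
logit_p = -2 + 0.04 * developed                     # only 'developed' matters here
vsa = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))
df = pd.DataFrame({"vsa": vsa, "developed": developed, "impervious": impervious})

base = smf.logit("vsa ~ developed", data=df).fit(disp=0)
full = smf.logit("vsa ~ developed + impervious", data=df).fit(disp=0)
lr = 2 * (full.llf - base.llf)  # likelihood-ratio statistic for the added term
print(f"LR statistic for impervious term: {lr:.2f}")
```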
Statistical ecology comes of age
Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric
2014-01-01
The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151
Statistical ecology comes of age.
Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric
2014-12-01
The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.
Nisenbaum, Rosane; Links, Paul S; Eynan, Rahel; Heisel, Marnin J
2010-05-01
Variability in mood swings is a characteristic of borderline personality disorder (BPD) and is associated with suicidal behavior. This study investigated patterns of mood variability and whether such patterns could be predicted from demographic and suicide-related psychological risk factors. Eighty-two adults with BPD and histories of recurrent suicidal behavior were recruited from 3 outpatient psychiatric programs in Canada. Experience sampling methodology (ESM) was used to assess negative mood intensity ratings on a visual analogue scale, 6 random times daily, for 21 days. Three-level models estimated variability between times (52.8%), days (22.2%), and patients (25.1%) and supported a quadratic pattern of daily mood variability. Depression scores predicted variability between patients' initial rating of the day. Average daily mood patterns depended on levels of hopelessness, suicide ideation, and sexual abuse history. Patients reporting moderate to severe sexual abuse and elevated suicide ideation were characterized by worsening moods from early morning up through evening, with little or no relief; patients reporting mild sexual abuse and low suicide ideation reported improved mood throughout the day. These patterns, if replicated in larger ESM studies, may potentially assist the clinician in determining which patients require close monitoring.
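A three-level variance decomposition of this kind (prompts within days within patients) can be sketched with a linear mixed model; the data below are simulated, and the `vc_formula` syntax is one way to encode a day-within-patient component in statsmodels.

```python
# Sketch of a multilevel variance decomposition on simulated mood ratings.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
rows = []
for patient in range(40):
    u_patient = rng.normal(0, 1.0)            # between-patient component
    for day in range(21):
        u_day = rng.normal(0, 0.8)            # between-day component
        for _ in range(6):                    # 6 random prompts per day
            rows.append({"patient": patient, "day": str(day),
                         "mood": 5 + u_patient + u_day + rng.normal(0, 1.2)})
df = pd.DataFrame(rows)

model = smf.mixedlm("mood ~ 1", df, groups="patient",
                    vc_formula={"day": "0 + C(day)"})
fit = model.fit()
print(fit.summary())  # patient, day, and residual variance components
```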
International Nuclear Information System (INIS)
Fatmi, H.; Ababou, R.; Matray, J.M.; Joly, C.
2010-01-01
Document available in extended abstract form only. This paper presents methods of statistical analysis and interpretation of hydrogeological signals in clayey formations, e.g., pore water pressure and atmospheric pressure. The purpose of these analyses is to characterize the hydraulic behaviour of this type of formation in the case of a deep repository of Mid-Level/High-Level and Long-lived radioactive wastes, and to study the evolution of the geologic formation and its EDZ (Excavation Damaged Zone) during the excavation of galleries. We focus on galleries Ga98 and Ga03 in the sites of Mont Terri (Jura, Switzerland) and Tournemire (Aveyron, France), through data collected in the BPP-1 and PH2 boreholes, respectively. The Mont Terri site, crossing the Aalenian Opalinus clay-stone, is an underground laboratory managed by an international consortium, namely the Mont Terri project (Switzerland). The Tournemire site, crossing the Toarcian clay-stone, is an underground research facility managed by IRSN (France). We have analysed pore water and atmospheric pressure signals at these sites, sometimes in correlation with other data. The methods of analysis are based on the theory of stationary random signals (correlation functions, Fourier spectra, transfer functions, envelopes), and on multi-resolution wavelet analysis (adapted to nonstationary and evolutionary signals). These methods are also combined with filtering techniques, and they can be used for single signals as well as pairs of signals (cross-analyses). The objective of this work is to exploit pressure measurements in selected boreholes from the two compacted clay sites, in order to: evaluate phenomena affecting the measurements (earth tides, barometric pressure, etc.); estimate hydraulic properties (specific storage, etc.) of the clay-stones prior to excavation works and compare them with those estimated by pulse or slug tests on shorter time scales; and analyze the effects of drift excavation on pore pressures
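One of the cross-analyses mentioned, pore pressure against barometric pressure, can be illustrated with spectral coherence on synthetic hourly series carrying a diurnal component; the sampling rate, amplitudes, and response model are assumptions.

```python
# Sketch of a cross-analysis: spectral coherence between barometric pressure
# and pore pressure on synthetic series with a diurnal (1 cycle/day) signal.
import numpy as np
from scipy import signal

rng = np.random.default_rng(6)
fs = 24.0                             # samples per day (hourly data, assumed)
t = np.arange(0.0, 365.0, 1.0 / fs)   # one year of hourly samples
baro = np.sin(2 * np.pi * t) + 0.5 * rng.standard_normal(t.size)
pore = 0.3 * baro + 0.2 * rng.standard_normal(t.size)  # damped response + noise

f, coh = signal.coherence(baro, pore, fs=fs, nperseg=2048)
peak_f = f[np.argmax(coh)]
print(f"Coherence peaks near {peak_f:.2f} cycles/day")  # expect ~1 (diurnal)
```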
Forte, Esther; Llovell, Felix; Vega, Lourdes F; Trusler, J P Martin; Galindo, Amparo
2011-04-21
An accurate prediction of phase behavior at conditions far from and close to criticality cannot be accomplished by mean-field based theories that do not incorporate long-range density fluctuations. A treatment based on renormalization-group (RG) theory as developed by White and co-workers has proven to be very successful in improving the predictions of the critical region with different equations of state. The basis of the method is an iterative procedure to account for contributions to the free energy of density fluctuations of increasing wavelengths. The RG method has been combined with a number of versions of the statistical associating fluid theory (SAFT), by implementing White's earliest ideas with the improvements of Prausnitz and co-workers. Typically, this treatment involves two adjustable parameters: a cutoff wavelength L for density fluctuations and an average gradient of the wavelet function Φ. In this work, the SAFT-VR (variable range) equation of state is extended with a similar crossover treatment which, however, follows closely the most recent improvements introduced by White. The interpretation of White's later developments allows us to establish a straightforward method which enables Φ to be evaluated; only the cutoff wavelength L then needs to be adjusted. The approach used here begins with an initial free energy incorporating only contributions from short-wavelength fluctuations, which are treated locally. The contribution from long-wavelength fluctuations is incorporated through an iterative procedure based on attractive interactions which incorporate the structure of the fluid, following the ideas of perturbation theories and using a mapping that allows integration of the radial distribution function. Good agreement close to and far from the critical region is obtained using a single fitted parameter L that can be easily related to the range of the potential. In this way the thermodynamic properties of a square-well (SW) fluid are given by the same
Statistical processing of experimental data
NAVRÁTIL, Pavel
2012-01-01
This thesis covers probability theory and statistical sets: solved and unsolved problems on probability, random variables and their distributions, random vectors, statistical sets, and regression and correlation analysis. Solutions are provided for the unsolved problems.
Ratio index variables or ANCOVA? Fisher's cats revisited.
Tu, Yu-Kang; Law, Graham R; Ellison, George T H; Gilthorpe, Mark S
2010-01-01
Over 60 years ago Ronald Fisher demonstrated a number of potential pitfalls with statistical analyses using ratio variables. Nonetheless, these pitfalls are largely overlooked in contemporary clinical and epidemiological research, which routinely uses ratio variables in statistical analyses. This article aims to demonstrate how very different findings can be generated as a result of less than perfect correlations among the data used to generate ratio variables. These imperfect correlations result from measurement error and random biological variation. While the former can often be reduced by improvements in measurement, random biological variation is difficult to estimate and eliminate in observational studies. Moreover, wherever the underlying biological relationships among epidemiological variables are unclear, and hence the choice of statistical model is also unclear, the different findings generated by different analytical strategies can lead to contradictory conclusions. Caution is therefore required when interpreting analyses of ratio variables whenever the underlying biological relationships among the variables involved are unspecified or unclear. (c) 2009 John Wiley & Sons, Ltd.
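Fisher's pitfall is easy to reproduce: two independent variables acquire a strong correlation once both are divided by a shared, noisy denominator, as in the simulation below (all quantities are synthetic).

```python
# Fisher's pitfall in miniature: unrelated variables become correlated after
# division by a shared denominator (e.g. body size), purely as an artefact.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(10, 1, n)
y = rng.normal(10, 1, n)   # independent of x by construction
z = rng.normal(10, 2, n)   # shared denominator with random variation

r_raw = np.corrcoef(x, y)[0, 1]
r_ratio = np.corrcoef(x / z, y / z)[0, 1]
print(f"corr(x, y)     = {r_raw:+.2f}")    # ~0
print(f"corr(x/z, y/z) = {r_ratio:+.2f}")  # substantially positive (~0.8)
```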
Energy Technology Data Exchange (ETDEWEB)
Grieser, J.; Staeger, T.; Schoenwiese, C.D.
2000-03-01
The report answers the question of where, why and how different climate variables have changed within the last 100 years. The analyzed variables are observed time series of temperature (mean, maximum, minimum), precipitation, air pressure, and water vapour pressure in monthly resolution. The time series are given both as station data and as grid-box data. Two kinds of time-series analysis are performed. The first is applied to find significant changes in the mean and variance of the time series, which also reveals changes in the annual cycle and in the frequency of extreme events. The second approach is used to detect significant spatio-temporal patterns in the variations of climate variables, which are most likely driven by known natural and anthropogenic climate forcings. Furthermore, an estimation of climate noise makes it possible to indicate regions where certain climate variables have changed significantly due to the enhanced anthropogenic greenhouse effect. (orig.) [de]
Directory of Open Access Journals (Sweden)
Olivier Buchsenschutz
2009-05-01
The development of Geographical Information Systems (GIS) allows information in archaeological databases to be georeferenced. It is thus possible to obtain distribution maps which can then be interpreted using statistical and spatial analyses. Maps and statistics highlight the state of research, the condition of sites, and, beyond that, historical and cultural phenomena. Through a research programme on the Iron Age in France (Basefer), a global database was established for the entire country. This article puts forward some analyses of the general descriptive criteria represented in a corpus of 11,000 sites (departments along the Mediterranean coast are excluded from this test). The control and development of finer descriptors will be undertaken by an enlarged team, before the data are networked.
Renyi statistics in equilibrium statistical mechanics
International Nuclear Information System (INIS)
Parvan, A.S.; Biro, T.S.
2010-01-01
Renyi statistics in the canonical and microcanonical ensembles is examined both in general and in particular for the ideal gas. In the microcanonical ensemble the Renyi statistics is equivalent to the Boltzmann-Gibbs statistics. By exact analytical results for the ideal gas, it is shown that in the canonical ensemble, on taking the thermodynamic limit, the Renyi statistics is also equivalent to the Boltzmann-Gibbs statistics. Furthermore, it satisfies the requirements of equilibrium thermodynamics, i.e. the thermodynamical potential of the statistical ensemble is a homogeneous function of first degree in its extensive variables of state. We conclude that in this limit the Renyi statistics arrives at the same thermodynamical relations as those stemming from the Boltzmann-Gibbs statistics.
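For reference, the standard definition of the Rényi entropy and its q → 1 (Boltzmann-Gibbs) limit, which underlie the equivalence discussed; these are textbook forms, not transcribed from the paper itself.

```latex
% Renyi entropy of order q and its q -> 1 (Boltzmann-Gibbs) limit
S^{R}_{q} = \frac{k_{B}}{1-q}\,\ln\sum_{i} p_{i}^{\,q},
\qquad q>0,\; q\neq 1,
\qquad
\lim_{q\to 1} S^{R}_{q} = -k_{B}\sum_{i} p_{i}\ln p_{i} = S^{BG}.
```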
Energy Technology Data Exchange (ETDEWEB)
Robert Pincus
2011-05-17
This project focused on the variability of clouds that is present across a wide range of scales, from the synoptic to the millimeter. In particular, there is substantial variability in cloud properties at scales smaller than the grid spacing of models used to make climate projections (GCMs) and weather forecasts. These models represent clouds and other small-scale processes with parameterizations that describe how those processes respond to and feed back on the large-scale state of the atmosphere.
International Nuclear Information System (INIS)
Thiruvenkadan, R.; Jeyakumar, M.; Saravana, R.; Periasamy, K.
2016-01-01
Full text: The genetic characterization and bottleneck analysis of Kodi Adu goats were carried out using 25 FAO-recommended microsatellite markers. The mean observed number of alleles and the polymorphism information content (PIC) were estimated to be 11.52 ± 0.95 and 0.817 ± 0.023, respectively. The mean observed and expected equilibrium heterozygosities were 0.660 ± 0.045 and 0.846 ± 0.018, respectively. The mean expected equilibrium gene diversity across 21 microsatellite loci under the IAM, SMM and TPM was 0.793 ± 0.028, 0.854 ± 0.023 and 0.827 ± 0.026, respectively. All three statistical tests revealed significant deviation of Kodi Adu goats from mutation-drift equilibrium under the IAM and TPM models. The mode-shift analysis supported the results under the SMM, indicating the absence of a genetic bottleneck in the recent past in Kodi Adu goats. (author)
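The heterozygosity figures reported above derive from allele frequencies; a minimal sketch of the unbiased expected-heterozygosity estimator at a single locus follows (the allele counts are invented, not the Kodi Adu data).

```python
# Minimal sketch: unbiased expected heterozygosity at one locus.
import numpy as np

allele_counts = np.array([12, 30, 8, 25, 15, 10])  # hypothetical allele counts
n = allele_counts.sum()                            # sampled gene copies
p = allele_counts / n                              # allele frequencies
he = (n / (n - 1)) * (1.0 - np.sum(p ** 2))        # unbiased expected heterozygosity
print(f"Expected heterozygosity: {he:.3f}")
```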
Atsuta, Yoshiko
2016-01-01
Collection and analysis of information on diseases and post-transplant courses of allogeneic hematopoietic stem cell transplant recipients have played important roles in improving therapeutic outcomes in hematopoietic stem cell transplantation. Efficient, high-quality data collection systems are essential. The introduction of the Second-Generation Transplant Registry Unified Management Program (TRUMP2) is intended to improve data quality and more efficient data management. The TRUMP2 system will also expand possible uses of data, as it is capable of building a more complex relational database. The construction of an accessible data utilization system for adequate data utilization by researchers would promote greater research activity. Study approval and management processes and authorship guidelines also need to be organized within this context. Quality control of processes for data manipulation and analysis will also affect study outcomes. Shared scripts have been introduced to define variables according to standard definitions for quality control and improving efficiency of registry studies using TRUMP data.
García Maset, Leonor; González, Lidia Blasco; Furquet, Gonzalo Llop; Suay, Francisco Montes; Marco, Roberto Hernández
2016-11-01
Time series analysis provides information on blood glucose dynamics that is unattainable with conventional glycemic variability (GV) indices. To date, no studies have been published on these parameters in pediatric patients with type 1 diabetes. Our aim is to evaluate the relationship between time-series analysis, conventional GV indices, and glycosylated hemoglobin (HbA1c) levels. This is a cross-sectional study of 41 children and adolescents with type 1 diabetes. Glucose monitoring was carried out continuously for 72 h to study the following GV indices: standard deviation (SD) of glucose levels (mg/dL), coefficient of variation (%), interquartile range (IQR; mg/dL), mean amplitude of the largest glycemic excursions (MAGE), and continuous overlapping net glycemic action (CONGA). The time-series analysis was conducted by means of detrended fluctuation analysis (DFA) and the Poincaré plot. Time-series parameters (the DFA alpha coefficient and the elements of the ellipse of the Poincaré plot) correlated well with the more conventional GV indices. Patients were grouped according to the tertiles of these indices, to the tertiles of eccentricity (1: 12.56-16.98, 2: 16.99-21.91, 3: 21.92-41.03), and to the value of the DFA alpha coefficient (>1.5 or ≤1.5). No differences were observed in the HbA1c of patients grouped by GV index criteria; however, significant differences were found in patients grouped by alpha coefficient and eccentricity, not only in HbA1c but also in SD of glucose, IQR, and the CONGA index. The loss of complexity in glycemic homeostasis is accompanied by an increase in variability.
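The Poincaré-plot descriptors used in the study follow from consecutive-pair statistics; the sketch below computes SD1, SD2 and their ratio (related to the ellipse eccentricity) on simulated CGM readings, not patient data.

```python
# Sketch of Poincare-plot descriptors for a glucose series (simulated CGM).
import numpy as np

rng = np.random.default_rng(7)
glucose = 120 + np.cumsum(rng.normal(0, 3, 864))  # 72 h of 5-min readings

x, y = glucose[:-1], glucose[1:]                  # consecutive-reading pairs
sd1 = np.std((y - x) / np.sqrt(2), ddof=1)        # ellipse width: short-term GV
sd2 = np.std((y + x) / np.sqrt(2), ddof=1)        # ellipse length: long-term GV
print(f"SD1={sd1:.1f} mg/dL, SD2={sd2:.1f} mg/dL, SD2/SD1={sd2 / sd1:.2f}")
```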
Directory of Open Access Journals (Sweden)
Mirjam Nielen
2017-01-01
Always wondered why research papers often present rather complicated statistical analyses? Or wondered how to properly analyse the results of a pragmatic trial from your own practice? This talk will give an overview of basic statistical principles and focus on the why of statistics, rather than on the how. This is a podcast of Mirjam's talk at the Veterinary Evidence Today conference, Edinburgh, November 2, 2016.
Nam, Sungsik; Yang, Hongchuan; Alouini, Mohamed-Slim; Kim, Dongin
2014-01-01
framework to determine the joint statistics of partial sums of ordered i.n.d. RVs. Our mathematical formalism is illustrated with an application on the exact performance analysis of the capture probability of generalized selection combining (GSC)-based RAKE
Alexeeff, Stacey E; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A
2015-01-01
Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1 km × 1 km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across different scenarios. Exposure models with low out-of-sample R² yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with out-of-sample R² > 0.9 yielded upward biases up to 13% for acute health effect estimates. Almost all models drastically underestimated the SEs. Land use regression models performed better in chronic effect simulations. These results can help researchers when interpreting health effect estimates in these types of studies.
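The downward-bias (attenuation) mechanism for error-prone exposure predictions can be shown in a few lines; the classical-type error structure below is purely illustrative of the simulation logic, not the study's exposure surfaces.

```python
# Miniature version of the simulation logic: a health-effect slope estimated
# from an error-prone exposure prediction is attenuated toward zero.
import numpy as np

rng = np.random.default_rng(8)
n, beta_true = 5000, 0.10
exposure = rng.normal(10, 2, n)             # "true" pollution levels
predicted = exposure + rng.normal(0, 2, n)  # error-prone prediction (low R^2)
outcome = beta_true * exposure + rng.normal(0, 1, n)

beta_hat = np.polyfit(predicted, outcome, 1)[0]
print(f"true beta: {beta_true:.3f}, estimated: {beta_hat:.3f}")  # ~0.05, biased down
```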
International Nuclear Information System (INIS)
Lim, Gyeong Hui
2008-03-01
This book consists of 15 chapters covering: the basic concepts and meaning of statistical thermodynamics; Maxwell-Boltzmann statistics; ensembles; thermodynamic functions and fluctuations; statistical dynamics of independent-particle systems; the ideal molecular system; chemical equilibrium and chemical reaction rates in ideal gas mixtures; classical statistical thermodynamics; the ideal lattice model; lattice statistics and non-ideal lattice models; imperfect-gas theory applied to liquids; the theory of solutions; the statistical thermodynamics of interfaces; the statistical thermodynamics of high-molecular systems; and quantum statistics.
Spearing, Debra; Woehlke, Paula
To assess the effect on discriminant analysis in terms of correct classification into two groups, the following parameters were systematically altered using Monte Carlo techniques: sample sizes; proportions of one group to the other; number of independent variables; and covariance matrices. The pairing of the off diagonals (or covariances) with…
International Nuclear Information System (INIS)
Ortiz Bulto, Paulo Lazaro; Vladimir Guevara, Antonio; Ulloa, Jacqueline; Aparicio, Marilyn
2001-01-01
Signal detection of climate variability or change and the evaluation of its specific effects require an understanding of the variations in the observed data that describe the natural climate variability and the change signals. It is also necessary to understand the complex interactions that make up the climate system. In the present work, an unusual methodological approach is taken to evaluate the effects and impacts of climate variability and change on the behaviour of different diseases, on the basis of practical experience of its application in four countries of the Caribbean, Central and South America: Cuba, Panama, Bolivia and Paraguay. For the determination of the climate change signal, multivariate analysis techniques (empirical orthogonal functions) were used, combined with robust methods of time-series decomposition (decomposition by medians). These allowed us to describe the changes observed in the seasonal patterns of climate and of epidemiological diseases for the period 1991-1999, with respect to the period 1961-1990. These results were used to build an autoregressive model with non-constant variance, with a climate index based on the signals obtained from the decompositions entering the model as an exogenous variable in order to make projections of the diseases.
Directory of Open Access Journals (Sweden)
C. NARENDRA
2008-03-01
The present investigation concerns the evaluation of the effect of formulation variables on in vitro floating time and release properties in developing a floating drug delivery system (FDDS) containing the highly water-soluble drug metoprolol tartrate (MT) in the presence of a gas-generating agent. A 3² full factorial design was employed in formulating the FDDS containing hydroxypropyl methylcellulose (HPMC K4M) and sodium carboxymethylcellulose (NaCMC) as swellable polymers. Drug-to-polymer ratio and polymer-to-polymer ratio were included as independent variables. The main effects and the interaction terms were quantitatively evaluated by a quadratic model to predict formulations with the desired floating time and release properties. Only the drug-to-polymer ratio and its quadratic term were found to be significant for all the response variables. Non-Fickian transport was confirmed as the release mechanism of the optimized formulations. The desirability function was used to optimize the response variables, each having a different target, and the observed responses agreed closely with the predicted values. The results demonstrate the feasibility of the model in the development of FDDS containing the highly water-soluble drug MT.
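A sketch of the 3² full factorial layout and the quadratic response-surface fit described above; the coded factor names follow the abstract, but the response values are invented for illustration.

```python
# Sketch of a 3^2 full factorial design and quadratic response-surface fit.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

levels = [-1, 0, 1]  # coded low/mid/high levels of each ratio
design = pd.DataFrame([(a, b) for a in levels for b in levels],
                      columns=["drug_polymer", "polymer_polymer"])

rng = np.random.default_rng(11)
design["float_time"] = (10 + 4 * design["drug_polymer"]
                        + 2 * design["drug_polymer"] ** 2
                        + rng.normal(0, 0.5, len(design)))  # invented response

quad = smf.ols("float_time ~ drug_polymer + polymer_polymer"
               " + I(drug_polymer ** 2) + I(polymer_polymer ** 2)"
               " + drug_polymer:polymer_polymer", data=design).fit()
print(quad.params.round(2))  # main, quadratic, and interaction effects
```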
Directory of Open Access Journals (Sweden)
Marsha E. Bates
2013-12-01
Neuropsychological and cognitive deficits are observed in the majority of persons with alcohol and drug use disorders and may interfere with treatment processes and outcomes. Although, on average, the brain and cognition improve with abstinence or markedly reduced substance use, better understanding of the heterogeneity in the time-course and extent of cognitive recovery at the individual level is useful to promote bench-to-bedside translation and inform clinical decision making. This study integrated a variable-centered and a person-centered approach to characterize diversity in cognitive recovery in 197 patients in treatment for a substance use disorder. We assessed executive function, verbal ability, memory, and complex information processing speed at treatment entry, and then 6, 26, and 52 weeks later. Structural equation modeling was used to define underlying ability constructs and determine the mean level of cognitive changes in the sample while minimizing measurement error and practice effects on specific tests. Individual-level empirical growth plots of latent factor scores were used to explore prototypical trajectories of cognitive change. At the level of the mean, small to medium effect size gains in cognitive abilities were observed over one year. At the level of the individual, the mean trajectory of change was also the modal individual recovery trajectory shown by about half the sample. Other prototypical cognitive change trajectories observed in all four cognitive domains included Delayed Gain, Loss of Gain, and Continuous Gain. Together these trajectories encompassed between 86% and 94% of individual growth plots across the four latent abilities. Further research is needed to replicate and predict trajectory membership. Replication of the present findings would have useful implications for targeted treatment planning and the new cognitive interventions being developed to enhance treatment outcomes.
Walters, Glenn D
2017-12-01
There is some consensus on the value of cognitive-behaviourally informed interventions in the criminal justice system, but uncertainty about which components are of critical value. To test the hypothesis that changes in prisoners' criminal thinking and institutional misconduct will both follow completion of a brief cognitive behavioural intervention, a one-group pre-test-post-test quasi-experimental design was used to assess change on the General Criminal Thinking (GCT) scale of the Psychological Inventory of Criminal Thinking Styles among 219 male prisoners completing a 10-week cognitive behavioural intervention, referred to as 'Lifestyle Issues'. Institutional misconduct was measured for 1 year prior to completion of the course and for 2 years subsequently. Using variable-oriented analysis, post-test GCT scores were compared with change in prison conduct, controlling for pre-test thinking scores. Calculations were repeated using person-oriented analysis. Prisoners who displayed a drop in GCT scores between pre-test and post-test were significantly more likely to show a reduction in prison misconduct, whereas prison misconduct was likely to escalate among those who displayed a rise in criminal thinking scores from pre-test to post-test. These findings must still be regarded as preliminary, but taken together with other work and with cognitive behavioural theory, they suggest that development of more prosocial thinking and abilities may have an early beneficial effect on institutional behaviour. Their measurement may offer a practical way in which men could be assessed for readiness to return to the community. Copyright © 2017 John Wiley & Sons, Ltd.
Seitz, Tina; Stalmann, Robert; Dalila, Nawar; Chen, Jiayin; Pojar, Sherin; Dos Santos Pereira, Joao N; Krätzner, Ralph; Brockmöller, Jürgen; Tzvetkov, Mladen V
2015-01-01
The organic cation transporter OCT1 (SLC22A1) mediates the uptake of vitamin B1, cationic drugs, and xenobiotics into hepatocytes. Nine percent of Caucasians lack or have very low OCT1 activity due to loss-of-function polymorphisms in OCT1 gene. Here we analyzed the global genetic variability in OCT1 to estimate the therapeutic relevance of OCT1 polymorphisms in populations beyond Caucasians and to identify evolutionary patterns of the common loss of OCT1 activity in humans. We applied massively parallel sequencing to screen for coding polymorphisms in 1,079 unrelated individuals from 53 populations worldwide. The obtained data was combined with the existing 1000 Genomes data comprising an additional 1,092 individuals from 14 populations. The identified OCT1 variants were characterized in vitro regarding their cellular localization and their ability to transport 10 known OCT1 substrates. Both the population genetics data and transport data were used in tandem to generate a world map of loss of OCT1 activity. We identified 16 amino acid substitutions potentially causing loss of OCT1 function and analyzed them together with five amino acid substitutions that were not expected to affect OCT1 function. The variants constituted 16 major alleles and 14 sub-alleles. Six major alleles showed improper subcellular localization leading to substrate-wide loss in activity. Five major alleles showed correct subcellular localization, but substrate-specific loss of activity. Striking differences were observed in the frequency of loss of OCT1 activity worldwide. While most East Asian and Oceanian individuals had completely functional OCT1, 80 % of native South American Indians lacked functional OCT1 alleles. In East Asia and Oceania the average nucleotide diversity of the loss-of-function variants was much lower than that of the variants that do not affect OCT1 function (ratio of 0.03) and was significantly lower than the theoretically expected heterozygosity (Tajima's D = -1
Directory of Open Access Journals (Sweden)
J. López
2010-09-01
Today the training of sports skills has led to the conception of new approaches to attain maximum results. In practice, many teaching methods are used, yet most of the articles on motor learning or sports training refer to the total or global and the partial or analytic methods, both of interest in the field of gymnastics, with a number of important combinations between the two extremes.
Opinions differ concerning effectiveness, and such differences also exist in gymnastics. Carrasco (1977), nevertheless, proposes "mini-circuits" as the ideal teaching method in gymnastics. In looking for a practical solution to global or analytical teaching, an experimental group study was undertaken with children participating in Sports Schools between the ages of 9 and 11. The aim was to compare the effect of three training methods (analytical training, "mini-circuit" training, mixed training) on the learning and recall of gymnastic skills.
Interested in both the final performance and the teaching process, the following variables were studied: motor activity time, waiting time, total number of global movements, total number of feedbacks emitted by the teacher (amount and direction), and total number of spot checks. A pre-test, post-test and re-test design was used with three groups to assess the three training methods, with each group trained towards the same dependent-variable outcome.
The results of the study showed that the "mini-circuit" training was the most effective method for learning and recall. The most influential process variables were the type of aids and the type of feedback provided. Overall, it is worth highlighting the importance of using the "mini-circuit" method with children. From a pedagogic perspective, this is an important finding to take into consideration, which could yield important results during schooling.
Panayi, Efstathios; Kyriakides, George
2017-01-01
Quantifying the effects of environmental factors over the duration of the growing process on Agaricus bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed-length functional data. The data available from commercial growers, however, are of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water-spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle, and we propose environmental schedules for these covariates to optimise overall yields. PMID:28961254
Energy Technology Data Exchange (ETDEWEB)
Roder, C.
2007-02-26
The main topic of this thesis was the study of stress phenomena in GaN layers by application of high-resolution X-ray diffractometry at variable measurement temperature. For this, a broad spectrum of different GaN samples was studied, ranging from bulk GaN crystals and thick c-plane-oriented HVPE-GaN layers on c-plane sapphire, over laterally overgrown c-plane GaN layers on Si(111) substrates, to non-polar a-plane GaN layers on r-plane sapphire. The focus of the measurements was the determination of the lattice parameters. In addition, the curvature of the wafer as well as the excitonic resonance energies were studied by means of photoluminescence and photoreflectance spectroscopy, respectively. By measuring the temperature-dependent lattice parameters of different GaN bulk crystals, a closed set of thermal-expansion coefficients of GaN was determined for the first time, from 12 to 1205 K, with high accuracy. Analogously, the thermal-expansion coefficients of the substrate material sapphire were determined over a temperature range from 10 to 1166 K.
Department of Homeland Security — Accident statistics available on the Coast Guard’s website by state, year, and one variable to obtain tables and/or graphs. Data from reports has been loaded for...
Understanding advanced statistical methods
Westfall, Peter
2013-01-01
Introduction: Probability, Statistics, and Science; Reality, Nature, Science, and Models; Statistical Processes: Nature, Design and Measurement, and Data; Models; Deterministic Models; Variability; Parameters; Purely Probabilistic Statistical Models; Statistical Models with Both Deterministic and Probabilistic Components; Statistical Inference; Good and Bad Models; Uses of Probability Models. Random Variables and Their Probability Distributions: Introduction; Types of Random Variables: Nominal, Ordinal, and Continuous; Discrete Probability Distribution Functions; Continuous Probability Distribution Functions; Some Calculus - Derivatives and Least Squares; More Calculus - Integrals and Cumulative Distribution Functions. Probability Calculation and Simulation: Introduction; Analytic Calculations, Discrete and Continuous Cases; Simulation-Based Approximation; Generating Random Numbers. Identifying Distributions: Introduction; Identifying Distributions from Theory Alone; Using Data: Estimating Distributions via the Histogram; Quantiles: Theoretical and Data-Based Estimate...
Cancer Statistics: Cancer has a major impact on society in ... success of efforts to control and manage cancer. Statistics at a Glance: The Burden of Cancer in ...
Energy Technology Data Exchange (ETDEWEB)
Cassou, Christophe; Minvielle, Marie; Terray, Laurent [CERFACS/CNRS, Climate Modelling and Global Change Team, Toulouse (France); Perigaud, Claire [JPL-NASA, Ocean Science Element, Pasadena, CA (United States)
2011-01-15
The links between the observed variability of the surface ocean variables estimated from reanalysis and the overlying atmosphere decomposed in classes of large-scale atmospheric circulation via clustering are investigated over the Atlantic from 1958 to 2002. Daily 500 hPa geopotential height and 1,000 hPa wind anomaly maps are classified following a weather-typing approach to describe the North Atlantic and tropical Atlantic atmospheric dynamics, respectively. The algorithm yields patterns that correspond in the extratropics to the well-known North Atlantic-Europe weather regimes (NAE-WR) accounting for the barotropic dynamics, and in the tropics to wind classes (T-WC) representing the alteration of the trades. 10-m wind and 2-m temperature (T2) anomaly composites derived from regime/wind-class occurrence are indicative of strong relationships between daily large-scale atmospheric circulation and the ocean surface over the entire Atlantic basin. High temporal correlation values are obtained basin-wide at low frequency between the observed fields and their reconstruction by multiple linear regressions with the frequencies of occurrence of both NAE-WR and T-WC used as sole predictors. Additional multiple linear regressions also emphasize the importance of accounting for the strength of the daily anomalous atmospheric circulation, estimated by the combined distances to all regime centroids, in order to reproduce the daily to interannual variability of the Atlantic Ocean. We show that for most of the North Atlantic basin the occurrence of NAE-WR generally sets the sign of the ocean surface anomaly for a given day, and that the inter-regime distances are valuable predictors for the magnitude of that anomaly. Finally, we provide evidence that a large fraction of the low-frequency trends in the Atlantic observed at the surface over the last 50 years can be traced back, except for T2, to changes in occurrence of tropical and extratropical weather classes. All together, our
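The weather-typing workflow (cluster daily circulation anomalies into regimes, then use regime occurrence frequencies as regression predictors) can be sketched as follows; the fields, regime count, and season length are synthetic assumptions, not the reanalysis data.

```python
# Sketch of weather typing: k-means clustering of daily circulation anomaly
# maps into regimes, then regressing a surface index on regime frequencies.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(12)
n_days, n_grid, k = 2000, 400, 4           # days, flattened grid points, regimes
z500_anom = rng.standard_normal((n_days, n_grid))

regimes = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(z500_anom)

# Seasonal (90-day) occurrence frequencies of each regime as predictors
seasons = n_days // 90
freq = np.array([[np.mean(regimes[s * 90:(s + 1) * 90] == j) for j in range(k)]
                 for s in range(seasons)])
sst_index = rng.standard_normal(seasons)   # stand-in surface ocean variable
reg = LinearRegression().fit(freq, sst_index)
print(reg.coef_)                           # regime-frequency "fingerprints"
```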
Directory of Open Access Journals (Sweden)
Fabrizio Maturo
2016-06-01
Full Text Available In practical applications relating to business and management sciences, there are many variables that, by their own nature, are better described by a pair of ordered values (e.g. financial data). By summarizing this measurement with a single value there is a loss of information; thus, in these situations, data are better described by interval values rather than by single values. Interval arithmetic studies and analyzes this type of imprecision; however, if the intervals have no sharp boundaries, fuzzy set theory is the most suitable instrument. Moreover, fuzzy regression models are able to overcome some typical limitations of classical regression because they do not need the same strong assumptions. In this paper, we present a review of the main methods introduced in the literature on this topic and introduce some recent developments regarding the concept of randomness in fuzzy regression.
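As a minimal illustration of regression on interval-valued data, the sketch below implements the center-and-range idea from the interval regression literature (one of several methods of the kind reviewed above, not necessarily the authors' own): one least-squares fit for interval midpoints and another for interval half-ranges. All data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic interval-valued data, encoded as (center, half-range) pairs.
n = 50
x_center = rng.uniform(0, 10, n)
x_range = rng.uniform(0.5, 2.0, n)
y_center = 1.5 * x_center + 2 + rng.normal(0, 0.5, n)
y_range = 0.8 * x_range + rng.normal(0, 0.1, n)

def ols(x, y):
    """Slope and intercept by ordinary least squares."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Center-and-range method: one regression for midpoints, one for half-ranges.
bc = ols(x_center, y_center)
br = ols(x_range, y_range)
print("center model: slope=%.2f intercept=%.2f" % tuple(bc))
print("range  model: slope=%.2f intercept=%.2f" % tuple(br))
```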
Boslaugh, Sarah
2013-01-01
Need to learn statistics for your job? Want help passing a statistics course? Statistics in a Nutshell is a clear and concise introduction and reference for anyone new to the subject. Thoroughly revised and expanded, this edition helps you gain a solid understanding of statistics without the numbing complexity of many college texts. Each chapter presents easy-to-follow descriptions, along with graphics, formulas, solved examples, and hands-on exercises. If you want to perform common statistical analyses and learn a wide range of techniques without getting in over your head, this is your book.
de Sousa Zanotti Stagliorio Coêlho, Micheline; Luiz Teixeira Gonçalves, Fabio; do Rosário Dias de Oliveira Latorre, Maria
2010-01-01
This study is aimed at creating a stochastic model, named the Brazilian Climate and Health Model (BCHM), through Poisson regression, in order to predict the occurrence of hospital respiratory admissions (for children under thirteen years of age) as a function of air pollutants, meteorological variables, and thermal comfort indices (effective temperatures, ET). The data used in this study were obtained from the city of São Paulo, Brazil, between 1997 and 2000. The respiratory tract diseases were divided into three categories: URI (Upper Respiratory tract diseases), LRI (Lower Respiratory tract diseases), and IP (Influenza and Pneumonia). The overall results of URI, LRI, and IP show clear correlation with SO₂ and CO, PM₁₀ and O₃, and PM₁₀, respectively, and the ETw4 (Effective Temperature) for all three disease groups. It is extremely important to warn the government of the most populated city in Brazil about the outcome of this study, providing it with valuable information in order to help it better manage its resources on behalf of the whole population of the city of São Paulo, especially those with low incomes.
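The core of such a model is a Poisson regression of daily admission counts on pollutant and weather covariates. A minimal sketch with synthetic data follows; the variable names and coefficients are illustrative, not the BCHM's.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 365
pm10 = rng.gamma(4, 10, n)   # synthetic daily PM10 (ug/m3)
temp = 20 + 5 * np.sin(np.arange(n) * 2 * np.pi / 365) + rng.normal(0, 2, n)

# True log-rate: admissions rise with PM10 and fall with temperature.
lam = np.exp(1.0 + 0.01 * pm10 - 0.03 * (temp - 20))
admissions = rng.poisson(lam)

X = sm.add_constant(np.column_stack([pm10, temp]))
model = sm.GLM(admissions, X, family=sm.families.Poisson()).fit()
print(model.summary())
# exp(coef) is the multiplicative change in admission rate per unit covariate.
print("rate ratios:", np.exp(model.params))
```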
Alam, Md Sabir; Garg, Arun; Pottoo, Faheem Hyder; Saifullah, Mohammad Khalid; Tareq, Abu Izneid; Manzoor, Ovais; Mohsin, Mohd; Javed, Md Noushad
2017-11-01
Due to the unique inherent catalytic characteristics of gold nanoparticles of different sizes, shapes and surface functionalizations, their potential applications are being explored in various fields such as drug delivery, biosensors, diagnosis and theranostics. However, the conventional process for the synthesis of these metallic nanoparticles utilizes toxic reagents as reducing agents, additional capping agents for stability, and surface functionalization for drug delivery purposes. Hence, in this work the suitability of gum ghatti for reducing, capping and surface functionalization during the synthesis of stable gold nanoparticles was explored. The role and impact of the key process variables, i.e. the volumes of the chloroauric acid and gum solutions and the temperature, each at three levels, as well as the mechanism of formation of the optimized gold nanoparticles, were investigated using a Box-Behnken design. The optimized gold nanoparticles were further characterized by UV spectrophotometry for their surface plasmon resonance (SPR) at around ∼530 nm and by dynamic light scattering (DLS) for hydrodynamic size (112.5 nm), PDI (0.222) and zeta potential (−21.3 mV), while transmission electron microscopy (TEM) revealed the spherical surface geometry of these nanoparticles. Copyright © 2017 Elsevier B.V. All rights reserved.
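For reference, a Box-Behnken design for three factors consists of the twelve edge midpoints of the coded factor cube plus replicated center points. A small sketch that constructs the design matrix in coded levels only; the mapping to actual volumes and temperatures is study-specific.

```python
import itertools
import numpy as np

def box_behnken(k, center_points=3):
    """Box-Behnken design for k factors in coded units (-1, 0, +1)."""
    runs = []
    for i, j in itertools.combinations(range(k), 2):
        for a, b in itertools.product((-1, 1), repeat=2):
            row = [0] * k
            row[i], row[j] = a, b
            runs.append(row)
    runs += [[0] * k] * center_points      # replicated center runs
    return np.array(runs)

design = box_behnken(3)
print(design)    # 12 edge runs + 3 center runs = 15 rows
```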
Computational statistics handbook with Matlab
Martinez, Wendy L
2007-01-01
Prefaces; Introduction: What Is Computational Statistics?, An Overview of the Book; Probability Concepts: Introduction, Probability, Conditional Probability and Independence, Expectation, Common Distributions; Sampling Concepts: Introduction, Sampling Terminology and Concepts, Sampling Distributions, Parameter Estimation, Empirical Distribution Function; Generating Random Variables: Introduction, General Techniques for Generating Random Variables, Generating Continuous Random Variables, Generating Discrete Random Variables; Exploratory Data Analysis: Introduction, Exploring Univariate Data, Exploring Bivariate and Trivariate Data, Exploring Multidimensional Data; Finding Structure: Introduction, Projecting Data, Principal Component Analysis, Projection Pursuit EDA, Independent Component Analysis, Grand Tour, Nonlinear Dimensionality Reduction; Monte Carlo Methods for Inferential Statistics: Introduction, Classical Inferential Statistics, Monte Carlo Methods for Inferential Statist...
Statistics 101 for Radiologists.
Anvari, Arash; Halpern, Elkan F; Samir, Anthony E
2015-10-01
Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.
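Several of the quantities reviewed reduce to simple arithmetic on a 2x2 confusion table. A minimal sketch, with counts invented purely for illustration:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Basic test-efficacy statistics from a 2x2 confusion table."""
    sens = tp / (tp + fn)        # P(test+ | disease)
    spec = tn / (tn + fp)        # P(test- | no disease)
    acc = (tp + tn) / (tp + fp + fn + tn)
    lr_pos = sens / (1 - spec)   # positive likelihood ratio
    lr_neg = (1 - sens) / spec   # negative likelihood ratio
    return sens, spec, acc, lr_pos, lr_neg

print(diagnostic_metrics(tp=90, fp=20, fn=10, tn=180))
```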
Introduction to Bayesian statistics
Bolstad, William M
2017-01-01
There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportion, Poisson, normal mean, and simple linear regression. In addition, newly developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear Regression Model; and Computati...
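The binomial-proportion case mentioned above is the classic conjugate example: a Beta(a, b) prior updated with s successes in n trials yields a Beta(a+s, b+n-s) posterior. A short sketch:

```python
from scipy import stats

# Prior Beta(a, b); observe s successes in n trials.
a, b = 1, 1          # uniform prior
s, n = 7, 20
posterior = stats.beta(a + s, b + n - s)
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```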
Energy Technology Data Exchange (ETDEWEB)
Guo, Junfeng; Newell, John D. [Departments of Radiology and Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242 (United States); Wang, Chao; Chan, Kung-Sik [Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa 52242 (United States); Jin, Dakai; Saha, Punam K. [Department of Electrical and Computer Engineering, University of Iowa, Iowa City, Iowa 52242 (United States); Sieren, Jered P. [Department of Radiology, University of Iowa, Iowa City, Iowa 52242 (United States); Barr, R. G. [Departments of Medicine and Epidemiology, Columbia University Medical Center, New York, New York 10032 (United States); Han, MeiLan K. [Department of Medicine, Division of Pulmonary and Critical Care Medicine, University of Michigan, Ann Arbor, Michigan 48109 (United States); Kazerooni, Ella [Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109 (United States); Cooper, Christopher B. [Department of Medicine, University of California, Los Angeles, California 90095 (United States); Couper, David [Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599 (United States); Hoffman, Eric A., E-mail: eric-hoffman@uiowa.edu [Departments of Radiology, Medicine and Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242 (United States)
2016-05-15
Purpose: A test object (phantom) is an important tool to evaluate comparability and stability of CT scanners used in multicenter and longitudinal studies. However, there are many sources of error that can interfere with the test object-derived quantitative measurements. Here the authors investigated three major possible sources of operator error in the use of a test object employed to assess pulmonary density-related as well as airway-related metrics. Methods: Two kinds of experiments were carried out to assess measurement variability caused by imperfect scanning status. The first one consisted of three experiments. A COPDGene test object was scanned using a dual source multidetector computed tomographic scanner (Siemens Somatom Flash) with the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) inspiration protocol (120 kV, 110 mAs, pitch = 1, slice thickness = 0.75 mm, slice spacing = 0.5 mm) to evaluate the effects of tilt angle, water bottle offset, and air bubble size. After analysis of these results, a guideline was reached in order to achieve more reliable results for this test object. Next the authors applied the above findings to 2272 test object scans collected over 4 years as part of the SPIROMICS study. The authors compared changes in data consistency before and after excluding the scans that failed to pass the guideline. Results: This study established the following limits for the test object: tilt index ≤0.3, water bottle offset limits of [−6.6 mm, 7.4 mm], and no air bubble within the water bottle, where the tilt index is a measure incorporating the two tilt angles around the x- and y-axes. With 95% confidence, the density measurement variation for all five materials of interest in the test object (acrylic, water, lung, inside air, and outside air) resulting from all three error sources can be limited to ±0.9 HU (summed in quadrature), when all the requirements are satisfied. The authors applied these criteria to 2272 SPIROMICS
Sensitivity and uncertainty analyses for performance assessment modeling
International Nuclear Information System (INIS)
Doctor, P.G.
1988-08-01
Sensitivity and uncertainty analyses methods for computer models are being applied in performance assessment modeling in the geologic high level radioactive waste repository program. The models used in performance assessment tend to be complex physical/chemical models with large numbers of input variables. There are two basic approaches to sensitivity and uncertainty analyses: deterministic and statistical. The deterministic approach to sensitivity analysis involves numerical calculation or employs the adjoint form of a partial differential equation to compute partial derivatives; the uncertainty analysis is based on Taylor series expansions of the input variables propagated through the model to compute means and variances of the output variable. The statistical approach to sensitivity analysis involves a response surface approximation to the model with the sensitivity coefficients calculated from the response surface parameters; the uncertainty analysis is based on simulation. The methods each have strengths and weaknesses. 44 refs
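The two approaches can be contrasted on a toy model: first-order Taylor propagation uses the partial derivatives at the mean, while the statistical approach simulates. A sketch under the assumption that the model f below is a stand-in for a complex performance-assessment code:

```python
import numpy as np

def f(x1, x2):
    """Toy stand-in for a complex physical/chemical model."""
    return x1 * np.exp(0.5 * x2)

mu = np.array([2.0, 1.0])
sigma = np.array([0.2, 0.1])

# Deterministic approach: first-order Taylor propagation of variance.
d1 = np.exp(0.5 * mu[1])                  # df/dx1 at the mean
d2 = 0.5 * mu[0] * np.exp(0.5 * mu[1])    # df/dx2 at the mean
var_taylor = (d1 * sigma[0])**2 + (d2 * sigma[1])**2

# Statistical approach: Monte Carlo simulation.
rng = np.random.default_rng(2)
samples = f(rng.normal(mu[0], sigma[0], 100_000),
            rng.normal(mu[1], sigma[1], 100_000))

print("Taylor sd:", var_taylor**0.5)
print("Monte Carlo sd:", samples.std())
```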
Pestman, Wiebe R
2009-01-01
This textbook provides a broad and solid introduction to mathematical statistics, including the classical subjects hypothesis testing, normal regression analysis, and normal analysis of variance. In addition, non-parametric statistics and vectorial statistics are considered, as well as applications of stochastic analysis in modern statistics, e.g., Kolmogorov-Smirnov testing, smoothing techniques, robustness and density estimation. For students with some elementary mathematical background. With many exercises. Prerequisites from measure theory and linear algebra are presented.
The statistical stability phenomenon
Gorban, Igor I
2017-01-01
This monograph investigates violations of statistical stability of physical events, variables, and processes and develops a new physical-mathematical theory taking into consideration such violations – the theory of hyper-random phenomena. There are five parts. The first describes the phenomenon of statistical stability and its features, and develops methods for detecting violations of statistical stability, in particular when data is limited. The second part presents several examples of real processes of different physical nature and demonstrates the violation of statistical stability over broad observation intervals. The third part outlines the mathematical foundations of the theory of hyper-random phenomena, while the fourth develops the foundations of the mathematical analysis of divergent and many-valued functions. The fifth part contains theoretical and experimental studies of statistical laws where there is violation of statistical stability. The monograph should be of particular interest to engineers...
Sadovskii, Michael V
2012-01-01
This volume provides a compact presentation of modern statistical physics at an advanced level. Beginning with questions on the foundations of statistical mechanics all important aspects of statistical physics are included, such as applications to ideal gases, the theory of quantum liquids and superconductivity and the modern theory of critical phenomena. Beyond that attention is given to new approaches, such as quantum field theory methods and non-equilibrium problems.
STATISTICAL OPTIMIZATION OF PROCESS VARIABLES FOR ...
African Journals Online (AJOL)
2012-11-03
Nov 3, 2012 ... The osmotic dehydration process was optimized for water loss and solutes gain. ... basis) with safe moisture content for storage (10% wet basis) [3]. Due to ... sucrose, glucose, fructose, corn syrup and sodium chloride have ...
Goodman, Joseph W
2015-01-01
This book discusses statistical methods that are useful for treating problems in modern optics, and the application of these methods to solving a variety of such problems This book covers a variety of statistical problems in optics, including both theory and applications. The text covers the necessary background in statistics, statistical properties of light waves of various types, the theory of partial coherence and its applications, imaging with partially coherent light, atmospheric degradations of images, and noise limitations in the detection of light. New topics have been introduced i
Energy Technology Data Exchange (ETDEWEB)
Eliazar, Iddo, E-mail: eliazar@post.tau.ac.il
2017-05-15
The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their ‘public relations’ for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford’s law, and 1/f noise. - Highlights: • Harmonic statistics are described and reviewed in detail. • Connections to various statistical laws are established. • Connections to perturbation, renormalization and dynamics are established.
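A harmonic Poisson process on a truncated interval [a, b] is easy to simulate by inversion, since the intensity c/t integrates to c*ln(b/a); the scale invariance highlighted above then shows up as equal expected counts in every octave [x, 2x]. A sketch under these assumptions:

```python
import numpy as np

def harmonic_poisson(c, a, b, rng):
    """Sample a Poisson process on [a, b] with harmonic intensity c/t."""
    mean_count = c * np.log(b / a)       # integral of c/t over [a, b]
    n = rng.poisson(mean_count)
    u = rng.uniform(size=n)
    return np.sort(a * (b / a) ** u)     # inverse-CDF sampling of density ~ 1/t

rng = np.random.default_rng(3)
pts = harmonic_poisson(c=10, a=1e-3, b=1e3, rng=rng)
# Scale invariance: counts in [x, 2x) have the same mean for any x.
for x in (1e-2, 1e-1, 1, 1e1):
    print(x, np.sum((pts >= x) & (pts < 2 * x)))
```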
Szulc, Stefan
1965-01-01
Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field.Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then
Biological variability of glycated hemoglobin.
Braga, Federica; Dolci, Alberto; Mosca, Andrea; Panteghini, Mauro
2010-11-11
The measurement of glycated hemoglobin (HbA1c) has a pivotal role in monitoring glycemic state in diabetic patients. Furthermore, the American Diabetes Association has recently recommended the use of HbA1c for diabetes diagnosis, but a clear definition of the clinically allowable measurement error is still lacking. Information on biological variability of the analyte can be used to achieve this goal. We systematically reviewed the published studies on the biological variation of HbA1c to check consistency of available data in order to accurately define analytical goals. The nine recruited studies were limited by choice of analytic methodology, population selection, protocol application and statistical analyses. There is an urgent need to determine biological variability of HbA1c using a specific and traceable assay, appropriate protocol and appropriate statistical evaluation of data. 2010 Elsevier B.V. All rights reserved.
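Once within-subject (CVI) and between-subject (CVG) biological variation are reliably estimated, commonly used criteria (e.g. Fraser's) convert them into analytical performance specifications. A sketch with purely illustrative CVs, not values endorsed by the review:

```python
import math

def analytical_goals(cv_within, cv_between):
    """Desirable performance specifications from biological variation,
    per widely used (Fraser-style) criteria."""
    imprecision = 0.5 * cv_within
    bias = 0.25 * math.sqrt(cv_within**2 + cv_between**2)
    total_error = 1.65 * imprecision + bias
    return imprecision, bias, total_error

# Illustrative CVs in percent; the review argues reliable values are lacking.
print(analytical_goals(cv_within=1.9, cv_between=5.7))
```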
Introductory statistical inference
Mukhopadhyay, Nitis
2014-01-01
This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques.Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist
Multivariate statistical methods and data mining in particle physics (4/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
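The simplest of the linear test variables covered in these lectures is the Fisher discriminant, which projects each event onto the direction best separating signal from background. A self-contained sketch on synthetic two-variable events:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic two-variable "events": signal and background with shifted means.
sig = rng.multivariate_normal([1.0, 1.0], [[1, 0.3], [0.3, 1]], 5000)
bkg = rng.multivariate_normal([-1.0, -0.5], [[1, -0.2], [-0.2, 1]], 5000)

# Fisher direction: w ~ S_w^-1 (mu_sig - mu_bkg), S_w the within-class scatter.
sw = np.cov(sig.T) + np.cov(bkg.T)
w = np.linalg.solve(sw, sig.mean(0) - bkg.mean(0))

t_sig, t_bkg = sig @ w, bkg @ w
cut = 0.5 * (t_sig.mean() + t_bkg.mean())   # simple midpoint cut
eff = (t_sig > cut).mean()                  # signal efficiency
rej = (t_bkg <= cut).mean()                 # background rejection
print(f"signal efficiency {eff:.3f}, background rejection {rej:.3f}")
```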
Multivariate statistical methods and data mining in particle physics (2/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Multivariate statistical methods and data mining in particle physics (1/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Petocz, Peter; Sowey, Eric
2008-01-01
In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…
Glaz, Joseph
2009-01-01
Suitable for graduate students and researchers in applied probability and statistics, as well as for scientists in biology, computer science, pharmaceutical science and medicine, this title brings together a collection of chapters illustrating the depth and diversity of theory, methods and applications in the area of scan statistics.
Kohler, Helmut
The purpose of this study was to analyze the available statistics concerning teachers in schools of general education in the Federal Republic of Germany. An analysis of the demographic structure of the pool of full-time teachers showed that in 1971 30 percent of the teachers were under age 30, and 50 percent were under age 35. It was expected that…
Directory of Open Access Journals (Sweden)
Olga De Castro
Full Text Available The Mediterranean coastline is a dynamic and complex system which owes its complexity to its past and present vicissitudes, e.g. complex tectonic history, climatic fluctuations, and prolonged coexistence with human activities. A plant species that is widespread in this habitat is the sea daffodil, Pancratium maritimum (Amaryllidaceae, which is a perennial clonal geophyte of the coastal sands of the Mediterranean and neighbouring areas, well adapted to the stressful conditions of sand dune environments. In this study, an integrated approach was used, combining genetic and environmental data with a niche modelling approach, aimed to investigate: (1 the effect of climate change on the geographic range of this species at different times {past (last inter-glacial, LIG; and last glacial maximum, LGM, present (CURR, near-future (FUT} and (2 the possible influence of environmental variables on the genetic structure of this species in the current period. The genetic results show that 48 sea daffodil populations (867 specimens display a good genetic diversity in which the marginal populations (i.e. Atlantic Sea populations present lower values. Recent genetic signature of bottleneck was detected in few populations (8%. The molecular variation was higher within the populations (77% and two genetic pools were well represented. Comparing the different climatic simulations in time, the global range of this plant increased, and a further extension is foreseen in the near future thanks to projections on the climate of areas currently-more temperate, where our model suggested a forecast for a climate more similar to the Mediterranean coast. A significant positive correlation was observed between the genetic distance and Precipitation of Coldest Quarter variable in current periods. Our analyses support the hypothesis that geomorphology of the Mediterranean coasts, sea currents, and climate have played significant roles in shaping the current genetic structure of
De Castro, Olga; Di Maio, Antonietta; Di Febbraro, Mirko; Imparato, Gennaro; Innangi, Michele; Véla, Errol; Menale, Bruno
2016-01-01
The Mediterranean coastline is a dynamic and complex system which owes its complexity to its past and present vicissitudes, e.g. complex tectonic history, climatic fluctuations, and prolonged coexistence with human activities. A plant species that is widespread in this habitat is the sea daffodil, Pancratium maritimum (Amaryllidaceae), which is a perennial clonal geophyte of the coastal sands of the Mediterranean and neighbouring areas, well adapted to the stressful conditions of sand dune environments. In this study, an integrated approach was used, combining genetic and environmental data with a niche modelling approach, aiming to investigate: (1) the effect of climate change on the geographic range of this species at different times {past (last inter-glacial, LIG; and last glacial maximum, LGM), present (CURR), near-future (FUT)} and (2) the possible influence of environmental variables on the genetic structure of this species in the current period. The genetic results show that 48 sea daffodil populations (867 specimens) display a good genetic diversity in which the marginal populations (i.e. Atlantic Sea populations) present lower values. A recent genetic signature of bottleneck was detected in a few populations (8%). The molecular variation was higher within the populations (77%) and two genetic pools were well represented. Comparing the different climatic simulations in time, the global range of this plant increased, and a further extension is foreseen in the near future, thanks to projections for areas that are currently more temperate, where our model forecasts a climate more similar to that of the Mediterranean coast. A significant positive correlation was observed between genetic distance and the Precipitation of Coldest Quarter variable in the current period. Our analyses support the hypothesis that the geomorphology of the Mediterranean coasts, sea currents, and climate have played significant roles in shaping the current genetic structure of the sea
A canonical neural mechanism for behavioral variability
Darshan, Ran; Wood, William E.; Peters, Susan; Leblois, Arthur; Hansel, David
2017-05-01
The ability to generate variable movements is essential for learning and adjusting complex behaviours. This variability has been linked to the temporal irregularity of neuronal activity in the central nervous system. However, how neuronal irregularity actually translates into behavioural variability is unclear. Here we combine modelling, electrophysiological and behavioural studies to address this issue. We demonstrate that a model circuit comprising topographically organized and strongly recurrent neural networks can autonomously generate irregular motor behaviours. Simultaneous recordings of neurons in singing finches reveal that neural correlations increase across the circuit driving song variability, in agreement with the model predictions. Analysing behavioural data, we find remarkable similarities in the babbling statistics of 5-6-month-old human infants and juveniles from three songbird species and show that our model naturally accounts for these `universal' statistics.
Mathematical statistics and stochastic processes
Bosq, Denis
2013-01-01
Generally, books on mathematical statistics are restricted to the case of independent identically distributed random variables. In this book, however, both this case AND the case of dependent variables, i.e. statistics for discrete and continuous time processes, are studied. This second case is very important for today's practitioners. Mathematical Statistics and Stochastic Processes is based on decision theory and asymptotic statistics and contains up-to-date information on the relevant topics of theory of probability, estimation, confidence intervals, non-parametric statistics and rob
Statistical Analysis of Data for Timber Strengths
DEFF Research Database (Denmark)
Sørensen, John Dalsgaard; Hoffmeyer, P.
Statistical analyses are performed for material strength parameters from approximately 6700 specimens of structural timber. Non-parametric statistical analyses and fits to the following distribution types have been investigated: Normal, Lognormal, 2-parameter Weibull and 3-parameter Weibull...
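Fits of this kind can be reproduced with standard tools. The sketch below fits 2-parameter Weibull (location fixed at zero), 3-parameter Weibull, and lognormal models to synthetic strength data (the actual timber data are not reproduced here) and compares them with a Kolmogorov-Smirnov statistic:

```python
from scipy import stats

# Synthetic bending strengths (MPa), standing in for the real specimens.
strength = stats.weibull_min.rvs(c=4.5, loc=10, scale=35, size=500,
                                 random_state=5)

# 2-parameter Weibull: location fixed at zero.
c2, loc2, scale2 = stats.weibull_min.fit(strength, floc=0)
# 3-parameter Weibull: location estimated as well.
c3, loc3, scale3 = stats.weibull_min.fit(strength)

for name, frozen in [("2p Weibull", stats.weibull_min(c2, loc2, scale2)),
                     ("3p Weibull", stats.weibull_min(c3, loc3, scale3)),
                     ("lognormal", stats.lognorm(*stats.lognorm.fit(strength)))]:
    ks = stats.kstest(strength, frozen.cdf)
    print(f"{name}: KS statistic {ks.statistic:.4f}")
```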
Blakemore, J S
1962-01-01
Semiconductor Statistics presents statistics aimed at complementing existing books on the relationships between carrier densities and transport effects. The book is divided into two parts. Part I provides introductory material on the electron theory of solids, and then discusses carrier statistics for semiconductors in thermal equilibrium. Of course a solid cannot be in true thermodynamic equilibrium if any electrical current is passed; but when currents are reasonably small the distribution function is but little perturbed, and the carrier distribution for such a "quasi-equilibrium" co
Wannier, Gregory Hugh
1966-01-01
Until recently, the field of statistical physics was traditionally taught as three separate subjects: thermodynamics, statistical mechanics, and kinetic theory. This text, a forerunner in its field and now a classic, was the first to recognize the outdated reasons for their separation and to combine the essentials of the three subjects into one unified presentation of thermal physics. It has been widely adopted in graduate and advanced undergraduate courses, and is recommended throughout the field as an indispensable aid to the independent study and research of statistical physics.Designed for
Statistical Pattern Recognition
Webb, Andrew R
2011-01-01
Statistical pattern recognition relates to the use of statistical techniques for analysing data measurements in order to extract information and make justified decisions. It is a very active area of study and research, which has seen many advances in recent years. Applications such as data mining, web searching, multimedia data retrieval, face recognition, and cursive handwriting recognition, all require robust and efficient pattern recognition techniques. This third edition provides an introduction to statistical pattern theory and techniques, with material drawn from a wide range of fields,
Basic statistical tools in research and data analysis
Directory of Open Access Journals (Sweden)
Zulfiqar Ali
2016-01-01
Full Text Available Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting the research findings. Statistical analysis gives meaning to meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
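As one concrete instance of the sample size estimation mentioned above, the usual two-sided z-approximation for comparing two means is shown below (a textbook formula, not taken from the article):

```python
from scipy import stats

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group to detect a mean difference `delta`
    between two groups with common SD `sigma` (two-sided z-approximation)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return 2 * (sigma * (z_a + z_b) / delta) ** 2

print(n_per_group(delta=5, sigma=10))   # about 63 subjects per group
```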
Energy Technology Data Exchange (ETDEWEB)
Wendelberger, Laura Jean [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2017-08-08
In large datasets, it is time consuming or even impossible to pick out interesting images. Our proposed solution is to find statistics to quantify the information in each image and use those to identify and pick out images of interest.
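The report does not specify which statistics; one minimal, generic choice is the Shannon entropy of the intensity histogram, which is low for flat, uninteresting images and high for information-rich ones. A sketch:

```python
import numpy as np

def histogram_entropy(image, bins=256):
    """Shannon entropy (bits) of an image's intensity histogram."""
    counts, _ = np.histogram(image, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(6)
flat = np.full((64, 64), 0.5)        # uninteresting image: entropy ~ 0
noisy = rng.uniform(size=(64, 64))   # information-rich image: high entropy
print(histogram_entropy(flat), histogram_entropy(noisy))
```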
U.S. Department of Health & Human Services — The CMS Center for Strategic Planning produces an annual CMS Statistics reference booklet that provides a quick reference for summary information about health...
Allegheny County / City of Pittsburgh / Western PA Regional Data Center — Data about the usage of the WPRDC site and its various datasets, obtained by combining Google Analytics statistics with information from the WPRDC's data portal.
Serdobolskii, Vadim Ivanovich
2007-01-01
This monograph presents the mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size but possibly much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which sample size increases along with the number of unknown parameters. This theory opens a way for the solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of infinite sampling often break down in the solution of real problems and, depending on the data, can be inefficient, unstable and even not applicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...
DEFF Research Database (Denmark)
Tryggestad, Kjell
2004-01-01
The study aims to describe how the inclusion and exclusion of materials and calculative devices construct the boundaries and distinctions between statistical facts and artifacts in economics. My methodological approach is inspired by John Graunt's (1667) Political arithmetic and more recent work within constructivism and the field of Science and Technology Studies (STS). The result of this approach is here termed reversible statistics, reconstructing the findings of a statistical study within economics in three different ways. It is argued that all three accounts are quite normal, albeit in different ways. The presence and absence of diverse materials, both natural and political, is what distinguishes them from each other. Arguments are presented for a more symmetric relation between the scientific statistical text and the reader. I will argue that a more symmetric relation can be achieved...
Notices about using elementary statistics in psychology
松田, 文子; 三宅, 幹子; 橋本, 優花里; 山崎, 理央; 森田, 愛子; 小嶋, 佳子
2003-01-01
Improper uses of elementary statistics often observed in beginners' manuscripts and papers were collected, and better practices were suggested. This paper consists of three parts, covering descriptive statistics, multivariate analyses, and statistical tests.
Does environmental data collection need statistics?
Pulles, M.P.J.
1998-01-01
The term 'statistics' with reference to environmental science and policymaking might mean different things: the development of statistical methodology, the methodology developed by statisticians to interpret and analyse such data, or the statistical data that are needed to understand environmental
MacKenzie, Dana
2004-01-01
The drawbacks of using 19th-century mathematics in physics and astronomy are illustrated. To continue expanding knowledge about the cosmos, scientists will have to come to terms with modern statistics. Some researchers have deliberately started importing techniques that are used in medical research. However, the physicists need to identify the brand of statistics that will be suitable for them, and make a choice between the Bayesian and the frequentist approaches. (Edited abstract).
Per Object statistical analysis
DEFF Research Database (Denmark)
2008-01-01
of a specific class in turn, and uses a pair of PPO stages to derive the statistics and then assign them to the objects' Object Variables. It may be that this could all be done in some other, simpler way, but several other ways that were tried did not succeed. The procedure output has been tested against...
The Need for Speed in Rodent Locomotion Analyses
Batka, Richard J.; Brown, Todd J.; Mcmillan, Kathryn P.; Meadows, Rena M.; Jones, Kathryn J.; Haulcomb, Melissa M.
2016-01-01
Locomotion analysis is now widely used across many animal species to understand the motor defects in disease, functional recovery following neural injury, and the effectiveness of various treatments. More recently, rodent locomotion analysis has become an increasingly popular method in a diverse range of research. Speed is an inseparable aspect of locomotion that is still not fully understood, and its effects are often not properly incorporated while analyzing data. In this hybrid manuscript, we accomplish three things: (1) review the interaction between speed and locomotion variables in rodent studies, (2) comprehensively analyze the relationship between speed and 162 locomotion variables in a group of 16 wild-type mice using the CatWalk gait analysis system, and (3) develop and test a statistical method in which locomotion variables are analyzed and reported in the context of speed. Notable results include the following: (1) over 90% of variables, reported by CatWalk, were dependent on speed with an average R2 value of 0.624, (2) most variables were related to speed in a nonlinear manner, (3) current methods of controlling for speed are insufficient, and (4) the linear mixed model is an appropriate and effective statistical method for locomotion analyses that is inclusive of speed-dependent relationships. Given the pervasive dependency of locomotion variables on speed, we maintain that valid conclusions from locomotion analyses cannot be made unless they are analyzed and reported within the context of speed. PMID:24890845
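A linear mixed model of the kind recommended can be fitted in a few lines. The sketch below regresses a synthetic gait variable on speed with a random intercept per animal; the variable names are illustrative, not CatWalk outputs.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
rows = []
for mouse in range(16):
    offset = rng.normal(0, 2)          # per-animal random intercept
    speed = rng.uniform(10, 40, 30)    # cm/s across trials
    stride = 20 + 0.9 * speed + offset + rng.normal(0, 1.5, 30)
    rows += [{"mouse": mouse, "speed": s, "stride": st}
             for s, st in zip(speed, stride)]
data = pd.DataFrame(rows)

# Speed enters as a fixed effect; animals contribute random intercepts.
fit = smf.mixedlm("stride ~ speed", data, groups=data["mouse"]).fit()
print(fit.summary())
```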
Maccone, C.
In this paper the statistical generalization of the Fermi paradox is provided. The statistics of habitable planets may be based on a set of ten (and possibly more) astrobiological requirements first pointed out by Stephen H. Dole in his book Habitable planets for man (1964). The statistical generalization of the original and by now too simplistic Dole equation is provided by replacing a product of ten positive numbers by the product of ten positive random variables. This is denoted the SEH, an acronym standing for “Statistical Equation for Habitables”. The proof in this paper is based on the Central Limit Theorem (CLT) of Statistics, stating that the sum of any number of independent random variables, each of which may be ARBITRARILY distributed, approaches a Gaussian (i.e. normal) random variable (Lyapunov form of the CLT). It is then shown that: 1. The new random variable NHab, yielding the number of habitables (i.e. habitable planets) in the Galaxy, follows the log-normal distribution. By construction, the mean value of this log-normal distribution is the total number of habitable planets as given by the statistical Dole equation. 2. The ten (or more) astrobiological factors are now positive random variables. The probability distribution of each random variable may be arbitrary. The CLT in the so-called Lyapunov or Lindeberg forms (that both do not assume the factors to be identically distributed) allows for that. In other words, the CLT "translates" into the SEH by allowing an arbitrary probability distribution for each factor. This is both astrobiologically realistic and useful for any further investigations. 3. By applying the SEH it is shown that the (average) distance between any two nearby habitable planets in the Galaxy is inversely proportional to the cubic root of NHab. This distance is denoted by the new random variable D. The relevant probability density function is derived, which was named the "Maccone distribution" by Paul Davies in
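The core argument is easy to verify numerically: the log of a product of independent positive random variables is a sum of independent terms, so by the CLT it is approximately normal and the product approximately log-normal. A sketch with ten arbitrarily (not identically) distributed factors:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_factors, n_draws = 10, 100_000

# Ten positive factors, deliberately NOT identically distributed.
factors = [rng.uniform(0.1, 1.0, n_draws),
           rng.gamma(2.0, 0.3, n_draws),
           rng.beta(2, 5, n_draws)] + \
          [rng.lognormal(-0.5, 0.4, n_draws) for _ in range(n_factors - 3)]

product = np.prod(factors, axis=0)
logp = np.log(product)

# Near-zero skewness and excess kurtosis indicate approximate normality
# of log(product), i.e. approximate log-normality of the product.
print("skewness of log(product):", stats.skew(logp))
print("excess kurtosis of log(product):", stats.kurtosis(logp))
```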
Statistics for experimentalists
Cooper, B E
2014-01-01
Statistics for Experimentalists aims to provide experimental scientists with a working knowledge of statistical methods and search approaches to the analysis of data. The book first elaborates on probability and continuous probability distributions. Discussions focus on properties of continuous random variables and normal variables, independence of two random variables, central moments of a continuous distribution, prediction from a normal distribution, binomial probabilities, and multiplication of probabilities and independence. The text then examines estimation and tests of significance. Topics include estimators and estimates, expected values, minimum variance linear unbiased estimators, sufficient estimators, methods of maximum likelihood and least squares, and the test of significance method. The manuscript ponders on distribution-free tests, Poisson process and counting problems, correlation and function fitting, balanced incomplete randomized block designs and the analysis of covariance, and experiment...
A statistical evaluation of asbestos air concentrations
Energy Technology Data Exchange (ETDEWEB)
Lange, J.H. [Envirosafe Training and Consultants, Pittsburgh, PA (United States)
1999-07-01
Both area and personal air samples collected during an asbestos abatement project were matched and statistically analysed. Among the many parameters studied were fibre concentrations and their variability. Mean values for area and personal samples were 0.005 and 0.024 f cm⁻³ of air, respectively. Summary values for area and personal samples suggest that exposures are low, with no single exposure value exceeding the current OSHA TWA value of 0.1 f cm⁻³ of air. Within- and between-worker analysis suggests that these data are homogeneous. Comparison of within- and between-worker values suggests that the exposure source and variability for abatement are more related to the process than individual practices. This supports the importance of control measures for abatement. Study results also suggest that area and personal samples are not statistically related, that is, there is no association observed for these two sampling methods when data are analysed by correlation or regression analysis. Personal samples were statistically higher in concentration than area samples. Area sampling cannot be used as a surrogate exposure for asbestos abatement workers. (author)
Simonoska Crcarevska, Maja; Dimitrovska, Aneta; Sibinovska, Nadica; Mladenovska, Kristina; Slavevska Raicki, Renata; Glavas Dodov, Marija
2015-07-15
A microsponge drug delivery system (MDDC) was prepared by the double emulsion-solvent-diffusion technique using rotor-stator homogenization. The quality by design (QbD) concept was implemented for the development of MDDC with the potential to be incorporated into a semisolid dosage form (gel). The quality target product profile (QTPP) and critical quality attributes (CQA) were defined and identified accordingly. Critical material attributes (CMA) and critical process parameters (CPP) were identified using a quality risk management (QRM) tool: failure mode, effects and criticality analysis (FMECA). CMA and CPP were identified based on results obtained from principal component analysis (PCA-X&Y) and partial least squares (PLS) statistical analysis, along with literature data and product and process knowledge and understanding. FMECA identified the amounts of ethylcellulose, chitosan, acetone, dichloromethane, span 80, tween 80 and the water ratio in the primary/multiple emulsions as CMA, and the rotation speed and stirrer type used for organic solvent removal as CPP. The relationship between the identified CPP and particle size as CQA was described in the design space using design of experiments (one-factor response surface method). The results obtained from the statistically designed experiments enabled the establishment of mathematical models and equations that were used for detailed characterization of the influence of the identified CPP upon MDDC particle size and particle size distribution and their subsequent optimization. Copyright © 2015 Elsevier B.V. All rights reserved.
Environmental restoration and statistics: Issues and needs
International Nuclear Information System (INIS)
Gilbert, R.O.
1991-10-01
Statisticians have a vital role to play in environmental restoration (ER) activities. One facet of that role is to point out where additional work is needed to develop statistical sampling plans and data analyses that meet the needs of ER. This paper is an attempt to show where statistics fits into the ER process. The statistician, as a member of the ER planning team, works collaboratively with the team to develop the site characterization sampling design, so that data of the quality and quantity required by the specified data quality objectives (DQOs) are obtained. At the same time, the statistician works with the rest of the planning team to design and implement, when appropriate, the observational approach to streamline the ER process and reduce costs. The statistician will also provide the expertise needed to select or develop appropriate tools for statistical analysis that are suited for problems that are common to waste-site data. These data problems include highly heterogeneous waste forms, large variability in concentrations over space, correlated data, data that do not have a normal (Gaussian) distribution, and measurements below detection limits. Other problems include environmental transport and risk models that yield highly uncertain predictions, and the need to effectively communicate to the public highly technical information, such as sampling plans, site characterization data, statistical analysis results, and risk estimates. Even though some statistical analysis methods are available "off the shelf" for use in ER, these problems require the development of additional statistical tools, as discussed in this paper. 29 refs
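As one example of the measurements-below-detection-limit problem listed above, a maximum-likelihood fit of a lognormal to left-censored data needs only the censored log-likelihood; nondetects contribute through the CDF at the detection limit. A sketch on synthetic data:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(11)
true_conc = rng.lognormal(mean=0.0, sigma=1.0, size=200)  # synthetic concentrations
dl = 0.5                                                  # detection limit
censored = true_conc < dl          # nondetects: only "< dl" is reported
detects = true_conc[~censored]

def neg_loglik(params):
    """Negative log-likelihood of a lognormal with left-censoring at dl."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    ll = np.sum(stats.norm.logpdf(np.log(detects), mu, sigma) - np.log(detects))
    ll += censored.sum() * stats.norm.logcdf((np.log(dl) - mu) / sigma)
    return -ll

res = optimize.minimize(neg_loglik, x0=[0.0, 1.0], method="Nelder-Mead")
print("censored-MLE estimates (mu, sigma):", np.round(res.x, 3))
```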
Statistical learning and prejudice.
Madison, Guy; Ullén, Fredrik
2012-12-01
Human behavior is guided by evolutionarily shaped brain mechanisms that make statistical predictions based on limited information. Such mechanisms are important for facilitating interpersonal relationships, avoiding dangers, and seizing opportunities in social interaction. We thus suggest that it is essential for analyses of prejudice and prejudice reduction to take the predictive accuracy and adaptivity of the studied prejudices into account.
Schwabl, Franz
2006-01-01
The completely revised new edition of the classical book on Statistical Mechanics covers the basic concepts of equilibrium and non-equilibrium statistical physics. In addition to a deductive approach to equilibrium statistics and thermodynamics based on a single hypothesis - the form of the microcanonical density matrix - this book treats the most important elements of non-equilibrium phenomena. Intermediate calculations are presented in complete detail. Problems at the end of each chapter help students to consolidate their understanding of the material. Beyond the fundamentals, this text demonstrates the breadth of the field and its great variety of applications. Modern areas such as renormalization group theory, percolation, stochastic equations of motion and their applications to critical dynamics, kinetic theories, as well as fundamental considerations of irreversibility, are discussed. The text will be useful for advanced students of physics and other natural sciences; a basic knowledge of quantum mechan...
Jana, Madhusudan
2015-01-01
This self-contained book on statistical mechanics is written in a lucid manner, keeping in mind the university examination system. The need to study this subject and its relation to thermodynamics are discussed in detail. Starting from the Liouville theorem, statistical mechanics is developed thoroughly. All three types of statistical distribution functions are derived separately, with their scope of application and limitations. Non-interacting ideal Bose and Fermi gases are discussed thoroughly. Properties of liquid He-II and the corresponding models are depicted. White dwarfs and condensed matter physics, transport phenomena - thermal and electrical conductivity, Hall effect, magnetoresistance, viscosity, diffusion, etc. - are discussed. A basic understanding of the Ising model is given to explain phase transitions. The book ends with detailed coverage of the method of ensembles (namely microcanonical, canonical and grand canonical) and their applications. Various numerical and conceptual problems ar...
Guénault, Tony
2007-01-01
In this revised and enlarged second edition of an established text Tony Guénault provides a clear and refreshingly readable introduction to statistical physics, an essential component of any first degree in physics. The treatment itself is self-contained and concentrates on an understanding of the physical ideas, without requiring a high level of mathematical sophistication. A straightforward quantum approach to statistical averaging is adopted from the outset (easier, the author believes, than the classical approach). The initial part of the book is geared towards explaining the equilibrium properties of a simple isolated assembly of particles. Thus, several important topics, for example an ideal spin-½ solid, can be discussed at an early stage. The treatment of gases gives full coverage to Maxwell-Boltzmann, Fermi-Dirac and Bose-Einstein statistics. Towards the end of the book the student is introduced to a wider viewpoint and new chapters are included on chemical thermodynamics, interactions in, for exam...
Mandl, Franz
1988-01-01
The Manchester Physics Series. General Editors: D. J. Sandiford, F. Mandl, A. C. Phillips, Department of Physics and Astronomy, University of Manchester. Properties of Matter, B. H. Flowers and E. Mendoza; Optics (Second Edition), F. G. Smith and J. H. Thomson; Statistical Physics (Second Edition), F. Mandl; Electromagnetism (Second Edition), I. S. Grant and W. R. Phillips; Statistics, R. J. Barlow; Solid State Physics (Second Edition), J. R. Hook and H. E. Hall; Quantum Mechanics, F. Mandl; Particle Physics (Second Edition), B. R. Martin and G. Shaw; The Physics of Stars (Second Edition), A. C. Phillips; Computing for Scient...
Levine-Wissing, Robin
2012-01-01
All Access for the AP® Statistics Exam Book + Web + Mobile Everything you need to prepare for the Advanced Placement® exam, in a study system built around you! There are many different ways to prepare for an Advanced Placement® exam. What's best for you depends on how much time you have to study and how comfortable you are with the subject matter. To score your highest, you need a system that can be customized to fit you: your schedule, your learning style, and your current level of knowledge. This book, and the online tools that come with it, will help you personalize your AP® Statistics prep
Davidson, Norman
2003-01-01
Clear and readable, this fine text assists students in achieving a grasp of the techniques and limitations of statistical mechanics. The treatment follows a logical progression from elementary to advanced theories, with careful attention to detail and mathematical development, and is sufficiently rigorous for introductory or intermediate graduate courses. Beginning with a study of the statistical mechanics of ideal gases and other systems of non-interacting particles, the text develops the theory in detail and applies it to the study of chemical equilibrium and the calculation of the thermody
The statistical process control methods - SPC
Directory of Open Access Journals (Sweden)
Floreková Ľubica
1998-03-01
Full Text Available Methods of statistical quality evaluation (SPC; item 20 of the ISO 9000-series documentation system of quality control) for various processes, products and services belong among the basic quantitative methods that enable us to analyse and compare data pertaining to various quantitative parameters. Based on such analyses, they also enable us to propose suitable interventions aimed at improving these processes, products and services. The theoretical basis and applicability of the following principles are presented in the contribution: diagnostics of causes and effects; Pareto analysis and the Lorenz curve; number distributions and frequency curves of random variable distributions; and Shewhart control charts.
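As a rough illustration of the Shewhart chart mentioned above, the sketch below computes three-sigma limits for an X-bar chart from subgroup data. The data, and the omission of the usual c4 bias correction on the pooled standard deviation, are simplifications for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# 25 subgroups of 5 measurements of a monitored quality characteristic
subgroups = rng.normal(loc=10.0, scale=0.2, size=(25, 5))

xbar = subgroups.mean(axis=1)                    # subgroup means
centre = xbar.mean()                             # centre line
s_within = subgroups.std(axis=1, ddof=1).mean()  # average within-subgroup SD
n = subgroups.shape[1]

# Classical three-sigma Shewhart limits for subgroup means
ucl = centre + 3 * s_within / np.sqrt(n)
lcl = centre - 3 * s_within / np.sqrt(n)

signals = np.where((xbar > ucl) | (xbar < lcl))[0]
print(f"CL={centre:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}  signals={signals}")
```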
Applications of MIDAS regression in analysing trends in water quality
Penev, Spiridon; Leonte, Daniela; Lazarov, Zdravetz; Mann, Rob A.
2014-04-01
We discuss novel statistical methods for analysing trends in water quality. Such analysis uses complex data sets with different classes of variables, including water-quality, hydrological and meteorological ones. We analyse the effect of rainfall and flow on trends in water quality utilising a flexible model called Mixed Data Sampling (MIDAS) regression. This model arises because of the mixed frequencies in data collection: typically, water-quality variables are sampled fortnightly, whereas rainfall data are sampled daily. The advantage of MIDAS regression lies in the flexible and parsimonious modelling of the influence of rainfall and flow on trends in water-quality variables. We discuss the model and its implementation on a data set from the Shoalhaven Supply System and Catchments in the state of New South Wales, Australia. Information criteria indicate that MIDAS modelling improves upon simplistic approaches that do not exploit the mixed-frequency nature of the data.
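MIDAS models are usually fitted with specialised routines; purely to illustrate the mixed-frequency idea, the sketch below collapses 14 daily rainfall lags into one fortnightly regressor with exponential Almon weights (a common MIDAS parameterisation) and fits an ordinary least-squares trend model. All names, values and weight parameters are invented.

```python
import numpy as np

def exp_almon_weights(theta1, theta2, n_lags):
    """Normalised exponential Almon lag weights, a standard MIDAS choice."""
    k = np.arange(1, n_lags + 1)
    w = np.exp(theta1 * k + theta2 * k**2)
    return w / w.sum()

rng = np.random.default_rng(1)
n_fortnights, n_days = 120, 14
daily_rain = rng.gamma(shape=2.0, scale=3.0, size=(n_fortnights, n_days))

# Weight the 14 daily lags into a single regressor per fortnight
w = exp_almon_weights(-0.1, -0.01, n_days)
rain_agg = daily_rain @ w

# Synthetic fortnightly water-quality response: linear trend plus rain effect
t = np.arange(n_fortnights)
y = 5.0 + 0.01 * t + 0.3 * rain_agg + rng.normal(0, 0.5, n_fortnights)

X = np.column_stack([np.ones(n_fortnights), t, rain_agg])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, trend, rain coefficient:", np.round(beta, 3))
```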
Indian Academy of Sciences (India)
Kunte, Sudhakar
... inference and finite population sampling. Elements of statistical computing are discussed in this series. ... which captain gets an option to decide whether to field first or bat first ... may of course not be fair, in the sense that the team which wins ... describe two methods of drawing a random number between 0 and 1.
Schrödinger, Erwin
1952-01-01
Nobel Laureate's brilliant attempt to develop a simple, unified standard method of dealing with all cases of statistical thermodynamics - classical, quantum, Bose-Einstein, Fermi-Dirac, and more. The work also includes discussions of the Nernst theorem, Planck's oscillator, fluctuations, the n-particle problem, the problem of radiation, and much more.
The variability problem of normal human walking
DEFF Research Database (Denmark)
Simonsen, Erik B; Alkjær, Tine
2012-01-01
Previous investigations have suggested considerable inter-individual variability in the time course pattern of net joint moments during normal human walking, although the limited sample sizes precluded statistical analyses. The purpose of the present study was to obtain joint moment patterns from a group of normal subjects and to test whether or not the expected differences would prove to be statistically significant. Fifteen healthy male subjects were recorded on video while they walked across two force platforms. Ten kinematic and kinetic parameters were selected and input to a statistical cluster analysis to determine whether or not the 15 subjects could be divided into different 'families' (clusters) of walking strategy. The net joint moments showed a variability corroborating earlier reports. The cluster analysis showed that the 15 subjects could be grouped into two clusters of 5 and 10...
RESEARCH OF THE LAW OF DISTRIBUTION OF THE RANDOM VARIABLE OF THE COMPRESSION
Directory of Open Access Journals (Sweden)
I. Sarayeva
2011-01-01
Full Text Available In research on diagnosing the processes of modern automobile engines by methods of mathematical statistics, the experimental data on the random variable of compression are analysed, and it is shown that the random variable of compression follows the normal law of distribution.
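A check of this kind can be reproduced with a standard normality test. A minimal sketch with simulated compression readings (the study's actual data are not available here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated compression readings (bar) standing in for the engine data
compression = rng.normal(loc=12.0, scale=0.8, size=60)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal
w_stat, p_value = stats.shapiro(compression)
mu, sigma = compression.mean(), compression.std(ddof=1)

print(f"fitted normal law: mean={mu:.2f}, sd={sigma:.2f}")
print(f"Shapiro-Wilk W={w_stat:.3f}, p={p_value:.3f}")
print("normality rejected at 5%" if p_value < 0.05 else "consistent with a normal law")
```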
Wind Patterns of Coastal Tanzania: Their Variability and Trends
African Journals Online (AJOL)
Abstract—Patterns in Tanzanian coastal winds were investigated in terms of their variability at the weather stations of Tanga, Zanzibar, Dar es Salaam and Mtwara. Three-hourly data collected over a 30-year period (1977-2006) were used for the study. Statistical analyses included regressions, correlations, spectral analysis, ...
Milewski, Emil G
2012-01-01
REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As its name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams, doing homework and will remain a lasting reference source for students, teachers, and professionals. Statistics II discusses sampling theory, statistical inference, independent and dependent variables, correlation theory, experimental design, count data, chi-square test, and time se
International Nuclear Information System (INIS)
Anon.
1994-01-01
For the years 1992 and 1993, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period. The tables and figures shown in this publication are: Changes in the volume of GNP and energy consumption; Coal consumption; Natural gas consumption; Peat consumption; Domestic oil deliveries; Import prices of oil; Price development of principal oil products; Fuel prices for power production; Total energy consumption by source; Electricity supply; Energy imports by country of origin in 1993; Energy exports by recipient country in 1993; Consumer prices of liquid fuels; Consumer prices of hard coal and natural gas, prices of indigenous fuels; Average electricity price by type of consumer; Price of district heating by type of consumer and Excise taxes and turnover taxes included in consumer prices of some energy sources
Goodman, Joseph W.
2000-07-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Robert G. Bartle The Elements of Integration and Lebesgue Measure George E. P. Box & Norman R. Draper Evolutionary Operation: A Statistical Method for Process Improvement George E. P. Box & George C. Tiao Bayesian Inference in Statistical Analysis R. W. Carter Finite Groups of Lie Type: Conjugacy Classes and Complex Characters R. W. Carter Simple Groups of Lie Type William G. Cochran & Gertrude M. Cox Experimental Designs, Second Edition Richard Courant Differential and Integral Calculus, Volume I Richard Courant Differential and Integral Calculus, Volume II Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume I Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume II D. R. Cox Planning of Experiments Harold S. M. Coxeter Introduction to Geometry, Second Edition Charles W. Curtis & Irving Reiner Representation Theory of Finite Groups and Associative Algebras Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II Cuthbert Daniel Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition Bruno de Finetti Theory of Probability, Volume I Bruno de Finetti Theory of Probability, Volume II W. Edwards Deming Sample Design in Business Research
Pivato, Marcus
2013-01-01
We show that, in a sufficiently large population satisfying certain statistical regularities, it is often possible to accurately estimate the utilitarian social welfare function, even if we only have very noisy data about individual utility functions and interpersonal utility comparisons. In particular, we show that it is often possible to identify an optimal or close-to-optimal utilitarian social choice using voting rules such as the Borda rule, approval voting, relative utilitarianism, or a...
Natrella, Mary Gibbons
1963-01-01
Formulated to assist scientists and engineers engaged in army ordnance research and development programs, this well-known and highly regarded handbook is a ready reference for advanced undergraduate and graduate students as well as for professionals seeking engineering information and quantitative data for designing, developing, constructing, and testing equipment. Topics include characterizing and comparing the measured performance of a material, product, or process; general considerations in planning experiments; statistical techniques for analyzing extreme-value data; use of transformations
Neave, Henry R
2012-01-01
This book, designed for students taking a basic introductory course in statistical analysis, is far more than just a book of tables. Each table is accompanied by a careful but concise explanation and useful worked examples. Requiring little mathematical background, Elementary Statistics Tables is thus not just a reference book but a positive and user-friendly teaching and learning aid. The new edition contains a new and comprehensive "teach-yourself" section on a simple but powerful approach, now well-known in parts of industry but less so in academia, to analysing and interpreting process dat
Search Databases and Statistics
DEFF Research Database (Denmark)
Refsgaard, Jan C; Munk, Stephanie; Jensen, Lars J
2016-01-01
...having strengths and weaknesses that must be considered for the individual needs. These are reviewed in this chapter. Equally critical for generating highly confident output datasets is the application of sound statistical criteria to limit the inclusion of incorrect peptide identifications from database searches. Additionally, careful filtering and use of appropriate statistical tests on the output datasets affects the quality of all downstream analyses and interpretation of the data. Our considerations and general practices on these aspects of phosphoproteomics data processing are presented here.
Directory of Open Access Journals (Sweden)
Handan YOLSAL
2012-06-01
Full Text Available This paper aims to introduce the seasonal adjustment programs most commonly applied to time series, developed by official statistical agencies. These programs fall into two main groups. One is the CENSUS II X-11 family, which uses moving-average filters and was first developed by the NBER; this family includes the X-11 ARIMA and X-12 ARIMA techniques. The other is the TRAMO/SEATS program, a model-based approach developed by the Central Bank of Spain. The seasonal decomposition procedures of these techniques, some special effects they account for (such as trading-day and calendar effects), their advantages and disadvantages, and their forecasting performances are discussed in this paper.
International Nuclear Information System (INIS)
Anon.
1989-01-01
World data from the United Nation's latest Energy Statistics Yearbook, first published in our last issue, are completed here. The 1984-86 data were revised and 1987 data added for world commercial energy production and consumption, world natural gas plant liquids production, world LP-gas production, imports, exports, and consumption, world residual fuel oil production, imports, exports, and consumption, world lignite production, imports, exports, and consumption, world peat production and consumption, world electricity production, imports, exports, and consumption (Table 80), and world nuclear electric power production
Milewski, Emil G
2012-01-01
REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As its name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams, doing homework and will remain a lasting reference source for students, teachers, and professionals. Statistics I covers frequency distributions, numerical methods of describing data, measures of variability, parameters of distributions, probability theory, and distributions.
Microvariability in AGNs: study of different statistical methods - I. Observational analysis
Zibecchi, L.; Andruchow, I.; Cellone, S. A.; Carpintero, D. D.; Romero, G. E.; Combi, J. A.
2017-05-01
We present the results of a study of different statistical methods currently used in the literature to analyse the (micro)variability of active galactic nuclei (AGNs) from ground-based optical observations. In particular, we focus on the comparison between the results obtained by applying the so-called C and F statistics, which are based on the ratio of standard deviations and variances, respectively. The motivation for this is that the implementation of these methods leads to different and contradictory results, making the variability classification of the light curves of a certain source dependent on the statistics implemented. For this purpose, we re-analyse the results on an AGN sample observed along several sessions with the 2.15 m 'Jorge Sahade' telescope (CASLEO), San Juan, Argentina. For each AGN, we constructed the nightly differential light curves. We thus obtained a total of 78 light curves for 39 AGNs, and we then applied the statistical tests mentioned above, in order to re-classify the variability state of these light curves and in an attempt to find a suitable statistical methodology to study photometric (micro)variations. We conclude that, although the C criterion is not a proper statistical test, it could still be a suitable parameter to detect variability, and that its application allows us to obtain more reliable variability results, in contrast with the F test.
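Schematically, the two criteria compared in the paper reduce to ratios of dispersions of differential light curves. The sketch below uses their usual forms (C as a ratio of standard deviations with a 2.576 cut-off, F as a variance ratio referred to the F distribution) on invented magnitudes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 40  # exposures in one night
# Differential light curves: AGN minus comparison star, comparison minus check
agn_comp = rng.normal(0.0, 0.012, n)   # (mag)
comp_chk = rng.normal(0.0, 0.008, n)   # (mag)

# C "criterion": ratio of standard deviations, often compared with 2.576
C = agn_comp.std(ddof=1) / comp_chk.std(ddof=1)

# F test: ratio of variances against the F distribution
F = agn_comp.var(ddof=1) / comp_chk.var(ddof=1)
p = stats.f.sf(F, n - 1, n - 1)

print(f"C = {C:.2f} (variable if C >= 2.576 under the usual convention)")
print(f"F = {F:.2f}, one-sided p = {p:.4f}")
```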
The analysis of morphometric data on Rocky Mountain wolves and arctic wolves using statistical methods
Ammar Shafi, Muhammad; Saifullah Rusiman, Mohd; Hamzah, Nor Shamsidah Amir; Nor, Maria Elena; Ahmad, Noor’ani; Azia Hazida Mohamad Azmi, Nur; Latip, Muhammad Faez Ab; Hilmi Azman, Ahmad
2018-04-01
Morphometrics is quantitative analysis based on the shape and size of specimens. Morphometric quantitative analyses are commonly used to analyse the fossil record, the shape and size of specimens, and so on. The aim of the study is to find the differences between Rocky Mountain wolves and arctic wolves based on gender. The sample utilised secondary data which included seven independent variables and two dependent variables. Statistical modelling was used in the analysis, namely the analysis of variance (ANOVA) and the multivariate analysis of variance (MANOVA). The results showed that differences exist between arctic wolves and Rocky Mountain wolves based on the independent factors and gender.
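As an illustration of the univariate side of such an analysis, the sketch below runs a one-way ANOVA on a single invented morphometric variable across four species-by-sex groups; a MANOVA would test all variables jointly instead.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Invented skull-length measurements (mm) for the four species/sex groups
groups = {
    "rocky_male":    rng.normal(250, 8, 20),
    "rocky_female":  rng.normal(242, 8, 20),
    "arctic_male":   rng.normal(262, 8, 20),
    "arctic_female": rng.normal(253, 8, 20),
}

# One-way ANOVA across the four groups for this single variable
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"F = {f_stat:.2f}, p = {p_value:.2g}")
```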
Statistical analysis of management data
Gatignon, Hubert
2013-01-01
This book offers a comprehensive approach to multivariate statistical analyses. It provides theoretical knowledge of the concepts underlying the most important multivariate techniques and an overview of actual applications.
Multivariate Statistical Process Control
DEFF Research Database (Denmark)
Kulahci, Murat
2013-01-01
As sensor and computer technology continues to improve, it becomes a normal occurrence that we confront high dimensional data sets. As in many areas of industrial statistics, this brings forth various challenges in statistical process control (SPC) and monitoring, for which the aim is to identify the "out-of-control" state of a process using control charts in order to reduce the excessive variation caused by so-called assignable causes. In practice, the most common method of monitoring multivariate data is through a statistic akin to Hotelling's T2. For high dimensional data with an excessive amount of cross correlation, practitioners are often recommended to use latent structures methods such as Principal Component Analysis to summarize the data in only a few linear combinations of the original variables that capture most of the variation in the data. Applications of these control charts...
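A minimal sketch of the monitoring statistic described: Hotelling's T2 of a new observation relative to a phase-I mean and covariance estimate, with an invented five-variable process and a chi-square approximation to the control limit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
p, m = 5, 200                        # number of variables, phase-I sample size
phase1 = rng.normal(size=(m, p))     # in-control reference data

mu = phase1.mean(axis=0)
S_inv = np.linalg.inv(np.cov(phase1, rowvar=False))

def t2(x):
    """Hotelling's T2 distance of an observation from the in-control mean."""
    d = x - mu
    return float(d @ S_inv @ d)

# A mean shift in one variable should inflate T2
x_new = rng.normal(size=p) + np.array([0.0, 0.0, 2.5, 0.0, 0.0])
print(f"T2 = {t2(x_new):.2f}")
# For large phase-I samples the 99% control limit is roughly chi2(p);
# exact practice uses a scaled F distribution instead.
print(f"approximate 99% limit: {stats.chi2.ppf(0.99, df=p):.2f}")
```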
Order, disorder and generalized statistics
International Nuclear Information System (INIS)
Marino, E.C.; Swieca, J.A.
1980-06-01
We generalize the prescription of Kadanoff and Ceva for the computation of disorder variables correlation functions in the Ising model for continuous field theories with U(1) symmetry. By considering the product of order and disorder variables, we obtain a path integral representation for fields with generalized statistics. We discuss in detail the cases of massless Thirring and Schwinger models. (Author) [pt
Statistical Models for Social Networks
Snijders, Tom A. B.; Cook, KS; Massey, DS
2011-01-01
Statistical models for social networks as dependent variables must represent the typical network dependencies between tie variables such as reciprocity, homophily, transitivity, etc. This review first treats models for single (cross-sectionally observed) networks and then for network dynamics. For
Order, disorder and generalized statistics
International Nuclear Information System (INIS)
Marino, E.C.; Swieca, J.A.; Pontificia Universidade Catolica do Rio de Janeiro
1980-01-01
We generalize the prescription of Kadanoff and Ceva for the computation of disorder variable correlation functions in the Ising model for continuous field theories with U(1) symmetry. By considering the product of order and disorder variables, we obtain a path integral representation for fields with generalized statistics. We discuss in detail the cases of massless Thirring and Schwinger models. (orig.)
Kato, Kuranoshin; Hamaki, Tatsuya; Haga, Yuichi; Otani, Kazuo; Kato, Haruko
2016-04-01
There are many stages with rapid seasonal transitions in East Asia, greatly influenced by the considerable phase differences of the seasonal cycle among the Asian monsoon subsystems, resulting in a rich variety of "seasonal feeling". The seasonal cycle has also been an important background for the generation of many kinds of art in Europe, around the western edge of the Eurasian Continent. Especially around Germany, there are many works of music and literature in which May is treated as a special season. However, a more detailed examination of the seasonal evolution from winter to spring, including the period before May, and its comparison with that in East Asia would be interesting. Deeper knowledge of the seasonal cycle would contribute greatly to cultural understanding, as mentioned above, as well as to assessing the detailed response of the regional climate to global-scale impacts such as global warming. As such, the present study examined, based mainly on the NCEP/NCAR reanalysis data during 1971-2010, the synoptic climatological features of the seasonal transition from winter to spring in Europe, also with attention to the day-to-day variability, by comparing with those in East Asia (detailed analyses were made mainly for the 2000/01-2010/11 winters). Around the region from Germany to Turkey, the surface air temperature (TS) showed rather larger day-to-day variation (including the interannual or intraseasonal variation) throughout the year than in the Japan Islands area in East Asia. Especially from December to March (the period of the climatological TS minimum on the European side), the day-to-day variation was extremely great around Germany and the region to its north (north of around 45N/10E). Thus, extremely low temperature events sometimes appeared around Germany until the end of March, although the seasonal mean TS was not so considerably low. The day-to-day variation of sea level pressure (SLP) was also very large where such large amplitude of TS
Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar
2015-04-01
In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called the conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contain several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained, and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
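Assuming the common (mean − L)/sigma form of Montgomery's index, both quantities can be estimated from a sample in a few lines. The lifetimes below are simulated, not the insulating-fluid or ball-bearing data of the paper:

```python
import numpy as np

rng = np.random.default_rng(6)
L = 2.0                                       # lower lifetime limit
life = rng.exponential(scale=5.0, size=500)   # simulated item lifetimes

# Conforming rate: fraction of items whose lifetime reaches the limit
eta_hat = (life >= L).mean()

# Lifetime performance index in its common (mean - L)/sigma form
c_hat = (life.mean() - L) / life.std(ddof=1)

print(f"estimated conforming rate eta_L = {eta_hat:.3f}")
print(f"estimated lifetime performance index C_L = {c_hat:.3f}")
```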
Directory of Open Access Journals (Sweden)
Mónica Fernández
2011-01-01
Full Text Available The relationship between the relative abundance of mature and impregnated females of the Argentine red shrimp Pleoticus muelleri (Bate 1888) and environmental variables was analyzed using statistical methods. The analyzed data came from the research cruises of the Instituto Nacional de Investigación y Desarrollo Pesquero (INIDEP) carried out during January 2000, 2001, 2005, and 2007; March 2006; and November 2004, 2005, and 2006 in San Jorge Gulf (Argentina). The biological variables considered were the relative abundances of mature and impregnated female shrimp, whereas the environmental variables corresponded to depth, bottom water temperature and salinity, and the difference between surface and bottom water temperature and salinity. Generalized additive models were used as an exploratory tool for the numerical data and general linear models as a confirmatory tool. The results showed that the distributions and abundances of mature and impregnated females were related to the bottom water temperature and salinity and to depth. The relationship increased along with temperature; with salinity, however, it decreased for mature females and increased for impregnated females. An optimal depth range was evidenced, where the largest concentrations of these individuals were located.
Directory of Open Access Journals (Sweden)
María Gabriela Mago Ramos
2012-05-01
Full Text Available A methodology was developed for analysing faults in distribution transformers using the statistical package for the social sciences (SPSS). It consisted of organising and creating a database of failed equipment, incorporating the data into the processing programme, and converting all the information into numerical variables for processing, thereby obtaining descriptive statistics and enabling factor and discriminant analyses. The research was based on information provided by companies in areas served by Corpoelec (Valencia, Venezuela) and Codensa (Bogotá, Colombia).
Collective variables and dissipation
International Nuclear Information System (INIS)
Balian, R.
1984-09-01
This is an introduction to some basic concepts of non-equilibrium statistical mechanics. We emphasize in particular the relevant entropy relative to a given set of collective variables, the meaning of the projection method in the Liouville space, its use to establish the generalized transport equations for these variables, and the interpretation of dissipation in the framework of information theory
National Statistical Commission and Indian Official Statistics*
Indian Academy of Sciences (India)
IAS Admin
a good collection of official statistics of that time. With more ... statistical agencies and institutions to provide details of statistical activities ... ing several training programmes ... successful completion of Indian Statistical Service examinations, the ...
Statistical methods for quantitative indicators of impacts, applied to transmission line projects
International Nuclear Information System (INIS)
Ospina Norena, Jesus Efren; Lema Tapias, Alvaro de Jesus
2005-01-01
Multivariate statistical analyses are proposed for uncovering the relationships between variables and impacts, to obtain high explanatory power for the interpretation of causes and effects and to achieve the highest certainty possible in evaluating and classifying impacts by their level of influence.
Intuitive introductory statistics
Wolfe, Douglas A
2017-01-01
This textbook is designed to give an engaging introduction to statistics and the art of data analysis. The unique scope includes, but also goes beyond, classical methodology associated with the normal distribution. What if the normal model is not valid for a particular data set? This cutting-edge approach provides the alternatives. It is an introduction to the world and possibilities of statistics that uses exercises, computer analyses, and simulations throughout the core lessons. These elementary statistical methods are intuitive. Counting and ranking feature prominently in the text. Nonparametric methods, for instance, are often based on counts and ranks and are very easy to integrate into an introductory course. The ease of computation with advanced calculators and statistical software, both of which factor into this text, allows important techniques to be introduced earlier in the study of statistics. This book's novel scope also includes measuring symmetry with Walsh averages, finding a nonp...
A statistical manual for chemists
Bauer, Edward
1971-01-01
A Statistical Manual for Chemists, Second Edition presents simple and fast statistical tools for data analysis of working chemists. This edition is organized into nine chapters and begins with an overview of the fundamental principles of the statistical techniques used in experimental data analysis. The subsequent chapters deal with the concept of statistical average, experimental design, and analysis of variance. The discussion then shifts to control charts, with particular emphasis on variable charts that are more useful to chemists and chemical engineers. A chapter focuses on the effect
DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTICAL ASSESSMENT
Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...
Tellinghuisen, Joel
2008-01-01
The method of least squares is probably the most powerful data analysis tool available to scientists. Toward a fuller appreciation of that power, this work begins with an elementary review of statistics fundamentals, and then progressively increases in sophistication as the coverage is extended to the theory and practice of linear and nonlinear least squares. The results are illustrated in application to data analysis problems important in the life sciences. The review of fundamentals includes the role of sampling and its connection to probability distributions, the Central Limit Theorem, and the importance of finite variance. Linear least squares are presented using matrix notation, and the significance of the key probability distributions-Gaussian, chi-square, and t-is illustrated with Monte Carlo calculations. The meaning of correlation is discussed, including its role in the propagation of error. When the data themselves are correlated, special methods are needed for the fitting, as they are also when fitting with constraints. Nonlinear fitting gives rise to nonnormal parameter distributions, but the 10% Rule of Thumb suggests that such problems will be insignificant when the parameter is sufficiently well determined. Illustrations include calibration with linear and nonlinear response functions, the dangers inherent in fitting inverted data (e.g., Lineweaver-Burk equation), an analysis of the reliability of the van't Hoff analysis, the problem of correlated data in the Guggenheim method, and the optimization of isothermal titration calorimetry procedures using the variance-covariance matrix for experiment design. The work concludes with illustrations on assessing and presenting results.
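The linear core of the material reviewed fits in a few lines: weighted linear least squares in matrix notation, returning both the estimates and their variance-covariance matrix, which is the object behind the propagation-of-error and experiment-design applications mentioned. The data below are simulated with heteroscedastic errors.

```python
import numpy as np

def wls(X, y, w):
    """Weighted linear least squares: returns beta and its covariance matrix."""
    W = np.diag(w)
    cov = np.linalg.inv(X.T @ W @ X)   # (X' W X)^-1
    beta = cov @ X.T @ W @ y
    return beta, cov

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 25)
sigma = 0.1 + 0.05 * x                 # heteroscedastic measurement errors
y = 1.5 + 0.8 * x + rng.normal(0, sigma)

X = np.column_stack([np.ones_like(x), x])
beta, cov = wls(X, y, w=1.0 / sigma**2)
se = np.sqrt(np.diag(cov))
print("intercept, slope:", np.round(beta, 3))
print("standard errors: ", np.round(se, 3))
```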
Statistical Data Editing in Scientific Articles.
Habibzadeh, Farrokh
2017-07-01
Scientific journals are important scholarly forums for sharing research findings. Editors have important roles in safeguarding standards of scientific publication and should be familiar with correct presentation of results, among other core competencies. Editors do not have access to the raw data and should thus rely on clues in the submitted manuscripts. To identify probable errors, they should look for inconsistencies in presented results. Common statistical problems that can be picked up by a knowledgeable manuscript editor are discussed in this article. Manuscripts should contain a detailed section on statistical analyses of the data. Numbers should be reported with appropriate precision. The standard error of the mean (SEM) should not be reported as an index of data dispersion. The mean (standard deviation [SD]) and median (interquartile range [IQR]) should be used for describing normally and non-normally distributed data, respectively. If possible, 95% confidence intervals (CIs) should be reported for statistics, at least for the main outcome variables. P values should be presented, and interpreted with caution, if there is a hypothesis to test. To advance the knowledge and skills of their members, associations of journal editors would do well to develop training courses on basic statistics and research methodology for non-experts. This would in turn improve research reporting and safeguard the body of scientific evidence. © 2017 The Korean Academy of Medical Sciences.
SWORDS: A statistical tool for analysing large DNA sequences
Indian Academy of Sciences (India)
Unknown
These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in ... tions with the cellular processes like recombination, replication ... in DNA sequences using certain specific probability laws. (Pevzner et al ...
Statistical methods for analysing responses of wildlife to human disturbance.
Haiganoush K. Preisler; Alan A. Ager; Michael J. Wisdom
2006-01-01
1. Off-road recreation is increasing rapidly in many areas of the world, and effects on wildlife can be highly detrimental. Consequently, we have developed methods for studying wildlife responses to off-road recreation with the use of new technologies that allow frequent and accurate monitoring of human-wildlife interactions. To illustrate these methods, we studied the...
Statistical analyses of local transport coefficients in Ohmic ASDEX discharges
International Nuclear Information System (INIS)
Simmet, E.; Stroth, U.; Wagner, F.; Fahrbach, H.U.; Herrmann, W.; Kardaun, O.J.W.F.; Mayer, H.M.
1991-01-01
Tokamak energy transport is still an unsolved problem. Many theoretical models have been developed which try to explain the anomalously high energy-transport coefficients. Up to now these models have been applied to global plasma parameters. A comparison of transport coefficients with the global confinement time is only conclusive if the transport is dominated by one process across the plasma diameter. This, however, is not the case in most Ohmic confinement regimes, where at least three different transport mechanisms play an important role. Sawtooth activity leads to an increase in energy transport in the plasma centre. In the intermediate region turbulent transport is expected; candidates here are drift waves and resistive fluid turbulence. At the edge, ballooning modes or rippling modes could dominate the transport. For the intermediate region, one can deduce theoretical scaling laws for τ E from turbulence theories. Predicted scalings reproduce the experimentally found density dependence of τ E in the linear Ohmic confinement regime (LOC) and the saturated regime (SOC), but they do not show the correct dependence on the isotope mass. The relevance of these transport theories can only be tested by comparing them to experimental local transport coefficients. For this purpose we have performed transport calculations on more than a hundred Ohmic ASDEX discharges. By Principal Component Analysis we determine the dimensionless components which dominate the transport coefficients, and we compare the results to the predictions of various theories. (author) 6 refs., 2 figs., 1 tab
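The Principal Component Analysis step can be sketched compactly: standardise the discharge-by-parameter matrix and read the dominant directions off its singular value decomposition. The matrix here is invented.

```python
import numpy as np

rng = np.random.default_rng(8)
# Rows: discharges; columns: dimensionless plasma parameters (invented)
X = rng.normal(size=(100, 6))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]   # build in some correlation

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardise
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

explained = s**2 / np.sum(s**2)
print("variance explained by each component:", np.round(explained, 2))
print("loadings of the first component:", np.round(Vt[0], 2))
```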
Statistical considerations for grain-size analyses of tills
Jacobs, A.M.
1971-01-01
Relative percentages of sand, silt, and clay from samples of the same till unit are not identical because of different lithologies in the source areas, sorting in transport, random variation, and experimental error. Random variation and experimental error can be isolated from the other two as follows. For each particle-size class of each till unit, a standard population is determined by using a normally distributed, representative group of data. New measurements are compared with the standard population and, if they compare satisfactorily, the experimental error is not significant and random variation is within the expected range for the population. The outcome of the comparison depends on numerical criteria derived from a graphical method rather than on a more commonly used one-way analysis of variance with two treatments. If the number of samples and the standard deviation of the standard population are substituted in a t-test equation, a family of hyperbolas is generated, each of which corresponds to a specific number of subsamples taken from each new sample. The axes of the graphs of the hyperbolas are the standard deviation of new measurements (horizontal axis) and the difference between the means of the new measurements and the standard population (vertical axis). The area between the two branches of each hyperbola corresponds to a satisfactory comparison between the new measurements and the standard population. Measurements from a new sample can be tested by plotting their standard deviation vs. difference in means on axes containing a hyperbola corresponding to the specific number of subsamples used. If the point lies between the branches of the hyperbola, the measurements are considered reliable. But if the point lies outside this region, the measurements are repeated. Because the critical segment of the hyperbola is approximately a straight line parallel to the horizontal axis, the test is simplified to a comparison between the means of the standard population and the means of the subsample. The minimum number of subsamples required to prove significant variation between samples caused by different lithologies in the source areas and sorting in transport can be determined directly from the graphical method. The minimum number of subsamples required is the maximum number to be run for economy of effort. © 1971 Plenum Publishing Corporation.
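One plausible reading of the graphical test, as a sketch: for n subsamples, accept the new measurements when the difference of means falls inside a t-based band whose width grows with the new sample's standard deviation, tracing one branch of a hyperbola per n. The variance combination and degrees of freedom below are assumptions, not the paper's exact construction, and the numbers are invented.

```python
import numpy as np
from scipy import stats

# Assumed standard population for one particle-size class (invented numbers)
mu0, s0, N = 42.0, 4.0, 30   # mean %, SD, and size of the reference data

def acceptable(new, alpha=0.05):
    """Compare subsample measurements with the standard population.

    Acceptance boundary |mean_new - mu0| <= t * sqrt(s0^2/N + s^2/n),
    i.e. one branch of the hyperbola family described in the text.
    """
    n = len(new)
    s = np.std(new, ddof=1)
    delta = abs(np.mean(new) - mu0)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # df choice is a simplification
    return delta <= t_crit * np.sqrt(s0**2 / N + s**2 / n)

new_sample = np.array([44.1, 40.9, 43.5, 41.8])   # four subsamples
print("measurements reliable:", acceptable(new_sample))
```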
Practical Statistics for Particle Physics Analyses: Likelihoods (1/4)
CERN. Geneva; Lyons, Louis
2016-01-01
This will be a 4-day series of 2-hour sessions as part of CERN's Academic Training Course. Each session will consist of a 1-hour lecture followed by one hour of practical computing, which will have exercises based on that day's lecture. While it is possible to follow just the lectures or just the computing exercises, we highly recommend that, because of the way this course is designed, participants come to both parts. In order to follow the hands-on exercise sessions, students need to bring their own laptops. The exercises will be run on a dedicated CERN Web notebook service, SWAN (swan.cern.ch), which is open to everybody holding a CERN computing account. Using the SWAN service requires a CERN account and access to CERNBox, the shared storage service at CERN. New users are invited to activate CERNBox beforehand by simply connecting to https://cernbox.cern.ch. A basic prior knowledge of ROOT and C++ is also recommended for participation in the practical session....
Late neolithic pottery standardization: Application of statistical analyses
Directory of Open Access Journals (Sweden)
Vuković Jasna
2011-01-01
Full Text Available This paper defines the notion of standardization, presents the methodological approach to analysis, points to the problems and limitations arising in the examination of materials from archaeological excavations, and presents the results of the analysis of coefficients of variation of metric parameters of the Late Neolithic vessels recovered at the sites of Vinča and Motel Slatina. [Projekat Ministarstva nauke Republike Srbije, br. 177012: Society, the spiritual and material culture and communications in prehistory and early history of the Balkans]
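The coefficient of variation underlying such standardization arguments is a one-liner; a sketch with invented rim-diameter measurements:

```python
import numpy as np

rim_diameter_cm = np.array([18.2, 17.9, 18.5, 19.1, 17.6, 18.8])  # invented

cv = rim_diameter_cm.std(ddof=1) / rim_diameter_cm.mean()
print(f"coefficient of variation: {100 * cv:.1f}%")
# Low CVs across metric attributes are usually read as evidence of
# standardized (possibly specialist) production.
```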
Statistical and regression analyses of detected extrasolar systems
Czech Academy of Sciences Publication Activity Database
Pintr, Pavel; Peřinová, V.; Lukš, A.; Pathak, A.
2013-01-01
Roč. 75, č. 1 (2013), s. 37-45 ISSN 0032-0633 Institutional support: RVO:61389021 Keywords : Exoplanets * Kepler candidates * Regression analysis Subject RIV: BN - Astronomy, Celestial Mechanics, Astrophysics Impact factor: 1.630, year: 2013 http://www.sciencedirect.com/science/article/pii/S0032063312003066
[Clinical research XXIII. From clinical judgment to meta-analyses].
Rivas-Ruiz, Rodolfo; Castelán-Martínez, Osvaldo D; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Noyola-Castillo, Maura E; Talavera, Juan O
2014-01-01
Systematic reviews (SR) are studies designed to answer clinical questions based on original articles. A meta-analysis (MTA) is the mathematical analysis of an SR. These analyses are divided in two groups: those which evaluate measured results of quantitative variables (for example, the body mass index, BMI) and those which evaluate qualitative variables (for example, whether a patient is alive or dead, or improving or not). Quantitative variables are generally analysed with the mean difference, and qualitative variables can be analysed with several measures: the odds ratio (OR), relative risk (RR), absolute risk reduction (ARR) and hazard ratio (HR). These analyses are represented through forest plots, which allow the evaluation of each individual study, as well as the heterogeneity between studies and the overall effect of the intervention. These analyses are mainly based on Student's t test and the chi-squared test. To take appropriate decisions based on an MTA, it is important to understand the characteristics of the statistical methods in order to avoid misinterpretations.
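The effect measures listed reduce to arithmetic on a 2×2 table; a sketch with invented counts, including the standard large-sample confidence interval for the odds ratio:

```python
import numpy as np

# Invented 2x2 table: rows = treated/control, columns = event/no event
a, b = 12, 88    # treated: events, non-events
c, d = 24, 76    # control: events, non-events

rr = (a / (a + b)) / (c / (c + d))      # relative risk
or_ = (a * d) / (b * c)                 # odds ratio
arr = c / (c + d) - a / (a + b)         # absolute risk reduction

# 95% CI for ln(OR) via the standard large-sample variance
se_ln_or = np.sqrt(1/a + 1/b + 1/c + 1/d)
ci = np.exp(np.log(or_) + np.array([-1.96, 1.96]) * se_ln_or)

print(f"RR={rr:.2f}  OR={or_:.2f}  ARR={arr:.3f}")
print(f"95% CI for OR: ({ci[0]:.2f}, {ci[1]:.2f})")
```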
International Nuclear Information System (INIS)
Oelkers, E.; Heller, A.S.; Farnsworth, D.A.; Kearfott, K.J.
1978-01-01
The report describes the statistical analysis of the DNBR thermal-hydraulic margin of a 3800 MWt, 205-FA core under design overpower conditions. The analysis used LYNX-generated data at predetermined values of the input variables whose uncertainties were to be statistically combined. LYNX data were used to construct an efficient response surface model in the region of interest; the statistical analysis was accomplished through the evaluation of core reliability, utilizing propagation of the uncertainty distributions of the inputs. The response surface model was implemented in both analytical error propagation and Monte Carlo techniques. The basic structural units relating to the acceptance criteria are fuel pins. Therefore, the statistical population of pins with minimum DNBR values smaller than specified values is determined. The specified values are designated relative to the most probable and maximum design DNBR values on the power-limiting pin used in the present design analysis, so that gains over the present design criteria could be assessed for specified probabilistic acceptance criteria. The results are equivalent to gains ranging from 1.2 to 4.8 percent of rated power, depending on the acceptance criterion. The corresponding acceptance criteria range from 95 percent confidence that no pin will be in DNB to 99.9 percent of the pins being expected to avoid DNB
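The Monte Carlo propagation step described can be mimicked directly: sample the uncertain inputs, push them through a fitted response-surface polynomial, and read off the tail fraction. The surface coefficients and input distributions below are invented stand-ins, not the report's LYNX-based model.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000

# Uncertain inputs (invented distributions standing in for the real variables)
flow = rng.normal(1.00, 0.03, n)      # relative coolant flow
power = rng.normal(1.00, 0.02, n)     # relative pin power
inlet_t = rng.normal(0.0, 1.0, n)     # normalised inlet temperature

# Invented quadratic response surface for minimum DNBR
dnbr = 2.0 + 0.8*flow - 0.6*power - 0.05*inlet_t - 0.3*(power - 1.0)**2

limit = 1.3
prob_below = np.mean(dnbr < limit)
print(f"P(min DNBR < {limit}) ~ {prob_below:.2e}")
```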
Statistical data fusion for cross-tabulation
Kamakura, W.A.; Wedel, M.
The authors address the situation in which a researcher wants to cross-tabulate two sets of discrete variables collected in independent samples, but a subset of the variables is common to both samples. The authors propose a statistical data-fusion model that allows for statistical tests of
Variables associated with active spondylolysis.
Gregg, Chris D; Dean, Sarah; Schneiders, Anthony G
2009-11-01
Retrospective non-experimental study. To investigate variables associated with active spondylolysis. A retrospective study audited clinical data over a two-year period from patients with suspected spondylolysis who were referred for a SPECT bone scan. Six exploratory variables were identified and analysed using univariate and multivariate regression from 82 patient records to determine the association between symptomatic, physical and demographic characteristics and the presence of an active spondylolysis. The setting was a tertiary-level multidisciplinary private-practice sports medicine clinic. All patients with low back pain that required a SPECT bone scan to confirm suspected spondylolysis were eligible; 82 subjects were included in the final sample group. The six exploratory variables were age, gender, injury duration, injury onset, sports participation and the result of the single-leg hyperextension test. The dependent outcome variable was the result of the SPECT bone scan (scan-positive or scan-negative). Adolescent males had a higher incidence of spondylolysis detected by SPECT bone scan compared to other patients, and a statistically significant association was demonstrated for both age (p=0.01) and gender (p=0.01). Subjects with an active spondylolysis were nearly five times more likely to be male and aged less than 20 years. Furthermore, the likelihood ratio indicated that adolescent males with suspected spondylolysis were three and a half times more likely to have a positive bone scan result. The single-leg hyperextension test did not demonstrate a statistically significant association with spondylolysis (p=0.47). Clinicians assessing for a predisposition to the development of spondylolysis should consider the gender and age of the patient and not rely on the predictive ability of the single-leg hyperextension test.
Hidden Statistics of Schroedinger Equation
Zak, Michail
2011-01-01
Work was carried out to determine the mathematical origin of randomness in quantum mechanics and to create a hidden statistics of the Schrödinger equation, i.e., to expose the transitional stochastic process as a "bridge" to the quantum world. The governing equations of hidden statistics would preserve such properties of quantum physics as superposition, entanglement, and direct-product decomposability while allowing one to measure its state variables using classical methods.
A primer of multivariate statistics
Harris, Richard J
2014-01-01
Drawing upon more than 30 years of experience in working with statistics, Dr. Richard J. Harris has updated A Primer of Multivariate Statistics to provide a model of balance between how-to and why. This classic text covers multivariate techniques with a taste of latent variable approaches. Throughout the book there is a focus on the importance of describing and testing one's interpretations of the emergent variables that are produced by multivariate analysis. This edition retains its conversational writing style while focusing on classical techniques. The book gives the reader a feel for why
Kleibergen, F.R.
2002-01-01
We extend the novel pivotal statistics for testing the parameters in the instrumental variables regression model. We show that these statistics result from a decomposition of the Anderson-Rubin statistic into two independent pivotal statistics. The first statistic is a score statistic that tests
Lies, damn lies and statistics
International Nuclear Information System (INIS)
Jones, M.D.
2001-01-01
Statistics are widely employed within archaeological research. This is increasingly so as user-friendly statistical packages make ever more sophisticated analyses available to non-statisticians. However, all statistical techniques are based on underlying assumptions of which the end user may be unaware. If statistical analyses are applied in ignorance of the underlying assumptions, there is the potential for highly erroneous inferences to be drawn. This does happen within archaeology, and here it is illustrated with the example of 'date pooling', a technique that has been widely misused in archaeological research. This misuse may have given rise to an inevitable and predictable misinterpretation of New Zealand's archaeological record. (author). 10 refs., 6 figs., 1 tab
Dynamic statistical information theory
Institute of Scientific and Technical Information of China (English)
Anon.
2006-01-01
In recent years we extended Shannon's static statistical information theory to dynamic processes and established a Shannon dynamic statistical information theory, whose core is the evolution law of dynamic entropy and dynamic information. We also proposed a corresponding Boltzmann dynamic statistical information theory. Based on the fact that the state-variable evolution equations of the respective dynamic systems, i.e. the Fokker-Planck equation and the Liouville diffusion equation, can be regarded as their information-symbol evolution equations, we derived the nonlinear evolution equations of Shannon dynamic entropy density and dynamic information density, and the nonlinear evolution equations of Boltzmann dynamic entropy density and dynamic information density, which describe respectively the evolution law of dynamic entropy and dynamic information. The evolution equations of these two kinds of dynamic entropies and dynamic informations show in unison that the time rate of change of the dynamic entropy densities is caused by their drift, diffusion and production in state-variable space inside the systems and in coordinate space in the transmission processes; and that the time rate of change of the dynamic information densities originates from their drift, diffusion and dissipation in state-variable space inside the systems and in coordinate space in the transmission processes. Entropy and information have thus been combined with the state and its law of motion of the systems. Furthermore, we presented the formulas of the two kinds of entropy production rates and information dissipation rates, and the expressions of the two kinds of drift information flows and diffusion information flows. We proved that the two kinds of information dissipation rates (or the decrease rates of the total information) are equal to their corresponding entropy production rates (or the increase rates of the total entropy) in the same dynamic system. We obtained the formulas of two kinds of dynamic mutual informations and dynamic channel
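As a reference point for the kind of balance equations described (a textbook result, not the authors' own derivation), the one-dimensional constant-diffusion case is standard: for a Fokker-Planck density p(x,t) with drift A(x), diffusion constant D and probability current J, the Shannon entropy rate splits into a non-negative production term and a flow term:

```latex
% 1-D Fokker-Planck dynamics and the resulting Shannon entropy balance
\frac{\partial p}{\partial t} = -\frac{\partial J}{\partial x},
\qquad J = A(x)\,p - D\,\frac{\partial p}{\partial x},
\qquad S(t) = -\int p\,\ln p\,\mathrm{d}x,

\frac{\mathrm{d}S}{\mathrm{d}t}
  = \underbrace{\int \frac{J^{2}}{D\,p}\,\mathrm{d}x}_{\text{production}\ \ge\ 0}
  \;-\; \underbrace{\int \frac{A\,J}{D}\,\mathrm{d}x}_{\text{flow}}
```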
Statistical characterization report for Single-Shell Tank 241-T-104
International Nuclear Information System (INIS)
Cromar, R.D.; Wilmarth, S.R.; Jensen, L.
1994-01-01
This report contains the results of the statistical analysis of data from two core samples obtained from single-shell tank 241-T-104 (T-104). Section 2.0 contains a description of the core samples and the chemical analyses performed on the core samples. Section 3.0 contains mean concentration estimates and associated 95% confidence intervals (CIs) on the mean for each of the analytes found in the core composite samples. Section 4.0 contains estimates of the spatial variability (variability between cores) and estimates of the analytical variability from the core composite data. Two types of analytical variability were estimated from the core composite data: (1) sample composite variability (variability between composite samples within the same core) and (2) analytical measurement variability (variability between the primary and duplicate analyses within each core composite sample). Estimates of the analytical measurement variability were used as the reference value to test the significance of the spatial and sample composite variability. Spatial variability was significantly different from zero for 32 out of 80 analytes. The sample composite variance was significantly different from zero for 18 out of the 80 analytes
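The between-core versus within-core decomposition used in these characterization reports is a one-way random-effects calculation; a minimal method-of-moments sketch with invented analyte concentrations:

```python
import numpy as np

# Invented analyte concentrations: 2 cores x 4 composite results each
cores = np.array([[10.2, 10.5, 10.1, 10.4],
                  [11.8, 11.6, 12.0, 11.7]])
k, n = cores.shape                       # number of cores, results per core

ms_within = cores.var(axis=1, ddof=1).mean()          # analytical variability
ms_between = n * cores.mean(axis=1).var(ddof=1)       # between-core mean square

# Method-of-moments (ANOVA) estimator of the spatial variance component
var_spatial = max(0.0, (ms_between - ms_within) / n)
print(f"analytical variance ~ {ms_within:.3f}")
print(f"spatial (between-core) variance ~ {var_spatial:.3f}")
```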
Statistical characterization report for single-shell tank 241-T-111
International Nuclear Information System (INIS)
Cromar, R.D.; Wilmarth, S.R.
1994-01-01
This report contains the results of the statistical analysis of data from two core samples obtained from single-shell tank 241-T-111 (T-111). Section 2.0 contains a description of the core samples and the chemical analyses performed on the core samples. Section 3.0 contains mean concentration estimates and associated 95% confidence intervals (CIs) on the mean for each of the analytes found in the core samples from T-111. Section 4.0 contains estimates of the spatial variability (variability between cores) and estimates of the analytical variability from the core composite data. Two types of analytical variability were estimated from the core composite data: (1) sample composite variability (variability between composite samples within the same core) and (2) analytical measurement variability (variability between the primary and duplicate analyses within each core composite sample). Estimates of the analytical measurement variability were used as the reference value to test the significance of the spatial and sample composite variability. Spatial variability was significantly different from zero for 39 out of 85 analytes. The sample composite variance was significantly different from zero for (a different) 39 out of the 85 analytes
Statistical analysis plan for the EuroHYP-1 trial
DEFF Research Database (Denmark)
Winkel, Per; Bath, Philip M; Gluud, Christian
2017-01-01
Score; (4) brain infarct size at 48 +/- 24 hours; (5) EQ-5D-5L score; and (6) WHODAS 2.0 score. Other outcomes are: the primary safety outcome, serious adverse events; and the incremental cost-effectiveness and cost-utility ratios. The analysis sets include (1) the intention-to-treat population and (2) the per protocol population. The sample size is estimated to 800 patients (5% type 1 and 20% type 2 errors). All analyses are adjusted for the protocol-specified stratification variables (nationality of centre) and the minimisation variables. In the analysis, we use ordinal regression (the primary outcome), logistic regression (binary outcomes), the general linear model (continuous outcomes), and the Poisson or negative binomial model (rate outcomes). DISCUSSION: Major adjustments compared with the original statistical analysis plan encompass: (1) adjustment of analyses by nationality; (2) power...
Probability theory and mathematical statistics for engineers
Pugachev, V S
1984-01-01
Probability Theory and Mathematical Statistics for Engineers focuses on the concepts of probability theory and mathematical statistics for finite-dimensional random variables.The publication first underscores the probabilities of events, random variables, and numerical characteristics of random variables. Discussions focus on canonical expansions of random vectors, second-order moments of random vectors, generalization of the density concept, entropy of a distribution, direct evaluation of probabilities, and conditional probabilities. The text then examines projections of random vector
Cornillon, Pierre-Andre; Husson, Francois; Jegou, Nicolas; Josse, Julie; Kloareg, Maela; Matzner-Lober, Eric; Rouviere, Laurent
2012-01-01
An Overview of R: Main Concepts; Installing R; Work Session; Help; R Objects; Functions; Packages; Exercises. Preparing Data: Reading Data from File; Exporting Results; Manipulating Variables; Manipulating Individuals; Concatenating Data Tables; Cross-Tabulation; Exercises. R Graphics: Conventional Graphical Functions; Graphical Functions with lattice; Exercises. Making Programs with R: Control Flows; Predefined Functions; Creating a Function; Exercises. Statistical Methods: Introduction to the Statistical Methods. A Quick Start with R: Installing R; Opening and Closing R; The Command Prompt; Attribution, Objects, and Function; Selection; Other; Rcmdr Package; Importing (or Inputting) Data; Graphs; Statistical Analysis. Hypothesis Test: Confidence Intervals for a Mean; Chi-Square Test of Independence; Comparison of Two Means; Testing Conformity of a Proportion; Comparing Several Proportions; The Power of a Test. Regression: Simple Linear Regression; Multiple Linear Regression; Partial Least Squares (PLS) Regression. Analysis of Variance and Covariance: One-Way Analysis of Variance; Multi-Way Analysis of Varian...
[Statistics for statistics?--Thoughts about psychological tools].
Berger, Uwe; Stöbel-Richter, Yve
2007-12-01
Statistical methods occupy a prominent place in psychologists' educational programs. Known as difficult to understand and hard to learn, these contents are feared by students. Those who do not aspire to a research career at the university will quickly forget the drilled contents. Furthermore, because it does not seem to apply to the work with patients and other target groups at first glance, the methodological education as a whole has often been questioned. For many psychological practitioners, statistical education seems to make sense only by enforcing respect from other professions, namely physicians. For their own work, statistics is rarely taken seriously as a professional tool. The reason seems clear: statistics treats numbers, while psychotherapy treats subjects. So, is statistics an end in itself? With this article, we try to answer the question of whether and how statistical methods are represented within psychotherapeutic and psychological research. To this end, we analyzed 46 original articles from a complete volume of the journal Psychotherapy, Psychosomatics, Psychological Medicine (PPmP). Within the volume, 28 different analysis methods were applied, of which 89 per cent were directly based upon statistics. Being able to write and critically read original articles, the backbone of research, presupposes a high degree of statistical education. To ignore statistics means to ignore research and, not least, to abandon one's own professional work to arbitrariness.
Statistics for Learning Genetics
Charles, Abigail Sheena
This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and, more directly, students' perceptions of, and performance in, doing statistically-based genetics problems. This issue is at the emerging edge of modern college-level genetics instruction, and this study attempts to identify key theoretical components for creating a specialized biological statistics curriculum. The goal of this curriculum will be to prepare biology students with the skills for assimilating quantitatively-based genetic processes, increasingly at the forefront of modern genetics. To this end, two college-level classes at two universities were surveyed, one located in the northeastern US and the other in the West Indies. The sample comprised 42 students, and a supplementary interview was administered to 9 selected students. Interviews were also administered to professors in the field in order to gain insight into the teaching of statistics in genetics. Key findings indicated that most students (55%) had very little to no background in statistics. Although students did perform well on exams, with 60% receiving an A or B grade, 77% of them did not offer good explanations for a probability question on the normal distribution included in the survey (of the kind sketched below). The scope and presentation of the applicable statistics/mathematics in some of the most widely used genetics textbooks, as well as in the genetics syllabi used by instructors, do not help the issue: the textbooks often either gave no effective explanations or left certain topics out entirely, and the same omission of statistical/mathematical topics was found in the genetics syllabi reviewed for this study. Nonetheless…
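As an illustration of the kind of normal-distribution probability question referred to above, here is a hypothetical sketch; the trait, mean, and standard deviation are invented for the example.

    from scipy.stats import norm

    # Hypothetical quantitative trait, normally distributed with mean 100 and SD 15
    mu, sigma = 100.0, 15.0

    # Probability that an individual falls within one standard deviation of the mean
    p = norm.cdf(mu + sigma, mu, sigma) - norm.cdf(mu - sigma, mu, sigma)
    print(f"P(|X - mu| <= sigma) = {p:.3f}")  # about 0.683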
Childhood Cancer Statistics - Graphs and Infographics: Number of Diagnoses; Incidence Rates …
Statistical analysis and data management
International Nuclear Information System (INIS)
Anon.
1981-01-01
This report provides an overview of the history of the WIPP Biology Program. The recommendations of the American Institute of Biological Sciences (AIBS) for the WIPP biology program are summarized, as are the data sets available for statistical analyses and the problems associated with these data sets. Base maps for the biological studies are presented. A statistical model is presented to evaluate any correlation between climatological data and small-mammal captures. No statistically significant relationship was found between the variance in small-mammal captures on Dr. Gennaro's 90 m x 90 m grid and precipitation records from the Duval Potash Mine.
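The report's model itself is not reproduced in the abstract, but a minimal sketch of the kind of correlation test described, regressing capture variance on precipitation, might look as follows; all numbers are invented for illustration.

    import numpy as np
    from scipy import stats

    # Invented illustrative data: precipitation (mm) and the variance of
    # small-mammal captures on a trapping grid over the same periods
    precip = np.array([5.0, 12.0, 3.0, 20.0, 8.0, 15.0, 2.0, 10.0])
    capture_var = np.array([1.2, 1.9, 0.8, 2.1, 1.5, 1.4, 1.0, 1.6])

    res = stats.linregress(precip, capture_var)
    print(f"slope={res.slope:.3f}, r={res.rvalue:.2f}, p={res.pvalue:.3f}")
    # A p-value above 0.05 would correspond to the report's finding of no
    # statistically significant relationship.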
International Nuclear Information System (INIS)
Jeyakumar, M.; Thiruvenkadan, R.; Saravana, R.; Periasamy, K.
2016-01-01
Microsatellite data on 25 loci were generated and used to evaluate the genetic architecture and mutation-drift equilibrium of Kanni Adu goats of southern Tamil Nadu. The genetic diversity analysis of Kanni Adu goats displayed a high level of within-breed variability in terms of the mean number of alleles per locus (11.24±0.87) and the heterozygosity values (Ho = 0.677±0.041, He = 0.857±0.016). The within-population inbreeding estimate (FIS = 0.215±0.040) showed a moderate level of inbreeding, which warrants the adoption of appropriate breeding strategies under field conditions. The polymorphism information content (PIC) values ranged from 0.531 to 0.915, suggesting high polymorphism in this breed. In general, the sign, standardized differences and Wilcoxon rank tests indicated heterozygosity excess in the Kanni Adu goat population under the infinite alleles and two-phase models, but no significant excess under the stepwise mutation model. Hence, the mode-shift indicator test was applied; it indicated the absence of a genetic bottleneck in the recent past in Kanni Adu goats, suggesting that any unique alleles present in this breed may not have been lost. The study indicated that Kanni Adu goats exhibit a substantial amount of genetic variation, as reflected by the heterozygosity and the number of alleles per locus. (author)
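For reference, the two diversity measures quoted above can be computed from the allele frequencies at a locus using the standard formulas, He = 1 - sum(p_i^2) and the Botstein et al. (1980) PIC; a minimal sketch with an invented frequency vector follows.

    import numpy as np

    def expected_heterozygosity(p):
        # He = 1 - sum(p_i^2) for allele frequencies p at one locus
        p = np.asarray(p, dtype=float)
        return 1.0 - np.sum(p ** 2)

    def pic(p):
        # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
        p = np.asarray(p, dtype=float)
        cross = sum(2 * p[i] ** 2 * p[j] ** 2
                    for i in range(len(p)) for j in range(i + 1, len(p)))
        return 1.0 - np.sum(p ** 2) - cross

    freqs = [0.4, 0.3, 0.2, 0.1]  # invented allele frequencies for one locus
    print(expected_heterozygosity(freqs), pic(freqs))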
Classification analysis with a «criterion» variable in SPAD
Directory of Open Access Journals (Sweden)
Angelina Sánchez Martí
2018-01-01
This article presents the characteristics, procedure and utility of the technique of classification analysis with a criterion variable on large data sets composed mainly of categorical variables. Classification analysis belongs to the set of techniques commonly known as data mining, which analyse relationships or associations between variables. The paper describes step by step how to apply this statistical technique with the support of the SPAD software, a statistical package for multivariate analysis, and provides an example of its application. The technique is drawn from the French school of statistics. Despite being little known, it is a very useful classification method for working with large amounts of data, a situation that is increasingly common in educational research and typical of the secondary analyses carried out in our field.
Detection and statistics of gusts
DEFF Research Database (Denmark)
Hannesdóttir, Ásta; Kelly, Mark C.; Mann, Jakob
In this project, a more realistic representation of gusts, based on statistical analysis, will account for the variability observed in real-world gusts. The gust representation will focus on temporal, spatial, and velocity scales that are relevant for modern wind turbines and which possibly affect...
Analysis and classification of ECG-waves and rhythms using circular statistics and vector strength
Directory of Open Access Journals (Sweden)
Janßen Jan-Dirk
2017-09-01
The most common way to analyse heart rhythm is to calculate the RR interval and the heart rate variability. For further evaluation, descriptive statistics are often used. Here we introduce a new and more natural heart rhythm analysis tool that is based on circular statistics and vector strength. Vector strength is a measure of the periodicity, or lack of periodicity, of a signal. We divide the signal into non-overlapping window segments and project the detected R-waves onto the unit circle using the complex exponential function and the median RR interval. In addition, we calculate the vector strength and apply circular statistics as well as an angular histogram to the R-wave vectors. This approach enables an intuitive visualization and analysis of rhythmicity. Our results show that ECG waves and rhythms can be easily visualized, analysed and classified by circular statistics and vector strength.
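A minimal numerical sketch of the projection just described; the R-peak times are invented and the windowing is omitted for brevity.

    import numpy as np

    def vector_strength(event_times, period):
        # Project event times onto the unit circle with the given period and
        # return the resultant length (1 = perfectly periodic, 0 = none).
        phases = 2.0 * np.pi * (np.asarray(event_times) % period) / period
        return np.abs(np.mean(np.exp(1j * phases)))

    # Invented R-peak times in seconds; the reference period is the median RR interval
    r_peaks = np.array([0.00, 0.82, 1.61, 2.44, 3.25, 4.05])
    T = np.median(np.diff(r_peaks))
    print(f"median RR = {T:.2f} s, vector strength = {vector_strength(r_peaks, T):.3f}")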
DEFF Research Database (Denmark)
Melo, Jean
… Although many researchers suggest that preprocessor-based variability amplifies maintenance problems, there is little to no hard evidence on how variability actually affects programs and programmers. Specifically, how does variability affect programmers during maintenance tasks (bug finding in particular)? How much harder is it to debug a program as variability increases? How do developers debug programs with variability? In what ways does variability affect bugs? In this Ph.D. thesis, I set off to address such issues through different perspectives, using empirical research (based on controlled experiments) in order to understand quantitatively and qualitatively the impact of variability on programmers at bug finding and on buggy programs. From the program (and bug) perspective, the results show that variability is ubiquitous. There appears to be no specific nature of variability bugs that could…
MQSA (Mammography Quality Standards Act and Program) National Statistics. Archived Scorecard Statistics: 2018, 2017, 2016 …
State Transportation Statistics 2014
2014-12-15
The Bureau of Transportation Statistics (BTS) presents State Transportation Statistics 2014, a statistical profile of transportation in the 50 states and the District of Columbia. This is the 12th annual edition of State Transportation Statistics, a ...
Directory of Open Access Journals (Sweden)
Lin Xinfan
2013-03-01
… online parameterisation and an adaptive observer are designed for a cylindrical battery. The single-cell thermal model is then scaled up to build a battery-cluster model in order to study the temperature pattern of the cluster. The modelled thermal interconnections between cells include cell-to-cell heat conduction and convection to the surrounding coolant flow. An observability analysis is performed on the cluster before a closed-loop observer is designed for the pack. Based on this analysis, guidelines are derived for determining the minimum number of required sensors and their exact placement so as to guarantee the observability of all thermal states.
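The cluster model itself is not given in the abstract, but the observability test it refers to can be illustrated with the standard Kalman rank condition. The two-cell thermal chain below (states are cell temperatures; one sensor on cell 1) is an invented stand-in, not the authors' model.

    import numpy as np

    def observability_matrix(A, C):
        # Stack [C; CA; CA^2; ...; CA^(n-1)] for the Kalman rank test
        n = A.shape[0]
        return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

    # Invented 2-cell thermal chain: off-diagonal terms model cell-to-cell
    # conduction; the diagonal lumps conduction plus convection to the coolant
    A = np.array([[-0.30, 0.10],
                  [0.10, -0.30]])
    C = np.array([[1.0, 0.0]])  # a single temperature sensor on cell 1

    O = observability_matrix(A, C)
    rank = np.linalg.matrix_rank(O)
    print(f"rank {rank} of {A.shape[0]} -> observable: {rank == A.shape[0]}")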
Mann, Michael E.; Steinman, Byron A.; Miller, Sonya K.; Frankcombe, Leela M.; England, Matthew H.; Cheung, Anson H.
2016-04-01
The temporary slowdown in large-scale surface warming during the early 2000s has been attributed to both external and internal sources of climate variability. Using semiempirical estimates of the internal low-frequency variability component in Northern Hemisphere, Atlantic, and Pacific surface temperatures in concert with statistical hindcast experiments, we investigate whether the slowdown and its recent recovery were predictable. We conclude that the internal variability of the North Pacific, which played a critical role in the slowdown, does not appear to have been predictable using statistical forecast methods. An additional minor contribution from the North Atlantic, by contrast, appears to exhibit some predictability. While our analyses focus on combining semiempirical estimates of internal climatic variability with statistical hindcast experiments, possible implications for initialized model predictions are also discussed.
Entropy statistics and information theory
Frenken, K.; Hanusch, H.; Pyka, A.
2007-01-01
Entropy measures provide important tools to indicate variety in distributions at particular moments in time (e.g., market shares) and to analyse evolutionary processes over time (e.g., technical change). Importantly, entropy statistics are suited to decomposition analysis, which renders the…
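As a concrete instance of the decomposition property mentioned above, Shannon entropy over grouped shares splits exactly into a between-group term plus a share-weighted within-group term; a minimal sketch with invented market shares follows.

    import numpy as np

    def shannon_entropy(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]  # 0 * log(0) is taken as 0
        return -np.sum(p * np.log(p))

    # Invented market shares, partitioned into two groups (e.g. two technologies)
    groups = [np.array([0.30, 0.20]), np.array([0.25, 0.15, 0.10])]
    shares = np.concatenate(groups)

    P = np.array([g.sum() for g in groups])  # group totals
    H_total = shannon_entropy(shares)
    H_between = shannon_entropy(P)
    H_within = sum(Pg * shannon_entropy(g / g.sum()) for Pg, g in zip(P, groups))

    # Exact decomposition: H_total = H_between + sum_g P_g * H_g(within)
    print(H_total, H_between + H_within)  # the two values agree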
International Nuclear Information System (INIS)
Levine, R.D.
1988-01-01
Statistical considerations are applied to quantum mechanical amplitudes. The physical motivation is the progress in the spectroscopy of highly excited states. The corresponding wave functions are strongly mixed: in terms of a basis set of eigenfunctions of a zeroth-order Hamiltonian with good quantum numbers, such wave functions have contributions from many basis states. Consider the vector x whose components are the expansion coefficients in that basis. Any amplitude can then be written as a†x. It is argued that the components of x, and hence other amplitudes, can be regarded as random variables. The maximum entropy formalism is applied to determine the corresponding distribution function. Two amplitudes a†x and b†x are independently distributed if b†a = 0. It is suggested that the theory of quantal measurements implies that, in general, one can only determine the distribution of amplitudes and not the amplitudes themselves.
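A sketch of the reasoning behind the independence criterion, under our assumption that the only constraint on the distribution is the normalisation of x:

    % Maximising the entropy of the distribution of the coefficient vector x,
    % subject only to the normalisation constraint, gives a Gaussian:
    \max_{P} \; -\!\int P(x)\,\ln P(x)\,dx
    \quad \text{s.t.} \quad \langle x^{\dagger}x \rangle = 1
    \;\Rightarrow\;
    P(x) \propto e^{-\lambda\, x^{\dagger}x}.
    % For two amplitudes A = a^{\dagger}x and B = b^{\dagger}x, joint Gaussianity
    % gives Cov(A, \bar{B}) \propto a^{\dagger}b = (b^{\dagger}a)^{*},
    % so b^{\dagger}a = 0 implies that A and B are statistically independent.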