large statistically significant: Topics by WorldWideScience.org

Sample records for large statistically significant

Statistical significance of cis-regulatory modules

Directory of Open Access Journals (Sweden)

Smith Andrew D

2007-01-01

Background: It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome-scale scanning. Results: We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to produce data structures that can be used to estimate p-values for binding site matches. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module scanned for has defined organizational parameters, the probability of the module is corrected to account for organizational constraints. The statistical significance of single site matches and the architecture of sites within the module can be combined to provide an overall estimation of statistical significance of cis-regulatory module sites. Conclusion: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.
Statistically significant relational data mining

Energy Technology Data Exchange (ETDEWEB)

Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.

2014-02-01

This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second comprises statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Significance levels for studies with correlated test statistics.

Science.gov (United States)

Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

2008-07-01

When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
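The baseline procedure this abstract starts from (take the largest test statistic and permute sample units to simulate its null distribution) can be sketched as follows. This is a minimal illustration, not the authors' conditional method: the function names, the choice of a mean-difference statistic, and the two-group layout are all assumptions made for the example.

```python
import random
from statistics import mean


def max_abs_diff(cases, controls):
    """Largest absolute difference in per-feature means between two groups."""
    n_features = len(cases[0])
    return max(
        abs(mean(row[j] for row in cases) - mean(row[j] for row in controls))
        for j in range(n_features)
    )


def permutation_global_p(cases, controls, n_perm=2000, seed=0):
    """Permutation p-value for the global null, based on the maximum statistic.

    Sample units (rows) are shuffled between the two groups; the p-value is
    the share of permutations whose maximum statistic reaches the observed one.
    """
    rng = random.Random(seed)
    observed = max_abs_diff(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if max_abs_diff(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because whole rows are permuted, correlation between features is preserved in each permuted data set. The abstract's point is that the resulting unconditional estimate can still mislead, which is why the authors condition on the spread of the observed histogram.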
Test for the statistical significance of differences between ROC curves

International Nuclear Information System (INIS)

Metz, C.E.; Kronman, H.B.

1979-01-01

A test for the statistical significance of observed differences between two measured Receiver Operating Characteristic (ROC) curves has been designed and evaluated. The set of observer response data for each ROC curve is assumed to be independent and to arise from a ROC curve having a form which, in the absence of statistical fluctuations in the response data, graphs as a straight line on double normal-deviate axes. To test the significance of an apparent difference between two measured ROC curves, maximum likelihood estimates of the two parameters of each curve and the associated parameter variances and covariance are calculated from the corresponding set of observer response data. An approximate Chi-square statistic with two degrees of freedom is then constructed from the differences between the parameters estimated for each ROC curve and from the variances and covariances of these estimates. This statistic is known to be truly Chi-square distributed only in the limit of large numbers of trials in the observer performance experiments. Performance of the statistic for data arising from a limited number of experimental trials was evaluated. Independent sets of rating scale data arising from the same underlying ROC curve were paired, and the fraction of differences found (falsely) significant was compared to the significance level, α, used with the test. Although test performance was found to be somewhat dependent on both the number of trials in the data and the position of the underlying ROC curve in the ROC space, the results for various significance levels showed the test to be reliable under practical experimental conditions.
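A two-degree-of-freedom statistic of the kind described can be sketched as a quadratic form in the parameter differences. The sketch below assumes the two data sets are independent, so that the covariance matrix of the difference is the sum of the two estimated covariance matrices; the function name and argument layout are hypothetical, and the maximum likelihood fitting step is not shown.

```python
import math


def roc_difference_test(params1, cov1, params2, cov2):
    """Approximate chi-square test (2 df) for a difference between two ROC fits.

    params = (a, b): the two fitted parameters of a binormal ROC curve.
    cov    = ((var_a, cov_ab), (cov_ab, var_b)): their estimated covariance.
    Returns (chi2, p_value).
    """
    da = params1[0] - params2[0]
    db = params1[1] - params2[1]
    # Independent data sets: covariance of the difference is the sum.
    s_aa = cov1[0][0] + cov2[0][0]
    s_ab = cov1[0][1] + cov2[0][1]
    s_bb = cov1[1][1] + cov2[1][1]
    det = s_aa * s_bb - s_ab ** 2
    if det <= 0:
        raise ValueError("combined covariance matrix must be positive definite")
    # Quadratic form d^T S^{-1} d, written out for the 2x2 case.
    chi2 = (da * da * s_bb - 2.0 * da * db * s_ab + db * db * s_aa) / det
    # Survival function of a chi-square variable with 2 df is exp(-x / 2).
    return chi2, math.exp(-chi2 / 2.0)
```

As the abstract notes, the chi-square reference distribution is only exact in the large-sample limit, so the p-value is approximate for small numbers of trials.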
The large deviation approach to statistical mechanics

International Nuclear Information System (INIS)

Touchette, Hugo

2009-01-01

The theory of large deviations is concerned with the exponential decay of probabilities of large fluctuations in random systems. These probabilities are important in many fields of study, including statistics, finance, and engineering, as they often yield valuable information about the large fluctuations of a random system around its most probable state or trajectory. In the context of equilibrium statistical mechanics, the theory of large deviations provides exponential-order estimates of probabilities that refine and generalize Einstein's theory of fluctuations. This review explores this and other connections between large deviation theory and statistical mechanics, in an effort to show that the mathematical language of statistical mechanics is the language of large deviation theory. The first part of the review presents the basics of large deviation theory, and works out many of its classical applications related to sums of random variables and Markov processes. The second part goes through many problems and results of statistical mechanics, and shows how these can be formulated and derived within the context of large deviation theory. The problems and results treated cover a wide range of physical systems, including equilibrium many-particle systems, noise-perturbed dynamics, nonequilibrium systems, as well as multifractals, disordered systems, and chaotic systems. This review also covers many fundamental aspects of statistical mechanics, such as the derivation of variational principles characterizing equilibrium and nonequilibrium states, the breaking of the Legendre transform for nonconcave entropies, and the characterization of nonequilibrium fluctuations through fluctuation relations.
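The "exponential decay of probabilities" for sums of random variables can be illustrated with coin flips, a standard textbook case chosen here for illustration rather than taken from the review. For the sample mean of n Bernoulli(p) variables, Cramér's theorem gives the rate function I(x) = x ln(x/p) + (1 - x) ln((1 - x)/(1 - p)), and -(1/n) ln P(mean >= x) approaches I(x) as n grows. The function names below are hypothetical.

```python
import math


def bernoulli_rate(x, p=0.5):
    """Cramér rate function for the mean of Bernoulli(p) variables, 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))


def empirical_decay_rate(n, x, p=0.5):
    """-(1/n) ln P(S_n / n >= x), from the exact binomial tail.

    Terms are accumulated in log space so that large n does not underflow.
    """
    k_min = math.ceil(n * x)
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k_min, n + 1)
    ]
    peak = max(log_terms)
    log_tail = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return -log_tail / n


if __name__ == "__main__":
    for n in (200, 2000):
        print(n, empirical_decay_rate(n, 0.7), bernoulli_rate(0.7))
```

The gap between the two printed numbers shrinks roughly like ln(n)/n, which is the sense in which the estimate is of "exponential order".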
The large deviation approach to statistical mechanics

Science.gov (United States)

Touchette, Hugo

2009-07-01

The theory of large deviations is concerned with the exponential decay of probabilities of large fluctuations in random systems. These probabilities are important in many fields of study, including statistics, finance, and engineering, as they often yield valuable information about the large fluctuations of a random system around its most probable state or trajectory. In the context of equilibrium statistical mechanics, the theory of large deviations provides exponential-order estimates of probabilities that refine and generalize Einstein’s theory of fluctuations. This review explores this and other connections between large deviation theory and statistical mechanics, in an effort to show that the mathematical language of statistical mechanics is the language of large deviation theory. The first part of the review presents the basics of large deviation theory, and works out many of its classical applications related to sums of random variables and Markov processes. The second part goes through many problems and results of statistical mechanics, and shows how these can be formulated and derived within the context of large deviation theory. The problems and results treated cover a wide range of physical systems, including equilibrium many-particle systems, noise-perturbed dynamics, nonequilibrium systems, as well as multifractals, disordered systems, and chaotic systems. This review also covers many fundamental aspects of statistical mechanics, such as the derivation of variational principles characterizing equilibrium and nonequilibrium states, the breaking of the Legendre transform for nonconcave entropies, and the characterization of nonequilibrium fluctuations through fluctuation relations.
The thresholds for statistical and clinical significance

DEFF Research Database (Denmark)

Jakobsen, Janus Christian; Gluud, Christian; Winkel, Per

2014-01-01

BACKGROUND: Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does...... not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore...... of the probability that a given trial result is compatible with a 'null' effect (corresponding to the P-value) divided by the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance...
The insignificance of statistical significance testing

Science.gov (United States)

Johnson, Douglas H.

1999-01-01

Despite their use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.
Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

Science.gov (United States)

Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

2013-03-23

Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Multiple challenges in protein MS data analysis remain: management of large-scale and complex data sets; MS peak identification and indexing; and high-dimensional differential peak analysis with false discovery rate (FDR) control based on concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets to identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution that gives experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
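The abstract does not say which FDR procedure the portal uses. As one common possibility, the Benjamini-Hochberg step-up procedure is sketched below for a list of per-peak p-values; the function name and the example values are illustrative assumptions, not part of the described tool.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure.

    Sort the p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    # Hypothetical p-values for ten peaks compared between two categories.
    peaks = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(peaks, q=0.05))
```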
Some Statistics for Measuring Large-Scale Structure

OpenAIRE

Brandenberger, Robert H.; Kaplan, David M.; Ramsey, Stephen A.

1993-01-01

Good statistics for measuring large-scale structure in the Universe must be able to distinguish between different models of structure formation. In this paper, two and three dimensional "counts in cell" statistics and a new "discrete genus statistic" are applied to toy versions of several popular theories of structure formation: random phase cold dark matter model, cosmic string models, and global texture scenario. All three statistics appear quite promising in terms of differentiating betw...
A critical discussion of null hypothesis significance testing and statistical power analysis within psychological research

DEFF Research Database (Denmark)

Jones, Allan; Sommerlund, Bo

2007-01-01

The uses of null hypothesis significance testing (NHST) and statistical power analysis within psychological research are critically discussed. The article looks at the problems of relying solely on NHST when dealing with small and large sample sizes. The use of power-analysis in estimating...... the potential error introduced by small and large samples is advocated. Power analysis is not recommended as a replacement to NHST but as an additional source of information about the phenomena under investigation. Moreover, the importance of conceptual analysis in relation to statistical analysis of hypothesis...
Caveats for using statistical significance tests in research assessments

DEFF Research Database (Denmark)

Schneider, Jesper Wiborg

2013-01-01

This article raises concerns about the advantages of using statistical significance tests in research assessments, as has recently been suggested in the debate about proper normalization procedures for citation indicators by Opthof and Leydesdorff (2010). Statistical significance tests are highly controversial and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with such tests. The issues specifically discussed are the ritual practice...... argue that applying statistical significance tests and mechanically adhering to their results are highly problematic and detrimental to critical thinking. We claim that the use of such tests does not provide any advantages in relation to deciding whether differences between citation indicators...
Statistical significance of epidemiological data. Seminar: Evaluation of epidemiological studies

International Nuclear Information System (INIS)

Weber, K.H.

1993-01-01

In stochastic damages, the numbers of events, e.g. the persons who are affected by or have died of cancer, and thus the relative frequencies (incidence or mortality) are binomially distributed random variables. Their statistical fluctuations can be characterized by confidence intervals. For epidemiologic questions, especially for the analysis of stochastic damages in the low dose range, the following issues are interesting: - Is a sample (a group of persons) with a definite observed damage frequency part of the whole population? - Is an observed frequency difference between two groups of persons random or statistically significant? - Is an observed increase or decrease of the frequencies with increasing dose random or statistically significant, and how large is the regression coefficient (= risk coefficient) in this case? These problems can be solved by statistical tests. So-called distribution-free tests and tests which are not bound to the supposition of normal distribution are of particular interest, such as: - χ² independence test (test in contingency tables); - Fisher-Yates test; - trend test according to Cochran; - rank correlation test given by Spearman. These tests are explained in terms of selected epidemiologic data, e.g. of leukaemia clusters, of the cancer mortality of the Japanese A-bomb survivors especially in the low dose range, as well as on the sample of the cancer mortality in the high background area in Yangjiang (China). (orig.) [de]
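Of the tests listed, the Fisher-Yates (Fisher exact) test for a 2x2 contingency table is simple enough to sketch directly. The version below uses the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the function name and cell layout are assumptions for the example, and the seminar's own worked data are not reproduced.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that
    is at most as probable as the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

For example, a table with 3 exposed and 1 unexposed case against 1 exposed and 3 unexposed non-cases gives p = 34/70, which is far from significant despite the apparent imbalance.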
Common pitfalls in statistical analysis: "P" values, statistical significance and confidence intervals

Directory of Open Access Journals (Sweden)

Priya Ranganathan

2015-01-01

In the second part of a series on pitfalls in statistical analysis, we look at various ways in which a statistically significant study result can be expressed. We debunk some of the myths regarding the 'P' value, explain the importance of 'confidence intervals' and clarify the importance of including both values in a paper.
Health significance and statistical uncertainty. The value of P-value.

Science.gov (United States)

Consonni, Dario; Bertazzi, Pier Alberto

2017-10-27

The P-value is widely used as a summary statistic of scientific results. Unfortunately, there is a widespread tendency to dichotomize its value into "P<0.05" ("statistically significant") and "P>0.05" ("statistically not significant"), with the former implying a "positive" result and the latter a "negative" one. The aim is to show the unsuitability of such an approach when evaluating the effects of environmental and occupational risk factors. We provide examples of distorted use of the P-value and of the negative consequences for science and public health of such a black-and-white vision. The rigid interpretation of the P-value as a dichotomy favors the confusion between health relevance and statistical significance, discourages thoughtful thinking, and distracts attention from what really matters, the health significance. A much better way to express and communicate scientific results involves reporting effect estimates (e.g., risks, risk ratios or risk differences) and their confidence intervals (CI), which summarize and convey both health significance and statistical uncertainty. Unfortunately, many researchers do not usually consider the whole CI but only examine whether it includes the null value, thereby degrading this procedure to the same P-value dichotomy (statistical significance or not). In reporting statistical results of scientific research, present effect estimates with their confidence intervals and do not qualify the P-value as "significant" or "not significant".
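As an illustration of the recommended reporting style, the sketch below computes a risk ratio with a confidence interval from cohort counts. It uses the standard large-sample interval on the log scale, which is an assumption of this example rather than a method named in the abstract; the function name and the counts in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total, level=0.95):
    """Risk ratio with a large-sample confidence interval on the log scale.

    Returns (risk_ratio, lower, upper). Requires non-zero case counts.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) from the delta method.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)


if __name__ == "__main__":
    rr, low, high = risk_ratio_ci(30, 100, 15, 100)
    print(f"RR = {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Reading the whole interval, rather than only checking whether it contains 1, is the point the abstract makes: an interval from about 1.15 to 3.48 is compatible with both a modest and a more than threefold increase in risk.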
Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

Science.gov (United States)

Xu, Kuan-Man

2006-01-01

A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
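The general idea (pick a distance between two summary histograms, then bootstrap to obtain a significance level) can be sketched as below with the Euclidean distance. The resampling scheme shown, which pools the individual histograms and redraws both groups with replacement, is an assumption made for illustration; the abstract does not spell out the exact procedure, and all names are hypothetical.

```python
import math
import random


def summary_histogram(histograms):
    """Sum individual histograms bin by bin and normalize to unit total."""
    totals = [sum(column) for column in zip(*histograms)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def euclidean(h1, h2):
    """Euclidean distance between two normalized histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(h1, h2)))


def bootstrap_p(group_a, group_b, n_boot=1000, seed=0):
    """Bootstrap significance level for the distance between summary histograms.

    Under the null hypothesis both groups come from one population, so the
    individual histograms are pooled and each group is redrawn with
    replacement at its original size.
    """
    rng = random.Random(seed)
    observed = euclidean(summary_histogram(group_a), summary_histogram(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_boot):
        resampled_a = rng.choices(pooled, k=len(group_a))
        resampled_b = rng.choices(pooled, k=len(group_b))
        distance = euclidean(
            summary_histogram(resampled_a), summary_histogram(resampled_b)
        )
        if distance >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)
```

Swapping `euclidean` for another distance function leaves the rest of the procedure unchanged, which is how several distance statistics can be compared on the same data.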
The large break LOCA evaluation method with the simplified statistic approach

International Nuclear Information System (INIS)

Kamata, Shinya; Kubo, Kazuo

2004-01-01

The USNRC published the Code Scaling, Applicability and Uncertainty (CSAU) evaluation methodology for large break LOCA, which supported the revised rule for Emergency Core Cooling System performance in 1989. USNRC Regulatory Guide 1.157 requires that the peak cladding temperature (PCT) not exceed 2200 °F with high probability (95th percentile). In recent years, other countries have developed statistical methodologies and best-estimate codes, based on the CSAU evaluation methodology, whose models provide more realistic simulation of the phenomena. Approaches for calculating the PCT probability distribution by Monte Carlo trials include the response surface technique using polynomials and the order statistics method. To perform a rational statistical analysis, Mitsubishi Heavy Industries, Ltd. (MHI) tried to develop a statistical LOCA method using the best-estimate LOCA code MCOBRA/TRAC and the simplified code HOTSPOT. HOTSPOT is a Monte Carlo heat conduction solver that evaluates the uncertainties of the significant fuel parameters at the PCT positions of the hot rod. Direct uncertainty sensitivity studies can be performed without a response surface because the Monte Carlo simulation for key parameters can be run in a short time using HOTSPOT. With regard to the parameter uncertainties, MHI established a treatment in which bounding conditions are given for the LOCA boundary and plant initial conditions, and the Monte Carlo simulation using HOTSPOT is applied to the significant fuel parameters. The paper describes the large break LOCA evaluation method with the simplified statistical approach and the results of applying the method to a representative four-loop nuclear power plant. (author)
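The order statistics method mentioned among the Monte Carlo approaches is usually associated with Wilks' formula. As a sketch, and not a description of the MHI method itself, the first-order one-sided version asks for the smallest number of runs n such that the largest computed PCT bounds the chosen percentile with the chosen confidence, i.e. 1 - coverage^n >= confidence. The function name is hypothetical.

```python
def wilks_sample_size(coverage=0.95, confidence=0.95):
    """Smallest n for a first-order, one-sided Wilks tolerance limit.

    With n independent code runs, the maximum output exceeds the `coverage`
    quantile with probability 1 - coverage**n; return the first n for which
    that probability reaches `confidence`.
    """
    n = 1
    while 1.0 - coverage ** n < confidence:
        n += 1
    return n


if __name__ == "__main__":
    print(wilks_sample_size(0.95, 0.95))  # 95th percentile, 95% confidence
```

For a 95th percentile at 95% confidence this gives 59 runs, the figure commonly quoted for this kind of analysis.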
Common pitfalls in statistical analysis: “P” values, statistical significance and confidence intervals

Science.gov (United States)

Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

2015-01-01

In the second part of a series on pitfalls in statistical analysis, we look at various ways in which a statistically significant study result can be expressed. We debunk some of the myths regarding the ‘P’ value, explain the importance of ‘confidence intervals’ and clarify the importance of including both values in a paper. PMID: 25878958
Multivariate statistics high-dimensional and large-sample approximations

CERN Document Server

Fujikoshi, Yasunori; Shimizu, Ryoichi

2010-01-01

A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic
A course in mathematical statistics and large sample theory

CERN Document Server

Bhattacharya, Rabi; Patrangenaru, Victor

2016-01-01

This graduate-level textbook is primarily aimed at graduate students of statistics, mathematics, science, and engineering who have had an undergraduate course in statistics, an upper division course in analysis, and some acquaintance with measure theoretic probability. It provides a rigorous presentation of the core of mathematical statistics. Part I of this book constitutes a one-semester course on basic parametric mathematical statistics. Part II deals with the large sample theory of statistics — parametric and nonparametric, and its contents may be covered in one semester as well. Part III provides brief accounts of a number of topics of current interest for practitioners and other disciplines whose work involves statistical methods. Large Sample theory with many worked examples, numerical calculations, and simulations to illustrate theory Appendices provide ready access to a number of standard results, with many proofs Solutions given to a number of selected exercises from Part I Part II exercises with ...

Shell model in large spaces and statistical spectroscopy

International Nuclear Information System (INIS)

Kota, V.K.B.

1996-01-01

For many nuclear structure problems of current interest it is essential to deal with shell model in large spaces. For this, three different approaches are now in use and two of them are: (i) the conventional shell model diagonalization approach but taking into account new advances in computer technology; (ii) the shell model Monte Carlo method. A brief overview of these two methods is given. Large space shell model studies raise fundamental questions regarding the information content of the shell model spectrum of complex nuclei. This led to the third approach: the statistical spectroscopy methods. The principles of statistical spectroscopy have their basis in nuclear quantum chaos and they are described (which are substantiated by large scale shell model calculations) in some detail. (author)
Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.

Science.gov (United States)

Breunig, Nancy A.

Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…
Swiss solar power statistics 2007 - Significant expansion

International Nuclear Information System (INIS)

Hostettler, T.

2008-01-01

This article presents and discusses the 2007 statistics for solar power in Switzerland. A significant number of new installations is noted, as are the high production figures from newer installations. The basics behind the compilation of the Swiss solar power statistics are briefly reviewed and an overview for the period 1989 to 2007 is presented which includes figures on the number of photovoltaic plants in service and installed peak power. Typical production figures in kilowatt-hours (kWh) per installed kilowatt-peak power (kWp) are presented and discussed for installations of various sizes. Increased production after inverter replacement in older installations is noted. Finally, the general political situation in Switzerland as far as solar power is concerned is briefly discussed, as are international developments.
Power, effects, confidence, and significance: an investigation of statistical practices in nursing research.

Science.gov (United States)

Gaskin, Cadeyrn J; Happell, Brenda

2014-05-01

To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. The use, reporting, and interpretation of inferential statistics in nursing research need substantial
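The experiment-wise Type I error rate reported here grows quickly with the number of tests in a paper. A minimal sketch of the usual calculation, assuming independent tests each run at the same per-test level (an assumption of this example; the abstract does not state how the review computed it), is shown below together with the Bonferroni-adjusted per-test level. The function names are hypothetical.

```python
def experimentwise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test level that keeps the experiment-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for k in (1, 5, 15, 30):
        print(k, round(experimentwise_error_rate(0.05, k), 3))
```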
Intensive inpatient treatment for bulimia nervosa: Statistical and clinical significance of symptom changes.

Science.gov (United States)

Diedrich, Alice; Schlegl, Sandra; Greetfeld, Martin; Fumi, Markus; Voderholzer, Ulrich

2018-03-01

This study examines the statistical and clinical significance of symptom changes during an intensive inpatient treatment program with a strong psychotherapeutic focus for individuals with severe bulimia nervosa. 295 consecutively admitted bulimic patients were administered the Structured Interview for Anorexic and Bulimic Syndromes-Self-Rating (SIAB-S), the Eating Disorder Inventory-2 (EDI-2), the Brief Symptom Inventory (BSI), and the Beck Depression Inventory-II (BDI-II) at treatment intake and discharge. Results indicated statistically significant symptom reductions with large effect sizes regarding severity of binge eating and compensatory behavior (SIAB-S), overall eating disorder symptom severity (EDI-2), overall psychopathology (BSI), and depressive symptom severity (BDI-II) even when controlling for antidepressant medication. The majority of patients showed either reliable (EDI-2: 33.7%, BSI: 34.8%, BDI-II: 18.1%) or even clinically significant symptom changes (EDI-2: 43.2%, BSI: 33.9%, BDI-II: 56.9%). Patients with clinically significant improvement were less distressed at intake and less likely to suffer from a comorbid borderline personality disorder when compared with those who did not improve to a clinically significant extent. Findings indicate that intensive psychotherapeutic inpatient treatment may be effective in about 75% of severely affected bulimic patients. For the remaining non-responding patients, inpatient treatment might be improved through an even stronger focus on the reduction of comorbid borderline personality traits.
Statistical measurement of power spectrum density of large aperture optical component

International Nuclear Information System (INIS)

Xu Jiancheng; Xu Qiao; Chai Liqun

2010-01-01

According to the requirements of ICF, a method based on statistical theory has been proposed to measure the power spectrum density (PSD) of large-aperture optical components. The method breaks the large-aperture wavefront into small regions and obtains the PSD of the large-aperture wavefront by weighted averaging of the PSDs of the regions, where the weight factor is each region's area. Simulation and experiment demonstrate the effectiveness of the proposed method. They also show that the PSDs of the large-aperture wavefront obtained by the statistical method and by the sub-aperture stitching method agree well when the number of small regions is no less than 8 x 8. The statistical method is not sensitive to translation-stage errors or environmental instabilities, so it is appropriate for PSD measurement during optical fabrication. (authors)
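The area-weighted averaging described in this abstract can be written compactly. The following is only a sketch in notation assumed here, not taken from the paper: $A_k$ is the area of region $k$, $\mathrm{PSD}_k(\nu)$ is the PSD measured on that region at spatial frequency $\nu$, and $K$ is the number of regions.

```latex
\mathrm{PSD}(\nu) \approx \frac{\sum_{k=1}^{K} A_k \, \mathrm{PSD}_k(\nu)}{\sum_{k=1}^{K} A_k}
```

When all regions have equal area, as in a regular 8 x 8 grid, this reduces to the plain mean of the regional PSDs.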
On detection and assessment of statistical significance of Genomic Islands

Directory of Open Access Journals (Sweden)

Chaudhuri Probal

2008-04-01

Full Text Available Abstract Background Many of the available methods for detecting Genomic Islands (GIs) in prokaryotic genomes use markers such as transposons, proximal tRNAs, flanking repeats etc., or they use other supervised techniques requiring training datasets. Most of these methods are primarily based on the biases in GC content or codon and amino acid usage of the islands. However, these methods either do not use any formal statistical test of significance or use statistical tests for which the critical values and the P-values are not adequately justified. We propose a method, which is unsupervised in nature and uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory, and consequently, the resulting P-values are quite reliable for making the decision. Results Our algorithm (named Design-Island, an acronym for Detection of Statistically Significant Genomic Island) runs in two phases. Some 'putative GIs' are identified in the first phase, and those are refined into smaller segments containing horizontally acquired genes in the refinement phase. This method is applied to Salmonella typhi CT18 genome leading to the discovery of several new pathogenicity, antibiotic resistance and metabolic islands that were missed by earlier methods. Many of these islands contain mobile genetic elements like phage-mediated genes, transposons, integrase and IS elements confirming their horizontal acquirement. Conclusion The proposed method is based on statistical tests supported by precise distribution theory and reliable P-values along with a technique for visualizing statistically significant islands. The performance of our method is better than many other well known methods in terms of their sensitivity and accuracy, and in terms of specificity, it is comparable to other methods.
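The general idea of a Monte-Carlo test based on randomly selected chromosome segments can be illustrated with a small sketch. This is not the Design-Island algorithm: it is a simplified stand-in that uses GC content as the only statistic, and the toy genome, the function names `gc_fraction` and `monte_carlo_p`, and all parameter values are assumptions made for illustration.

```python
import random


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def monte_carlo_p(genome, start, length, n_draws=999, seed=0):
    """Empirical p-value for the GC content of genome[start:start+length].

    The candidate's deviation from the genome-wide GC fraction is compared
    with the deviations of randomly placed segments of the same length.
    """
    rng = random.Random(seed)
    genome_gc = gc_fraction(genome)
    observed = abs(gc_fraction(genome[start:start + length]) - genome_gc)
    extreme = 0
    for _ in range(n_draws):
        s = rng.randrange(len(genome) - length + 1)
        deviation = abs(gc_fraction(genome[s:s + length]) - genome_gc)
        if deviation >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_draws + 1)


# Hypothetical toy genome: 40% GC background with a 70% GC island inserted.
toy_rng = random.Random(42)
background = toy_rng.choices("ACGT", weights=[3, 2, 2, 3], k=19500)
island = toy_rng.choices("ACGT", weights=[15, 35, 35, 15], k=500)
genome = "".join(background[:10000] + island + background[10000:])

p_island = monte_carlo_p(genome, start=10000, length=500)
print(f"empirical p-value for the inserted island: {p_island:.3f}")
```

Because the reference distribution is built from the chromosome itself, no training dataset is needed, which is the unsupervised property the abstract emphasizes.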
Increasing the statistical significance of entanglement detection in experiments.

Science.gov (United States)

Jungnitsch, Bastian; Niekamp, Sönke; Kleinmann, Matthias; Gühne, Otfried; Lu, He; Gao, Wei-Bo; Chen, Yu-Ao; Chen, Zeng-Bing; Pan, Jian-Wei

2010-05-28

Entanglement is often verified by a violation of an inequality like a Bell inequality or an entanglement witness. Considerable effort has been devoted to the optimization of such inequalities in order to obtain a high violation. We demonstrate theoretically and experimentally that such an optimization does not necessarily lead to a better entanglement test, if the statistical error is taken into account. Theoretically, we show for different error models that reducing the violation of an inequality can improve the significance. Experimentally, we observe this phenomenon in a four-photon experiment, testing the Mermin and Ardehali inequality for different levels of noise. Furthermore, we provide a way to develop entanglement tests with high statistical significance.
Testing the Difference of Correlated Agreement Coefficients for Statistical Significance

Science.gov (United States)

Gwet, Kilem L.

2016-01-01

This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…
A tutorial on hunting statistical significance by chasing N

Directory of Open Access Journals (Sweden)

Denes Szucs

2016-09-01

Full Text Available There is increasing concern about the replicability of studies in psychology and cognitive neuroscience. Hidden data dredging (also called p-hacking) is a major contributor to this crisis because it substantially increases Type I error, resulting in a much larger proportion of false positive findings than the usually expected 5%. In order to build better intuition to avoid, detect and criticise some typical problems, here I systematically illustrate the large impact of some easy-to-implement, and so perhaps frequent, data dredging techniques on boosting false positive findings. I illustrate several forms of two special cases of data dredging. First, researchers may violate the data collection stopping rules of null hypothesis significance testing by repeatedly checking for statistical significance with various numbers of participants. Second, researchers may group participants post-hoc along potential but unplanned independent grouping variables. The first approach 'hacks' the number of participants in studies; the second approach 'hacks' the number of variables in the analysis. I demonstrate the high number of false positive findings generated by these techniques with data from true null distributions. I also illustrate that it is extremely easy to introduce strong bias into data by very mild selection and re-testing. Similar, usually undocumented data dredging steps can easily lead to 20-50% or more false positives.
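The first technique, repeatedly checking for significance as participants are added, can be illustrated with a minimal simulation. The sketch below is not the tutorial's own code: it assumes a one-sample z-test with known standard deviation 1, data drawn from a true null distribution, and a hypothetical schedule of one look after every 10 participants up to 100.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided p-value of a one-sample z-test of mean 0 with known sd 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(looks, n_sims=2000, alpha=0.05, seed=1):
    """Share of null simulations declared significant at any of the looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0.0, 1.0) for _ in range(max(looks))]
        # Stop and claim an effect as soon as any interim look is significant.
        if any(two_sided_p(data[:n]) < alpha for n in looks):
            hits += 1
    return hits / n_sims


fixed_rate = false_positive_rate(looks=[100])
peeking_rate = false_positive_rate(looks=range(10, 101, 10))
print(f"single test at N=100:          {fixed_rate:.3f}")
print(f"test after every 10 up to 100: {peeking_rate:.3f}")
```

Both calls use the same seed and therefore the same simulated data, so any difference between the two rates comes only from the extra looks.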
Efficient Partitioning of Large Databases without Query Statistics

Directory of Open Access Journals (Sweden)

Shahidul Islam KHAN

2016-11-01

Full Text Available An efficient way of improving the performance of a database management system is distributed processing. Distribution of data involves fragmentation or partitioning, replication, and allocation processes. Previous research works provided partitioning based on empirical data about the type and frequency of the queries. These solutions are not suitable at the initial stage of a distributed database, as query statistics are not available then. In this paper, I have presented a fragmentation technique, Matrix based Fragmentation (MMF), which can be applied at the initial stage as well as at later stages of distributed databases. Instead of using empirical data, I have developed a matrix, Modified Create, Read, Update and Delete (MCRUD), to partition a large database properly. Allocation of fragments is done simultaneously in my proposed technique. So using MMF, no additional complexity is added for allocating the fragments to the sites of a distributed database, as fragmentation is synchronized with allocation. The performance of a DDBMS can be improved significantly by avoiding frequent remote access and high data transfer among the sites. Results show that the proposed technique can solve the initial partitioning problem of large distributed databases.
Large-Deviation Results for Discriminant Statistics of Gaussian Locally Stationary Processes

Directory of Open Access Journals (Sweden)

Junichi Hirukawa

2012-01-01

Full Text Available This paper discusses the large-deviation principle of discriminant statistics for Gaussian locally stationary processes. First, large-deviation theorems for quadratic forms and the log-likelihood ratio for a Gaussian locally stationary process with a mean function are proved. Their asymptotics are described by the large deviation rate functions. Second, we consider the situations where processes are misspecified to be stationary. In these misspecified cases, we formally construct the log-likelihood ratio discriminant statistics and derive large deviation theorems for them. Since they are complicated, they are evaluated and illustrated by numerical examples. We find that misspecifying the process as stationary seriously affects discrimination.
Statistical Significance for Hierarchical Clustering

Science.gov (United States)

Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

2017-01-01

Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
Statistical significance of trends in monthly heavy precipitation over the US

KAUST Repository

Mahajan, Salil

2011-05-11

Trends in monthly heavy precipitation, defined by a return period of one year, are assessed for statistical significance in observations and Global Climate Model (GCM) simulations over the contiguous United States using Monte Carlo non-parametric and parametric bootstrapping techniques. The results from the two Monte Carlo approaches are found to be similar to each other, and also to the traditional non-parametric Kendall's τ test, implying the robustness of the approach. Two different observational data-sets are employed to test for trends in monthly heavy precipitation and are found to exhibit consistent results. Both data-sets demonstrate upward trends, one of which is found to be statistically significant at the 95% confidence level. Upward trends similar to observations are observed in some climate model simulations of the twentieth century, but their statistical significance is marginal. For projections of the twenty-first century, a statistically significant upwards trend is observed in most of the climate models analyzed. The change in the simulated precipitation variance appears to be more important in the twenty-first century projections than changes in the mean precipitation. Stochastic fluctuations of the climate-system are found to dominate monthly heavy precipitation, as some GCM simulations show a downwards trend even in the twenty-first century projections when the greenhouse gas forcings are strong. © 2011 Springer-Verlag.
Statistics of LES simulations of large wind farms

DEFF Research Database (Denmark)

Andersen, Søren Juhl; Sørensen, Jens Nørkær; Mikkelsen, Robert Flemming

2016-01-01

. The statistical moments appear to collapse and hence the turbulence inside large wind farms can potentially be scaled accordingly. The thrust coefficient is estimated by two different reference velocities and the generic CT expression by Frandsen. A reference velocity derived from the power production is shown...... to give very good agreement and furthermore enables the very good estimation of the thrust force using only the steady CT-curve, even for very short time samples. Finally, the effective turbulence inside large wind farms and the equivalent loads are examined....
Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance.

Science.gov (United States)

Kramer, Karen L; Veile, Amanda; Otárola-Castillo, Erik

2016-01-01

Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children's growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children's monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children's growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children's growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children's growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance.
Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance.

Directory of Open Access Journals (Sweden)

Karen L Kramer

Full Text Available Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children's growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children's monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children's growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children's growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children's growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance.
Increasing the statistical significance of entanglement detection in experiments

Energy Technology Data Exchange (ETDEWEB)

Jungnitsch, Bastian; Niekamp, Soenke; Kleinmann, Matthias; Guehne, Otfried [Institut fuer Quantenoptik und Quanteninformation, Innsbruck (Austria); Lu, He; Gao, Wei-Bo; Chen, Zeng-Bing [Hefei National Laboratory for Physical Sciences at Microscale and Department of Modern Physics, University of Science and Technology of China, Hefei (China); Chen, Yu-Ao; Pan, Jian-Wei [Hefei National Laboratory for Physical Sciences at Microscale and Department of Modern Physics, University of Science and Technology of China, Hefei (China); Physikalisches Institut, Universitaet Heidelberg (Germany)

2010-07-01

Entanglement is often verified by a violation of an inequality like a Bell inequality or an entanglement witness. Considerable effort has been devoted to the optimization of such inequalities in order to obtain a high violation. We demonstrate theoretically and experimentally that such an optimization does not necessarily lead to a better entanglement test, if the statistical error is taken into account. Theoretically, we show for different error models that reducing the violation of an inequality can improve the significance. We show this to be the case for an error model in which the variance of an observable is interpreted as its error and for the standard error model in photonic experiments. Specifically, we demonstrate that the Mermin inequality yields a Bell test which is statistically more significant than the Ardehali inequality in the case of a photonic four-qubit state that is close to a GHZ state. Experimentally, we observe this phenomenon in a four-photon experiment, testing the above inequalities for different levels of noise.
Reporting effect sizes as a supplement to statistical significance ...

African Journals Online (AJOL)

The purpose of the article is to review the statistical significance reporting practices in reading instruction studies and to provide guidelines for when to calculate and report effect sizes in educational research. A review of six readily accessible (online) and accredited journals publishing research on reading instruction ...
Your Chi-Square Test Is Statistically Significant: Now What?

Science.gov (United States)

Sharpe, Donald

2015-01-01

Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data…

Polish Phoneme Statistics Obtained On Large Set Of Written Texts

Directory of Open Access Journals (Sweden)

Bartosz Ziółko

2009-01-01

Full Text Available The phonetic statistics were collected from several Polish corpora. The paper is a summary of the data, which are phoneme n-grams, and some phenomena in the statistics. Triphone statistics apply context-dependent speech units which have an important role in speech recognition systems and were never calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.
Confidence intervals permit, but don't guarantee, better inference than statistical significance testing

Directory of Open Access Journals (Sweden)

Melissa Coulson

2010-07-01

Full Text Available A statistically significant result and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST), or confidence intervals (CIs). Authors of articles published in psychology, behavioural neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST.
Testing statistical significance scores of sequence comparison methods with structure similarity

Directory of Open Access Journals (Sweden)

Leunissen Jack AM

2006-10-01

Full Text Available Abstract Background In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. Results All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion The compute intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
4P: fast computing of population genetics statistics from large DNA polymorphism panels.

Science.gov (United States)

Benazzo, Andrea; Panziera, Alex; Bertorelle, Giorgio

2015-01-01

Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations.
Statistics of geodesics in large quadrangulations

International Nuclear Information System (INIS)

Bouttier, J; Guitter, E

2008-01-01

We study the statistical properties of geodesics, i.e. paths of minimal length, in large random planar quadrangulations. We extend Schaeffer's well-labeled tree bijection to the case of quadrangulations with a marked geodesic, leading to the notion of 'spine trees', amenable to a direct enumeration. We obtain the generating functions for quadrangulations with a marked geodesic of fixed length, as well as with a set of 'confluent geodesics', i.e. a collection of non-intersecting minimal paths connecting two given points. In the limit of quadrangulations with a large area $n$, we find in particular an average number $3 \times 2^i$ of geodesics between two fixed points at distance $i \gg 1$ from each other. We show that, for generic endpoints, two confluent geodesics remain close to each other and have an extensive number of contacts. This property fails for a few 'exceptional' endpoints which can be linked by truly distinct geodesics. Results are presented both in the case of finite length $i$ and in the scaling limit $i \sim n^{1/4}$. In particular, we give the scaling distribution of the exceptional points.
Statistical significance versus clinical relevance.

Science.gov (United States)

van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G

2017-04-01

In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥ 0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
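The point that a CI conveys magnitude and imprecision, while a P-value alone does not, can be made concrete with a small sketch. The two trials below are entirely hypothetical and are not the studies discussed in the article; the function name `z_summary` and the normal approximation with a common known standard deviation are assumptions chosen to keep the example short.

```python
import math


def z_summary(mean_diff, sd, n_per_group):
    """Two-sided p-value and 95% CI for a difference in means.

    Assumes two equal-sized groups, a common known sd, and a normal
    approximation (illustrative simplifications).
    """
    se = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2))
    half_width = 1.959964 * se  # 97.5th percentile of the standard normal
    return p, (mean_diff - half_width, mean_diff + half_width)


# Hypothetical large trial: tiny effect, estimated very precisely.
p_large, ci_large = z_summary(mean_diff=1.0, sd=10.0, n_per_group=5000)
# Hypothetical small trial: sizeable effect, estimated imprecisely.
p_small, ci_small = z_summary(mean_diff=8.0, sd=10.0, n_per_group=10)

for label, p, ci in [("large", p_large, ci_large), ("small", p_small, ci_small)]:
    print(f"{label} trial: p = {p:.4g}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Under these assumptions the large trial is statistically significant although its whole interval covers only small effects, while the small trial is not significant although its interval includes effects that could matter clinically.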
Statistical Significance and Effect Size: Two Sides of a Coin.

Science.gov (United States)

Fan, Xitao

This paper suggests that statistical significance testing and effect size are two sides of the same coin; they complement each other, but do not substitute for one another. Good research practice requires that both should be taken into consideration to make sound quantitative decisions. A Monte Carlo simulation experiment was conducted, and a…
Statistics and Dynamics in the Large-scale Structure of the Universe

International Nuclear Information System (INIS)

Matsubara, Takahiko

2006-01-01

In cosmology, observations and theories are related to each other by statistics in most cases. Especially, statistical methods play central roles in analyzing fluctuations in the universe, which are seeds of the present structure of the universe. The confrontation of the statistics and dynamics is one of the key methods to unveil the structure and evolution of the universe. I will review some of the major statistical methods in cosmology, in connection with linear and nonlinear dynamics of the large-scale structure of the universe. The present status of analyses of the observational data such as the Sloan Digital Sky Survey, and the future prospects to constrain the nature of exotic components of the universe such as the dark energy will be presented
Publication of statistically significant research findings in prosthodontics & implant dentistry in the context of other dental specialties.

Science.gov (United States)

Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos

2015-10-01

To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significant findings. The results of this study were compared with similar studies in dentistry with random-effects meta-analyses. Of the 2323 included studies, 71% reported statistically significant results, with the significant results ranging from 47% to 86%. Multivariable modeling identified that geographical area and involvement of a statistician were predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results were increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to various study designs (difference: 30%, 95% CI: 11-49%). Likewise the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this result did not reach statistical significance (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.
Significant Statistics: Viewed with a Contextual Lens

Science.gov (United States)

Tait-McCutcheon, Sandi

2010-01-01

This paper examines the pedagogical and organisational changes three lead teachers made to their statistics teaching and learning programs. The lead teachers posed the research question: What would the effect of contextually integrating statistical investigations and literacies into other curriculum areas be on student achievement? By finding the…
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

Science.gov (United States)

Ozturk, Elif

2012-01-01

The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
Statistical vs. Economic Significance in Economics and Econometrics: Further comments on McCloskey & Ziliak

DEFF Research Database (Denmark)

Engsted, Tom

I comment on the controversy between McCloskey & Ziliak and Hoover & Siegler on statistical versus economic significance, in the March 2008 issue of the Journal of Economic Methodology. I argue that while McCloskey & Ziliak are right in emphasizing 'real error', i.e. non-sampling error that cannot...... be eliminated through specification testing, they fail to acknowledge those areas in economics, e.g. rational expectations macroeconomics and asset pricing, where researchers clearly distinguish between statistical and economic significance and where statistical testing plays a relatively minor role in model...
Current fluctuations and statistics during a large deviation event in an exactly solvable transport model

International Nuclear Information System (INIS)

Hurtado, Pablo I; Garrido, Pedro L

2009-01-01

We study the distribution of the time-integrated current in an exactly solvable toy model of heat conduction, both analytically and numerically. The simplicity of the model allows us to derive the full current large deviation function and the system statistics during a large deviation event. In this way we unveil a relation between system statistics at the end of a large deviation event and for intermediate times. The mid-time statistics is independent of the sign of the current, a reflection of the time-reversal symmetry of microscopic dynamics, while the end-time statistics does depend on the current sign, and also on its microscopic definition. We compare our exact results with simulations based on the direct evaluation of large deviation functions, analyzing the finite-size corrections of this simulation method and deriving detailed bounds for its applicability. We also show how the Gallavotti–Cohen fluctuation theorem can be used to determine the range of validity of simulation results
Intelligent system for statistically significant expertise knowledge on the basis of the model of self-organizing nonequilibrium dissipative system

Directory of Open Access Journals (Sweden)

E. A. Tatokchin

2017-01-01

Full Text Available The development of modern educational technologies, driven by the broad introduction of computer testing and of distance education, makes it necessary to revise the methods used to examine pupils. The work shows the need for a transition to mathematical criteria for examining knowledge that are free of subjectivity. The article reviews the problems that arise in carrying out this task and proposes approaches to solving them. The greatest attention is paid to the problem of objectively transforming the expert's rating estimates into scale estimates for the student. The discussion of this question concludes that the solution lies in the creation of specialized intelligent systems. The intelligent system is built on a mathematical model of a self-organizing nonequilibrium dissipative system, which here is a group of students. The article assumes that dissipation is provided by the constant influx of new test items from the expert, and nonequilibrium by the individual psychological characteristics of the students in the group. As a result, the system must self-organize into stable patterns. Relying on large amounts of data, such a pattern makes it possible to obtain a statistically significant assessment of a student. To justify the proposed approach, the work presents a statistical analysis of the results of testing a large sample of students (> 90). The conclusions from this analysis made it possible to develop an intelligent system for statistically significant examination of student performance. It is based on a data clustering algorithm (k-means) for three key parameters. It is shown that this approach makes it possible to build a dynamic and objective expert evaluation.
The Importance of Integrating Clinical Relevance and Statistical Significance in the Assessment of Quality of Care--Illustrated Using the Swedish Stroke Register.

Directory of Open Access Journals (Sweden)

Anita Lindmark

Full Text Available When profiling hospital performance, quality indicators are commonly evaluated through hospital-specific adjusted means with confidence intervals. When identifying deviations from a norm, large hospitals can have statistically significant results even for clinically irrelevant deviations while important deviations in small hospitals can remain undiscovered. We have used data from the Swedish Stroke Register (Riksstroke) to illustrate the properties of a benchmarking method that integrates considerations of both clinical relevance and level of statistical significance. The performance measure used was case-mix adjusted risk of death or dependency in activities of daily living within 3 months after stroke. A hospital was labeled as having outlying performance if its case-mix adjusted risk exceeded a benchmark value with a specified statistical confidence level. The benchmark was expressed relative to the population risk and should reflect the clinically relevant deviation that is to be detected. A simulation study based on Riksstroke patient data from 2008-2009 was performed to investigate the effect of the choice of the statistical confidence level and benchmark value on the diagnostic properties of the method. Simulations were based on 18,309 patients in 76 hospitals. The widely used setting, comparing 95% confidence intervals to the national average, resulted in low sensitivity (0.252) and high specificity (0.991). There were large variations in sensitivity and specificity for different requirements of statistical confidence. Lowering statistical confidence improved sensitivity with a relatively smaller loss of specificity. Variations due to different benchmark values were smaller, especially for sensitivity. This allows the choice of a clinically relevant benchmark to be driven by clinical factors without major concerns about sufficiently reliable evidence. The study emphasizes the importance of combining clinical relevance and level of statistical
The Importance of Integrating Clinical Relevance and Statistical Significance in the Assessment of Quality of Care--Illustrated Using the Swedish Stroke Register.

Science.gov (United States)

Lindmark, Anita; van Rompaye, Bart; Goetghebeur, Els; Glader, Eva-Lotta; Eriksson, Marie

2016-01-01

When profiling hospital performance, quality indicators are commonly evaluated through hospital-specific adjusted means with confidence intervals. When identifying deviations from a norm, large hospitals can have statistically significant results even for clinically irrelevant deviations while important deviations in small hospitals can remain undiscovered. We have used data from the Swedish Stroke Register (Riksstroke) to illustrate the properties of a benchmarking method that integrates considerations of both clinical relevance and level of statistical significance. The performance measure used was case-mix adjusted risk of death or dependency in activities of daily living within 3 months after stroke. A hospital was labeled as having outlying performance if its case-mix adjusted risk exceeded a benchmark value with a specified statistical confidence level. The benchmark was expressed relative to the population risk and should reflect the clinically relevant deviation that is to be detected. A simulation study based on Riksstroke patient data from 2008-2009 was performed to investigate the effect of the choice of the statistical confidence level and benchmark value on the diagnostic properties of the method. Simulations were based on 18,309 patients in 76 hospitals. The widely used setting, comparing 95% confidence intervals to the national average, resulted in low sensitivity (0.252) and high specificity (0.991). There were large variations in sensitivity and specificity for different requirements of statistical confidence. Lowering statistical confidence improved sensitivity with a relatively smaller loss of specificity. Variations due to different benchmark values were smaller, especially for sensitivity. This allows the choice of a clinically relevant benchmark to be driven by clinical factors without major concerns about sufficiently reliable evidence. The study emphasizes the importance of combining clinical relevance and level of statistical confidence when
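The labeling rule described in this abstract (risk exceeds a benchmark with a specified confidence) can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the crude event proportion stands in for the case-mix adjusted risk, a normal approximation gives a one-sided lower confidence bound, and the function name `is_outlier` and the relative benchmark of 1.2 are hypothetical, not values from the study.

```python
import math
from statistics import NormalDist

def is_outlier(events, n, population_risk, relative_benchmark=1.2, confidence=0.95):
    """Flag a hospital whose risk exceeds the benchmark with the given confidence.

    Simplification: uses the crude proportion events/n instead of a
    case-mix adjusted risk, with a normal-approximation lower bound.
    """
    risk = events / n
    se = math.sqrt(risk * (1.0 - risk) / n)
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value
    lower_bound = risk - z * se
    # The benchmark is expressed relative to the population risk.
    return lower_bound > relative_benchmark * population_risk

# Same observed risk (30%) in a large and a small hospital, population risk 20%.
print(is_outlier(300, 1000, 0.20))                 # large hospital
print(is_outlier(15, 50, 0.20))                    # small hospital, 95% confidence
print(is_outlier(15, 50, 0.20, confidence=0.70))   # small hospital, lower confidence
```

With identical observed risk, the small hospital is flagged only when the required confidence is lowered, which mirrors the abstract's point that lowering statistical confidence improves sensitivity.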
Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

Science.gov (United States)

Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

2018-06-05

Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Distinguishing between statistical significance and practical/clinical meaningfulness using statistical inference.

Science.gov (United States)

Wilkinson, Michael

2014-03-01

Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance

Directory of Open Access Journals (Sweden)

Zhang Zhang

2012-03-01

Full Text Available Abstract Background Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
Statistics Refresher for Molecular Imaging Technologists, Part 2: Accuracy of Interpretation, Significance, and Variance.

Science.gov (United States)

Farrell, Mary Beth

2018-06-01

This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic is, the higher is the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability (P) value. Calculation of P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being
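Two of the quantities mentioned in this abstract are easy to compute directly. The following is a minimal sketch, not taken from the article: it computes the Cohen κ-statistic from a two-reader agreement table and the share of a normal distribution lying within k SDs of the mean. The function names and the example table are hypothetical.

```python
import math

def cohen_kappa(table):
    """Cohen's kappa; table[i][j] counts cases rated i by reader A and j by reader B."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the readers' marginal rating frequencies.
    p_chance = sum(row_share[i] * col_share[i] for i in range(k))
    return (p_observed - p_chance) / (1.0 - p_chance)

def normal_coverage(k_sd):
    """Fraction of a normal distribution within k_sd standard deviations of the mean."""
    return math.erf(k_sd / math.sqrt(2.0))

# Hypothetical readings: 85 of 100 scans agree, but half of that is expected by chance.
print(cohen_kappa([[40, 10], [5, 45]]))
print([round(normal_coverage(k), 4) for k in (1, 2, 3)])
```

In the example table the raw agreement is 85%, yet κ is lower because agreement expected by chance is subtracted out, which is the sense in which κ "accounts for chance".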

Systematic reviews of anesthesiologic interventions reported as statistically significant

DEFF Research Database (Denmark)

Imberger, Georgina; Gluud, Christian; Boylan, John

2015-01-01

statistically significant meta-analyses of anesthesiologic interventions, we used TSA to estimate power and imprecision in the context of sparse data and repeated updates. METHODS: We conducted a search to identify all systematic reviews with meta-analyses that investigated an intervention that may......: From 11,870 titles, we found 682 systematic reviews that investigated anesthesiologic interventions. In the 50 sampled meta-analyses, the median number of trials included was 8 (interquartile range [IQR], 5-14), the median number of participants was 964 (IQR, 523-1736), and the median number...
A comparative analysis of the statistical properties of large mobile phone calling networks.

Science.gov (United States)

Li, Ming-Xia; Jiang, Zhi-Qiang; Xie, Wen-Jie; Miccichè, Salvatore; Tumminello, Michele; Zhou, Wei-Xing; Mantegna, Rosario N

2014-05-30

Mobile phone calling is one of the most widely used communication methods in modern society. The records of calls among mobile phone users provide us with a valuable proxy for the understanding of human communication patterns embedded in social networks. Mobile phone users call each other forming a directed calling network. If only reciprocal calls are considered, we obtain an undirected mutual calling network. The preferential communication behavior between two connected users can be statistically tested and it results in two Bonferroni networks with statistically validated edges. We perform a comparative analysis of the statistical properties of these four networks, which are constructed from the calling records of more than nine million individuals in Shanghai over a period of 110 days. We find that these networks share many common structural properties and also exhibit idiosyncratic features when compared with previously studied large mobile calling networks. The empirical findings provide us with an intriguing picture of a representative large social network that might shed new light on the modelling of large social networks.
P-Value, a true test of statistical significance? a cautionary note ...

African Journals Online (AJOL)

While it's not the intention of the founders of significance testing and hypothesis testing to have the two ideas intertwined as if they are complementary, the inconvenient marriage of the two practices into one coherent, convenient, incontrovertible and misinterpreted practice has dotted our standard statistics textbooks and ...
Codon Deviation Coefficient: A novel measure for estimating codon usage bias and its statistical significance

KAUST Repository

Zhang, Zhang

2012-03-22

Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions. 2012 Zhang et al; licensee BioMed Central Ltd.
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.

Science.gov (United States)

Kieffer, Kevin M.; Thompson, Bruce

As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
Galaxies distribution in the universe: large-scale statistics and structures

International Nuclear Information System (INIS)

Maurogordato, Sophie

1988-01-01

This research thesis addresses the distribution of galaxies in the Universe, and more particularly large-scale statistics and structures. Based on an assessment of the main statistical techniques in use, the author outlines the need to develop tools that complement correlation functions in order to characterise the distribution. She introduces a new indicator: the probability that a volume randomly placed in the distribution is void. This allows a characterisation of void properties at the working scales (up to 10 h^-1 Mpc) in the Harvard Smithsonian Center for Astrophysics Redshift Survey, or CfA catalog. A systematic analysis of the statistical properties of different sub-samples has then been performed with respect to size and location, luminosity class, and morphological type. This analysis is then extended to different scenarios of structure formation. A program of radial velocity measurements based on observations allows the determination of possible relationships between apparent structures. The author also presents results of the search for southern extensions of Perseus supernova [fr]
Measuring individual significant change on the Beck Depression Inventory-II through IRT-based statistics.

NARCIS (Netherlands)

Brouwer, D.; Meijer, R.R.; Zevalkink, D.J.

2013-01-01

Several researchers have emphasized that item response theory (IRT)-based methods should be preferred over classical approaches in measuring change for individual patients. In the present study we discuss and evaluate the use of IRT-based statistics to measure statistically significant individual
Quantum probability, choice in large worlds, and the statistical structure of reality.

Science.gov (United States)

Ross, Don; Ladyman, James

2013-06-01

Classical probability models of incentive response are inadequate in "large worlds," where the dimensions of relative risk and the dimensions of similarity in outcome comparisons typically differ. Quantum probability models for choice in large worlds may be motivated pragmatically - there is no third theory - or metaphysically: statistical processing in the brain adapts to the true scale-relative structure of the universe.
The large sample size fallacy.

Science.gov (United States)

Lantz, Björn

2013-06-01

Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.
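The fallacy described above can be made concrete with a small calculation. This is a sketch under stated assumptions, not material from the article: it uses a two-sample z-test with equal group sizes and known unit variance, holds the standardized effect size fixed at a trivial value, and varies only the sample size. The function name `two_sided_p` and the effect size of 0.02 are chosen purely for illustration.

```python
import math

def two_sided_p(effect_size, n_per_group):
    """Two-sided p-value of a two-sample z-test for a standardized mean difference."""
    # With unit variance and equal groups, the test statistic is d * sqrt(n / 2).
    z = effect_size * math.sqrt(n_per_group / 2.0)
    return math.erfc(abs(z) / math.sqrt(2.0))

# The same trivial effect (0.02 SD) at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}: p = {two_sided_p(0.02, n):.3g}")
```

The effect size never changes, yet the p-value moves from clearly nonsignificant to extremely significant as the groups grow, which is why the abstract recommends reporting and discussing effect sizes alongside p-values.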
Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

Science.gov (United States)

Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

2014-01-01

This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
On the coupling of statistic sum of canonical and large canonical ensemble of interacting particles

International Nuclear Information System (INIS)

Vall, A.N.

2000-01-01

The possibility of refining a known result, based on the analytic properties of the large statistical sum as a function of the absolute activity, concerning the contribution of the boundary integral to the statistical sum is considered. A strict asymptotic relation between the statistical sums of the canonical and large canonical ensembles of interacting particles is derived [ru]
Testing the statistical isotropy of large scale structure with multipole vectors

International Nuclear Information System (INIS)

Zunckel, Caroline; Huterer, Dragan; Starkman, Glenn D.

2011-01-01

A fundamental assumption in cosmology is that of statistical isotropy - that the Universe, on average, looks the same in every direction in the sky. Statistical isotropy has recently been tested stringently using cosmic microwave background data, leading to intriguing results on large angular scales. Here we apply some of the same techniques used in the cosmic microwave background to the distribution of galaxies on the sky. Using the multipole vector approach, where each multipole in the harmonic decomposition of the galaxy density field is described by unit vectors and an amplitude, we lay out the basic formalism of how to reconstruct the multipole vectors and their statistics out of galaxy survey catalogs. We apply the algorithm to synthetic galaxy maps, and study the sensitivity of the multipole vector reconstruction accuracy to the density, depth, sky coverage, and pixelization of galaxy catalog maps.
Thresholds for statistical and clinical significance in systematic reviews with meta-analytic methods

DEFF Research Database (Denmark)

Jakobsen, Janus Christian; Wetterslev, Jorn; Winkel, Per

2014-01-01

BACKGROUND: Thresholds for statistical significance when assessing meta-analysis results are being insufficiently demonstrated by traditional 95% confidence intervals and P-values. Assessment of intervention effects in systematic reviews with meta-analysis deserves greater rigour. METHODS......: Methodologies for assessing statistical and clinical significance of intervention effects in systematic reviews were considered. Balancing simplicity and comprehensiveness, an operational procedure was developed, based mainly on The Cochrane Collaboration methodology and the Grading of Recommendations...... Assessment, Development, and Evaluation (GRADE) guidelines. RESULTS: We propose an eight-step procedure for better validation of meta-analytic results in systematic reviews (1) Obtain the 95% confidence intervals and the P-values from both fixed-effect and random-effects meta-analyses and report the most...
Elementary methods for statistical systems, mean field, large-n, and duality

International Nuclear Information System (INIS)

Itzykson, C.

1983-01-01

Renormalizable field theories are singled out by such precise restraints that regularization schemes must be used to break these invariances. Statistical methods can be adapted to these problems where asymptotically free models fail. This lecture surveys approximation schemes developed in the context of statistical mechanics. The confluence point of statistical mechanics and field theory is the use of discretized path integrals, where continuous space time has been replaced by a regular lattice. Dynamic variables, a Boltzmann weight factor, and boundary conditions are the ingredients. Mean field approximations (field equations, random field transform, and gauge invariant systems) are surveyed. Under large-N limits vector models are found to simplify tremendously. The reasons why matrix models drawn from SU(n) gauge theories do not simplify are discussed. In the epilogue, random curves versus random surfaces are offered as an example where global and local symmetries are not alike.
Extreme value statistics and thermodynamics of earthquakes. Large earthquakes

Energy Technology Data Exchange (ETDEWEB)

Lavenda, B. [Camerino Univ., Camerino, MC (Italy); Cipollone, E. [ENEA, Centro Ricerche Casaccia, S. Maria di Galeria, RM (Italy). National Centre for Research on Thermodynamics

2000-06-01

A compound Poisson process is used to derive a new shape parameter which can be used to discriminate between large earthquakes and aftershock sequences. Sample exceedance distributions of large earthquakes are fitted to the Pareto tail and the actual distribution of the maximum to the Frechet distribution, while the sample distribution of aftershocks is fitted to a Beta distribution and the distribution of the minimum to the Weibull distribution for the smallest value. The transition between initial sample distributions and asymptotic extreme value distributions shows that self-similar power laws are transformed into non-scaling exponential distributions so that neither self-similarity nor the Gutenberg-Richter law can be considered universal. The energy-magnitude transformation converts the Frechet distribution into the Gumbel distribution, originally proposed by Epstein and Lomnitz, and not the Gompertz distribution as in the Lomnitz-Adler and Lomnitz generalization of the Gutenberg-Richter law. Numerical comparison is made with the Lomnitz-Adler and Lomnitz analysis using the same catalogue of Chinese earthquakes. An analogy is drawn between large earthquakes and high energy particle physics. A generalized equation of state is used to transform the Gamma density into the order-statistic Frechet distribution. Earthquake temperature and volume are determined as functions of the energy. Large insurance claims based on the Pareto distribution, which does not have a right endpoint, show why there cannot be a maximum earthquake energy.
Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains.

Science.gov (United States)

Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu

2015-09-21

Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from the next generation sequencing technology based studies. By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and find interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
Statistical characterization of a large geochemical database and effect of sample size

Science.gov (United States)

Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

2005-01-01

The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompassed 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reason relates to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively
Extreme value statistics and thermodynamics of earthquakes: large earthquakes

Directory of Open Access Journals (Sweden)

B. H. Lavenda

2000-06-01

Full Text Available A compound Poisson process is used to derive a new shape parameter which can be used to discriminate between large earthquakes and aftershock sequences. Sample exceedance distributions of large earthquakes are fitted to the Pareto tail and the actual distribution of the maximum to the Fréchet distribution, while the sample distribution of aftershocks is fitted to a Beta distribution and the distribution of the minimum to the Weibull distribution for the smallest value. The transition between initial sample distributions and asymptotic extreme value distributions shows that self-similar power laws are transformed into nonscaling exponential distributions so that neither self-similarity nor the Gutenberg-Richter law can be considered universal. The energy-magnitude transformation converts the Fréchet distribution into the Gumbel distribution, originally proposed by Epstein and Lomnitz, and not the Gompertz distribution as in the Lomnitz-Adler and Lomnitz generalization of the Gutenberg-Richter law. Numerical comparison is made with the Lomnitz-Adler and Lomnitz analysis using the same Catalogue of Chinese Earthquakes. An analogy is drawn between large earthquakes and high energy particle physics. A generalized equation of state is used to transform the Gamma density into the order-statistic Fréchet distribution. Earthquake temperature and volume are determined as functions of the energy. Large insurance claims based on the Pareto distribution, which does not have a right endpoint, show why there cannot be a maximum earthquake energy.
Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

KAUST Repository

Sun, Ying; Stein, Michael L.

2014-01-01

For Gaussian process models, likelihood based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.
Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

KAUST Repository

Sun, Ying

2014-11-07

For Gaussian process models, likelihood based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.

Recent Literature on Whether Statistical Significance Tests Should or Should Not Be Banned.

Science.gov (United States)

Deegear, James

This paper summarizes the literature regarding statistical significance testing, with an emphasis on recent literature in various disciplines and literature exploring why researchers have demonstrably failed to be influenced by the American Psychological Association publication manual's encouragement to report effect sizes. Also considered are…
Characterization and potential functional significance of human-chimpanzee large INDEL variation

Directory of Open Access Journals (Sweden)

Polavarapu Nalini

2011-10-01

Full Text Available Abstract Background Although humans and chimpanzees have accumulated significant differences in a number of phenotypic traits since diverging from a common ancestor about six million years ago, their genomes are more than 98.5% identical at protein-coding loci. This modest degree of nucleotide divergence is not sufficient to explain the extensive phenotypic differences between the two species. It has been hypothesized that the genetic basis of the phenotypic differences lies at the level of gene regulation and is associated with the extensive insertion and deletion (INDEL) variation between the two species. To test the hypothesis that large INDELs (80 to 12,000 bp) may have contributed significantly to differences in gene regulation between the two species, we categorized human-chimpanzee INDEL variation mapping in or around genes and determined whether this variation is significantly correlated with previously determined differences in gene expression. Results Extensive, large INDEL variation exists between the human and chimpanzee genomes. This variation is primarily attributable to retrotransposon insertions within the human lineage. There is a significant correlation between differences in gene expression and large human-chimpanzee INDEL variation mapping in genes or in proximity to them. Conclusions The results presented herein are consistent with the hypothesis that large INDELs, particularly those associated with retrotransposons, have played a significant role in human-chimpanzee regulatory evolution.
Gene coexpression measures in large heterogeneous samples using count statistics.

Science.gov (United States)

Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

2014-11-18

With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
Statistical analyses of digital collections: Using a large corpus of systematic reviews to study non-citations

DEFF Research Database (Denmark)

Frandsen, Tove Faber; Nicolaisen, Jeppe

2017-01-01

Using statistical methods to analyse digital material for patterns makes it possible to detect patterns in big data that we would otherwise not be able to detect. This paper seeks to exemplify this fact by statistically analysing a large corpus of references in systematic reviews. The aim...
Statistical Analysis of Large Simulated Yield Datasets for Studying Climate Effects

Science.gov (United States)

Makowski, David; Asseng, Senthold; Ewert, Frank; Bassu, Simona; Durand, Jean-Louis; Martre, Pierre; Adam, Myriam; Aggarwal, Pramod K.; Angulo, Carlos; Baron, Christian;

2015-01-01

Many studies have been carried out during the last decade to study the effect of climate change on crop yields and other key crop characteristics. In these studies, one or several crop models were used to simulate crop growth and development for different climate scenarios that correspond to different projections of atmospheric CO2 concentration, temperature, and rainfall changes (Semenov et al., 1996; Tubiello and Ewert, 2002; White et al., 2011). The Agricultural Model Intercomparison and Improvement Project (AgMIP; Rosenzweig et al., 2013) builds on these studies with the goal of using an ensemble of multiple crop models in order to assess effects of climate change scenarios for several crops in contrasting environments. These studies generate large datasets, including thousands of simulated crop yield data. They include series of yield values obtained by combining several crop models with different climate scenarios that are defined by several climatic variables (temperature, CO2, rainfall, etc.). Such datasets potentially provide useful information on the possible effects of different climate change scenarios on crop yields. However, it is sometimes difficult to analyze these datasets and to summarize them in a useful way due to their structural complexity; simulated yield data can differ among contrasting climate scenarios, sites, and crop models. Another issue is that it is not straightforward to extrapolate the results obtained for the scenarios to alternative climate change scenarios not initially included in the simulation protocols. Additional dynamic crop model simulations for new climate change scenarios are an option but this approach is costly, especially when a large number of crop models are used to generate the simulated data, as in AgMIP. Statistical models have been used to analyze responses of measured yield data to climate variables in past studies (Lobell et al., 2011), but the use of a statistical model to analyze yields simulated by complex

Examining reproducibility in psychology : A hybrid method for combining a statistically significant original study and a replication

NARCIS (Netherlands)

Van Aert, R.C.M.; Van Assen, M.A.L.M.

2018-01-01

The unrealistically high rate of positive results within psychology has increased the attention to replication research. However, researchers who conduct a replication and want to statistically combine the results of their replication with a statistically significant original study encounter
Ship detection using STFT sea background statistical modeling for large-scale oceansat remote sensing image

Science.gov (United States)

Wang, Lixia; Pei, Jihong; Xie, Weixin; Liu, Jinyuan

2018-03-01

Large-scale oceansat remote sensing images cover a large area of sea surface, whose fluctuation can be considered a non-stationary process. Short-Time Fourier Transform (STFT) is a suitable analysis tool for time-varying non-stationary signals. In this paper, a novel ship detection method using 2-D STFT sea background statistical modeling for large-scale oceansat remote sensing images is proposed. First, the paper divides the large-scale oceansat remote sensing image into small sub-blocks, and 2-D STFT is applied to each sub-block individually. Second, the 2-D STFT spectrum of the sub-blocks is studied and an obvious difference in characteristics between sea background and non-sea background is found. Finally, the statistical model for all valid frequency points in the STFT spectrum of sea background is given, and the ship detection method based on the 2-D STFT spectrum modeling is proposed. The experimental results show that the proposed algorithm can detect ship targets with a high recall rate and a low missing rate.
Significance evaluation in factor graphs

DEFF Research Database (Denmark)

Madsen, Tobias; Hobolth, Asger; Jensen, Jens Ledet

2017-01-01

in genomics and the multiple-testing issues accompanying them, accurate significance evaluation is of great importance. We here address the problem of evaluating statistical significance of observations from factor graph models. Results Two novel numerical approximations for evaluation of statistical...... significance are presented. First a method using importance sampling. Second a saddlepoint approximation based method. We develop algorithms to efficiently compute the approximations and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both from....... Conclusions The applicability of saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods we can substantially improve computational cost without compromising accuracy. This contribution allows analyses of large datasets...
Statistical Modeling of Large-Scale Signal Path Loss in Underwater Acoustic Networks

Directory of Open Access Journals (Sweden)

Manuel Perez Malumbres

2013-02-01

Full Text Available In an underwater acoustic channel, the propagation conditions are known to vary in time, causing the deviation of the received signal strength from the nominal value predicted by a deterministic propagation model. To facilitate a large-scale system design in such conditions (e.g., power allocation), we have developed a statistical propagation model in which the transmission loss is treated as a random variable. By applying repetitive computation to the acoustic field, using ray tracing for a set of varying environmental conditions (surface height, wave activity, small node displacements around nominal locations, etc.), an ensemble of transmission losses is compiled and later used to infer the statistical model parameters. A reasonable agreement is found with a log-normal distribution, whose mean obeys a log-distance increase, and whose variance appears to be constant for a certain range of inter-node distances in a given deployment location. The statistical model is deemed useful for higher-level system planning, where simulation is needed to assess the performance of candidate network protocols under various resource allocation policies, i.e., to determine the transmit power and bandwidth allocation necessary to achieve a desired level of performance (connectivity, throughput, reliability, etc.).
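The model described in this abstract (a transmission loss that is log-normal, with a mean growing in log-distance and a constant variance) can be sketched as follows. This is only an illustration: the function names and every numeric value (`ref_loss_db`, `ref_distance_m`, `slope`, `sigma_db`) are hypothetical placeholders, not parameters reported by the authors.

```python
import math
import random

def mean_loss_db(distance_m, ref_loss_db=60.0, ref_distance_m=100.0, slope=1.5):
    """Nominal transmission loss in dB, growing linearly in log-distance.
    All default values are hypothetical, chosen only for illustration."""
    return ref_loss_db + 10.0 * slope * math.log10(distance_m / ref_distance_m)

def sample_loss_db(distance_m, sigma_db=3.0, rng=random):
    """One random draw: Gaussian in dB, i.e. log-normal in linear units,
    with a spread that does not depend on distance."""
    return rng.gauss(mean_loss_db(distance_m), sigma_db)

# Draw an ensemble of losses for a 1 km link and check its average.
rng = random.Random(0)
draws = [sample_loss_db(1000.0, rng=rng) for _ in range(20000)]
avg = sum(draws) / len(draws)
print(f"nominal {mean_loss_db(1000.0):.1f} dB, ensemble mean {avg:.1f} dB")
```

A network simulator could call `sample_loss_db` once per link and per time step instead of re-running ray tracing, which is the kind of higher-level planning use the abstract has in mind.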
Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

Energy Technology Data Exchange (ETDEWEB)

Kleijnen, J.P.C.; Helton, J.C.

1999-04-01

The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
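The difference between procedures (1) and (2) in this abstract, linear versus monotonic relationships, can be illustrated with a small pure-Python sketch. The helper names are hypothetical and the data are synthetic; this is not the authors' code, and it covers only the two correlation-based tests.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient: strength of a linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based), so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Rank correlation coefficient: strength of a monotonic relationship."""
    return pearson(ranks(x), ranks(y))

# Synthetic scatterplot: y increases monotonically with x but far from linearly.
x = [0.1 * i for i in range(1, 51)]
y = [math.exp(v) for v in x]
print(f"Pearson {pearson(x, y):.3f}, Spearman {spearman(x, y):.3f}")
```

On data like these the rank correlation reaches 1 while the linear coefficient stays visibly lower, which is why a test sequence that escalates from linear to monotonic patterns can flag inputs that the first test underrates.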
Statistical Analysis of Big Data on Pharmacogenomics

Science.gov (United States)

Fan, Jianqing; Liu, Han

2013-01-01

This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods: estimating large covariance matrices for understanding correlation structure, estimating inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
Statistical distribution of the local purity in a large quantum system

International Nuclear Information System (INIS)

De Pasquale, A; Pascazio, S; Facchi, P; Giovannetti, V; Parisi, G; Scardicchio, A

2012-01-01

The local purity of large many-body quantum systems can be studied by following a statistical mechanical approach based on a random matrix model. Restricting the analysis to the case of global pure states, this method proved to be successful, and a full characterization of the statistical properties of the local purity was obtained by computing the partition function of the problem. Here we generalize these techniques to the case of global mixed states. In this context, by uniformly sampling the phase space of states with assigned global mixedness, we determine the exact expression of the first two moments of the local purity and a general expression for the moments of higher order. This generalizes previous results obtained for globally pure configurations. Furthermore, through the introduction of a partition function for a suitable canonical ensemble, we compute the approximate expression of the first moment of the marginal purity in the high-temperature regime. In the process, we establish a formal connection with the theory of quantum twirling maps that provides an alternative, possibly fruitful, way of performing the calculation. (paper)
Statistical significance estimation of a signal within the GooFit framework on GPUs

Directory of Open Access Journals (Sweden)

Cristella Leonardo

2017-01-01

Full Text Available In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on NVIDIA GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by CUDA Multi Process Service and a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks Theorem may or may not apply because its regularity conditions are not satisfied.
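The toy Monte Carlo idea in this abstract can be sketched generically: generate many background-only pseudo-experiments, count how often their test statistic is at least as large as the observed one, and convert that fraction to a Gaussian significance. The sketch below does not use ROOT, RooFit or GooFit; the function names and the stand-in toy distribution are assumptions made purely for illustration.

```python
import random
from statistics import NormalDist

def toy_p_value(observed, toy_statistics):
    """Fraction of background-only toys at least as extreme as the observation."""
    exceed = sum(1 for q in toy_statistics if q >= observed)
    return exceed / len(toy_statistics)

def z_score(p_value):
    """One-sided Gaussian significance for a p-value (requires 0 < p_value < 1)."""
    return NormalDist().inv_cdf(1.0 - p_value)

# Hypothetical stand-in for fitted toys: q = max(0, z)^2 with z ~ N(0, 1).
# A real analysis would obtain each q from a fit to a generated dataset.
rng = random.Random(1)
toys = [max(0.0, rng.gauss(0.0, 1.0)) ** 2 for _ in range(200000)]
p = toy_p_value(4.0, toys)   # observed q = 4 corresponds to z = 2 in this toy model
z = z_score(p)
print(f"p = {p:.4f}, significance = {z:.2f} sigma")
```

The smallest p-value this approach can resolve is about one over the number of toys, so high significances need very large toy samples. That is where the GPU speed-up discussed in the abstract becomes relevant, and it also allows a check that does not rely on the Wilks Theorem.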
Data management in large-scale collaborative toxicity studies: how to file experimental data for automated statistical analysis.

Science.gov (United States)

Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette

2013-06-01

High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.
Gentile statistics with a large maximum occupation number

International Nuclear Information System (INIS)

Dai Wusheng; Xie Mi

2004-01-01

In Gentile statistics the maximum occupation number can take on unrestricted integers: 1 1 the Bose-Einstein case is not recovered from Gentile statistics as n goes to N. Attention is also concentrated on the contribution of the ground state which was ignored in related literature. The thermodynamic behavior of a ν-dimensional Gentile ideal gas of particles with dispersion E = p^s/2m, where ν and s are arbitrary, is analyzed in detail. Moreover, we provide an alternative derivation of the partition function for Gentile statistics.
Scalar energy fluctuations in Large-Eddy Simulation of turbulent flames: Statistical budgets and mesh quality criterion

Energy Technology Data Exchange (ETDEWEB)

Vervisch, Luc; Domingo, Pascale; Lodato, Guido [CORIA - CNRS and INSA de Rouen, Technopole du Madrillet, BP 8, 76801 Saint-Etienne-du-Rouvray (France); Veynante, Denis [EM2C - CNRS and Ecole Centrale Paris, Grande Voie des Vignes, 92295 Chatenay-Malabry (France)

2010-04-15

Large-Eddy Simulation (LES) provides space-filtered quantities to compare with measurements, which usually have been obtained using a different filtering operation; hence, numerical and experimental results can be examined side-by-side in a statistical sense only. Instantaneous, space-filtered and statistically time-averaged signals feature different characteristic length-scales, which can be combined in dimensionless ratios. From two canonical manufactured turbulent solutions, a turbulent flame and a passive scalar turbulent mixing layer, the critical values of these ratios under which measured and computed variances (resolved plus sub-grid scale) can be compared without resorting to additional residual terms are first determined. It is shown that actual Direct Numerical Simulation can hardly accommodate a sufficiently large range of length-scales to perform statistical studies of LES filtered reactive scalar-fields energy budget based on sub-grid scale variances; an estimation of the minimum Reynolds number allowing for such DNS studies is given. From these developments, a reliability mesh criterion emerges for scalar LES and scaling for scalar sub-grid scale energy is discussed. (author)
Notes on the Implementation of Non-Parametric Statistics within the Westinghouse Realistic Large Break LOCA Evaluation Model (ASTRUM)

International Nuclear Information System (INIS)

Frepoli, Cesare; Oriani, Luca

2006-01-01

In recent years, non-parametric or order statistics methods have been widely used to assess the impact of the uncertainties within Best-Estimate LOCA evaluation models. The bounding of the uncertainties is achieved with a direct Monte Carlo sampling of the uncertainty attributes, with the minimum trial number selected to 'stabilize' the estimation of the critical output values (peak cladding temperature (PCT), local maximum oxidation (LMO), and core-wide oxidation (CWO)). A non-parametric order statistics uncertainty analysis was recently implemented within the Westinghouse Realistic Large Break LOCA evaluation model, also referred to as 'Automated Statistical Treatment of Uncertainty Method' (ASTRUM). The implementation or interpretation of order statistics in safety analysis is not fully consistent within the industry. This has led to an extensive public debate among regulators and researchers which can be found in the open literature. The USNRC-approved Westinghouse method follows a rigorous implementation of the order statistics theory, which leads to the execution of 124 simulations within a Large Break LOCA analysis. This is a solid approach which guarantees that a bounding value (at 95% probability) of the 95th percentile for each of the three 10 CFR 50.46 ECCS design acceptance criteria (PCT, LMO and CWO) is obtained. The objective of this paper is to provide additional insights on the ASTRUM statistical approach, with a more in-depth analysis of pros and cons of the order statistics and of the Westinghouse approach in the implementation of this statistical methodology. (authors)
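The figure of 124 simulations can be reproduced with a short calculation, under an assumption the abstract does not spell out: that the run count comes from a binomial-tail criterion for one-sided tolerance limits, in which at least as many sampled runs as there are output criteria must fall above the 95th percentile with 95% confidence. The function names below are hypothetical.

```python
from math import comb

def confidence(n_runs, n_outputs, coverage=0.95):
    """Probability that at least n_outputs of n_runs random samples exceed
    the `coverage` quantile (assumed criterion, see the lead-in)."""
    miss = 1.0 - coverage
    # Probability that fewer than n_outputs runs land above the quantile.
    tail = sum(
        comb(n_runs, k) * miss ** k * coverage ** (n_runs - k)
        for k in range(n_outputs)
    )
    return 1.0 - tail

def min_runs(n_outputs, coverage=0.95, target=0.95):
    """Smallest number of runs whose confidence reaches the target."""
    n = n_outputs
    while confidence(n, n_outputs, coverage) < target:
        n += 1
    return n

for outputs in (1, 2, 3):
    print(outputs, "output(s):", min_runs(outputs), "runs")
```

With one output the criterion reduces to 1 - 0.95^n >= 0.95, and with three outputs (PCT, LMO and CWO) the same rule gives the 124 runs quoted in the abstract.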
Is statistical significance clinically important?--A guide to judge the clinical relevance of study findings

NARCIS (Netherlands)

Sierevelt, Inger N.; van Oldenrijk, Jakob; Poolman, Rudolf W.

2007-01-01

In this paper we describe several issues that influence the reporting of statistical significance in relation to clinical importance, since misinterpretation of p values is a common issue in orthopaedic literature. Orthopaedic research is tormented by the risks of false-positive (type I error) and
Statistical methods for including two-body forces in large system calculations

International Nuclear Information System (INIS)

Grimes, S.M.

1980-07-01

Large systems of interacting particles are often treated by assuming that the effect on any one particle of the remaining N-1 may be approximated by an average potential. This approach reduces the problem to that of finding the bound-state solutions for a particle in a potential; statistical mechanics is then used to obtain the properties of the many-body system. In some physical systems this approach may not be acceptable, because the two-body force component cannot be treated in this one-body limit. A technique for incorporating two-body forces in such calculations in a more realistic fashion is described. 1 figure
Statistical significance of theoretical predictions: A new dimension in nuclear structure theories (I)

International Nuclear Information System (INIS)

DUDEK, J; SZPAK, B; FORNAL, B; PORQUET, M-G

2011-01-01

In this and the follow-up article we briefly discuss what we believe represents one of the most serious problems in contemporary nuclear structure: the question of statistical significance of parametrizations of nuclear microscopic Hamiltonians and the implied predictive power of the underlying theories. In the present Part I, we introduce the main lines of reasoning of the so-called Inverse Problem Theory, an important sub-field in the contemporary Applied Mathematics, here illustrated on the example of the Nuclear Mean-Field Approach.

Significance of Operating Environment in Condition Monitoring of Large Civil Structures

OpenAIRE

Alampalli, Sreenivas

1999-01-01

Success of remote long-term condition monitoring of large civil structures and developing calibrated analytical models for damage detection, depend significantly on establishing accurate baseline signatures and their sensitivity. Most studies reported in the literature concentrated on the effect of structural damage on modal parameters without emphasis on reliability of modal parameters. Thus, a field bridge structure was studied for the significance of operating conditions in relation to bas...
Statistical Significance of the Contribution of Variables to the PCA Solution: An Alternative Permutation Strategy

Science.gov (United States)

Linting, Marielle; van Os, Bart Jan; Meulman, Jacqueline J.

2011-01-01

In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by the use of permutation tests. We compare a new strategy to a strategy used in previous research consisting of permuting the columns (variables) of a data matrix…
Large-eddy simulation in a mixing tee junction: High-order turbulent statistics analysis

International Nuclear Information System (INIS)

Howard, Richard J.A.; Serre, Eric

2015-01-01

Highlights: • Mixing and thermal fluctuations in a junction are studied using large eddy simulation. • Adiabatic and conducting steel wall boundaries are tested. • Wall thermal fluctuations are not the same between the flow and the solid. • Solid thermal fluctuations cannot be predicted from the fluid thermal fluctuations. • High-order turbulent statistics show that the turbulent transport term is important. - Abstract: This study analyses the mixing and thermal fluctuations induced in a mixing tee junction with circular cross-sections when cold water flowing in a pipe is joined by hot water from a branch pipe. This configuration is representative of industrial piping systems in which temperature fluctuations in the fluid may cause thermal fatigue damage on the walls. Implicit large-eddy simulations (LES) are performed for equal inflow rates corresponding to a bulk Reynolds number Re = 39,080. Two different thermal boundary conditions are studied for the pipe walls: an insulating adiabatic boundary and a conducting steel wall boundary. The predicted flow structures show a satisfactory agreement with the literature. The velocity and thermal fields (including high-order statistics) are not affected by the heat transfer with the steel walls. However, predicted thermal fluctuations at the boundary are not the same between the flow and the solid, showing that solid thermal fluctuations cannot be predicted by the knowledge of the fluid thermal fluctuations alone. The analysis of high-order turbulent statistics provides a better understanding of the turbulence features. In particular, the budgets of the turbulent kinetic energy and temperature variance allow a comparative analysis of dissipation, production and transport terms. It is found that the turbulent transport term is an important term that acts to balance the production. We therefore use a priori tests to evaluate three different models for the triple correlation
Solving Large-Scale Computational Problems Using Insights from Statistical Physics

Energy Technology Data Exchange (ETDEWEB)

Selman, Bart [Cornell University

2012-02-29

Many challenging problems in computer science and related fields can be formulated as constraint satisfaction problems. Such problems consist of a set of discrete variables and a set of constraints between those variables, and represent a general class of so-called NP-complete problems. The goal is to find a value assignment to the variables that satisfies all constraints, generally requiring a search through an exponentially large space of variable-value assignments. Models for disordered systems, as studied in statistical physics, can provide important new insights into the nature of constraint satisfaction problems. Recently, work in this area has resulted in the discovery of a new method for solving such problems, called the survey propagation (SP) method. With SP, we can solve problems with millions of variables and constraints, an improvement of two orders of magnitude over previous methods.
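To make the "exponentially large space" concrete, the following is a minimal sketch of a Boolean constraint satisfaction problem solved by plain enumeration. It is not the survey propagation method; the clause encoding (signed integers for literals) and the function name `solve_by_enumeration` are assumptions made for illustration only.

```python
from itertools import product


def solve_by_enumeration(n_vars, clauses):
    """Return the first satisfying assignment found, or None.

    Hypothetical encoding: each clause is a list of non-zero integers;
    literal k > 0 means "variable k is True", k < 0 means "variable |k| is False".
    A clause is satisfied when at least one of its literals holds.
    """
    # The loop visits up to 2**n_vars assignments, which is the exponential
    # search space the abstract refers to.
    for assignment in product([False, True], repeat=n_vars):
        if all(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return assignment
    return None


if __name__ == "__main__":
    # (x1 or x2) and (not x1 or x2) and (not x2 or x3)
    print(solve_by_enumeration(3, [[1, 2], [-1, 2], [-2, 3]]))
```

Enumeration of this kind is only workable for a few dozen variables, which is why methods that scale to millions of variables are notable.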
A Note on Comparing the Power of Test Statistics at Low Significance Levels.

Science.gov (United States)

Morris, Nathan; Elston, Robert

2011-01-01

It is an obvious fact that the power of a test statistic is dependent upon the significance (alpha) level at which the test is performed. It is perhaps a less obvious fact that the relative performance of two statistics in terms of power is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to relative power comparisons at very low levels, such as the level α = 5 × 10^-8, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is in fact quite wrong, especially with respect to comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. Thus we recommend that statisticians exercise caution when interpreting the results of power comparison studies which use alpha levels that will not be used in practice.
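A reader who wants to explore this effect numerically could start from the sketch below, which computes the power of a 1-df and a 2-df chi-square test for a given noncentrality parameter at any alpha level. It is not taken from the note itself: the function names, the restriction to 1 and 2 degrees of freedom, and the truncation of the Poisson mixture at 200 terms are all assumptions chosen so that only the Python standard library is needed.

```python
import math
from statistics import NormalDist


def chi2_sf_even(x, df):
    """Survival function of a central chi-square with even df (closed form)."""
    m = df // 2
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


def power_1df(ncp, alpha):
    """Power of a 1-df chi-square test: reject when |Z| exceeds the critical
    value, where Z ~ N(sqrt(ncp), 1)."""
    nd = NormalDist()
    z = -nd.inv_cdf(alpha / 2)
    d = math.sqrt(ncp)
    return (1 - nd.cdf(z - d)) + nd.cdf(-z - d)


def power_2df(ncp, alpha, terms=200):
    """Power of a 2-df chi-square test via the Poisson mixture representation
    of the noncentral chi-square distribution (truncated at `terms`)."""
    crit = -2 * math.log(alpha)      # exact critical value for 2 df
    weight = math.exp(-ncp / 2)      # Poisson(ncp / 2) weight for j = 0
    total = 0.0
    for j in range(terms):
        total += weight * chi2_sf_even(crit, 2 + 2 * j)
        weight *= (ncp / 2) / (j + 1)
    return total


if __name__ == "__main__":
    for alpha in (0.05, 5e-8):
        for ncp in (10.0, 30.0):
            print(alpha, ncp, power_1df(ncp, alpha), power_2df(ncp, alpha))
```

Tabulating the two powers across alpha levels, for noncentrality values of practical interest, is one way to check how the penalty for the extra degree of freedom changes between α = 0.05 and α = 5 × 10^-8.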
The Impact of a Flipped Classroom Model of Learning on a Large Undergraduate Statistics Class

Science.gov (United States)

Nielson, Perpetua Lynne; Bean, Nathan William Bean; Larsen, Ross Allen Andrew

2018-01-01

We examine the impact of a flipped classroom model of learning on student performance and satisfaction in a large undergraduate introductory statistics class. Two professors each taught a lecture-section and a flipped-class section. Using MANCOVA, a linear combination of final exam scores, average quiz scores, and course ratings was compared for…
Dark matter statistics for large galaxy catalogs: power spectra and covariance matrices

Science.gov (United States)

Klypin, Anatoly; Prada, Francisco

2018-06-01

Large-scale surveys of galaxies require accurate theoretical predictions of the dark matter clustering for thousands of mock galaxy catalogs. We demonstrate that this goal can be achieved with the new Parallel Particle-Mesh (PM) N-body code GLAM at a very low computational cost. We run ~22,000 simulations with ~2 billion particles that provide ~1% accuracy of the dark matter power spectra P(k) for wave-numbers up to k ~ 1 h Mpc^-1. Using this large data-set we study the power spectrum covariance matrix. In contrast to many previous analytical and numerical results, we find that the covariance matrix normalised to the power spectrum C(k, k′)/P(k)P(k′) has a complex structure of non-diagonal components: an upturn at small k, followed by a minimum at k ≈ 0.1-0.2 h Mpc^-1, and a maximum at k ≈ 0.5-0.6 h Mpc^-1. The normalised covariance matrix strongly evolves with redshift: C(k, k′) ∝ δ^α(t) P(k)P(k′), where δ is the linear growth factor and α ≈ 1-1.25, which indicates that the covariance matrix depends on cosmological parameters. We also show that waves longer than 1 h^-1 Gpc have very little impact on the power spectrum and covariance matrix. This significantly reduces the computational costs and complexity of theoretical predictions: relatively small volume ~(1 h^-1 Gpc)^3 simulations capture the necessary properties of dark matter clustering statistics. As our results also indicate, achieving ~1% errors in the covariance matrix for k < 0.50 h Mpc^-1 requires a resolution better than ε ~ 0.5 h^-1 Mpc.
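The normalised covariance C(k, k′)/P(k)P(k′) discussed above can be estimated from a set of mock power spectra with the ordinary sample covariance. The sketch below is an illustration under that assumption, not the GLAM pipeline; the function name and the list-of-lists input layout are hypothetical.

```python
from statistics import fmean


def normalised_covariance(spectra):
    """Estimate C(k_i, k_j) / (P(k_i) P(k_j)) from mock realisations.

    spectra: list of realisations, each a list of P(k) values measured
    on the same k-bins (hypothetical layout).
    Returns (mean_p, c_norm), where mean_p is the mean spectrum and
    c_norm is the normalised sample covariance matrix.
    """
    n = len(spectra)
    nbins = len(spectra[0])
    mean_p = [fmean(s[i] for s in spectra) for i in range(nbins)]
    c_norm = []
    for i in range(nbins):
        row = []
        for j in range(nbins):
            # Unbiased sample covariance between bins i and j.
            cov = sum(
                (s[i] - mean_p[i]) * (s[j] - mean_p[j]) for s in spectra
            ) / (n - 1)
            row.append(cov / (mean_p[i] * mean_p[j]))
        c_norm.append(row)
    return mean_p, c_norm


if __name__ == "__main__":
    toy = [[1.0, 10.0], [3.0, 30.0]]  # two toy realisations, two k-bins
    print(normalised_covariance(toy))
```

With only a handful of realisations the off-diagonal terms are dominated by noise, which is one reason a very large number of mocks is needed to resolve their structure.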
ClusterSignificance: A bioconductor package facilitating statistical analysis of class cluster separations in dimensionality reduced data

DEFF Research Database (Denmark)

Serviss, Jason T.; Gådin, Jesper R.; Eriksson, Per

2017-01-01

, e.g. genes in a specific pathway, alone can separate samples into these established classes. Despite this, the evaluation of class separations is often subjective and performed via visualization. Here we present the ClusterSignificance package; a set of tools designed to assess the statistical significance of class separations downstream of dimensionality reduction algorithms. In addition, we demonstrate the design and utility of the ClusterSignificance package and utilize it to determine the importance of long non-coding RNA expression in the identity of multiple hematological malignancies.
Statistical significance versus clinical importance: trials on exercise therapy for chronic low back pain as example.

NARCIS (Netherlands)

van Tulder, M.W.; Malmivaara, A.; Hayden, J.; Koes, B.

2007-01-01

STUDY DESIGN. Critical appraisal of the literature. OBJECTIVES. The objective of this study was to assess if results of back pain trials are statistically significant and clinically important. SUMMARY OF BACKGROUND DATA. There seems to be a discrepancy between conclusions reported by authors and
Effects of baryons on the statistical properties of large scale structure of the Universe

International Nuclear Information System (INIS)

Guillet, T.

2010-01-01

Observations of weak gravitational lensing will provide strong constraints on the cosmic expansion history and the growth rate of large scale structure, yielding clues to the properties and nature of dark energy. Their interpretation is impacted by baryonic physics, which is expected to modify the total matter distribution at small scales. My work has focused on determining and modeling the impact of baryons on the statistics of the large scale matter distribution in the Universe. Using numerical simulations, I have extracted the effect of baryons on the power spectrum, variance and skewness of the total density field as predicted by these simulations. I have shown that a model based on the halo model construction, featuring a concentrated central component to account for cool condensed baryons, is able to reproduce accurately, and down to very small scales, the measured amplifications of both the variance and skewness of the density field. Because of well-known issues with baryons in current cosmological simulations, I have extended the central component model to rely on as many observation-based ingredients as possible. As an application, I have studied the effect of baryons on the predictions of the upcoming Euclid weak lensing survey. During the course of this work, I have also worked at developing and extending the RAMSES code, in particular by developing a parallel self-gravity solver, which offers significant performance gains, in particular for the simulation of some astrophysical setups such as isolated galaxy or cluster simulations. (author)
Basics of statistical physics

CERN Document Server

Müller-Kirsten, Harald J W

2013-01-01

Statistics links microscopic and macroscopic phenomena, and requires for this reason a large number of microscopic elements like atoms. The results are values of maximum probability or of averaging. This introduction to statistical physics concentrates on the basic principles, and attempts to explain these in simple terms supplemented by numerous examples. These basic principles include the difference between classical and quantum statistics, a priori probabilities as related to degeneracies, the vital aspect of indistinguishability as compared with distinguishability in classical physics, the differences between conserved and non-conserved elements, the different ways of counting arrangements in the three statistics (Maxwell-Boltzmann, Fermi-Dirac, Bose-Einstein), the difference between maximization of the number of arrangements of elements, and averaging in the Darwin-Fowler method. Significant applications to solids, radiation and electrons in metals are treated in separate chapters, as well as Bose-Eins...
Remote sensing estimation of the total phosphorus concentration in a large lake using band combinations and regional multivariate statistical modeling techniques.

Science.gov (United States)

Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi

2015-03-15

Remote sensing has been widely used for water quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate
Predictability of the recent slowdown and subsequent recovery of large-scale surface warming using statistical methods

Science.gov (United States)

Mann, Michael E.; Steinman, Byron A.; Miller, Sonya K.; Frankcombe, Leela M.; England, Matthew H.; Cheung, Anson H.

2016-04-01

The temporary slowdown in large-scale surface warming during the early 2000s has been attributed to both external and internal sources of climate variability. Using semiempirical estimates of the internal low-frequency variability component in Northern Hemisphere, Atlantic, and Pacific surface temperatures in concert with statistical hindcast experiments, we investigate whether the slowdown and its recent recovery were predictable. We conclude that the internal variability of the North Pacific, which played a critical role in the slowdown, does not appear to have been predictable using statistical forecast methods. An additional minor contribution from the North Atlantic, by contrast, appears to exhibit some predictability. While our analyses focus on combining semiempirical estimates of internal climatic variability with statistical hindcast experiments, possible implications for initialized model predictions are also discussed.
Transport Coefficients from Large Deviation Functions

Directory of Open Access Journals (Sweden)

Chloe Ya Gao

2017-10-01

We describe a method for computing transport coefficients from the direct evaluation of large deviation functions. This method is general, relying on only equilibrium fluctuations, and is statistically efficient, employing trajectory based importance sampling. Equilibrium fluctuations of molecular currents are characterized by their large deviation functions, which are scaled cumulant generating functions analogous to the free energies. A diffusion Monte Carlo algorithm is used to evaluate the large deviation functions, from which arbitrary transport coefficients are derivable. We find significant statistical improvement over traditional Green–Kubo based calculations. The systematic and statistical errors of this method are analyzed in the context of specific transport coefficient calculations, including the shear viscosity, interfacial friction coefficient, and thermal conductivity.
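The connection between a scaled cumulant generating function and a Green–Kubo integral can be sketched as follows. The notation is an assumption made for illustration (a fluctuating current j(t) with time integral J_t, a counting field λ, and a sign convention that may differ from the paper's), and coefficient-specific prefactors such as volume and temperature are omitted.

```latex
% Scaled cumulant generating function of the time-integrated current
% J_t = \int_0^t j(s)\, ds  (sign convention for \lambda is an assumption):
\psi(\lambda) = \lim_{t \to \infty} \frac{1}{t} \ln \left\langle e^{\lambda J_t} \right\rangle

% At equilibrium the mean current vanishes, so the first derivative is zero:
\psi'(0) = \lim_{t \to \infty} \frac{\langle J_t \rangle}{t} = 0

% The second derivative is the long-time variance rate, which equals twice
% the Green--Kubo integral of the current autocorrelation function:
\psi''(0) = \lim_{t \to \infty} \frac{\langle J_t^2 \rangle}{t}
          = 2 \int_0^{\infty} \langle j(0)\, j(\tau) \rangle \, d\tau

% Hence a transport coefficient L is, up to prefactors, the curvature at the origin:
L \propto \tfrac{1}{2}\, \psi''(0)
```

In this picture, estimating ψ(λ) near λ = 0 and reading off its curvature replaces the direct integration of a noisy autocorrelation function.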
Transport Coefficients from Large Deviation Functions

Science.gov (United States)

Gao, Chloe; Limmer, David

2017-10-01

We describe a method for computing transport coefficients from the direct evaluation of large deviation functions. This method is general, relying on only equilibrium fluctuations, and is statistically efficient, employing trajectory based importance sampling. Equilibrium fluctuations of molecular currents are characterized by their large deviation functions, which are scaled cumulant generating functions analogous to free energies. A diffusion Monte Carlo algorithm is used to evaluate the large deviation functions, from which arbitrary transport coefficients are derivable. We find significant statistical improvement over traditional Green-Kubo based calculations. The systematic and statistical errors of this method are analyzed in the context of specific transport coefficient calculations, including the shear viscosity, interfacial friction coefficient, and thermal conductivity.
Indirectional statistics and the significance of an asymmetry discovered by Birch

International Nuclear Information System (INIS)

Kendall, D.G.; Young, G.A.

1984-01-01

Birch (1982, Nature, 298, 451) reported an apparent 'statistical asymmetry of the Universe'. The authors here develop 'indirectional analysis' as a technique for investigating statistical effects of this kind and conclude that the reported effect (whatever may be its origin) is strongly supported by the observations. The estimated pole of the asymmetry is at RA 13h 30m, Dec. -37°. The angular error in its estimation is unlikely to exceed 20-30°. (author)
Statistical Modeling of Large Wind Plant System's Generation - A Case Study

International Nuclear Information System (INIS)

Sabolic, D.

2014-01-01

This paper presents simplistic, yet very accurate, descriptive statistical models of various static and dynamic parameters of energy output from a large system of wind plants operated by Bonneville Power Administration (BPA), USA. The system's size at the end of 2013 was 4515 MW of installed capacity. The 5-minute readings from the beginning of 2007 to the end of 2013, recorded and published by BPA, were used to derive a number of experimental distributions, which were then used to devise theoretic statistical models with merely one or two parameters. In spite of the simplicity, they reproduced experimental data with great accuracy, which was checked by rigorous tests of goodness-of-fit. Statistical distribution functions were obtained for the following wind generation-related quantities: total generation as percentage of total installed capacity; change in total generation power in 5, 10, 15, 20, 25, 30, 45, and 60 minutes as percentage of total installed capacity; duration of intervals with total generated power, expressed as percentage of total installed capacity, lower than certain pre-specified level. Limitation of total installed wind plant capacity, when it is determined by regulation demand from wind plants, is discussed, too. The models presented here can be utilized in analyses related to power system economics/policy, which is also briefly discussed in the paper. (author).
Understanding Statistics and Statistics Education: A Chinese Perspective

Science.gov (United States)

Shi, Ning-Zhong; He, Xuming; Tao, Jian

2009-01-01

In recent years, statistics education in China has made great strides. However, there still exists a fairly large gap with the advanced levels of statistics education in more developed countries. In this paper, we identify some existing problems in statistics education in Chinese schools and make some proposals as to how they may be overcome. We…
Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data.

Science.gov (United States)

Kim, Sung-Min; Choi, Yosoon

2017-06-18

To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
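The hot spot z-score and the four-group labelling described above can be sketched as follows. This is a minimal illustration and not the study's implementation: it assumes binary distance-band weights (each sample is its own neighbour, as the Gi* variant requires), and the cut-offs used in `classify` (0.5 for normalized content, 1.96 for the z-score) are hypothetical values that the abstract does not specify.

```python
import math


def getis_ord_gi_star(points, values, radius):
    """Getis-Ord Gi* z-score for each sample.

    points: list of (x, y) coordinates; values: measured contents;
    radius: distance band for the binary weights (an assumption here).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in points:
        # w_ij = 1 for every sample within `radius` of sample i, including i itself.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0
             for xj, yj in points]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sw
        denominator = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        # Degenerate cases (constant data, or every sample a neighbour) give 0.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


def classify(content, z, content_cut=0.5, z_cut=1.96):
    """Label a sample HH, HL, LH or LL (both cut-offs are hypothetical)."""
    return ("H" if content >= content_cut else "L") + ("H" if z >= z_cut else "L")


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
    vals = [10.0, 12.0, 11.0, 1.0, 2.0, 1.0]
    print(getis_ord_gi_star(pts, vals, radius=1.5))
```

A sample labelled HL or LH by such a scheme has a content that disagrees with its neighbourhood, which is why the abstract treats those cases as candidates for re-measurement.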
Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data

Directory of Open Access Journals (Sweden)

Sung-Min Kim

2017-06-01

To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.

Significance of Operating Environment in Condition Monitoring of Large Civil Structures

Directory of Open Access Journals (Sweden)

Sreenivas Alampalli

1999-01-01

Success of remote long-term condition monitoring of large civil structures, and of developing calibrated analytical models for damage detection, depends significantly on establishing accurate baseline signatures and their sensitivity. Most studies reported in the literature concentrated on the effect of structural damage on modal parameters without emphasis on reliability of modal parameters. Thus, a field bridge structure was studied for the significance of operating conditions in relation to baseline signatures. Results indicate that in practice, civil structures should be monitored for at least one full cycle of in-service environmental changes before establishing baselines for condition monitoring or calibrating finite-element models. Boundary conditions deserve special attention.
Applying Statistical Mechanics to pixel detectors

International Nuclear Information System (INIS)

Pindo, Massimiliano

2002-01-01

Pixel detectors, being made of a large number of active cells of the same kind, can be considered as significant sets to which Statistical Mechanics variables and methods can be applied. By properly redefining well known statistical parameters in order to let them match the ones that actually characterize pixel detectors, an analysis of the way they work can be performed in a totally new perspective. A deeper understanding of pixel detectors is attained, helping in the evaluation and comparison of their intrinsic characteristics and performance
Evaluation of significantly modified water bodies in Vojvodina by using multivariate statistical techniques

Directory of Open Access Journals (Sweden)

Vujović Svetlana R.

2013-01-01

This paper illustrates the utility of multivariate statistical techniques for analysis and interpretation of water quality data sets and identification of pollution sources/factors, with a view to obtaining better information about water quality and designing a monitoring network for effective management of water resources. Multivariate statistical techniques, such as factor analysis (FA)/principal component analysis (PCA) and cluster analysis (CA), were applied for the evaluation of variations and for the interpretation of a water quality data set of the natural water bodies obtained during the 2010 monitoring year for 13 parameters at 33 different sites. FA/PCA attempts to explain the correlations between the observations in terms of the underlying factors, which are not directly observable. Factor analysis is applied to physico-chemical parameters of natural water bodies with the aim of classification and data summation as well as segmentation of heterogeneous data sets into smaller homogeneous subsets. Factor loadings were categorized as strong and moderate, corresponding to absolute loading values of >0.75 and 0.75-0.50, respectively. Four principal factors were obtained with eigenvalues >1, summing to more than 78% of the total variance in the water data sets, which is adequate to give good prior information regarding data structure. Each factor that is significantly related to specific variables represents a different dimension of water quality. The first factor, F1, accounts for 28% of the total variance and represents the hydrochemical dimension of water quality. The second factor, F2, accounts for 18% of the total variance and may be taken as a factor of water eutrophication. The third factor, F3, accounts for 17% of the total variance and represents the influence of point sources of pollution on water quality. The fourth factor, F4, accounts for 13% of the total variance and may be taken as an ecological dimension of water quality. Cluster analysis (CA) is an
Statistically significant faunal differences among Middle Ordovician age, Chickamauga Group bryozoan bioherms, central Alabama

Energy Technology Data Exchange (ETDEWEB)

Crow, C.J.

1985-01-01

Middle Ordovician age Chickamauga Group carbonates crop out along the Birmingham and Murphrees Valley anticlines in central Alabama. The macrofossil contents on exposed surfaces of seven bioherms have been counted to determine their various paleontologic characteristics. Twelve groups of organisms are present in these bioherms. Dominant organisms include bryozoans, algae, brachiopods, sponges, pelmatozoans, stromatoporoids and corals. Minor accessory fauna include predators, scavengers and grazers such as gastropods, ostracods, trilobites, cephalopods and pelecypods. Vertical and horizontal niche zonation has been detected for some of the bioherm dwelling fauna. No one bioherm of those studied exhibits all 12 groups of organisms; rather, individual bioherms display various subsets of the total diversity. Statistical treatment (G-test) of the diversity data indicates a lack of statistical homogeneity of the bioherms, both within and between localities. Between-locality population heterogeneity can be ascribed to differences in biologic responses to such gross environmental factors as water depth and clarity, and energy levels. At any one locality, gross aspects of the paleoenvironments are assumed to have been more uniform. Significant differences among bioherms at any one locality may have resulted from patchy distribution of species populations, differential preservation and other factors.
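The G-test of homogeneity mentioned above compares observed counts with the counts expected if every bioherm shared the same faunal proportions. The sketch below shows the standard log-likelihood-ratio statistic for a contingency table; the table layout (rows as bioherms, columns as organism groups) and the example counts are hypothetical and are not data from the study.

```python
import math


def g_test(table):
    """Return (G, degrees_of_freedom) for an r x c table of counts.

    G = 2 * sum(O * ln(O / E)), with E = row_total * column_total / grand_total.
    Cells with an observed count of zero contribute nothing to the sum.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / grand
                g += observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, dof


if __name__ == "__main__":
    # Hypothetical counts: 3 bioherms (rows) x 4 organism groups (columns).
    counts = [[30, 12, 8, 5], [10, 25, 9, 4], [14, 11, 20, 6]]
    print(g_test(counts))
```

Under the null hypothesis of homogeneity, G is compared with a chi-square distribution on the returned degrees of freedom.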
Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

Science.gov (United States)

2016-04-26

Systems, Statistics & Management Science, University of Alabama, USA. Distribution A: approved for public release. [Table-of-contents residue; recoverable entries: Summary; Application to Real Networks; 2012 FBS Football Schedule Network; stem plot of degree-ordered vertices versus the degree for the college football schedule network.]
Statistical significant changes in ground thermal conditions of alpine Austria during the last decade

Science.gov (United States)

Kellerer-Pirklbauer, Andreas

2016-04-01

Longer data series (e.g. >10 a) of ground temperatures in alpine regions are helpful to improve the understanding regarding the effects of present climate change on distribution and thermal characteristics of seasonal frost- and permafrost-affected areas. Beginning in 2004 - and more intensively since 2006 - a permafrost and seasonal frost monitoring network was established in Central and Eastern Austria by the University of Graz. This network consists of c. 60 ground temperature (surface and near-surface) monitoring sites which are located at 1922-3002 m a.s.l., at latitude 46°55'-47°22'N and at longitude 12°44'-14°41'E. These data allow conclusions about general ground thermal conditions, potential permafrost occurrence, trend during the observation period, and regional pattern of changes. Calculations and analyses of several different temperature-related parameters were accomplished. At an annual scale a region-wide statistically significant warming during the observation period was revealed by e.g. an increase in mean annual temperature values (mean, maximum) or the significant lowering of the surface frost number (F+). At a seasonal scale no significant trend of any temperature-related parameter was in most cases revealed for spring (MAM) and autumn (SON). Winter (DJF) shows only a weak warming. In contrast, the summer (JJA) season reveals in general a significant warming as confirmed by several different temperature-related parameters such as mean seasonal temperature, number of thawing degree days, number of freezing degree days, or days without night frost. On a monthly basis August shows the statistically most robust and strongest warming of all months, although regional differences occur. Despite the fact that the general ground temperature warming during the last decade is confirmed by the field data in the study region, complications in trend analyses arise from temperature anomalies (e.g. warm winter 2006/07) or substantial variations in the winter
Conducting tests for statistically significant differences using forest inventory data

Science.gov (United States)

James A. Westfall; Scott A. Pugh; John W. Coulston

2013-01-01

Many forest inventory and monitoring programs are based on a sample of ground plots from which estimates of forest resources are derived. In addition to evaluating metrics such as number of trees or amount of cubic wood volume, it is often desirable to make comparisons between resource attributes. To properly conduct statistical tests for differences, it is imperative...
An Efficient and Reliable Statistical Method for Estimating Functional Connectivity in Large Scale Brain Networks Using Partial Correlation.

Science.gov (United States)

Wang, Yikai; Kang, Jian; Kemmer, Phebe B; Guo, Ying

2016-01-01

Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promise in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant
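The step from a precision matrix to partial correlations uses the standard identity ρ_ij = −Ω_ij / √(Ω_ii Ω_jj). The sketch below applies that identity to a precision matrix supplied by the caller; it does not perform the CLIME estimation or the Dens-based tuning described in the abstract, and the function name is an assumption.

```python
import math


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix into partial correlations.

    precision: square list-of-lists with positive diagonal entries.
    Off-diagonal output: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    Diagonal output is set to 1.0 by convention.
    """
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return rho


if __name__ == "__main__":
    # Toy precision matrix for three nodes; the zero entry means nodes 0 and 2
    # are conditionally independent given node 1.
    omega = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    for row in partial_correlations(omega):
        print(row)
```

A zero entry in the precision matrix maps to a zero partial correlation, which is why a sparse precision estimate yields a sparse network.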
The distribution of P-values in medical research articles suggested selective reporting associated with statistical significance.

Science.gov (United States)

Perneger, Thomas V; Combescure, Christophe

2017-07-01

Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P values values equal to 1, and (3) about twice as many P-values less than 0.05 compared with those more than 0.05. The latter finding was seen in both randomized trials and observational studies, and in most types of analyses, excepting heterogeneity tests and interaction tests. Under plausible assumptions, we estimate that about half of the tested hypotheses were null and the other half were alternative. This analysis suggests that statistical tests published in medical journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results. Copyright © 2017 Elsevier Inc. All rights reserved.
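The two summaries described in this abstract (counts in one-percent intervals, and the share of P-values below versus above 0.05) can be sketched as follows. This is an illustrative sketch only: the function names and the sample values are assumptions, not the authors' code or data.

```python
def one_percent_histogram(p_values):
    """Count P-values in 100 bins of width 0.01; P = 1 falls in the last bin.

    Note: values sitting exactly on a bin edge may land in the neighbouring
    bin because of floating-point rounding.
    """
    counts = [0] * 100
    for p in p_values:
        counts[min(int(p * 100), 99)] += 1
    return counts

def significance_split(p_values, threshold=0.05):
    """Compare the observed count below `threshold` with the all-null expectation."""
    below = sum(1 for p in p_values if p < threshold)
    above = len(p_values) - below
    return {
        "below": below,
        "above": above,
        # Under a uniform (all-null) distribution a fraction `threshold`
        # of P-values is expected to fall below the threshold.
        "expected_below_if_all_null": threshold * len(p_values),
    }

# Hypothetical P-values, for illustration only.
sample = [0.001, 0.015, 0.5, 0.995, 1.0]
hist = one_percent_histogram(sample)
split = significance_split(sample)
```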
Large scale statistical inference of signaling pathways from RNAi and microarray data

Directory of Open Access Journals (Sweden)

Poustka Annemarie

2007-10-01

Full Text Available Abstract Background The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway. Results In this paper we address this challenging problem by extending previous work by Markowetz et al., who proposed a statistical framework to score network hypotheses in a Bayesian manner. Our extensions go in three directions: First, we introduce a way to omit the data discretization step needed in the original framework via a calculation based on p-values instead. Second, we show how prior assumptions on the network structure can be incorporated into the scoring scheme using regularization techniques. Third and most important, we propose methods to scale up the original approach, which is limited to around 5 genes, to large scale networks. Conclusion Comparisons of these methods on artificial data are conducted. Our proposed module network is employed to infer the signaling network between 13 genes in the ER-α pathway in human MCF-7 breast cancer cells. Using a bootstrapping approach this reconstruction can be found with good statistical stability. The code for the module network inference method is available in the latest version of the R-package nem, which can be obtained from the Bioconductor homepage.
Statistical and theoretical research

International Nuclear Information System (INIS)

Anon.

1983-01-01

Significant accomplishments include the creation of field designs to detect population impacts, new census procedures for small mammals, and methods for designing studies to determine where and how much of a contaminant is present over certain landscapes. A book describing these statistical methods is currently being written and will apply to a variety of environmental contaminants, including radionuclides. PNL scientists also have devised an analytical method for predicting the success of field experiments on wild populations. Two highlights of current research are the discoveries that populations of free-roaming horse herds can double in four years and that grizzly bear populations may be substantially smaller than once thought. As stray horses become a public nuisance at DOE and other large Federal sites, it is important to determine their number. Similar statistical theory can be readily applied to other situations where wild animals are a problem of concern to other government agencies. Another book, on statistical aspects of radionuclide studies, is written specifically for researchers in radioecology
Statistical Image Properties in Large Subsets of Traditional Art, Bad Art, and Abstract Art.

Science.gov (United States)

Redies, Christoph; Brachmann, Anselm

2017-01-01

Several statistical image properties have been associated with large subsets of traditional visual artworks. Here, we investigate some of these properties in three categories of art that differ in artistic claim and prestige: (1) Traditional art of different cultural origin from established museums and art collections (oil paintings and graphic art of Western provenance, Islamic book illustration and Chinese paintings), (2) Bad Art from two museums that collect contemporary artworks of lesser importance (© Museum Of Bad Art [MOBA], Somerville, and Official Bad Art Museum of Art [OBAMA], Seattle), and (3) twentieth century abstract art of Western provenance from two prestigious museums (Tate Gallery and Kunstsammlung Nordrhein-Westfalen). We measured the following four statistical image properties: the fractal dimension (a measure relating to subjective complexity); self-similarity (a measure of how much the sections of an image resemble the image as a whole), 1st-order entropy of edge orientations (a measure of how uniformly different orientations are represented in an image); and 2nd-order entropy of edge orientations (a measure of how independent edge orientations are across an image). As shown previously, traditional artworks of different styles share similar values for these measures. The values for Bad Art and twentieth century abstract art show a considerable overlap with those of traditional art, but we also identified numerous examples of Bad Art and abstract art that deviate from traditional art. By measuring statistical image properties, we quantify such differences in image composition for the first time.
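The "1st-order entropy of edge orientations" mentioned in this abstract is described as a measure of how uniformly orientations are represented. A minimal sketch of that idea is a Shannon entropy over an orientation histogram; the bin count of 24 and the function name are assumptions, and the authors' actual edge-detection and weighting steps are not given in the abstract and are not reproduced.

```python
import math

def orientation_entropy(orientations_deg, n_bins=24):
    """Shannon entropy (in bits) of a histogram of edge orientations.

    Orientations are folded into [0, 180) degrees and binned uniformly.
    Entropy is 0 when all edges share one bin and log2(n_bins) when
    every bin is equally populated.
    """
    counts = [0] * n_bins
    width = 180.0 / n_bins
    for angle in orientations_deg:
        counts[int((angle % 180.0) / width) % n_bins] += 1
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical inputs: one dominant orientation versus a uniform spread.
single = orientation_entropy([45.0] * 10)
uniform = orientation_entropy([3.75 + 7.5 * k for k in range(24)])
```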
Extremely large and significantly anisotropic magnetoresistance in ZrSiS single crystals

Energy Technology Data Exchange (ETDEWEB)

Lv, Yang-Yang; Zhang, Bin-Bin; Yao, Shu-Hua, E-mail: shyao@nju.edu.cn, E-mail: ybchen@nju.edu.cn, E-mail: zhoujian@nju.edu.cn; Zhou, Jian; Zhang, Shan-Tao; Lu, Ming-Hui [National Laboratory of Solid State Microstructures and Department of Materials Science and Engineering, Nanjing University, Nanjing 210093 (China); Li, Xiao; Chen, Y. B. [National Laboratory of Solid State Microstructures and Department of Physics, Nanjing University, Nanjing 210093 (China); Chen, Yan-Feng [National Laboratory of Solid State Microstructures and Department of Materials Science and Engineering, Nanjing University, Nanjing 210093 (China); Collaborative Innovation Center of Advanced Microstructure, Nanjing University, Nanjing 210093 (China)

2016-06-13

Recently, the extremely large magnetoresistance (MR) observed in transition metal tellurides, such as WTe2, has attracted much attention because of its potential applications in magnetic sensors. Here, we report the observation of an extremely large magnetoresistance, as high as 3.0 × 10^4 %, measured at 2 K with a 9 T magnetic field aligned along [001]-ZrSiS. A significant magnetoresistance change (∼1.4 × 10^4 %) is obtained when the magnetic field is tilted from [001] to [011]-ZrSiS. These abnormal magnetoresistance behaviors in ZrSiS can be understood in terms of electron-hole compensation and the open orbits of the Fermi surface. Because of these superior MR properties, ZrSiS may be used in magnetic sensors.
Statistics on the parameters of nonisothermal ionospheric plasma in large mesospheric electric fields

Science.gov (United States)

Martynenko, S.; Rozumenko, V.; Tyrnov, O.; Manson, A.; Meek, C.

The large V/m electric fields inherent in the mesosphere play an essential role in lower ionospheric electrodynamics. They must be the cause of large variations in the electron temperature and the electron collision frequency at D region altitudes, and consequently the ionospheric plasma in the lower part of the D region undergoes a transition into a nonisothermal state. This study is based on the databases on large mesospheric electric fields collected with the 2.2-MHz radar of the Institute of Space and Atmospheric Studies, University of Saskatchewan, Canada (52°N geographic latitude, 60.4°N geomagnetic latitude) and with the 2.3-MHz radar of the Kharkiv V. Karazin National University (49.6°N geographic latitude, 45.6°N geomagnetic latitude). The statistical analysis of these data is presented in Meek, C. E., A. H. Manson, S. I. Martynenko, V. T. Rozumenko, O. F. Tyrnov, Remote sensing of mesospheric electric fields using MF radars, Journal of Atmospheric and Solar-Terrestrial Physics, in press. The large mesospheric electric fields are experimentally established to follow a Rayleigh distribution in the interval 0
Statistical analyses of scatterplots to identify important factors in large-scale simulations, 2: robustness of techniques

International Nuclear Information System (INIS)

Kleijnen, J.P.C.; Helton, J.C.

1999-01-01

The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (i) linear relationships with correlation coefficients, (ii) monotonic relationships with rank correlation coefficients, (iii) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (iv) trends in variability as defined by variances and interquartile ranges, and (v) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (i) Type I errors are unavoidable, (ii) Type II errors can occur when inappropriate analysis procedures are used, (iii) physical explanations should always be sought for why statistical procedures identify variables as being important, and (iv) the identification of important variables tends to be stable for independent Latin hypercube samples
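The first two procedures listed in this abstract, (i) linear relationships via correlation coefficients and (ii) monotonic relationships via rank correlation coefficients, can be contrasted with a small sketch. The data below are synthetic and the helper names are hypothetical; this is not the authors' implementation, and the Kruskal-Wallis, variance and chi-square procedures are not covered.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient: detects linear relationships."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank (Spearman) correlation: detects monotonic relationships."""
    return pearson(ranks(x), ranks(y))

# Synthetic input/output pair: monotonic but strongly nonlinear.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]
linear_r = pearson(x, y)
rank_r = spearman(x, y)
```

On this example the rank correlation is exactly 1 while the linear correlation is noticeably lower, which illustrates why a linear-only procedure can understate (or, in harder cases, miss) a monotonic pattern.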
Statistical learning and selective inference.

Science.gov (United States)

Taylor, Jonathan; Tibshirani, Robert J

2015-06-23

We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
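Why cherry-picking calls for a higher bar can be seen in a short simulation. The sketch below draws pure-noise test statistics, keeps only the strongest one, and tests it at the usual two-sided 5% level. The function name, feature count and trial count are assumptions, and the Bonferroni adjustment used for comparison is a generic correction rather than one of the selective-inference methods the abstract describes.

```python
import random
from statistics import NormalDist

def max_stat_rejection_rate(n_features=50, n_trials=2000, alpha=0.05,
                            bonferroni=False, seed=0):
    """Fraction of trials in which the strongest of `n_features` null
    z-statistics is declared significant at level `alpha`."""
    rng = random.Random(seed)
    level = alpha / n_features if bonferroni else alpha
    z_crit = NormalDist().inv_cdf(1 - level / 2)
    rejections = 0
    for _ in range(n_trials):
        # Every feature is pure noise, so any rejection is a false positive.
        strongest = max(abs(rng.gauss(0.0, 1.0)) for _ in range(n_features))
        if strongest > z_crit:
            rejections += 1
    return rejections / n_trials

naive_rate = max_stat_rejection_rate()
corrected_rate = max_stat_rejection_rate(bonferroni=True)
```

With 50 independent null features, the naive rate should be close to 1 - 0.95^50 (about 0.92), while the adjusted threshold brings it back to roughly the nominal 5%.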
The Strasbourg Large Refractor and Dome: Significant Improvements and Failed Attempts

Science.gov (United States)

Heck, Andre

2009-01-01

Founded by the German Empire in the late 19th century, Strasbourg Astronomical Observatory featured several novelties from the start. According to Mueller (1978), the separation of observing buildings from the study area and from the astronomers' residence was a revolution in observatory construction. The instruments were, as much as possible, isolated from the vibrations of the buildings themselves. "Gas flames" and water were used to reduce temperature effects. Thus the Large Dome (ca 11m diameter), housing the Large Refractor (ca 49cm, then the largest in Germany) and covered by zinc over wood, could be cooled down by water running from the top. Reports (including by the French who took over the observatory after World War I) are, however, virtually nonexistent on the effective usage and actual efficiency of such a system (which must have generated locally a significant amount of humidity). The paper will detail these technical attempts as well as the specificities of the instruments installed in that new observatory intended as a showcase of German astronomy.
Calculating p-values and their significances with the Energy Test for large datasets

Science.gov (United States)

Barter, W.; Burr, C.; Parkes, C.

2018-04-01

The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.
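A minimal sketch of an energy-test statistic follows, assuming the commonly used form with a Gaussian distance weighting psi(d) = exp(-d^2 / (2 sigma^2)): the sum of the two within-sample terms minus the cross-sample term. The normalisation, the value of sigma and all function names are assumptions and should be checked against the paper. The p-value here comes from a plain permutation loop, which is a substitute for, not an implementation of, the scaling method the abstract proposes for large samples.

```python
import math
import random

def psi(squared_distance, sigma):
    """Gaussian weighting of the distance between two points (assumed form)."""
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def energy_t(xs, ys, sigma=1.0):
    """T-value: within-sample similarity minus cross-sample similarity."""
    n, m = len(xs), len(ys)
    within_x = sum(psi(sq_dist(xs[i], xs[j]), sigma)
                   for i in range(n) for j in range(i + 1, n))
    within_y = sum(psi(sq_dist(ys[i], ys[j]), sigma)
                   for i in range(m) for j in range(i + 1, m))
    cross = sum(psi(sq_dist(a, b), sigma) for a in xs for b in ys)
    return within_x / (n * (n - 1)) + within_y / (m * (m - 1)) - cross / (n * m)

def permutation_p_value(xs, ys, n_perm=200, sigma=1.0, seed=0):
    """Share of label permutations whose T-value is at least the observed one."""
    rng = random.Random(seed)
    observed = energy_t(xs, ys, sigma)
    pooled = list(xs) + list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_t(pooled[:len(xs)], pooled[len(xs):], sigma) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Tiny hypothetical samples (points are tuples of coordinates).
a = [(0.0,), (1.0,)]
same = [(0.0,), (1.0,)]
far = [(100.0,), (101.0,)]
t_same = energy_t(a, same)
t_far = energy_t(a, far)
```

The pairwise sums scale quadratically with sample size, which is why a permutation approach becomes expensive for large datasets and why a faster route to the null distribution is of interest.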
Turbulent Flow Over Large Roughness Elements: Effect of Frontal and Plan Solidity on Turbulence Statistics and Structure

Science.gov (United States)

Placidi, M.; Ganapathisubramani, B.

2018-04-01

Wind-tunnel experiments were carried out on fully-rough boundary layers with large roughness (δ/h ≈ 10, where h is the height of the roughness elements and δ is the boundary-layer thickness). Twelve different surface conditions were created by using LEGO™ bricks of uniform height. Six cases are tested for a fixed plan solidity (λ_P) with variations in frontal density (λ_F), while the other six cases have varying λ_P for fixed λ_F. Particle image velocimetry and floating-element drag-balance measurements were performed. The current results complement those contained in Placidi and Ganapathisubramani (J Fluid Mech 782:541-566, 2015), extending the previous analysis to the turbulence statistics and spatial structure. Results indicate that mean velocity profiles in defect form agree with Townsend's similarity hypothesis with varying λ_F; however, the agreement is worse for cases with varying λ_P. The streamwise and wall-normal turbulent stresses, as well as the Reynolds shear stresses, show a lack of similarity across most examined cases. This suggests that the critical height of the roughness for which outer-layer similarity holds depends not only on the height of the roughness, but also on the local wall morphology. A new criterion based on shelter solidity, defined as the sheltered plan area per unit wall-parallel area, which is similar to the 'effective shelter area' in Raupach and Shaw (Boundary-Layer Meteorol 22:79-90, 1982), is found to capture the departure of the turbulence statistics from outer-layer similarity. Despite this lack of similarity reported in the turbulence statistics, proper orthogonal decomposition analysis, as well as two-point spatial correlations, show that some form of universal flow structure is present, as all cases exhibit virtually identical proper orthogonal decomposition mode shapes and correlation fields. Finally, reduced models based on proper orthogonal decomposition reveal that the small scales of the turbulence
The statistical-inference approach to generalized thermodynamics

International Nuclear Information System (INIS)

Lavenda, B.H.; Scherer, C.

1987-01-01

Limit theorems, such as the central-limit theorem and the weak law of large numbers, are applicable to statistical thermodynamics for sufficiently large sample sizes of independent and identically distributed observations performed on extensive thermodynamic (chance) variables. The estimation of the intensive thermodynamic quantities is a problem in parametric statistical estimation. The normal approximation to the Gibbs distribution is justified by the analysis of large deviations. Statistical thermodynamics is generalized to include the statistical estimation of variance as well as mean values

Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

Directory of Open Access Journals (Sweden)

Ujjwal Maulik

Full Text Available Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how accurately the evolved rules are able to describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post
Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

Science.gov (United States)

Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

2015-01-01

Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how accurately the evolved rules are able to describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data
Sigsearch: a new term for post hoc unplanned search for statistically significant relationships with the intent to create publishable findings.

Science.gov (United States)

Hashim, Muhammad Jawad

2010-09-01

Post-hoc secondary data analysis with no prespecified hypotheses has been discouraged by textbook authors and journal editors alike. Unfortunately no single term describes this phenomenon succinctly. I would like to coin the term "sigsearch" to define this practice and bring it within the teaching lexicon of statistics courses. Sigsearch would include any unplanned, post-hoc search for statistical significance using multiple comparisons of subgroups. It would also include data analysis with outcomes other than the prespecified primary outcome measure of a study as well as secondary data analyses of earlier research.
[Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

Science.gov (United States)

Suzukawa, Yumi; Toyoda, Hideki

2012-04-01

This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in fields such as perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.
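The interplay of effect size, sample size and power described in this abstract can be sketched with a normal approximation for a two-sided, two-sample comparison of means. This is a generic textbook approximation, not the procedure used in the study; the function name and the example effect sizes and group sizes are assumptions.

```python
import math
from statistics import NormalDist

def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    `effect_size` is a standardized mean difference (Cohen's d), and both
    groups are assumed to have `n_per_group` observations and unit variance.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability that the test statistic falls in either rejection tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical scenarios mirroring the two situations in the abstract.
medium_effect_small_sample = two_sample_power(0.5, 20)
tiny_effect_huge_sample = two_sample_power(0.1, 5000)
```

Under these assumptions a medium effect with 20 participants per group is detected well under half the time, whereas a very small effect with 5000 per group is detected almost always.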
Practical Statistics for Particle Physicists

CERN Document Server

Lista, Luca

2017-01-01

These three lectures provide an introduction to the main concepts of statistical data analysis useful for precision measurements and searches for new signals in High Energy Physics. The frequentist and Bayesian approaches to probability theory will be introduced and, for both approaches, inference methods will be presented. Hypothesis tests will be discussed, then significance and upper limit evaluation will be presented with an overview of the modern and most advanced techniques adopted for data analysis at the Large Hadron Collider.
National transportation statistics 2010

Science.gov (United States)

2010-01-01

National Transportation Statistics presents statistics on the U.S. transportation system, including its physical components, safety record, economic performance, the human and natural environment, and national security. This is a large online documen...
IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

Science.gov (United States)

Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

2017-09-15

Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS. Contact: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Lensing corrections to the Eg(z) statistics from large scale structure

Science.gov (United States)

Moradinezhad Dizgah, Azadeh; Durrer, Ruth

2016-09-01

We study the impact of the often neglected lensing contribution to galaxy number counts on the Eg statistics which is used to constrain deviations from GR. This contribution affects both the galaxy-galaxy and the convergence-galaxy spectra, and it is larger for the latter. At higher redshifts probed by upcoming surveys, for instance at z = 1.5, neglecting this term induces an error of (25-40)% in the spectra and therefore on the Eg statistics, which is constructed from the combination of the two. Moreover, including it renders the Eg statistics scale- and bias-dependent and hence calls into question its very objective.
In-Depth Investigation of Statistical and Physicochemical Properties on the Field Study of the Intermittent Filling of Large Water Tanks

Directory of Open Access Journals (Sweden)

Do-Hwan Kim

2017-01-01

Full Text Available Large-demand customers, generally high-density dwellings and buildings, have dedicated ground or elevated water tanks to consistently supply drinking water to residents. Online field measurement for Nonsan-2 district meter area demonstrated that intermittent replenishment from large-demand customers could disrupt the normal operation of a water distribution system by taking large quantities of water in short times when filling the tanks from distribution mains. Based on the previous results of field measurement for hydraulic and water quality parameters, statistical analysis is performed for measured data in terms of autocorrelation, power spectral density, and cross-correlation. The statistical results show that the intermittent filling interval of 6.7 h and diurnal demand pattern of 23.3 h are detected through autocorrelation analyses, the similarities of the flow-pressure and the turbidity-particle count data are confirmed as a function of frequency through power spectral density analyses, and a strong cross-correlation is observed in the flow-pressure and turbidity-particle count analyses. In addition, physicochemical results show that the intermittent refill of storage tank from large-demand customers induces abnormal flow and pressure fluctuations and results in transient-induced turbid flow mainly composed of fine particles ranging within 2–4 μm and constituting Fe, Si, and Al.
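The autocorrelation analysis used in this abstract to detect a recurring filling interval can be sketched on synthetic data. The series below has an artificial spike every 8 samples; the function names and the period are assumptions, and the 6.7 h and 23.3 h values reported in the abstract come from field measurements that are not reproduced here.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance

def dominant_lag(series, max_lag):
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))

# Synthetic flow signal: a demand spike every 8 samples (illustrative only).
flow = [1.0 if t % 8 == 0 else 0.0 for t in range(80)]
period = dominant_lag(flow, 12)
```

Multiplying the detected lag by the sampling interval of the measurements would convert it into a physical period in hours.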
Principles of Statistics: What the Sports Medicine Professional Needs to Know.

Science.gov (United States)

Riemann, Bryan L; Lininger, Monica R

2018-07-01

Understanding the results and statistics reported in original research remains a large challenge for many sports medicine practitioners and, in turn, may be one of the biggest barriers to integrating research into sports medicine practice. The purpose of this article is to provide the minimal essentials a sports medicine practitioner needs to know about interpreting statistics and research results to facilitate the incorporation of the latest evidence into practice. Topics covered include the difference between statistical significance and clinical meaningfulness; effect sizes and confidence intervals; reliability statistics, including the minimal detectable difference and minimal important difference; and statistical power. Copyright © 2018 Elsevier Inc. All rights reserved.
Statistics and Discoveries at the LHC (1/4)

CERN Multimedia

CERN. Geneva

2010-01-01

The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.
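The link between p-values and discovery significance mentioned in these lecture abstracts can be sketched with the usual one-sided Gaussian convention, Z = Phi^{-1}(1 - p). The function names are hypothetical; the convention itself is standard in particle physics, but the lectures' own treatment is not reproduced here.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()

def significance_from_p(p_value):
    """One-sided Gaussian significance Z (in sigma) for a given p-value."""
    # -inv_cdf(p) equals inv_cdf(1 - p) but keeps precision for tiny p.
    return -_STANDARD_NORMAL.inv_cdf(p_value)

def p_from_significance(z):
    """One-sided tail probability corresponding to a Z-sigma excess."""
    return _STANDARD_NORMAL.cdf(-z)

five_sigma_p = p_from_significance(5.0)
z_for_five_percent = significance_from_p(0.05)
```

Under this convention the conventional 5 sigma discovery threshold corresponds to a p-value of about 2.9 × 10^-7, while p = 0.05 is only about 1.6 sigma.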
Statistics and Discoveries at the LHC (3/4)

CERN Multimedia

CERN. Geneva

2010-01-01

The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.
Statistics and Discoveries at the LHC (2/4)

CERN Multimedia

CERN. Geneva

2010-01-01

The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.
Statistics and Discoveries at the LHC (4/4)

CERN Multimedia

CERN. Geneva

2010-01-01

The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit-setting procedures, and the treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.
Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

Directory of Open Access Journals (Sweden)

Sadreyev Ruslan I

2004-08-01

Full Text Available Abstract Background Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) an MSA position and a set of predicted residue frequencies, and (2) two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. Results For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. Conclusion The proposed computational method is of significant potential value for the analysis of protein families.
Test the Overall Significance of p-values by Using Joint Tail Probability of Ordered p-values as Test Statistic

NARCIS (Netherlands)

Fang, Yongxiang; Wit, Ernst

2008-01-01

Fisher’s combined probability test is the most commonly used method to test the overall significance of a set of independent p-values. However, it is obvious that Fisher’s statistic is more sensitive to smaller p-values than to larger p-values, and a small p-value may overrule the other p-values
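The sensitivity described in this abstract can be illustrated with a minimal sketch of Fisher's combined probability test. The function name and the example p-values are hypothetical and are not taken from the paper. The sketch relies on the standard result that, for k independent p-values, the statistic -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). The function name is illustrative.
    """
    k = len(pvalues)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    # Fisher's statistic: X = -2 * sum(ln p_i), chi-square with 2k df under H0.
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # For an even number of degrees of freedom (2k) the chi-square survival
    # function has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical inputs: one tiny p-value among large ones versus
    # four moderate p-values of equal size.
    for label, ps in [("one small", [0.0001, 0.9, 0.9, 0.9]),
                      ("all moderate", [0.2, 0.2, 0.2, 0.2])]:
        stat, p = fisher_combined_pvalue(ps)
        print(f"{label}: X = {stat:.3f}, combined p = {p:.4f}")
```

With these made-up inputs, the single very small p-value pulls the combined result below 0.05 even though the other three p-values are close to 1, while four moderate p-values of 0.2 do not reach that threshold.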
The statistical significance of error probability as determined from decoding simulations for long codes

Science.gov (United States)

Massey, J. L.

1976-01-01

The very low error probability obtained with long error-correcting codes results in a very small number of observed errors in simulation studies of practical size and renders the usual confidence interval techniques inapplicable to the observed error probability. A natural extension of the notion of a 'confidence interval' is made and applied to such determinations of error probability by simulation. An example is included to show the surprisingly great significance of as few as two decoding errors in a very large number of decoding trials.
Statistical Model of Extreme Shear

DEFF Research Database (Denmark)

Larsen, Gunner Chr.; Hansen, Kurt Schaldemose

2004-01-01

In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements...... are consistent, given the inevitable uncertainties associated with the model as well as with the extreme value data analysis. Keywords: Statistical model, extreme wind conditions, statistical analysis, turbulence, wind loading, wind shear, wind turbines....
Past and future American Psychological Association guidelines for statistical practice

NARCIS (Netherlands)

Finch, S; Thomason, N; Cumming, G

2002-01-01

We review the publication guidelines of the American Psychological Association (APA) since 1929 and document their advice for authors about statistical practice. Although the advice has been extended with each revision of the guidelines, it has largely focused on null hypothesis significance testing
Statistical determination of significant curved I-girder bridge seismic response parameters

Science.gov (United States)

Seo, Junwon

2013-06-01

Curved steel bridges are commonly used at interchanges in transportation networks and more of these structures continue to be designed and built in the United States. Though the use of these bridges continues to increase in locations that experience high seismicity, the effects of curvature and other parameters on their seismic behaviors have been neglected in current risk assessment tools. These tools can evaluate the seismic vulnerability of a transportation network using fragility curves. One critical component of fragility curve development for curved steel bridges is the completion of sensitivity analyses that help identify influential parameters related to their seismic response. In this study, an accessible inventory of existing curved steel girder bridges located primarily in the Mid-Atlantic United States (MAUS) was used to establish statistical characteristics used as inputs for a seismic sensitivity study. Critical seismic response quantities were captured using 3D nonlinear finite element models. Influential parameters from these quantities were identified using statistical tools that incorporate experimental Plackett-Burman Design (PBD), which included Pareto optimal plots and prediction profiler techniques. The findings revealed that the potential variation in the influential parameters included number of spans, radius of curvature, maximum span length, girder spacing, and cross-frame spacing. These parameters showed varying levels of influence on the critical bridge response.

Statistical analysis of error rate of large-scale single flux quantum logic circuit by considering fluctuation of timing parameters

International Nuclear Information System (INIS)

Yamanashi, Yuki; Masubuchi, Kota; Yoshikawa, Nobuyuki

2016-01-01

The relationship between the timing margin and the error rate of large-scale single flux quantum logic circuits is quantitatively investigated to establish a timing design guideline. We observed that the fluctuation in the set-up/hold time of single flux quantum logic gates caused by thermal noise is the most probable origin of the logical error of the large-scale single flux quantum circuit. The appropriate timing margin for stable operation of the large-scale logic circuit is discussed by taking into account the fluctuation of the set-up/hold time and the timing jitter in the single flux quantum circuits. As a case study, the dependence of the error rate of the 1-million-bit single flux quantum shift register on the timing margin is statistically analyzed. The result indicates that adjustment of the timing margin and the bias voltage is important for stable operation of a large-scale SFQ logic circuit.
Statistical analysis of error rate of large-scale single flux quantum logic circuit by considering fluctuation of timing parameters

Energy Technology Data Exchange (ETDEWEB)

Yamanashi, Yuki, E-mail: yamanasi@ynu.ac.jp [Department of Electrical and Computer Engineering, Yokohama National University, Tokiwadai 79-5, Hodogaya-ku, Yokohama 240-8501 (Japan); Masubuchi, Kota; Yoshikawa, Nobuyuki [Department of Electrical and Computer Engineering, Yokohama National University, Tokiwadai 79-5, Hodogaya-ku, Yokohama 240-8501 (Japan)

2016-11-15

The relationship between the timing margin and the error rate of large-scale single flux quantum logic circuits is quantitatively investigated to establish a timing design guideline. We observed that the fluctuation in the set-up/hold time of single flux quantum logic gates caused by thermal noise is the most probable origin of the logical error of the large-scale single flux quantum circuit. The appropriate timing margin for stable operation of the large-scale logic circuit is discussed by taking into account the fluctuation of the set-up/hold time and the timing jitter in the single flux quantum circuits. As a case study, the dependence of the error rate of the 1-million-bit single flux quantum shift register on the timing margin is statistically analyzed. The result indicates that adjustment of the timing margin and the bias voltage is important for stable operation of a large-scale SFQ logic circuit.
EFFECT OF MEASUREMENT ERRORS ON PREDICTED COSMOLOGICAL CONSTRAINTS FROM SHEAR PEAK STATISTICS WITH LARGE SYNOPTIC SURVEY TELESCOPE

Energy Technology Data Exchange (ETDEWEB)

Bard, D.; Chang, C.; Kahn, S. M.; Gilmore, K.; Marshall, S. [KIPAC, Stanford University, 452 Lomita Mall, Stanford, CA 94309 (United States); Kratochvil, J. M.; Huffenberger, K. M. [Department of Physics, University of Miami, Coral Gables, FL 33124 (United States); May, M. [Physics Department, Brookhaven National Laboratory, Upton, NY 11973 (United States); AlSayyad, Y.; Connolly, A.; Gibson, R. R.; Jones, L.; Krughoff, S. [Department of Astronomy, University of Washington, Seattle, WA 98195 (United States); Ahmad, Z.; Bankert, J.; Grace, E.; Hannel, M.; Lorenz, S. [Department of Physics, Purdue University, West Lafayette, IN 47907 (United States); Haiman, Z.; Jernigan, J. G., E-mail: djbard@slac.stanford.edu [Department of Astronomy and Astrophysics, Columbia University, New York, NY 10027 (United States); and others

2013-09-01

We study the effect of galaxy shape measurement errors on predicted cosmological constraints from the statistics of shear peak counts with the Large Synoptic Survey Telescope (LSST). We use the LSST Image Simulator in combination with cosmological N-body simulations to model realistic shear maps for different cosmological models. We include both galaxy shape noise and, for the first time, measurement errors on galaxy shapes. We find that the measurement errors considered have relatively little impact on the constraining power of shear peak counts for LSST.
The use of test scores from large-scale assessment surveys: psychometric and statistical considerations

Directory of Open Access Journals (Sweden)

Henry Braun

2017-11-01

Full Text Available Abstract Background Economists are making increasing use of measures of student achievement obtained through large-scale survey assessments such as NAEP, TIMSS, and PISA. The construction of these measures, employing plausible value (PV) methodology, is quite different from that of the more familiar test scores associated with assessments such as the SAT or ACT. These differences have important implications both for utilization and interpretation. Although much has been written about PVs, it appears that there are still misconceptions about whether and how to employ them in secondary analyses. Methods We address a range of technical issues, including those raised in a recent article that was written to inform economists using these databases. First, an extensive review of the relevant literature was conducted, with particular attention to key publications that describe the derivation and psychometric characteristics of such achievement measures. Second, a simulation study was carried out to compare the statistical properties of estimates based on the use of PVs with those based on other, commonly used methods. Results It is shown, through both theoretical analysis and simulation, that under fairly general conditions appropriate use of PVs yields approximately unbiased estimates of model parameters in regression analyses of large-scale survey data. The superiority of the PV methodology is particularly evident when measures of student achievement are employed as explanatory variables. Conclusions The PV methodology used to report student test performance in large-scale surveys remains the state of the art for secondary analyses of these databases.
Statistical inference

CERN Document Server

Rohatgi, Vijay K

2003-01-01

Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth
Testing statistical hypotheses

CERN Document Server

Lehmann, E L

2005-01-01

The third edition of Testing Statistical Hypotheses updates and expands upon the classic graduate text, emphasizing optimality theory for hypothesis testing and confidence sets. The principal additions include a rigorous treatment of large sample optimality, together with the requisite tools. In addition, an introduction to the theory of resampling methods such as the bootstrap is developed. The sections on multiple testing and goodness of fit testing are expanded. The text is suitable for Ph.D. students in statistics and includes over 300 new problems out of a total of more than 760. E.L. Lehmann is Professor of Statistics Emeritus at the University of California, Berkeley. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences, and the recipient of honorary degrees from the University of Leiden, The Netherlands and the University of Chicago. He is the author of Elements of Large-Sample Theory and (with George Casella) he is also the author of Theory of Point Estimat...
Use of a statistical model of the whole femur in a large scale, multi-model study of femoral neck fracture risk.

Science.gov (United States)

Bryan, Rebecca; Nair, Prasanth B; Taylor, Mark

2009-09-18

Interpatient variability is often overlooked in orthopaedic computational studies due to the substantial challenges involved in sourcing and generating large numbers of bone models. A statistical model of the whole femur incorporating both geometric and material property variation was developed as a potential solution to this problem. The statistical model was constructed using principal component analysis, applied to 21 individual computer tomography scans. To test the ability of the statistical model to generate realistic, unique, finite element (FE) femur models it was used as a source of 1000 femurs to drive a study on femoral neck fracture risk. The study simulated the impact of an oblique fall to the side, a scenario known to account for a large proportion of hip fractures in the elderly and have a lower fracture load than alternative loading approaches. FE model generation, application of subject specific loading and boundary conditions, FE processing and post processing of the solutions were completed automatically. The generated models were within the bounds of the training data used to create the statistical model with a high mesh quality, able to be used directly by the FE solver without remeshing. The results indicated that 28 of the 1000 femurs were at highest risk of fracture. Closer analysis revealed the percentage of cortical bone in the proximal femur to be a crucial differentiator between the failed and non-failed groups. The likely fracture location was indicated to be intertrochantic. Comparison to previous computational, clinical and experimental work revealed support for these findings.
Hydrologic effects of large southwestern USA wildfires significantly increase regional water supply: fact or fiction?

Science.gov (United States)

Wine, M. L.; Cadol, D.

2016-08-01

In recent years climate change and historic fire suppression have increased the frequency of large wildfires in the southwestern USA, motivating study of the hydrological consequences of these wildfires at point and watershed scales, typically over short periods of time. These studies have revealed that reduced soil infiltration capacity and reduced transpiration due to tree canopy combustion increase streamflow at the watershed scale. However, the degree to which these local increases in runoff propagate to larger scales (relevant to urban and agricultural water supply) remains largely unknown, particularly in semi-arid mountainous watersheds co-dominated by winter snowmelt and the North American monsoon. To address this question, we selected three New Mexico watersheds, the Jemez (1223 km²), Mogollon (191 km²), and Gila (4807 km²), that together have been affected by over 100 wildfires since 1982. We then applied climate-driven linear models to test for effects of fire on streamflow metrics after controlling for climatic variability. Here we show that, after controlling for climatic and snowpack variability, significantly more streamflow discharged from the Gila watershed for three to five years following wildfires, consistent with increased regional water yield due to enhanced infiltration-excess overland flow and groundwater recharge at the large watershed scale. In contrast, we observed no such increase in discharge from the Jemez watershed following wildfires. Fire regimes represent a key difference between the contrasting responses of the Jemez and Gila watersheds with the latter experiencing more frequent wildfires, many caused by lightning strikes. While hydrologic dynamics at the scale of large watersheds were previously thought to be climatically dominated, these results suggest that if one fifth or more of a large watershed has been burned in the previous three to five years, significant increases in water yield can be expected.
Statistically significant dependence of the Xaa-Pro peptide bond conformation on secondary structure and amino acid sequence

Directory of Open Access Journals (Sweden)

Leitner Dietmar

2005-04-01

Full Text Available Abstract Background A reliable prediction of the Xaa-Pro peptide bond conformation would be a useful tool for many protein structure calculation methods. We have analyzed the Protein Data Bank and show that the combined use of sequential and structural information has a predictive value for the assessment of the cis versus trans peptide bond conformation of Xaa-Pro within proteins. For the analysis of the data sets different statistical methods such as the calculation of the Chou-Fasman parameters and occurrence matrices were used. Furthermore we analyzed the relationship between the relative solvent accessibility and the relative occurrence of prolines in the cis and in the trans conformation. Results One of the main results of the statistical investigations is the ranking of the secondary structure and sequence information with respect to the prediction of the Xaa-Pro peptide bond conformation. We observed a significant impact of secondary structure information on the occurrence of the Xaa-Pro peptide bond conformation, while the sequence information of amino acids neighboring proline is of little predictive value for the conformation of this bond. Conclusion In this work, we present an extensive analysis of the occurrence of the cis and trans proline conformation in proteins. Based on the data set, we derived patterns and rules for a possible prediction of the proline conformation. Upon adoption of the Chou-Fasman parameters, we are able to derive statistically relevant correlations between the secondary structure of amino acid fragments and the Xaa-Pro peptide bond conformation.
CONFIDENCE LEVELS AND/VS. STATISTICAL HYPOTHESIS TESTING IN STATISTICAL ANALYSIS. CASE STUDY

Directory of Open Access Journals (Sweden)

ILEANA BRUDIU

2009-05-01

Full Text Available Parameter estimates with confidence intervals and tests of statistical hypotheses are used in statistical analysis to draw conclusions about a population from a sample extracted from it. The case study presented in this paper aims to highlight the importance of the sample size taken in the study and how it is reflected in the results obtained when using confidence intervals and testing for pregnant [women]. Whereas statistical hypothesis tests give only a "yes" or "no" answer to some questions, statistical estimation using confidence intervals provides more information than a test statistic: it shows the high degree of uncertainty arising from small samples and from findings reported as "marginally significant" or "almost significant" (p very close to 0.05).
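The contrast between a yes/no test and a confidence interval can be sketched numerically. The example below is a minimal illustration under a normal (z) approximation with a known standard deviation. The function name, the observed mean of 0.5, the standard deviation of 1.0, and the two sample sizes are all hypothetical and do not come from the paper.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def mean_ci_and_p(mean, sd, n, null_value=0.0):
    """Return ((lower, upper), p) for a sample mean under a z approximation.

    Assumes the standard deviation is known and the mean is approximately
    normal; for small samples a t-based interval would be wider.
    """
    se = sd / math.sqrt(n)
    lower, upper = mean - Z_95 * se, mean + Z_95 * se
    z = (mean - null_value) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return (lower, upper), p


if __name__ == "__main__":
    # Same hypothetical effect (0.5) and spread (1.0), two sample sizes.
    for n in (16, 400):
        (lo, hi), p = mean_ci_and_p(0.5, 1.0, n)
        print(f"n = {n:3d}: 95% CI = ({lo:.3f}, {hi:.3f}), p = {p:.4g}")
```

Both hypothetical samples reject the null hypothesis at the 0.05 level, but only the interval shows that the small sample leaves the effect size almost undetermined, which is the kind of information a bare "significant / not significant" verdict hides.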
Image Statistics

Energy Technology Data Exchange (ETDEWEB)

Wendelberger, Laura Jean [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2017-08-08

In large datasets, it is time consuming or even impossible to pick out interesting images. Our proposed solution is to find statistics to quantify the information in each image and use those to identify and pick out images of interest.
Significance of large Neptune-crossing objects for terrestrial catastrophism

Science.gov (United States)

Steel, D.

2014-07-01

Over the past few decades a substantial number of objects have been discovered on orbits beyond Neptune (i.e. transneptunian objects, in various sub-classes), crossing Neptune's orbit (here: the Neptune-crossers of interest), and also others crossing the orbits of any or all of the jovian planets (i.e. Centaurs). These range in size from tens of kilometres across to hundreds of kilometres and more. Although formally classified as minor planets/asteroids, plus a few dwarf planets, the physical reality of these objects is that they are giant comets. That is, they seem to be composed largely of ices and if they were to enter the inner solar system then they would demonstrate the commonly-observed behaviour of comets such as outgassing, and the formation of ion and dust tails. Commonly-observed cometary behaviour, however, also includes fragmentation events and sometimes complete disintegration for no apparent cause (such as tidal disruption or thermal stresses). One might therefore wonder what the implications would be for life on Earth and terrestrial catastrophism if and when one of these objects, say 100 to 500 kilometres in size, dropped into a short-period orbit with perihelion distance (q) less than 1 au; or even q ∼ 5 au, given what Jupiter's gravity might do to it. How often might such events occur? One way to address that question would be to conduct numerical integrations of suitable test orbits and identify how often small-q orbits result, but this comes up against the problem of identifying very-infrequent events (with annual probabilities per object perhaps of order 10^{-12}-10^{-10}). For example, Emel'yanenko et al. [1] recently followed test orbits for approximately 5 × 10^{14} particle-years (8,925 objects with 200 clones of each, for 300 Myr) but because these were selected on the basis of initial values of q only below 36 (rather than ∼30) au many were not immediately Neptune-crossers; however, many test particles did eventually migrate into small
Prognostic significance of large perfusion defects on thallium-201 myocardial scintigraphy in dilated cardiomyopathy

International Nuclear Information System (INIS)

Takata, Jun; Doi, Yoshinori; Chikamori, Taishiro; Yonezawa, Yoshihiro; Hamashige, Naohisa; Kuzume, Osamu; Ozawa, Toshio

1989-01-01

To evaluate the prognostic significance of perfusion abnormalities, particularly large defects, in dilated cardiomyopathy (DCM), we performed thallium-201 myocardial scintigraphy and 24-hour ambulatory ECG monitoring in 27 patients. The abnormal scintigraphic patterns and the presence of ventricular tachycardia (VT) were correlated with causes of death during a follow-up period of 30.0±19.4 months. Eight patients had large defects (LD), 11 had multiple small defects (MSD), and eight had no defects (NL). The patients with LD had extensive ventricular akinesis in the region of the perfusion defect, significantly elevated LVEDP (LD 20.6±7.4 mmHg, MSD 15.5±7.6 mmHg, NL 10.3±2.3 mmHg: LD vs NL; p<0.01, MSD vs NL; p<0.05), and reduced ejection fraction (LD 23.9±9.1%, MSD 32.7±7.2%, NL 40.3±7.7%: LD vs MSD; p<0.05, MSD vs NL; p<0.01). VT was detected in 11 patients; among whom three had LD, six had MSD, and two had no defects. Among seven patients who died during follow-up (five of heart failure, one sudden death, and one non-cardiac death), five had LD and two had MSD. There were no deaths among patients without defects. Among 11 patients with VT, only one died suddenly. In conclusion, large scintigraphic defects correlated well with severe LV dysfunction, and this is an important variable in predicting outcomes in DCM. (author)
Limiting values of large deviation probabilities of quadratic statistics

NARCIS (Netherlands)

Jeurnink, Gerardus A.M.; Kallenberg, W.C.M.

1990-01-01

Application of exact Bahadur efficiencies in testing theory or exact inaccuracy rates in estimation theory needs evaluation of large deviation probabilities. Because of the complexity of the expressions, frequently a local limit of the nonlocal measure is considered. Local limits of large deviation
Statistical analyses of scatterplots to identify important factors in large-scale simulations, 1: Review and comparison of techniques

International Nuclear Information System (INIS)

Kleijnen, J.P.C.; Helton, J.C.

1999-01-01

Procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses are described and illustrated. These procedures attempt to detect increasingly complex patterns in scatterplots and involve the identification of (i) linear relationships with correlation coefficients, (ii) monotonic relationships with rank correlation coefficients, (iii) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (iv) trends in variability as defined by variances and interquartile ranges, and (v) deviations from randomness as defined by the chi-square statistic. A sequence of example analyses with a large model for two-phase fluid flow illustrates how the individual procedures can differ in the variables that they identify as having effects on particular model outcomes. The example analyses indicate that the use of a sequence of procedures is a good analysis strategy and provides some assurance that an important effect is not overlooked.
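The first two procedures in this abstract, (i) and (ii), can be contrasted with a small sketch. The helper names and the synthetic input/output pairs are assumptions made for illustration; they are not the authors' code or data. The point is that a rank (Spearman) correlation detects a monotonic but nonlinear relationship that a linear (Pearson) correlation understates.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Synthetic sampled input and a monotonic, strongly nonlinear output.
    x = [float(v) for v in range(10)]
    y = [math.exp(v) for v in x]
    print(f"Pearson  r   = {pearson(x, y):.3f}")
    print(f"Spearman rho = {spearman(x, y):.3f}")
```

Running several such measures in sequence, as the abstract recommends, reduces the chance that an effect visible to only one of them is missed.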
Presence and significant determinants of cognitive impairment in a large sample of patients with multiple sclerosis.

Directory of Open Access Journals (Sweden)

Martina Borghi

Full Text Available OBJECTIVES: To investigate the presence and the nature of cognitive impairment in a large sample of patients with Multiple Sclerosis (MS), and to identify clinical and demographic determinants of cognitive impairment in MS. METHODS: 303 patients with MS and 279 healthy controls were administered the Brief Repeatable Battery of Neuropsychological tests (BRB-N); measures of pre-morbid verbal competence and neuropsychiatric measures were also administered. RESULTS: Patients and healthy controls were matched for age, gender, education and pre-morbid verbal Intelligence Quotient. Patients presenting with cognitive impairment were 108/303 (35.6%). In the overall group of participants, the significant predictors of the most sensitive BRB-N scores were: presence of MS, age, education, and Vocabulary. The significant predictors when considering MS patients only were: course of MS, age, education, vocabulary, and depression. Using logistic regression analyses, significant determinants of the presence of cognitive impairment in relapsing-remitting MS patients were: duration of illness (OR = 1.053, 95% CI = 1.010-1.097, p = 0.015), Expanded Disability Status Scale score (OR = 1.247, 95% CI = 1.024-1.517, p = 0.028), and vocabulary (OR = 0.960, 95% CI = 0.936-0.984, p = 0.001), while in the smaller group of progressive MS patients these predictors did not play a significant role in determining the cognitive outcome. CONCLUSIONS: Our results corroborate the evidence about the presence and the nature of cognitive impairment in a large sample of patients with MS. Furthermore, our findings identify significant clinical and demographic determinants of cognitive impairment in a large sample of MS patients for the first time. Implications for further research and clinical practice were discussed.
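As a consistency check on the reported odds ratios, the p-values can be approximately recovered from the confidence intervals. The sketch below assumes the intervals are Wald intervals, symmetric on the log-odds scale, which the abstract does not state; the function name is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def p_from_odds_ratio_ci(odds_ratio, ci_low, ci_high):
    """Approximate (z, two-sided p) from an odds ratio and its 95% CI.

    Assumes a Wald interval: log(OR) +/- Z_95 * SE on the log scale.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Values as reported in the abstract above.
    reported = {
        "duration of illness": (1.053, 1.010, 1.097),
        "EDSS score": (1.247, 1.024, 1.517),
        "vocabulary": (0.960, 0.936, 0.984),
    }
    for name, (odds_ratio, low, high) in reported.items():
        z, p = p_from_odds_ratio_ci(odds_ratio, low, high)
        print(f"{name}: z = {z:.2f}, p = {p:.4f}")
```

Because the published odds ratios and limits are rounded to three decimals, the recovered p-values agree with the reported ones only approximately.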
Mask effects on cosmological studies with weak-lensing peak statistics

International Nuclear Information System (INIS)

Liu, Xiangkun; Pan, Chuzhong; Fan, Zuhui; Wang, Qiao

2014-01-01

With numerical simulations, we analyze in detail how the bad data removal, i.e., the mask effect, can influence the peak statistics of the weak-lensing convergence field reconstructed from the shear measurement of background galaxies. It is found that high peak fractions are systematically enhanced because of the presence of masks; the larger the masked area is, the higher the enhancement is. In the case where the total masked area is about 13% of the survey area, the fraction of peaks with signal-to-noise ratio ν ≥ 3 is ∼11% of the total number of peaks, compared with ∼7% of the mask-free case in our considered cosmological model. This can have significant effects on cosmological studies with weak-lensing convergence peak statistics, inducing a large bias in the parameter constraints if the effects are not taken into account properly. Even for a survey area of 9 deg², the bias in (Ω_m, σ_8) is already intolerably large and close to 3σ. It is noted that most of the affected peaks are close to the masked regions. Therefore, excluding peaks in those regions in the peak statistics can reduce the bias effect but at the expense of losing usable survey areas. Further investigations find that the enhancement of the number of high peaks around the masked regions can be largely attributed to the smaller number of galaxies usable in the weak-lensing convergence reconstruction, leading to higher noise than that of the areas away from the masks. We thus develop a model in which we exclude only those very large masks with radius larger than 3' but keep all the other masked regions in peak counting statistics. For the remaining part, we treat the areas close to and away from the masked regions separately with different noise levels. It is shown that this two-noise-level model can account for the mask effect on peak statistics very well, and the bias in cosmological parameters is significantly reduced if this model is applied in the parameter fitting.
Multiparametric statistics

CERN Document Server

Serdobolskii, Vadim Ivanovich

2007-01-01

This monograph presents the mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size but possibly much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way to the solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of infinite sampling often break down in the solution of real problems, and, depending on the data, can be inefficient, unstable and even not applicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...
Statistical physics inspired energy-efficient coded-modulation for optical communications.

Science.gov (United States)

Djordjevic, Ivan B; Xu, Lei; Wang, Ting

2012-04-15

Because Shannon's entropy can be obtained by Stirling's approximation of thermodynamics entropy, the statistical physics energy minimization methods are directly applicable to the signal constellation design. We demonstrate that statistical physics inspired energy-efficient (EE) signal constellation designs, in combination with large-girth low-density parity-check (LDPC) codes, significantly outperform conventional LDPC-coded polarization-division multiplexed quadrature amplitude modulation schemes. We also describe an EE signal constellation design algorithm. Finally, we propose the discrete-time implementation of D-dimensional transceiver and corresponding EE polarization-division multiplexed system. © 2012 Optical Society of America
Long-range correlations, geometrical structure, and transport properties of macromolecular solutions. The equivalence of configurational statistics and geometrodynamics of large molecules.

Science.gov (United States)

Mezzasalma, Stefano A

2007-12-04

A special theory of Brownian relativity was previously proposed to describe the universal picture arising in ideal polymer solutions. In brief, it redefines a Gaussian macromolecule in a 4-dimensional diffusive spacetime, establishing a (weak) Lorentz-Poincaré invariance between liquid and polymer Einstein's laws for Brownian movement. Here, aimed at inquiring into the effect of correlations, we deepen the extension of the special theory to a general formulation. The previous statistical equivalence, for dynamic trajectories of liquid molecules and static configurations of macromolecules, and rather obvious in uncorrelated systems, is enlarged by a more general principle of equivalence, for configurational statistics and geometrodynamics. Accordingly, the three geodesic motion, continuity, and field equations could be rewritten, and a number of scaling behaviors were recovered in a spacetime endowed with general static isotropic metric (i.e., for equilibrium polymer solutions). We also dealt with universality in the volume fraction and, unexpectedly, found that a hyperscaling relation of the form (average size) x (diffusivity) x (viscosity)^(1/2) ~ f(N0, phi0) is fulfilled in several regimes, both in the chain monomer number (N) and polymer volume fraction (phi). Entangled macromolecular dynamics was treated as a geodesic light deflection, entanglements acting in close analogy to the field generated by a spherically symmetric mass source, where length fluctuations of the chain primitive path behave as azimuth fluctuations of its shape. Finally, the general transformation rule for translational and diffusive frames gives a coordinate gauge invariance, suggesting a widened Lorentz-Poincaré symmetry for Brownian statistics. We expect this approach to find effective applications to solutions of arbitrarily large molecules displaying a variety of structures, where the effect of geometry is more explicit and significant in itself (e.g., surfactants, lipids, proteins).

Statistical correlations in an ideal gas of particles obeying fractional exclusion statistics.

Science.gov (United States)

Pellegrino, F M D; Angilella, G G N; March, N H; Pucci, R

2007-12-01

After a brief discussion of the concepts of fractional exchange and fractional exclusion statistics, we report partly analytical and partly numerical results on thermodynamic properties of assemblies of particles obeying fractional exclusion statistics. The effect of dimensionality is one focal point, the ratio mu/k_(B)T of chemical potential to thermal energy being obtained numerically as a function of a scaled particle density. Pair correlation functions are also presented as a function of the statistical parameter, with Friedel oscillations developing close to the fermion limit, for sufficiently large density.
Statistical processing of large image sequences.

Science.gov (United States)

Khellah, F; Fieguth, P; Murray, M J; Allen, M

2005-01-01

The dynamic estimation of large-scale stochastic image sequences, as frequently encountered in remote sensing, is important in a variety of scientific applications. However, the size of such images makes conventional dynamic estimation methods, for example, the Kalman and related filters, impractical. In this paper, we present an approach that emulates the Kalman filter, but with considerably reduced computational and storage requirements. Our approach is illustrated in the context of a 512 x 512 image sequence of ocean surface temperature. The static estimation step, the primary contribution here, uses a mixture of stationary models to accurately mimic the effect of a nonstationary prior, simplifying both computational complexity and modeling. Our approach provides an efficient, stable, positive-definite model which is consistent with the given correlation structure. Thus, the methods of this paper may find application in modeling and single-frame estimation.
Possible uses of animal databases for further statistical evaluation and modeling

International Nuclear Information System (INIS)

Griffith, W.C.; Boecker, B.B.; Gerber, G.B.

1995-01-01

Many studies have been performed in animals which mimic potential exposures of people in order to understand how factors modify radiation dose-response relationships. Cooperative analyses by investigators in different laboratories have a large potential for strengthening the conclusions that can be drawn from individual studies. When information on each animal is combined, then formal tests can be made to demonstrate that apparent consistencies or inconsistencies are statistically significant. Statistical methods must be carefully chosen so that differences between laboratories or studies can be controlled or described as part of the analysis in the interpretation of the conclusions. In this report, the example of bone cancer of the large number of studies of modifying factors for bone cancer available from studies in US and European laboratories
Practical Statistics for the LHC

CERN Document Server

Cranmer, Kyle

2015-05-22

This document is a pedagogical introduction to statistics for particle physics. Emphasis is placed on the terminology, concepts, and methods being used at the Large Hadron Collider. The document addresses both the statistical tests applied to a model of the data and the modeling itself.
Ad hoc statistical consulting within a large research organization

CSIR Research Space (South Africa)

Elphinstone, CD

2009-08-01

Full Text Available requests were growing to the extent where it was difficult to manage them together with project and research workload. Also, the access to computing and some basic statistical literacy meant that a high proportion of advanced queries were received.... The challenge was to achieve this in a cost effective way with limited financial and personnel resources. Experience Some of the challenges experienced with the HotSeat service: • Researchers consulting with a statistician after the data is collected...
Local sequence alignments statistics: deviations from Gumbel statistics in the rare-event tail

Directory of Open Access Journals (Sweden)

Burghardt Bernd

2007-07-01

Full Text Available Abstract Background The optimal score for ungapped local alignments of infinitely long random sequences is known to follow a Gumbel extreme value distribution. Less is known about the important case, where gaps are allowed. For this case, the distribution is only known empirically in the high-probability region, which is biologically less relevant. Results We provide a method to obtain numerically the biologically relevant rare-event tail of the distribution. The method, which has been outlined in an earlier work, is based on generating the sequences with a parametrized probability distribution, which is biased with respect to the original biological one, in the framework of Metropolis Coupled Markov Chain Monte Carlo. Here, we first present the approach in detail and evaluate the convergence of the algorithm by considering a simple test case. In the earlier work, the method was just applied to one single example case. Therefore, we consider here a large set of parameters: We study the distributions for protein alignment with different substitution matrices (BLOSUM62 and PAM250) and affine gap costs with different parameter values. In the logarithmic phase (large gap costs) it was previously assumed that the Gumbel form still holds, hence the Gumbel distribution is usually used when evaluating p-values in databases. Here we show that for all cases, provided that the sequences are not too long (L > 400), a "modified" Gumbel distribution, i.e. a Gumbel distribution with an additional Gaussian factor is suitable to describe the data. We also provide a "scaling analysis" of the parameters used in the modified Gumbel distribution. Furthermore, via a comparison with BLAST parameters, we show that significance estimations change considerably when using the true distributions as presented here. Finally, we study also the distribution of the sum statistics of the k best alignments. Conclusion Our results show that the statistics of gapped and ungapped local
Determining coding CpG islands by identifying regions significant for pattern statistics on Markov chains.

Science.gov (United States)

Singer, Meromit; Engström, Alexander; Schönhuth, Alexander; Pachter, Lior

2011-09-23

Recent experimental and computational work confirms that CpGs can be unmethylated inside coding exons, thereby showing that codons may be subjected to both genomic and epigenomic constraint. It is therefore of interest to identify coding CpG islands (CCGIs) that are regions inside exons enriched for CpGs. The difficulty in identifying such islands is that coding exons exhibit sequence biases determined by codon usage and constraints that must be taken into account. We present a method for finding CCGIs that showcases a novel approach we have developed for identifying regions of interest that are significant (with respect to a Markov chain) for the counts of any pattern. Our method begins with the exact computation of tail probabilities for the number of CpGs in all regions contained in coding exons, and then applies a greedy algorithm for selecting islands from among the regions. We show that the greedy algorithm provably optimizes a biologically motivated criterion for selecting islands while controlling the false discovery rate. We applied this approach to the human genome (hg18) and annotated CpG islands in coding exons. The statistical criterion we apply to evaluating islands reduces the number of false positives in existing annotations, while our approach to defining islands reveals significant numbers of undiscovered CCGIs in coding exons. Many of these appear to be examples of functional epigenetic specialization in coding exons.
Statistical Model of Extreme Shear

DEFF Research Database (Denmark)

Hansen, Kurt Schaldemose; Larsen, Gunner Chr.

2005-01-01

In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of full-scale measurements recorded with a high sampling rate...
Statistical measures of galaxy clustering

International Nuclear Information System (INIS)

Porter, D.H.

1988-01-01

Consideration is given to the large-scale distribution of galaxies and ways in which this distribution may be statistically measured. Galaxy clustering is hierarchical in nature, so that the positions of clusters of galaxies are themselves spatially clustered. A simple identification of groups of galaxies would be an inadequate description of the true richness of galaxy clustering. Current observations of the large-scale structure of the universe and modern theories of cosmology may be studied with a statistical description of the spatial and velocity distributions of galaxies. 8 refs
Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly.

Science.gov (United States)

Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz

2015-03-01

FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.
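The kind of a priori power calculation mentioned in this abstract can be illustrated with a minimal sketch. It is not the tool the authors provide; it uses the textbook normal approximation for a two-sided, two-group comparison of means, and the function name n_per_group and the coefficient of variation passed to it are hypothetical illustration values, not figures from the study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(rel_diff: float, cv: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate subjects per group needed to detect a relative mean difference.

    rel_diff : difference between group means as a fraction of the mean (0.10 = 10%)
    cv       : between-subject coefficient of variation (assumed, hypothetical input)
    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * cv / rel_diff)^2 (normal approximation).
    """
    if rel_diff <= 0 or cv <= 0:
        raise ValueError("rel_diff and cv must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) * cv / rel_diff) ** 2)


if __name__ == "__main__":
    # Hypothetical measure with 15% between-subject variability, 10% group difference.
    print(n_per_group(rel_diff=0.10, cv=0.15))
```

Under this approximation the required sample size grows with the square of the variability and shrinks with the square of the effect size, which is consistent with the abstract's observation that different measures need different numbers of subjects for the same 10% difference.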
Statistical analysis of fuel failures in large break loss-of-coolant accident (LBLOCA) in EPR type nuclear power plant

International Nuclear Information System (INIS)

Arkoma, Asko; Hänninen, Markku; Rantamäki, Karin; Kurki, Joona; Hämäläinen, Anitta

2015-01-01

Highlights: • The number of failing fuel rods in a LB-LOCA in an EPR is evaluated. • 59 scenarios are simulated with the system code APROS. • 1000 rods per scenario are simulated with the fuel performance code FRAPTRAN-GENFLO. • All the rods in the reactor are simulated in the worst scenario. • Results suggest that the regulations set by the Finnish safety authority are met. - Abstract: In this paper, the number of failing fuel rods in a large break loss-of-coolant accident (LB-LOCA) in EPR-type nuclear power plant is evaluated using statistical methods. For this purpose, a statistical fuel failure analysis procedure has been developed. The developed method utilizes the results of nonparametric statistics, the Wilks’ formula in particular, and is based on the selection and variation of parameters that are important in accident conditions. The accident scenario is simulated with the coupled fuel performance – thermal hydraulics code FRAPTRAN-GENFLO using various parameter values and thermal hydraulic and power history boundary conditions between the simulations. The number of global scenarios is 59 (given by the Wilks’ formula), and 1000 rods are simulated in each scenario. The boundary conditions are obtained from a new statistical version of the system code APROS. As a result, in the worst global scenario, 1.2% of the simulated rods failed, and it can be concluded that the Finnish safety regulations are hereby met (max. 10% of the rods allowed to fail)
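The figure of 59 scenarios can be recovered from the first-order, one-sided form of Wilks' formula. The sketch below assumes the common 95% coverage / 95% confidence criterion, which the abstract does not state explicitly; the function name is chosen here for illustration.

```python
def wilks_one_sided_runs(coverage: float = 0.95, confidence: float = 0.95) -> int:
    """Smallest number of runs n such that 1 - coverage**n >= confidence.

    First-order, one-sided Wilks criterion: with n independent runs, the largest
    observed value bounds the `coverage` quantile with probability `confidence`.
    """
    if not (0.0 < coverage < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("coverage and confidence must lie strictly between 0 and 1")
    n = 1
    while 1.0 - coverage ** n < confidence:
        n += 1
    return n


if __name__ == "__main__":
    print(wilks_one_sided_runs())  # 59 under the assumed 95%/95% criterion
```

With these assumed inputs, 58 runs give a confidence just under 95% and 59 runs just over it, which matches the number of global scenarios quoted in the abstract.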
Statistical analysis of fuel failures in large break loss-of-coolant accident (LBLOCA) in EPR type nuclear power plant

Energy Technology Data Exchange (ETDEWEB)

Arkoma, Asko, E-mail: asko.arkoma@vtt.fi; Hänninen, Markku; Rantamäki, Karin; Kurki, Joona; Hämäläinen, Anitta

2015-04-15

Highlights: • The number of failing fuel rods in a LB-LOCA in an EPR is evaluated. • 59 scenarios are simulated with the system code APROS. • 1000 rods per scenario are simulated with the fuel performance code FRAPTRAN-GENFLO. • All the rods in the reactor are simulated in the worst scenario. • Results suggest that the regulations set by the Finnish safety authority are met. - Abstract: In this paper, the number of failing fuel rods in a large break loss-of-coolant accident (LB-LOCA) in EPR-type nuclear power plant is evaluated using statistical methods. For this purpose, a statistical fuel failure analysis procedure has been developed. The developed method utilizes the results of nonparametric statistics, the Wilks’ formula in particular, and is based on the selection and variation of parameters that are important in accident conditions. The accident scenario is simulated with the coupled fuel performance – thermal hydraulics code FRAPTRAN-GENFLO using various parameter values and thermal hydraulic and power history boundary conditions between the simulations. The number of global scenarios is 59 (given by the Wilks’ formula), and 1000 rods are simulated in each scenario. The boundary conditions are obtained from a new statistical version of the system code APROS. As a result, in the worst global scenario, 1.2% of the simulated rods failed, and it can be concluded that the Finnish safety regulations are hereby met (max. 10% of the rods allowed to fail)
Subdomain sensitive statistical parsing using raw corpora

NARCIS (Netherlands)

Plank, B.; Sima'an, K.

2008-01-01

Modern statistical parsers are trained on large annotated corpora (treebanks). These treebanks usually consist of sentences addressing different subdomains (e.g. sports, politics, music), which implies that the statistics gathered by current statistical parsers are mixtures of subdomains of language
Statistical mechanics in JINR

International Nuclear Information System (INIS)

Tonchev, N.; Shumovskij, A.S.

1986-01-01

The history of investigations, conducted at the JINR in the field of statistical mechanics, beginning with the fundamental works by Bogolyubov N.N. on superconductivity microscopic theory is presented. Ideas, introduced in these works and methods developed in them, have largely determined the ways for developing statistical mechanics in the JINR and Hartree-Fock-Bogolyubov variational principle has become an important method of the modern nucleus theory. A brief review of the main achievements, connected with the development of statistical mechanics methods and their application in different fields of physical science is given
Common misconceptions about data analysis and statistics.

Science.gov (United States)

Motulsky, Harvey J

2014-11-01

Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1. P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. 2. Overemphasis on P values rather than on the actual size of the observed effect. 3. Overuse of statistical hypothesis testing, and being seduced by the word "significant". 4. Overreliance on standard errors, which are often misunderstood.
Common misconceptions about data analysis and statistics.

Science.gov (United States)

Motulsky, Harvey J

2015-02-01

Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word "significant". (4) Overreliance on standard errors, which are often misunderstood.
Statistical mechanical analysis of the linear vector channel in digital communication

International Nuclear Information System (INIS)

Takeda, Koujin; Hatabu, Atsushi; Kabashima, Yoshiyuki

2007-01-01

A statistical mechanical framework to analyze linear vector channel models in digital wireless communication is proposed for a large system. The framework is a generalization of that proposed for code-division multiple-access systems in Takeda et al (2006 Europhys. Lett. 76 1193) and enables the analysis of the system in which the elements of the channel transfer matrix are statistically correlated with each other. The significance of the proposed scheme is demonstrated by assessing the performance of an existing model of multi-input multi-output communication systems
A large-scale perspective on stress-induced alterations in resting-state networks

Science.gov (United States)

Maron-Katz, Adi; Vaisvaser, Sharon; Lin, Tamar; Hendler, Talma; Shamir, Ron

2016-02-01

Stress is known to induce large-scale neural modulations. However, its neural effect once the stressor is removed and how it relates to subjective experience are not fully understood. Here we used a statistically sound data-driven approach to investigate alterations in large-scale resting-state functional connectivity (rsFC) induced by acute social stress. We compared rsfMRI profiles of 57 healthy male subjects before and after stress induction. Using a parcellation-based univariate statistical analysis, we identified a large-scale rsFC change, involving 490 parcel-pairs. Aiming to characterize this change, we employed statistical enrichment analysis, identifying anatomic structures that were significantly interconnected by these pairs. This analysis revealed strengthening of thalamo-cortical connectivity and weakening of cross-hemispheral parieto-temporal connectivity. These alterations were further found to be associated with change in subjective stress reports. Integrating report-based information on stress sustainment 20 minutes post induction, revealed a single significant rsFC change between the right amygdala and the precuneus, which inversely correlated with the level of subjective recovery. Our study demonstrates the value of enrichment analysis for exploring large-scale network reorganization patterns, and provides new insight on stress-induced neural modulations and their relation to subjective experience.
Strong laws for L- and U-statistics

NARCIS (Netherlands)

Aaronson, J; Burton, R; Dehling, H; Gilat, D; Hill, T; Weiss, B

Strong laws of large numbers are given for L-statistics (linear combinations of order statistics) and for U-statistics (averages of kernels of random samples) for ergodic stationary processes, extending classical theorems of Hoeffding and of Helmers for iid sequences. Examples are given to show
Lensing corrections to the E_g(z) statistics from large scale structure

Energy Technology Data Exchange (ETDEWEB)

Dizgah, Azadeh Moradinezhad; Durrer, Ruth, E-mail: Azadeh.Moradinezhad@unige.ch, E-mail: Ruth.Durrer@unige.ch [Department of Theoretical Physics and Center for Astroparticle Physics, University of Geneva, 24 quai E. Ansermet, CH-1211 Geneva 4 (Switzerland)

2016-09-01

We study the impact of the often neglected lensing contribution to galaxy number counts on the E_g statistics, which is used to constrain deviations from GR. This contribution affects both the galaxy-galaxy and the convergence-galaxy spectra, while it is larger for the latter. At higher redshifts probed by upcoming surveys, for instance at z = 1.5, neglecting this term induces an error of (25–40)% in the spectra and therefore on the E_g statistics, which is constructed from the combination of the two. Moreover, including it renders the E_g statistics scale and bias-dependent and hence puts into question its very objective.

Statistical aspects of determinantal point processes

DEFF Research Database (Denmark)

Lavancier, Frédéric; Møller, Jesper; Rubak, Ege

The statistical aspects of determinantal point processes (DPPs) seem largely unexplored. We review the appealing properties of DPPs, demonstrate that they are useful models for repulsiveness, detail a simulation procedure, and provide freely available software for simulation and statistical infer...
Test the Overall Significance of p-values by Using Joint Tail Probability of Ordered p-values as Test Statistic

OpenAIRE

Fang, Yongxiang; Wit, Ernst

2008-01-01

Fisher’s combined probability test is the most commonly used method to test the overall significance of a set of independent p-values. However, it is obvious that Fisher’s statistic is more sensitive to smaller p-values than to larger p-values, and a single small p-value may overrule the other p-values and decide the test result. This is, in some cases, viewed as a flaw. In order to overcome this flaw and improve the power of the test, the joint tail probability of a set of p-values is proposed as a ...
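For reference, Fisher's method itself (the baseline discussed in this abstract, not the joint-tail statistic the authors propose) can be sketched in a few lines. The statistic -2 * sum(ln p_i) is compared with a chi-square distribution with 2k degrees of freedom; because the degrees of freedom are even, the upper tail has a closed form and only the standard library is needed. Function names and the example p-values are illustrative assumptions.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combined(pvalues):
    """Return (statistic, combined p-value) for independent p-values in (0, 1]."""
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return statistic, chi2_sf_even_df(statistic, 2 * len(pvalues))


if __name__ == "__main__":
    # One very small p-value among large ones versus four moderate p-values.
    print(fisher_combined([0.001, 0.9, 0.9, 0.9]))
    print(fisher_combined([0.2, 0.2, 0.2, 0.2]))
```

Comparing the two printed results illustrates the sensitivity described in the abstract: a single very small p-value can yield a smaller combined p-value than several moderately small ones.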
Basics of modern mathematical statistics

CERN Document Server

Spokoiny, Vladimir

2015-01-01

This textbook provides a unified and self-contained presentation of the main approaches to and ideas of mathematical statistics. It collects the basic mathematical ideas and tools needed as a basis for more serious studies or even independent research in statistics. The majority of existing textbooks in mathematical statistics follow the classical asymptotic framework. Yet, as modern statistics has changed rapidly in recent years, new methods and approaches have appeared. The emphasis is on finite sample behavior, large parameter dimensions, and model misspecifications. The present book provides a fully self-contained introduction to the world of modern mathematical statistics, collecting the basic knowledge, concepts and findings needed for doing further research in the modern theoretical and applied statistics. This textbook is primarily intended for graduate and postdoc students and young researchers who are interested in modern statistical methods.
After statistics reform : Should we still teach significance testing?

NARCIS (Netherlands)

A. Hak (Tony)

2014-01-01

In the longer term null hypothesis significance testing (NHST) will disappear because p-values are not informative and not replicable. Should we continue to teach in the future the procedures of then abolished routines (i.e., NHST)? Three arguments are discussed for not teaching NHST in
Worry, Intolerance of Uncertainty, and Statistics Anxiety

Science.gov (United States)

Williams, Amanda S.

2013-01-01

Statistics anxiety is a problem for most graduate students. This study investigates the relationship between intolerance of uncertainty, worry, and statistics anxiety. Intolerance of uncertainty was significantly related to worry, and worry was significantly related to three types of statistics anxiety. Six types of statistics anxiety were…
Reducing statistics anxiety and enhancing statistics learning achievement: effectiveness of a one-minute strategy.

Science.gov (United States)

Chiou, Chei-Chang; Wang, Yu-Min; Lee, Li-Tze

2014-08-01

Statistical knowledge is widely used in academia; however, statistics teachers struggle with the issue of how to reduce students' statistics anxiety and enhance students' statistics learning. This study assesses the effectiveness of a "one-minute paper strategy" in reducing students' statistics-related anxiety and in improving students' statistics-related achievement. Participants were 77 undergraduates from two classes enrolled in applied statistics courses. An experiment was implemented according to a pretest/posttest comparison group design. The quasi-experimental design showed that the one-minute paper strategy significantly reduced students' statistics anxiety and improved students' statistics learning achievement. The strategy was a better instructional tool than the textbook exercise for reducing students' statistics anxiety and improving students' statistics achievement.
Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

Science.gov (United States)

Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

1999-01-01

Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8 and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, correspondingly, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events has doubled and all of them become exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier
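One simple way to see where significance levels of this kind could come from is a binomial null model. The sketch below assumes, as an illustration rather than as the authors' stated procedure, that under random guessing each target earthquake independently falls inside the alarm volume with probability equal to the alarm fraction; the function name is hypothetical.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more hits by luck alone."""
    if not (0 <= k <= n) or not (0.0 <= p <= 1.0):
        raise ValueError("require 0 <= k <= n and 0 <= p <= 1")
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # (label, number of target earthquakes, number predicted, alarm volume fraction)
    cases = [
        ("M8, M >= 8.0", 5, 5, 0.36),
        ("M8-MSc, M >= 8.0", 5, 4, 0.18),
        ("M8, M >= 7.5", 19, 10, 0.40),
        ("M8-MSc, M >= 7.5", 19, 5, 0.13),
    ]
    for label, n, hits, alarm_fraction in cases:
        chance = binom_tail(n, hits, alarm_fraction)
        print(f"{label}: chance probability {chance:.4f}, significance {1.0 - chance:.1%}")
```

Under this assumed null model the magnitude 8 cases come out above 99% and the magnitude 7.5+ cases fall close to the quoted lower levels, so the figures in the abstract are at least consistent with a calculation of this type.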
An Application of Multivariate Statistical Analysis for Query-Driven Visualization

Energy Technology Data Exchange (ETDEWEB)

Gosink, Luke J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Garth, Christoph [Univ. of California, Davis, CA (United States); Anderson, John C. [Univ. of California, Davis, CA (United States); Bethel, E. Wes [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Joy, Kenneth I. [Univ. of California, Davis, CA (United States)

2011-03-01

Driven by the ability to generate ever-larger, increasingly complex data, there is an urgent need in the scientific community for scalable analysis methods that can rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) strategies are among the small subset of techniques that can address both large and highly complex datasets. This paper extends the utility of QDV strategies with a statistics-based framework that integrates non-parametric distribution estimation techniques with a new segmentation strategy to visually identify statistically significant trends and features within the solution space of a query. In this framework, query distribution estimates help users to interactively explore their query's solution and visually identify the regions where the combined behavior of constrained variables is most important, statistically, to their inquiry. Our new segmentation strategy extends the distribution estimation analysis by visually conveying the individual importance of each variable to these regions of high statistical significance. We demonstrate the analysis benefits these two strategies provide and show how they may be used to facilitate the refinement of constraints over variables expressed in a user's query. We apply our method to datasets from two different scientific domains to demonstrate its broad applicability.
Incorporating social and cultural significance of large old trees in conservation policy.

Science.gov (United States)

Blicharska, Malgorzata; Mikusiński, Grzegorz

2014-12-01

In addition to providing key ecological functions, large old trees are a part of a social realm and as such provide numerous social-cultural benefits to people. However, their social and cultural values are often neglected when designing conservation policies and management guidelines. We believe that awareness of large old trees as a part of human identity and cultural heritage is essential when addressing the issue of their decline worldwide. Large old trees provide humans with aesthetic, symbolic, religious, and historic values, as well as concrete tangible benefits, such as leaves, branches, or nuts. In many cultures particularly large trees are treated with reverence. Also, contemporary popular culture utilizes the image of trees as sentient beings and builds on the ancient myths that attribute great powers to large trees. Although the social and cultural role of large old trees is usually not taken into account in conservation, accounting for human-related values of these trees is an important part of conservation policy because it may strengthen conservation by highlighting the potential synergies in protecting ecological and social values. © 2014 Society for Conservation Biology.
Statistical principles for prospective study protocols:

DEFF Research Database (Denmark)

Christensen, Robin; Langberg, Henning

2012-01-01

In the design of scientific studies it is essential to decide on which scientific questions one aims to answer, just as it is important to decide on the correct statistical methods to use to answer these questions. The correct use of statistical methods is crucial in all aspects of research...... to quantify relationships in data. Despite an increased focus on statistical content and complexity of biomedical research these topics remain difficult for most researchers. Statistical methods enable researchers to condense large spreadsheets with data into means, proportions, and difference between means...... the statistical principles for trial protocols in terms of design, analysis, and reporting of findings....
Large area synchrotron X-ray fluorescence mapping of biological samples

International Nuclear Information System (INIS)

Kempson, I.; Thierry, B.; Smith, E.; Gao, M.; De Jonge, M.

2014-01-01

Large area mapping of inorganic material in biological samples has suffered severely from prohibitively long acquisition times. With the advent of new detector technology we can now generate statistically relevant information for studying cell populations, inter-variability and bioinorganic chemistry in large specimens. We have been implementing ultrafast synchrotron-based XRF mapping afforded by the MAIA detector for large area mapping of biological material. For example, a 2.5 million pixel map can be acquired in 3 hours, compared to a typical synchrotron XRF set-up needing over 1 month of uninterrupted beamtime. Of particular focus to us is the fate of metals and nanoparticles in cells, 3D tissue models and animal tissues. The large area scanning has for the first time provided statistically significant information on sufficiently large numbers of cells to provide data on intercellular variability in uptake of nanoparticles. Techniques such as flow cytometry generally require analysis of thousands of cells for statistically meaningful comparison, due to the large degree of variability. Large area XRF now gives comparable information in a quantifiable manner. Furthermore, we can now image localised deposition of nanoparticles in tissues that would be highly improbable to 'find' by typical XRF imaging. In addition, the ultrafast nature also makes it viable to conduct 3D XRF tomography over large dimensions. This technology opens new opportunities in biomonitoring and understanding metal and nanoparticle fate ex vivo. Following from this is extension to molecular imaging through specific antibody-targeted nanoparticles to label specific tissues and monitor cellular processes or biological consequences.
A novel complete-case analysis to determine statistical significance between treatments in an intention-to-treat population of randomized clinical trials involving missing data.

Science.gov (United States)

Liu, Wei; Ding, Jinhui

2018-04-01

The application of the principle of the intention-to-treat (ITT) to the analysis of clinical trials is challenged in the presence of missing outcome data. The consequences of stopping an assigned treatment in a withdrawn subject are unknown. It is difficult to make a single assumption about missing mechanisms for all clinical trials because there are complicated reactions in the human body to drugs due to the presence of complex biological networks, leading to data missing randomly or non-randomly. Currently there is no statistical method that can tell whether a difference between two treatments in the ITT population of a randomized clinical trial with missing data is significant at a pre-specified level. Making no assumptions about the missing mechanisms, we propose a generalized complete-case (GCC) analysis based on the data of completers. An evaluation of the impact of missing data on the ITT analysis reveals that a statistically significant GCC result implies a significant treatment effect in the ITT population at a pre-specified significance level unless, relative to the comparator, the test drug is poisonous to the non-completers as documented in their medical records. Applications of the GCC analysis are illustrated using literature data, and its properties and limits are discussed.
Large truck and bus crash facts, 2010.

Science.gov (United States)

2012-09-01

This annual edition of Large Truck and Bus Crash Facts contains descriptive statistics about fatal, injury, and property damage only crashes involving large trucks and buses in 2010. Selected crash statistics on passenger vehicles are also presen...
Large truck and bus crash facts, 2012.

Science.gov (United States)

2014-06-01

This annual edition of Large Truck and Bus Crash Facts contains descriptive statistics about fatal, injury, and property damage only crashes involving large trucks and buses in 2012. Selected crash statistics on passenger vehicles are also presented ...
Large truck and bus crash facts, 2013.

Science.gov (United States)

2015-04-01

This annual edition of Large Truck and Bus Crash Facts contains descriptive statistics about fatal, injury, and property damage only crashes involving large trucks and buses in 2013. Selected crash statistics on passenger vehicles are also presented ...
Large truck and bus crash facts, 2009.

Science.gov (United States)

2011-10-01

This annual edition of Large Truck and Bus Crash Facts contains descriptive statistics about fatal, injury, and property damage only crashes involving large trucks and buses in 2009. Selected crash statistics on passenger vehicles are also presen...
Large truck and bus crash facts, 2011.

Science.gov (United States)

2013-10-01

This annual edition of Large Truck and Bus Crash Facts contains descriptive statistics about fatal, injury, and property damage only crashes involving large trucks and buses in 2011. Selected crash statistics on passenger vehicles are also presen...
Immunohistochemical and molecular characteristics with prognostic significance in diffuse large B-cell lymphoma.

Directory of Open Access Journals (Sweden)

Carmen Bellas

Full Text Available Diffuse large B-cell lymphoma (DLBCL) is an aggressive non-Hodgkin lymphoma with marked biologic heterogeneity. We analyzed 100 cases of DLBCL to evaluate the prognostic value of immunohistochemical markers derived from the gene expression profiling-defined cell origin signature, including MYC, BCL2, BCL6, and FOXP1 protein expression. We also investigated genetic alterations in BCL2, BCL6, MYC and FOXP1 using fluorescence in situ hybridization and assessed their prognostic significance. BCL6 rearrangements were detected in 29% of cases, and BCL6 gene alteration (rearrangement and/or amplification) was associated with the non-germinal center B subtype (non-GCB). BCL2 translocation was associated with the GCB phenotype, and BCL2 protein expression was associated with the translocation and/or amplification of 18q21. MYC rearrangements were detected in 15% of cases, and MYC protein expression was observed in 29% of cases. FOXP1 expression, mainly of the non-GCB subtype, was demonstrated in 37% of cases. Co-expression of the MYC and BCL2 proteins, with non-GCB subtype predominance, was observed in 21% of cases. We detected an association between high FOXP1 expression and a high proliferation rate as well as a significant positive correlation between MYC overexpression and FOXP1 overexpression. MYC, BCL2 and FOXP1 expression were significant predictors of overall survival. The co-expression of MYC and BCL2 confers a poorer clinical outcome than MYC or BCL2 expression alone, whereas cases negative for both markers had the best outcomes. Our study confirms that DLBCL, characterized by the co-expression of MYC and BCL2 proteins, has a poor prognosis and establishes a significant positive correlation with MYC and FOXP1 over-expression in this entity.
Air Carrier Traffic Statistics.

Science.gov (United States)

2013-11-01

This report contains airline operating statistics for large certificated air carriers based on data reported to U.S. Department of Transportation (DOT) by carriers that hold a certificate issued under Section 401 of the Federal Aviation Act of 1958 a...
Air Carrier Traffic Statistics.

Science.gov (United States)

2012-07-01

This report contains airline operating statistics for large certificated air carriers based on data reported to U.S. Department of Transportation (DOT) by carriers that hold a certificate issued under Section 401 of the Federal Aviation Act of 1958 a...

Regularized Statistical Analysis of Anatomy

DEFF Research Database (Denmark)

Sjöstrand, Karl

2007-01-01

This thesis presents the application and development of regularized methods for the statistical analysis of anatomical structures. Focus is on structure-function relationships in the human brain, such as the connection between early onset of Alzheimer’s disease and shape changes of the corpus...... and mind. Statistics represents a quintessential part of such investigations as they are preluded by a clinical hypothesis that must be verified based on observed data. The massive amounts of image data produced in each examination pose an important and interesting statistical challenge...... efficient algorithms which make the analysis of large data sets feasible, and gives examples of applications....
Statistical aspects of determinantal point processes

DEFF Research Database (Denmark)

Lavancier, Frédéric; Møller, Jesper; Rubak, Ege Holger

The statistical aspects of determinantal point processes (DPPs) seem largely unexplored. We review the appealing properties of DPPs, demonstrate that they are useful models for repulsiveness, detail a simulation procedure, and provide freely available software for simulation and statistical...... inference. We pay special attention to stationary DPPs, where we give a simple condition ensuring their existence, construct parametric models, describe how they can be well approximated so that the likelihood can be evaluated and realizations can be simulated, and discuss how statistical inference...
Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

Science.gov (United States)

Gangnon, Ronald E

2012-03-01

The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.
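The idea of replacing an empirical Monte Carlo p-value by a Gumbel approximation can be sketched as follows. This is a generic illustration, not the local adjustment proposed in the paper: it assumes that maxima of the scan statistic simulated under the null are available, fits a Gumbel distribution by the method of moments, and reads the upper-tail probability from the fitted curve. The function names and the simulated stand-in data are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(samples):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Uses mean = loc + gamma * scale and variance = (pi * scale)^2 / 6.
    """
    scale = statistics.stdev(samples) * math.sqrt(6) / math.pi
    loc = statistics.fmean(samples) - EULER_GAMMA * scale
    return loc, scale


def gumbel_pvalue(observed, loc, scale):
    """Upper-tail probability P(T >= observed) under Gumbel(loc, scale)."""
    z = (observed - loc) / scale
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny p-values.
    return -math.expm1(-math.exp(-z))


# Stand-in for null replicates: each value is the maximum of 50 exponential
# draws, mimicking a maximum test statistic over 50 candidate clusters.
rng = random.Random(0)
null_maxima = [max(rng.expovariate(1.0) for _ in range(50)) for _ in range(999)]

loc, scale = fit_gumbel_moments(null_maxima)
print(f"loc={loc:.2f}, scale={scale:.2f}")
print(f"p-value for an observed maximum of 10: {gumbel_pvalue(10.0, loc, scale):.2e}")
```

The practical appeal of such an approximation is that it can return p-values smaller than 1/(number of replicates + 1), which a purely empirical Monte Carlo p-value cannot.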
SWORDS: A statistical tool for analysing large DNA sequences

Indian Academy of Sciences (India)

Unknown

These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in ... tions with the cellular processes like recombination, replication .... in DNA sequences using certain specific probability laws. (Pevzner et al ...
Statistical utilitarianism

OpenAIRE

Pivato, Marcus

2013-01-01

We show that, in a sufficiently large population satisfying certain statistical regularities, it is often possible to accurately estimate the utilitarian social welfare function, even if we only have very noisy data about individual utility functions and interpersonal utility comparisons. In particular, we show that it is often possible to identify an optimal or close-to-optimal utilitarian social choice using voting rules such as the Borda rule, approval voting, relative utilitarianism, or a...
Probability, statistics, and associated computing techniques

International Nuclear Information System (INIS)

James, F.

1983-01-01

This chapter attempts to explore the extent to which it is possible for the experimental physicist to find optimal statistical techniques to provide a unique and unambiguous quantitative measure of the significance of raw data. Discusses statistics as the inverse of probability; normal theory of parameter estimation; normal theory (Gaussian measurements); the universality of the Gaussian distribution; real-life resolution functions; combination and propagation of uncertainties; the sum or difference of 2 variables; local theory, or the propagation of small errors; error on the ratio of 2 discrete variables; the propagation of large errors; confidence intervals; classical theory; Bayesian theory; use of the likelihood function; the second derivative of the log-likelihood function; multiparameter confidence intervals; the method of MINOS; least squares; the Gauss-Markov theorem; maximum likelihood for uniform error distribution; the Chebyshev fit; the parameter uncertainties; the efficiency of the Chebyshev estimator; error symmetrization; robustness vs. efficiency; testing of hypotheses (e.g., the Neyman-Pearson test); goodness-of-fit; distribution-free tests; comparing two one-dimensional distributions; comparing multidimensional distributions; and permutation tests for comparing two point sets
Statistical assessment of crosstalk enrichment between gene groups in biological networks.

Science.gov (United States)

McCormack, Theodore; Frings, Oliver; Alexeyenko, Andrey; Sonnhammer, Erik L L

2013-01-01

Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
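The z-score at the core of this kind of assessment can be sketched in a few lines. The example is a simplified illustration, not the CrossTalkZ implementation: it assumes the number of links between two gene groups has already been counted in the real network and in a set of randomised networks, and it converts the z-score to a two-sided p-value under a normal approximation. The function name and the counts are hypothetical.

```python
import math
import statistics


def crosstalk_z(observed_links, null_links):
    """Z-score and two-sided normal p-value for an observed link count.

    null_links holds link counts between the same two groups measured in
    randomised networks (assumed to be roughly normally distributed).
    """
    mean = statistics.fmean(null_links)
    sd = statistics.stdev(null_links)
    z = (observed_links - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical counts from ten randomised networks and one observed count.
null_links = [12, 15, 9, 14, 11, 13, 10, 16, 12, 13]
z, p = crosstalk_z(25, null_links)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Because the p-value relies on the normal approximation, a normality check on the null counts (as the abstract mentions for the null distributions) is a sensible companion to the z-score.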
Permutation statistical methods: an integrated approach

CERN Document Server

Berry, Kenneth J; Johnston, Janis E

2016-01-01

This research monograph provides a synthesis of a number of statistical tests and measures, which, at first consideration, appear disjoint and unrelated. Numerous comparisons of permutation and classical statistical methods are presented, and the two methods are compared via probability values and, where appropriate, measures of effect size. Permutation statistical methods, compared to classical statistical methods, do not rely on theoretical distributions, avoid the usual assumptions of normality and homogeneity of variance, and depend only on the data at hand. This text takes a unique approach to explaining statistics by integrating a large variety of statistical methods, and establishing the rigor of a topic that to many may seem to be a nascent field in statistics. This topic is new in that it took modern computing power to make permutation methods available to people working in the mainstream of research. This research monograph addresses a statistically-informed audience, and can also easily serve as a ...
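As a concrete instance of a method that depends only on the data at hand, a two-sample permutation test can be sketched as below. This is a minimal Monte Carlo version, not taken from the monograph: it assumes the difference in means as the test statistic, and the function name, the number of resamples, and the sample data are illustrative choices.

```python
import random
import statistics


def permutation_test(x, y, n_resamples=9999, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign group labels at random by shuffling the pooled data.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return (extreme + 1) / (n_resamples + 1)


treated = [23.1, 25.4, 27.8, 24.9, 26.3, 28.0]
control = [20.2, 21.7, 19.8, 22.5, 21.1, 20.9]
print(f"permutation p-value: {permutation_test(treated, control):.4f}")
```

No normality or equal-variance assumption enters the calculation; the reference distribution is built entirely from relabelings of the observed values.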
Efficient statistically accurate algorithms for the Fokker-Planck equation in large dimensions

Science.gov (United States)

Chen, Nan; Majda, Andrew J.

2018-02-01

Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat-tailed, highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O(100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6
Modified Distribution-Free Goodness-of-Fit Test Statistic.

Science.gov (United States)

Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

2018-03-01

Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.
Statistical identification with hidden Markov models of large order splitting strategies in an equity market

Science.gov (United States)

Vaglica, Gabriella; Lillo, Fabrizio; Mantegna, Rosario N.

2010-07-01

Large trades in a financial market are usually split into smaller parts and traded incrementally over extended periods of time. We address these large trades as hidden orders. In order to identify and characterize hidden orders, we fit hidden Markov models to the time series of the sign of the tick-by-tick inventory variation of market members of the Spanish Stock Exchange. Our methodology probabilistically detects trading sequences, which are characterized by a significant majority of buy or sell transactions. We interpret these patches of sequential buying or selling transactions as proxies of the traded hidden orders. We find that the time, volume and number of transaction size distributions of these patches are fat tailed. Long patches are characterized by a large fraction of market orders and a low participation rate, while short patches have a large fraction of limit orders and a high participation rate. We observe the existence of a buy-sell asymmetry in the number, average length, average fraction of market orders and average participation rate of the detected patches. The detected asymmetry is clearly dependent on the local market trend. We also compare the hidden Markov model patches with those obtained with the segmentation method used in Vaglica et al (2008 Phys. Rev. E 77 036110), and we conclude that the former ones can be interpreted as a partition of the latter ones.
Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: an SPSS method to analyze univariate data.

Science.gov (United States)

Maric, Marija; de Haan, Else; Hogendoorn, Sanne M; Wolters, Lidewij H; Huizenga, Hilde M

2015-03-01

Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a data-analytic method to analyze univariate (i.e., one symptom) single-case data using the common package SPSS. This method can help the clinical researcher to investigate whether an intervention works as compared with a baseline period or another intervention type, and to determine whether symptom improvement is clinically significant. First, we describe the statistical method in a conceptual way and show how it can be implemented in SPSS. Simulation studies were performed to determine the number of observation points required per intervention phase. Second, to illustrate this method and its implications, we present a case study of an adolescent with anxiety disorders treated with cognitive-behavioral therapy techniques in an outpatient psychotherapy clinic, whose symptoms were regularly assessed before each session. We provide a description of the data analyses and results of this case study. Finally, we discuss the advantages and shortcomings of the proposed method. Copyright © 2014. Published by Elsevier Ltd.
Assessing large-scale weekly cycles in meteorological variables: a review

Directory of Open Access Journals (Sweden)

A. Sanchez-Lorenzo

2012-07-01

Full Text Available Several studies have claimed to have found significant weekly cycles of meteorological variables appearing over large domains, which can hardly be related to urban effects exclusively. Nevertheless, there is still an ongoing scientific debate about whether these large-scale weekly cycles exist, and some other studies fail to reproduce them with statistical significance. In addition to the lack of positive proof for the existence of these cycles, their possible physical explanations have been the subject of controversy in recent years. In this work we review the main results on this topic published over the past two decades, including a summary of the existence or non-existence of significant weekly weather cycles across different regions of the world, mainly over the US, Europe and Asia. In addition, some shortcomings of common statistical methods for analyzing weekly cycles are listed. Finally, we present a brief summary of the supposed causes of the weekly cycles, focusing on aerosol-cloud-radiation interactions and their impact on meteorological variables as a result of the weekly cycles of anthropogenic activities, together with possible directions for future research.
Non-local statistical label fusion for multi-atlas segmentation.

Science.gov (United States)

Asman, Andrew J; Landman, Bennett A

2013-02-01

Multi-atlas segmentation provides a general purpose, fully-automated approach for transferring spatial information from an existing dataset ("atlases") to a previously unseen context ("target") through image registration. The method to resolve voxelwise label conflicts between the registered atlases ("label fusion") has a substantial impact on segmentation quality. Ideally, statistical fusion algorithms (e.g., STAPLE) would result in accurate segmentations as they provide a framework to elegantly integrate models of rater performance. The accuracy of statistical fusion hinges upon accurately modeling the underlying process of how raters err. Despite success on human raters, current approaches inaccurately model multi-atlas behavior as they fail to seamlessly incorporate exogenous intensity information into the estimation process. As a result, locally weighted voting algorithms represent the de facto standard fusion approach in clinical applications. Moreover, regardless of the approach, fusion algorithms are generally dependent upon large atlas sets and highly accurate registration as they implicitly assume that the registered atlases form a collectively unbiased representation of the target. Herein, we propose a novel statistical fusion algorithm, Non-Local STAPLE (NLS). NLS reformulates the STAPLE framework from a non-local means perspective in order to learn what label an atlas would have observed, given perfect correspondence. Through this reformulation, NLS (1) seamlessly integrates intensity into the estimation process, (2) provides a theoretically consistent model of multi-atlas observation error, and (3) largely diminishes the need for large atlas sets and very high-quality registrations. We assess the sensitivity and optimality of the approach and demonstrate significant improvement in two empirical multi-atlas experiments. Copyright © 2012 Elsevier B.V. All rights reserved.
Performance studies of GooFit on GPUs vs RooFit on CPUs while estimating the statistical significance of a new physical signal

Science.gov (United States)

Di Florio, Adriano

2017-10-01

In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks in order to estimate the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on NVIDIA GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service and a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks theorem may or may not apply because its regularity conditions are not satisfied.
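The final step of a toy Monte Carlo significance estimate can be sketched independently of RooFit or GooFit. The example below is an illustration under stated assumptions, not the analysis code: it assumes a list of test-statistic values from background-only toys is already available, estimates the p-value as the fraction of toys at least as extreme as the observation (with the common add-one correction), and converts it to a one-sided Gaussian significance. The function name and the stand-in toy distribution are hypothetical.

```python
import random
from statistics import NormalDist


def toy_mc_significance(observed_stat, toy_stats):
    """Empirical p-value and one-sided Gaussian significance from toys."""
    n_exceed = sum(1 for t in toy_stats if t >= observed_stat)
    # The add-one correction keeps the estimate strictly positive.
    p_value = (n_exceed + 1) / (len(toy_stats) + 1)
    if p_value >= 1.0:
        return p_value, float("-inf")
    z = -NormalDist().inv_cdf(p_value)
    return p_value, z


# Stand-in toys: squared standard normals, i.e. a chi-square distribution with
# one degree of freedom, the asymptotic form expected when Wilks' theorem holds.
rng = random.Random(1)
toys = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]

p_value, z = toy_mc_significance(9.0, toys)
print(f"p = {p_value:.4f}, Z = {z:.2f} sigma")
```

The smallest p-value this estimator can report is 1/(N + 1) for N toys, which is why high significances require very large toy samples and why fast toy generation matters.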
An Entropy-Based Statistic for Genomewide Association Studies

OpenAIRE

Zhao, Jinying; Boerwinkle, Eric; Xiong, Momiao

2005-01-01

Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard χ² statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the difference...
Statistical Analysis and validation

NARCIS (Netherlands)

Hoefsloot, H.C.J.; Horvatovich, P.; Bischoff, R.

2013-01-01

In this chapter guidelines are given for the selection of a few biomarker candidates from a large number of compounds with a relative low number of samples. The main concepts concerning the statistical validation of the search for biomarkers are discussed. These complicated methods and concepts are
Robust statistical methods for significance evaluation and applications in cancer driver detection and biomarker discovery

DEFF Research Database (Denmark)

Madsen, Tobias

2017-01-01

In the present thesis I develop, implement and apply statistical methods for detecting genomic elements implicated in cancer development and progression. This is done in two separate bodies of work. The first uses the somatic mutation burden to distinguish cancer driver mutations from passenger m...
The Euclid Statistical Matrix Tool

Directory of Open Access Journals (Sweden)

Curtis Tilves

2017-06-01

Full Text Available Stataphobia, a term used to describe the fear of statistics and research methods, can result from a lack of proper training in statistical methods. Poor statistical methods training can have an effect on health policy decision making and may play a role in the low research productivity seen in developing countries. One way to reduce Stataphobia is to intervene in the teaching of statistics in the classroom; however, such an intervention must tackle several obstacles, including student interest in the material, multiple ways of learning materials, and language barriers. We present here the Euclid Statistical Matrix, a tool for combatting Stataphobia on a global scale. This free tool is comprised of popular statistical YouTube channels and web sources that teach and demonstrate statistical concepts in a variety of presentation methods. Working with international teams in Iran, Japan, Egypt, Russia, and the United States, we have also developed the Statistical Matrix in multiple languages to address language barriers to learning statistics. By utilizing already-established large networks, we are able to disseminate our tool to thousands of Farsi-speaking university faculty and students in Iran and the United States. Future dissemination of the Euclid Statistical Matrix throughout Central Asia and support from local universities may help to combat low research productivity in this region.
Large truck and bus crash facts, 2008.

Science.gov (United States)

2010-03-01

This annual edition of Large Truck and Bus Crash Facts contains descriptive statistics about fatal, injury, and property-damage-only crashes involving large trucks and buses in 2008. Selected crash statistics on passenger vehicles are also presen...

Statistical tests to compare motif count exceptionalities

Directory of Open Access Journals (Sweden)

Vandewalle Vincent

2007-03-01

Full Text Available Abstract Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Simply comparing the motif count p-values in each sequence is not sufficient to decide whether the motif is significantly more exceptional in one sequence than in the other. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly well suited to small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
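The exact binomial comparison described in this abstract can be illustrated with a minimal sketch. It assumes a simplified setting that is not taken from the paper: the motif counts in the two sequences are independent Poisson variables, so that, conditional on their total, the first count is binomial with success probability `e1 / (e1 + e2)`, where `e1` and `e2` are hypothetical expected counts under the background model. The function names are illustrative only, and overlapping motifs are ignored.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_two_sided(n1, n2, e1, e2):
    """Two-sided exact p-value for H0: both counts deviate from their
    expectations (e1, e2) by the same factor.

    Conditional on n1 + n2, the count n1 is Binomial(n1 + n2, e1 / (e1 + e2))
    under H0 (a standard property of independent Poisson counts).
    """
    n = n1 + n2
    p = e1 / (e1 + e2)
    observed = binom_pmf(n1, n, p)
    # Sum the probabilities of all outcomes that are no more likely than the
    # observed one; the small tolerance guards against rounding ties.
    total = sum(
        pm for k in range(n + 1)
        if (pm := binom_pmf(k, n, p)) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

For example, `exact_binomial_two_sided(10, 0, 1.0, 1.0)` asks how surprising it is that all ten occurrences fall in the first sequence when both sequences have the same expected count. Enumerating the whole distribution is only practical for small totals, which matches the abstract's advice to reserve the exact test for small counts.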
Statistical Dependence of Pipe Breaks on Explanatory Variables

Directory of Open Access Journals (Sweden)

Patricia Gómez-Martínez

2017-02-01

Full Text Available Aging infrastructure is the main challenge currently faced by water suppliers. Estimation of asset lifetime requires reliable criteria to plan asset repair and renewal strategies. To do so, pipe break prediction is one of the most important inputs. This paper analyzes the statistical dependence of pipe breaks on explanatory variables, determining their optimal combination and quantifying their influence on failure prediction accuracy. A large set of registered data from the Madrid water supply network, managed by Canal de Isabel II, has been filtered, classified and studied. Several statistical Bayesian models have been built and validated from the available information with a technique that combines reference periods of time as well as geographical location. Statistical models of increasing complexity are built from zero up to five explanatory variables following two approaches: a set of independent variables or a combination of two joint variables plus an additional number of independent variables. With the aim of finding the variable combination that provides the most accurate prediction, models are compared following an objective validation procedure based on the model skill to predict the number of pipe breaks in a large set of geographical locations. As expected, model performance improves as the number of explanatory variables increases. However, the rate of improvement is not constant. Performance metrics improve significantly up to three variables, but the trend flattens for higher order models, especially in trunk mains, where performance is reduced. Slight differences are found between trunk mains and distribution lines when selecting the most influential variables and models.
A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics.

Science.gov (United States)

Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu

2017-12-07

Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Irrigated Area Maps and Statistics of India Using Remote Sensing and National Statistics

Directory of Open Access Journals (Sweden)

Prasad S. Thenkabail

2009-04-01

Full Text Available The goal of this research was to compare the remote-sensing derived irrigated areas with census-derived statistics reported in the national system. India, which has nearly 30% of global annualized irrigated areas (AIAs) and is, along with China, the leading irrigated area country in the world, was chosen for the study. Irrigated areas were derived for nominal year 2000 using time-series remote sensing at two spatial resolutions: (a) 10-km Advanced Very High Resolution Radiometer (AVHRR) and (b) 500-m Moderate Resolution Imaging Spectroradiometer (MODIS). These areas were compared with the Indian National Statistical Data on irrigated areas reported by the: (a) Directorate of Economics and Statistics (DES) of the Ministry of Agriculture (MOA), and (b) Ministry of Water Resources (MoWR). A state-by-state comparison of remote sensing derived irrigated areas with MoWR derived irrigation potential utilized (IPU), an equivalent of AIA, provided a high degree of correlation with R² values of: (a) 0.79 with 10-km, and (b) 0.85 with MODIS 500-m. However, the remote sensing derived irrigated area estimates for India were consistently higher than the irrigated areas reported by the national statistics. The remote sensing derived total area available for irrigation (TAAI), which does not consider intensity of irrigation, was 101 million hectares (Mha) using 10-km and 113 Mha using 500-m. The AIAs, which consider intensity of irrigation, were 132 Mha using 10-km and 146 Mha using 500-m. In contrast the IPU, an equivalent of AIAs, as reported by MoWR was 83 Mha. There are “large variations” in irrigated area statistics reported, even between two ministries (e.g., Directorate of Statistics of Ministry of Agriculture and Ministry of Water Resources) of the same national system. The causes include: (a) reluctance on part of the states to furnish irrigated area data in view of their vested interests in sharing of water, and (b) reporting of large volumes of data
Multivariate Statistical Analysis Software Technologies for Astrophysical Research Involving Large Data Bases

Science.gov (United States)

Djorgovski, S. G.

1994-01-01

We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complex database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications and has
Multivariate statistical analysis software technologies for astrophysical research involving large data bases

Science.gov (United States)

Djorgovski, S. George

1994-01-01

We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications, and has produced real, published results.
Tree-space statistics and approximations for large-scale analysis of anatomical trees

DEFF Research Database (Denmark)

Feragen, Aasa; Owen, Megan; Petersen, Jens

2013-01-01

parametrize the relevant parts of tree-space well. Using the developed approximate statistics, we illustrate how the structure and geometry of airway trees vary across a population and show that airway trees with Chronic Obstructive Pulmonary Disease come from a different distribution in tree-space than...
Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution.

Science.gov (United States)

Gu, Xun; Wang, Yufeng; Gu, Jianying

2002-06-01

The classical (two-round) hypothesis of vertebrate genome duplication proposes two successive whole-genome duplication(s) (polyploidizations) predating the origin of fishes, a view now being seriously challenged. As the debate largely concerns the relative merits of the 'big-bang mode' theory (large-scale duplication) and the 'continuous mode' theory (constant creation by small-scale duplications), we tested whether a significant proportion of paralogous genes in the contemporary human genome was indeed generated in the early stage of vertebrate evolution. After an extensive search of major databases, we dated 1,739 gene duplication events from the phylogenetic analysis of 749 vertebrate gene families. We found a pattern characterized by two waves (I, II) and an ancient component. Wave I represents a recent gene family expansion by tandem or segmental duplications, whereas wave II, a rapid paralogous gene increase in the early stage of vertebrate evolution, supports the idea of genome duplication(s) (the big-bang mode). Further analysis indicated that large- and small-scale gene duplications both make a significant contribution during the early stage of vertebrate evolution to build the current hierarchy of the human proteome.
Fractional statistics and fractional quantized Hall effect

International Nuclear Information System (INIS)

Tao, R.; Wu, Y.S.

1985-01-01

The authors suggest that the origin of the odd-denominator rule observed in the fractional quantized Hall effect (FQHE) may lie in fractional statistics which govern quasiparticles in FQHE. A theorem concerning statistics of clusters of quasiparticles implies that fractional statistics do not allow coexistence of a large number of quasiparticles at fillings with an even denominator. Thus, no Hall plateau can be formed at these fillings, regardless of the presence of an energy gap. 15 references
Statistical power and the Rorschach: 1975-1991.

Science.gov (United States)

Acklin, M W; McDowell, C J; Orndoff, S

1992-10-01

The Rorschach Inkblot Test has been the source of long-standing controversies as to its nature and its psychometric properties. Consistent with behavioral science research in general, the concept of statistical power has been entirely ignored by Rorschach researchers. The concept of power is introduced and discussed, and a power survey of the Rorschach literature published between 1975 and 1991 in the Journal of Personality Assessment, Journal of Consulting and Clinical Psychology, Journal of Abnormal Psychology, Journal of Clinical Psychology, Journal of Personality, Psychological Bulletin, American Journal of Psychiatry, and Journal of Personality and Social Psychology was undertaken. Power was calculated for 2,300 statistical tests in 158 journal articles. Power to detect small, medium, and large effect sizes was .13, .56, and .85, respectively. Similar to the findings in other power surveys conducted on behavioral science research, we concluded that Rorschach research is underpowered to detect the differences under investigation. This undoubtedly contributes to the inconsistency of research findings which has been a source of controversy and criticism over the decades. It appears that research conducted according to the Comprehensive System for the Rorschach is more powerful. Recommendations are offered for improving power and strengthening the design sensitivity of Rorschach research, including increasing sample sizes, use of parametric statistics, reduction of error variance, more accurate reporting of findings, and editorial policies reflecting concern about the magnitude of relationships beyond an exclusive focus on levels of statistical significance.
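To make the notion of power to detect small, medium, and large effects concrete, here is a minimal sketch. Everything in it is an assumption for illustration rather than the survey's procedure: it uses Cohen's conventional standardized effect sizes (d = 0.2, 0.5, 0.8), a two-sided two-sample comparison with equal group sizes, and a normal approximation to the test statistic.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    d            standardized mean difference (Cohen's d), assumed known
    n_per_group  number of observations in each of the two groups
    alpha        two-sided significance level
    Uses a normal approximation, so it is slightly optimistic for small n.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # mean of the test statistic under H1
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
        print(label, round(approx_power(d, n_per_group=20), 2))
```

With a fixed sample size, power rises steeply with effect size, which is the pattern the survey reports. Increasing `n_per_group` is the most direct of the remedies listed in the abstract.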
Non-extensive statistical aspects of clustering and nuclear multi-fragmentation

International Nuclear Information System (INIS)

Calboreanu, A.

2002-01-01

Recent developments concerning the application of non-extensive Tsallis statistics to describe clustering phenomena are briefly presented. Cluster formation is a common feature of a large number of physical phenomena encountered in molecular and nuclear physics, astrophysics, condensed matter and biophysics. Common to all these is the large number of degrees of freedom, which justifies a statistical approach. However, the conventional statistical mechanics paradigm seems to fail in dealing with clustering. Whether this is due to the prevalence of complex dynamical constraints, or is a manifestation of new statistics, is a subject of considerable interest that has been intensively debated during the last few years. The Tsallis conjecture has proved extremely appealing due to its rather elegant and transparent basic arguments. We present here evidence for its adequacy for the study of a large class of physical phenomena related to cluster formation. An application to nuclear multi-fragmentation is presented. (author)
Applied Statistics Using SPSS, STATISTICA, MATLAB and R

CERN Document Server

De Sá, Joaquim P Marques

2007-01-01

This practical reference provides a comprehensive introduction and tutorial on the main statistical analysis topics, demonstrating their solution with the most common software packages. Intended for anyone needing to apply statistical analysis to a large variety of science and engineering problems, the book explains and shows how to use SPSS, MATLAB, STATISTICA and R for analysis such as data description, statistical inference, classification and regression, factor analysis, survival data and directional statistics. It concisely explains key concepts and methods, illustrated by practical examp
Experimental investigation of statistical models describing distribution of counts

International Nuclear Information System (INIS)

Salma, I.; Zemplen-Papp, E.

1992-01-01

The binomial, Poisson and modified Poisson models which are used for describing the statistical nature of the distribution of counts are compared theoretically, and conclusions for application are considered. The validity of the Poisson and the modified Poisson statistical distribution for observing k events in a short time interval is investigated experimentally for various measuring times. The experiments to measure the influence of the significant radioactive decay were performed with $^{89}$Y$^{m}$ ($T_{1/2} = 16.06$ s), using a multichannel analyser (4096 channels) in the multiscaling mode. According to the results, Poisson statistics describe the counting experiment for short measuring times (up to $T = 0.5\,T_{1/2}$) and its application is recommended. However, analysis of the data demonstrated, with confidence, that for long measurements ($T \geq T_{1/2}$) Poisson distribution is not valid and the modified Poisson function is preferable. The practical implications in calculating uncertainties and in optimizing the measuring time are discussed. Differences between the standard deviations evaluated on the basis of the Poisson and binomial models are especially significant for experiments with long measuring time ($T/T_{1/2} \geq 2$) and/or large detection efficiency ($\varepsilon > 0.30$). Optimization of the measuring time for paired observations yields the same solution for either the binomial or the Poisson distribution. (orig.)
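The size of the difference between the Poisson and binomial standard deviations can be estimated with a short calculation. The sketch below assumes the textbook binomial model of a decaying source, which is not spelled out in the abstract: each of the nuclei initially present is detected during the measuring time T with probability p = ε(1 − 2^(−T/T½)), and background and dead time are ignored. The binomial standard deviation is then √(1 − p) times the Poisson one. The function names are illustrative.

```python
from math import sqrt


def detection_probability(t_over_half_life, efficiency):
    """Probability that one initially present nucleus decays during the
    measurement AND is detected (assumed model: no background, no dead time)."""
    decayed_fraction = 1.0 - 2.0 ** (-t_over_half_life)
    return efficiency * decayed_fraction


def sd_ratio_binomial_to_poisson(t_over_half_life, efficiency):
    """Ratio sigma_binomial / sigma_poisson = sqrt(1 - p) for the same mean."""
    p = detection_probability(t_over_half_life, efficiency)
    return sqrt(1.0 - p)


if __name__ == "__main__":
    # Short measurement: the two models nearly agree.
    print(sd_ratio_binomial_to_poisson(0.05, 0.30))
    # Long measurement with high efficiency: Poisson overstates the spread.
    print(sd_ratio_binomial_to_poisson(2.0, 0.30))
```

Under these assumptions the ratio stays close to 1 when either the measuring time or the efficiency is small, and departs from 1 in the regime the abstract singles out (T/T½ ≥ 2 and ε > 0.30).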
Bayesian statistics in radionuclide metrology: measurement of a decaying source

International Nuclear Information System (INIS)

Bochud, F. O.; Bailat, C.J.; Laedermann, J.P.

2007-01-01

The most intuitive way of defining a probability is perhaps through the frequency at which it appears when a large number of trials are realized in identical conditions. The probability derived from the obtained histogram characterizes the so-called frequentist or conventional statistical approach. In this sense, probability is defined as a physical property of the observed system. By contrast, in Bayesian statistics, a probability is not a physical property or a directly observable quantity, but a degree of belief or an element of inference. The goal of this paper is to show how Bayesian statistics can be used in radionuclide metrology and what its advantages and disadvantages are compared with conventional statistics. This is performed through the example of an yttrium-90 source typically encountered in environmental surveillance measurement. Because of the very low activity of this kind of source and the small half-life of the radionuclide, this measurement takes several days, during which the source decays significantly. Several methods are proposed to compute simultaneously the number of unstable nuclei at a given reference time, the decay constant and the background. Asymptotically, all approaches give the same result. However, Bayesian statistics produces coherent estimates and confidence intervals in a much smaller number of measurements. Apart from the conceptual understanding of statistics, the main difficulty that could deter radionuclide metrologists from using Bayesian statistics is the complexity of the computation. (authors)
Statistical process control charts for attribute data involving very large sample sizes: a review of problems and solutions.

Science.gov (United States)

Mohammed, Mohammed A; Panesar, Jagdeep S; Laney, David B; Wilson, Richard

2013-04-01

The use of statistical process control (SPC) charts in healthcare is increasing. The primary purpose of SPC is to distinguish between common-cause variation which is attributable to the underlying process, and special-cause variation which is extrinsic to the underlying process. This is important because improvement under common-cause variation requires action on the process, whereas special-cause variation merits an investigation to first find the cause. Nonetheless, when dealing with attribute or count data (eg, number of emergency admissions) involving very large sample sizes, traditional SPC charts often produce tight control limits with most of the data points appearing outside the control limits. This can give a false impression of common and special-cause variation, and potentially misguide the user into taking the wrong actions. Given the growing availability of large datasets from routinely collected databases in healthcare, there is a need to present a review of this problem (which arises because traditional attribute charts only consider within-subgroup variation) and its solutions (which consider within and between-subgroup variation), which involve the use of the well-established measurements chart and the more recently developed attribute charts based on Laney's innovative approach. We close by making some suggestions for practice.
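The contrast between a traditional p-chart and a Laney-style p′ chart can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the data are hypothetical, limits are set at three sigma, and the between-subgroup factor is estimated from the average moving range of the standardized proportions divided by 1.128 (the usual constant for moving ranges of size two).

```python
from math import sqrt


def p_chart_limits(counts, sizes):
    """Traditional p-chart limits: within-subgroup (binomial) variation only."""
    pbar = sum(counts) / sum(sizes)
    return [
        (pbar - 3 * sqrt(pbar * (1 - pbar) / n),
         pbar + 3 * sqrt(pbar * (1 - pbar) / n))
        for n in sizes
    ]


def laney_p_prime_limits(counts, sizes):
    """Laney-style p' limits: binomial sigma inflated by sigma_z, which
    captures between-subgroup variation. Needs at least two subgroups."""
    pbar = sum(counts) / sum(sizes)
    sigma_p = [sqrt(pbar * (1 - pbar) / n) for n in sizes]
    z = [(c / n - pbar) / s for c, n, s in zip(counts, sizes, sigma_p)]
    moving_ranges = [abs(b - a) for a, b in zip(z, z[1:])]
    sigma_z = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return [(pbar - 3 * s * sigma_z, pbar + 3 * s * sigma_z) for s in sigma_p]


if __name__ == "__main__":
    # Hypothetical monthly data: events out of very large denominators.
    sizes = [100_000] * 6
    counts = [5000, 5200, 4900, 5300, 4800, 5100]
    print(p_chart_limits(counts, sizes)[0])
    print(laney_p_prime_limits(counts, sizes)[0])
```

With denominators this large the binomial limits are very tight, so ordinary month-to-month variation falls outside them. The p′ limits widen by the factor `sigma_z` and flag fewer points.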
Crash risk factors for interstate large trucks in North Carolina.

Science.gov (United States)

Teoh, Eric R; Carter, Daniel L; Smith, Sarah; McCartt, Anne T

2017-09-01

Provide an updated examination of risk factors for large truck involvements in crashes resulting in injury or death. A matched case-control study was conducted in North Carolina of large trucks operated by interstate carriers. Cases were defined as trucks involved in crashes resulting in fatal or non-fatal injury, and one control truck was matched on the basis of location, weekday, time of day, and truck type. The matched-pair odds ratio provided an estimate of the effect of various driver, vehicle, or carrier factors. Out-of-service (OOS) brake violations tripled the risk of crashing; any OOS vehicle defect increased crash risk by 362%. Higher historical crash rates (fatal, injury, or all crashes) of the carrier were associated with increased risk of crashing. Operating on a short-haul exemption increased crash risk by 383%. Antilock braking systems reduced crash risk by 65%. All of these results were statistically significant at the 95% confidence level. Other safety technologies also showed estimated benefits, although not statistically significant. With the exception of the finding that short-haul exemption is associated with increased crash risk, results largely bolster what is currently known about large truck crash risk and reinforce current enforcement practices. Results also suggest vehicle safety technologies can be important in lowering crash risk. This means that as safety technology continues to penetrate the fleet, whether from voluntary usage or government mandates, reductions in large truck crashes may be achieved. Practical application: Results imply that increased enforcement and use of crash avoidance technologies can improve the large truck crash problem. Copyright © 2017 National Safety Council and Elsevier Ltd. All rights reserved.
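For readers unfamiliar with the matched-pair odds ratio, the following sketch shows the standard calculation from discordant pairs. The pair counts are hypothetical and the Wald-type 95% interval on the log scale is an assumption, since the abstract does not state how its intervals were computed.

```python
from math import exp, log, sqrt


def matched_pair_odds_ratio(case_only, control_only, z=1.96):
    """Odds ratio for 1:1 matched case-control data.

    case_only     pairs where only the case truck had the factor
    control_only  pairs where only the control truck had the factor
    Concordant pairs carry no information and are ignored.
    Returns (odds_ratio, lower, upper) with a Wald interval on the log scale.
    """
    odds_ratio = case_only / control_only
    se_log = sqrt(1 / case_only + 1 / control_only)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def percent_increase(odds_ratio):
    """Express an odds ratio as a percentage increase in risk (odds)."""
    return (odds_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(matched_pair_odds_ratio(case_only=30, control_only=10))
    print(percent_increase(4.62))
```

Read this way, a reported increase of 362% corresponds to an odds ratio of about 4.62, and a result is significant at the 95% level when the interval excludes 1.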
Statistical lamb wave localization based on extreme value theory

Science.gov (United States)

Harley, Joel B.

2018-04-01

Guided wave localization methods based on delay-and-sum imaging, matched field processing, and other techniques have been designed and researched to create images that locate and describe structural damage. The maximum value of these images typically represents an estimated damage location. Yet, it is often unclear if this maximum value, or any other value in the image, is a statistically significant indicator of damage. Furthermore, there are currently few, if any, approaches to assess the statistical significance of guided wave localization images. As a result, we present statistical delay-and-sum and statistical matched field processing localization methods to create statistically significant images of damage. Our framework uses constant rate of false alarm statistics and extreme value theory to detect damage with little prior information. We demonstrate our methods with in situ guided wave data from an aluminum plate to detect two 0.75 cm diameter holes. Our results show an expected improvement in statistical significance as the number of sensors increases. With seventeen sensors, both methods successfully detect damage with statistical significance.
Statistical Characterization of the Chandra Source Catalog

Science.gov (United States)

Primini, Francis A.; Houck, John C.; Davis, John E.; Nowak, Michael A.; Evans, Ian N.; Glotfelty, Kenny J.; Anderson, Craig S.; Bonaventura, Nina R.; Chen, Judy C.; Doe, Stephen M.; Evans, Janet D.; Fabbiano, Giuseppina; Galle, Elizabeth C.; Gibbs, Danny G.; Grier, John D.; Hain, Roger M.; Hall, Diane M.; Harbo, Peter N.; He, Xiangqun Helen; Karovska, Margarita; Kashyap, Vinay L.; Lauer, Jennifer; McCollough, Michael L.; McDowell, Jonathan C.; Miller, Joseph B.; Mitschang, Arik W.; Morgan, Douglas L.; Mossman, Amy E.; Nichols, Joy S.; Plummer, David A.; Refsdal, Brian L.; Rots, Arnold H.; Siemiginowska, Aneta; Sundheim, Beth A.; Tibbetts, Michael S.; Van Stone, David W.; Winkelman, Sherry L.; Zografou, Panagoula

2011-06-01

The first release of the Chandra Source Catalog (CSC) contains ~95,000 X-ray sources in a total area of 0.75% of the entire sky, using data from ~3900 separate ACIS observations of a multitude of different types of X-ray sources. In order to maximize the scientific benefit of such a large, heterogeneous data set, careful characterization of the statistical properties of the catalog, i.e., completeness, sensitivity, false source rate, and accuracy of source properties, is required. Characterization efforts of other large Chandra catalogs, such as the ChaMP Point Source Catalog or the 2 Mega-second Deep Field Surveys, while informative, cannot serve this purpose, since the CSC analysis procedures are significantly different and the range of allowable data is much less restrictive. We describe here the characterization process for the CSC. This process includes both a comparison of real CSC results with those of other, deeper Chandra catalogs of the same targets and extensive simulations of blank-sky and point-source populations.
Managing Macroeconomic Risks by Using Statistical Simulation

Directory of Open Access Journals (Sweden)

Merkaš Zvonko

2017-06-01

Full Text Available The paper analyzes the possibilities of using statistical simulation in the macroeconomic risks measurement. At the level of the whole world, macroeconomic risks are, due to the excessive imbalance, significantly increased. Using analytical statistical methods and Monte Carlo simulation, the authors interpret the collected data sets, compare and analyze them in order to mitigate potential risks. The empirical part of the study is a qualitative case study that uses statistical methods and Monte Carlo simulation for managing macroeconomic risks, which is the central theme of this work. Application of statistical simulation is necessary because the system, for which it is necessary to specify the model, is too complex for an analytical approach. The objective of the paper is to point out the previous need for consideration of significant macroeconomic risks, particularly in terms of the number of the unemployed in the society, the movement of gross domestic product and the country’s credit rating, and the use of data previously processed by statistical methods, through statistical simulation, to analyze the existing model of managing the macroeconomic risks and suggest elements for a management model development that will allow, with the lowest possible probability and consequences, the emergence of the recent macroeconomic risks. The stochastic characteristics of the system, defined by random variables as input values defined by probability distributions, require the performance of a large number of iterations on which to record the output of the model and calculate the mathematical expectations. The paper expounds the basic procedures and techniques of discrete statistical simulation applied to systems that can be characterized by a number of events which represent a set of circumstances that have caused a change in the system’s state and the possibility of its application in the field of assessment of macroeconomic risks. The method has no
Multifocal Gastric Ulcers Caused by Diffuse Large B Cell Lymphoma in a Patient With Significant Weight Loss

OpenAIRE

Gromski, Mark A.; Peng, Jennifer L.; Zhou, Jiehao; Masuoka, Howard C.; Suvannasankha, Attaya; Liangpunsakul, Suthat

2016-01-01

Primary gastrointestinal (GI) lymphoma is a heterogeneous disease with varied clinical presentations. The stomach is the most common GI site and accounts for 70% to 75% of GI lymphomas. We present a patient with gastric diffuse large B cell lymphoma (DLBCL) who presented with significant weight loss, early satiety, and multifocal ulcerated gastric lesions. Esophagoduodenoscopy should be performed in patients presenting with warning symptoms as in our case. Diagnosis is usually made by endosco...

An introduction to statistical thermodynamics

CERN Document Server

Hill, Terrell L

1987-01-01

"A large number of exercises of a broad range of difficulty make this book even more useful…a good addition to the literature on thermodynamics at the undergraduate level." - Philosophical Magazine Although written on an introductory level, this wide-ranging text provides extensive coverage of topics of current interest in equilibrium statistical mechanics. Indeed, certain traditional topics are given somewhat condensed treatment to allow room for a survey of more recent advances. The book is divided into four major sections. Part I deals with the principles of quantum statistical mechanics a
GPU-computing in econophysics and statistical physics

Science.gov (United States)

Preis, T.

2011-03-01

A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction into the field of GPU computing and includes examples. In particular computationally expensive analyses employed in financial market context are coded on a graphics card architecture which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
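For readers unfamiliar with the Ising model mentioned above, the following is a minimal CPU-only sketch of a Metropolis update on a two-dimensional lattice. It is not the GPU implementation from the article; the lattice size, temperature, coupling J = 1 and all function names are assumptions chosen for illustration.

```python
import math
import random


def metropolis_sweep(spins, size, temperature, rng):
    """Attempt size*size single-spin flips on a periodic square lattice."""
    for _ in range(size * size):
        i = rng.randrange(size)
        j = rng.randrange(size)
        # Sum of the four nearest neighbours with periodic boundaries.
        neighbours = (
            spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
            + spins[i][(j + 1) % size] + spins[i][(j - 1) % size]
        )
        # Energy change of flipping spin (i, j), assuming coupling J = 1.
        delta_e = 2 * spins[i][j] * neighbours
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -spins[i][j]


def magnetization(spins):
    """Mean spin value, between -1 and 1."""
    size = len(spins)
    return sum(sum(row) for row in spins) / (size * size)


def simulate(size=8, temperature=1.0, sweeps=50, seed=0):
    """Start from an all-up lattice and run a fixed number of sweeps."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]
    for _ in range(sweeps):
        metropolis_sweep(spins, size, temperature, rng)
    return spins
```

Each proposed flip depends only on the four neighbouring spins, which is one reason the model is often described as well suited to parallel hardware; a GPU port would typically update many non-adjacent sites at once rather than one at a time as here.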
Statistical searches for microlensing events in large, non-uniformly sampled time-domain surveys: A test using palomar transient factory data

Energy Technology Data Exchange (ETDEWEB)

Price-Whelan, Adrian M.; Agüeros, Marcel A. [Department of Astronomy, Columbia University, 550 W 120th Street, New York, NY 10027 (United States); Fournier, Amanda P. [Department of Physics, Broida Hall, University of California, Santa Barbara, CA 93106 (United States); Street, Rachel [Las Cumbres Observatory Global Telescope Network, Inc., 6740 Cortona Drive, Suite 102, Santa Barbara, CA 93117 (United States); Ofek, Eran O. [Benoziyo Center for Astrophysics, Weizmann Institute of Science, 76100 Rehovot (Israel); Covey, Kevin R. [Lowell Observatory, 1400 West Mars Hill Road, Flagstaff, AZ 86001 (United States); Levitan, David; Sesar, Branimir [Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA 91125 (United States); Laher, Russ R.; Surace, Jason, E-mail: adrn@astro.columbia.edu [Spitzer Science Center, California Institute of Technology, Mail Stop 314-6, Pasadena, CA 91125 (United States)

2014-01-20

Many photometric time-domain surveys are driven by specific goals, such as searches for supernovae or transiting exoplanets, which set the cadence with which fields are re-imaged. In the case of the Palomar Transient Factory (PTF), several sub-surveys are conducted in parallel, leading to non-uniform sampling over its ∼20,000 deg² footprint. While the median 7.26 deg² PTF field has been imaged ∼40 times in the R band, ∼2300 deg² have been observed >100 times. We use PTF data to study the trade-off between searching for microlensing events in a survey whose footprint is much larger than that of typical microlensing searches, but with far-from-optimal time sampling. To examine the probability that microlensing events can be recovered in these data, we test statistics used on uniformly sampled data to identify variables and transients. We find that the von Neumann ratio performs best for identifying simulated microlensing events in our data. We develop a selection method using this statistic and apply it to data from fields with >10 R-band observations, 1.1 × 10⁹ light curves, uncovering three candidate microlensing events. We lack simultaneous, multi-color photometry to confirm these as microlensing events. However, their number is consistent with predictions for the event rate in the PTF footprint over the survey's three years of operations, as estimated from near-field microlensing models. This work can help constrain all-sky event rate predictions and tests microlensing signal recovery in large data sets, which will be useful to future time-domain surveys, such as that planned with the Large Synoptic Survey Telescope.
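The von Neumann ratio named in this abstract is commonly defined as the mean squared successive difference divided by the sample variance. The sketch below uses that textbook definition with an n - 1 normalization in both terms; the exact normalization and any error weighting used in the PTF analysis are not stated in the abstract, so treat those details as assumptions.

```python
from statistics import mean


def von_neumann_ratio(values):
    """Mean squared successive difference divided by the sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mu = mean(values)
    variance = sum((x - mu) ** 2 for x in values) / (n - 1)
    if variance == 0:
        raise ValueError("constant series has undefined ratio")
    successive = sum((b - a) ** 2 for a, b in zip(values, values[1:])) / (n - 1)
    return successive / variance
```

Uncorrelated noise gives a ratio near 2, while a smooth, slowly varying signal such as a microlensing bump drives the ratio toward 0, which is why a low value can flag candidate events.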
Simulating metabolism with statistical thermodynamics.

Science.gov (United States)

Cannon, William R

2014-01-01

New methods are needed for large scale modeling of metabolism that predict metabolite levels and characterize the thermodynamics of individual reactions and pathways. Current approaches use either kinetic simulations, which are difficult to extend to large networks of reactions because of the need for rate constants, or flux-based methods, which have a large number of feasible solutions because they are unconstrained by the law of mass action. This report presents an alternative modeling approach based on statistical thermodynamics. The principles of this approach are demonstrated using a simple set of coupled reactions, and then the system is characterized with respect to the changes in energy, entropy, free energy, and entropy production. Finally, the physical and biochemical insights that this approach can provide for metabolism are demonstrated by application to the tricarboxylic acid (TCA) cycle of Escherichia coli. The reaction and pathway thermodynamics are evaluated and predictions are made regarding changes in concentration of TCA cycle intermediates due to 10- and 100-fold changes in the ratio of NAD+:NADH concentrations. Finally, the assumptions and caveats regarding the use of statistical thermodynamics to model non-equilibrium reactions are discussed.
Numerical reconstruction of photon-number statistics from photocounting statistics: Regularization of an ill-posed problem

International Nuclear Information System (INIS)

Starkov, V. N.; Semenov, A. A.; Gomonay, H. V.

2009-01-01

We demonstrate a practical possibility of loss compensation in measured photocounting statistics in the presence of dark counts and background radiation noise. It is shown that satisfactory results are obtained even in the case of low detection efficiency and large experimental errors.
Big Data as a Source for Official Statistics

Directory of Open Access Journals (Sweden)

Daas Piet J.H.

2015-06-01

More and more data are being produced by an increasing number of electronic devices physically surrounding us and on the internet. The large amount of data and the high frequency at which they are produced have resulted in the introduction of the term ‘Big Data’. Because these data reflect many different aspects of our daily lives and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. This article discusses the exploration of both opportunities and challenges for official statistics associated with the application of Big Data. Experiences gained with analyses of large amounts of Dutch traffic loop detection records and Dutch social media messages are described to illustrate the topics characteristic of the statistical analysis and use of Big Data.
Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

Science.gov (United States)

Bansal, Ravi; Peterson, Bradley S

2018-06-01

Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal
The significance of Good Chair as part of children’s school and home environment in the preventive treatment of body statistics distortions

OpenAIRE

Mirosław Mrozkowiak; Hanna Żukowska

2015-01-01

Mrozkowiak Mirosław, Żukowska Hanna. Znaczenie Dobrego Krzesła, jako elementu szkolnego i domowego środowiska ucznia, w profilaktyce zaburzeń statyki postawy ciała = The significance of Good Chair as part of children’s school and home environment in the preventive treatment of body statistics distortions. Journal of Education, Health and Sport. 2015;5(7):179-215. ISSN 2391-8306. DOI 10.5281/zenodo.19832 http://ojs.ukw.edu.pl/index.php/johs/article/view/2015%3B5%287%29%3A179-215 https:...
Falling in the elderly: Do statistical models matter for performance criteria of fall prediction? Results from two large population-based studies.

Science.gov (United States)

Kabeshova, Anastasiia; Launay, Cyrille P; Gromov, Vasilii A; Fantino, Bruno; Levinoff, Elise J; Allali, Gilles; Beauchet, Olivier

2016-01-01

To compare performance criteria (i.e., sensitivity, specificity, positive predictive value, negative predictive value, area under receiver operating characteristic curve and accuracy) of linear and non-linear statistical models for fall risk in older community-dwellers. Participants were recruited in two large population-based studies, "Prévention des Chutes, Réseau 4" (PCR4, n=1760, cross-sectional design, retrospective collection of falls) and "Prévention des Chutes Personnes Agées" (PCPA, n=1765, cohort design, prospective collection of falls). Six linear statistical models (i.e., logistic regression, discriminant analysis, Bayes network algorithm, decision tree, random forest, boosted trees), three non-linear statistical models corresponding to artificial neural networks (multilayer perceptron, genetic algorithm and neuroevolution of augmenting topologies [NEAT]) and the adaptive neuro-fuzzy inference system (ANFIS) were used. Falls ≥1 characterizing fallers and falls ≥2 characterizing recurrent fallers were used as outcomes. Data of studies were analyzed separately and together. NEAT and ANFIS had better performance criteria compared to other models. The highest performance criteria were reported with NEAT when using PCR4 database and falls ≥1, and with both NEAT and ANFIS when pooling data together and using falls ≥2. However, sensitivity and specificity were unbalanced. Sensitivity was higher than specificity when identifying fallers, whereas the converse was found when predicting recurrent fallers. Our results showed that NEAT and ANFIS were non-linear statistical models with the best performance criteria for the prediction of falls but their sensitivity and specificity were unbalanced, underscoring that models should be used respectively for the screening of fallers and the diagnosis of recurrent fallers. Copyright © 2015 European Federation of Internal Medicine. Published by Elsevier B.V. All rights reserved.
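Most of the performance criteria listed in this abstract follow directly from a 2×2 confusion matrix. The sketch below computes them from standard definitions; the function name and the example counts are hypothetical and are not taken from the PCR4 or PCPA data. The area under the ROC curve is omitted because it needs per-subject scores rather than counts.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return standard criteria from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fallers correctly identified
        "specificity": tn / (tn + fp),  # non-fallers correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=40, fn=10, fp=20, tn=30)
```

In this made-up example sensitivity (0.8) exceeds specificity (0.6), the same kind of imbalance the abstract describes as favouring use for screening rather than diagnosis.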
Reconstructing Macroeconomics Based on Statistical Physics

Science.gov (United States)

Aoki, Masanao; Yoshikawa, Hiroshi

We believe that time has come to integrate the new approach based on statistical physics or econophysics into macroeconomics. Toward this goal, there must be more dialogues between physicists and economists. In this paper, we argue that there is no reason why the methods of statistical physics so successful in many fields of natural sciences cannot be usefully applied to macroeconomics that is meant to analyze the macroeconomy comprising a large number of economic agents. It is, in fact, weird to regard the macroeconomy as a homothetic enlargement of the representative micro agent. We trust the bright future of the new approach to macroeconomies based on statistical physics.
Can a significance test be genuinely Bayesian?

OpenAIRE

Pereira, Carlos A. de B.; Stern, Julio Michael; Wechsler, Sergio

2008-01-01

The Full Bayesian Significance Test, FBST, is extensively reviewed. Its test statistic, a genuine Bayesian measure of evidence, is discussed in detail. Its behavior in some problems of statistical inference like testing for independence in contingency tables is discussed.
Statistical principles for prospective study protocols:

DEFF Research Database (Denmark)

Christensen, Robin; Langberg, Henning

2012-01-01

In the design of scientific studies it is essential to decide on which scientific questions one aims to answer, just as it is important to decide on the correct statistical methods to use to answer these questions. The correct use of statistical methods is crucial in all aspects of research...... to quantify relationships in data. Despite an increased focus on statistical content and complexity of biomedical research these topics remain difficult for most researchers. Statistical methods enable researchers to condense large spreadsheets with data into means, proportions, and difference between means......, risk differences, and other quantities that convey information. One of the goals in biomedical research is to develop parsimonious models - meaning as simple as possible. This approach is valid if the subsequent research report (the article) is written independent of whether the results...
Clinical analysis and prognostic significance of haemophagocytic lymphohistiocytosis-associated anaplastic large cell lymphoma in children.

Science.gov (United States)

Pasqualini, Claudia; Minard-Colin, Veronique; Saada, Veronique; Lamant, Laurence; Delsol, Georges; Patte, Catherine; Le Deley, Marie-Cécile; Valteau-Couanet, Dominique; Brugières, Laurence

2014-04-01

Haemophagocytic lymphohistiocytosis (HLH) has been rarely described in children treated for an anaplastic large-cell lymphoma (ALCL). We evaluated the incidence, the clinical and histological characteristics and the prognosis of HLH associated-ALCL. The medical, biological, cytological and histological data of patients treated for ALK-positive ALCL in the paediatric department of a single institution between 1975 and 2008 were analysed and assessed for HLH according to diagnosis criteria of the Histiocyte Society. Data concerning a series of 50 consecutive children with ALCL were reviewed. HLH-associated ALCL was observed in 12% of the patients. Lung involvement was significantly more frequent in HLH-associated ALCL patients than in the group without HLH (P = 0·004), as well as central nervous system (CNS) and bone marrow involvement (P = 0·001 and P = 0·007 respectively). The histological subtype in children with HLH-associated ALCL did not differ from that of the group without HLH. There was no significant difference between the two groups in 5-year EFS and OS (P = 0·91 and P > 0·99 respectively). In conclusion, HLH is not rare in paediatric ALCL. Despite a high incidence of visceral, CNS and bone marrow involvement, HLH does not seem to exert a significant impact on outcome in children treated for ALCL. © 2014 John Wiley & Sons Ltd.
Statistical Seismology and Induced Seismicity

Science.gov (United States)

Tiampo, K. F.; González, P. J.; Kazemian, J.

2014-12-01

While seismicity triggered or induced by natural resources production such as mining or water impoundment in large dams has long been recognized, the recent increase in the unconventional production of oil and gas has been linked to a rapid rise in seismicity in many places, including central North America (Ellsworth et al., 2012; Ellsworth, 2013). Worldwide, induced events of M~5 have occurred and, although rare, have resulted in both damage and public concern (Horton, 2012; Keranen et al., 2013). In addition, over the past twenty years, the increase in both number and coverage of seismic stations has resulted in an unprecedented ability to precisely record the magnitude and location of large numbers of small magnitude events. The increase in the number and type of seismic sequences available for detailed study has revealed differences in their statistics that were previously difficult to quantify. For example, seismic swarms that produce significant numbers of foreshocks as well as aftershocks have been observed in different tectonic settings, including California, Iceland, and the East Pacific Rise (McGuire et al., 2005; Shearer, 2012; Kazemian et al., 2014). Similarly, smaller events have been observed prior to larger induced events in several occurrences from energy production. The field of statistical seismology has long focused on the question of triggering and the mechanisms responsible (Stein et al., 1992; Hill et al., 1993; Steacy et al., 2005; Parsons, 2005; Main et al., 2006). For example, in most cases the associated stress perturbations are much smaller than the earthquake stress drop, suggesting an inherent sensitivity to relatively small stress changes (Nalbant et al., 2005). Induced seismicity provides the opportunity to investigate triggering and, in particular, the differences between long- and short-range triggering. Here we investigate the statistics of induced seismicity sequences from around the world, including central North America and Spain, and
An initiative to improve the management of clinically significant test results in a large health care network.

Science.gov (United States)

Roy, Christopher L; Rothschild, Jeffrey M; Dighe, Anand S; Schiff, Gordon D; Graydon-Baker, Erin; Lenoci-Edwards, Jennifer; Dwyer, Cheryl; Khorasani, Ramin; Gandhi, Tejal K

2013-11-01

The failure of providers to communicate and follow up clinically significant test results (CSTR) is an important threat to patient safety. The Massachusetts Coalition for the Prevention of Medical Errors has endorsed the creation of systems to ensure that results can be received and acknowledged. In 2008 a task force was convened that represented clinicians, laboratories, radiology, patient safety, risk management, and information systems in a large health care network with the goals of providing recommendations and a road map for improvement in the management of CSTR and of implementing this improvement plan during the subsequent five years. In drafting its charter, the task force broadened the scope from "critical" results to "clinically significant" ones; clinically significant was defined as any result that requires further clinical action to avoid morbidity or mortality, regardless of the urgency of that action. The task force recommended four key areas for improvement: (1) standardization of policies and definitions, (2) robust identification of the patient's care team, (3) enhanced results management/tracking systems, and (4) centralized quality reporting and metrics. The task force faced many challenges in implementing these recommendations, including disagreements on definitions of CSTR and on who should have responsibility for CSTR, changes to established work flows, limitations of resources and of existing information systems, and definition of metrics. This large-scale effort to improve the communication and follow-up of CSTR in a health care network continues with ongoing work to address implementation challenges, refine policies, prepare for a new clinical information system platform, and identify new ways to measure the extent of this important safety problem.
Editorial to: Six papers on Dynamic Statistical Models

DEFF Research Database (Denmark)

2014-01-01

statistical methodology and theory for large and complex data sets that included biostatisticians and mathematical statisticians from three faculties at the University of Copenhagen. The satellite meeting took place August 17–19, 2011. Its purpose was to bring together researchers in statistics and related......The following six papers are based on invited lectures at the satellite meeting held at the University of Copenhagen before the 58th World Statistics Congress of the International Statistical Institute in Dublin in 2011. At the invitation of the Bernoulli Society, the satellite meeting...... was organized around the theme “Dynamic Statistical Models” as a part of the Program of Excellence at the University of Copenhagen on “Statistical methods for complex and high dimensional models” (http://statistics.ku.dk/). The Excellence Program in Statistics was a research project to develop and investigate...
Autonomic Differentiation Map: A Novel Statistical Tool for Interpretation of Heart Rate Variability

Directory of Open Access Journals (Sweden)

Daniela Lucini

2018-04-01

In spite of the large body of evidence suggesting Heart Rate Variability (HRV), alone or combined with blood pressure variability (providing an estimate of baroreflex gain), as a useful technique to assess the autonomic regulation of the cardiovascular system, there is still an ongoing debate about methodology, interpretation, and clinical applications. In the present investigation, we hypothesize that non-parametric and multivariate exploratory statistical manipulation of HRV data could provide a novel informational tool useful to differentiate normal controls from clinical groups, such as athletes, or subjects affected by obesity, hypertension, or stress. With a data-driven protocol in 1,352 ambulant subjects, we compute HRV and baroreflex indices from short-term data series as proxies of autonomic (ANS) regulation. We apply a three-step statistical procedure, by first removing age and gender effects. Subsequently, by factor analysis, we extract four ANS latent domains that detain the large majority of information (86.94%), subdivided in oscillatory (40.84%), amplitude (18.04%), pressure (16.48%), and pulse domains (11.58%). Finally, we test the overall capacity to differentiate clinical groups vs. control. To give more practical value and improve readability, statistical results concerning individual discriminant ANS proxies and ANS differentiation profiles are displayed through peculiar graphical tools, i.e., significance diagram and ANS differentiation map, respectively. This approach, which simultaneously uses all available information about the system, shows what domains make up the difference in ANS discrimination: e.g., athletes differ from controls in all domains, but with a graded strength: maximal in the (normalized) oscillatory and in the pulse domains, slightly less in the pressure domain and minimal in the amplitude domain. The application of multiple (non-parametric and exploratory) statistical and graphical tools to ANS proxies defines
Application of extended statistical combination of uncertainties methodology for digital nuclear power plants

Energy Technology Data Exchange (ETDEWEB)

In, Wang Ki; Uh, Keun Sun; Chul, Kim Heui [Korea Atomic Energy Research Institute, Taejon (Korea, Republic of)

1995-02-01

A technically more direct statistical combination of uncertainties methodology, extended SCU (XSCU), was applied to statistically combine the uncertainties associated with the DNBR alarm setpoint and the DNBR trip setpoint of digital nuclear power plants. The modified SCU (MSCU) methodology is currently used as the USNRC approved design methodology to perform the same function. In this report, the MSCU and XSCU methodologies were compared in terms of the total uncertainties and the net margins to the DNBR alarm and trip setpoints. The MSCU methodology resulted in small total penalties due to a significantly negative bias which is quite large. However, the XSCU methodology gave virtually unbiased total uncertainties. The net margins to the DNBR alarm and trip setpoints by the MSCU methodology agree with those by the XSCU methodology within statistical variations. (Author) 12 refs., 17 figs., 5 tabs.
Statistical distributions of optimal global alignment scores of random protein sequences

Directory of Open Access Journals (Sweden)

Tang Jiaowei

2005-10-01

Background: The inference of homology from statistically significant sequence similarity is a central issue in sequence alignments. So far the statistical distribution function underlying the optimal global alignments has not been completely determined. Results: In this study, random and real but unrelated sequences prepared in six different ways were selected as reference datasets to obtain their respective statistical distributions of global alignment scores. All alignments were carried out with the Needleman-Wunsch algorithm and optimal scores were fitted to the Gumbel, normal and gamma distributions respectively. The three-parameter gamma distribution performs the best as the theoretical distribution function of global alignment scores, as it agrees perfectly well with the distribution of alignment scores. The normal distribution also agrees well with the score distribution frequencies when the shape parameter of the gamma distribution is sufficiently large, for this is the scenario when the normal distribution can be viewed as an approximation of the gamma distribution. Conclusion: We have shown that the optimal global alignment scores of random protein sequences fit the three-parameter gamma distribution function. This would be useful for the inference of homology between sequences whose relationship is unknown, through the evaluation of gamma distribution significance between sequences.
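The optimal global alignment scores discussed in this abstract come from the Needleman-Wunsch dynamic programme. The sketch below computes only the optimal score, using a simple match/mismatch scheme and a linear gap penalty; these scoring values are assumptions for illustration, since the abstract does not state the substitution matrix or gap costs used for protein sequences.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b (linear gaps)."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # against the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        curr = [i * gap]
        for j, char_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if char_a == char_b else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]
```

Scoring many pairs of random or unrelated sequences with a function like this yields the empirical score distribution to which a candidate family (Gumbel, normal or gamma in the study) can then be fitted.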
Weighted statistical parameters for irregularly sampled time series

Science.gov (United States)

Rimoldini, Lorenzo

2014-01-01

Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
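To illustrate why weighting helps when measurements clump, the sketch below weights each point by half the time span between its neighbours, which makes the weighted mean equal to the time average of the linearly interpolated signal. This is only one simple interval-based choice and an assumption on our part; the paper's actual scheme also adapts to the noise level and is not reproduced here.

```python
def interval_weights(times):
    """Half the distance between each point's neighbours (sorted times)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two time stamps")
    weights = []
    for i in range(n):
        left = times[i - 1] if i > 0 else times[0]
        right = times[i + 1] if i < n - 1 else times[-1]
        weights.append((right - left) / 2)
    return weights


def weighted_mean(times, values):
    """Mean of values weighted by the local sampling interval."""
    weights = interval_weights(times)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

For times `[0, 0.1, 0.2, 10]` and values `[0, 0, 0, 10]`, the unweighted mean is 2.5 because the three clumped points dominate, whereas the interval-weighted mean is about 4.9, the average of the interpolated curve over the full time span.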

Fractional statistics and fractional quantized Hall effect. Revision

International Nuclear Information System (INIS)

Tao, R.; Wu, Y.S.

1984-01-01

We suggest that the origin of the odd denominator rule observed in the fractional quantized Hall effect (FQHE) may lie in fractional statistics which governs quasiparticles in FQHE. A theorem concerning statistics of clusters of quasiparticles implies that fractional statistics does not allow coexistence of a large number of quasiparticles at fillings with an even denominator. Thus no Hall plateau can be formed at these fillings, regardless of the presence of an energy gap. 15 references
Statistics for experimentalists

CERN Document Server

Cooper, B E

2014-01-01

Statistics for Experimentalists aims to provide experimental scientists with a working knowledge of statistical methods and search approaches to the analysis of data. The book first elaborates on probability and continuous probability distributions. Discussions focus on properties of continuous random variables and normal variables, independence of two random variables, central moments of a continuous distribution, prediction from a normal distribution, binomial probabilities, and multiplication of probabilities and independence. The text then examines estimation and tests of significance. Topics include estimators and estimates, expected values, minimum variance linear unbiased estimators, sufficient estimators, methods of maximum likelihood and least squares, and the test of significance method. The manuscript ponders on distribution-free tests, Poisson process and counting problems, correlation and function fitting, balanced incomplete randomized block designs and the analysis of covariance, and experiment...
Eigenfunction statistics on quantum graphs

International Nuclear Information System (INIS)

Gnutzmann, S.; Keating, J.P.; Piotet, F.

2010-01-01

We investigate the spatial statistics of the energy eigenfunctions on large quantum graphs. It has previously been conjectured that these should be described by a Gaussian Random Wave Model, by analogy with quantum chaotic systems, for which such a model was proposed by Berry in 1977. The autocorrelation functions we calculate for an individual quantum graph exhibit a universal component, which completely determines a Gaussian Random Wave Model, and a system-dependent deviation. This deviation depends on the graph only through its underlying classical dynamics. Classical criteria for quantum universality to be met asymptotically in the large graph limit (i.e. for the non-universal deviation to vanish) are then extracted. We use an exact field theoretic expression in terms of a variant of a supersymmetric σ model. A saddle-point analysis of this expression leads to the estimates. In particular, intensity correlations are used to discuss the possible equidistribution of the energy eigenfunctions in the large graph limit. When equidistribution is asymptotically realized, our theory predicts a rate of convergence that is a significant refinement of previous estimates. The universal and system-dependent components of intensity correlation functions are recovered by means of an exact trace formula which we analyse in the diagonal approximation, drawing in this way a parallel between the field theory and semiclassics. Our results provide the first instance where an asymptotic Gaussian Random Wave Model has been established microscopically for eigenfunctions in a system with no disorder.
On two methods of statistical image analysis

NARCIS (Netherlands)

Missimer, J; Knorr, U; Maguire, RP; Herzog, H; Seitz, RJ; Tellman, L; Leenders, K.L.

1999-01-01

The computerized brain atlas (CBA) and statistical parametric mapping (SPM) are two procedures for voxel-based statistical evaluation of PET activation studies. Each includes spatial standardization of image volumes, computation of a statistic, and evaluation of its significance. In addition,
Some challenges with statistical inference in adaptive designs.

Science.gov (United States)

Hung, H M James; Wang, Sue-Jane; Yang, Peiling

2014-01-01

Adaptive designs have attracted a great deal of attention in clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.
Statistical and Visualization Data Mining Tools for Foundry Production

Directory of Open Access Journals (Sweden)

M. Perzyk

2007-07-01

In recent years, rapid development of a new, interdisciplinary knowledge area called data mining has been observed. Its main task is extracting useful information from large amounts of previously collected data. The main possibilities and potential applications of data mining in the manufacturing industry are characterized. The main types of data mining techniques are briefly discussed, including statistical, artificial intelligence, database and visualization tools. The statistical and visualization methods are presented in more detail, showing their general possibilities and advantages as well as characteristic examples of applications in foundry production. Results of the author's research are presented, aimed at validation of selected statistical tools which can be easily and effectively used in the manufacturing industry. A performance analysis of methods based on ANOVA and contingency tables, intended for determining the most significant process parameters as well as for detecting possible interactions among them, has been made. Several numerical tests have been performed using simulated data sets with assumed hidden relationships, as well as some real data, related to the strength of ductile cast iron, collected in a foundry. It is concluded that the statistical methods offer relatively easy and fairly reliable tools for extracting that type of knowledge about foundry manufacturing processes. However, further research is needed, aimed at explaining some imperfections of the investigated tools as well as assessing their validity for more complex tasks.
Software for statistical data analysis used in Higgs searches

International Nuclear Information System (INIS)

Gumpert, Christian; Moneta, Lorenzo; Cranmer, Kyle; Kreiss, Sven; Verkerke, Wouter

2014-01-01

The analysis and interpretation of data collected by the Large Hadron Collider (LHC) require advanced statistical tools in order to quantify the agreement between observation and theoretical models. RooStats is a project providing a statistical framework for data analysis with the focus on discoveries, confidence intervals and combination of different measurements in both Bayesian and frequentist approaches. It employs the RooFit data modelling language where mathematical concepts such as variables, (probability density) functions and integrals are represented as C++ objects. RooStats and RooFit rely on the persistency technology of the ROOT framework. The usage of a common data format enables the concept of digital publishing of complicated likelihood functions. The statistical tools have been developed in close collaboration with the LHC experiments to ensure their applicability to real-life use cases. Numerous physics results have been produced using the RooStats tools, with the discovery of the Higgs boson by the ATLAS and CMS experiments being certainly the most popular among them. We will discuss tools currently used by LHC experiments to set exclusion limits, to derive confidence intervals and to estimate discovery significances based on frequentist statistics and the asymptotic behaviour of likelihood functions. Furthermore, new developments in RooStats and performance optimisation necessary to cope with complex models depending on more than 1000 variables will be reviewed.
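As a rough illustration of what an asymptotic discovery significance looks like, the sketch below evaluates a widely used closed-form approximation for a single-bin counting experiment, Z = sqrt(2((s + b) ln(1 + s/b) - s)), with expected signal s and an exactly known background b. It is plain Python and does not use or imitate the RooStats API; the function name and the example yields are hypothetical.

```python
import math


def asymptotic_significance(s, b):
    """Approximate median discovery significance (in sigma) for a counting
    experiment with expected signal s and exactly known background b.

    Assumption: single bin, no systematic uncertainty on b.
    """
    if b <= 0 or s < 0:
        raise ValueError("need b > 0 and s >= 0")
    # log1p keeps precision when s is much smaller than b.
    q0 = 2.0 * ((s + b) * math.log1p(s / b) - s)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(q0, 0.0))


if __name__ == "__main__":
    # Hypothetical yields, chosen only to show the behaviour.
    for s, b in [(10.0, 100.0), (25.0, 100.0), (5.0, 1.0)]:
        z = asymptotic_significance(s, b)
        print(f"s={s:5.1f}  b={b:6.1f}  Z={z:.2f}  s/sqrt(b)={s / math.sqrt(b):.2f}")
```

For s much smaller than b the result approaches the familiar s/sqrt(b); the two diverge when the background is small, which is where the simpler ratio overstates the significance.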
Statistics Using Just One Formula

Science.gov (United States)

Rosenthal, Jeffrey S.

2018-01-01

This article advocates that introductory statistics be taught by basing all calculations on a single simple margin-of-error formula and deriving all of the standard introductory statistical concepts (confidence intervals, significance tests, comparisons of means and proportions, etc) from that one formula. It is argued that this approach will…
Statistics Anxiety among Postgraduate Students

Science.gov (United States)

Koh, Denise; Zawi, Mohd Khairi

2014-01-01

Most postgraduate programmes that have research components require students to take at least one course in research statistics. Not all postgraduate programmes are science based; a significant number of postgraduate students from the social sciences will be taking statistics courses as they try to complete their…
On the Statistical Dependency of Identity Theft on Demographics

Science.gov (United States)

di Crescenzo, Giovanni

An improved understanding of the identity theft problem is widely agreed to be necessary to succeed in counter-theft efforts in legislative, financial and research institutions. In this paper we report on a statistical study about the existence of relationships between identity theft and area demographics in the US. The identity theft data chosen was the number of citizen complaints to the Federal Trade Commission in a large number of US municipalities. The list of demographics used for any such municipality included: estimated population, median resident age, estimated median household income, percentage of citizens with a high school or higher degree, percentage of unemployed residents, percentage of married residents, percentage of foreign born residents, percentage of residents living in poverty, density of law enforcement employees, crime index, and political orientation according to the 2004 presidential election. Our study findings, based on linear regression techniques, include statistically significant relationships between the number of identity theft complaints and a non-trivial subset of these demographics.
Development of the Large-Scale Statistical Analysis System of Satellites Observations Data with Grid Datafarm Architecture

Science.gov (United States)

Yamamoto, K.; Murata, K.; Kimura, E.; Honda, R.

2006-12-01

In the Solar-Terrestrial Physics (STP) field, the amount of satellite observation data has been increasing every year. Three problems must be solved to achieve large-scale statistical analyses of such large amounts of data. (i) More CPU power and larger memory and disk capacity are required. However, the total power of personal computers is not enough to analyze such amounts of data. Supercomputers provide high-performance CPUs and large memory, but they are usually separated from the Internet or connected only for the purpose of programming or data file transfer. (ii) Most of the observation data files are managed at distributed data sites over the Internet. Users have to know where the data files are located. (iii) Since no common data format is currently available in the STP field, users have to prepare a reading program for each data set by themselves. To overcome problems (i) and (ii), we constructed a parallel and distributed data analysis environment based on the Gfarm reference implementation of the Grid Datafarm architecture. Gfarm shares computational resources and performs parallel distributed processing. In addition, Gfarm provides the Gfarm filesystem, which can act as a virtual directory tree among nodes. The Gfarm environment is composed of three parts: a metadata server to manage distributed file information, filesystem nodes to provide computational resources, and a client to submit jobs to the metadata server and manage data processing schedules. In the present study, both data files and data processes are parallelized on Gfarm with 6 filesystem nodes; each node has a Pentium V 1GHz CPU, 256MB memory and 40GB disk. To evaluate the performance of the present Gfarm system, we scanned a large number of data files, each about 300MB in size, using three processing methods: sequential processing in one node, sequential processing by each node and parallel processing by each node. As a result, in comparison between the
HistFitter software framework for statistical data analysis

Energy Technology Data Exchange (ETDEWEB)

Baak, M. [CERN, Geneva (Switzerland); Besjes, G.J. [Radboud University Nijmegen, Nijmegen (Netherlands); Nikhef, Amsterdam (Netherlands); Cote, D. [University of Texas, Arlington (United States); Koutsman, A. [TRIUMF, Vancouver (Canada); Lorenz, J. [Ludwig-Maximilians-Universitaet Muenchen, Munich (Germany); Excellence Cluster Universe, Garching (Germany); Short, D. [University of Oxford, Oxford (United Kingdom)

2015-04-15

We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface. (orig.)
Robust statistical methods with R

CERN Document Server

Jureckova, Jana

2005-01-01

Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application. The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance of measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of a real parameter, large sample properties, and goodness-of-fit tests. It...
Statistical Methods for Unusual Count Data

DEFF Research Database (Denmark)

Guthrie, Katherine A.; Gammill, Hilary S.; Kamper-Jørgensen, Mads

2016-01-01

microchimerism data present challenges for statistical analysis, including a skewed distribution, excess zero values, and occasional large values. Methods for comparing microchimerism levels across groups while controlling for covariates are not well established. We compared statistical models for quantitative...... microchimerism values, applied to simulated data sets and 2 observed data sets, to make recommendations for analytic practice. Modeling the level of quantitative microchimerism as a rate via Poisson or negative binomial model with the rate of detection defined as a count of microchimerism genome equivalents per...
Large blackouts in North America: Historical trends and policy implications

International Nuclear Information System (INIS)

Hines, Paul; Apt, Jay; Talukdar, Sarosh

2009-01-01

Using data from the North American Electric Reliability Council (NERC) for 1984-2006, we find several trends. We find that the frequency of large blackouts in the United States has not decreased over time, that there is a statistically significant increase in blackout frequency during peak hours of the day and during late summer and mid-winter months (although non-storm-related risk is nearly constant through the year) and that there is strong statistical support for the previously observed power-law statistical relationship between blackout size and frequency. We do not find that blackout sizes and blackout durations are significantly correlated. These trends hold even after controlling for increasing demand and population and after eliminating small events, for which the data may be skewed by spotty reporting. Trends in blackout occurrences, such as those observed in the North American data, have important implications for those who make investment and policy decisions in the electricity industry. We provide a number of examples that illustrate how these trends can inform benefit-cost analysis calculations. Also, following procedures used in natural disaster planning we use the observed statistical trends to calculate the size of the 100-year blackout, which for North America is 186,000 MW.
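The power-law relationship between blackout size and frequency described above can be illustrated with a maximum-likelihood fit of a tail exponent. The sketch below works on synthetic Pareto-distributed sizes, not the NERC data; the threshold `x_min`, the exponent and the sample size are arbitrary assumptions, and the estimator is the standard continuous power-law MLE, which is not necessarily the procedure the authors used.

```python
import math
import random


def fit_tail_exponent(sizes, x_min):
    """MLE of alpha in P(X > x) = (x / x_min) ** (-alpha) for x >= x_min.

    Returns (alpha_hat, approximate standard error).
    """
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if len(tail) < 2 or log_sum <= 0.0:
        raise ValueError("not enough events above x_min to fit a tail")
    alpha_hat = len(tail) / log_sum
    return alpha_hat, alpha_hat / math.sqrt(len(tail))


def exceedance_probability(x, x_min, alpha):
    """Probability that an event with size >= x_min is at least as large as x."""
    return (x / x_min) ** (-alpha)


def make_synthetic_sizes(n, x_min, alpha, seed=0):
    """Synthetic event sizes (hypothetical units) with a known Pareto tail."""
    rng = random.Random(seed)
    return [x_min * rng.paretovariate(alpha) for _ in range(n)]


if __name__ == "__main__":
    sizes = make_synthetic_sizes(n=5000, x_min=300.0, alpha=1.2)
    alpha_hat, std_err = fit_tail_exponent(sizes, x_min=300.0)
    print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f}")
    print("P(size >= 10 * x_min) =", exceedance_probability(3000.0, 300.0, alpha_hat))
```

Given a fitted exponent and an annual rate of events above `x_min`, the size exceeded once per century on average follows by solving rate × exceedance probability = 1/100, which is the kind of return-level calculation used in natural disaster planning.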
Fundamental statistical features and self-similar properties of tagged networks

International Nuclear Information System (INIS)

Palla, Gergely; Farkas, Illes J; Pollner, Peter; Vicsek, Tamas; Derenyi, Imre

2008-01-01

We investigate the fundamental statistical features of tagged (or annotated) networks having a rich variety of attributes associated with their nodes. Tags (attributes, annotations, properties, features, etc) provide essential information about the entity represented by a given node, thus, taking them into account represents a significant step towards a more complete description of the structure of large complex systems. Our main goal here is to uncover the relations between the statistical properties of the node tags and those of the graph topology. In order to better characterize the networks with tagged nodes, we introduce a number of new notions, including tag-assortativity (relating link probability to node similarity), and new quantities, such as node uniqueness (measuring how rarely the tags of a node occur in the network) and tag-assortativity exponent. We apply our approach to three large networks representing very different domains of complex systems. A number of the tag related quantities display analogous behaviour (e.g. the networks we studied are tag-assortative, indicating possible universal aspects of tags versus topology), while some other features, such as the distribution of the node uniqueness, show variability from network to network allowing for pin-pointing large scale specific features of real-world complex networks. We also find that for each network the topology and the tag distribution are scale invariant, and this self-similar property of the networks can be well characterized by the tag-assortativity exponent, which is specific to each system.
FERMI/LARGE AREA TELESCOPE BRIGHT GAMMA-RAY SOURCE LIST

International Nuclear Information System (INIS)

Abdo, A. A.; Ackermann, M.; Ajello, M.; Bechtol, K.; Berenji, B.; Blandford, R. D.; Bloom, E. D.; Borgland, A. W.; Atwood, W. B.; Axelsson, M.; Battelino, M.; Baldini, L.; Bellazzini, R.; Ballet, J.; Band, D. L.; Barbiellini, G.; Bastieri, D.; Baughman, B. M.; Bignami, G. F.; Bonamente, E.

2009-01-01

Following its launch in 2008 June, the Fermi Gamma-ray Space Telescope (Fermi) began a sky survey in August. The Large Area Telescope (LAT) on Fermi in three months produced a deeper and better resolved map of the γ-ray sky than any previous space mission. We present here initial results for energies above 100 MeV for the 205 most significant (statistical significance greater than ∼10σ) γ-ray sources in these data. These are the best characterized and best localized point-like (i.e., spatially unresolved) γ-ray sources in the early mission data.
Fermi Large Area Telescope Bright Gamma-ray Source List

Energy Technology Data Exchange (ETDEWEB)

Abdo, Aous A.; /Naval Research Lab, Wash., D.C.; Ackermann, M.; /KIPAC, Menlo Park /SLAC; Ajello, M.; /KIPAC, Menlo Park /SLAC; Atwood, W.B.; /UC, Santa Cruz; Axelsson, M.; /Stockholm U., OKC /Stockholm U.; Baldini, L.; /INFN, Pisa; Ballet, J.; /DAPNIA, Saclay; Band, D.L.; /NASA, Goddard; Barbiellini, Guido; /INFN, Trieste /Trieste U.; Bastieri, Denis; /INFN, Padua /Padua U.; Bechtol, K.; /KIPAC, Menlo Park /SLAC; Bellazzini, R.; /INFN, Pisa; Berenji, B.; /KIPAC, Menlo Park /SLAC; Bignami, G.F.; /Pavia U.; Bloom, Elliott D.; /KIPAC, Menlo Park /SLAC; Bonamente, E.; /INFN, Perugia /Perugia U.; Borgland, A.W.; /KIPAC, Menlo Park /SLAC; Bregeon, J.; /INFN, Pisa; Brigida, M.; /Bari U. /INFN, Bari; Bruel, P.; /Ecole Polytechnique; Burnett, Thompson H.; /Washington U., Seattle; /more authors..

2009-05-15

Following its launch in 2008 June, the Fermi Gamma-ray Space Telescope (Fermi) began a sky survey in August. The Large Area Telescope (LAT) on Fermi in three months produced a deeper and better resolved map of the γ-ray sky than any previous space mission. We present here initial results for energies above 100 MeV for the 205 most significant (statistical significance greater than ∼10σ) γ-ray sources in these data. These are the best characterized and best localized point-like (i.e., spatially unresolved) γ-ray sources in the early mission data.
Statistics-Based Compression of Global Wind Fields

KAUST Repository

Jeong, Jaehong

2017-02-07

Wind has the potential to make a significant contribution to future energy resources. Locating the sources of this renewable energy on a global scale is however extremely challenging, given the difficulty to store very large data sets generated by modern computer models. We propose a statistical model that aims at reproducing the data-generating mechanism of an ensemble of runs via a Stochastic Generator (SG) of global annual wind data. We introduce an evolutionary spectrum approach with spatially varying parameters based on large-scale geographical descriptors such as altitude to better account for different regimes across the Earth's orography. We consider a multi-step conditional likelihood approach to estimate the parameters that explicitly accounts for nonstationary features while also balancing memory storage and distributed computation. We apply the proposed model to more than 18 million points of yearly global wind speed. The proposed SG requires orders of magnitude less storage for generating surrogate ensemble members from wind than does creating additional wind fields from the climate model, even if an effective lossy data compression algorithm is applied to the simulation output.

Statistics-Based Compression of Global Wind Fields

KAUST Repository

Jeong, Jaehong; Castruccio, Stefano; Crippa, Paola; Genton, Marc G.

2017-01-01

Wind has the potential to make a significant contribution to future energy resources. Locating the sources of this renewable energy on a global scale is however extremely challenging, given the difficulty to store very large data sets generated by modern computer models. We propose a statistical model that aims at reproducing the data-generating mechanism of an ensemble of runs via a Stochastic Generator (SG) of global annual wind data. We introduce an evolutionary spectrum approach with spatially varying parameters based on large-scale geographical descriptors such as altitude to better account for different regimes across the Earth's orography. We consider a multi-step conditional likelihood approach to estimate the parameters that explicitly accounts for nonstationary features while also balancing memory storage and distributed computation. We apply the proposed model to more than 18 million points of yearly global wind speed. The proposed SG requires orders of magnitude less storage for generating surrogate ensemble members from wind than does creating additional wind fields from the climate model, even if an effective lossy data compression algorithm is applied to the simulation output.
STATISTICAL CHARACTERIZATION OF THE CHANDRA SOURCE CATALOG

International Nuclear Information System (INIS)

Primini, Francis A.; Evans, Ian N.; Glotfelty, Kenny J.; Anderson, Craig S.; Bonaventura, Nina R.; Chen, Judy C.; Doe, Stephen M.; Evans, Janet D.; Fabbiano, Giuseppina; Galle, Elizabeth C.; Gibbs, Danny G.; Grier, John D.; Hain, Roger M.; Harbo, Peter N.; He Xiangqun; Karovska, Margarita; Houck, John C.; Davis, John E.; Nowak, Michael A.; Hall, Diane M.

2011-01-01

The first release of the Chandra Source Catalog (CSC) contains ∼95,000 X-ray sources in a total area of 0.75% of the entire sky, using data from ∼3900 separate ACIS observations of a multitude of different types of X-ray sources. In order to maximize the scientific benefit of such a large, heterogeneous data set, careful characterization of the statistical properties of the catalog, i.e., completeness, sensitivity, false source rate, and accuracy of source properties, is required. Characterization efforts of other large Chandra catalogs, such as the ChaMP Point Source Catalog or the 2 Mega-second Deep Field Surveys, while informative, cannot serve this purpose, since the CSC analysis procedures are significantly different and the range of allowable data is much less restrictive. We describe here the characterization process for the CSC. This process includes both a comparison of real CSC results with those of other, deeper Chandra catalogs of the same targets and extensive simulations of blank-sky and point-source populations.
Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.

Science.gov (United States)

Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M

2014-01-01

Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.
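The kind of automated consistency check described above can be sketched in a few lines: recompute the p-value from the reported test statistic and compare it with the reported p-value at the reported precision. To stay within the standard library, the sketch assumes a two-sided z test, whereas the study concerns tests with degrees of freedom (such as t and F); the function names, the rounding rule and the 0.05 threshold are illustrative assumptions, not the authors' actual procedure.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value of a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def check_reported_p(z, reported_p, decimals=2, alpha=0.05):
    """Compare a reported p-value with the one implied by the statistic.

    Returns (recomputed_p, inconsistent, decision_changes):
      inconsistent      -- the recomputed p differs after rounding to the
                           reported number of decimals
      decision_changes  -- the inconsistency also flips the significance
                           decision at level alpha
    """
    recomputed = two_sided_p_from_z(z)
    inconsistent = round(recomputed, decimals) != round(reported_p, decimals)
    decision_changes = inconsistent and ((recomputed < alpha) != (reported_p < alpha))
    return recomputed, inconsistent, decision_changes


if __name__ == "__main__":
    # Hypothetical reported results: (z statistic, reported p-value).
    for z, p in [(1.96, 0.05), (2.50, 0.03), (1.50, 0.04)]:
        recomputed, bad, flips = check_reported_p(z, p)
        print(f"z={z:.2f} reported p={p:.2f} recomputed p={recomputed:.4f} "
              f"inconsistent={bad} decision_changes={flips}")
```

A fuller check would also allow for rounding of the reported test statistic itself, which this simplified version ignores.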
Computational Inquiry in Introductory Statistics

Science.gov (United States)

Toews, Carl

2017-01-01

Inquiry-based pedagogies have a strong presence in proof-based undergraduate mathematics courses, but can be difficult to implement in courses that are large, procedural, or highly computational. An introductory course in statistics would thus seem an unlikely candidate for an inquiry-based approach, as these courses typically steer well clear of…
Confidence Intervals: From tests of statistical significance to confidence intervals, range hypotheses and substantial effects

Directory of Open Access Journals (Sweden)

Dominic Beaulieu-Prévost

2006-03-01

For the last 50 years of research in quantitative social sciences, the empirical evaluation of scientific hypotheses has been based on the rejection or not of the null hypothesis. However, more than 300 articles demonstrated that this method was problematic. In summary, null hypothesis testing (NHT) is unfalsifiable, its results depend directly on sample size and the null hypothesis is both improbable and not plausible. Consequently, alternatives to NHT such as confidence intervals (CI) and measures of effect size are starting to be used in scientific publications. The purpose of this article is, first, to provide the conceptual tools necessary to implement an approach based on confidence intervals, and second, to briefly demonstrate why such an approach is an interesting alternative to an approach based on NHT. As demonstrated in the article, the proposed CI approach avoids most problems related to a NHT approach and can often improve the scientific and contextual relevance of the statistical interpretations by testing range hypotheses instead of a point hypothesis and by defining the minimal value of a substantial effect. The main advantage of such a CI approach is that it replaces the notion of statistical power by an easily interpretable three-value logic (probable presence of a substantial effect, probable absence of a substantial effect and probabilistic indetermination). The demonstration includes a complete example.
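One way to read the three-value logic is as a comparison between a confidence interval and a pre-specified minimal substantial effect. The sketch below is an interpretation under stated assumptions, not the article's own procedure: it uses a normal-approximation 95% interval for a mean, treats only positive effects, and the function name, sample values and thresholds are hypothetical.

```python
import math
from statistics import mean, stdev


def classify_effect(sample, minimal_effect, z=1.96):
    """Classify a mean effect against a minimal substantial effect.

    Returns ((low, high), verdict), where (low, high) is an approximate
    95% confidence interval for the mean.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)
    half_width = z * stdev(sample) / math.sqrt(n)
    low, high = centre - half_width, centre + half_width
    if low >= minimal_effect:
        verdict = "probable presence of a substantial effect"
    elif high < minimal_effect:
        verdict = "probable absence of a substantial effect"
    else:
        # The interval straddles the threshold: the data cannot decide.
        verdict = "undetermined"
    return (low, high), verdict


if __name__ == "__main__":
    # Hypothetical effect measurements (arbitrary units).
    effects = [2.0, 2.2, 1.8, 2.1, 1.9, 2.0, 2.3, 1.7]
    for threshold in (1.0, 2.0, 3.0):
        (low, high), verdict = classify_effect(effects, threshold)
        print(f"threshold={threshold}: CI=({low:.2f}, {high:.2f}) -> {verdict}")
```

Unlike a point-null test, the verdict here depends on where the whole interval sits relative to a substantively meaningful threshold, so a larger sample narrows the interval and shrinks the undetermined zone instead of making trivial effects "significant".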
The modified signed likelihood statistic and saddlepoint approximations

DEFF Research Database (Denmark)

Jensen, Jens Ledet

1992-01-01

SUMMARY: For a number of tests in exponential families we show that the use of a normal approximation to the modified signed likelihood ratio statistic r* is equivalent to the use of a saddlepoint approximation. This is also true in a large deviation region where the signed likelihood ratio...... statistic r is of order √n. © 1992 Biometrika Trust....
Statistics 101 for Radiologists.

Science.gov (United States)

Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

2015-10-01

Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.
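The diagnostic accuracy measures listed above all derive from a 2×2 table of test results against a reference standard. The sketch below computes them with the standard definitions; the function name and the counts are made up for illustration and do not come from the review.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 diagnostic table.

    tp/fn: diseased subjects testing positive/negative
    fp/tn: non-diseased subjects testing positive/negative
    """
    if min(tp, fp, fn, tn) < 0 or (tp + fn) == 0 or (fp + tn) == 0:
        raise ValueError("need non-negative counts with both groups present")
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (tp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Likelihood ratios are undefined when the denominator is zero.
        "lr_positive": sensitivity / false_positive_rate if false_positive_rate else float("inf"),
        "lr_negative": false_negative_rate / specificity if specificity else float("inf"),
    }


if __name__ == "__main__":
    # Hypothetical study: 100 diseased and 100 non-diseased subjects.
    for name, value in diagnostic_metrics(tp=90, fp=20, fn=10, tn=80).items():
        print(f"{name:12s} {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas the likelihood ratios express how much a positive or negative result should shift the odds of disease for an individual patient.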
Preventing statistical errors in scientific journals.

NARCIS (Netherlands)

Nuijten, M.B.

2016-01-01

There is evidence for a high prevalence of statistical reporting errors in psychology and other scientific fields. These errors display a systematic preference for statistically significant results, distorting the scientific literature. There are several possible causes for this systematic error
Bias expansion of spatial statistics and approximation of differenced ...

Indian Academy of Sciences (India)

Investigations of spatial statistics, computed from lattice data in the plane, can lead to a special lattice point counting problem. The statistical goal is to expand the asymptotic expectation or large-sample bias of certain spatial covariance estimators, where this bias typically depends on the shape of a spatial sampling region.
Mutational profile and prognostic significance of TP53 in diffuse large B-cell lymphoma patients treated with R-CHOP

DEFF Research Database (Denmark)

Xu-Monette, Zijun Y; Wu, Lin; Visco, Carlo

2012-01-01

TP53 mutation is an independent marker of poor prognosis in patients with diffuse large B-cell lymphoma (DLBCL) treated with cyclophosphamide, hydroxydaunorubicin, vincristine, and prednisone (CHOP) therapy. However, its prognostic value in the rituximab immunochemotherapy era remains undefined. ...... for stratifying R-CHOP-treated patients into distinct prognostic subsets and has significant value in the design of future therapeutic strategies....
Renyi statistics in equilibrium statistical mechanics

International Nuclear Information System (INIS)

Parvan, A.S.; Biro, T.S.

2010-01-01

The Renyi statistics in the canonical and microcanonical ensembles is examined both in general and in particular for the ideal gas. In the microcanonical ensemble the Renyi statistics is equivalent to the Boltzmann-Gibbs statistics. By the exact analytical results for the ideal gas, it is shown that in the canonical ensemble, taking the thermodynamic limit, the Renyi statistics is also equivalent to the Boltzmann-Gibbs statistics. Furthermore it satisfies the requirements of the equilibrium thermodynamics, i.e. the thermodynamical potential of the statistical ensemble is a homogeneous function of first degree of its extensive variables of state. We conclude that the Renyi statistics arrives at the same thermodynamical relations, as those stemming from the Boltzmann-Gibbs statistics in this limit.
Wind Statistics from a Forested Landscape

DEFF Research Database (Denmark)

Arnqvist, Johan; Segalini, Antonio; Dellwik, Ebba

2015-01-01

An analysis and interpretation of measurements from a 138-m tall tower located in a forested landscape is presented. Measurement errors and statistical uncertainties are carefully evaluated to ensure high data quality. A 40° wide wind-direction sector is selected as the most...... representative for large-scale forest conditions, and from that sector first-, second- and third-order statistics, as well as analyses regarding the characteristic length scale, the flux-profile relationship and surface roughness are presented for a wide range of stability conditions. The results are discussed...
THE RADIO/GAMMA-RAY CONNECTION IN ACTIVE GALACTIC NUCLEI IN THE ERA OF THE FERMI LARGE AREA TELESCOPE

International Nuclear Information System (INIS)

Ackermann, M.; Ajello, M.; Allafort, A.; Berenji, B.; Blandford, R. D.; Bloom, E. D.; Borgland, A. W.; Angelakis, E.; Axelsson, M.; Baldini, L.; Bellazzini, R.; Bregeon, J.; Brez, A.; Ballet, J.; Barbiellini, G.; Bastieri, D.; Bonamente, E.; Bouvier, A.; Brigida, M.; Bruel, P.

2011-01-01

We present a detailed statistical analysis of the correlation between radio and gamma-ray emission of the active galactic nuclei (AGNs) detected by Fermi during its first year of operation, with the largest data sets ever used for this purpose. We use both archival interferometric 8.4 GHz data (from the Very Large Array and ATCA, for the full sample of 599 sources) and concurrent single-dish 15 GHz measurements from the Owens Valley Radio Observatory (OVRO, for a sub sample of 199 objects). Our unprecedentedly large sample permits us to assess with high accuracy the statistical significance of the correlation, using a surrogate data method designed to simultaneously account for common-distance bias and the effect of a limited dynamical range in the observed quantities. We find that the statistical significance of a positive correlation between the centimeter radio and the broadband (E > 100 MeV) gamma-ray energy flux is very high for the whole AGN sample, with a probability of < 10^-7 for the correlation appearing by chance. Using the OVRO data, we find that concurrent data improve the significance of the correlation from 1.6 × 10^-6 to 9.0 × 10^-8. Our large sample size allows us to study the dependence of correlation strength and significance on specific source types and gamma-ray energy band. We find that the correlation is very significant (chance probability < 10^-7) for both flat spectrum radio quasars and BL Lac objects separately; a dependence of the correlation strength on the considered gamma-ray energy band is also present, but additional data will be necessary to constrain its significance.
Statistical identification of effective input variables

International Nuclear Information System (INIS)

Vaurio, J.K.

1982-09-01

A statistical sensitivity analysis procedure has been developed for ranking the input data of large computer codes in the order of sensitivity-importance. The method is economical for large codes with many input variables, since it uses a relatively small number of computer runs. No prior judgemental elimination of input variables is needed. The screening method is based on stagewise correlation and extensive regression analysis of output values calculated with selected input value combinations. The regression process deals with multivariate nonlinear functions, and statistical tests are also available for identifying input variables that contribute to threshold effects, i.e., discontinuities in the output variables. A computer code SCREEN has been developed for implementing the screening techniques. The efficiency has been demonstrated by several examples and applied to a fast reactor safety analysis code (Venus-II). However, the methods and the coding are general and not limited to such applications.
Large Scale Cosmological Anomalies and Inhomogeneous Dark Energy

Directory of Open Access Journals (Sweden)

Leandros Perivolaropoulos

2014-01-01

A wide range of large scale observations hint towards possible modifications of the standard cosmological model which is based on a homogeneous and isotropic universe with a small cosmological constant and matter. These observations, also known as “cosmic anomalies”, include unexpected Cosmic Microwave Background perturbations on large angular scales, large dipolar peculiar velocity flows of galaxies (“bulk flows”), the measurement of inhomogeneous values of the fine structure constant on cosmological scales (“alpha dipole”) and other effects. The presence of the observational anomalies could either be a large statistical fluctuation in the context of ΛCDM or it could indicate a non-trivial departure from the cosmological principle on Hubble scales. Such a departure is very much constrained by cosmological observations for matter. For dark energy however there are no significant observational constraints for Hubble scale inhomogeneities. In this brief review I discuss some of the theoretical models that can naturally lead to inhomogeneous dark energy, their observational constraints and their potential to explain the large scale cosmic anomalies.
Comparison of Statistical Algorithms for the Detection of Infectious Disease Outbreaks in Large Multiple Surveillance Systems

Science.gov (United States)

Farrington, C. Paddy; Noufaily, Angela; Andrews, Nick J.; Charlett, Andre

2016-01-01

A large-scale multiple surveillance system for infectious disease outbreaks has been in operation in England and Wales since the early 1990s. Changes to the statistical algorithm at the heart of the system were proposed and the purpose of this paper is to compare two new algorithms with the original algorithm. Test data to evaluate performance are created from weekly counts of the number of cases of each of more than 2000 diseases over a twenty-year period. The time series of each disease is separated into one series giving the baseline (background) disease incidence and a second series giving disease outbreaks. One series is shifted forward by twelve months and the two are then recombined, giving a realistic series in which it is known where outbreaks have been added. The metrics used to evaluate performance include a scoring rule that appropriately balances sensitivity against specificity and is sensitive to variation in probabilities near 1. In the context of disease surveillance, a scoring rule can be adapted to reflect the size of outbreaks and this was done. Results indicate that the two new algorithms are comparable to each other and better than the algorithm they were designed to replace. PMID:27513749
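The test-data construction described in this abstract (separate the series, shift one, recombine) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `shift_and_recombine` is hypothetical, the abstract does not say which series is shifted (the outbreak series is assumed here), and twelve months is approximated as 52 weekly counts.

```python
def shift_and_recombine(baseline, outbreaks, shift_weeks=52):
    """Shift the outbreak series forward in time and add it back to the baseline.

    Assumptions (not stated in the abstract): the outbreak series is the one
    that is shifted, the shift is 52 weeks, and outbreak counts pushed past
    the end of the series are dropped rather than wrapped around.
    """
    if len(baseline) != len(outbreaks):
        raise ValueError("baseline and outbreaks must have the same length")
    n = len(baseline)
    # Pad the start with zeros and truncate the tail so the length is unchanged.
    shifted = [0] * min(shift_weeks, n) + list(outbreaks[: max(n - shift_weeks, 0)])
    combined = [b + o for b, o in zip(baseline, shifted)]
    # Ground-truth labels: the weeks in which an outbreak was added.
    outbreak_weeks = [o > 0 for o in shifted]
    return combined, outbreak_weeks


# Toy weekly counts (made up for illustration), shifted by 2 weeks.
combined, labels = shift_and_recombine([5] * 6, [0, 3, 0, 0, 0, 0], shift_weeks=2)
print(combined, labels)
```

Because the positions of the added outbreaks are known, a detection algorithm's alarms on `combined` can be scored directly against `labels`.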
Foundation of statistical energy analysis in vibroacoustics

CERN Document Server

Le Bot, A

2015-01-01

This title deals with the statistical theory of sound and vibration. The foundation of statistical energy analysis is presented in great detail. In the modal approach, an introduction to random vibration with application to complex systems having a large number of modes is provided. For the wave approach, the phenomena of propagation, group speed, and energy transport are extensively discussed. Particular emphasis is given to the emergence of diffuse field, the central concept of the theory.
Towards a large deviation theory for strongly correlated systems

International Nuclear Information System (INIS)

Ruiz, Guiomar; Tsallis, Constantino

2012-01-01

A large-deviation connection of statistical mechanics is provided by N independent binary variables, the (N→∞) limit yielding Gaussian distributions. The probability of n≠N/2 out of N throws is governed by $e^{-Nr}$, r related to the entropy. Large deviations for a strongly correlated model characterized by indices (Q,γ) are studied, the (N→∞) limit yielding Q-Gaussians (Q→1 recovers a Gaussian). Its large deviations are governed by $e_q^{-N r_q}$ ($\propto 1/N^{1/(q-1)}$, q>1), q=(Q−1)/(γ[3−Q])+1. This illustration opens the door towards a large-deviation foundation of nonextensive statistical mechanics. -- Highlights: ► We introduce the formalism of relative entropy for a single random binary variable and its q-generalization. ► We study a model of N strongly correlated binary random variables and their large-deviation probabilities. ► Large-deviation probability of strongly correlated model exhibits a q-exponential decay whose argument is proportional to N, as extensivity requires. ► Our results point to a q-generalized large deviation theory and suggest a large-deviation foundation of nonextensive statistical mechanics.
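For the independent case mentioned at the start of this abstract, the exponential decay can be checked numerically. The sketch below assumes fair coin throws and takes the rate r to be the relative entropy of Bernoulli(x) with respect to Bernoulli(1/2), which is the standard large-deviation result for this setting; the function names are illustrative and the correlated (Q,γ) model is not covered.

```python
import math


def binomial_log_prob(n, N):
    """Exact log-probability of n successes in N fair binary throws."""
    return (math.lgamma(N + 1) - math.lgamma(n + 1)
            - math.lgamma(N - n + 1) - N * math.log(2))


def rate_function(x):
    """Relative entropy of Bernoulli(x) with respect to Bernoulli(1/2)."""
    if x in (0.0, 1.0):
        return math.log(2)
    return x * math.log(2 * x) + (1 - x) * math.log(2 * (1 - x))


def empirical_rate(x, N):
    """-(1/N) ln P(n = xN), which should approach rate_function(x) as N grows."""
    n = round(x * N)
    return -binomial_log_prob(n, N) / N


for N in (100, 1000, 10000, 100000):
    print(N, empirical_rate(0.7, N), rate_function(0.7))
```

The gap between the two printed columns shrinks roughly like (ln N)/N, which is the sub-exponential prefactor that the leading-order form of the probability ignores.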
Statistics: a Bayesian perspective

National Research Council Canada - National Science Library

Berry, Donald A

1996-01-01

...: it is the only introductory textbook based on Bayesian ideas, it combines concepts and methods, it presents statistics as a means of integrating data into the significant process, it develops ideas...
High energy behaviour of particles and unified statistics

International Nuclear Information System (INIS)

Chang, Y.

1984-01-01

Theories and experiments suggest that particles at high energy appear to possess a new statistics unifying Bose-Einstein and Fermi-Dirac statistics via the GAMMA distribution. This hypothesis can be obtained from many models, and agrees quantitatively with scaling, the multiplicity, large transverse momentum, the mass spectrum, and other data. It may be applied to scatterings at high energy, and agrees with experiments and known QED results. The Veneziano model and other theories have implied new statistics, such as the B distribution and the Polya distribution. They revert to the GAMMA distribution at high energy. The possible inapplicability of Pauli's exclusion principle within the unified statistics is considered and associated with the quark constituents.

Distribution, Statistics, and Resurfacing of Large Impact Basins on Mercury

Science.gov (United States)

Fassett, Caleb I.; Head, James W.; Baker, David M. H.; Chapman, Clark R.; Murchie, Scott L.; Neumann, Gregory A.; Oberst, Juergen; Prockter, Louise M.; Smith, David E.; Solomon, Sean C.

2012-01-01

The distribution and geological history of large impact basins (diameter D greater than or equal to 300 km) on Mercury are important to understanding the planet's stratigraphy and surface evolution. It is also informative to compare the density of impact basins on Mercury with that of the Moon to understand similarities and differences in their impact crater and basin populations [1, 2]. A variety of impact basins were proposed on the basis of geological mapping with Mariner 10 data [e.g. 3]. This basin population can now be re-assessed and extended to the full planet, using data from the MErcury Surface, Space ENvironment, GEochemistry, and Ranging (MESSENGER) spacecraft. Note that small-to-medium-sized peak-ring basins on Mercury are being examined separately [4, 5]; only the three largest peak-ring basins on Mercury overlap with the size range we consider here. In this study, we (1) re-examine the large basins suggested on the basis of Mariner 10 data, (2) suggest additional basins from MESSENGER's global coverage of Mercury, (3) assess the size-frequency distribution of mercurian basins on the basis of these global observations and compare it to the Moon, and (4) analyze the implications of these observations for the modification history of basins on Mercury.

Statistical Analysis and Evaluation of the Depth of the Ruts on Lithuanian State Significance Roads

Directory of Open Access Journals (Sweden)

Erinijus Getautis

2011-04-01

The aim of this work is to gather information about rut depth on national flexible-pavement roads, to determine its statistical dispersion index, and to determine whether it meets the applicable requirements. An analysis of scientific work on the appearance of ruts in asphalt and their influence on driving is presented. Dynamic models of ruts in asphalt are presented as well. Experimental data on rut depth dispersion on the Lithuanian national highway Vilnius – Kaunas are prepared. Conclusions are formulated and presented. Article in Lithuanian.
Large wood mobility processes in low-order Chilean river channels

Science.gov (United States)

Iroumé, Andrés; Mao, Luca; Andreoli, Andrea; Ulloa, Héctor; Ardiles, María Paz

2015-01-01

Large wood (LW) mobility was studied over several time periods in channel segments of four low-order mountain streams, southern Chile. All wood pieces found within the bankfull channels and on the streambanks extending into the channel with dimensions more than 10 cm in diameter and 1 m in length were measured and their position was referenced. Thirty six percent of measured wood pieces were tagged to investigate log mobility. All segments were first surveyed in summer and then after consecutive rainy winter periods. Annual LW mobility ranged between 0 and 28%. Eighty-four percent of the moved LW had diameters ≤ 40 cm and 92% had lengths ≤ 7 m. Large wood mobility was higher in periods when maximum water level (Hmax) exceeded channel bankfull depth (HBk) than in periods with flows less than HBk, but the difference was not statistically significant. Dimensions of moved LW showed no significant differences between periods with flows exceeding and with flows less than bankfull stage. Statistically significant relationships were found between annual LW mobility (%) and unit stream power (for Hmax) and Hmax/HBk. The mean diameter of transported wood pieces per period was significantly correlated with unit stream power for H15% and H50% (the level above which the flow remains for 15 and 50% of the time, respectively). These results contribute to an understanding of the complexity of LW mobilization processes in mountain streams and can be used to assess and prevent potential damage caused by LW mobilization during floods.
Effect of large volume paracentesis on plasma volume--a cause of hypovolemia

International Nuclear Information System (INIS)

Kao, H.W.; Rakov, N.E.; Savage, E.; Reynolds, T.B.

1985-01-01

Large volume paracentesis, while effectively relieving symptoms in patients with tense ascites, has been generally avoided due to reports of complications attributed to an acute reduction in intravascular volume. Measurements of plasma volume in these subjects have been by indirect methods and have not uniformly confirmed hypovolemia. We have prospectively evaluated 18 patients (20 paracenteses) with tense ascites and peripheral edema due to chronic liver disease undergoing 5 liter paracentesis for relief of symptoms. Plasma volume pre- and postparacentesis was assessed by a 125I-labeled human serum albumin dilution technique as well as by the change in hematocrit and postural blood pressure difference. No significant change in serum sodium, urea nitrogen, hematocrit or postural systolic blood pressure difference was noted at 24 or 48 hr after paracentesis. Serum creatinine at 24 hr after paracentesis was unchanged, but a small, statistically significant increase in serum creatinine was noted at 48 hr postparacentesis. Plasma volume changed -2.7% (n = 6, not statistically significant) during the first 24 hr and -2.8% (n = 12, not statistically significant) during the 0- to 48-hr period. No complications from paracentesis were noted. These results suggest that 5 liter paracentesis for relief of symptoms is safe in patients with tense ascites and peripheral edema from chronic liver disease.
Statistics for X-chromosome associations.

Science.gov (United States)

Özbek, Umut; Lin, Hui-Min; Lin, Yan; Weeks, Daniel E; Chen, Wei; Shaffer, John R; Purcell, Shaun M; Feingold, Eleanor

2018-06-13

In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions. © 2018 WILEY PERIODICALS, INC.
Assessment of climate change using methods of mathematic statistics and theory of probability

International Nuclear Information System (INIS)

Trajanoska, Lidija; Kaevski, Ivancho

2004-01-01

In simple terms, 'climate' is the average of 'weather'. The Earth's weather system is a complex machine composed of coupled sub-systems (ocean, air, land, ice and the biosphere) between which energy is exchanged. The study of climate change relies not only on understanding the physics of climate change but is also linked to the following question: 'How can we detect change in a system that is changing all the time of its own accord?' What is even the meaning of 'change' in such a situation? If the concept of 'change' is transformed into the concept of 'significant and long-term change', this re-phrasing allows for a definition in mathematical terms. Significant change in a system becomes a measure of how large an observed change is in terms of the variability one would see under 'normal' conditions. An example is the analysis of yearly air temperature and precipitation, as in this paper. A large amount of data is selected as representing the 'before' case and another set of data is selected as the 'after' case, and then the averages in these two cases are compared. These comparisons take the form of 'hypothesis tests', in which one tests whether the hypothesis that there has been no change can be rejected. Both parametric and nonparametric methods of mathematical statistics are used. The most indicative variables showing global change are the mean, the standard deviation and the probability distribution function of the examined time series. The examined meteorological series are treated as random processes, so that mathematical statistics can be applied. (Author)
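The 'before' versus 'after' comparison described in this abstract can be sketched with one parametric and one nonparametric test. The abstract does not name the tests the authors used, so Welch's t statistic and a permutation test are assumptions chosen for illustration, and the temperature values below are made up.

```python
import random
import statistics


def welch_t(before, after):
    """Welch's t statistic for a difference in means (unequal variances allowed)."""
    m1, m2 = statistics.fmean(before), statistics.fmean(after)
    v1, v2 = statistics.variance(before), statistics.variance(after)
    return (m2 - m1) / ((v1 / len(before) + v2 / len(after)) ** 0.5)


def permutation_p_value(before, after, n_perm=5000, seed=0):
    """Two-sided nonparametric p-value for the difference in means.

    Under the null hypothesis of no change, the 'before'/'after' labels are
    exchangeable, so the observed difference is compared with differences
    obtained after randomly reshuffling the pooled values.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(after) - statistics.fmean(before))
    pooled = list(before) + list(after)
    k = len(before)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[k:]) - statistics.fmean(pooled[:k]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical yearly mean temperatures (degrees C) for two periods.
before = [10.1, 9.8, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2]
after = [10.9, 11.2, 10.7, 11.0, 11.4, 10.8, 11.1, 10.6]
print(welch_t(before, after), permutation_p_value(before, after))
```

A small p-value means the hypothesis of no change can be rejected at the chosen level; whether the change is also 'long-term' depends on how the two periods were selected.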
Statistical methods in physical mapping

International Nuclear Information System (INIS)

Nelson, D.O.

1995-05-01

One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like fragile X syndrome, cystic fibrosis and myotonic muscular dystrophy. This dissertation concentrates on constructing high-resolution physical maps. It demonstrates how probabilistic modeling and statistical analysis can aid molecular geneticists in the tasks of planning, execution, and evaluation of physical maps of chromosomes and large chromosomal regions. The dissertation is divided into six chapters. Chapter 1 provides an introduction to the field of physical mapping, describing the role of physical mapping in gene isolation and in past efforts at mapping chromosomal regions. The next two chapters review and extend known results on predicting progress in large mapping projects. Such predictions help project planners decide between various approaches and tactics for mapping large regions of the human genome. Chapter 2 shows how probability models have been used in the past to predict progress in mapping projects. Chapter 3 presents new results, based on stationary point process theory, for progress measures for mapping projects based on directed mapping strategies. Chapter 4 describes in detail the construction of an initial high-resolution physical map for human chromosome 19. This chapter introduces the probability and statistical models involved in map construction in the context of a large, ongoing physical mapping project. Chapter 5 concentrates on one such model, the trinomial model. This chapter contains new results on the large-sample behavior of this model, including distributional results, asymptotic moments, and detection error rates. In addition, it contains an optimality result concerning experimental procedures based on the trinomial model. The last chapter explores unsolved problems and describes future work.
Statistical methods in physical mapping

Energy Technology Data Exchange (ETDEWEB)

Nelson, David O. [Univ. of California, Berkeley, CA (United States)]

1995-05-01

One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like fragile X syndrome, cystic fibrosis and myotonic muscular dystrophy. This dissertation concentrates on constructing high-resolution physical maps. It demonstrates how probabilistic modeling and statistical analysis can aid molecular geneticists in the tasks of planning, execution, and evaluation of physical maps of chromosomes and large chromosomal regions. The dissertation is divided into six chapters. Chapter 1 provides an introduction to the field of physical mapping, describing the role of physical mapping in gene isolation and in past efforts at mapping chromosomal regions. The next two chapters review and extend known results on predicting progress in large mapping projects. Such predictions help project planners decide between various approaches and tactics for mapping large regions of the human genome. Chapter 2 shows how probability models have been used in the past to predict progress in mapping projects. Chapter 3 presents new results, based on stationary point process theory, for progress measures for mapping projects based on directed mapping strategies. Chapter 4 describes in detail the construction of an initial high-resolution physical map for human chromosome 19. This chapter introduces the probability and statistical models involved in map construction in the context of a large, ongoing physical mapping project. Chapter 5 concentrates on one such model, the trinomial model. This chapter contains new results on the large-sample behavior of this model, including distributional results, asymptotic moments, and detection error rates. In addition, it contains an optimality result concerning experimental procedures based on the trinomial model. The last chapter explores unsolved problems and describes future work.
Perceived Statistical Knowledge Level and Self-Reported Statistical Practice Among Academic Psychologists

Directory of Open Access Journals (Sweden)

Laura Badenes-Ribera

2018-06-01

Introduction: Publications arguing against the null hypothesis significance testing (NHST) procedure and in favor of good statistical practices have increased. The most frequently mentioned alternatives to NHST are effect size statistics (ES), confidence intervals (CIs), and meta-analyses. A recent survey conducted in Spain found that academic psychologists have poor knowledge about effect size statistics, confidence intervals, and graphic displays for meta-analyses, which might lead to a misinterpretation of the results. In addition, it also found that, although the use of ES is becoming generalized, the same thing is not true for CIs. Finally, academics with greater knowledge about ES statistics presented a profile closer to good statistical practice and research design. Our main purpose was to analyze the extension of these results to a different geographical area through a replication study. Methods: For this purpose, we elaborated an on-line survey that included the same items as the original research, and we asked academic psychologists to indicate their level of knowledge about ES, their CIs, and meta-analyses, and how they use them. The sample consisted of 159 Italian academic psychologists (54.09% women, mean age of 47.65 years). The mean number of years in the position of professor was 12.90 (SD = 10.21). Results: As in the original research, the results showed that, although the use of effect size estimates is becoming generalized, an under-reporting of CIs for ES persists. The most frequent ES statistics mentioned were Cohen's d and R²/η², which can have outliers or show non-normality or violate statistical assumptions. In addition, academics showed poor knowledge about meta-analytic displays (e.g., forest plot and funnel plot) and quality checklists for studies. Finally, academics with higher-level knowledge about ES statistics seem to have a profile closer to good statistical practices. Conclusions: Changing statistical practice is not
1979 DOE statistical symposium

International Nuclear Information System (INIS)

Gardiner, D.A.; Truett, T.

1980-09-01

The 1979 DOE Statistical Symposium was the fifth in the series of annual symposia designed to bring together statisticians and other interested parties who are actively engaged in helping to solve the nation's energy problems. The program included presentations of technical papers centered around exploration and disposal of nuclear fuel, general energy-related topics, and health-related issues, and workshops on model evaluation, risk analysis, analysis of large data sets, and resource estimation.
1979 DOE statistical symposium

Energy Technology Data Exchange (ETDEWEB)

Gardiner, D.A.; Truett T. (comps. and eds.)

1980-09-01

The 1979 DOE Statistical Symposium was the fifth in the series of annual symposia designed to bring together statisticians and other interested parties who are actively engaged in helping to solve the nation's energy problems. The program included presentations of technical papers centered around exploration and disposal of nuclear fuel, general energy-related topics, and health-related issues, and workshops on model evaluation, risk analysis, analysis of large data sets, and resource estimation.
Understanding Statistics - Cancer Statistics

Science.gov (United States)

Annual reports of U.S. cancer statistics including new cases, deaths, trends, survival, prevalence, lifetime risk, and progress toward Healthy People targets, plus statistical summaries for a number of common cancer types.
Funding source and primary outcome changes in clinical trials registered on ClinicalTrials.gov are associated with the reporting of a statistically significant primary outcome: a cross-sectional study [v2; ref status: indexed, http://f1000r.es/5bj]

Directory of Open Access Journals (Sweden)

Sreeram V Ramagopalan

2015-04-01

Background: We and others have shown a significant proportion of interventional trials registered on ClinicalTrials.gov have their primary outcomes altered after the listed study start and completion dates. The objectives of this study were to investigate whether changes made to primary outcomes are associated with the likelihood of reporting a statistically significant primary outcome on ClinicalTrials.gov. Methods: A cross-sectional analysis of all interventional clinical trials registered on ClinicalTrials.gov as of 20 November 2014 was performed. The main outcome was any change made to the initially listed primary outcome and the time of the change in relation to the trial start and end date. Findings: 13,238 completed interventional trials were registered with ClinicalTrials.gov that also had study results posted on the website. 2555 (19.3%) had one or more statistically significant primary outcomes. Statistical analysis showed that registration year, funding source and primary outcome change after trial completion were associated with reporting a statistically significant primary outcome. Conclusions: Funding source and primary outcome change after trial completion are associated with a statistically significant primary outcome report on ClinicalTrials.gov.
Statistical analysis and data management

International Nuclear Information System (INIS)

Anon.

1981-01-01

This report provides an overview of the history of the WIPP Biology Program. The recommendations of the American Institute of Biological Sciences (AIBS) for the WIPP biology program are summarized. The data sets available for statistical analyses and problems associated with these data sets are also summarized. Biological studies base maps are presented. A statistical model is presented to evaluate any correlation between climatological data and small mammal captures. No statistically significant relationship was found between variance in small mammal captures on Dr. Gennaro's 90 m x 90 m grid and precipitation records from the Duval Potash Mine.
A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

Science.gov (United States)

Luo, Li; Zhu, Yun; Xiong, Momiao

2012-06-01

The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², collapsing method, multivariate and collapsing (CMC) method, individual χ² test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
A perceptual space of local image statistics.

Science.gov (United States)

Victor, Jonathan D; Thengone, Daniel J; Rizvi, Syed M; Conte, Mary M

2015-12-01

Local image statistics are important for visual analysis of textures, surfaces, and form. There are many kinds of local statistics, including those that capture luminance distributions, spatial contrast, oriented segments, and corners. While sensitivity to each of these kinds of statistics has been well-studied, much less is known about visual processing when multiple kinds of statistics are relevant, in large part because the dimensionality of the problem is high and different kinds of statistics interact. To approach this problem, we focused on binary images on a square lattice - a reduced set of stimuli which nevertheless taps many kinds of local statistics. In this 10-parameter space, we determined psychophysical thresholds to each kind of statistic (16 observers) and all of their pairwise combinations (4 observers). Sensitivities and isodiscrimination contours were consistent across observers. Isodiscrimination contours were elliptical, implying a quadratic interaction rule, which in turn determined ellipsoidal isodiscrimination surfaces in the full 10-dimensional space, and made predictions for sensitivities to complex combinations of statistics. These predictions, including the prediction of a combination of statistics that was metameric to random, were verified experimentally. Finally, check size had only a mild effect on sensitivities over the range from 2.8 to 14 min, but sensitivities to second- and higher-order statistics were substantially lower at 1.4 min. In sum, local image statistics form a perceptual space that is highly stereotyped across observers, in which different kinds of statistics interact according to simple rules. Copyright © 2015 Elsevier Ltd. All rights reserved.
Confounding and Statistical Significance of Indirect Effects: Childhood Adversity, Education, Smoking, and Anxious and Depressive Symptomatology

Directory of Open Access Journals (Sweden)

Mashhood Ahmed Sheikh

2017-08-01

mediate the association between childhood adversity and ADS in adulthood. However, when education was excluded as a mediator-response confounding variable, the indirect effect of childhood adversity on ADS in adulthood was statistically significant (p < 0.05). This study shows that a careful inclusion of potential confounding variables is important when assessing mediation.
Representative volume size: A comparison of statistical continuum mechanics and statistical physics

Energy Technology Data Exchange (ETDEWEB)

AIDUN,JOHN B.; TRUCANO,TIMOTHY G.; LO,CHI S.; FYE,RICHARD M.

1999-05-01

In this combination background and position paper, the authors argue that careful work is needed to develop accurate methods for relating the results of fine-scale numerical simulations of material processes to meaningful values of macroscopic properties for use in constitutive models suitable for finite element solid mechanics simulations. To provide a definite context for this discussion, the problem is couched in terms of the lack of general objective criteria for identifying the size of the representative volume (RV) of a material. The objective of this report is to lay out at least the beginnings of an approach for applying results and methods from statistical physics to develop concepts and tools necessary for determining the RV size, as well as alternatives to RV volume-averaging for situations in which the RV is unmanageably large. The background necessary to understand the pertinent issues and statistical physics concepts is presented.
Statistical model for the mechanical behavior of the tissue engineering non-woven fibrous matrices under large deformation.

Science.gov (United States)

Rizvi, Mohd Suhail; Pal, Anupam

2014-09-01

Fibrous matrices are widely used as scaffolds for the regeneration of load-bearing tissues due to their structural and mechanical similarities with the fibrous components of the extracellular matrix. These scaffolds not only provide the appropriate microenvironment for the residing cells but also act as a medium for the transmission of the mechanical stimuli, essential for tissue regeneration, from the macroscopic scale of the scaffolds to the microscopic scale of cells. Because tissue regeneration requires mechanical loading, the fibrous scaffolds must be able to sustain complex three-dimensional mechanical loading conditions. In order to gain insight into the mechanical behavior of the fibrous matrices under large elongation as well as shear, a statistical model has been formulated to study the macroscopic mechanical behavior of the electrospun fibrous matrix and the transmission of the mechanical stimuli from scaffolds to the cells via the constituting fibers. The study establishes the load-deformation relationships for the fibrous matrices for different structural parameters. It also quantifies the changes in the fiber arrangement and tension generated in the fibers with the deformation of the matrix. The model reveals that the tension generated in the fibers on matrix deformation is not homogeneous and hence the cells located in different regions of the fibrous scaffold might experience different mechanical stimuli. The mechanical response of fibrous matrices was also found to be dependent on the aspect ratio of the matrix. Therefore, the model establishes a structure-mechanics interdependence of the fibrous matrices under large deformation, which can be utilized in identifying the appropriate structure and external mechanical loading conditions for the regeneration of load-bearing tissues. Copyright © 2014 Elsevier Ltd. All rights reserved.
Statistics Anxiety, State Anxiety during an Examination, and Academic Achievement

Science.gov (United States)

Macher, Daniel; Paechter, Manuela; Papousek, Ilona; Ruggeri, Kai; Freudenthaler, H. Harald; Arendasy, Martin

2013-01-01

Background: A large proportion of students identify statistics courses as the most anxiety-inducing courses in their curriculum. Many students feel impaired by feelings of state anxiety in the examination and therefore probably show lower achievements. Aims: The study investigates how statistics anxiety, attitudes (e.g., interest, mathematical…

Statistical theory and inference

CERN Document Server

Olive, David J

2014-01-01

This text is for a one-semester graduate course in statistical theory and covers minimal and complete sufficient statistics, maximum likelihood estimators, method of moments, bias and mean square error, uniform minimum variance estimators and the Cramér-Rao lower bound, an introduction to large sample theory, likelihood ratio tests and uniformly most powerful tests and the Neyman-Pearson lemma. A major goal of this text is to make these topics much more accessible to students by using the theory of exponential families. Exponential families, indicator functions and the support of the distribution are used throughout the text to simplify the theory. More than 50 "brand name" distributions are used to illustrate the theory with many examples of exponential families, maximum likelihood estimators and uniformly minimum variance unbiased estimators. There are many homework problems with over 30 pages of solutions.
Classical and statistical thermodynamics

CERN Document Server

Rizk, Hanna A

2016-01-01

This is a textbook of thermodynamics for the student who seeks thorough training in science or engineering. Systematic and thorough treatment of the fundamental principles, rather than presentation of the large mass of facts, has been stressed. The book includes some of the historical and humanistic background of thermodynamics, but without affecting the continuity of the analytical treatment. For a clearer and more profound understanding of thermodynamics this book is highly recommended. In this respect, the author believes that a sound grounding in classical thermodynamics is an essential prerequisite for the understanding of statistical thermodynamics. Such a book comprising the two wide branches of thermodynamics is in fact unprecedented. Being a written work dealing systematically with the two main branches of thermodynamics, namely classical thermodynamics and statistical thermodynamics, together with some important indexes under only one cover, this treatise is eminently useful.
Thyroid Autoimmunity and Behçet’s Disease: Is There a Significant Association?

Directory of Open Access Journals (Sweden)

Filiz Cebeci

2013-01-01

Background. Behçet’s disease (BD) could be regarded as an autoimmune disease in many aspects. Autoimmune thyroid disease (ATD) is frequently accompanied by various other autoimmune diseases. Nevertheless, there are still not enough data showing the association between BD and ATD. In addition, no controlled study in PubMed evaluates thyroid autoimmunity using antithyroid peroxidase antibody in a large series of patients with BD. Methods. We aimed to investigate the frequency of ATD in patients with BD. The study included 124 patients with BD and 99 age- and sex-matched healthy volunteers. Results. Autoimmune thyroiditis was noted in 21 cases (16.9%) with BD. In the control group, 22 cases (22.22%) were diagnosed as autoimmune thyroiditis. There was no difference between the groups with respect to thyroid autoantibodies. There were no statistically significant differences between baseline TSH levels of the BD patients and of the controls. The mean serum free T4 levels of the patients with BD were statistically higher than those of the controls. Conclusions. No association could be found between BD and ATD. Therefore, it is not of significance to investigate thyroid autoimmunity in BD.
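The p-values in this record did not survive extraction, and the sketch below does not attempt to reproduce them. It only illustrates, under stated assumptions, how the two reported proportions of autoimmune thyroiditis (21 of 124 patients, 22 of 99 controls) could be compared. The choice of a Pearson chi-square test without continuity correction and the function name `pearson_chi2_2x2` are assumptions, not the authors' stated method.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Counts taken from the abstract: 21 of 124 BD patients, 22 of 99 controls.
chi2, p_value = pearson_chi2_2x2(21, 124 - 21, 22, 99 - 22)
print(f"BD: {21 / 124:.1%}, controls: {22 / 99:.2%}")
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

Under these assumptions the test gives a p-value well above 0.05, which is at least compatible with the abstract's conclusion of no association.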
A statistical-dynamical downscaling procedure for global climate simulations

International Nuclear Information System (INIS)

Frey-Buness, A.; Heimann, D.; Sausen, R.; Schumann, U.

1994-01-01

A statistical-dynamical downscaling procedure for global climate simulations is described. The procedure is based on the assumption that any regional climate is associated with a specific frequency distribution of classified large-scale weather situations. The frequency distributions are derived from multi-year episodes of low resolution global climate simulations. Highly resolved regional distributions of wind and temperature are calculated with a regional model for each class of large-scale weather situation. They are statistically evaluated by weighting them with the corresponding climate-specific frequency. As an example, the procedure is applied to the Alpine region for a global climate simulation of the present climate. (orig.)
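The weighting step described above can be written compactly. The notation is an assumption introduced here for illustration and does not come from the record: $K$ is the number of weather classes, $f_k$ the relative frequency of class $k$ in the global simulation, and $X_k(\mathbf{r})$ the regional-model field (wind or temperature) at location $\mathbf{r}$ for class $k$.

```latex
\bar{X}(\mathbf{r}) = \sum_{k=1}^{K} f_k \, X_k(\mathbf{r}),
\qquad \sum_{k=1}^{K} f_k = 1
```

A different climate then changes only the frequencies $f_k$, so the regional simulations $X_k$ can in principle be reused.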
[Big data in official statistics].

Science.gov (United States)

Zwick, Markus

2015-08-01

The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.
Wind and wave extremes over the world oceans from very large ensembles

Science.gov (United States)

Breivik, Øyvind; Aarnes, Ole Johan; Abdalla, Saleh; Bidlot, Jean-Raymond; Janssen, Peter A. E. M.

2014-07-01

Global return values of marine wind speed and significant wave height are estimated from very large aggregates of archived ensemble forecasts at +240 h lead time. Long lead time ensures that the forecasts represent independent draws from the model climate. Compared with ERA-Interim, a reanalysis, the ensemble yields higher return estimates for both wind speed and significant wave height. Confidence intervals are much tighter due to the large size of the data set. The period (9 years) is short enough to be considered stationary even with climate change. Furthermore, the ensemble is large enough for nonparametric 100 year return estimates to be made from order statistics. These direct return estimates compare well with extreme value estimates outside areas with tropical cyclones. Like any method employing modeled fields, it is sensitive to tail biases in the numerical model, but we find that the biases are moderate outside areas with tropical cyclones.
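A direct, nonparametric return estimate from order statistics can be sketched as follows. This is a minimal illustration under simple assumptions, not the authors' procedure: the function name, its arguments and the toy data are hypothetical, and the draws are assumed independent and jointly equivalent to a known number of years of model climate.

```python
def direct_return_value(samples, years_covered, return_period):
    """Value equalled or exceeded on average once per `return_period` years.

    `samples` are assumed to be independent draws that together represent
    `years_covered` years of model climate (hypothetical inputs).
    """
    k = round(years_covered / return_period)  # expected number of exceedances
    if k < 1:
        raise ValueError("sample too short for this return period")
    ranked = sorted(samples, reverse=True)
    return ranked[k - 1]  # k-th largest value

# Toy data: 1000 draws standing in for 500 years, so the 5th largest is returned.
toy = list(range(1, 1001))
print(direct_return_value(toy, years_covered=500, return_period=100))  # prints 996
```

The estimate needs no fitted extreme value distribution, but it is only usable when the equivalent sample length is several times the return period.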
Error Analysis of Statistical Linearization with Gaussian Closure for Large Degree-of-Freedom Systems

DEFF Research Database (Denmark)

Micaletti, R. C.; Cakmak, A. S.; Nielsen, Søren R. K.

This paper contains an analysis of the error induced by applying the method of equivalent statistical linearization (ESL) to randomly excited multi-degree-of-freedom (MDOF) geometrically nonlinear shear-frame structures as the number of degrees of freedom increases. The quantity that is analyzed...
Performance modeling, loss networks, and statistical multiplexing

CERN Document Server

Mazumdar, Ravi

2009-01-01

This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I
Association between large strongyle genera in larval cultures--using rare-event poisson regression.

Science.gov (United States)

Cao, X; Vidyashankar, A N; Nielsen, M K

2013-09-01

Decades of intensive anthelmintic treatment have caused equine large strongyles to become quite rare, while the cyathostomins have developed resistance to several drug classes. The larval culture has been associated with low to moderate negative predictive values for detecting Strongylus vulgaris infection. It is unknown whether detection of other large strongyle species can be statistically associated with presence of S. vulgaris. This remains a statistical challenge because of the rare occurrence of large strongyle species. This study used a modified Poisson regression to analyse a dataset for associations between S. vulgaris infection and simultaneous occurrence of Strongylus edentatus and Triodontophorus spp. In 663 horses on 42 Danish farms, the individual prevalences of S. vulgaris, S. edentatus and Triodontophorus spp. were 12%, 3% and 12%, respectively. Both S. edentatus and Triodontophorus spp. were significantly associated with S. vulgaris infection with relative risks above 1. Further, S. edentatus was associated with use of selective therapy on the farms, as well as negatively associated with anthelmintic treatment carried out within 6 months prior to the study. The findings illustrate that occurrence of S. vulgaris in larval cultures can be interpreted as indicative of other large strongyles being likely to be present.
Statistical processing of technological and radiochemical data

International Nuclear Information System (INIS)

Lahodova, Zdena; Vonkova, Kateřina

2011-01-01

The project described in this article had two goals. The main goal was to compare technological and radiochemical data from two units of nuclear power plant. The other goal was to check the collection, organization and interpretation of routinely measured data. Monitoring of analytical and radiochemical data is a very valuable source of knowledge for some processes in the primary circuit. Exploratory analysis of one-dimensional data was performed to estimate location and variability and to find extreme values, data trends, distribution, autocorrelation etc. This process allowed for the cleaning and completion of raw data. Then multiple analyses such as multiple comparisons, multiple correlation, variance analysis, and so on were performed. Measured data was organized into a data matrix. The results and graphs such as Box plots, Mahalanobis distance, Biplot, Correlation, and Trend graphs are presented in this article as statistical analysis tools. Tables of data were replaced with graphs because graphs condense large amounts of information into easy-to-understand formats. The significant conclusion of this work is that the collection and comprehension of data is a very substantial part of statistical processing. With well-prepared and well-understood data, its accurate evaluation is possible. Cooperation between the technicians who collect data and the statistician who processes it is also very important. (author)
National Statistical Commission and Indian Official Statistics*

Indian Academy of Sciences (India)

IAS Admin

a good collection of official statistics of that time. With more .... statistical agencies and institutions to provide details of statistical activities .... ing several training programmes. .... ful completion of Indian Statistical Service examinations, the.
On Nonextensive Statistics, Chaos and Fractal Strings

CERN Document Server

Castro, C

2004-01-01

Motivated by the growing evidence of universality and chaos in QFT and string theory, we study the Tsallis non-extensive statistics (with a non-additive $q$-entropy) of an ensemble of fractal strings and branes of different dimensionalities. Non-equilibrium systems with complex dynamics in stationary states may exhibit large fluctuations of intensive quantities which are described in terms of generalized statistics. Tsallis statistics is a particular representative of such class. The non-extensive entropy and probability distribution of a canonical ensemble of fractal strings and branes is studied in terms of their dimensional spectrum which leads to a natural upper cutoff in energy and establishes a direct correlation among dimensions, energy and temperature. The absolute zero temperature (Kelvin) corresponds to zero dimensions (energy) and an infinite temperature corresponds to infinite dimensions. In the concluding remarks some applications of fractal statistics, quasi-particles, knot theory, quantum...
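For orientation, the non-additive $q$-entropy mentioned above is usually written in the standard Tsallis form below. The abstract itself gives no formula, so the notation ($p_i$ for state probabilities, $k_B$ for Boltzmann's constant, $A$ and $B$ for independent subsystems) is assumed here rather than taken from the paper.

```latex
S_q = k_B \, \frac{1 - \sum_i p_i^{\,q}}{q - 1},
\qquad
\lim_{q \to 1} S_q = -k_B \sum_i p_i \ln p_i ,
\qquad
S_q(A+B) = S_q(A) + S_q(B) + \frac{1-q}{k_B}\, S_q(A)\, S_q(B)
```

The last relation is what "non-additive" refers to: the cross term vanishes only in the Boltzmann-Gibbs limit $q \to 1$.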
Official Statistics and Statistics Education: Bridging the Gap

Directory of Open Access Journals (Sweden)

Gal Iddo

2017-03-01

This article aims to challenge official statistics providers and statistics educators to ponder how to help non-specialist adult users of statistics develop those aspects of statistical literacy that pertain to official statistics. We first document the gap in the literature in terms of the conceptual basis and educational materials needed for such an undertaking. We then review skills and competencies that may help adults to make sense of statistical information in areas of importance to society. Based on this review, we identify six elements related to official statistics about which non-specialist adult users should possess knowledge in order to be considered literate in official statistics: (1) the system of official statistics and its work principles; (2) the nature of statistics about society; (3) indicators; (4) statistical techniques and big ideas; (5) research methods and data sources; and (6) awareness and skills for citizens’ access to statistical reports. Based on this ad hoc typology, we discuss directions that official statistics providers, in cooperation with statistics educators, could take in order to (1) advance the conceptualization of skills needed to understand official statistics, and (2) expand educational activities and services, specifically by developing a collaborative digital textbook and a modular online course, to improve public capacity for understanding of official statistics.
Watt-Lite; Energy Statistics Made Tangible

DEFF Research Database (Denmark)

Jönsson, Li; Broms, Loove; Katzeff, Cecilia

2011-01-01

Increasing our knowledge of how design affects behaviour in the workplace has a large potential for reducing electricity consumption. This would be beneficial for the environment as well as for industry and society at large. In Western society energy use is hidden and for the great mass...... in the physical environments of its employees. The design of Watt-Lite is meant to explore ways of representing, understanding and interacting with electricity in industrial workspaces. We discuss three design inquiries and their implications for the design of Watt-Lite: the use of tangible statistics...
Statistical Analysis of Data for Timber Strengths

DEFF Research Database (Denmark)

Sørensen, John Dalsgaard

2003-01-01

Statistical analyses are performed for material strength parameters from a large number of specimens of structural timber. Non-parametric statistical analysis and fits have been investigated for the following distribution types: Normal, Lognormal, 2-parameter Weibull and 3-parameter Weibull...... fits to the data available, especially if tail fits are used, whereas the Lognormal distribution generally gives a poor fit and larger coefficients of variation, especially if tail fits are used. The implications on the reliability level of typical structural elements and on partial safety factors...... for timber are investigated....
Beyond quantum microcanonical statistics

International Nuclear Information System (INIS)

Fresch, Barbara; Moro, Giorgio J.

2011-01-01

Descriptions of molecular systems usually refer to two distinct theoretical frameworks. On the one hand the quantum pure state, i.e., the wavefunction, of an isolated system is determined to calculate molecular properties and their time evolution according to the unitary Schroedinger equation. On the other hand a mixed state, i.e., a statistical density matrix, is the standard formalism to account for thermal equilibrium, as postulated in the microcanonical quantum statistics. In the present paper an alternative treatment relying on a statistical analysis of the possible wavefunctions of an isolated system is presented. In analogy with the classical ergodic theory, the time evolution of the wavefunction determines the probability distribution in the phase space pertaining to an isolated system. However, this alone cannot account for a well defined thermodynamical description of the system in the macroscopic limit, unless a suitable probability distribution for the quantum constants of motion is introduced. We present a workable formalism assuring the emergence of typical values of thermodynamic functions, such as the internal energy and the entropy, in the large size limit of the system. This allows the identification of macroscopic properties independently of the specific realization of the quantum state. A description of material systems in agreement with equilibrium thermodynamics is then derived without constraints on the physical constituents and interactions of the system. Furthermore, the canonical statistics is recovered in all generality for the reduced density matrix of a subsystem.
Statistical Mechanics of Turbulent Dynamos

Science.gov (United States)

Shebalin, John V.

2014-01-01

Incompressible magnetohydrodynamic (MHD) turbulence and magnetic dynamos, which occur in magnetofluids with large fluid and magnetic Reynolds numbers, will be discussed. When Reynolds numbers are large and energy decays slowly, the distribution of energy with respect to length scale becomes quasi-stationary and MHD turbulence can be described statistically. In the limit of infinite Reynolds numbers, viscosity and resistivity become zero and if these values are used in the MHD equations ab initio, a model system called ideal MHD turbulence results. This model system is typically confined in simple geometries with some form of homogeneous boundary conditions, allowing for velocity and magnetic field to be represented by orthogonal function expansions. One advantage to this is that the coefficients of the expansions form a set of nonlinearly interacting variables whose behavior can be described by equilibrium statistical mechanics, i.e., by a canonical ensemble theory based on the global invariants (energy, cross helicity and magnetic helicity) of ideal MHD turbulence. Another advantage is that truncated expansions provide a finite dynamical system whose time evolution can be numerically simulated to test the predictions of the associated statistical mechanics. If ensemble predictions are the same as time averages, then the system is said to be ergodic; if not, the system is nonergodic. Although it had been implicitly assumed in the early days of ideal MHD statistical theory development that these finite dynamical systems were ergodic, numerical simulations provided sufficient evidence that they were, in fact, nonergodic. Specifically, while canonical ensemble theory predicted that expansion coefficients would be (i) zero-mean random variables with (ii) energy that decreased with length scale, it was found that although (ii) was correct, (i) was not and the expected ergodicity was broken. The exact cause of this broken ergodicity was explained, after much
Implementation of an adaptive training and tracking game in statistics teaching

NARCIS (Netherlands)

Groeneveld, C.M.; Kalz, M.; Ras, E.

2014-01-01

Statistics teaching in higher education has a number of challenges. An adaptive training, tracking and teaching tool in a gaming environment aims to address problems inherent in statistics teaching. This paper discusses the implementation of this tool in a large first year university programme and
An application of an optimal statistic for characterizing relative orientations

Science.gov (United States)

Jow, Dylan L.; Hill, Ryley; Scott, Douglas; Soler, J. D.; Martin, P. G.; Devlin, M. J.; Fissel, L. M.; Poidevin, F.

2018-02-01

We present the projected Rayleigh statistic (PRS), a modification of the classic Rayleigh statistic, as a test for non-uniform relative orientation between two pseudo-vector fields. In the application here, this gives an effective way of investigating whether polarization pseudo-vectors (spin-2 quantities) are preferentially parallel or perpendicular to filaments in the interstellar medium. There are other potential applications in astrophysics, e.g. when comparing small-scale orientations with larger scale shear patterns. We compare the efficiency of the PRS against histogram binning methods that have previously been used for characterizing the relative orientations of gas column density structures with the magnetic field projected on the plane of the sky. We examine data for the Vela C molecular cloud, where the column density is inferred from Herschel submillimetre observations, and the magnetic field from observations by the Balloon-borne Large-Aperture Submillimetre Telescope in the 250-, 350- and 500-μm wavelength bands. We find that the PRS has greater statistical power than approaches that bin the relative orientation angles, as it makes more efficient use of the information contained in the data. In particular, the use of the PRS to test for preferential alignment results in a higher statistical significance, in each of the four Vela C regions, with the greatest increase being by a factor 1.3 in the South-Nest region in the 250-μm band.
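The abstract does not state the formula for the PRS. The sketch below uses the form in which it is commonly written, Z = Σ cos(2φ_i) / sqrt(n/2), where φ_i are relative orientation angles in [-π/2, π/2]; that form, the function name and the toy inputs should be read as assumptions for illustration. Positive values indicate a preference for parallel alignment and negative values a preference for perpendicular alignment.

```python
import math

def projected_rayleigh_statistic(rel_angles):
    """PRS for relative orientation angles (radians, in [-pi/2, pi/2]).

    Assumed form: Z = sum(cos(2 * phi)) / sqrt(n / 2). For independent,
    uniformly distributed angles Z is approximately standard normal.
    """
    n = len(rel_angles)
    if n == 0:
        raise ValueError("need at least one angle")
    # Doubling the angle maps the axial (spin-2) data onto the full circle.
    return sum(math.cos(2.0 * phi) for phi in rel_angles) / math.sqrt(n / 2.0)

# Toy inputs: eight perfectly parallel and eight perfectly perpendicular pairs.
z_parallel = projected_rayleigh_statistic([0.0] * 8)
z_perpendicular = projected_rayleigh_statistic([math.pi / 2] * 8)
print(z_parallel, z_perpendicular)
```

Because every angle contributes its full cosine rather than a bin membership, no information is lost to bin boundaries, which is the efficiency argument made in the abstract.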
Novel Kalman filter algorithm for statistical monitoring of extensive landscapes with synoptic sensor data

Science.gov (United States)

Raymond L. Czaplewski

2015-01-01

Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of...

Development of free statistical software enabling researchers to calculate confidence levels, clinical significance curves and risk-benefit contours

International Nuclear Information System (INIS)

Shakespeare, T.P.; Mukherjee, R.K.; Gebski, V.J.

2003-01-01

Confidence levels, clinical significance curves, and risk-benefit contours are tools improving analysis of clinical studies and minimizing misinterpretation of published results; however, no software has been available for their calculation. The objective was to develop software to help clinicians utilize these tools. Excel 2000 spreadsheets were designed using only built-in functions, without macros. The workbook was protected and encrypted so that users can modify only input cells. The workbook has 4 spreadsheets for use in studies comparing two patient groups. Sheet 1 comprises instructions and graphic examples for use. Sheet 2 allows the user to input the main study results (e.g. survival rates) into a 2-by-2 table. Confidence intervals (95%), p-value and the confidence level for Treatment A being better than Treatment B are automatically generated. An additional input cell allows the user to determine the confidence associated with a specified level of benefit. For example, if the user wishes to know the confidence that Treatment A is at least 10% better than B, 10% is entered. Sheet 2 automatically displays clinical significance curves, graphically illustrating confidence levels for all possible benefits of one treatment over the other. Sheet 3 allows input of toxicity data, and calculates the confidence that one treatment is more toxic than the other. It also determines the confidence that the relative toxicity of the most effective arm does not exceed user-defined tolerability. Sheet 4 automatically calculates risk-benefit contours, displaying the confidence associated with a specified scenario of minimum benefit and maximum risk of one treatment arm over the other. The spreadsheet is freely downloadable at www.ontumor.com/professional/statistics.htm. A simple, self-explanatory, freely available spreadsheet calculator was developed using Excel 2000. The incorporated decision-making tools can be used for data analysis and improve the reporting of results of any
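The record does not say how the spreadsheet computes its confidence levels. As a rough illustration of the idea only, the sketch below uses a normal approximation to the difference of two proportions; that approximation, the function name `confidence_a_better` and the example counts are all assumptions and are not taken from the software.

```python
import math

def confidence_a_better(events_a, n_a, events_b, n_b, min_benefit=0.0):
    """Approximate confidence that rate A exceeds rate B by at least `min_benefit`.

    Uses a normal approximation to the difference of two proportions
    (an assumption; the spreadsheet's actual method is not described).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        raise ValueError("standard error is zero; approximation not usable")
    z = (p_a - p_b - min_benefit) / se
    # Standard normal cumulative distribution function evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical trial: 60/100 survive on Treatment A, 50/100 on Treatment B.
any_benefit = confidence_a_better(60, 100, 50, 100)
ten_percent = confidence_a_better(60, 100, 50, 100, min_benefit=0.10)
print(f"{any_benefit:.2f} {ten_percent:.2f}")
```

Evaluating the function over a range of `min_benefit` values would trace out something like the clinical significance curve described for Sheet 2.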
Large roads reduce bat activity across multiple species.

Science.gov (United States)

Kitzes, Justin; Merenlender, Adina

2014-01-01

Although the negative impacts of roads on many terrestrial vertebrate and bird populations are well documented, there have been few studies of the road ecology of bats. To examine the effects of large roads on bat populations, we used acoustic recorders to survey bat activity along ten 300 m transects bordering three large highways in northern California, applying a newly developed statistical classifier to identify recorded calls to the species level. Nightly counts of bat passes were analyzed with generalized linear mixed models to determine the relationship between bat activity and distance from a road. Total bat activity recorded at points adjacent to roads was found to be approximately one-half the level observed at 300 m. Statistically significant road effects were also found for the Brazilian free-tailed bat (Tadarida brasiliensis), big brown bat (Eptesicus fuscus), hoary bat (Lasiurus cinereus), and silver-haired bat (Lasionycteris noctivagans). The road effect was found to be temperature dependent, with hot days both increasing total activity at night and reducing the difference between activity levels near and far from roads. These results suggest that the environmental impacts of road construction may include degradation of bat habitat and that mitigation activities for this habitat loss may be necessary to protect bat populations.
Statistical physics of community ecology: a cavity solution to MacArthur’s consumer resource model

Science.gov (United States)

Advani, Madhu; Bunin, Guy; Mehta, Pankaj

2018-03-01

A central question in ecology is to understand the ecological processes that shape community structure. Niche-based theories have emphasized the important role played by competition for maintaining species diversity. Many of these insights have been derived using MacArthur’s consumer resource model (MCRM) or its generalizations. Most theoretical work on the MCRM has focused on small ecosystems with a few species and resources. However, theoretical insights derived from small ecosystems may not scale up to large ecosystems with many resources and species, because large systems with many interacting components often display new emergent behaviors that cannot be understood or deduced from analyzing smaller systems. To address these shortcomings, we develop a statistical-physics-inspired cavity method to analyze the MCRM when both the number of species and the number of resources are large. Unlike previous work in this limit, our theory addresses resource dynamics and resource depletion and demonstrates that species generically and consistently perturb their environments and significantly modify available ecological niches. We show how our cavity approach naturally generalizes niche theory to large ecosystems by accounting for the effect of collective phenomena on species invasion and ecological stability. Our theory suggests that such phenomena are a generic feature of large, natural ecosystems and must be taken into account when analyzing and interpreting community structure. It also highlights the important role that statistical-physics-inspired approaches can play in furthering our understanding of ecology.
Statistical mechanics of economics I

Energy Technology Data Exchange (ETDEWEB)

Kusmartsev, F.V., E-mail: F.Kusmartsev@lboro.ac.u [Department of Physics, Loughborough University, Leicestershire, LE11 3TU (United Kingdom)

2011-02-07

We show that statistical mechanics is useful in the description of financial crisis and economics. Taking a large number of instantaneous snapshots of a market over an interval of time, we construct their ensembles and study their statistical interference. This results in a probability description of the market and gives capital, money, income, wealth and debt distributions, which in most cases take the form of the Bose-Einstein distribution. In addition, statistical mechanics provides the main market equations and laws which govern the correlations between the amount of money, debt, product, prices and number of retailers. We applied the relations found to a study of the evolution of the economy in the USA between 1996 and 2008 and observe that over that time the income of a major population is well described by the Bose-Einstein distribution, whose parameters are different for each year. Each financial crisis corresponds to a peak in the absolute activity coefficient. The analysis correctly indicates the past crises and predicts the future one.
Statistical mechanics of economics I

International Nuclear Information System (INIS)

Kusmartsev, F.V.

2011-01-01

We show that statistical mechanics is useful in the description of financial crisis and economics. Taking a large number of instantaneous snapshots of a market over an interval of time, we construct their ensembles and study their statistical interference. This results in a probability description of the market and gives capital, money, income, wealth and debt distributions, which in most cases take the form of the Bose-Einstein distribution. In addition, statistical mechanics provides the main market equations and laws which govern the correlations between the amount of money, debt, product, prices and number of retailers. We applied the relations found to a study of the evolution of the economy in the USA between 1996 and 2008 and observe that over that time the income of a major population is well described by the Bose-Einstein distribution, whose parameters are different for each year. Each financial crisis corresponds to a peak in the absolute activity coefficient. The analysis correctly indicates the past crises and predicts the future one.
Improvement of Statistical Decisions under Parametric Uncertainty

Science.gov (United States)

Nechval, Nicholas A.; Nechval, Konstantin N.; Purgailis, Maris; Berzins, Gundars; Rozevskis, Uldis

2011-10-01

A large number of problems in production planning and scheduling, location, transportation, finance, and engineering design require that decisions be made in the presence of uncertainty. Decision-making under uncertainty is a central problem in statistical inference, and has been formally studied in virtually all approaches to inference. The aim of the present paper is to show how the invariant embedding technique, the idea of which belongs to the authors, may be employed in the particular case of finding the improved statistical decisions under parametric uncertainty. This technique represents a simple and computationally attractive statistical method based on the constructive use of the invariance principle in mathematical statistics. Unlike the Bayesian approach, an invariant embedding technique is independent of the choice of priors. It allows one to eliminate unknown parameters from the problem and to find the best invariant decision rule, which has smaller risk than any of the well-known decision rules. To illustrate the proposed technique, application examples are given.
Gaussian statistics for palaeomagnetic vectors

Science.gov (United States)

Love, J.J.; Constable, C.G.

2003-01-01

With the aim of treating the statistics of palaeomagnetic directions and intensities jointly and consistently, we represent the mean and the variance of palaeomagnetic vectors, at a particular site and of a particular polarity, by a probability density function in a Cartesian three-space of orthogonal magnetic-field components consisting of a single (unimodal) non-zero mean, spherically-symmetrical (isotropic) Gaussian function. For palaeomagnetic data of mixed polarities, we consider a bimodal distribution consisting of a pair of such symmetrical Gaussian functions, with equal, but opposite, means and equal variances. For both the Gaussian and bi-Gaussian distributions, and in the spherical three-space of intensity, inclination, and declination, we obtain analytical expressions for the marginal density functions, the cumulative distributions, and the expected values and variances for each spherical coordinate (including the angle with respect to the axis of symmetry of the distributions). The mathematical expressions for the intensity and off-axis angle are closed-form and especially manageable, with the intensity distribution being Rayleigh-Rician. In the limit of small relative vectorial dispersion, the Gaussian (bi-Gaussian) directional distribution approaches a Fisher (Bingham) distribution and the intensity distribution approaches a normal distribution. In the opposite limit of large relative vectorial dispersion, the directional distributions approach a spherically-uniform distribution and the intensity distribution approaches a Maxwell distribution. We quantify biases in estimating the properties of the vector field resulting from the use of simple arithmetic averages, such as estimates of the intensity or the inclination of the mean vector, or the variances of these quantities. With the statistical framework developed here and using the maximum-likelihood method, which gives unbiased estimates in the limit of large data numbers, we demonstrate how to
Gaussian statistics for palaeomagnetic vectors

Science.gov (United States)

Love, J. J.; Constable, C. G.

2003-03-01

With the aim of treating the statistics of palaeomagnetic directions and intensities jointly and consistently, we represent the mean and the variance of palaeomagnetic vectors, at a particular site and of a particular polarity, by a probability density function in a Cartesian three-space of orthogonal magnetic-field components consisting of a single (unimodal) non-zero mean, spherically-symmetrical (isotropic) Gaussian function. For palaeomagnetic data of mixed polarities, we consider a bimodal distribution consisting of a pair of such symmetrical Gaussian functions, with equal, but opposite, means and equal variances. For both the Gaussian and bi-Gaussian distributions, and in the spherical three-space of intensity, inclination, and declination, we obtain analytical expressions for the marginal density functions, the cumulative distributions, and the expected values and variances for each spherical coordinate (including the angle with respect to the axis of symmetry of the distributions). The mathematical expressions for the intensity and off-axis angle are closed-form and especially manageable, with the intensity distribution being Rayleigh-Rician. In the limit of small relative vectorial dispersion, the Gaussian (bi-Gaussian) directional distribution approaches a Fisher (Bingham) distribution and the intensity distribution approaches a normal distribution. In the opposite limit of large relative vectorial dispersion, the directional distributions approach a spherically-uniform distribution and the intensity distribution approaches a Maxwell distribution. We quantify biases in estimating the properties of the vector field resulting from the use of simple arithmetic averages, such as estimates of the intensity or the inclination of the mean vector, or the variances of these quantities. With the statistical framework developed here and using the maximum-likelihood method, which gives unbiased estimates in the limit of large data numbers, we demonstrate how to
Ontologies and tag-statistics

Science.gov (United States)

Tibély, Gergely; Pollner, Péter; Vicsek, Tamás; Palla, Gergely

2012-05-01

Due to the increasing popularity of collaborative tagging systems, the research on tagged networks, hypergraphs, ontologies, folksonomies and other related concepts is becoming an important interdisciplinary area with great potential and relevance for practical applications. In most collaborative tagging systems the tagging by the users is completely ‘flat’, while in some cases they are allowed to define a shallow hierarchy for their own tags. However, usually no overall hierarchical organization of the tags is given, and one of the interesting challenges of this area is to provide an algorithm generating the ontology of the tags from the available data. In contrast, there are also other types of tagged networks available for research, where the tags are already organized into a directed acyclic graph (DAG), encapsulating the ‘is a sub-category of’ type of hierarchy between each other. In this paper, we study how this DAG affects the statistical distribution of tags on the nodes marked by the tags in various real networks. The motivation for this research was the fact that understanding the tagging based on a known hierarchy can help in revealing the hidden hierarchy of tags in collaborative tagging systems. We analyse the relation between the tag-frequency and the position of the tag in the DAG in two large sub-networks of the English Wikipedia and a protein-protein interaction network. We also study the tag co-occurrence statistics by introducing a two-dimensional (2D) tag-distance distribution preserving both the difference in the levels and the absolute distance in the DAG for the co-occurring pairs of tags. Our most interesting finding is that the local relevance of tags in the DAG (i.e. their rank or significance as characterized by, e.g., the length of the branches starting from them) is much more important than their global distance from the root. Furthermore, we also introduce a simple tagging model based on random walks on the DAG, capable of
Ontologies and tag-statistics

International Nuclear Information System (INIS)

Tibély, Gergely; Vicsek, Tamás; Pollner, Péter; Palla, Gergely

2012-01-01

Due to the increasing popularity of collaborative tagging systems, the research on tagged networks, hypergraphs, ontologies, folksonomies and other related concepts is becoming an important interdisciplinary area with great potential and relevance for practical applications. In most collaborative tagging systems the tagging by the users is completely ‘flat’, while in some cases they are allowed to define a shallow hierarchy for their own tags. However, usually no overall hierarchical organization of the tags is given, and one of the interesting challenges of this area is to provide an algorithm generating the ontology of the tags from the available data. In contrast, there are also other types of tagged networks available for research, where the tags are already organized into a directed acyclic graph (DAG), encapsulating the ‘is a sub-category of’ type of hierarchy between each other. In this paper, we study how this DAG affects the statistical distribution of tags on the nodes marked by the tags in various real networks. The motivation for this research was the fact that understanding the tagging based on a known hierarchy can help in revealing the hidden hierarchy of tags in collaborative tagging systems. We analyse the relation between the tag-frequency and the position of the tag in the DAG in two large sub-networks of the English Wikipedia and a protein-protein interaction network. We also study the tag co-occurrence statistics by introducing a two-dimensional (2D) tag-distance distribution preserving both the difference in the levels and the absolute distance in the DAG for the co-occurring pairs of tags. Our most interesting finding is that the local relevance of tags in the DAG (i.e. their rank or significance as characterized by, e.g., the length of the branches starting from them) is much more important than their global distance from the root. Furthermore, we also introduce a simple tagging model based on random walks on the DAG, capable of
Features of statistical dynamics in a finite system

International Nuclear Information System (INIS)

Yan, Shiwei; Sakata, Fumihiko; Zhuo Yizhong

2002-01-01

We study features of statistical dynamics in a finite Hamilton system composed of a relevant one degree of freedom coupled to an irrelevant multidegree of freedom system through a weak interaction. Special attention is paid to how the statistical dynamics changes depending on the number of degrees of freedom in the irrelevant system. It is found that the macrolevel statistical aspects are strongly related to an appearance of the microlevel chaotic motion, and a dissipation of the relevant motion is realized passing through three distinct stages: dephasing, statistical relaxation, and equilibrium regimes. It is clarified that the dynamical description and the conventional transport approach provide us with almost the same macrolevel and microlevel mechanisms only for the system with a very large number of irrelevant degrees of freedom. It is also shown that the statistical relaxation in the finite system is an anomalous diffusion and the fluctuation effects have a finite correlation time.
Review of the Statistical Techniques in Medical Sciences | Okeh ...

African Journals Online (AJOL)

... medical researcher in selecting the appropriate statistical techniques. Of course, all statistical techniques have certain underlying assumptions, which must be checked before the technique is applied. Keywords: Variable, Prospective Studies, Retrospective Studies, Statistical significance. Bio-Research Vol. 6 (1) 2008: pp.
Logical analysis of diffuse large B-cell lymphomas.

Science.gov (United States)

Alexe, G; Alexe, S; Axelrod, D E; Hammer, P L; Weissmann, D

2005-07-01

The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al., which contains the intensity levels of 6817 genes of 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics, optimisation, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify different informative genes than those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that if considered jointly are capable of accurately distinguishing different types of lymphoma. The central concept of LAD is a pattern or combinatorial biomarker, a concept that resembles a rule as used in decision tree methods. LAD is able to exhaustively generate the collection of all those patterns which satisfy certain quality constraints, through a systematic combinatorial process guided by clear optimization criteria. Then, based on a set covering approach, LAD aggregates the collection of patterns into classification models. In addition, LAD is able to use the information provided by large collections of patterns in order to extract subsets of variables, which collectively are able to distinguish between different types of disease. For the differential diagnosis of DLBCL versus FL, a model based on eight significant genes is constructed and shown to have a sensitivity of 94.7% and a specificity of 100% on the test set. For the prognosis of good versus poor outcome among the DLBCL patients, a model is constructed on another set consisting also of eight significant genes, and shown to have a sensitivity of 87.5% and a specificity of 90% on the test set. The genes selected by LAD also work well as a basis for other kinds of statistical
Summary of significant solar-initiated events during STIP interval XII

International Nuclear Information System (INIS)

Gergely, T.E.

1982-01-01

A summary of the significant solar-terrestrial events of STIP Interval XII (April 10-July 1, 1981) is presented. It is shown that the first half of the interval was extremely active, with several of the largest X-ray flares, particle events, and shocks of this solar cycle taking place during April and the first half of May. However, the second half of the interval was characterized by relatively quiet conditions. A detailed examination is presented of several large events which occurred on 10, 24, and 27 April and on 8 and 16 May. It is suggested that the comparison and statistical analysis of the numerous events for which excellent observations are available could provide information on what causes a type II burst to propagate in the interplanetary medium
Establishing statistical models of manufacturing parameters

International Nuclear Information System (INIS)

Senevat, J.; Pape, J.L.; Deshayes, J.F.

1991-01-01

This paper reports on the effect of pilgering and cold-work parameters on contractile strain ratio and mechanical properties that were investigated using a large population of Zircaloy tubes. Statistical models were established between: contractile strain ratio and tooling parameters, mechanical properties (tensile test, creep test) and cold-work parameters, and mechanical properties and stress-relieving temperature
Algorithm for computing significance levels using the Kolmogorov-Smirnov statistic and valid for both large and small samples

Energy Technology Data Exchange (ETDEWEB)

Kurtz, S.E.; Fields, D.E.

1983-10-01

The KSTEST code presented here is designed to perform the Kolmogorov-Smirnov one-sample test. The code may be used as a stand-alone program or the principal subroutines may be excerpted and used to service other programs. The Kolmogorov-Smirnov one-sample test is a nonparametric goodness-of-fit test. A number of codes to perform this test are in existence, but they suffer from the inability to provide meaningful results in the case of small sample sizes (number of values less than or equal to 80). The KSTEST code overcomes this inadequacy by using two distinct algorithms. If the sample size is greater than 80, an asymptotic series developed by Smirnov is evaluated. If the sample size is 80 or less, a table of values generated by Birnbaum is referenced. Valid results can be obtained from KSTEST when the sample contains from 3 to 300 data points. The program was developed on a Digital Equipment Corporation PDP-10 computer using the FORTRAN-10 language. The code size is approximately 450 card images and the typical CPU execution time is 0.19 s.
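The large-sample branch described above can be illustrated with a short Python sketch. This is not the KSTEST code: it assumes the classical limiting Kolmogorov series, P(sqrt(n) D > x) = 2 Σ (-1)^(k-1) exp(-2 k² x²), and it does not reproduce any finite-sample correction terms or the Birnbaum table that the report relies on for n ≤ 80. The function names are hypothetical.

```python
import math

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n against a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def kolmogorov_sf(x, terms=100):
    """Limiting significance level P(sqrt(n) * D_n > x) from the Kolmogorov series."""
    if x < 0.2:
        # The series converges slowly here, and the true value is 1 to ~12 digits.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    return min(1.0, max(0.0, 2.0 * total))

def ks_test_asymptotic(sample, cdf):
    """Return (D_n, asymptotic significance level); intended for large samples only."""
    d = ks_statistic(sample, cdf)
    return d, kolmogorov_sf(math.sqrt(len(sample)) * d)
```

For example, `ks_test_asymptotic(data, lambda v: min(1.0, max(0.0, v)))` would test `data` against a uniform distribution on [0, 1]. For small samples the asymptotic value can be noticeably off, which is the inadequacy the abstract says KSTEST addresses with a tabulated alternative.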
Spectral and cross-spectral analysis of uneven time series with the smoothed Lomb-Scargle periodogram and Monte Carlo evaluation of statistical significance

Science.gov (United States)

Pardo-Igúzquiza, Eulogio; Rodríguez-Tovar, Francisco J.

2012-12-01

Many spectral analysis techniques have been designed assuming sequences taken with a constant sampling interval. However, there are empirical time series in the geosciences (sediment cores, fossil abundance data, isotope analysis, …) that do not follow regular sampling because of missing data, gapped data, random sampling or incomplete sequences, among other reasons. In general, interpolating an uneven series in order to obtain a succession with a constant sampling interval alters the spectral content of the series. In such cases it is preferable to follow an approach that works with the uneven data directly, avoiding the need for an explicit interpolation step. The Lomb-Scargle periodogram is a popular choice in such circumstances, as there are programs available in the public domain for its computation. One new computer program for spectral analysis improves the standard Lomb-Scargle periodogram approach in two ways: (1) It explicitly adjusts the statistical significance to any bias introduced by variance reduction smoothing, and (2) it uses a permutation test to evaluate confidence levels, which is better suited than parametric methods when neighbouring frequencies are highly correlated. Another novel program for cross-spectral analysis offers the advantage of estimating the Lomb-Scargle cross-periodogram of two uneven time series defined on the same interval, and it evaluates the confidence levels of the estimated cross-spectra by a non-parametric computer intensive permutation test. Thus, the cross-spectrum, the squared coherence spectrum, the phase spectrum, and the Monte Carlo statistical significance of the cross-spectrum and the squared-coherence spectrum can be obtained. Both of the programs are written in ANSI Fortran 77, in view of its simplicity and compatibility. The program code is of public domain, provided on the website of the journal (http://www.iamg.org/index.php/publisher/articleview/frmArticleID/112/). Different examples (with simulated and
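The idea of working on the uneven data directly and judging peaks by permutation can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' Fortran 77 programs: it uses the classical variance-normalised Lomb-Scargle periodogram without smoothing, and the permutation test shown (shuffling the observed values over the fixed sampling times and comparing the highest peak) is one simple variant chosen for illustration. All function names are hypothetical.

```python
import math
import random

def lomb_scargle(t, y, omegas):
    """Classical Lomb-Scargle power at angular frequencies omegas for uneven times t."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    yc = [v - mean for v in y]
    power = []
    for w in omegas:
        # Time offset tau makes the sine and cosine terms orthogonal at this frequency.
        s2 = sum(math.sin(2.0 * w * ti) for ti in t)
        c2 = sum(math.cos(2.0 * w * ti) for ti in t)
        tau = math.atan2(s2, c2) / (2.0 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        cc = sum(a * a for a in c)
        ss = sum(a * a for a in s)
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        p = 0.0
        if cc > 0.0:
            p += yc_c ** 2 / cc
        if ss > 0.0:
            p += yc_s ** 2 / ss
        power.append(p / (2.0 * var))
    return power

def permutation_pvalue(t, y, omegas, n_perm=99, seed=0):
    """Permutation p-value for the highest peak: shuffle y over the fixed times t."""
    rng = random.Random(seed)
    observed = max(lomb_scargle(t, y, omegas))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(lomb_scargle(t, shuffled, omegas)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Shuffling destroys any periodic structure while keeping the sampling times and the distribution of values, so the permuted peaks approximate what pure noise would produce on the same uneven grid.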
Combining statistical inference and decisions in ecology

Science.gov (United States)

Williams, Perry J.; Hooten, Mevin B.

2016-01-01

Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
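The role of the loss function in Bayesian point estimation can be made concrete with a minimal Python sketch. It assumes that draws from a posterior distribution are already available (the numbers below are made up for illustration) and picks, from a grid of candidate actions, the one with the smallest posterior expected loss. The function name and the example losses are hypothetical, not taken from the paper.

```python
def bayes_action(posterior_draws, actions, loss):
    """Return the candidate action that minimises the Monte Carlo estimate of
    posterior expected loss, E[loss(theta, action) | data]."""
    def expected_loss(action):
        return sum(loss(theta, action) for theta in posterior_draws) / len(posterior_draws)
    return min(actions, key=expected_loss)

def squared_error(theta, a):
    return (theta - a) ** 2

def absolute_error(theta, a):
    return abs(theta - a)

# Hypothetical posterior draws and a grid of candidate point estimates.
draws = [1.0, 2.0, 3.0, 4.0, 10.0]
grid = [0.5 * i for i in range(21)]

estimate_sq = bayes_action(draws, grid, squared_error)    # matches the posterior mean
estimate_abs = bayes_action(draws, grid, absolute_error)  # matches the posterior median
```

With these draws the squared-error choice lands on the mean (4.0) and the absolute-error choice on the median (3.0), which shows how the same posterior leads to different decisions once the loss changes. A management decision such as choosing a fire rotation would replace the grid with the discrete set of candidate actions and the loss with one reflecting management consequences.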
Statistics of wind direction and its increments

International Nuclear Information System (INIS)

Doorn, Eric van; Dhruva, Brindesh; Sreenivasan, Katepalli R.; Cassella, Victor

2000-01-01

We study some elementary statistics of wind direction fluctuations in the atmosphere for a wide range of time scales (10^-4 sec to 1 h), and in both vertical and horizontal planes. In the plane parallel to the ground surface, the direction time series consists of two parts: a constant drift due to large weather systems moving with the mean wind speed, and fluctuations about this drift. The statistics of the direction fluctuations show a rough similarity to Brownian motion but depend, in detail, on the wind speed. This dependence manifests itself quite clearly in the statistics of wind-direction increments over various intervals of time. These increments are intermittent during periods of low wind speeds but Gaussian-like during periods of high wind speeds. (c) 2000 American Institute of Physics
Effect size and statistical power in the rodent fear conditioning literature - A systematic review.

Science.gov (United States)

Carneiro, Clarissa F D; Moulin, Thiago C; Macleod, Malcolm R; Amaral, Olavo B

2018-01-01

Proposals to increase research reproducibility frequently call for focusing on effect sizes instead of p values, as well as for increasing the statistical power of experiments. However, it is unclear to what extent these two concepts are indeed taken into account in basic biomedical science. To study this in a real-case scenario, we performed a systematic review of effect sizes and statistical power in studies on learning of rodent fear conditioning, a widely used behavioral task to evaluate memory. Our search criteria yielded 410 experiments comparing control and treated groups in 122 articles. Interventions had a mean effect size of 29.5%, and amnesia caused by memory-impairing interventions was nearly always partial. Mean statistical power to detect the average effect size observed in well-powered experiments with significant differences (37.2%) was 65%, and was lower among studies with non-significant results. Only one article reported a sample size calculation, and our estimated sample size to achieve 80% power considering typical effect sizes and variances (15 animals per group) was reached in only 12.2% of experiments. Actual effect sizes correlated with effect size inferences made by readers on the basis of textual descriptions of results only when findings were non-significant, and neither effect size nor power correlated with study quality indicators, number of citations or impact factor of the publishing journal. In summary, effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited. Failure to take these concepts into consideration might limit attempts to improve reproducibility in this field of science.
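A sample size calculation of the kind the review found in only one article can be sketched in a few lines of Python. The sketch assumes a two-sided comparison of two group means, a standardised effect size d (difference in means divided by the common standard deviation), and the normal approximation n = 2 (z_{1-α/2} + z_{power})² / d². The review itself expresses effect sizes as percentages, so the values of d used here are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate animals per group needed to detect standardised effect d
    in a two-sided, two-group comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / d ** 2)

def approx_power(d, n, alpha=0.05):
    """Approximate power with n per group, ignoring the negligible opposite tail."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(d * math.sqrt(n / 2.0) - z_alpha)
```

For instance, `n_per_group(1.0)` gives the group size for a hypothetical effect of one standard deviation, and `approx_power(0.5, 10)` shows how little power a small group has for a moderate effect. An exact calculation based on the t distribution would give slightly larger group sizes.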

Role of radiation therapy in large cell lymphoma

International Nuclear Information System (INIS)

Stryker, J.A.; Bartholomew, M.J.; Beatty, R.E.

1990-01-01

This paper compares the results of treatment for large cell lymphoma with use of radiation therapy (RT), chemotherapy (CT), or both. The authors retrospectively studied 142 patients with large cell lymphoma. Seventy-two had stage I or II disease and 70, stage III or IV; 37% had B symptoms. CT was used in 66 patients, RT in 22, both in 46, and surgery with or without RT or CT in eight. CT regimens were CHOP, 38 patients; C-MOPP/COPP, 25; CHOP-bleo/BACOP, 15; COP-BLAN-MEL, 8; M-BACOD, 8; COP/CVP, 5; COP-BLAM, 5; and other regimens, 12. Statistical analysis showed that age, stage, B symptoms, and treatment were significant variables determining survival. In stages I and II, the 5-year survival rate with RT plus CT was 65%; with CT, 35%; and with RT, 9% (P < .01).
Statistics Tables For Mathematicians, Engineers, Economists and the Behavioural and Management Sciences

CERN Document Server

Neave, Henry R

2012-01-01

For three decades, Henry Neave's Statistics Tables has been the gold standard for all students taking an introductory statistical methods course as part of their wider degree in a host of disciplines including mathematics, economics, business and management, geography and psychology. The period has seen a large increase in the level of mathematics and statistics required to achieve these qualifications and Statistics Tables has helped several generations of students meet their goals. All the features of the first edition are retained including the full range of best-known standard statistical t
Hydrometeorological variability on a large French catchment and its relation to large-scale circulation across temporal scales

Science.gov (United States)

Massei, Nicolas; Dieppois, Bastien; Fritier, Nicolas; Laignel, Benoit; Debret, Maxime; Lavers, David; Hannah, David

2015-04-01

basically consisted in 1- decomposing both signals (SLP field and precipitation or streamflow) using discrete wavelet multiresolution analysis and synthesis, 2- generating one statistical downscaling model per time-scale, 3- summing up all scale-dependent models in order to obtain a final reconstruction of the predictand. The results obtained revealed a significant improvement of the reconstructions for both precipitation and streamflow when using the multiresolution ESD model instead of basic ESD; in addition, the scale-dependent spatial patterns associated with the model matched quite well those obtained from scale-dependent composite analysis. In particular, the multiresolution ESD model handled very well the significant changes in variance through time observed in either precipitation or streamflow. For instance, the post-1980 period, which had been characterized by particularly high amplitudes in interannual-to-interdecadal variability associated with flood and extremely low-flow/drought periods (e.g., winter 2001, summer 2003), could not be reconstructed without integrating wavelet multiresolution analysis into the model. Further investigations would be required to address the issue of the stationarity of the large-scale/local-scale relationships and to test the capability of the multiresolution ESD model for interannual-to-interdecadal forecasting. In terms of methodological approach, further investigations may concern a fully comprehensive sensitivity analysis of the modeling to the parameter of the multiresolution approach (different families of scaling and wavelet functions used, number of coefficients/degree of smoothness, etc.).
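The three steps listed above (decompose, fit one model per time-scale, sum) can be sketched in Python under strong simplifying assumptions. The sketch uses a Haar multiresolution analysis built from dyadic block means, a single predictor series in place of the SLP field, and an ordinary least-squares line as the per-scale model. None of these choices is stated in the abstract, and all function names are hypothetical.

```python
def haar_mra(signal, levels):
    """Haar multiresolution analysis: returns [D_1, ..., D_levels, A_levels],
    full-length components that sum back to the original signal."""
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    components = []
    approx = list(signal)
    for j in range(1, levels + 1):
        block = 2 ** j
        coarser = []
        for start in range(0, n, block):
            m = sum(signal[start:start + block]) / block
            coarser.extend([m] * block)
        # Detail at level j is what is lost when smoothing from scale j-1 to j.
        components.append([a - c for a, c in zip(approx, coarser)])
        approx = coarser
    components.append(approx)
    return components

def fit_line(x, y):
    """Least-squares slope and intercept; falls back to the mean if x is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0.0:
        return 0.0, my
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def multiresolution_regression(predictor, predictand, levels):
    """Fit one linear model per scale and sum the scale-wise predictions."""
    recon = [0.0] * len(predictand)
    for xc, yc in zip(haar_mra(predictor, levels), haar_mra(predictand, levels)):
        slope, intercept = fit_line(xc, yc)
        for i, v in enumerate(xc):
            recon[i] += slope * v + intercept
    return recon
```

Because each scale gets its own coefficients, the summed reconstruction can follow a predictand whose link to the predictor differs between short and long time-scales, which a single regression on the undecomposed series cannot do.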
Extending statistical boosting. An overview of recent methodological developments.

Science.gov (United States)

Mayr, A; Binder, H; Gefeller, O; Schmid, M

2014-01-01

Boosting algorithms to simultaneously estimate and select predictor effects in statistical models have gained substantial interest during the last decade. This review highlights recent methodological developments regarding boosting algorithms for statistical modelling especially focusing on topics relevant for biomedical research. We suggest a unified framework for gradient boosting and likelihood-based boosting (statistical boosting) which have been addressed separately in the literature up to now. The methodological developments on statistical boosting during the last ten years can be grouped into three different lines of research: i) efforts to ensure variable selection leading to sparser models, ii) developments regarding different types of predictor effects and how to choose them, iii) approaches to extend the statistical boosting framework to new regression settings. Statistical boosting algorithms have been adapted to carry out unbiased variable selection and automated model choice during the fitting process and can nowadays be applied in almost any regression setting in combination with a large number of different types of predictor effects.
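How boosting can estimate and select predictor effects at the same time is easiest to see in component-wise gradient boosting with squared-error loss. The Python sketch below is a generic textbook-style illustration, not code from the review: it assumes centred predictor columns, simple linear base learners, a fixed step length `nu`, and a fixed number of steps, all of which are hypothetical choices.

```python
def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Component-wise L2 boosting with one linear base learner per column of X.
    Columns of X are assumed to be centred. Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    coef = [0.0] * p
    resid = [yi - intercept for yi in y]
    for _ in range(n_steps):
        best_j, best_beta, best_sse = None, 0.0, float("inf")
        for j in range(p):
            xj = [row[j] for row in X]
            sxx = sum(v * v for v in xj)
            if sxx == 0.0:
                continue
            # Least-squares fit of the current residuals on column j alone.
            beta = sum(v * r for v, r in zip(xj, resid)) / sxx
            sse = sum((r - beta * v) ** 2 for v, r in zip(xj, resid))
            if sse < best_sse:
                best_j, best_beta, best_sse = j, beta, sse
        if best_j is None:
            break
        # Only the best-fitting component is updated, and only by a small step.
        coef[best_j] += nu * best_beta
        resid = [r - nu * best_beta * row[best_j] for r, row in zip(resid, X)]
    return intercept, coef
```

Predictors that are never selected keep a coefficient of exactly zero, so stopping the loop early yields a sparse model. In practice the number of steps is the main tuning parameter and would typically be chosen by cross-validation rather than fixed in advance.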
Statistical thermodynamics

International Nuclear Information System (INIS)

Lim, Gyeong Hui

2008-03-01

This book consists of 15 chapters, which are basic conception and meaning of statistical thermodynamics, Maxwell-Boltzmann's statistics, ensemble, thermodynamics function and fluctuation, statistical dynamics with independent particle system, ideal molecular system, chemical equilibrium and chemical reaction rate in ideal gas mixture, classical statistical thermodynamics, ideal lattice model, lattice statistics and nonideal lattice model, imperfect gas theory on liquid, theory on solution, statistical thermodynamics of interface, statistical thermodynamics of a high molecule system and quantum statistics
Effect of model choice and sample size on statistical tolerance limits

International Nuclear Information System (INIS)

Duran, B.S.; Campbell, K.

1980-03-01

Statistical tolerance limits are estimates of large (or small) quantiles of a distribution, quantities which are very sensitive to the shape of the tail of the distribution. The exact nature of this tail behavior cannot be ascertained from small samples, so statistical tolerance limits are frequently computed using a statistical model chosen on the basis of theoretical considerations or prior experience with similar populations. This report illustrates the effects of such choices on the computations.
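The sensitivity to model choice can be illustrated with a small Python sketch that estimates the same upper quantile from one small sample under two assumed models, normal and lognormal. This is a simplification: it computes plug-in quantile estimates rather than true tolerance limits, which would add a confidence factor depending on the sample size, and the two models and the sample are hypothetical choices, not those of the report.

```python
import math
from statistics import NormalDist, mean, stdev

def upper_quantile_normal(sample, p):
    """Plug-in estimate of the p-quantile assuming the data are normal."""
    return mean(sample) + NormalDist().inv_cdf(p) * stdev(sample)

def upper_quantile_lognormal(sample, p):
    """Plug-in estimate of the p-quantile assuming the data are lognormal
    (all values must be positive)."""
    logs = [math.log(x) for x in sample]
    return math.exp(mean(logs) + NormalDist().inv_cdf(p) * stdev(logs))

# Hypothetical small sample; both models are fitted to the same three values.
sample = [1.0, 2.0, 4.0]
q_normal = upper_quantile_normal(sample, 0.99)
q_lognormal = upper_quantile_lognormal(sample, 0.99)
```

With so few points neither model can be rejected, yet the lognormal estimate of the 99th percentile is well above the normal one, because the two models disagree mainly in the tail where there are no data.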
Data management and statistical analysis for environmental assessment

International Nuclear Information System (INIS)

Wendelberger, J.R.; McVittie, T.I.

1995-01-01

Data management and statistical analysis for environmental assessment are important issues on the interface of computer science and statistics. Data collection for environmental decision making can generate large quantities of various types of data. A database/GIS system developed is described which provides efficient data storage as well as visualization tools which may be integrated into the data analysis process. FIMAD is a living database and GIS system. The system has changed and developed over time to meet the needs of the Los Alamos National Laboratory Restoration Program. The system provides a repository for data which may be accessed by different individuals for different purposes. The database structure is driven by the large amount and varied types of data required for environmental assessment. The integration of the database with the GIS system provides the foundation for powerful visualization and analysis capabilities
Development of statistical analysis code for meteorological data (W-View)

International Nuclear Information System (INIS)

Tachibana, Haruo; Sekita, Tsutomu; Yamaguchi, Takenori

2003-03-01

A computer code (W-View: Weather View) was developed to analyze the meteorological data statistically based on 'the guideline of meteorological statistics for the safety analysis of nuclear power reactor' (Nuclear Safety Commission on January 28, 1982; revised on March 29, 2001). The code gives statistical meteorological data to assess the public dose in case of normal operation and severe accident to get the license of nuclear reactor operation. This code was revised from the original code used in a large office computer code to enable a personal computer user to analyze the meteorological data simply and conveniently and to make the statistical data tables and figures of meteorology. (author)
Statistical mechanics and the foundations of thermodynamics

International Nuclear Information System (INIS)

Loef, A.M.

1979-01-01

An introduction to classical statistical mechanics and its relation to thermodynamics is presented. Emphasis is put on getting a detailed and logical presentation of the foundations of thermodynamics based on the maximum entropy principles which govern the values taken by macroscopic variables according to the laws of large numbers
Tracking of large-scale structures in turbulent channel with direct numerical simulation of low Prandtl number passive scalar

Science.gov (United States)

Tiselj, Iztok

2014-12-01

Channel flow DNS (Direct Numerical Simulation) at friction Reynolds number 180 and with passive scalars of Prandtl numbers 1 and 0.01 was performed in various computational domains. The "normal" size domain was ~2300 wall units long and ~750 wall units wide; size taken from the similar DNS of Moser et al. The "large" computational domain, which is supposed to be sufficient to describe the largest structures of the turbulent flows, was 3 times longer and 3 times wider than the "normal" domain. The "very large" domain was 6 times longer and 6 times wider than the "normal" domain. All simulations were performed with the same spatial and temporal resolution. Comparison of the standard and large computational domains shows the velocity field statistics (mean velocity, root-mean-square (RMS) fluctuations, and turbulent Reynolds stresses) that are within 1%-2%. Similar agreement is observed for Pr = 1 temperature fields and can be observed also for the mean temperature profiles at Pr = 0.01. These differences can be attributed to the statistical uncertainties of the DNS. However, second-order moments, i.e., RMS temperature fluctuations of standard and large computational domains at Pr = 0.01 show significant differences of up to 20%. Stronger temperature fluctuations in the "large" and "very large" domains confirm the existence of the large-scale structures. Their influence is more or less invisible in the main velocity field statistics or in the statistics of the temperature fields at Prandtl numbers around 1. However, these structures play a visible role in the temperature fluctuations at low Prandtl number, where high temperature diffusivity effectively smears the small-scale structures in the thermal field and enhances the relative contribution of large scales. These large thermal structures represent some kind of an echo of the large scale velocity structures: the highest temperature-velocity correlations are not observed between the instantaneous temperatures and
Extracting climate signals from large hydrological data cubes using multivariate statistics - an example for the Mediterranean basin

Science.gov (United States)

Kauer, Agnes; Dorigo, Wouter; Bauer-Marschallinger, Bernhard

2017-04-01

Global warming is expected to change ocean-atmosphere oscillation patterns, e.g. the El Nino Southern Oscillation, and may thus have a substantial impact on water resources over land. Yet, the link between climate oscillations and terrestrial hydrology has large uncertainties. In particular, the climate in the Mediterranean basin is expected to be sensitive to global warming as it may increase insufficient and irregular water supply and lead to more frequent and intense droughts and heavy precipitation events. The ever increasing need for water in tourism and agriculture reinforces the problem. Therefore, the monitoring and better understanding of the hydrological cycle are crucial for this area. This study seeks to quantify the effect of regional climate modes, e.g. the Northern Atlantic Oscillation (NAO), on the hydrological cycle in the Mediterranean. We apply Empirical Orthogonal Functions (EOF) to a wide range of hydrological datasets to extract the major modes of variation over the study period. We use more than ten datasets describing precipitation, soil moisture, evapotranspiration, and changes in water mass with study periods ranging from one to three decades depending on the dataset. The resulting EOFs are then examined for correlations with regional climate modes using Spearman rank correlation analysis. This is done for the entire time span of the EOFs and for monthly and seasonally sampled data. We find relationships between the hydrological datasets and the climate modes NAO, Arctic Oscillation (AO), Eastern Atlantic (EA), and Tropical Northern Atlantic (TNA). Analyses of monthly and seasonally sampled data reveal high correlations especially in the winter months. However, the spatial extent of the data cube considered for the analyses has a large impact on the results. Our statistical analyses suggest an impact of regional climate modes on the hydrological cycle in the Mediterranean area and may provide valuable input for evaluating process
Correlation length of magnetosheath fluctuations: Cluster statistics

Directory of Open Access Journals (Sweden)

O. Gutynska

2008-09-01

Full Text Available Magnetosheath parameters are usually described by gasdynamic or magnetohydrodynamic (MHD) models but these models cannot account for one of the most important sources of magnetosheath fluctuations – the foreshock. Earlier statistical processing of a large amount of magnetosheath observations has shown that the magnetosheath magnetic field and plasma flow fluctuations downstream of the quasiparallel shock are much larger than those at the opposite flank. These studies were based on the observations of a single spacecraft and thus they could not provide full information on propagation of the fluctuations through the magnetosheath.

We present the results of a statistical survey of the magnetosheath magnetic field fluctuations using two years of Cluster observations. We discuss the dependence of the cross-correlation coefficients between different spacecraft pairs on the orientation of the separation vector with respect to the average magnetic field and plasma flow vectors and other parameters. We have found that the correlation length does not exceed ~1 R_E in the analyzed frequency range (0.001–0.125 Hz) and does not depend significantly on the magnetic field or plasma flow direction. A close connection of cross-correlation coefficients computed in the magnetosheath with the cross-correlation coefficients between a solar wind monitor and a magnetosheath spacecraft suggests that solar wind structures persist on the background of magnetosheath fluctuations.
[Statistics for statistics?--Thoughts about psychological tools].

Science.gov (United States)

Berger, Uwe; Stöbel-Richter, Yve

2007-12-01

Statistical methods hold a prominent place in psychologists' educational programs. Because these methods are known to be difficult to understand and hard to learn, students fear them. Those who do not aspire to a research career at the university quickly forget the drilled content. Furthermore, because at first glance it does not apply to work with patients and other target groups, methodological education as a whole has often been questioned. For many psychological practitioners, statistical education makes sense only as a means of commanding respect from other professions, namely physicians. For their own practice, statistics is rarely taken seriously as a professional tool. The reason seems clear: statistics treats numbers, while psychotherapy treats subjects. So, is statistics an end in itself? With this article, we try to answer the question of whether and how statistical methods are represented in psychotherapeutic and psychological research. To this end, we analyzed 46 original articles from a complete volume of the journal Psychotherapy, Psychosomatics, Psychological Medicine (PPmP). Within the volume, 28 different methods of analysis were applied, of which 89 per cent were directly based on statistics. Being able to write and critically read original articles, the backbone of research, presupposes a high degree of statistical education. To ignore statistics means to ignore research and at least to expose one's own professional work to arbitrariness.
Statistical analysis of disruptions in JET

International Nuclear Information System (INIS)

De Vries, P.C.; Johnson, M.F.; Segui, I.

2009-01-01

The disruption rate (the percentage of discharges that disrupt) in JET was found to drop steadily over the years. Recent campaigns (2005-2007) show a yearly averaged disruption rate of only 6% while from 1991 to 1995 this was often higher than 20%. Besides the disruption rate, the so-called disruptivity, or the likelihood of a disruption depending on the plasma parameters, has been determined. The disruptivity of plasmas was found to be significantly higher close to the three main operational boundaries for tokamaks: the low-q, high-density and β limits. The frequency at which JET operated close to the density limit increased sixfold over the last decade; however, only a small reduction in disruptivity was found. Similarly, the disruptivity close to the low-q and β limits was found to be unchanged. The most significant reduction in disruptivity was found far from the operational boundaries, leading to the conclusion that the improved disruption rate is due to a better technical capability of operating JET, instead of safer operations close to the physics limits. The statistics showed that a simple protection system was able to mitigate the forces of a large fraction of disruptions, although it has so far proved more difficult to ameliorate the heat flux.
Universality of correlations of levels with discrete statistics

OpenAIRE

Brezin, Edouard; Kazakov, Vladimir

1999-01-01

We study the statistics of a system of N random levels with integer values, in the presence of a logarithmic repulsive potential of Dyson type. This problem arises in sums over representations (Young tableaux) of GL(N) in various matrix problems and in the study of statistics of partitions for the permutation group. The model is generalized to include an external source and its correlators are found in closed form for any N. We reproduce the density of levels in the large N and double scalin...
On a curvature-statistics theorem

International Nuclear Information System (INIS)

Calixto, M; Aldaya, V

2008-01-01

The spin-statistics theorem in quantum field theory relates the spin of a particle to the statistics obeyed by that particle. Here we investigate an interesting correspondence or connection between curvature (κ = ±1) and quantum statistics (Fermi-Dirac and Bose-Einstein, respectively). The interrelation between both concepts is established through vacuum coherent configurations of zero modes in quantum field theory on the compact O(3) and noncompact O(2; 1) (spatial) isometry subgroups of de Sitter and Anti de Sitter spaces, respectively. The high frequency limit is retrieved as a (zero curvature) group contraction to the Newton-Hooke (harmonic oscillator) group. We also make some comments on the physical significance of the vacuum energy density and the cosmological constant problem.
On a curvature-statistics theorem

Energy Technology Data Exchange (ETDEWEB)

Calixto, M [Departamento de Matematica Aplicada y Estadistica, Universidad Politecnica de Cartagena, Paseo Alfonso XIII 56, 30203 Cartagena (Spain); Aldaya, V [Instituto de Astrofisica de Andalucia, Apartado Postal 3004, 18080 Granada (Spain)], E-mail: Manuel.Calixto@upct.es

2008-08-15

The spin-statistics theorem in quantum field theory relates the spin of a particle to the statistics obeyed by that particle. Here we investigate an interesting correspondence or connection between curvature (κ = ±1) and quantum statistics (Fermi-Dirac and Bose-Einstein, respectively). The interrelation between both concepts is established through vacuum coherent configurations of zero modes in quantum field theory on the compact O(3) and noncompact O(2; 1) (spatial) isometry subgroups of de Sitter and Anti de Sitter spaces, respectively. The high frequency limit is retrieved as a (zero curvature) group contraction to the Newton-Hooke (harmonic oscillator) group. We also make some comments on the physical significance of the vacuum energy density and the cosmological constant problem.
Field significance of performance measures in the context of regional climate model evaluation. Part 2: precipitation

Science.gov (United States)

Ivanov, Martin; Warrach-Sagi, Kirsten; Wulfmeyer, Volker

2018-04-01

A new approach for rigorous spatial analysis of the downscaling performance of regional climate model (RCM) simulations is introduced. It is based on a multiple comparison of the local tests at the grid cells and is also known as 'field' or 'global' significance. The block length for the local resampling tests is precisely determined to adequately account for the time series structure. New performance measures for estimating the added value of downscaled data relative to the large-scale forcing fields are developed. The methodology is applied, as an example, to a standard EURO-CORDEX hindcast simulation with the Weather Research and Forecasting (WRF) model coupled with the land surface model NOAH at 0.11° grid resolution. Daily precipitation climatology for the 1990-2009 period is analysed for Germany for winter and summer in comparison with high-resolution gridded observations from the German Weather Service. The field significance test controls the proportion of falsely rejected local tests in a meaningful way and is robust to spatial dependence. Hence, the spatial patterns of the statistically significant local tests are also meaningful. We interpret them from a process-oriented perspective. While the downscaled precipitation distributions are statistically indistinguishable from the observed ones in most regions in summer, the biases of some distribution characteristics are significant over large areas in winter. WRF-NOAH generates appropriate stationary fine-scale climate features in the daily precipitation field over regions of complex topography in both seasons and appropriate transient fine-scale features almost everywhere in summer. As the added value of global climate model (GCM)-driven simulations cannot be smaller than this perfect-boundary estimate, this work demonstrates in a rigorous manner the clear additional value of dynamical downscaling over global climate simulations. The evaluation methodology has a broad spectrum of applicability as it is
Christians in South Africa: The statistical picture

African Journals Online (AJOL)

Abstract. Christians in South Africa; The statistical picture. Government censuses since 1960 indicate that the religious picture was already largely fixed by the 1950s. Already at that stage some 3 out of 4. South Africans identified themselves as 'Christians'. Since then this percentage grew steadily, mainly because of ...

Testing statistical hypotheses of equivalence

CERN Document Server

Wellek, Stefan

2010-01-01

Equivalence testing has grown significantly in importance over the last two decades, especially as its relevance to a variety of applications has become understood. Yet published work on the general methodology remains scattered in specialists' journals, and for the most part, it focuses on the relatively narrow topic of bioequivalence assessment. With a far broader perspective, Testing Statistical Hypotheses of Equivalence provides the first comprehensive treatment of statistical equivalence testing. The author addresses a spectrum of specific, two-sided equivalence testing problems, from the
Effective control of complex turbulent dynamical systems through statistical functionals.

Science.gov (United States)

Majda, Andrew J; Qi, Di

2017-05-30

Turbulent dynamical systems characterized by both a high-dimensional phase space and a large number of instabilities are ubiquitous among complex systems in science and engineering, including climate, material, and neural science. Control of these complex systems is a grand challenge, for example, in mitigating the effects of climate change or safe design of technology with fully developed shear turbulence. Control of flows in the transition to turbulence, where there is a small dimension of instabilities about a basic mean state, is an important and successful discipline. In complex turbulent dynamical systems, it is impossible to track and control the large dimension of instabilities, which strongly interact and exchange energy, and new control strategies are needed. The goal of this paper is to propose an effective statistical control strategy for complex turbulent dynamical systems based on a recent statistical energy principle and statistical linear response theory. We illustrate the potential practical efficiency and verify this effective statistical control strategy on the 40D Lorenz 1996 model in forcing regimes with various types of fully turbulent dynamics with nearly one-half of the phase space unstable.
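For readers unfamiliar with the test bed named in this abstract, the following is a minimal sketch of the uncontrolled 40-dimensional Lorenz 1996 model in its standard form, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with periodic indices. The forcing value, time step, initial perturbation, and the choice of a fourth-order Runge-Kutta integrator are assumptions made for illustration. The statistical control strategy proposed in the paper is not reproduced here.

```python
def lorenz96_tendency(x, forcing):
    """Right-hand side of the Lorenz 1996 model with periodic indexing."""
    n = len(x)
    # Negative indices wrap around in Python, which gives the periodic boundary.
    return [
        (x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
        for i in range(n)
    ]

def rk4_step(x, forcing, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k1)], forcing)
    k3 = lorenz96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k2)], forcing)
    k4 = lorenz96_tendency([xi + dt * k for xi, k in zip(x, k3)], forcing)
    return [
        xi + dt * (a + 2 * b + 2 * c + d) / 6.0
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]

# 40 variables as in the abstract; forcing, step size, and perturbation are assumed values.
forcing = 8.0
state = [forcing] * 40
state[0] += 0.01  # small perturbation of the uniform fixed point x_i = F
for _ in range(1000):
    state = rk4_step(state, forcing, 0.01)

mean_state = sum(state) / len(state)
print(f"mean state after integration: {mean_state:.3f}")
```

The uniform state x_i = F is a fixed point of the model, so the small perturbation is what lets the instabilities develop during the integration.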
Large-Angle CMB Suppression and Polarisation Predictions

CERN Document Server

Copi, C.J.; Schwarz, D.J.; Starkman, G.D.

2013-01-01

The anomalous lack of large angle temperature correlations has been a surprising feature of the CMB since first observed by COBE-DMR and subsequently confirmed and strengthened by WMAP. This anomaly may point to the need for modifications of the standard model of cosmology or may show that our Universe is a rare statistical fluctuation within that model. Further observations of the temperature auto-correlation function will not elucidate the issue; sufficiently high precision statistical observations already exist. Instead, alternative probes are required. In this work we explore the expectations for forthcoming polarisation observations. We define a prescription to test the hypothesis that the large-angle CMB temperature perturbations in our Universe represent a rare statistical fluctuation within the standard cosmological model. These tests are based on the temperature-Q Stokes parameter correlation. Unfortunately these tests cannot be expected to be definitive. However, we do show that if this TQ-correlati...
Beyond δ : Tailoring marked statistics to reveal modified gravity

Science.gov (United States)

Valogiannis, Georgios; Bean, Rachel

2018-01-01

Models that seek to explain cosmic acceleration through modifications to general relativity (GR) evade stringent Solar System constraints through a restoring, screening mechanism. Down-weighting the high-density, screened regions in favor of the low density, unscreened ones offers the potential to enhance the amount of information carried in such modified gravity models. In this work, we assess the performance of a new "marked" transformation and perform a systematic comparison with the clipping and logarithmic transformations, in the context of ΛCDM and the symmetron and f(R) modified gravity models. Performance is measured in terms of the fractional boost in the Fisher information and the signal-to-noise ratio (SNR) for these models relative to the statistics derived from the standard density distribution. We find that all three statistics provide improved Fisher boosts over the basic density statistics. The model parameters for the marked and clipped transformation that best enhance signals and the Fisher boosts are determined. We also show that the mark is useful both as a Fourier and real-space transformation; a marked correlation function also enhances the SNR relative to the standard correlation function, and can on mildly nonlinear scales show a significant difference between the ΛCDM and the modified gravity models. Our results demonstrate how a series of simple analytical transformations could dramatically increase the predicted information extracted on deviations from GR, from large-scale surveys, and give the prospect for a much more feasible potential detection.
Statistical-mechanical entropy by the thin-layer method

International Nuclear Information System (INIS)

Feng, He; Kim, Sung Won

2003-01-01

G. 't Hooft first studied the statistical-mechanical entropy of a scalar field in a Schwarzschild black hole background by the brick-wall method and hinted that the statistical-mechanical entropy is the statistical origin of the Bekenstein-Hawking entropy of the black hole. However, according to our viewpoint, the statistical-mechanical entropy is only a quantum correction to the Bekenstein-Hawking entropy of the black hole. The brick-wall method based on thermal equilibrium at a large scale cannot be applied to the cases out of equilibrium such as a nonstationary black hole. The statistical-mechanical entropy of a scalar field in a nonstationary black hole background is calculated by the thin-layer method. The condition of local equilibrium near the horizon of the black hole is used as a working postulate and is maintained for a black hole which evaporates slowly enough and whose mass is far greater than the Planck mass. The statistical-mechanical entropy is also proportional to the area of the black hole horizon. The difference from the stationary black hole is that the result relies on a time-dependent cutoff.
Calculating statistical distributions from operator relations: The statistical distributions of various intermediate statistics

International Nuclear Information System (INIS)

Dai, Wu-Sheng; Xie, Mi

2013-01-01

In this paper, we give a general discussion on the calculation of the statistical distribution from a given operator relation of creation, annihilation, and number operators. Our result shows that as long as the relation between the number operator and the creation and annihilation operators can be expressed as $a^{\dagger} b = \Lambda(N)$ or $N = \Lambda^{-1}(a^{\dagger} b)$, where $N$, $a^{\dagger}$, and $b$ denote the number, creation, and annihilation operators, i.e., $N$ is a function of quadratic product of the creation and annihilation operators, the corresponding statistical distribution is the Gentile distribution, a statistical distribution in which the maximum occupation number is an arbitrary integer. As examples, we discuss the statistical distributions corresponding to various operator relations. In particular, besides the Bose–Einstein and Fermi–Dirac cases, we discuss the statistical distributions for various schemes of intermediate statistics, especially various q-deformation schemes. Our result shows that the statistical distributions corresponding to various q-deformation schemes are various Gentile distributions with different maximum occupation numbers which are determined by the deformation parameter q. This result shows that the results given in much literature on the q-deformation distribution are inaccurate or incomplete. -- Highlights: ► A general discussion on calculating statistical distribution from relations of creation, annihilation, and number operators. ► A systemic study on the statistical distributions corresponding to various q-deformation schemes. ► Arguing that many results of q-deformation distributions in literature are inaccurate or incomplete
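The abstract names the Gentile distribution without writing it out. As a minimal sketch, the code below evaluates the standard textbook form of the Gentile mean occupation number, 1/(e^x - 1) - (n+1)/(e^{(n+1)x} - 1), where x = (ε - μ)/(k_B T) and n is the maximum occupation number. The function names and the sample value of x are assumptions for illustration. The operator-relation derivation of the paper is not reproduced.

```python
import math

def gentile_occupation(x, n_max):
    """Mean occupation number for Gentile statistics.

    x is the dimensionless quantity (energy - chemical potential) / (k_B * T),
    taken positive here; n_max is the maximum occupation number of one state.
    """
    return 1.0 / math.expm1(x) - (n_max + 1) / math.expm1((n_max + 1) * x)

def fermi_dirac(x):
    """Fermi-Dirac mean occupation number."""
    return 1.0 / (math.exp(x) + 1.0)

def bose_einstein(x):
    """Bose-Einstein mean occupation number (requires x > 0)."""
    return 1.0 / math.expm1(x)

x = 0.7  # assumed sample value
print(gentile_occupation(x, 1), fermi_dirac(x))      # n_max = 1 reproduces Fermi-Dirac
print(gentile_occupation(x, 200), bose_einstein(x))  # large n_max approaches Bose-Einstein
```

The two limiting cases act as a consistency check: a maximum occupation number of one gives the Fermi-Dirac result, and a very large maximum occupation number approaches the Bose-Einstein result.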
Statistical properties of turbulent transport and fluctuations in tokamak and stellarator devices

Energy Technology Data Exchange (ETDEWEB)

Hidalgo, C; Pedrosa, M A; Milligen, B Van; Sanchez, E; Balbin, R; Garcia-Cortes, I [Euratom-CIEMAT Association, Madrid (Spain); Bleuel, J; Giannone, L.; Niedermeyer, H [Euratom-IPP Association, Garching (Germany)

1997-05-01

The statistical properties of fluctuations and turbulent transport have been studied in the plasma boundary region of stellarator (TJ-IU, W7-AS) and tokamak (TJ-I) devices. The local flux probability distribution function shows the bursty character of the flux and presents a systematic change as a function of the radial location. There exist large amplitude transport bursts that account for a significant part of the total flux. There is a strong similarity between the statistical properties of the turbulent fluxes in different devices. The value of the radial coherence associated with fluctuations and turbulent transport is strongly intermittent. This result emphasizes the importance of measurements with time resolution in understanding the interplay between the edge and the core regions in the plasma. For measurements in the plasma edge region of the TJ-IU torsatron, the turbulent flux does not, in general, show a larger radial coherence than the one associated with the fluctuations. (author). 14 refs, 6 figs.
Testing the statistical compatibility of independent data sets

International Nuclear Information System (INIS)

Maltoni, M.; Schwetz, T.

2003-01-01

We discuss a goodness-of-fit method which tests the compatibility between statistically independent data sets. The method gives sensible results even in cases where the χ² minima of the individual data sets are very low or when several parameters are fitted to a large number of data points. In particular, it avoids the problem that a possible disagreement between data sets becomes diluted by data points which are insensitive to the crucial parameters. A formal derivation of the probability distribution function for the proposed test statistics is given, based on standard theorems of statistics. The application of the method is illustrated on data from neutrino oscillation experiments, and its complementarity to the standard goodness-of-fit is discussed.
Changing statistics of storms in the North Atlantic?

International Nuclear Information System (INIS)

Storch, H. von; Guddal, J.; Iden, K.A.; Jonsson, T.; Perlwitz, J.; Reistad, M.; Ronde, J. de; Schmidt, H.; Zorita, E.

1993-01-01

Problems in the present discussion about increasing storminess in the North Atlantic area are discussed. Observational data so far available do not indicate a change in the storm statistics. Output from climate models points to an intensified storm track in the North Atlantic, but because of the limited skill of present-day climate models in simulating high-frequency variability and regional details any such 'forecast' has to be considered with caution. A downscaling procedure which relates large-scale time-mean aspects of the state of the atmosphere and ocean to the local statistics of storms is proposed to reconstruct past variations of high-frequency variability in the atmosphere (storminess) and in the sea state (wave statistics). First results are presented. (orig.)
Significance analysis of lexical bias in microarray data

Directory of Open Access Journals (Sweden)

Falkow Stanley

2003-04-01

Full Text Available Abstract Background Genes that are determined to be significantly differentially regulated in microarray analyses often appear to have functional commonalities, such as being components of the same biochemical pathway. This results in certain words being under- or overrepresented in the list of genes. Distinguishing between biologically meaningful trends and artifacts of annotation and analysis procedures is of the utmost importance, as only true biological trends are of interest for further experimentation. A number of sophisticated methods for identification of significant lexical trends are currently available, but these methods are generally too cumbersome for practical use by most microarray users. Results We have developed a tool, LACK, for calculating the statistical significance of apparent lexical bias in microarray datasets. The frequency of a user-specified list of search terms in a list of genes which are differentially regulated is assessed for statistical significance by comparison to randomly generated datasets. The simplicity of the input files and user interface targets the average microarray user who wishes to have a statistical measure of apparent lexical trends in analyzed datasets without the need for bioinformatics skills. The software is available as Perl source or a Windows executable. Conclusion We have used LACK in our laboratory to generate biological hypotheses based on our microarray data. We demonstrate the program's utility using an example in which we confirm significant upregulation of SPI-2 pathogenicity island of Salmonella enterica serovar Typhimurium by the cation chelator dipyridyl.
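The comparison against randomly generated datasets described in this abstract can be sketched as a simple randomization test. This is not LACK's actual implementation: the function names, the substring matching rule, the add-one correction, and the gene annotations below are all hypothetical and chosen only to illustrate the idea.

```python
import random

def count_term(genes, annotations, term):
    """Number of genes whose annotation text contains the (lowercase) search term."""
    return sum(term in annotations[g].lower() for g in genes)

def lexical_bias_p_value(hit_list, annotations, term, n_random=2000, seed=0):
    """Empirical one-sided p-value for overrepresentation of a term.

    Compares the count in hit_list with counts in random gene lists of the
    same size drawn without replacement from all annotated genes.
    """
    rng = random.Random(seed)
    all_genes = sorted(annotations)
    observed = count_term(hit_list, annotations, term)
    at_least = sum(
        count_term(rng.sample(all_genes, len(hit_list)), annotations, term) >= observed
        for _ in range(n_random)
    )
    # Add-one correction keeps the estimate away from an exact zero.
    return (at_least + 1) / (n_random + 1)

# Hypothetical annotations: 10 of 200 genes mention "secretion".
annotations = {f"gene{i}": "hypothetical protein" for i in range(200)}
for i in range(10):
    annotations[f"gene{i}"] = "type III secretion system component"

# Hypothetical list of differentially regulated genes: 8 of 20 carry the term.
hit_list = [f"gene{i}" for i in range(8)] + [f"gene{i}" for i in range(100, 112)]
p = lexical_bias_p_value(hit_list, annotations, "secretion")
print(f"empirical p-value: {p:.4f}")
```

A small empirical p-value indicates that random gene lists of the same size rarely contain the term as often as the observed list does. The resolution of the estimate is limited by the number of random lists drawn.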
Vortex dynamics and Lagrangian statistics in a model for active turbulence.

Science.gov (United States)

James, Martin; Wilczek, Michael

2018-02-14

Cellular suspensions such as dense bacterial flows exhibit a turbulence-like phase under certain conditions. We study this phenomenon of "active turbulence" statistically by using numerical tools. Following Wensink et al. (Proc. Natl. Acad. Sci. U.S.A. 109, 14308 (2012)), we model active turbulence by means of a generalized Navier-Stokes equation. Two-point velocity statistics of active turbulence, both in the Eulerian and the Lagrangian frame, is explored. We characterize the scale-dependent features of two-point statistics in this system. Furthermore, we extend this statistical study with measurements of vortex dynamics in this system. Our observations suggest that the large-scale statistics of active turbulence is close to Gaussian with sub-Gaussian tails.
Development of statistical analysis code for meteorological data (W-View)

Energy Technology Data Exchange (ETDEWEB)

Tachibana, Haruo; Sekita, Tsutomu; Yamaguchi, Takenori [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment

2003-03-01

A computer code (W-View: Weather View) was developed to analyze meteorological data statistically, based on 'the guideline of meteorological statistics for the safety analysis of nuclear power reactor' (Nuclear Safety Commission on January 28, 1982; revised on March 29, 2001). The code provides the statistical meteorological data needed to assess the public dose under normal operation and severe accident conditions when applying for a nuclear reactor operating license. It was revised from an original code that ran on a large office computer, so that a personal computer user can analyze the meteorological data simply and conveniently and produce statistical data tables and figures of meteorology. (author)
Statistics for the LHC: Quantifying our Scientific Narrative (1/4)

CERN Multimedia

CERN. Geneva

2011-01-01

Now that the LHC physics program is well under way and results have begun to pour out of the experiments, the statistical methodology used for these results is a hot topic. This is a challenge at the LHC, as we have sensitivity to discover new physics in a stage of the experiments where systematic uncertainties can still be quite large. The emphasis of these lectures is how we can translate the scientific narrative of why we think we know what we know into quantitative statistical statements about the presence or absence of new physics. Topics will include statistical modeling, incorporation of control samples to constrain systematics, and Bayesian and Frequentist statistical tests that are capable of answering these questions.
Significance of atmospheric effects of heat rejection from energy centers in the semi arid northwest

International Nuclear Information System (INIS)

Ramsdell, J.V.; Drake, R.L.; Young, J.R.

1976-01-01

The results presented in this paper have been obtained using simple atmospheric models in an attempt to optimize heat sink management in a conceptual nuclear energy center (NEC) at Hanford. The models have been designed to be conservative in the sense that they are biased toward overprediction of the impact of cooling system effluents on humidity and fog. Thus the models are screening tools to be used to identify subjects for further, more realistic examination. Within this context the following conclusions have been reached: the evaluation of any atmospheric impact postulated for heat dissipation must be conducted in quantitative terms which can be used to determine the significance of the impact; of the potential atmospheric impacts of large heat releases from energy centers, the one most amenable to quantitative evaluation in meaningful terms is the increase in fog; a postulated increase in frequency of fog can be translated into terms of visibility and both can be evaluated statistically; the translation of an increase in fog to visibility terms permits economic evaluation of the impact; and the predicted impact of the HNEC on fog and visibility is statistically significant whether the energy center consists of 20 or 40 units.
Whither Statistics Education Research?

Science.gov (United States)

Watson, Jane

2016-01-01

This year marks the 25th anniversary of the publication of a "National Statement on Mathematics for Australian Schools", which was the first curriculum statement this country had including "Chance and Data" as a significant component. It is hence an opportune time to survey the history of the related statistics education…
Effect size and statistical power in the rodent fear conditioning literature – A systematic review

Science.gov (United States)

Macleod, Malcolm R.

2018-01-01

Proposals to increase research reproducibility frequently call for focusing on effect sizes instead of p values, as well as for increasing the statistical power of experiments. However, it is unclear to what extent these two concepts are indeed taken into account in basic biomedical science. To study this in a real-case scenario, we performed a systematic review of effect sizes and statistical power in studies on learning of rodent fear conditioning, a widely used behavioral task to evaluate memory. Our search criteria yielded 410 experiments comparing control and treated groups in 122 articles. Interventions had a mean effect size of 29.5%, and amnesia caused by memory-impairing interventions was nearly always partial. Mean statistical power to detect the average effect size observed in well-powered experiments with significant differences (37.2%) was 65%, and was lower among studies with non-significant results. Only one article reported a sample size calculation, and our estimated sample size to achieve 80% power considering typical effect sizes and variances (15 animals per group) was reached in only 12.2% of experiments. Actual effect sizes correlated with effect size inferences made by readers on the basis of textual descriptions of results only when findings were non-significant, and neither effect size nor power correlated with study quality indicators, number of citations or impact factor of the publishing journal. In summary, effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited. Failure to take these concepts into consideration might limit attempts to improve reproducibility in this field of science. PMID:29698451
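The sample size estimate mentioned in this abstract rests on a standard power calculation. A minimal sketch follows, assuming a two-sided comparison of two group means with equal variances and a normal approximation; the function name and the example inputs are hypothetical, and the review's own figure of 15 animals per group came from its observed effect sizes and variances, which are not reproduced here.

```python
import math
from statistics import NormalDist

def animals_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided, two-sample comparison of means.

    effect -- smallest difference between group means worth detecting
    sd     -- common standard deviation (same units as `effect`)
    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)
```

Because the normal approximation ignores the extra uncertainty of estimating the variance, an exact t-based calculation typically asks for one or two more animals per group.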
Statistical Models of Adaptive Immune populations

Science.gov (United States)

Sethna, Zachary; Callan, Curtis; Walczak, Aleksandra; Mora, Thierry

The availability of large ($10^4$-$10^6$ sequences) datasets of B or T cell populations from a single individual allows reliable fitting of complex statistical models for naïve generation, somatic selection, and hypermutation. It is crucial to utilize a probabilistic/informational approach when modeling these populations. The inferred probability distributions allow for population characterization, calculation of probability distributions of various hidden variables (e.g. number of insertions), as well as statistical properties of the distribution itself (e.g. entropy). In particular, the differences between the T cell populations of embryonic and mature mice will be examined as a case study. Comparing these populations, as well as proposed mixed populations, provides a concrete exercise in model creation, comparison, choice, and validation.
Statistics

CERN Document Server

Hayslett, H T

1991-01-01

Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Statistical basis for positive identification in forensic anthropology.

Science.gov (United States)

Steadman, Dawnie Wolfe; Adams, Bradley J; Konigsberg, Lyle W

2006-09-01

Forensic scientists are often expected to present the likelihood of DNA identifications in US courts based on comparative population data, yet forensic anthropologists tend not to quantify the strength of an osteological identification. Because forensic anthropologists are trained first and foremost as physical anthropologists, they emphasize estimation problems at the expense of evidentiary problems, but this approach must be reexamined. In this paper, the statistical bases for presenting osteological and dental evidence are outlined, using a forensic case as a motivating example. A brief overview of Bayesian statistics is provided, and methods to calculate likelihood ratios for five aspects of the biological profile are demonstrated. This paper emphasizes the definition of appropriate reference samples and of the "population at large," and points out the conceptual differences between them. Several databases are introduced for both reference information and to characterize the "population at large," and new data are compiled to calculate the frequency of specific characters, such as age or fractures, within the "population at large." Despite small individual likelihood ratios for age, sex, and stature in the case example, the power of this approach is that, assuming each likelihood ratio is independent, the product rule can be applied. In this particular example, it is over three million times more likely to obtain the observed osteological and dental data if the identification is correct than if the identification is incorrect. This likelihood ratio is a convincing statistic that can support the forensic anthropologist's opinion on personal identity in court. 2006 Wiley-Liss, Inc.
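The product rule mentioned in this abstract can be illustrated with a few lines of arithmetic. The sketch below assumes the individual likelihood ratios are independent, as the abstract does; the function names and any input values are hypothetical and are not the figures from the case described in the paper.

```python
import math

def combined_likelihood_ratio(ratios):
    """Multiply independent likelihood ratios (product rule)."""
    if any(r <= 0 for r in ratios):
        raise ValueError("likelihood ratios must be positive")
    return math.prod(ratios)

def posterior_odds(prior_odds, ratios):
    """Bayes' rule in odds form: posterior odds = prior odds x combined LR."""
    return prior_odds * combined_likelihood_ratio(ratios)
```

Individually modest ratios, for example for age, sex, and stature, can multiply into a large combined ratio once rarer characters such as a specific fracture are included, which is the effect the abstract describes.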
A Statistical Modeling Framework for Characterising Uncertainty in Large Datasets: Application to Ocean Colour

Directory of Open Access Journals (Sweden)

Peter E. Land

2018-05-01

Full Text Available Uncertainty estimation is crucial to establishing confidence in any data analysis, and this is especially true for Essential Climate Variables, including ocean colour. Methods for deriving uncertainty vary greatly across data types, so a generic statistics-based approach applicable to multiple data types is an advantage to simplify the use and understanding of uncertainty data. Progress towards rigorous uncertainty analysis of ocean colour has been slow, in part because of the complexity of ocean colour processing. Here, we present a general approach to uncertainty characterisation, using a database of satellite-in situ matchups to generate a statistical model of satellite uncertainty as a function of its contributing variables. With an example NASA MODIS-Aqua chlorophyll-a matchups database mostly covering the north Atlantic, we demonstrate a model that explains 67% of the squared error in log(chlorophyll-a) as a potentially correctable bias, with the remaining uncertainty being characterised as standard deviation and standard error at each pixel. The method is quite general, depending only on the existence of a suitable database of matchups or reference values, and can be applied to other sensors and data types such as other satellite observed Essential Climate Variables, empirical algorithms derived from in situ data, or even model data.

A NEW TEST OF THE STATISTICAL NATURE OF THE BRIGHTEST CLUSTER GALAXIES

International Nuclear Information System (INIS)

Lin, Yen-Ting; Ostriker, Jeremiah P.; Miller, Christopher J.

2010-01-01

A novel statistic is proposed to examine the hypothesis that all cluster galaxies are drawn from the same luminosity distribution (LD). In such a 'statistical model' of galaxy LD, the brightest cluster galaxies (BCGs) are simply the statistical extreme of the galaxy population. Using a large sample of nearby clusters, we show that BCGs in high luminosity clusters (e.g., $L_{\rm tot} \gtrsim 4 \times 10^{11}\,h_{70}^{-2}\,L_\odot$) are unlikely (probability $\le 3 \times 10^{-4}$) to be drawn from the LD defined by all red cluster galaxies more luminous than $M_r = -20$. On the other hand, BCGs in less luminous clusters are consistent with being the statistical extreme. Applying our method to the second brightest galaxies, we show that they are consistent with being the statistical extreme, which implies that the BCGs are also distinct from non-BCG luminous, red, cluster galaxies. We point out some issues with the interpretation of the classical tests proposed by Tremaine and Richstone (TR) that are designed to examine the statistical nature of BCGs, investigate the robustness of both our statistical test and those of TR against difficulties in photometry of galaxies of large angular size, and discuss the implication of our findings on surveys that use the luminous red galaxies to measure the baryon acoustic oscillation features in the galaxy power spectrum.
New scanning technique using Adaptive Statistical Iterative Reconstruction (ASIR) significantly reduced the radiation dose of cardiac CT.

Science.gov (United States)

Tumur, Odgerel; Soon, Kean; Brown, Fraser; Mykytowycz, Marcus

2013-06-01

The aims of our study were to evaluate the effect of application of Adaptive Statistical Iterative Reconstruction (ASIR) algorithm on the radiation dose of coronary computed tomography angiography (CCTA) and its effects on image quality of CCTA and to evaluate the effects of various patient and CT scanning factors on the radiation dose of CCTA. This was a retrospective study that included 347 consecutive patients who underwent CCTA at a tertiary university teaching hospital between 1 July 2009 and 20 September 2011. Analysis was performed comparing patient demographics, scan characteristics, radiation dose and image quality in two groups of patients in whom conventional Filtered Back Projection (FBP) or ASIR was used for image reconstruction. There were 238 patients in the FBP group and 109 patients in the ASIR group. There was no difference between the groups in the use of prospective gating, scan length or tube voltage. In the ASIR group, significantly lower tube current was used compared with the FBP group, 550 mA (450-600) vs. 650 mA (500-711.25) (median (interquartile range)), respectively, P [...] ASIR group compared with FBP group, 4.29 mSv (2.84-6.02) vs. 5.84 mSv (3.88-8.39) (median (interquartile range)), respectively, P [...] ASIR was associated with increased image noise compared with FBP (39.93 ± 10.22 vs. 37.63 ± 18.79 (mean ± standard deviation), respectively, P [...] ASIR reduces the radiation dose of CCTA without affecting the image quality. © 2013 The Authors. Journal of Medical Imaging and Radiation Oncology © 2013 The Royal Australian and New Zealand College of Radiologists.
Multivariate statistical analysis for x-ray photoelectron spectroscopy spectral imaging: Effect of image acquisition time

International Nuclear Information System (INIS)

Peebles, D.E.; Ohlhausen, J.A.; Kotula, P.G.; Hutton, S.; Blomfield, C.

2004-01-01

The acquisition of spectral images for x-ray photoelectron spectroscopy (XPS) is a relatively new approach, although it has been used with other analytical spectroscopy tools for some time. This technique provides full spectral information at every pixel of an image, in order to provide a complete chemical mapping of the imaged surface area. Multivariate statistical analysis techniques applied to the spectral image data allow the determination of chemical component species, and their distribution and concentrations, with minimal data acquisition and processing times. Some of these statistical techniques have proven to be very robust and efficient methods for deriving physically realistic chemical components without input by the user other than the spectral matrix itself. The benefits of multivariate analysis of the spectral image data include significantly improved signal to noise, improved image contrast and intensity uniformity, and improved spatial resolution - which are achieved due to the effective statistical aggregation of the large number of often noisy data points in the image. This work demonstrates the improvements in chemical component determination and contrast, signal-to-noise level, and spatial resolution that can be obtained by the application of multivariate statistical analysis to XPS spectral images
Model selection for contingency tables with algebraic statistics

NARCIS (Netherlands)

Krampe, A.; Kuhnt, S.; Gibilisco, P.; Riccimagno, E.; Rogantin, M.P.; Wynn, H.P.

2009-01-01

Goodness-of-fit tests based on chi-square approximations are commonly used in the analysis of contingency tables. Results from algebraic statistics combined with MCMC methods provide alternatives to the chi-square approximation. However, within a model selection procedure usually a large number of
Industrial commodity statistics yearbook 2001. Production statistics (1992-2001)

International Nuclear Information System (INIS)

2003-01-01

This is the thirty-fifth in a series of annual compilations of statistics on world industry designed to meet both the general demand for information of this kind and the special requirements of the United Nations and related international bodies. Beginning with the 1992 edition, the title of the publication was changed to Industrial Commodity Statistics Yearbook as the result of a decision made by the United Nations Statistical Commission at its twenty-seventh session to discontinue, effective 1994, publication of the Industrial Statistics Yearbook, volume I, General Industrial Statistics by the Statistics Division of the United Nations. The United Nations Industrial Development Organization (UNIDO) has become responsible for the collection and dissemination of general industrial statistics while the Statistics Division of the United Nations continues to be responsible for industrial commodity production statistics. The previous title, Industrial Statistics Yearbook, volume II, Commodity Production Statistics, was introduced in the 1982 edition. The first seven editions in this series were published under the title The Growth of World Industry and the next eight editions under the title Yearbook of Industrial Statistics. This edition of the Yearbook contains annual quantity data on production of industrial commodities by country, geographical region, economic grouping and for the world. A standard list of about 530 commodities (about 590 statistical series) has been adopted for the publication. The statistics refer to the ten-year period 1992-2001 for about 200 countries and areas.
Industrial commodity statistics yearbook 2002. Production statistics (1993-2002)

International Nuclear Information System (INIS)

2004-01-01

This is the thirty-sixth in a series of annual compilations of statistics on world industry designed to meet both the general demand for information of this kind and the special requirements of the United Nations and related international bodies. Beginning with the 1992 edition, the title of the publication was changed to Industrial Commodity Statistics Yearbook as the result of a decision made by the United Nations Statistical Commission at its twenty-seventh session to discontinue, effective 1994, publication of the Industrial Statistics Yearbook, volume I, General Industrial Statistics by the Statistics Division of the United Nations. The United Nations Industrial Development Organization (UNIDO) has become responsible for the collection and dissemination of general industrial statistics while the Statistics Division of the United Nations continues to be responsible for industrial commodity production statistics. The previous title, Industrial Statistics Yearbook, volume II, Commodity Production Statistics, was introduced in the 1982 edition. The first seven editions in this series were published under the title 'The Growth of World Industry' and the next eight editions under the title 'Yearbook of Industrial Statistics'. This edition of the Yearbook contains annual quantity data on production of industrial commodities by country, geographical region, economic grouping and for the world. A standard list of about 530 commodities (about 590 statistical series) has been adopted for the publication. The statistics refer to the ten-year period 1993-2002 for about 200 countries and areas.
Industrial commodity statistics yearbook 2000. Production statistics (1991-2000)

International Nuclear Information System (INIS)

2002-01-01

This is the thirty-third in a series of annual compilations of statistics on world industry designed to meet both the general demand for information of this kind and the special requirements of the United Nations and related international bodies. Beginning with the 1992 edition, the title of the publication was changed to Industrial Commodity Statistics Yearbook as the result of a decision made by the United Nations Statistical Commission at its twenty-seventh session to discontinue, effective 1994, publication of the Industrial Statistics Yearbook, volume I, General Industrial Statistics by the Statistics Division of the United Nations. The United Nations Industrial Development Organization (UNIDO) has become responsible for the collection and dissemination of general industrial statistics while the Statistics Division of the United Nations continues to be responsible for industrial commodity production statistics. The previous title, Industrial Statistics Yearbook, volume II, Commodity Production Statistics, was introduced in the 1982 edition. The first seven editions in this series were published under the title The Growth of World Industry and the next eight editions under the title Yearbook of Industrial Statistics. This edition of the Yearbook contains annual quantity data on production of industrial commodities by country, geographical region, economic grouping and for the world. A standard list of about 530 commodities (about 590 statistical series) has been adopted for the publication. Most of the statistics refer to the ten-year period 1991-2000 for about 200 countries and areas.
Macro-indicators of citation impacts of six prolific countries: InCites data and the statistical significance of trends.

Directory of Open Access Journals (Sweden)

Lutz Bornmann

Full Text Available Using the InCites tool of Thomson Reuters, this study compares normalized citation impact values calculated for China, Japan, France, Germany, United States, and the UK throughout the time period from 1981 to 2010. InCites offers a unique opportunity to study the normalized citation impacts of countries using (i) a long publication window (1981 to 2010), (ii) a differentiation in (broad or more narrow) subject areas, and (iii) allowing for the use of statistical procedures in order to obtain an insightful investigation of national citation trends across the years. Using four broad categories, our results show significantly increasing trends in citation impact values for France, the UK, and especially Germany across the last thirty years in all areas. The citation impact of papers from China is still at a relatively low level (mostly below the world average), but the country follows an increasing trend line. The USA exhibits a stable pattern of high citation impact values across the years. With small impact differences between the publication years, the US trend is increasing in engineering and technology but decreasing in medical and health sciences as well as in agricultural sciences. Similar to the USA, Japan follows increasing as well as decreasing trends in different subject areas, but the variability across the years is small. In most of the years, papers from Japan perform below or approximately at the world average in each subject area.
Robust Combining of Disparate Classifiers Through Order Statistics

Science.gov (United States)

Tumer, Kagan; Ghosh, Joydeep

2001-01-01

Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
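The simple order statistic combiners named in this abstract (median, maximum, ith order statistic) can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes each classifier emits one score per class, and the function names are hypothetical. The trim and spread combiners are not shown.

```python
from statistics import median

def order_statistic_combiner(outputs, i):
    """Per class, return the i-th smallest score across classifiers (1-based).

    outputs -- list of per-classifier score lists, one score per class
    """
    n = len(outputs)
    if not 1 <= i <= n:
        raise ValueError("i must lie between 1 and the number of classifiers")
    # zip(*outputs) regroups the scores by class instead of by classifier.
    return [sorted(scores)[i - 1] for scores in zip(*outputs)]

def median_combiner(outputs):
    """Per class, take the median score across classifiers."""
    return [median(scores) for scores in zip(*outputs)]

def max_combiner(outputs):
    """Per class, take the largest score (the N-th order statistic)."""
    return order_statistic_combiner(outputs, len(outputs))

def predict(combined):
    """Index of the class with the highest combined score."""
    return max(range(len(combined)), key=combined.__getitem__)
```

With three classifiers where one disagrees strongly, the median combiner follows the majority while the max combiner follows the most confident single output, which is the kind of trade-off the article analyzes.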
Nonlinear wave chaos: statistics of second harmonic fields.

Science.gov (United States)

Zhou, Min; Ott, Edward; Antonsen, Thomas M; Anlage, Steven M

2017-10-01

Concepts from the field of wave chaos have been shown to successfully predict the statistical properties of linear electromagnetic fields in electrically large enclosures. The Random Coupling Model (RCM) describes these properties by incorporating both universal features described by Random Matrix Theory and the system-specific features of particular system realizations. In an effort to extend this approach to the nonlinear domain, we add an active nonlinear frequency-doubling circuit to an otherwise linear wave chaotic system, and we measure the statistical properties of the resulting second harmonic fields. We develop an RCM-based model of this system as two linear chaotic cavities coupled by means of a nonlinear transfer function. The harmonic field strengths are predicted to be the product of two statistical quantities and the nonlinearity characteristics. Statistical results from measurement-based calculation, RCM-based simulation, and direct experimental measurements are compared and show good agreement over many decades of power.
HistFitter software framework for statistical data analysis

CERN Document Server

Baak, M.; Côte, D.; Koutsman, A.; Lorenz, J.; Short, D.

2015-01-01

We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fitted to data and interpreted with statistical tests. A key innovation of HistFitter is its design, which is rooted in core analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its very fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with mu...
Binomial vs poisson statistics in radiation studies

International Nuclear Information System (INIS)

Foster, J.; Kouris, K.; Spyrou, N.M.; Matthews, I.P.; Welsh National School of Medicine, Cardiff

1983-01-01

The processes of radioactive decay, decay and growth of radioactive species in a radioactive chain, prompt emission(s) from nuclear reactions, conventional activation and cyclic activation are discussed with respect to their underlying statistical density function. By considering the transformation(s) that each nucleus may undergo it is shown that all these processes are fundamentally binomial. Formally, when the number of experiments N is large and the probability of success p is close to zero, the binomial is closely approximated by the Poisson density function. In radiation and nuclear physics, N is always large: each experiment can be conceived of as the observation of the fate of each of the N nuclei initially present. Whether p, the probability that a given nucleus undergoes a prescribed transformation, is close to zero depends on the process and nuclide(s) concerned. Hence, although a binomial description is always valid, the Poisson approximation is not always adequate. Therefore further clarification is provided as to when the binomial distribution must be used in the statistical treatment of detected events. (orig.)
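The condition stated in this abstract (large N, p close to zero) can be checked numerically. The sketch below compares the binomial probability mass function with its Poisson approximation of mean Np; the function names are illustrative, the probabilities are evaluated in log space to avoid overflow for large N, and 0 < p < 1 is assumed.

```python
import math

def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p), evaluated in log space (0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, mean):
    """P(K = k) for K ~ Poisson(mean), evaluated in log space (mean > 0)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def max_pmf_gap(n, p):
    """Largest pointwise difference between Binomial(n, p) and Poisson(n * p)."""
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p))
               for k in range(n + 1))

def variance_ratio(p):
    """Binomial variance n*p*(1-p) divided by the Poisson variance n*p."""
    return 1.0 - p
```

The variance ratio makes the point compactly: the Poisson model overstates the variance by a factor 1/(1 - p), which is negligible when p is near zero but not when a substantial fraction of the nuclei transform during the observation.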
Classification, (big) data analysis and statistical learning

CERN Document Server

Conversano, Claudio; Vichi, Maurizio

2018-01-01

This edited book focuses on the latest developments in classification, statistical learning, data analysis and related areas of data science, including statistical analysis of large datasets, big data analytics, time series clustering, integration of data from different sources, as well as social networks. It covers both methodological aspects as well as applications to a wide range of areas such as economics, marketing, education, social sciences, medicine, environmental sciences and the pharmaceutical industry. In addition, it describes the basic features of the software behind the data analysis results, and provides links to the corresponding codes and data sets where necessary. This book is intended for researchers and practitioners who are interested in the latest developments and applications in the field. The peer-reviewed contributions were presented at the 10th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in Santa Margherita di Pul...
Transport Coefficients from Large Deviation Functions

OpenAIRE

Gao, Chloe Ya; Limmer, David T.

2017-01-01

We describe a method for computing transport coefficients from the direct evaluation of large deviation functions. This method is general, relying on only equilibrium fluctuations, and is statistically efficient, employing trajectory based importance sampling. Equilibrium fluctuations of molecular currents are characterized by their large deviation functions, which are scaled cumulant generating functions analogous to the free energies. A diffusion Monte Carlo algorithm is used to evaluate th...
Childhood-compared to adolescent-onset bipolar disorder has more statistically significant clinical correlates.

Science.gov (United States)

Holtzman, Jessica N; Miller, Shefali; Hooshmand, Farnaz; Wang, Po W; Chang, Kiki D; Hill, Shelley J; Rasgon, Natalie L; Ketter, Terence A

2015-07-01

The strengths and limitations of considering childhood- and adolescent-onset bipolar disorder (BD) separately versus together remain to be established. We assessed this issue. BD patients referred to the Stanford Bipolar Disorder Clinic during 2000-2011 were assessed with the Systematic Treatment Enhancement Program for BD Affective Disorders Evaluation. Patients with childhood- and adolescent-onset were compared to those with adult-onset for 7 unfavorable bipolar illness characteristics with replicated associations with early-onset patients. Among 502 BD outpatients, those with childhood- ([...]) and adolescent- (13-18 years, N=218) onset had significantly higher rates for 4/7 unfavorable illness characteristics, including lifetime comorbid anxiety disorder, at least ten lifetime mood episodes, lifetime alcohol use disorder, and prior suicide attempt, than those with adult-onset (>18 years, N=174). Childhood- but not adolescent-onset BD patients also had significantly higher rates of first-degree relative with mood disorder, lifetime substance use disorder, and rapid cycling in the prior year. Patients with pooled childhood/adolescent- compared to adult-onset had significantly higher rates for 5/7 of these unfavorable illness characteristics, while patients with childhood- compared to adolescent-onset had significantly higher rates for 4/7 of these unfavorable illness characteristics. Caucasian, insured, suburban, low substance abuse, American specialty clinic-referred sample limits generalizability. Onset age is based on retrospective recall. Childhood- compared to adolescent-onset BD was more robustly related to unfavorable bipolar illness characteristics, so pooling these groups attenuated such relationships. Further study is warranted to determine the extent to which adolescent-onset BD represents an intermediate phenotype between childhood- and adult-onset BD. Copyright © 2015 Elsevier B.V. All rights reserved.
Encounter Probability of Significant Wave Height

DEFF Research Database (Denmark)

Liu, Z.; Burcharth, H. F.

The determination of the design wave height (often given as the significant wave height) is usually based on statistical analysis of long-term extreme wave height measurement or hindcast. The result of such extreme wave height analysis is often given as the design wave height corresponding to a c...
LOD significance thresholds for QTL analysis in experimental populations of diploid species

Science.gov (United States)

Van Ooijen JW

1999-11-01

Linkage analysis with molecular genetic markers is a very powerful tool in the biological research of quantitative traits. The lack of an easy way to know what areas of the genome can be designated as statistically significant for containing a gene affecting the quantitative trait of interest hampers the important prediction of the rate of false positives. In this paper four tables, obtained by large-scale simulations, are presented that can be used with a simple formula to get the false-positives rate for analyses of the standard types of experimental populations with diploid species with any size of genome. A new definition of the term 'suggestive linkage' is proposed that allows a more objective comparison of results across species.
Statistical methods to monitor the West Valley off-gas system

International Nuclear Information System (INIS)

Eggett, D.L.

1990-01-01

This paper reports on the off-gas system for the ceramic melter operated at the West Valley Demonstration Project at West Valley, NY, monitored during melter operation. A one-at-a-time method of monitoring the parameters of the off-gas system is not statistically sound. Therefore, multivariate statistical methods appropriate for the monitoring of many correlated parameters will be used. Monitoring a large number of parameters increases the probability of a false out-of-control signal. If the parameters being monitored are statistically independent, the control limits can be easily adjusted to obtain the desired probability of a false out-of-control signal. The principal component (PC) scores have desirable statistical properties when the original variables are distributed as multivariate normals. Two statistics derived from the PC scores and used to form multivariate control charts are outlined and their distributional properties reviewed.
Study on loss detection algorithms for tank monitoring data using multivariate statistical analysis

International Nuclear Information System (INIS)

Suzuki, Mitsutoshi; Burr, Tom

2009-01-01

Evaluation of solution monitoring data to support material balance evaluation was proposed about a decade ago because of concerns regarding the large throughput planned at Rokkasho Reprocessing Plant (RRP). A numerical study using the simulation code (FACSIM) was done and significant increases in the detection probabilities (DP) for certain types of losses were shown. To be accepted internationally, it is very important to verify such claims using real solution monitoring data. However, a demonstrative study with real tank data has not been carried out due to the confidentiality of the tank data. This paper describes an experimental study that has been started using actual data from the Solution Measurement and Monitoring System (SMMS) in the Tokai Reprocessing Plant (TRP) and the Savannah River Site (SRS). Multivariate statistical methods, such as a vector cumulative sum and a multi-scale statistical analysis, have been applied to the real tank data that have superimposed simulated loss. Although quantitative conclusions have not been derived for the moment due to the difficulty of baseline evaluation, the multivariate statistical methods remain promising for abrupt and some types of protracted loss detection. (author)
Statistical complexity without explicit reference to underlying probabilities

Science.gov (United States)

Pennini, F.; Plastino, A.

2018-06-01

We show that extremely simple systems of a not too large number of particles can be simultaneously thermally stable and complex. To such an end, we extend the statistical complexity's notion to simple configurations of non-interacting particles, without appeal to probabilities, and discuss configurational properties.

Power of mental health nursing research: a statistical analysis of studies in the International Journal of Mental Health Nursing.

Science.gov (United States)

Gaskin, Cadeyrn J; Happell, Brenda

2013-02-01

Having sufficient power to detect effect sizes of an expected magnitude is a core consideration when designing studies in which inferential statistics will be used. The main aim of this study was to investigate the statistical power in studies published in the International Journal of Mental Health Nursing. From volumes 19 (2010) and 20 (2011) of the journal, studies were analysed for their power to detect small, medium, and large effect sizes, according to Cohen's guidelines. The power of the 23 studies included in this review to detect small, medium, and large effects was 0.34, 0.79, and 0.94, respectively. In 90% of papers, no adjustments for experiment-wise error were reported. With a median of nine inferential tests per paper, the mean experiment-wise error rate was 0.51. A priori power analyses were only reported in 17% of studies. Although effect sizes for correlations and regressions were routinely reported, effect sizes for other tests (χ²-tests, t-tests, ANOVA/MANOVA) were largely absent from the papers. All types of effect sizes were infrequently interpreted. Researchers are strongly encouraged to conduct power analyses when designing studies, and to avoid scattergun approaches to data analysis (i.e. undertaking large numbers of tests in the hope of finding 'significant' results). Because reviewing effect sizes is essential for determining the clinical significance of study findings, researchers would better serve the field of mental health nursing if they reported and interpreted effect sizes. © 2012 The Authors. International Journal of Mental Health Nursing © 2012 Australian College of Mental Health Nurses Inc.
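The experiment-wise error rate mentioned above can be illustrated with a short calculation. This is a minimal sketch, assuming independent tests at a per-test level of 0.05; the function name is hypothetical and the abstract does not state how the authors computed their figure.

```python
def experimentwise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    which is a simplification of real study designs.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Error rate for a single test, for the median of nine tests per paper,
# and for a paper that runs twenty tests.
for n_tests in (1, 9, 20):
    print(n_tests, round(experimentwise_error(n_tests), 3))
```

Under these assumptions nine tests give a rate of roughly 0.37. The reported mean of 0.51 is an average over papers with differing numbers of tests, so the two figures need not coincide.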
A Statistical Test of Correlations and Periodicities in the Geological Records

Science.gov (United States)

Yabushita, S.

1997-09-01

Matsumoto & Kubotani argued that there is a positive and statistically significant correlation between cratering and mass extinction. This argument is critically examined by adopting a method of Ertel used by Matsumoto & Kubotani but by applying it more directly to the extinction and cratering records. It is shown that on the null-hypothesis of random distribution of crater ages, the observed correlation has a probability of occurrence of 13%. However, when large craters are excluded whose ages agree with the times of peaks of extinction rate of marine fauna, one obtains a negative correlation. This result strongly indicates that mass extinctions are not due to accumulation of impacts but due to isolated gigantic impacts.
[Comment on] Statistical discrimination

Science.gov (United States)

Chinn, Douglas

In the December 8, 1981, issue of Eos, a news item reported the conclusion of a National Research Council study that sexual discrimination against women with Ph.D.'s exists in the field of geophysics. Basically, the item reported that even when allowances are made for motherhood the percentage of female Ph.D.'s holding high university and corporate positions is significantly lower than the percentage of male Ph.D.'s holding the same types of positions. The sexual discrimination conclusion, based only on these statistics, assumes that there are no basic psychological differences between men and women that might cause different populations in the employment group studied. Therefore, the reasoning goes, after taking into account possible effects from differences related to anatomy, such as women stopping their careers in order to bear and raise children, the statistical distributions of positions held by male and female Ph.D.'s ought to be very similar to one another. Any significant differences between the distributions must be caused primarily by sexual discrimination.
Statistical characteristics and stability index (si) of large-sized landslide dams around the world

International Nuclear Information System (INIS)

Iqbal, J.; Dai, F.; Raja, I.A.

2014-01-01

In the last few decades, landslide dams have received greater attention from researchers, as they have caused loss of property and human lives. Over 261 large-sized landslide dams from different countries of the world with volume greater than 1 x 10^5 m^3 have been reviewed for this study. The data collected for this study show that 58% of the catastrophic landslides were triggered by earthquakes and 21% by rainfall, revealing that earthquake and rainfall are the two major triggers, accounting for 75% of large-sized landslide dams. These landslides were most frequent during the last two decades (1990-2010) throughout the world. The mean landslide dam volume of the studied cases was 53.39 x 10 m with mean dam height of 71.98 m, while the mean lake volume was found to be 156.62 x 10 m. Failure of these large landslide dams poses a severe threat to the property and people living downstream, hence immediate attention is required to deal with this problem. A stability index (SI) has been derived on the basis of 59 large-sized landslide dams (out of the 261 dams) with complete parametric information. (author)
What are decision making styles for international apparel brands in a large emerging market?

OpenAIRE

De Mattos, Claudio; Salciuviene, Laura; Auruskeviciene, Vilte; Juneja, Garima

2015-01-01

The main purpose of the paper is to identify consumer decision making styles based on Sproles & Kendall's (1986) framework in a large emerging market for international apparel brands. An online questionnaire-based survey with individual Indian consumers was conducted. The results of this study identify five consumer decision making styles among Indian consumers when selecting international apparel brands. The findings also suggest significant statistical differences between males and fema...
Harmonic statistics

International Nuclear Information System (INIS)

Eliazar, Iddo

2017-01-01

The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their ‘public relations’ for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford’s law, and 1/f noise. - Highlights: • Harmonic statistics are described and reviewed in detail. • Connections to various statistical laws are established. • Connections to perturbation, renormalization and dynamics are established.
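The harmonic Poisson process described above can be simulated in a few lines. This is a rough sketch, assuming that "harmonic intensity" means an intensity of the form c / x on the positive half-line; the function name and the parameter values are illustrative only and are not taken from the paper.

```python
import math
import random


def harmonic_poisson_points(c, a, b, seed=0):
    """Sample a Poisson process on [a, b] with intensity c / x.

    Under the change of variable y = ln(x) the process becomes a
    homogeneous Poisson process with rate c, so points are produced by
    adding exponential gaps in log space and mapping back with exp.
    """
    rng = random.Random(seed)
    points = []
    y = math.log(a)
    log_b = math.log(b)
    while True:
        y += rng.expovariate(c)  # gap between consecutive points in log space
        if y > log_b:
            break
        points.append(math.exp(y))
    return points


points = harmonic_poisson_points(c=1000.0, a=1.0, b=10.0)
expected_count = 1000.0 * math.log(10.0)  # integral of c / x over [1, 10]
share_leading_one = sum(1 for x in points if x < 2.0) / len(points)
print(len(points), round(expected_count, 1), round(share_leading_one, 3))
```

Because the points are uniform in log space, the share falling in [1, 2) over one decade should be close to log10(2), which is the Benford frequency of a leading digit 1. This is one way to see the scale invariance and the link to Benford's law listed in the abstract.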
Harmonic statistics

Energy Technology Data Exchange (ETDEWEB)

Eliazar, Iddo, E-mail: eliazar@post.tau.ac.il

2017-05-15

The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their ‘public relations’ for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford’s law, and 1/f noise. - Highlights: • Harmonic statistics are described and reviewed in detail. • Connections to various statistical laws are established. • Connections to perturbation, renormalization and dynamics are established.
Serum Advanced Oxidation Protein Products in Oral Squamous Cell Carcinoma: Possible Markers of Diagnostic Significance

Directory of Open Access Journals (Sweden)

Abhishek Singh Nayyar

2013-07-01

Full Text Available Background: The aim of this study was to measure the concentrations (levels) of serum total proteins and advanced oxidation protein products as markers of oxidant-mediated protein damage in the sera of patients with oral cancers. Methods: The study consisted of the sera analyses of serum total protein and advanced oxidation protein products’ levels in 30 age and sex matched controls, 60 patients with reported pre-cancerous lesions and/or conditions and 60 patients with histologically proven oral squamous cell carcinoma. One way analyses of variance were used to test the difference between groups. To determine which of the two groups’ means were significantly different, the post-hoc test of Bonferroni was used. The results were averaged as mean ± standard deviation. In the above test, P values less than 0.05 were taken to be statistically significant. The normality of data was checked before the statistical analysis was performed. Results: The study revealed statistically significant variations in serum levels of advanced oxidation protein products (P<0.001). Serum levels of total protein showed extensive variations; therefore the results were largely inconclusive and statistically insignificant. Conclusion: The results emphasize the need for more studies with larger sample sizes to be conducted before a conclusive role can be determined for sera levels of total protein and advanced oxidation protein products as markers both for diagnostic significance and the transition from the various oral pre-cancerous lesions and conditions into frank oral cancers.
Statistical mechanics for a class of quantum statistics

International Nuclear Information System (INIS)

Isakov, S.B.

1994-01-01

Generalized statistical distributions for identical particles are introduced for the case where filling a single-particle quantum state by particles depends on filling states of different momenta. The system of one-dimensional bosons with a two-body potential that can be solved by means of the thermodynamic Bethe ansatz is shown to be equivalent thermodynamically to a system of free particles obeying statistical distributions of the above class. The quantum statistics arising in this way are completely determined by the two-particle scattering phases of the corresponding interacting systems. An equation determining the statistical distributions for these statistics is derived
STATISTICAL STUDY OF STRONG AND EXTREME GEOMAGNETIC DISTURBANCES AND SOLAR CYCLE CHARACTERISTICS

International Nuclear Information System (INIS)

Kilpua, E. K. J.; Olspert, N.; Grigorievskiy, A.; Käpylä, M. J.; Tanskanen, E. I.; Pelt, J.; Miyahara, H.; Kataoka, R.; Liu, Y. D.

2015-01-01

We study the relation between strong and extreme geomagnetic storms and solar cycle characteristics. The analysis uses an extensive geomagnetic index AA data set spanning over 150 yr complemented by the Kakioka magnetometer recordings. We apply Pearson correlation statistics and estimate the significance of the correlation with a bootstrapping technique. We show that the correlation between the storm occurrence and the strength of the solar cycle decreases from a clear positive correlation with increasing storm magnitude toward a negligible relationship. Hence, the quieter Sun can also launch superstorms that may lead to significant societal and economic impact. Our results show that while weaker storms occur most frequently in the declining phase, the stronger storms have the tendency to occur near solar maximum. Our analysis suggests that the most extreme solar eruptions do not have a direct connection between the solar large-scale dynamo-generated magnetic field, but are rather associated with smaller-scale dynamo and resulting turbulent magnetic fields. The phase distributions of sunspots and storms becoming increasingly in phase with increasing storm strength, on the other hand, may indicate that the extreme storms are related to the toroidal component of the solar large-scale field
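The combination of Pearson correlation and bootstrapping mentioned above can be sketched as follows. This shows one common variant (a percentile bootstrap over resampled pairs); the abstract does not describe the exact procedure used in the study, and the function names and data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient; returns NaN if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for r, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):  # skip degenerate resamples with zero variance
            estimates.append(r)
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * len(estimates))]
    upper = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lower, upper


# Hypothetical data: cycle strength versus number of storms per cycle.
cycle_strength = list(range(20))
storm_count = [2 * v + (-1) ** v for v in cycle_strength]
print(pearson_r(cycle_strength, storm_count), bootstrap_ci(cycle_strength, storm_count))
```

An interval that excludes zero is read as evidence of a non-zero correlation at the chosen level. Resampling pairs keeps each storm count attached to its own cycle, which matters when the two series are not exchangeable.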
STATISTICAL STUDY OF STRONG AND EXTREME GEOMAGNETIC DISTURBANCES AND SOLAR CYCLE CHARACTERISTICS

Energy Technology Data Exchange (ETDEWEB)

Kilpua, E. K. J. [Department of Physics, University of Helsinki (Finland); Olspert, N.; Grigorievskiy, A.; Käpylä, M. J.; Tanskanen, E. I.; Pelt, J. [ReSoLVE Centre of Excellence, Department of Computer Science, P.O. Box 15400, FI-00076 Aalto University (Finland); Miyahara, H. [Musashino Art University, 1-736 Ogawa-cho, Kodaira-shi, Tokyo 187-8505 (Japan); Kataoka, R. [National Institute of Polar Research, 10-3 Midori-cho, Tachikawa, Tokyo 190-8518 (Japan); Liu, Y. D. [State Key Laboratory of Space Weather, National Space Science Center, Chinese Academy of Sciences, Beijing 100190 (China)

2015-06-20

We study the relation between strong and extreme geomagnetic storms and solar cycle characteristics. The analysis uses an extensive geomagnetic index AA data set spanning over 150 yr complemented by the Kakioka magnetometer recordings. We apply Pearson correlation statistics and estimate the significance of the correlation with a bootstrapping technique. We show that the correlation between the storm occurrence and the strength of the solar cycle decreases from a clear positive correlation with increasing storm magnitude toward a negligible relationship. Hence, the quieter Sun can also launch superstorms that may lead to significant societal and economic impact. Our results show that while weaker storms occur most frequently in the declining phase, the stronger storms have the tendency to occur near solar maximum. Our analysis suggests that the most extreme solar eruptions do not have a direct connection between the solar large-scale dynamo-generated magnetic field, but are rather associated with smaller-scale dynamo and resulting turbulent magnetic fields. The phase distributions of sunspots and storms becoming increasingly in phase with increasing storm strength, on the other hand, may indicate that the extreme storms are related to the toroidal component of the solar large-scale field.
Misspecified poisson regression models for large-scale registry data: inference for 'large n and small p'.

Science.gov (United States)

Grøn, Randi; Gerds, Thomas A; Andersen, Per K

2016-03-30

Poisson regression is an important tool in register-based epidemiology where it is used to study the association between exposure variables and event rates. In this paper, we will discuss the situation with 'large n and small p', where n is the sample size and p is the number of available covariates. Specifically, we are concerned with modeling options when there are time-varying covariates that can have time-varying effects. One problem is that tests of the proportional hazards assumption, of no interactions between exposure and other observed variables, or of other modeling assumptions have large power due to the large sample size and will often indicate statistical significance even for numerically small deviations that are unimportant for the subject matter. Another problem is that information on important confounders may be unavailable. In practice, this situation may lead to simple working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods are illustrated using data from the Danish national registries investigating the diabetes incidence for individuals treated with antipsychotics compared with the general unexposed population. Copyright © 2015 John Wiley & Sons, Ltd.
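The point about large samples making numerically small deviations statistically significant can be shown with a simple calculation. This is a simplified sketch using a one-sample z-test rather than the Poisson regression setting of the paper; the function name, the effect size, and the sample sizes are assumptions chosen for illustration.

```python
import math


def two_sided_p(effect, n, sd=1.0):
    """Two-sided p-value of a z-test for a mean shift of `effect` with n observations."""
    z = abs(effect) * math.sqrt(n) / sd
    # 2 * (1 - Phi(z)) written with erfc to stay accurate for large z.
    return math.erfc(z / math.sqrt(2.0))


# The same tiny deviation (1% of a standard deviation) at growing sample sizes.
for n in (1_000, 100_000, 1_000_000):
    print(n, two_sided_p(0.01, n))
```

The deviation is identical in every row and only the sample size changes, so a small p-value here reflects the amount of data and says nothing about whether the deviation matters for the subject matter.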
Weather is not significantly correlated with destination-specific transport-related physical activity among adults: A large-scale temporally matched analysis.

Science.gov (United States)

Durand, Casey P; Zhang, Kai; Salvo, Deborah

2017-08-01

Weather is an element of the natural environment that could have a significant effect on physical activity. Existing research, however, indicates only modest correlations between measures of weather and physical activity. This prior work has been limited by a failure to use time-matched weather and physical activity data, or has not adequately examined the different domains of physical activity (transport, leisure, occupational, etc.). Our objective was to identify the correlation between weather variables and destination-specific transport-related physical activity in adults. Data were sourced from the California Household Travel Survey, collected in 2012-3. Weather variables included: relative humidity, temperature, wind speed, and precipitation. Transport-related physical activity (walking) was sourced from participant-recorded travel diaries. Three-part hurdle models were used to analyze the data. Results indicate statistically or substantively insignificant correlations between the weather variables and transport-related physical activity for all destination types. These results provide the strongest evidence to date that transport-related physical activity may occur relatively independently of weather conditions. The knowledge that weather conditions do not seem to be a significant barrier to this domain of activity may potentially expand the universe of geographic locations that are amenable to environmental and programmatic interventions to increase transport-related walking. Copyright © 2017 Elsevier Inc. All rights reserved.
Data-driven inference for the spatial scan statistic.

Science.gov (United States)

Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

2011-08-02

Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
The problem of low variance voxels in statistical parametric mapping; a new hat avoids a 'haircut'.

Science.gov (United States)

Ridgway, Gerard R; Litvak, Vladimir; Flandin, Guillaume; Friston, Karl J; Penny, Will D

2012-02-01

Statistical parametric mapping (SPM) locates significant clusters based on a ratio of signal to noise (a 'contrast' of the parameters divided by its standard error) meaning that very low noise regions, for example outside the brain, can attain artefactually high statistical values. Similarly, the commonly applied preprocessing step of Gaussian spatial smoothing can shift the peak statistical significance away from the peak of the contrast and towards regions of lower variance. These problems have previously been identified in positron emission tomography (PET) (Reimold et al., 2006) and voxel-based morphometry (VBM) (Acosta-Cabronero et al., 2008), but can also appear in functional magnetic resonance imaging (fMRI) studies. Additionally, for source-reconstructed magneto- and electro-encephalography (M/EEG), the problems are particularly severe because sparsity-favouring priors constrain meaningfully large signal and variance to a small set of compactly supported regions within the brain. Acosta-Cabronero et al. (2008) suggested adding noise to background voxels (the 'haircut'), effectively increasing their noise variance, but at the cost of contaminating neighbouring regions with the added noise once smoothed. Following theory and simulations, we propose to modify--directly and solely--the noise variance estimate, and investigate this solution on real imaging data from a range of modalities. Copyright © 2011 Elsevier Inc. All rights reserved.
Medical Statistics – Mathematics or Oracle? Farewell Lecture

Directory of Open Access Journals (Sweden)

Gaus, Wilhelm

2005-06-01

Full Text Available Certainty is rare in medicine. This is a direct consequence of the individuality of each and every human being and the reason why we need medical statistics. However, statistics have their pitfalls, too. Fig. 1 shows that the suicide rate peaks in youth, while in Fig. 2 the rate is highest in midlife and Fig. 3 in old age. Which of these contradictory messages is right? After an introduction to the principles of statistical testing, this lecture examines the probability with which statistical test results are correct. For this purpose the level of significance and the power of the test are compared with the sensitivity and specificity of a diagnostic procedure. The probability of obtaining correct statistical test results is the same as that for the positive and negative correctness of a diagnostic procedure and therefore depends on prevalence. The focus then shifts to the problem of multiple statistical testing. The lecture demonstrates that for each data set of reasonable size at least one test result proves to be significant - even if the data set is produced by a random number generator. It is extremely important that a hypothesis is generated independently from the data used for its testing. These considerations enable us to understand the gradation of "lame excuses, lies and statistics" and the difference between pure truth and the full truth. Finally, two historical oracles are cited.
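The analogy between a statistical test and a diagnostic procedure can be made concrete. In this minimal sketch, power plays the role of sensitivity, 1 - alpha plays the role of specificity, and "prevalence" is the assumed share of tested hypotheses for which a real effect exists. The function names and the numbers are illustrative and are not taken from the lecture.

```python
def positive_correctness(alpha, power, prevalence):
    """Probability that a significant result reflects a real effect."""
    true_positive = power * prevalence
    false_positive = alpha * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


def negative_correctness(alpha, power, prevalence):
    """Probability that a non-significant result reflects no real effect."""
    true_negative = (1.0 - alpha) * (1.0 - prevalence)
    false_negative = (1.0 - power) * prevalence
    return true_negative / (true_negative + false_negative)


# Same test characteristics, different shares of real effects among hypotheses.
for prevalence in (0.01, 0.1, 0.5):
    print(prevalence,
          round(positive_correctness(0.05, 0.8, prevalence), 3),
          round(negative_correctness(0.05, 0.8, prevalence), 3))
```

With alpha and power held fixed, the share of significant results that are correct falls sharply when real effects are rare among the hypotheses tested, which is the dependence on prevalence that the lecture describes.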
Testing for significance of phase synchronisation dynamics in the EEG.

Science.gov (United States)

Daly, Ian; Sweeney-Reed, Catherine M; Nasuto, Slawomir J

2013-06-01

A number of tests exist to check for statistical significance of phase synchronisation within the Electroencephalogram (EEG); however, the majority suffer from a lack of generality and applicability. They may also fail to account for temporal dynamics in the phase synchronisation, regarding synchronisation as a constant state instead of a dynamical process. Therefore, a novel test is developed for identifying the statistical significance of phase synchronisation based upon a combination of work characterising temporal dynamics of multivariate time-series and Markov modelling. We show how this method is better able to assess the significance of phase synchronisation than a range of commonly used significance tests. We also show how the method may be applied to identify and classify significantly different phase synchronisation dynamics in both univariate and multivariate datasets.
New scanning technique using Adaptive Statistical Iterative Reconstruction (ASIR) significantly reduced the radiation dose of cardiac CT

International Nuclear Information System (INIS)

Tumur, Odgerel; Soon, Kean; Brown, Fraser; Mykytowycz, Marcus

2013-01-01

The aims of our study were to evaluate the effect of application of Adaptive Statistical Iterative Reconstruction (ASIR) algorithm on the radiation dose of coronary computed tomography angiography (CCTA) and its effects on image quality of CCTA and to evaluate the effects of various patient and CT scanning factors on the radiation dose of CCTA. This was a retrospective study that included 347 consecutive patients who underwent CCTA at a tertiary university teaching hospital between 1 July 2009 and 20 September 2011. Analysis was performed comparing patient demographics, scan characteristics, radiation dose and image quality in two groups of patients in whom conventional Filtered Back Projection (FBP) or ASIR was used for image reconstruction. There were 238 patients in the FBP group and 109 patients in the ASIR group. There was no difference between the groups in the use of prospective gating, scan length or tube voltage. In the ASIR group, a significantly lower tube current was used compared with the FBP group, 550mA (450–600) vs. 650mA (500–711.25) (median (interquartile range)), respectively, P<0.001. There was a 27% effective radiation dose reduction in the ASIR group compared with the FBP group, 4.29mSv (2.84–6.02) vs. 5.84mSv (3.88–8.39) (median (interquartile range)), respectively, P<0.001. Although ASIR was associated with increased image noise compared with FBP (39.93±10.22 vs. 37.63±18.79 (mean ± standard deviation), respectively, P<0.001), it did not affect the signal intensity, signal-to-noise ratio, contrast-to-noise ratio or the diagnostic quality of CCTA. Application of ASIR reduces the radiation dose of CCTA without affecting the image quality.
Multifocal Gastric Ulcers Caused by Diffuse Large B Cell Lymphoma in a Patient With Significant Weight Loss

Directory of Open Access Journals (Sweden)

Mark A. Gromski MD

2016-12-01

Full Text Available Primary gastrointestinal (GI) lymphoma is a heterogeneous disease with varied clinical presentations. The stomach is the most common GI site and accounts for 70% to 75% of GI lymphomas. We present a patient with gastric diffuse large B cell lymphoma (DLBCL) who presented with significant weight loss, early satiety, and multifocal ulcerated gastric lesions. Esophagoduodenoscopy should be performed in patients presenting with warning symptoms as in our case. Diagnosis is usually made by endoscopic biopsies. Multiple treatment modalities including surgery, radiotherapy, and chemotherapy have been used. Advancements in endoscopic and pathologic technology decrease turnaround time for diagnosis and treatment initiation, thus reducing the need for surgery. Health care providers should maintain a high level of suspicion and consider gastric DLBCL as part of the differential diagnosis, especially in those with warning symptoms such as weight loss and early satiety with abnormal endoscopic findings.
Conversion factors and oil statistics

International Nuclear Information System (INIS)

Karbuz, Sohbet

2004-01-01

World oil statistics, in scope and accuracy, are often far from perfect. They can easily lead to misguided conclusions regarding the state of market fundamentals. Without proper attention directed at statistical caveats, the ensuing interpretation of oil market data opens the door to unnecessary volatility, and can distort perception of market fundamentals. Among the numerous caveats associated with the compilation of oil statistics, conversion factors, used to produce aggregated data, play a significant role. Interestingly enough, little attention is paid to conversion factors, i.e. to the relation between different units of measurement for oil. Additionally, the underlying information regarding the choice of a specific factor when trying to produce measurements of aggregated data remains scant. The aim of this paper is to shed some light on the impact of conversion factors for two commonly encountered issues, mass to volume equivalencies (barrels to tonnes) and for broad energy measures encountered in world oil statistics. This paper will seek to demonstrate how inappropriate and misused conversion factors can yield wildly varying results and ultimately distort oil statistics. Examples will show that while discrepancies in commonly used conversion factors may seem trivial, their impact on the assessment of a world oil balance is far from negligible. A unified and harmonised convention for conversion factors is necessary to achieve accurate comparisons and aggregate oil statistics for the benefit of both end-users and policy makers.
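The sensitivity of mass to volume conversions can be illustrated with a short calculation. This is a sketch under stated assumptions: the barrel is the 42 US gallon oil barrel (about 0.158987 cubic metres), the densities are hypothetical round values for a lighter and a heavier crude, and the fixed factor of 7.33 barrels per tonne is used only as an example of a single blanket factor. None of these numbers is taken from the paper.

```python
BARREL_M3 = 0.158987294928  # volume of one 42 US gallon oil barrel in cubic metres


def barrels_per_tonne(density_kg_per_m3):
    """Barrels contained in one tonne of oil of the given density."""
    tonne_volume_m3 = 1000.0 / density_kg_per_m3
    return tonne_volume_m3 / BARREL_M3


# Hypothetical lighter and heavier crudes, converted for one million tonnes.
FIXED_FACTOR = 7.33  # illustrative blanket factor, not a recommended value
tonnes = 1_000_000
for density in (850.0, 900.0):
    exact = tonnes * barrels_per_tonne(density)
    blanket = tonnes * FIXED_FACTOR
    print(density, round(exact), round(blanket - exact))
```

Under these assumptions the two crudes differ by roughly 0.4 barrels per tonne, so applying one factor to both moves the result by hundreds of thousands of barrels per million tonnes. This is the kind of discrepancy that looks trivial per unit but accumulates in an aggregate balance.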

Development of a statistically-based lower bound fracture toughness curve (K_IR curve)

International Nuclear Information System (INIS)

Wullaert, R.A.; Server, W.L.; Oldfield, W.; Stahlkopf, K.E.

1977-01-01

A program of initiation fracture toughness measurements on fifty heats of nuclear pressure vessel production materials (including weldments) was used to develop a methodology for establishing a revised reference toughness curve. The new methodology was statistically developed and provides a predefined confidence limit (or tolerance limit) for fracture toughness based upon many heats of a particular type of material. Overall reference curves were developed for seven specific materials using large specimen static and dynamic fracture toughness results. The heat-to-heat variation was removed by normalizing both the fracture toughness and temperature data with the precracked Charpy tanh curve coefficients for each particular heat. The variance and distribution about the curve were determined, and lower bounds of predetermined statistical significance were drawn based upon a Pearson distribution in the lower shelf region (since the data were skewed to high values) and a t-distribution in the transition temperature region (since the data were normally distributed)
VESPA: Very large-scale Evolutionary and Selective Pressure Analyses

Directory of Open Access Journals (Sweden)

Andrew E. Webb

2017-06-01

Background Large-scale molecular evolutionary analyses of protein coding sequences require a number of preparatory inter-related steps, from finding gene families, to generating alignments and phylogenetic trees, to assessing selective pressure variation. Each phase of these analyses can represent significant challenges, particularly when working with entire proteomes (all protein coding sequences in a genome) from a large number of species. Methods We present VESPA, software capable of automating a selective pressure analysis using codeML in addition to the preparatory analyses and summary statistics. VESPA is written in Python and Perl and is designed to run within a UNIX environment. Results We have benchmarked VESPA and our results show that the method is consistent, performs well on both large scale and smaller scale datasets, and produces results in line with previously published datasets. Discussion Large-scale gene family identification, sequence alignment, and phylogeny reconstruction are all important aspects of large-scale molecular evolutionary analyses. VESPA provides flexible software for simplifying these processes along with downstream selective pressure variation analyses. The software automatically interprets results from codeML and produces simplified summary files to assist the user in better understanding the results. VESPA may be found at the following website: http://www.mol-evol.org/VESPA.
Multivariate statistical analysis a high-dimensional approach

CERN Document Server

Serdobolskii, V

2000-01-01

In the last few decades the accumulation of large amounts of information in numerous applications has stimulated an increased interest in multivariate analysis. Computer technologies allow one to use multi-dimensional and multi-parametric models successfully. At the same time, an interest arose in statistical analysis with a deficiency of sample data. Nevertheless, it is difficult to describe the recent state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except for a few specific cases. The simplest problem of estimating the mean vector with minimum quadratic risk is unsolved, even for normal distributions. Commonly used standard linear multivariate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution, depending on the data. Programs included in standard statistical packages cannot process 'multi-collinear data' and there are no theoretical recommen ...
Kappa statistic for clustered matched-pair data.

Science.gov (United States)

Yang, Zhao; Zhou, Ming

2014-07-10

The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
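For orientation, the ordinary (unclustered) kappa for a 2 x 2 agreement table can be sketched as follows. This is only the classical point estimate, kappa = (p_o - p_e) / (1 - p_e); it does not implement the clustered variance estimator proposed in the paper, and the counts are made up for illustration.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table given as a list of rows.

    table[i][j] is the number of pairs rated category i by procedure 1
    and category j by procedure 2.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: proportion of pairs on the diagonal.
    p_observed = sum(table[i][i] for i in range(size)) / total
    # Chance agreement: product of the marginal proportions, summed over categories.
    row_marginals = [sum(row) / total for row in table]
    col_marginals = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    p_expected = sum(r * c for r, c in zip(row_marginals, col_marginals))
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    counts = [[40, 10], [5, 45]]  # hypothetical matched-pair counts
    print(f"kappa = {cohen_kappa(counts):.3f}")
```

With clustered data the point estimate can be formed the same way from pooled counts, but its variance must account for within-cluster dependence, which is the problem the paper addresses.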
Capturing 'R&D excellence': indicators, international statistics, and innovative universities.

Science.gov (United States)

Tijssen, Robert J W; Winnink, Jos J

2018-01-01

Excellent research may contribute to successful science-based technological innovation. We define 'R&D excellence' in terms of scientific research that has contributed to the development of influential technologies, where 'excellence' refers to the top segment of a statistical distribution based on internationally comparative performance scores. Our measurements are derived from frequency counts of literature references ('citations') from patents to research publications during the last 15 years. The 'D' part in R&D is represented by the top 10% most highly cited 'excellent' patents worldwide. The 'R' part is captured by research articles in international scholarly journals that are cited by these patented technologies. After analyzing millions of citing patents and cited research publications, we find very large differences between countries worldwide in terms of the volume of domestic science contributing to those patented technologies. Where the USA produces the largest numbers of cited research publications (partly because of database biases), Switzerland and Israel outperform the US after correcting for the size of their national science systems. To tease out possible explanatory factors, which may significantly affect or determine these performance differentials, we first studied high-income nations and advanced economies. Here we find that the size of R&D expenditure correlates with the sheer size of cited publications, as does the degree of university research cooperation with domestic firms. When broadening our comparative framework to 70 countries (including many medium-income nations) while correcting for size of national science systems, the important explanatory factors become the availability of human resources and quality of science systems. Focusing on the latter factor, our in-depth analysis of 716 research-intensive universities worldwide reveals several universities with very high scores on our two R&D excellence indicators. Confirming the above
A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

Science.gov (United States)

Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

2013-01-01

As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. Similar to Kulldorff's methods, we adopt a Monte Carlo test for the test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
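The two ingredients named in the abstract, a hypergeometric probability and a Monte Carlo significance test, can be sketched generically as below. This is not the authors' scan statistic: the function names are hypothetical, and the Monte Carlo rule shown is the usual rank-based p-value, (1 + number of simulated statistics at least as extreme) / (1 + number of simulations).

```python
from math import comb


def hypergeom_pmf(cases_in_window, population, total_cases, window_size):
    """P(exactly `cases_in_window` cases fall in a window of `window_size`
    individuals) when `total_cases` cases are spread at random over `population`."""
    non_cases = population - total_cases
    non_cases_in_window = window_size - cases_in_window
    if cases_in_window < 0 or cases_in_window > total_cases:
        return 0.0
    if non_cases_in_window < 0 or non_cases_in_window > non_cases:
        return 0.0
    return (comb(total_cases, cases_in_window)
            * comb(non_cases, non_cases_in_window)
            / comb(population, window_size))


def monte_carlo_p_value(observed, simulated):
    """Rank-based Monte Carlo p-value for a statistic where larger is more extreme."""
    at_least_as_extreme = sum(1 for s in simulated if s >= observed)
    return (1 + at_least_as_extreme) / (1 + len(simulated))


if __name__ == "__main__":
    # Hypothetical window: 50 of 1000 people, 20 cases overall, 6 inside the window.
    print(hypergeom_pmf(6, population=1000, total_cases=20, window_size=50))
    print(monte_carlo_p_value(5.0, [1.0, 2.0, 3.0, 6.0]))
```

In a full scan, the statistic would be evaluated over many candidate windows and the extreme value compared against the same extreme computed on data sets simulated under the null hypothesis.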
Large-scale quantitative analysis of painting arts.

Science.gov (United States)

Kim, Daniel; Son, Seung-Woo; Jeong, Hawoong

2014-12-11

Scientists have made efforts to understand the beauty of painting art in their own languages. As digital image acquisition of paintings has made rapid progress, researchers have come to a point where it is possible to perform statistical analysis of a large-scale database of paintings to build a bridge between art and science. Using digital image processing techniques, we investigate three quantitative measures of images: the usage of individual colors, the variety of colors, and the roughness of the brightness. We found a difference in color usage between classical paintings and photographs, and a significantly low color variety in the medieval period. Interestingly, moreover, the increase of the roughness exponent as painting techniques such as chiaroscuro and sfumato advanced is consistent with historical circumstances.
Statistics with JMP graphs, descriptive statistics and probability

CERN Document Server

Goos, Peter

2015-01-01

Peter Goos, Department of Statistics, University of Leuven, Faculty of Bio-Science Engineering and University of Antwerp, Faculty of Applied Economics, Belgium. David Meintrup, Department of Mathematics and Statistics, University of Applied Sciences Ingolstadt, Faculty of Mechanical Engineering, Germany. Thorough presentation of introductory statistics and probability theory, with numerous examples and applications using JMP. Descriptive Statistics and Probability provides an accessible and thorough overview of the most important descriptive statistics for nominal, ordinal and quantitative data with partic
The application of statistical methods to assess economic assets

Directory of Open Access Journals (Sweden)

D. V. Dianov

2017-01-01

The article is devoted to consideration and evaluation of machinery, equipment and special equipment, methodological aspects of the use of standards for assessment of buildings and structures in current prices, the valuation of residential, specialized houses, office premises, assessment and reassessment of existing and inactive military assets, the application of statistical methods to obtain the relevant cost estimates. The objective of the scientific article is to consider possible application of statistical tools in the valuation of the assets composing the core group of elements of national wealth, the fixed assets. Firstly, capital tangible assets constitute the material base for the creation of new value, products and non-financial services. The accumulated gain in tangible assets of a capital nature is a part of the gross domestic product, and from its volume and specific weight in the composition of GDP we can judge the scope of reproductive processes in the country. Based on the methodological materials of the state statistics bodies of the Russian Federation and the principles of the theory of statistics, which describe methods of statistical analysis such as indices, average values and regression, the methodical approach is structured in the application of statistical tools to obtain value estimates of property, plant and equipment with significant accumulated depreciation. Until now, the use of statistical methodology in the practice of economic assessment of assets is only fragmentary. This applies to both Federal Legislation (Federal law № 135 «On valuation activities in the Russian Federation» dated 16.07.1998 in edition 05.07.2016) and the methodological documents and regulations of the estimated activities, in particular, the valuation activities’ standards. A particular problem is the use of a digital database of Rosstat (Federal State Statistics Service), as to the specific fixed assets the comparison should be carried
Testing University Rankings Statistically: Why this Perhaps is not such a Good Idea after All. Some Reflections on Statistical Power, Effect Size, Random Sampling and Imaginary Populations

DEFF Research Database (Denmark)

Schneider, Jesper Wiborg

2012-01-01

In this paper we discuss and question the use of statistical significance tests in relation to university rankings as recently suggested. We outline the assumptions behind and interpretations of statistical significance tests and relate this to examples from the recent SCImago Institutions Rankin...
Statistical testing and power analysis for brain-wide association study.

Science.gov (United States)

Gong, Weikang; Wan, Lin; Lu, Wenlian; Ma, Liang; Cheng, Fan; Cheng, Wei; Grünewald, Stefan; Feng, Jianfeng

2018-04-05

The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, multiple-comparison correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR), it can reduce the false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need for non-parametric permutation to correct for multiple comparisons; thus, it can efficiently tackle large datasets with high resolution fMRI images. The utility of our method is shown in a case-control study. Our approach can identify altered functional connectivities in a major depression disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS. Copyright © 2018 Elsevier B.V. All rights reserved.
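The two baseline corrections the abstract compares against can be sketched in a few lines. This shows only Bonferroni (FWER control) and the Benjamini-Hochberg step-up procedure (FDR control); it is not the Gaussian random field method of the paper, and the p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a rejection flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_rank = rank
    rejected = [False] * m
    for index in order[:largest_rank]:
        rejected[index] = True
    return rejected


if __name__ == "__main__":
    p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]  # hypothetical
    print("Bonferroni rejections:", sum(bonferroni(p_vals)))
    print("BH rejections:        ", sum(benjamini_hochberg(p_vals)))
```

Neither procedure uses the spatial structure of the data, which is the information the paper's topological inference is designed to exploit.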
Intermetallics structures, properties, and statistics

CERN Document Server

Steurer, Walter

2016-01-01

The focus of this book is clearly on the statistics, topology, and geometry of crystal structures and crystal structure types. This allows one to uncover important structural relationships and to illustrate the relative simplicity of most of the general structural building principles. It also allows one to show that a large variety of actual structures can be related to a rather small number of aristotypes. It is important that this book is readable and beneficial in one way or another for everyone interested in intermetallic phases, from graduate students to experts in solid-state chemistry/physics/materials science. For that purpose it avoids using an enigmatic abstract terminology for the classification of structures. The focus on the statistical analysis of structures and structure types should be seen as an attempt to draw the background of the big picture of intermetallics, and to point to the white spots in it, which could be worth exploring. This book was not planned as a textbook; rather, it...
Improvement in Generic Problem-Solving Abilities of Students by Use of Tutor-less Problem-Based Learning in a Large Classroom Setting

Science.gov (United States)

Klegeris, Andis; Bahniwal, Manpreet; Hurren, Heather

2013-01-01

Problem-based learning (PBL) was originally introduced in medical education programs as a form of small-group learning, but its use has now spread to large undergraduate classrooms in various other disciplines. Introduction of new teaching techniques, including PBL-based methods, needs to be justified by demonstrating the benefits of such techniques over classical teaching styles. Previously, we demonstrated that introduction of tutor-less PBL in a large third-year biochemistry undergraduate class increased student satisfaction and attendance. The current study assessed the generic problem-solving abilities of students from the same class at the beginning and end of the term, and compared student scores with similar data obtained in three classes not using PBL. Two generic problem-solving tests of equal difficulty were administered such that students took different tests at the beginning and the end of the term. Blinded marking showed a statistically significant 13% increase in the test scores of the biochemistry students exposed to PBL, while no trend toward significant change in scores was observed in any of the control groups not using PBL. Our study is among the first to demonstrate that use of tutor-less PBL in a large classroom leads to statistically significant improvement in generic problem-solving skills of students. PMID:23463230
Uncertainty the soul of modeling, probability & statistics

CERN Document Server

Briggs, William

2016-01-01

This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science. The ultimate goal is to call into question many standard tenets and lay the philosophical and probabilistic groundwork and infrastructure for statistical modeling. It is the first book devoted to the philosophy of data aimed at working scientists and calls for a new consideration in the practice of probability and statistics to eliminate what has been referred to as the "Cult of Statistical Significance". The book explains the philosophy of these ideas and not the mathematics, though there are a handful of mathematical examples. The topics are logically laid out, starting with basic philosophy as related to probability, statistics, and science, and stepping through the key probabilistic ideas and concepts, and ending with statistical models. Its jargon-free approach asserts that standard methods, suc...
Application of nonparametric statistics to material strength/reliability assessment

International Nuclear Information System (INIS)

Arai, Taketoshi

1992-01-01

Advanced material technology requires a database on a wide variety of material behavior, which needs to be established experimentally. It often happens that experiments are practically limited in terms of reproducibility or the range of test parameters. Statistical methods can be applied to understand uncertainties in the quantitative manner required from the reliability point of view. Statistical assessment involves determination of a most probable value and of the maximum and/or minimum value as a one-sided or two-sided confidence limit. The scatter of test data can be approximated by a theoretical distribution only if the goodness of fit satisfies a test criterion. Alternatively, nonparametric statistics (NPS) or distribution-free statistics can be applied. Mathematical procedures by NPS are well established for dealing with most reliability problems. They handle only the order statistics of a sample. Mathematical formulas and some applications to engineering assessments are described. They include confidence limits of the median, population coverage of a sample, the required minimum size of a sample, and confidence limits of fracture probability. These applications demonstrate that nonparametric statistical estimation is useful in logical decision making when a large uncertainty exists. (author)
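One well-known distribution-free result of the kind listed above is the minimum sample size for a one-sided tolerance limit based on the sample extreme: if n independent observations are taken, the probability that at least a fraction p of the population lies above the sample minimum is 1 - p^n. The sketch below finds the smallest n that reaches a chosen confidence; whether this matches the exact formulas in the report is an assumption.

```python
def min_sample_size(coverage, confidence):
    """Smallest n such that the sample minimum is a one-sided lower tolerance
    limit covering `coverage` of the population with probability `confidence`.

    Uses the order-statistics identity P(coverage achieved) = 1 - coverage**n.
    """
    if not (0.0 < coverage < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("coverage and confidence must lie strictly between 0 and 1")
    n = 1
    while 1.0 - coverage ** n < confidence:
        n += 1
    return n


if __name__ == "__main__":
    for cov, conf in [(0.90, 0.95), (0.95, 0.95), (0.99, 0.95)]:
        print(f"coverage {cov:.2f}, confidence {conf:.2f}: n = {min_sample_size(cov, conf)}")
```

The result depends only on the ranks of the observations, so no assumption about the shape of the strength distribution is needed.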
Investigating salt frost scaling by using statistical methods

DEFF Research Database (Denmark)

Hasholt, Marianne Tange; Clemmensen, Line Katrine Harder

2010-01-01

A large data set comprising data for 118 concrete mixes on mix design, air void structure, and the outcome of freeze/thaw testing according to SS 13 72 44 has been analysed by use of statistical methods. The results show that with regard to mix composition, the most important parameter...
Visualization and quantification of large bowel motility with functional cine-MRI

International Nuclear Information System (INIS)

Buhmann, S.; Wielage, C.; Fischer, T.; Reiser, M.; Lienemann, A.; Kirchhoff, C.; Mussack, T.

2005-01-01

Purpose: to develop and evaluate a method to visualize and quantify large bowel motility using functional cine MRI. Methods: fifteen healthy individuals (8 males, 7 females, 20 to 45 years old) with no history or present symptoms of bowel disorders were enrolled in a functional cine MRI examination at 6 a.m. after fasting for at least eight hours, before and after oral administration of Senna tea (a mild stimulating purgative). Two consecutive sets of repeated measurements of the entire abdomen were performed using a 1.5T MRI system with coronal T2-weighted HASTE sequences anatomically adjusted to the course of the large bowel. A navigator technique was used for respiratory gating at the level of the right dorsal diaphragm. The changes in diameter (given in cm) were measured at 5 different locations of the ascending (AC), transverse (TC) and descending colon (DC), and assessed as parameters for the bowel motility. Results: the mean values as a statistical measure for large bowel relaxation were determined. Before ingestion of Senna tea, the mean diameter measured 3.41 cm (ascending colon), 3 cm (transverse colon) and 2.67 cm (descending colon). After the ingestion of Senna tea, the mean diameter increased to 3.69 cm (ascending colon), to 3.4 cm (transverse colon) and to 2.9 cm (descending colon). A statistically significant difference was demonstrated with the Wilcoxon test (significance level 0.05). For the determination of dynamic increase, the changes of the statistical scatter amplitude relative to the mean value were expressed as a percentage before and after the ingestion of Senna tea. Thereby, an increase in variation and dynamic range was detected for the AC (112.9%) and DC (100%), but a decrease in the dynamics for the TC (69%). Conclusion: a non-invasive method for the assessment of bowel motility was developed for the first time. The use of functional cine MRI utilizing a prokinetic stimulus allowed visualisation and quantification of large bowel motility.
Statistical characterization of the standard map

Science.gov (United States)

Ruiz, Guiomar; Tirnakli, Ugur; Borges, Ernesto P.; Tsallis, Constantino

2017-06-01

The standard map, a paradigmatic conservative system in the (x, p) phase space, has been recently shown (Tirnakli and Borges (2016 Sci. Rep. 6 23644)) to exhibit interesting statistical behaviors directly related to the value of the standard map external parameter K. A comprehensive statistical numerical description is achieved in the present paper. More precisely, for large values of K (e.g. K = 10) where the Lyapunov exponents are neatly positive over virtually the entire phase space consistently with Boltzmann-Gibbs (BG) statistics, we verify that the q-generalized indices related to the entropy production q_ent, the sensitivity to initial conditions q_sen, the distribution of a time-averaged (over successive iterations) phase-space coordinate q_stat, and the relaxation to the equilibrium final state q_rel, collapse onto a fixed point, i.e. q_ent = q_sen = q_stat = q_rel = 1. In remarkable contrast, for small values of K (e.g. K = 0.2) where the Lyapunov exponents are virtually zero over the entire phase space, we verify q_ent = q_sen = 0, q_stat ≃ 1.935, and q_rel ≃ 1.4. The situation corresponding to intermediate values of K, where both stable orbits and a chaotic sea are present, is discussed as well. The present results transparently illustrate when BG behavior and/or q-statistical behavior are observed.
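The contrast between large and small K can be explored with a short numerical sketch. The abstract does not state the map equations, so the commonly used form p' = p + K sin(x), x' = x + p' (both mod 2π) is assumed here, along with an arbitrary initial condition and iteration count. The largest Lyapunov exponent is estimated by evolving a tangent vector with the map's Jacobian and renormalizing at every step.

```python
import math

TWO_PI = 2.0 * math.pi


def standard_map_step(x, p, k):
    """One iteration of the standard map in the assumed form (mod 2*pi)."""
    p_new = (p + k * math.sin(x)) % TWO_PI
    x_new = (x + p_new) % TWO_PI
    return x_new, p_new


def largest_lyapunov(k, x0=1.0, p0=0.5, n_steps=20000):
    """Finite-time estimate of the largest Lyapunov exponent along one orbit."""
    x, p = x0, p0
    dx, dp = 1.0, 0.0  # tangent vector
    log_growth = 0.0
    for _ in range(n_steps):
        # Linearized map evaluated at the current point (x, p).
        dp_new = dp + k * math.cos(x) * dx
        dx_new = dx + dp_new
        x, p = standard_map_step(x, p, k)
        norm = math.hypot(dx_new, dp_new)
        dx, dp = dx_new / norm, dp_new / norm  # renormalize to avoid overflow
        log_growth += math.log(norm)
    return log_growth / n_steps


if __name__ == "__main__":
    for k in (0.2, 10.0):
        print(f"K = {k}: estimated largest Lyapunov exponent = {largest_lyapunov(k):.4f}")
```

A single orbit only probes one region of phase space; the paper's statements concern behavior over virtually the entire phase space, which would require averaging over many initial conditions.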
Chromatographic lipophilicity determination using large volume injections of the solvents non-miscible with the mobile phase.

Science.gov (United States)

Sârbu, Costel; Naşcu-Briciu, Rodica Domnica; Casoni, Dorina; Kot-Wasik, Agata; Wasik, Andrzej; Namieśnik, Jacek

2012-11-30

A new perspective in lipophilicity evaluation through RP-HPLC is offered by analysis of the retention factor (k) obtained by injecting large volumes of test samples prepared in solvents immiscible with the mobile phase. The experiment is carried out on representative groups of compounds with increased toxicity (mycotoxins and alkaloids) and amines with important biological activity (naturally occurring monoamine compounds and related drugs), which cover a large interval of lipophilicity. The stock solution of each compound was prepared in hexane, and the mobile phases used were mixtures of methanol or acetonitrile and water, in a suitable volume ratio. The injected volume was between 10 and 100 μL, while the stationary phases used were RP-18 and RP-8. On both reversed-phase stationary phases the retention factors decreased linearly as the injection volume increased. In all cases, the linear models were highly statistically significant. On the basis of the obtained results, new lipophilicity indices were proposed and discussed. The developed lipophilicity indices and the computationally expressed ones are correlated at a high level of statistical significance. Copyright © 2012 Elsevier B.V. All rights reserved.
Statistical methods to evaluate thermoluminescence ionizing radiation dosimetry data

International Nuclear Information System (INIS)

Segre, Nadia; Matoso, Erika; Fagundes, Rosane Correa

2011-01-01

Ionizing radiation levels, evaluated through the exposure of CaF2:Dy thermoluminescence dosimeters (TLD-200), have been monitored at Centro Experimental Aramar (CEA), located at Ipero in Sao Paulo state, Brazil, since 1991, resulting in a large number of measurements up to 2009 (more than 2,000). The volume of data, together with the dispersion of the measurements (every process has some deviation), calls for statistical tools to evaluate the results, a procedure also required by the Brazilian Standard CNEN-NN-3.01/PR-3.01-008, which regulates radiometric environmental monitoring. Thermoluminescence ionizing radiation dosimetry data are statistically compared in order to evaluate the potential environmental impact of CEA's activities. The statistical tools discussed in this work are box plots, control charts and analysis of variance. (author)

Differentiation of large (≥5 cm) gastrointestinal stromal tumors from benign subepithelial tumors in the stomach: Radiologists’ performance using CT

Energy Technology Data Exchange (ETDEWEB)

Choi, Ye Ra [Department of Radiology, Seoul National University Hospital (Korea, Republic of); Kim, Se Hyung, E-mail: shkim7071@gmail.com [Department of Radiology, Seoul National University Hospital (Korea, Republic of); The Institute of Radiation Medicine, Seoul National University Hospital (Korea, Republic of); Kim, Sun-Ah [Department of Radiology, Seoul National University Hospital (Korea, Republic of); Shin, Cheong-il [Department of Radiology, Seoul National University Hospital (Korea, Republic of); The Institute of Radiation Medicine, Seoul National University Hospital (Korea, Republic of); Kim, Hyung Jin; Kim, Seong Ho [Department of Radiology, Seoul National University Hospital (Korea, Republic of); Han, Joon Koo; Choi, Byung Ihn [Department of Radiology, Seoul National University Hospital (Korea, Republic of); The Institute of Radiation Medicine, Seoul National University Hospital (Korea, Republic of)

2014-02-15

Purpose: To identify significant CT findings for the differentiation of large (≥5 cm) gastric gastrointestinal stromal tumors (GIST) from benign subepithelial tumors and to assess whether radiologists’ performance in differentiation is improved with knowledge of significant CT criteria. Materials and methods: One-hundred twenty patients with pathologically proven large (≥5 cm) GISTs (n = 99), schwannomas (n = 16), and leiomyomas (n = 5) who underwent CT were enrolled. Two radiologists (A and B) retrospectively reviewed their CT images in consensus for the location, size, degree and pattern of enhancement, contour, growth pattern and the presence of calcification, necrosis, surface ulceration, or enlarged lymph nodes. CT findings considered significant for differentiation were determined using uni- and multivariate statistical analyses. Thereafter, two successive review sessions for the differentiation of GIST from non-GIST were independently performed by two other reviewers (C and D), with 2 and 9 years of experience respectively, using a 5-point confidence scale. At the first session, reviewers interpreted CT images without knowledge of significant CT findings. At the second session, the results of statistical analyses were provided to the reviewers. To assess improvement in radiologists’ performance, a pairwise comparison of receiver operating characteristic (ROC) curves was performed. Results: Heterogeneous enhancement, presence of necrosis, absence of lymph nodes, and mean size of ≥6 cm were found to be significant for differentiating GIST from schwannoma (P < 0.05). Non-cardial location, heterogeneous enhancement, and presence of necrosis were CT features differentiating GIST from leiomyoma (P < 0.05). Multivariate analyses indicated that absence of enlarged LNs was the only statistically significant variable for differentiating GIST from schwannoma. The area under the curve of both reviewers obtained using ROC significantly increased from 0.682 and 0.613 to 0.903 and 0
Statistical process control in nursing research.

Science.gov (United States)

Polit, Denise F; Chaboyer, Wendy

2012-02-01

In intervention studies in which randomization to groups is not possible, researchers typically use quasi-experimental designs. Time series designs are strong quasi-experimental designs but are seldom used, perhaps because of technical and analytic hurdles. Statistical process control (SPC) is an alternative analytic approach to testing hypotheses about intervention effects using data collected over time. SPC, like traditional statistical methods, is a tool for understanding variation and involves the construction of control charts that distinguish between normal, random fluctuations (common cause variation), and statistically significant special cause variation that can result from an innovation. The purpose of this article is to provide an overview of SPC and to illustrate its use in a study of a nursing practice improvement intervention. Copyright © 2011 Wiley Periodicals, Inc.
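The control-chart idea can be sketched with an individuals (XmR) chart, one common SPC chart type. Limits are set from baseline data as mean ± 2.66 × average moving range, and later points outside the limits are flagged as candidate special cause variation. The chart type, the single out-of-limits rule, and the weekly counts are illustrative assumptions and are not taken from the article.

```python
def individuals_chart_limits(baseline):
    """Lower limit, center line and upper limit of an individuals (XmR) chart."""
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mean_moving_range = sum(moving_ranges) / len(moving_ranges)
    half_width = 2.66 * mean_moving_range  # 3-sigma limits via the moving range
    return center - half_width, center, center + half_width


def special_cause_points(values, baseline):
    """Indices of `values` that fall outside the limits derived from `baseline`."""
    lower, _, upper = individuals_chart_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    before = [12, 14, 13, 15, 12, 13, 14, 13]  # hypothetical weekly event counts
    after = [13, 10, 8, 7]                     # hypothetical post-intervention counts
    print("limits:", individuals_chart_limits(before))
    print("flagged post-intervention weeks:", special_cause_points(after, before))
```

Practical SPC usually adds run rules (for example, several consecutive points on one side of the center line) to detect sustained shifts that stay inside the limits.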
Redesigning a Large Introductory Course to Incorporate the GAISE Guidelines

Science.gov (United States)

Woodard, Roger; McGowan, Herle

2012-01-01

In 2005, the "Guidelines for Assessment and Instruction in Statistics Education" (GAISE) college report described several recommendations for teaching introductory statistics. This paper discusses how a large multi-section introductory course was redesigned in order to implement these recommendations. The experience described discusses…
Statistics Anxiety and Business Statistics: The International Student

Science.gov (United States)

Bell, James A.

2008-01-01

Does the international student suffer from statistics anxiety? To investigate this, the Statistics Anxiety Rating Scale (STARS) was administered to sixty-six beginning statistics students, including twelve international students and fifty-four domestic students. Due to the small number of international students, nonparametric methods were used to…
Large-field image intensifiers versus conventional chest radiography: ROC study with simulated interstitial disease

International Nuclear Information System (INIS)

Winter, L.H.L.; Chakraborty, D.P.; Waes, P.F.G.M.

1988-01-01

Two image intensifier tubes have recently been introduced whose large imaging area makes them suitable for chest imaging (Philips Pulmodiagnost TLX slit II and Siemens TX 57 large entrance field II). Both modalities present a 10 x 10-cm hard copy image to the radiologist. A receiver operating characteristic (ROC) curve study with simulated interstitial disease was performed to compare the image quality of these image intensifiers with conventional chest images. The relative ranking in terms of decreasing ROC areas was Siemens, conventional, and Philips. Compared with conventional imaging, none of the differences in ROC curve area were statistically significant at the 5% level.
Changing world extreme temperature statistics

Science.gov (United States)

Finkel, J. M.; Katz, J. I.

2018-04-01

We use the Global Historical Climatology Network--daily database to calculate a nonparametric statistic that describes the rate at which all-time daily high and low temperature records have been set in nine geographic regions (continents or major portions of continents) during periods mostly from the mid-20th Century to the present. This statistic was defined in our earlier work on temperature records in the 48 contiguous United States. In contrast to this earlier work, we find that in every region except North America all-time high records were set at a rate significantly (at least $3\sigma$) higher than in the null hypothesis of a stationary climate. Except in Antarctica, all-time low records were set at a rate significantly lower than in the null hypothesis. In Europe, North Africa and North Asia the rate of setting new all-time highs increased suddenly in the 1990s, suggesting a change in regional climate regime; in most other regions there was a steadier increase.
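For intuition about a stationary-climate null hypothesis, a simple sketch follows. It is not the authors' statistic, which is defined in their earlier work. It assumes independent, identically distributed yearly values with no ties, in which case the n-th value is a new all-time high with probability 1/n, so the expected number of records in N years is the harmonic number H_N. Function names and the sample series are hypothetical.

```python
def expected_records(n_years):
    """Expected number of all-time highs in n_years of i.i.d. data: H_n."""
    return sum(1.0 / k for k in range(1, n_years + 1))

def count_high_records(series):
    """Count values that strictly exceed every earlier value."""
    count = 0
    running_max = float("-inf")
    for value in series:
        if value > running_max:
            count += 1
            running_max = value
    return count

# Hypothetical series: records occur at the values 3, 4, 5 and 9.
observed = count_high_records([3, 1, 4, 1, 5, 9, 2, 6])
print(observed, round(expected_records(8), 3))
```

Record counts well above this expectation for highs, or well below it for lows, are the kind of departure from stationarity the abstract describes.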
Empirical Correction to the Likelihood Ratio Statistic for Structural Equation Modeling with Many Variables.

Science.gov (United States)

Yuan, Ke-Hai; Tian, Yubin; Yanagihara, Hirokazu

2015-06-01

Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is $T_{ML}$, a slight modification to the likelihood ratio statistic. Under normality assumption, $T_{ML}$ approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that $T_{ML}$ rejects the correct model too often when p is not too small. Various corrections to $T_{ML}$ have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct $T_{ML}$ so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to $T_{ML}$, and they control type I errors reasonably well whenever N ≥ max(50, 2p). The formulations of the empirically corrected statistics are further used to predict type I errors of $T_{ML}$ as reported in the literature, and they perform well.
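The Bartlett-correction principle mentioned above, rescaling a statistic so that its mean matches the nominal degrees of freedom, can be sketched in a few lines. This shows only the general principle; the paper's empirical correction formula is not given in the abstract and is not reproduced here. The function name, the replicate values, and the degrees of freedom are hypothetical.

```python
from statistics import mean

def mean_rescale(replicates, df):
    """Scale statistics so that their average equals df.

    replicates: values of the test statistic obtained under a correct model
    (for example from simulation). Returns (factor, rescaled values).
    """
    factor = df / mean(replicates)
    return factor, [factor * t for t in replicates]

# Hypothetical replicates whose mean (30) overshoots the nominal df (24).
replicates = [30.0, 26.0, 34.0, 28.0, 32.0]
factor, corrected = mean_rescale(replicates, df=24)
print(factor, mean(corrected))
```

A statistic whose mean overshoots the degrees of freedom rejects too often against the nominal chi-square reference; multiplying by a factor below 1 pulls it back.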
Introductory statistics for the behavioral sciences

CERN Document Server

Welkowitz, Joan; Cohen, Jacob

1971-01-01

Introductory Statistics for the Behavioral Sciences provides an introduction to statistical concepts and principles. This book emphasizes the robustness of parametric procedures wherein such significant tests as t and F yield accurate results even if such assumptions as equal population variances and normal population distributions are not well met. Organized into three parts encompassing 16 chapters, this book begins with an overview of the rationale upon which much of behavioral science research is based, namely, drawing inferences about a population based on data obtained from a samp
Statistical study of auroral fragmentation into patches

Science.gov (United States)

Hashimoto, Ayumi; Shiokawa, Kazuo; Otsuka, Yuichi; Oyama, Shin-ichiro; Nozawa, Satonori; Hori, Tomoaki; Lester, Mark; Johnsen, Magnar Gullikstad

2015-08-01

The study of auroral dynamics is important when considering disturbances of the magnetosphere. Shiokawa et al. (2010, 2014) reported observations of finger-like auroral structures that cause auroral fragmentation. Those structures are probably produced by macroscopic instabilities in the magnetosphere, mainly of the Rayleigh-Taylor type. However, the statistical characteristics of these structures have not yet been investigated. Here based on observations by an all-sky imager at Tromsø (magnetic latitude = 67.1°N), Norway, over three winter seasons, we statistically analyzed the occurrence conditions of 14 large-scale finger-like structures that developed from large-scale auroral regions including arcs and 6 small-scale finger-like structures that developed in auroral patches. The large-scale structures were seen from midnight to dawn local time and usually appeared at the beginning of the substorm recovery phase, near the low-latitude boundary of the auroral region. The small-scale structures were primarily seen at dawn and mainly occurred in the late recovery phase of substorms. The sizes of these large- and small-scale structures mapped in the magnetospheric equatorial plane are usually larger than the gyroradius of 10 keV protons, indicating that the finger-like structures could be caused by magnetohydrodynamic instabilities. However, the scale of small structures is only twice the gyroradius of 10 keV protons, suggesting that finite Larmor radius effects may contribute to the formation of small-scale structures. The eastward propagation velocities of the structures are -40 to +200 m/s and are comparable with those of plasma drift velocities measured by the colocating Super Dual Auroral Radar Network radar.
Tail-constraining stochastic linear–quadratic control: a large deviation and statistical physics approach

International Nuclear Information System (INIS)

Chertkov, Michael; Kolokolov, Igor; Lebedev, Vladimir

2012-01-01

The standard definition of the stochastic risk-sensitive linear–quadratic (RS-LQ) control depends on the risk parameter, which is normally left to be set exogenously. We reconsider the classical approach and suggest two alternatives, resolving the spurious freedom naturally. One approach consists in seeking for the minimum of the tail of the probability distribution function (PDF) of the cost functional at some large fixed value. Another option suggests minimizing the expectation value of the cost functional under a constraint on the value of the PDF tail. Under the assumption of resulting control stability, both problems are reduced to static optimizations over a stationary control matrix. The solutions are illustrated using the examples of scalar and 1D chain (string) systems. The large deviation self-similar asymptotic of the cost functional PDF is analyzed. (paper)
A genetic study of Factor V Leiden (G1691A) mutation in young ischemic strokes with large vessel disease in a South Indian population.

Science.gov (United States)

Anadure, Ravi; Christopher, Rita; Nagaraja, Dindagur; Narayanan, Coimbatore

2017-10-01

Factor V Leiden (FVL) has been, by far, the most investigated gene mutation, with 26 studies to date, on its role in arterial strokes. Overall, a meta-analysis of all these studies taken together showed that carriers of the Factor V Leiden allele were 1.33 times more likely to develop arterial strokes when compared to controls. We subjected a highly select subset of young strokes, with large vessel infarcts, to genetic analysis for FVL mutation and compared them with matched healthy controls to look for a statistically significant association. In this prospective study, 6/120 cases (5%) and 2/120 controls (1.6%) were positive for heterozygous FVL (G1691A) mutation. The higher prevalence of FVL mutation in cases (5%) compared to controls (1.6%) did not show statistical significance with a Pearson's Chi square P value of 0.15. The Odds Ratio (OR) for risk of large vessel disease in FVL positive cases was 3.10 (95% CI of 0.61-15.7). FVL mutation (G1691A) in young Indian subjects with ischemic strokes does not seem to be significantly associated with large vessel disease. Copyright © 2017 Elsevier Ltd. All rights reserved.
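The figures reported above can be recomputed from the 2 x 2 table implied by the counts (6 of 120 cases and 2 of 120 controls positive). The sketch below assumes a Wald interval on the log odds ratio and a Pearson chi-square test without continuity correction; the abstract does not state which interval method was used, so that choice and the function names are assumptions.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a, b: mutation-positive / mutation-negative cases
    c, d: mutation-positive / mutation-negative controls
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its p-value, df = 1."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

odds_ratio, ci_low, ci_high = odds_ratio_with_ci(6, 114, 2, 118)
chi2, p_value = pearson_chi2_2x2(6, 114, 2, 118)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.1f}), "
      f"chi2 = {chi2:.2f}, p = {p_value:.2f}")
```

Under these assumptions the values agree with the reported OR of 3.10, the interval 0.61-15.7, and the P value of 0.15. The interval spans 1, which is why the association is not statistically significant despite the large point estimate.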
Spreadsheets as tools for statistical computing and statistics education

OpenAIRE

Neuwirth, Erich

2000-01-01

Spreadsheets are a ubiquitous program category, and we will discuss their use in statistics and statistics education on various levels, ranging from very basic examples to extremely powerful methods. Since the spreadsheet paradigm is very familiar to many potential users, using it as the interface to statistical methods can make statistics more easily accessible.
Statistical Thermodynamics of Disperse Systems

DEFF Research Database (Denmark)

Shapiro, Alexander

1996-01-01

Principles of statistical physics are applied for the description of thermodynamic equilibrium in disperse systems. The cells of disperse systems are shown to possess a number of non-standard thermodynamic parameters. A random distribution of these parameters in the system is determined....... On the basis of this distribution, it is established that the disperse system has an additional degree of freedom called the macro-entropy. A large set of bounded ideal disperse systems allows exact evaluation of thermodynamic characteristics. The theory developed is applied to the description of equilibrium...
Order statistics & inference estimation methods

CERN Document Server

Balakrishnan, N

1991-01-01

The literature on order statistics and inference is quite extensive and covers a large number of fields, but most of it is dispersed throughout numerous publications. This volume is the consolidation of the most important results and places an emphasis on estimation. Both theoretical and computational procedures are presented to meet the needs of researchers, professionals, and students. The methods of estimation discussed are well-illustrated with numerous practical examples from both the physical and life sciences, including sociology, psychology, and electrical and chemical engineering. A co
Tests and Confidence Intervals for an Extended Variance Component Using the Modified Likelihood Ratio Statistic

DEFF Research Database (Denmark)

Christensen, Ole Fredslund; Frydenberg, Morten; Jensen, Jens Ledet

2005-01-01

The large deviation modified likelihood ratio statistic is studied for testing a variance component equal to a specified value. Formulas are presented in the general balanced case, whereas in the unbalanced case only the one-way random effects model is studied. Simulation studies are presented......, showing that the normal approximation to the large deviation modified likelihood ratio statistic gives confidence intervals for variance components with coverage probabilities very close to the nominal confidence coefficient....
Environmental restoration and statistics: Issues and needs

International Nuclear Information System (INIS)

Gilbert, R.O.

1991-10-01

Statisticians have a vital role to play in environmental restoration (ER) activities. One facet of that role is to point out where additional work is needed to develop statistical sampling plans and data analyses that meet the needs of ER. This paper is an attempt to show where statistics fits into the ER process. The statistician, as member of the ER planning team, works collaboratively with the team to develop the site characterization sampling design, so that data of the quality and quantity required by the specified data quality objectives (DQOs) are obtained. At the same time, the statistician works with the rest of the planning team to design and implement, when appropriate, the observational approach to streamline the ER process and reduce costs. The statistician will also provide the expertise needed to select or develop appropriate tools for statistical analysis that are suited for problems that are common to waste-site data. These data problems include highly heterogeneous waste forms, large variability in concentrations over space, correlated data, data that do not have a normal (Gaussian) distribution, and measurements below detection limits. Other problems include environmental transport and risk models that yield highly uncertain predictions, and the need to effectively communicate to the public highly technical information, such as sampling plans, site characterization data, statistical analysis results, and risk estimates. Even though some statistical analysis methods are available ''off the shelf'' for use in ER, these problems require the development of additional statistical tools, as discussed in this paper. 29 refs
An Exploration of the Perceived Usefulness of the Introductory Statistics Course and Students’ Intentions to Further Engage in Statistics

Directory of Open Access Journals (Sweden)

Rossi Hassad

2018-01-01

Full Text Available Students’ attitude, including perceived usefulness, is generally associated with academic success. The related research in statistics education has focused almost exclusively on the role of attitude in explaining and predicting academic learning outcomes, hence there is a paucity of research evidence on how attitude (particularly perceived usefulness) impacts students’ intentions to use and stay engaged in statistics beyond the introductory course. This study explored the relationship between college students’ perception of the usefulness of an introductory statistics course, their beliefs about where statistics will be most useful, and their intentions to take another statistics course. A cross-sectional study of 106 students was conducted. The mean rating for usefulness was 4.7 (out of 7), with no statistically significant differences based on gender and age. Sixty-four percent reported that they would consider taking another statistics course, and this subgroup rated the course as more useful (p = .01). The majority (67%) reported that statistics would be most useful for either graduate school or research, whereas 14% indicated their job, and 19% were undecided. The “undecided” students had the lowest mean rating for usefulness of the course (p = .001). Addressing data, in the context of real-world problem-solving and decision-making, could help students better appreciate the usefulness and practicality of statistics. Qualitative research methods could help to elucidate these findings.
Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

Science.gov (United States)

Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

2009-12-01

To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of a SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures while minimizing the error rate. After cross-validation, logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
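The three reported metrics appear to be linked by a simple identity, which the sketch below checks. It rests on assumed definitions that the abstract does not spell out: SNB reduction is taken as the share of all patients predicted node negative, NPV as the share of those predictions that are correct, and error rate as the share of all patients who are predicted negative but are in fact node positive. Under those assumptions, error rate = SNB reduction x (1 - NPV). Variable and function names are hypothetical.

```python
# (NPV %, SNB reduction %, reported error rate %) as given in the abstract.
reported = {
    "logistic regression": (93.6, 27.5, 1.8),
    "classification tree": (94.0, 29.8, 1.8),
    "random forest": (97.1, 18.2, 0.5),
    "support vector machine": (93.0, 30.1, 2.1),
}

def implied_error_rate(npv_pct, reduction_pct):
    """Error rate (%) implied by NPV and SNB reduction under the assumed definitions."""
    return reduction_pct * (1 - npv_pct / 100)

implied = {
    name: round(implied_error_rate(npv, reduction), 1)
    for name, (npv, reduction, _error) in reported.items()
}

for name, (_npv, _reduction, error) in reported.items():
    print(f"{name}: reported {error}%, implied {implied[name]}%")
```

For all four models the implied value rounds to the reported error rate, which supports reading the error rate as false negatives over all patients rather than over predicted negatives only.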
Data-driven inference for the spatial scan statistic

Directory of Open Access Journals (Sweden)

Duczmal Luiz H

2011-08-01

Full Text Available Abstract Background Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. Results A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is done, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. Conclusions A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
Register-based statistics statistical methods for administrative data

CERN Document Server

Wallgren, Anders

2014-01-01

This book provides a comprehensive and up-to-date treatment of theory and practical implementation in register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking. Thi
