WorldWideScience

Sample records for non-parametric statistical test

  1. t-tests, non-parametric tests, and large studies—a paradox of statistical practice?

    Directory of Open Access Journals (Sweden)

    Fagerland Morten W

    2012-06-01

    Full Text Available. Background: During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences. Methods: A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation. Results: The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p … Conclusions: Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
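
    The simulation described in this record can be reproduced in outline with a short script. The sketch below is a minimal illustration, not the authors' code: it draws two samples from skewed (lognormal) distributions adjusted to have equal means but slightly different spreads, and compares the rejection rates of the two-sample t-test and the Wilcoxon-Mann-Whitney test as the sample size grows. The distribution parameters and simulation settings are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): rejection rates of the t-test vs. the
# Wilcoxon-Mann-Whitney (WMW) test for skewed samples with equal means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def rejection_rates(n, n_sim=2000, alpha=0.05):
    """Two lognormal samples with equal means but unequal spread (illustrative)."""
    sigma1, sigma2 = 0.8, 1.0            # assumed shape parameters
    mu1 = -sigma1**2 / 2                 # choose mu so that E[X] = 1 in both groups
    mu2 = -sigma2**2 / 2
    rej_t = rej_wmw = 0
    for _ in range(n_sim):
        x = rng.lognormal(mu1, sigma1, n)
        y = rng.lognormal(mu2, sigma2, n)
        rej_t   += stats.ttest_ind(x, y, equal_var=False).pvalue < alpha
        rej_wmw += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha
    return rej_t / n_sim, rej_wmw / n_sim

for n in (25, 100, 500, 1000):
    t_rate, wmw_rate = rejection_rates(n)
    print(f"n={n:5d}  t-test: {t_rate:.3f}   WMW: {wmw_rate:.3f}")
```

    With this setup one would expect the WMW rejection rate to pull away from the t-test rate as n increases, which is the discrepancy the record describes.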

  2. COLOR IMAGE RETRIEVAL BASED ON NON-PARAMETRIC STATISTICAL TESTS OF HYPOTHESIS

    Directory of Open Access Journals (Sweden)

    R. Shekhar

    2016-09-01

    Full Text Available. A novel method for color image retrieval, based on statistical non-parametric tests such as the two-sample Wald test for equality of variances and the Mann-Whitney U test, is proposed in this paper. The proposed method first tests the deviation, i.e. the distance in terms of variance, between the query and target images; if the images pass this test, the method proceeds to test the spectrum of energy, i.e. the distance between the mean values of the two images; otherwise, the comparison is dropped. If the query and target images pass both tests, it is inferred that the two images belong to the same class, i.e. the images are the same; otherwise, it is assumed that the images belong to different classes, i.e. the images are different. The proposed method is robust to scaling and rotation, since it adjusts itself and treats either the query image or the target image as a sample of the other.
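
    The record does not include an implementation; the sketch below is a rough, hypothetical rendering of the two-stage idea for a single flattened channel: a Wald-type statistic for equal variances is computed first, and only if it does not reject does the procedure go on to a Mann-Whitney U comparison. The feature choice, thresholds and variance formula are assumptions for illustration, not the published method.

```python
# Hypothetical sketch of the two-stage comparison (not the published algorithm):
# stage 1 tests equality of variances with a Wald-type statistic, stage 2 applies
# the Mann-Whitney U test only if stage 1 does not reject.
import numpy as np
from scipy import stats

def wald_equal_variance_p(x, y):
    """Asymptotic Wald-type test of H0: var(x) == var(y) (illustrative formula)."""
    nx, ny = len(x), len(y)
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    se = np.sqrt(2 * vx**2 / (nx - 1) + 2 * vy**2 / (ny - 1))  # normal approximation
    z = (vx - vy) / se
    return 2 * stats.norm.sf(abs(z))

def same_class(query, target, alpha=0.05):
    q = np.asarray(query, float).ravel()
    t = np.asarray(target, float).ravel()
    if wald_equal_variance_p(q, t) < alpha:       # spreads differ -> drop the pair
        return False
    p_loc = stats.mannwhitneyu(q, t, alternative="two-sided").pvalue
    return p_loc >= alpha                         # similar location -> same class

rng = np.random.default_rng(1)
img_a = rng.integers(0, 256, size=(32, 32))       # stand-ins for one colour channel
img_b = img_a + rng.integers(-3, 4, size=(32, 32))
print(same_class(img_a, img_b))
```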

  3. A non-parametric statistical test to compare clusters with applications in functional magnetic resonance imaging data.

    Science.gov (United States)

    Fujita, André; Takahashi, Daniel Y; Patriota, Alexandre G; Sato, João R

    2014-12-10

    Statistical inference of functional magnetic resonance imaging (fMRI) data is an important tool in neuroscience investigation. One major hypothesis in neuroscience is that the presence or not of a psychiatric disorder can be explained by the differences in how neurons cluster in the brain. Therefore, it is of interest to verify whether the properties of the clusters change between groups of patients and controls. The usual method to show group differences in brain imaging is to carry out a voxel-wise univariate analysis for a difference between the mean group responses using an appropriate test and to assemble the resulting 'significantly different voxels' into clusters, testing again at cluster level. In this approach, of course, the primary voxel-level test is blind to any cluster structure. Direct assessments of differences between groups at the cluster level seem to be missing in brain imaging. For this reason, we introduce a novel non-parametric statistical test called analysis of cluster structure variability (ANOCVA), which statistically tests whether two or more populations are equally clustered. The proposed method allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering. We illustrate the performance of ANOCVA through simulations and an application to an fMRI dataset composed of children with attention deficit hyperactivity disorder (ADHD) and controls. Results show that there are several differences in the clustering structure of the brain between them. Furthermore, we identify some brain regions previously not described to be involved in the ADHD pathophysiology, generating new hypotheses to be tested. The proposed method is general enough to be applied to other types of datasets, not limited to fMRI, where comparison of clustering structures is of interest. Copyright © 2014 John Wiley & Sons, Ltd.

  4. Using Mathematica to build Non-parametric Statistical Tables

    Directory of Open Access Journals (Sweden)

    Gloria Perez Sainz de Rozas

    2003-01-01

    Full Text Available. In this paper, I present computational procedures to obtain statistical tables: the asymptotic and exact distributions of the Kolmogorov-Smirnov statistic Dn for one population, the distribution of the number of runs R, the distribution of the Wilcoxon signed-rank statistic W+, and the distribution of the Mann-Whitney statistic Ux, using Mathematica, Version 3.9, under Windows 98. I think this is an interesting question because many statistical packages give only the asymptotic significance level in statistical tests, whereas with these procedures one can easily calculate the exact significance levels and the left-tail and right-tail probabilities of non-parametric distributions. I have used Mathematica for these calculations because its symbolic language can solve recursion relations. It is easy to generate the format of the tables, and it is possible to obtain any table of the mentioned non-parametric distributions to any precision, not only for the standard parameters most used in statistics, and without transcription mistakes. Furthermore, using similar procedures, we can generate tables for the following distribution functions: Binomial, Poisson, Hypergeometric, Normal, Chi-square (χ2), Student's t, Snedecor's F, Geometric, Gamma and Beta.
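
    The record notes that the exact null distributions follow from recursion relations. As a rough Python analogue (a sketch, not the paper's Mathematica procedures), the snippet below tabulates the exact null distribution of the Wilcoxon signed-rank statistic W+ for a given n by expanding the generating polynomial prod_{k=1..n} (1 + x^k), and prints exact right-tail probabilities; the sample size and cut-off values are arbitrary examples.

```python
# Sketch: exact null distribution of the Wilcoxon signed-rank statistic W+ for
# sample size n, built from the generating polynomial prod_{k=1..n} (1 + x^k).
import numpy as np

def wilcoxon_signed_rank_pmf(n):
    max_w = n * (n + 1) // 2
    counts = np.zeros(max_w + 1, dtype=object)    # exact integer counts
    counts[0] = 1
    for k in range(1, n + 1):                     # multiply by (1 + x^k)
        new = counts.copy()
        new[k:] += counts[:max_w + 1 - k]
        counts = new
    return (counts / 2.0**n).astype(float)        # each sign pattern equally likely

n = 10
pmf = wilcoxon_signed_rank_pmf(n)
right_tail = np.cumsum(pmf[::-1])[::-1]           # P(W+ >= w)
for w in (40, 45, 50):
    print(f"n={n}, P(W+ >= {w}) = {right_tail[w]:.5f}")
```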

  5. A Non-Parametric Spatial Independence Test Using Symbolic Entropy

    Directory of Open Access Journals (Sweden)

    López Hernández, Fernando

    2008-01-01

    Full Text Available. In the present paper, we construct a new, simple, consistent and powerful test for spatial independence, called the SG test, by using symbolic dynamics and symbolic entropy as a measure of spatial dependence. We also give a standard asymptotic distribution of an affine transformation of the symbolic entropy under the null hypothesis of independence in the spatial process. The test statistic and its standard limit distribution, with the proposed symbolization, are invariant to any monotonic transformation of the data. The test applies to discrete or continuous distributions. Given that the test is based on entropy measures, it avoids smoothed non-parametric estimation. We include a Monte Carlo study of our test, together with the well-known Moran's I, the SBDS (de Graaff et al., 2001) and the (Brett and Pinkse, 1997) non-parametric tests, in order to illustrate our approach.

  6. Biological parametric mapping with robust and non-parametric statistics.

    Science.gov (United States)

    Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A

    2011-07-15

    Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities. Copyright © 2011 Elsevier Inc. All rights reserved.
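
    To make the idea of swapping ordinary least squares for a robust fit concrete, here is a minimal sketch (not the authors' SPM toolbox) using statsmodels' robust linear model with a Huber loss on a toy voxel-wise regression containing one injected outlier; the data and design are invented for illustration.

```python
# Minimal sketch: ordinary vs. robust (Huber) regression on a toy "voxel" with
# one outlying subject, in the spirit of robust biological parametric mapping.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_subjects = 40
structure = rng.normal(size=n_subjects)                  # e.g. grey-matter density
function = 0.5 * structure + rng.normal(scale=0.3, size=n_subjects)
function[0] += 8.0                                       # a mis-registered outlier

X = sm.add_constant(structure)
ols_fit = sm.OLS(function, X).fit()
rlm_fit = sm.RLM(function, X, M=sm.robust.norms.HuberT()).fit()

print("OLS slope  :", round(ols_fit.params[1], 3))        # pulled by the outlier
print("Huber slope:", round(rlm_fit.params[1], 3))        # close to the true 0.5
```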

  7. Non-parametric combination and related permutation tests for neuroimaging.

    Science.gov (United States)

    Winkler, Anderson M; Webster, Matthew A; Brooks, Jonathan C; Tracey, Irene; Smith, Stephen M; Nichols, Thomas E

    2016-04-01

    In this work, we show how permutation methods can be applied to combination analyses such as those that include multiple imaging modalities, multiple data acquisitions of the same modality, or simply multiple hypotheses on the same data. Using the well-known definition of union-intersection tests and closed testing procedures, we use synchronized permutations to correct for such multiplicity of tests, allowing flexibility to integrate imaging data with different spatial resolutions, surface and/or volume-based representations of the brain, including non-imaging data. For the problem of joint inference, we propose and evaluate a modification of the recently introduced non-parametric combination (NPC) methodology, such that instead of a two-phase algorithm and large data storage requirements, the inference can be performed in a single phase, with reasonable computational demands. The method compares favorably to classical multivariate tests (such as MANCOVA), even when the latter is assessed using permutations. We also evaluate, in the context of permutation tests, various combining methods that have been proposed in the past decades, and identify those that provide the best control over error rate and power across a range of situations. We show that one of these, the method of Tippett, provides a link between correction for the multiplicity of tests and their combination. Finally, we discuss how the correction can solve certain problems of multiple comparisons in one-way ANOVA designs, and how the combination is distinguished from conjunctions, even though both can be assessed using permutation tests. We also provide a common algorithm that accommodates combination and correction.
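
    As a rough, simplified illustration of the single-phase idea (not the authors' implementation, which is distributed in neuroimaging tools), the sketch below runs one set of synchronized group-label permutations across two variables measured on the same subjects, converts each variable's permutation statistics to per-permutation p-values, and combines them with the Fisher and Tippett combining functions; the data are synthetic and the design is deliberately minimal.

```python
# Simplified sketch of non-parametric combination (NPC) with synchronized
# permutations: two variables, one two-group design, Fisher and Tippett combining.
import numpy as np

rng = np.random.default_rng(0)
n_per_group, n_perm = 20, 2000
group = np.repeat([0, 1], n_per_group)
Y = np.column_stack([                      # two synthetic variables, small effects
    rng.normal(0.4 * group, 1.0),
    rng.normal(0.3 * group, 1.0),
])

def stat(y, g):                            # difference in group means, per variable
    return y[g == 1].mean(axis=0) - y[g == 0].mean(axis=0)

T = np.empty((n_perm + 1, Y.shape[1]))
T[0] = stat(Y, group)                      # observed statistics
for b in range(1, n_perm + 1):
    T[b] = stat(Y, rng.permutation(group)) # same permutation applied to both vars

# per-permutation p-values for each variable (larger statistic = more extreme)
ranks = (T[:, None, :] >= T[None, :, :]).sum(axis=0)
P = ranks / (n_perm + 1)

fisher  = -2 * np.log(P).sum(axis=1)       # Fisher combining function
tippett = 1 - P.min(axis=1)                # Tippett: smallest p-value wins

p_fisher  = (fisher  >= fisher[0]).mean()
p_tippett = (tippett >= tippett[0]).mean()
print(f"NPC p-value (Fisher): {p_fisher:.4f}   (Tippett): {p_tippett:.4f}")
```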

  8. Non-Parametric Tests of Structure for High Angular Resolution Diffusion Imaging in Q-Space

    CERN Document Server

    Olhede, Sofia C

    2010-01-01

    High angular resolution diffusion imaging data is the observed characteristic function for the local diffusion of water molecules in tissue. This data is used to infer structural information in brain imaging. Non-parametric scalar measures are proposed to summarize such data, and to locally characterize spatial features of the diffusion probability density function (PDF), relying on the geometry of the characteristic function. Summary statistics are defined so that their distributions are, to first order, both independent of nuisance parameters and also analytically tractable. The dominant direction of the diffusion at a spatial location (voxel) is determined, and a new set of axes are introduced in Fourier space. Variation quantified in these axes determines the local spatial properties of the diffusion density. Non-parametric hypothesis tests for determining whether the diffusion is unimodal, isotropic or multi-modal are proposed. More subtle characteristics of white-matter microstructure, such as the degre...

  9. Patterns of trunk muscle activation during walking and pole walking using statistical non-parametric mapping.

    Science.gov (United States)

    Zoffoli, Luca; Ditroilo, Massimiliano; Federici, Ario; Lucertini, Francesco

    2017-09-09

    This study used surface electromyography (EMG) to investigate the regions and patterns of activity of the external oblique (EO), erector spinae longissimus (ES), multifidus (MU) and rectus abdominis (RA) muscles during walking (W) and pole walking (PW) performed at different speeds and grades. Eighteen healthy adults undertook W and PW on a motorized treadmill at 60% and 100% of their walk-to-run preferred transition speed at 0% and 7% treadmill grade. The Teager-Kaiser energy operator was employed to improve the muscle activity detection and statistical non-parametric mapping based on paired t-tests was used to highlight statistical differences in the EMG patterns corresponding to different trials. The activation amplitude of all trunk muscles increased at high speed, while no differences were recorded at 7% treadmill grade. ES and MU appeared to support the upper body at the heel-strike during both W and PW, with the latter resulting in elevated recruitment of EO and RA as required to control for the longer stride and the push of the pole. Accordingly, the greater activity of the abdominal muscles and the comparable intervention of the spine extensors supports the use of poles by walkers seeking higher engagement of the lower trunk region. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Statistic Non-Parametric Methods of Measurement and Interpretation of Existing Statistic Connections within Seaside Hydro Tourism

    OpenAIRE

    MIRELA SECARĂ

    2008-01-01

    Tourism represents an important field of economic and social life in our country, and the main sector of the economy of Constanta County is the balneary touristic capitalization of Romanian seaside. In order to statistically analyze hydro tourism on Romanian seaside, we have applied non-parametric methods of measuring and interpretation of existing statistic connections within seaside hydro tourism. Major objective of this research is represented by hydro tourism re-establishment on Romanian ...

  11. Non-parametric Estimation approach in statistical investigation of nuclear spectra

    CERN Document Server

    Jafarizadeh, M A; Sabri, H; Maleki, B Rashidian

    2011-01-01

    In this paper, Kernel Density Estimation (KDE) is used as a non-parametric estimation method to investigate the statistical properties of nuclear spectra. The deviation towards regular or chaotic dynamics is exhibited by closer distances to the Poisson or Wigner limits respectively, evaluated with the Kullback-Leibler Divergence (KLD) measure. Spectral statistics of different sequences prepared from nuclei corresponding to the three dynamical symmetry limits of the Interacting Boson Model (IBM), from oblate and prolate nuclei, and also the pairing effect on nuclear level statistics, are analyzed (with purely experimental data). The KDE-based estimated density function confirms previous predictions with minimum uncertainty (evaluated with the Integrated Absolute Error (IAE)) in comparison to the Maximum Likelihood (ML)-based method. Also, the increase in the regularity of the spectra due to the pairing effect is revealed.
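
    The record's core computation, a kernel density estimate of the nearest-neighbour spacing distribution compared against the Poisson and Wigner limits via Kullback-Leibler divergence, can be sketched as follows; the synthetic "spectrum", default bandwidth and integration grid are illustrative assumptions, not the authors' data or settings.

```python
# Sketch: KDE of nearest-neighbour level spacings and KL divergence to the
# Poisson and Wigner (GOE) limiting distributions.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
levels = np.sort(rng.uniform(0, 1, 300))          # toy "unfolded" spectrum
s = np.diff(levels)
s = s / s.mean()                                  # mean spacing normalised to 1

kde = gaussian_kde(s)
grid = np.linspace(1e-3, 4, 400)
p_hat = kde(grid)

poisson = np.exp(-grid)                           # P(s) = exp(-s)
wigner  = (np.pi / 2) * grid * np.exp(-np.pi * grid**2 / 4)

def kl(p, q, x):
    p = np.clip(p, 1e-12, None)
    q = np.clip(q, 1e-12, None)
    return float(np.sum(p * np.log(p / q)) * (x[1] - x[0]))  # uniform-grid integral

print("KL(KDE || Poisson):", round(kl(p_hat, poisson, grid), 3))
print("KL(KDE || Wigner) :", round(kl(p_hat, wigner, grid), 3))
```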

  12. Non-parametric tests of productive efficiency with errors-in-variables

    NARCIS (Netherlands)

    Kuosmanen, T.K.; Post, T.; Scholtes, S.

    2007-01-01

    We develop a non-parametric test of productive efficiency that accounts for errors-in-variables, following the approach of Varian [1985. Nonparametric analysis of optimizing behavior with measurement error. Journal of Econometrics 30(1/2), 445-458]. The test is based on the general Pareto-Koopmans

  13. A note on the use of the non-parametric Wilcoxon-Mann-Whitney test in the analysis of medical studies

    Directory of Open Access Journals (Sweden)

    Kühnast, Corinna

    2008-04-01

    Full Text Available. Background: Although non-normal data are widespread in biomedical research, parametric tests unnecessarily predominate in statistical analyses. Methods: We surveyed five biomedical journals and – for all studies which contained at least the unpaired t-test or the non-parametric Wilcoxon-Mann-Whitney test – investigated the relationship between the choice of statistical test and other variables such as type of journal, sample size, randomization, sponsoring, etc. Results: The non-parametric Wilcoxon-Mann-Whitney test was used in 30% of the studies. In a multivariable logistic regression, the type of journal, the test object, the scale of measurement and the statistical software were significant. The non-parametric test was more common in the case of non-continuous data, in high-impact journals, in studies in humans, and when the statistical software was specified, in particular when SPSS was used.

  14. Non-parametric three-way mixed ANOVA with aligned rank tests.

    Science.gov (United States)

    Oliver-Rodríguez, Juan C; Wang, X T

    2015-02-01

    Research problems that require a non-parametric analysis of multifactor designs with repeated measures arise in the behavioural sciences. There is, however, a lack of available procedures in commonly used statistical packages. In the present study, a generalization of the aligned rank test for the two-way interaction is proposed for the analysis of the typical sources of variation in a three-way analysis of variance (ANOVA) with repeated measures. It can be implemented in the usual statistical packages. Its statistical properties are tested by using simulation methods with two sample sizes (n = 30 and n = 10) and three distributions (normal, exponential and double exponential). Results indicate substantial increases in power for non-normal distributions in comparison with the usual parametric tests. Similar levels of Type I error for both parametric and aligned rank ANOVA were obtained with non-normal distributions and large sample sizes. Degrees-of-freedom adjustments for Type I error control in small samples are proposed. The procedure is applied to a case study with 30 participants per group where it detects gender differences in linguistic abilities in blind children not shown previously by other methods.
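
    The record's procedure generalizes the aligned rank test to three-way mixed designs; the sketch below only illustrates the basic two-way aligned-rank idea it builds on (remove estimated main effects, rank the aligned responses, then run a standard ANOVA on the ranks), using statsmodels and an invented data frame. It is not the authors' three-way procedure.

```python
# Sketch of the basic aligned rank test for an A x B interaction: subtract the
# estimated main effects, rank the aligned values, then ANOVA on the ranks.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
a = np.repeat(["a1", "a2"], 30)
b = np.tile(np.repeat(["b1", "b2"], 15), 2)
y = rng.exponential(1.0, 60) + (a == "a2") * 0.5 + ((a == "a2") & (b == "b2")) * 0.8
df = pd.DataFrame({"A": a, "B": b, "y": y})

grand = df["y"].mean()
eff_a = df.groupby("A")["y"].transform("mean") - grand
eff_b = df.groupby("B")["y"].transform("mean") - grand
df["aligned"] = df["y"] - eff_a - eff_b            # strip the main effects
df["y_rank"] = df["aligned"].rank()

fit = smf.ols("y_rank ~ C(A) * C(B)", data=df).fit()
print(anova_lm(fit, typ=2))                        # the interaction row is of interest
```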

  15. Non-parametric Tuning of PID Controllers A Modified Relay-Feedback-Test Approach

    CERN Document Server

    Boiko, Igor

    2013-01-01

    The relay feedback test (RFT) has become a popular and efficient tool used in process identification and automatic controller tuning. Non-parametric Tuning of PID Controllers couples new modifications of the classical RFT with application-specific optimal tuning rules to form a non-parametric method of test-and-tuning. Test and tuning are coordinated through a set of common parameters so that a PID controller can obtain the desired gain or phase margins in a system exactly, even with unknown process dynamics. The concept of process-specific optimal tuning rules in the non-parametric setup, with corresponding tuning rules for flow, level, pressure, and temperature control loops, is presented in the text. Common problems of tuning accuracy based on parametric and non-parametric approaches are addressed. In addition, the text treats the parametric approach to tuning based on the modified RFT approach and the exact model of oscillations in the system under test using the locus of a perturbed relay system (LPRS) method...

  16. Two new non-parametric tests to the distance duality relation with galaxy clusters

    CERN Document Server

    Costa, S S; Holanda, R F L

    2015-01-01

    The cosmic distance duality relation is a milestone of cosmology involving the luminosity and angular diameter distances. Any departure from the relation points to new physics or systematic errors in the observations; therefore, tests of the relation are extremely important for building a consistent cosmological framework. Here, two new tests are proposed based on galaxy cluster observations (angular diameter distance and gas mass fraction) and $H(z)$ measurements. By applying Gaussian Processes, a non-parametric method, we are able to derive constraints on departures from the relation; no evidence of deviation is found with either method, reinforcing the cosmological and astrophysical hypotheses adopted so far.

  17. The geometry of distributional preferences and a non-parametric identification approach: The Equality Equivalence Test.

    Science.gov (United States)

    Kerschbamer, Rudolf

    2015-05-01

    This paper proposes a geometric delineation of distributional preference types and a non-parametric approach for their identification in a two-person context. It starts with a small set of assumptions on preferences and shows that this set (i) naturally results in a taxonomy of distributional archetypes that nests all empirically relevant types considered in previous work; and (ii) gives rise to a clean experimental identification procedure - the Equality Equivalence Test - that discriminates between archetypes according to core features of preferences rather than properties of specific modeling variants. As a by-product the test yields a two-dimensional index of preference intensity.

  18. Robust non-parametric one-sample tests for the analysis of recurrent events.

    Science.gov (United States)

    Rebora, Paola; Galimberti, Stefania; Valsecchi, Maria Grazia

    2010-12-30

    One-sample non-parametric tests are proposed here for inference on recurring events. The focus is on the marginal mean function of events and the basis for inference is the standardized distance between the observed and the expected number of events under a specified reference rate. Different weights are considered in order to account for various types of alternative hypotheses on the mean function of the recurrent events process. A robust version and a stratified version of the test are also proposed. The performance of these tests was investigated through simulation studies under various underlying event generation processes, such as homogeneous and nonhomogeneous Poisson processes, autoregressive and renewal processes, with and without frailty effects. The robust versions of the test have been shown to be suitable in a wide variety of event generating processes. The motivating context is a study on gene therapy in a very rare immunodeficiency in children, where a major end-point is the recurrence of severe infections. Robust non-parametric one-sample tests for recurrent events can be useful to assess efficacy and especially safety in non-randomized studies or in epidemiological studies for comparison with a standard population.

  19. The application of non-parametric statistical techniques to an ALARA programme.

    Science.gov (United States)

    Moon, J H; Cho, Y H; Kang, C S

    2001-01-01

    For the cost-effective reduction of occupational radiation dose (ORD) at nuclear power plants, it is necessary to identify the processes with repetitively high ORD during maintenance and repair operations. To identify these processes, point values such as the mean and median are generally used, but they sometimes lead to misjudgment since they cannot show other important characteristics such as dose distributions and the frequencies of radiation jobs. As an alternative, a non-parametric analysis method is proposed, which effectively identifies the processes with repetitively high ORD. As a case study, the method is applied to ORD data from maintenance and repair processes at Kori Units 3 and 4, pressurised water reactors with 950 MWe capacity operating in Korea since 1986 and 1987 respectively, and the method is demonstrated to be an efficient way of analysing the data.

  20. Zero- vs. one-dimensional, parametric vs. non-parametric, and confidence interval vs. hypothesis testing procedures in one-dimensional biomechanical trajectory analysis.

    Science.gov (United States)

    Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A

    2015-05-01

    Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one's hypothesis explicitly or implicitly pertains to whole 1D trajectories.

  1. Does sunspot numbers cause global temperatures? A reconsideration using non-parametric causality tests

    Science.gov (United States)

    Hassani, Hossein; Huang, Xu; Gupta, Rangan; Ghodsi, Mansi

    2016-10-01

    In a recent paper, Gupta et al. (2015) analyzed whether sunspot numbers cause global temperatures based on monthly data covering the period 1880:1-2013:9. The authors find that the standard time-domain Granger causality test fails to reject the null hypothesis that sunspot numbers do not cause global temperatures for both the full sample and the sub-samples, namely 1880:1-1936:2, 1936:3-1986:11 and 1986:12-2013:9 (identified based on tests of structural breaks). However, a frequency-domain causality test detects predictability for the full sample at short (2-2.6 months) cycle lengths, but not for the sub-samples. But since full-sample causality cannot be relied upon due to structural breaks, Gupta et al. (2015) conclude that the evidence of causality running from sunspot numbers to global temperatures is weak and inconclusive. Given the importance of the issue of global warming, our current paper aims to revisit the question of whether sunspot numbers cause global temperatures, using the same data set and sub-samples used by Gupta et al. (2015), based on a non-parametric Singular Spectrum Analysis (SSA)-based causality test. Based on this test, however, we show that sunspot numbers have predictive ability for global temperatures in the three sub-samples, over and above the full sample. Thus, generally speaking, our non-parametric SSA-based causality test outperformed both the time-domain and frequency-domain causality tests and highlighted that sunspot numbers have always been important in predicting global temperatures.

  2. Inferential, non-parametric statistics to assess the quality of probabilistic forecast systems

    NARCIS (Netherlands)

    Maia, A.H.N.; Meinke, H.B.; Lennox, S.; Stone, R.C.

    2007-01-01

    Many statistical forecast systems are available to interested users. To be useful for decision making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and its statistical manifestation have been firmly established, the forecasts must al

  4. Technical Topic 3.2.2.d Bayesian and Non-Parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling

    Science.gov (United States)

    2016-05-31

    Final report (report date 31-05-2016; reporting period 15-Apr-2014 to 14-Jan-2015), distribution unlimited. The record consists of administrative report-form fields only; no abstract is provided.

  5. Non-parametric group-level statistics for source-resolved ERP analysis.

    Science.gov (United States)

    Lee, Clement; Miyakoshi, Makoto; Delorme, Arnaud; Cauwenberghs, Gert; Makeig, Scott

    2015-01-01

    We have developed a new statistical framework for group-level event-related potential (ERP) analysis in EEGLAB. The framework calculates the variance of scalp channel signals accounted for by the activity of homogeneous clusters of sources found by independent component analysis (ICA). When ICA data decomposition is performed on each subject's data separately, functionally equivalent ICs can be grouped into EEGLAB clusters. Here, we report a new addition (statPvaf) to the EEGLAB plug-in std_envtopo to enable inferential statistics on main effects and interactions in event related potentials (ERPs) of independent component (IC) processes at the group level. We demonstrate the use of the updated plug-in on simulated and actual EEG data.

  6. A Java program for non-parametric statistic comparison of community structure

    Directory of Open Access Journals (Sweden)

    WenJun Zhang

    2011-09-01

    Full Text Available. A Java algorithm to statistically compare the structural difference between two communities is presented in this study. Euclidean distance, Manhattan distance, Pearson correlation, point correlation, quadratic correlation and the Jaccard coefficient are included in the algorithm. The algorithm was used to compare rice arthropod communities in the Pearl River Delta, China, and the results showed that the family composition of arthropods for Guangzhou, Zhongshan, Zhuhai, and Dongguan is not significantly different.
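
    The record lists the similarity and distance measures included in the Java program; below is a compact Python rendering of a few of them (a sketch, not the published Java code) applied to two made-up family-abundance vectors.

```python
# Sketch of some of the listed measures applied to two community abundance vectors.
import numpy as np
from scipy.stats import pearsonr

x = np.array([12, 0, 5, 33, 7, 1, 0, 4], dtype=float)   # site 1 family counts
y = np.array([10, 2, 6, 30, 5, 0, 1, 3], dtype=float)   # site 2 family counts

euclidean = np.sqrt(((x - y) ** 2).sum())
manhattan = np.abs(x - y).sum()
pearson_r = pearsonr(x, y)[0]
# Jaccard coefficient on presence/absence: shared families / families in either site
jaccard = np.minimum(x > 0, y > 0).sum() / np.maximum(x > 0, y > 0).sum()

print(f"Euclidean {euclidean:.2f}  Manhattan {manhattan:.1f}  "
      f"Pearson {pearson_r:.3f}  Jaccard {jaccard:.3f}")
```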

  7. Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

    Directory of Open Access Journals (Sweden)

    D. Das

    2014-04-01

    Full Text Available. Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application to accurately assessing the effects of climate change on finer, regional-scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors – the large-scale climatic state and regional or local features. A transfer-function approach to SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (the predictand) based on past observations. However, often a single regression model is not sufficient to describe complex dynamic relationships between the predictors and the predictand. We focus on the covariate-selection part of the transfer-function approach and propose a non-parametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.

  8. Applications of non-parametric statistics and analysis of variance on sample variances

    Science.gov (United States)

    Myers, R. H.

    1981-01-01

    Nonparametric methods that are available for NASA-type applications are discussed. An attempt is made here to survey what can be used, to offer recommendations as to when each would be applicable, and to compare the methods, when possible, with the usual normal-theory procedures that are available for the Gaussian analog. It is important here to point out the hypotheses that are being tested, the assumptions that are being made, and the limitations of the nonparametric procedures. The appropriateness of doing analysis of variance on sample variances is also discussed and studied. This procedure is followed in several NASA simulation projects. On the surface this would appear to be a reasonably sound procedure. However, the difficulties involved center around the normality problem and the basic homogeneous-variance assumption that is made in the usual analysis of variance problems. These difficulties are discussed and guidelines are given for using the methods.

  9. Spatial Modeling of Rainfall Patterns over the Ebro River Basin Using Multifractality and Non-Parametric Statistical Techniques

    Directory of Open Access Journals (Sweden)

    José L. Valencia

    2015-11-01

    Full Text Available. Rainfall, one of the most important climate variables, is commonly studied due to its great heterogeneity, which occasionally causes negative economic, social, and environmental consequences. Modeling the spatial distribution of rainfall patterns over watersheds has become a major challenge for water resources management. Multifractal analysis can be used to reproduce the scale invariance and intermittency of rainfall processes. To identify which factors are the most influential on the variability of the multifractal parameters and, consequently, on the spatial distribution of rainfall patterns for different time scales, in this study universal multifractal (UM) analysis – based on the C1, α, and γs UM parameters – was combined with non-parametric statistical techniques that allow spatial-temporal comparisons of distributions by gradients. The proposed combined approach was applied to a daily rainfall dataset of 132 time series from 1931 to 2009, homogeneously spatially distributed across a 25 km × 25 km grid covering the Ebro River Basin. A homogeneous increase in C1 over the watershed and a decrease in α, mainly in the western regions, were detected, suggesting an increase in the frequency of dry periods at different scales and an increase in the variability of the rainfall process over the last decades.

  10. Non-Parametric Statistical Analysis of College Students' Math Anxiety Generation Factors

    Institute of Scientific and Technical Information of China (English)

    范大付; 李春红

    2012-01-01

    Non-parametric statistics comprises test methods that do not involve the parameters of the population and do not depend on the underlying distribution. Using the Wilcoxon rank-sum test, the Friedman test and the Mann-Whitney U test, five main factors contributing to college students' math anxiety were quantitatively analyzed and evaluated, yielding non-parametric statistical results on the factors generating math anxiety and providing a reference for countering the negative effect of math anxiety on learning and for improving students' academic achievement.
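
    The three tests named in the record are available in scipy.stats; the snippet below is a minimal sketch on invented questionnaire scores, not the study's data, showing how each would be called.

```python
# Sketch: the three tests named in the record, applied to invented anxiety scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_group_1 = rng.integers(1, 6, size=40)       # 1-5 Likert-type ratings
scores_group_2 = rng.integers(1, 6, size=45)

# Wilcoxon rank-sum test and Mann-Whitney U test for two independent groups
print(stats.ranksums(scores_group_1, scores_group_2))
print(stats.mannwhitneyu(scores_group_1, scores_group_2, alternative="two-sided"))

# Friedman test: the same 40 students rated on, say, five anxiety factors
factor_scores = rng.integers(1, 6, size=(40, 5))
print(stats.friedmanchisquare(*factor_scores.T))
```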

  11. A new non-parametric stationarity test of time series in the time domain

    KAUST Repository

    Jin, Lei

    2014-11-07

    © 2015 The Royal Statistical Society and Blackwell Publishing Ltd. We propose a new double-order selection test for checking second-order stationarity of a time series. To develop the test, a sequence of systematic samples is defined via Walsh functions. Then the deviations of the autocovariances based on these systematic samples from the corresponding autocovariances of the whole time series are calculated and the uniform asymptotic joint normality of these deviations over different systematic samples is obtained. With a double-order selection scheme, our test statistic is constructed by combining the deviations at different lags in the systematic samples. The null asymptotic distribution of the statistic proposed is derived and the consistency of the test is shown under fixed and local alternatives. Simulation studies demonstrate well-behaved finite sample properties of the method proposed. Comparisons with some existing tests in terms of power are given both analytically and empirically. In addition, the method proposed is applied to check the stationarity assumption of a chemical process viscosity readings data set.

  12. A simple 2D non-parametric resampling statistical approach to assess confidence in species identification in DNA barcoding--an alternative to likelihood and Bayesian approaches.

    Science.gov (United States)

    Jin, Qian; He, Li-Jun; Zhang, Ai-Bing

    2012-01-01

    In the recent worldwide campaign for the global biodiversity inventory via DNA barcoding, a simple and easily used measure of confidence for assigning sequences to species in DNA barcoding has not been established so far, although the likelihood ratio test and the Bayesian approach have been proposed to address this issue from a statistical point of view. The TDR (Two-Dimensional non-parametric Resampling) measure newly proposed in this study offers users a simple and easy approach to evaluate the confidence of species membership in DNA barcoding projects. We assessed the validity and robustness of the TDR approach using datasets simulated under coalescent models, and an empirical dataset, and found that the TDR measure is very robust in assessing species membership in DNA barcoding. In contrast to the likelihood ratio test and the Bayesian approach, the TDR method stands out due to its simplicity in both concept and calculation, with little in the way of restrictive population-genetic assumptions. To implement this approach we have developed a computer program package (TDR1.0beta) freely available from ftp://202.204.209.200/education/video/TDR1.0beta.rar.

  13. Detection of Bistability in Phase Space of a Real Galaxy, using a New Non-parametric Bayesian Test of Hypothesis

    CERN Document Server

    Chakrabarty, Dalia

    2013-01-01

    In lieu of direct detection of dark matter, estimation of the distribution of the gravitational mass in distant galaxies is of crucial importance in astrophysics. Typically, such estimation is performed using small samples of noisy, partially missing measurements - only some of the three components of the velocity and location vectors of individual particles that live in the galaxy are measurable. Such limitations of the available data in turn demand that simplifying model assumptions be made. Thus, assuming that the phase space of a galaxy manifests simple symmetries - such as isotropy - allows for the learning of the density of the gravitational mass in galaxies. This is equivalent to assuming that the phase space pdf from which the velocity and location vectors of galactic particles are sampled is an isotropic function of these vectors. We present a new non-parametric test of hypothesis that tests for relative support in two or more measured data sets of disparate sizes, for the undertaken m...

  14. When the Single Matters more than the Group (II): Addressing the Problem of High False Positive Rates in Single Case Voxel Based Morphometry Using Non-parametric Statistics.

    Science.gov (United States)

    Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea

    2016-01-01

    In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that the family-wise false positive error rate (i.e., the proportion of statistical comparisons yielding at least one false positive) in single-case VBM is much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which do not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of the false positive risk. We empirically estimated false positive rates (FPRs) in single-case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (modulated, unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set, and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single-case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used

  15. Non-Parametric, Closed-Loop Testing of Autonomy in Unmanned Aircraft Systems Project

    Data.gov (United States)

    National Aeronautics and Space Administration — The proposed Phase I program aims to develop new methods to support safety testing for integration of Unmanned Aircraft Systems into the National Airspace (NAS) with...

  16. Testing the Non-Parametric Conditional CAPM in the Brazilian Stock Market

    Directory of Open Access Journals (Sweden)

    Daniel Reed Bergmann

    2014-04-01

    Full Text Available. This paper seeks to analyze whether the variations in returns and systematic risks of Brazilian portfolios can be explained by the non-parametric conditional Capital Asset Pricing Model (CAPM) of Wang (2002). Four informational variables are available to investors: (i) the Brazilian industrial production level; (ii) the broad money supply M4; (iii) inflation, represented by the Índice de Preços ao Consumidor Amplo (IPCA); and (iv) the real-dollar exchange rate, obtained from the PTAX dollar quotation. The study comprised the shares listed on the BOVESPA from January 2002 to December 2009. The test methodology developed by Wang (2002) and adapted to the Mexican context by Castillo-Spíndola (2006) was used. The observed results indicate that the non-parametric conditional model is relevant in explaining the portfolios' returns for two of the four tested variables, M4 and the PTAX dollar, at the 5% level of significance.

  17. Non-parametric asymptotic statistics for the Palm mark distribution of \\beta-mixing marked point processes

    CERN Document Server

    Heinrich, Lothar; Schmidt, Volker

    2012-01-01

    We consider spatially homogeneous marked point patterns in an unboundedly expanding convex sampling window. Our main objective is to identify the distribution of the typical mark by constructing an asymptotic \\chi^2-goodness-of-fit test. The corresponding test statistic is based on a natural empirical version of the Palm mark distribution and a smoothed covariance estimator which turns out to be mean-square consistent. Our approach does not require independent marks and allows dependences between the mark field and the point pattern. Instead we impose a suitable \\beta-mixing condition on the underlying stationary marked point process which can be checked for a number of Poisson-based models and, in particular, in the case of geostatistical marking. Our method needs a central limit theorem for \\beta-mixing random fields which is proved by extending Bernstein's blocking technique to non-cubic index sets and seems to be of interest in its own right. By large-scale model-based simulations the performance of our t...

  18. ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH

    Directory of Open Access Journals (Sweden)

    I. C. A. OYEKA

    2012-02-01

    Full Text Available. This paper presents a non-parametric statistical method of analyzing two-sample data that makes provision for the possibility of ties in the data. A test statistic is developed and shown to be free of the effect of any possible ties in the data. An illustrative example is provided and the method is shown to compare favourably with its competitor, the Mann-Whitney test, and to be more powerful than the latter when there are ties.

  19. Characterizing Ipomopsis rubra (Polemoniaceae) germination under various thermal scenarios with non-parametric and semi-parametric statistical methods.

    Science.gov (United States)

    Pérez, Hector E; Kettner, Keith

    2013-10-01

    Time-to-event analysis represents a collection of relatively new, flexible, and robust statistical techniques for investigating the incidence and timing of transitions from one discrete condition to another. Plant biology is replete with examples of such transitions occurring from the cellular to population levels. However, application of these statistical methods has been rare in botanical research. Here, we demonstrate the use of non- and semi-parametric time-to-event and categorical data analyses to address questions regarding seed to seedling transitions of Ipomopsis rubra propagules exposed to various doses of constant or simulated seasonal diel temperatures. Seeds were capable of germinating rapidly to >90 % at 15-25 or 22/11-29/19 °C. Optimum temperatures for germination occurred at 25 or 29/19 °C. Germination was inhibited and seed viability decreased at temperatures ≥30 or 33/24 °C. Kaplan-Meier estimates of survivor functions indicated highly significant differences in temporal germination patterns for seeds exposed to fluctuating or constant temperatures. Extended Cox regression models specified an inverse relationship between temperature and the hazard of germination. Moreover, temperature and the temperature × day interaction had significant effects on germination response. Comparisons to reference temperatures and linear contrasts suggest that summer temperatures (33/24 °C) play a significant role in differential germination responses. Similarly, simple and complex comparisons revealed that the effects of elevated temperatures predominate in terms of components of seed viability. In summary, the application of non- and semi-parametric analyses provides appropriate, powerful data analysis procedures to address various topics in seed biology and more widespread use is encouraged.
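
    Assuming the lifelines package (the record itself does not name software), the Kaplan-Meier and Cox pieces of such a time-to-event analysis of germination could be sketched as follows, with invented day counts and a germinated/not-germinated event indicator standing in for the study's data.

```python
# Sketch (assuming the lifelines package): Kaplan-Meier curves per temperature and
# a Cox model with temperature as a covariate, for invented germination data.
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

rng = np.random.default_rng(0)
temp = np.repeat([25, 30], 50)                       # constant temperatures (°C)
day = np.where(temp == 25, rng.integers(2, 10, 100), rng.integers(5, 15, 100))
germinated = np.where(temp == 25, 1, rng.integers(0, 2, 100))  # 1 = germinated
df = pd.DataFrame({"temp": temp, "day": day, "germinated": germinated})

kmf = KaplanMeierFitter()
for t, sub in df.groupby("temp"):
    kmf.fit(sub["day"], event_observed=sub["germinated"], label=f"{t} °C")
    print(t, "°C: median time to germination =", kmf.median_survival_time_)

cph = CoxPHFitter()
cph.fit(df, duration_col="day", event_col="germinated")  # temp enters as covariate
cph.print_summary()
```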

  20. Application of Statistical Software R in the Teaching of Non-Parametric Statistics

    Institute of Scientific and Technical Information of China (English)

    王志刚; 冯利英; 刘勇

    2012-01-01

    This paper introduces the application of the statistical software R in the teaching of non-parametric statistics, an important branch of statistics. In particular, it describes the use of R in exploratory data analysis, inferential statistics and stochastic simulation. The flexible, open-source character of R makes data processing more efficient. The software can implement all the methods covered in the teaching process, and makes it convenient for learners to optimize and improve methods on the basis of previous work, so R is well suited for teaching non-parametric statistics.

  1. A new powerful non-parametric two-stage approach for testing multiple phenotypes in family-based association studies

    NARCIS (Netherlands)

    Lange, C; Lyon, H; DeMeo, D; Raby, B; Silverman, EK; Weiss, ST

    2003-01-01

    We introduce a new powerful nonparametric testing strategy for family-based association studies in which multiple quantitative traits are recorded and the phenotype with the strongest genetic component is not known prior to the analysis. In the first stage, using a population-based test based on the

  2. Short-term monitoring of benzene air concentration in an urban area: a preliminary study of application of Kruskal-Wallis non-parametric test to assess pollutant impact on global environment and indoor.

    Science.gov (United States)

    Mura, Maria Chiara; De Felice, Marco; Morlino, Roberta; Fuselli, Sergio

    2010-01-01

    In step with the need to develop statistical procedures to manage small-size environmental samples, in this work we have used concentration values of benzene (C6H6), concurrently detected by seven outdoor and indoor monitoring stations over 12 000 minutes, in order to assess the representativeness of the collected data and the impact of the pollutant on the indoor environment. Clearly, the former issue is strictly connected to sampling-site geometry, which proves critical to correctly retrieving information from the analysis of pollutants of sanitary interest. Therefore, according to current criteria for network planning, single stations have been interpreted as nodes of a set of adjoining triangles; then, a) node pairs have been taken into account in order to estimate pollutant stationarity on triangle sides, as well as b) node triplets, to statistically associate data from air monitoring with the corresponding territory area, and c) node sextuplets, to assess the impact probability of the outdoor pollutant on the indoor environment for each area. The distributions from the various node combinations are all non-Gaussian; consequently, Kruskal-Wallis (KW) non-parametric statistics has been exploited to test variability of the continuous density function from each pair, triplet and sextuplet. Results from the above-mentioned statistical analysis have shown randomness of the site selection, which has not allowed a reliable generalization of the monitoring data to the entire selected territory, except for a single "forced" case (70%); most importantly, they suggest a possible procedure to optimize network design.
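
    The Kruskal-Wallis comparison at the heart of the record reduces, for any given node pair or triplet, to a call like the one below; the concentration series here are synthetic stand-ins for the seven monitoring stations.

```python
# Sketch: Kruskal-Wallis test of whether benzene concentration distributions differ
# across monitoring stations (synthetic series stand in for the real data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# three stations forming one "triangle": lognormal concentrations (arbitrary units)
station_a = rng.lognormal(mean=0.5, sigma=0.4, size=200)
station_b = rng.lognormal(mean=0.5, sigma=0.4, size=200)
station_c = rng.lognormal(mean=0.8, sigma=0.4, size=200)

h_pair, p_pair = stats.kruskal(station_a, station_b)             # node pair
h_tri,  p_tri  = stats.kruskal(station_a, station_b, station_c)  # node triplet
print(f"pair:    H={h_pair:.2f}, p={p_pair:.3f}")
print(f"triplet: H={h_tri:.2f}, p={p_tri:.3f}")
```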

  3. Short-term monitoring of benzene air concentration in an urban area: a preliminary study of application of Kruskal-Wallis non-parametric test to assess pollutant impact on global environment and indoor

    Directory of Open Access Journals (Sweden)

    Maria Chiara Mura

    2010-12-01

    Full Text Available. In step with the need to develop statistical procedures to manage small-size environmental samples, in this work we have used concentration values of benzene (C6H6), concurrently detected by seven outdoor and indoor monitoring stations over 12 000 minutes, in order to assess the representativeness of the collected data and the impact of the pollutant on the indoor environment. Clearly, the former issue is strictly connected to sampling-site geometry, which proves critical to correctly retrieving information from the analysis of pollutants of sanitary interest. Therefore, according to current criteria for network planning, single stations have been interpreted as nodes of a set of adjoining triangles; then, a) node pairs have been taken into account in order to estimate pollutant stationarity on triangle sides, as well as b) node triplets, to statistically associate data from air monitoring with the corresponding territory area, and c) node sextuplets, to assess the impact probability of the outdoor pollutant on the indoor environment for each area. The distributions from the various node combinations are all non-Gaussian; consequently, Kruskal-Wallis (KW) non-parametric statistics has been exploited to test variability of the continuous density function from each pair, triplet and sextuplet. Results from the above-mentioned statistical analysis have shown randomness of the site selection, which has not allowed a reliable generalization of the monitoring data to the entire selected territory, except for a single "forced" case (70%); most importantly, they suggest a possible procedure to optimize network design.

  4. Non-Parametric Inference in Astrophysics

    CERN Document Server

    Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; the PICA group

    2001-01-01

    We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.

  5. Non-parametric approach to the study of phenotypic stability.

    Science.gov (United States)

    Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P

    2016-02-19

    The aim of this study was to undertake the theoretical derivation of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods are extensions of different parametric methods used for stability analyses, and the results were compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.

  6. Non-parametric partitioning of SAR images

    Science.gov (United States)

    Delyon, G.; Galland, F.; Réfrégier, Ph.

    2006-09-01

    We describe and analyse a generalization of a parametric segmentation technique, adapted to Gamma-distributed SAR images, to a simple non-parametric noise model. The partition is obtained by minimizing the stochastic complexity of a version of the SAR image quantized to Q levels, and leads to a criterion with no parameters to be tuned by the user. We analyse the reliability of the proposed approach on synthetic images. The quality of the obtained partition is studied for different possible strategies. In particular, we discuss the reliability of the proposed optimization procedure. Finally, we study in detail the performance of the proposed approach in comparison with the statistical parametric technique adapted to Gamma noise. These studies are carried out by analyzing the number of misclassified pixels, the standard Hausdorff distance and the number of estimated regions.

  7. Estimation of the limit of detection with a bootstrap-derived standard error by a partly non-parametric approach. Application to HPLC drug assays

    DEFF Research Database (Denmark)

    Linnet, Kristian

    2005-01-01

    Keywords: Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors.
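
    The record gives only keywords, but the underlying idea, estimating a limit of detection from blank and low-level measurements and attaching a bootstrap-derived standard error, can be sketched as follows; the simple LoB/LoD formula and the synthetic measurements are assumptions for illustration, not the paper's exact partly non-parametric estimator.

```python
# Sketch: limit of detection (LoD) from blank and low-concentration HPLC samples,
# with a bootstrap-derived standard error (formula and data are illustrative).
import numpy as np

rng = np.random.default_rng(0)
blanks = rng.normal(0.02, 0.01, 30)        # blank signals
low    = rng.normal(0.10, 0.02, 30)        # low-concentration signals

def lod(blank, low_conc):
    lob = blank.mean() + 1.645 * blank.std(ddof=1)        # limit of blank
    return lob + 1.645 * low_conc.std(ddof=1)             # simple parametric LoD

point = lod(blanks, low)
boot = np.array([
    lod(rng.choice(blanks, blanks.size, replace=True),    # resample both samples
        rng.choice(low, low.size, replace=True))
    for _ in range(2000)
])
print(f"LoD = {point:.4f}   bootstrap SE = {boot.std(ddof=1):.4f}")
```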

  8. Parametric and Non-Parametric System Modelling

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg

    1999-01-01

    … considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well-known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how … For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods, a paper dealing with neural … the focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where, e.g., one or more non-parametric terms are added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients …

  9. Bayesian non parametric modelling of Higgs pair production

    Science.gov (United States)

    Scarpa, Bruno; Dorigo, Tommaso

    2017-03-01

    Statistical classification models are commonly used to separate a signal from a background. In this talk we face the problem of isolating the signal of Higgs pair production using the decay channel in which each boson decays into a pair of b-quarks. Typically in this context non parametric methods are used, such as Random Forests or different types of boosting tools. We remain in the same non-parametric framework, but we propose to face the problem following a Bayesian approach. A Dirichlet process is used as prior for the random effects in a logit model which is fitted by leveraging the Polya-Gamma data augmentation. Refinements of the model include the insertion in the simple model of P-splines to relate explanatory variables with the response and the use of Bayesian trees (BART) to describe the atoms in the Dirichlet process.

  10. Bayesian non parametric modelling of Higgs pair production

    Directory of Open Access Journals (Sweden)

    Scarpa Bruno

    2017-01-01

    Full Text Available Statistical classification models are commonly used to separate a signal from a background. In this talk we face the problem of isolating the signal of Higgs pair production using the decay channel in which each boson decays into a pair of b-quarks. Typically in this context non parametric methods are used, such as Random Forests or different types of boosting tools. We remain in the same non-parametric framework, but we propose to face the problem following a Bayesian approach. A Dirichlet process is used as prior for the random effects in a logit model which is fitted by leveraging the Polya-Gamma data augmentation. Refinements of the model include the insertion in the simple model of P-splines to relate explanatory variables with the response and the use of Bayesian trees (BART) to describe the atoms in the Dirichlet process.

  11. Parametric versus non-parametric simulation

    OpenAIRE

    Dupeux, Bérénice; Buysse, Jeroen

    2014-01-01

    Most of ex-ante impact assessment policy models have been based on a parametric approach. We develop a novel non-parametric approach, called Inverse DEA. We use non parametric efficiency analysis for determining the farm’s technology and behaviour. Then, we compare the parametric approach and the Inverse DEA models to a known data generating process. We use a bio-economic model as a data generating process reflecting a real world situation where often non-linear relationships exist. Results s...

  12. 100 statistical tests

    CERN Document Server

    Kanji, Gopal K

    2006-01-01

    This expanded and updated Third Edition of Gopal K. Kanji's best-selling resource on statistical tests covers all the most commonly used tests with information on how to calculate and interpret results with simple datasets. Each entry begins with a short summary statement about the test's purpose, and contains details of the test objective, the limitations (or assumptions) involved, a brief outline of the method, a worked example, and the numerical calculation. 100 Statistical Tests, Third Edition is the one indispensable guide for users of statistical materials and consumers of statistical information at all levels and across all disciplines.

  13. Non-parametric Morphologies of Mergers in the Illustris Simulation

    CERN Document Server

    Bignone, Lucas A; Sillero, Emanuel; Pedrosa, Susana E; Pellizza, Leonardo J; Lambas, Diego G

    2016-01-01

    We study non-parametric morphologies of merger events in a cosmological context, using the Illustris project. We produce mock g-band images comparable to observational surveys from the publicly available Illustris simulation idealized mock images at $z=0$. We then measure non-parametric indicators: asymmetry, Gini, $M_{20}$, clumpiness and concentration for a set of galaxies with $M_* >10^{10}$ M$_\odot$. We correlate these automatic statistics with the recent merger history of galaxies and with the presence of close companions. Our main contribution is to assess, in a cosmological framework, the empirically derived non-parametric demarcation line and average time-scales used to determine the merger rate observationally. We found that 98 per cent of galaxies above the demarcation line have a close companion or have experienced a recent merger event. On average, merger signatures obtained from the $G-M_{20}$ criteria anticorrelate clearly with the elapsed time since the last merger event. We also find that the a...

  14. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    2012-01-01

    by investigating the relationship between the elasticity of scale and the farm size. We use a balanced panel data set of 371~specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function are consistent with the "true......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...

  15. Log-concave Probability Distributions: Theory and Statistical Testing

    DEFF Research Database (Denmark)

    An, Mark Yuing

    1996-01-01

    This paper studies the broad class of log-concave probability distributions that arise in economics of uncertainty and information. For univariate, continuous, and log-concave random variables we prove useful properties without imposing the differentiability of density functions. Discrete...... and multivariate distributions are also discussed. We propose simple non-parametric testing procedures for log-concavity. The test statistics are constructed to test one of the two implications of log-concavity: increasing hazard rates and new-is-better-than-used (NBU) property. The test for increasing hazard...... rates are based on normalized spacing of the sample order statistics. The tests for NBU property fall into the category of Hoeffding's U-statistics...

  16. Comparação de duas metodologias de amostragem atmosférica com ferramenta estatística não paramétrica Comparison of two atmospheric sampling methodologies with non-parametric statistical tools

    Directory of Open Access Journals (Sweden)

    Maria João Nunes

    2005-03-01

    Full Text Available In atmospheric aerosol sampling, it is inevitable that the air that carries particles is in motion, as a result of both externally driven wind and the sucking action of the sampler itself. High or low air flow sampling speeds may lead to significant particle size bias. The objective of this work is the validation of measurements enabling the comparison of species concentration from both air flow sampling techniques. The presence of several outliers and increase of residuals with concentration becomes obvious, requiring non-parametric methods, recommended for the handling of data which may not be normally distributed. This way, conversion factors are obtained for each of the various species under study using Kendall regression.
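
    To make the kind of comparison described above concrete, the following sketch shows how a robust Kendall-type regression (here the Theil-Sen estimator in SciPy) can provide a conversion factor between two samplers without assuming normally distributed residuals. The paired concentrations are invented for illustration and are not the study's data.

        import numpy as np
        from scipy import stats

        # Hypothetical paired concentrations of one species measured simultaneously
        # by a high-flow and a low-flow sampler (arbitrary units).
        high_flow = np.array([1.2, 2.8, 3.1, 4.9, 6.0, 7.4, 9.8, 12.1])
        low_flow = np.array([1.0, 2.5, 2.9, 4.2, 5.1, 6.6, 8.7, 10.9])

        # Kendall's tau tests for a monotone association without distributional assumptions.
        tau, p_value = stats.kendalltau(high_flow, low_flow)

        # The Theil-Sen (median-of-slopes) estimator gives a conversion factor that is
        # robust to the outliers mentioned in the abstract.
        slope, intercept, lo_slope, up_slope = stats.theilslopes(low_flow, high_flow)

        print(f"Kendall tau = {tau:.2f} (p = {p_value:.3f})")
        print(f"conversion: low_flow ~ {slope:.2f} * high_flow + {intercept:.2f}")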

  17. Use of statistical tests and statistical software choice in 2014: tale from three Medline indexed Pakistani journals.

    Science.gov (United States)

    Shaikh, Masood Ali

    2016-04-01

    Statistical tests help infer meaningful conclusions from studies conducted and data collected. This descriptive study analyzed the type of statistical tests used and the statistical software utilized for analysis reported in the original articles published in 2014 by the three Medline-indexed journals of Pakistan. Cumulatively, 466 original articles were published in 2014. The most frequently reported statistical tests for original articles by all three journals were bivariate parametric and non-parametric tests i.e. involving comparisons between two groups e.g. Chi-square test, t-test, and various types of correlations. Cumulatively, 201 (43.1%) articles used these tests. SPSS was the primary choice for statistical analysis, as it was exclusively used in 374 (80.3%) original articles. There has been a substantial increase in the number of articles published, and in the sophistication of statistical tests used in the articles published in the Pakistani Medline indexed journals in 2014, compared to 2007.

  18. Nonparametric tests for censored data

    CERN Document Server

    Bagdonavicus, Vilijandas; Nikulin, Mikhail

    2013-01-01

    This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved, and real applications are illustrated using examples. Theories and exercises are provided. The incorrect use of many tests when applying most statistical software packages is highlighted and discussed.

  19. Non-Parametric Estimation of Correlation Functions

    DEFF Research Database (Denmark)

    Brincker, Rune; Rytter, Anders; Krenk, Steen

    In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed...... out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise....
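
    As a brief illustration of two of the estimators discussed above, the sketch below computes a biased auto-correlation estimate directly and via the FFT; zero-padding before the FFT avoids the wrap-around (circular correlation) bias. The white-noise response is simulated and the normalisation is one common convention, not necessarily the one used in the paper.

        import numpy as np

        def autocorr_direct(x, max_lag):
            """Direct (sum-of-products) estimate of the auto-correlation function."""
            x = x - x.mean()
            n = len(x)
            return np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(max_lag + 1)])

        def autocorr_fft(x, max_lag):
            """FFT-based estimate; zero-padding prevents wrap-around (circular) bias."""
            x = x - x.mean()
            n = len(x)
            nfft = 2 ** int(np.ceil(np.log2(2 * n)))
            spec = np.fft.rfft(x, nfft)
            return np.fft.irfft(spec * np.conj(spec))[:max_lag + 1] / n

        rng = np.random.default_rng(0)
        x = rng.standard_normal(4096)  # stand-in for a white-noise-loaded response
        print(np.allclose(autocorr_direct(x, 20), autocorr_fft(x, 20)))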

  20. Lottery spending: a non-parametric analysis.

    Science.gov (United States)

    Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody

    2015-01-01

    We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.

  1. Lottery spending: a non-parametric analysis.

    Directory of Open Access Journals (Sweden)

    Skip Garibaldi

    Full Text Available We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.

  2. Testing statistical hypotheses

    CERN Document Server

    Lehmann, E L

    2005-01-01

    The third edition of Testing Statistical Hypotheses updates and expands upon the classic graduate text, emphasizing optimality theory for hypothesis testing and confidence sets. The principal additions include a rigorous treatment of large sample optimality, together with the requisite tools. In addition, an introduction to the theory of resampling methods such as the bootstrap is developed. The sections on multiple testing and goodness of fit testing are expanded. The text is suitable for Ph.D. students in statistics and includes over 300 new problems out of a total of more than 760. E.L. Lehmann is Professor of Statistics Emeritus at the University of California, Berkeley. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences, and the recipient of honorary degrees from the University of Leiden, The Netherlands and the University of Chicago. He is the author of Elements of Large-Sample Theory and (with George Casella) he is also the author of Theory of Point Estimat...

  3. A non-parametric peak finder algorithm and its application in searches for new physics

    CERN Document Server

    Chekanov, S

    2011-01-01

    We have developed an algorithm for non-parametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm for analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using pp Monte Carlo simulations.

  4. On Parametric (and Non-Parametric Variation

    Directory of Open Access Journals (Sweden)

    Neil Smith

    2009-11-01

    Full Text Available This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.

  5. Mathematical statistics

    CERN Document Server

    Pestman, Wiebe R

    2009-01-01

    This textbook provides a broad and solid introduction to mathematical statistics, including the classical subjects hypothesis testing, normal regression analysis, and normal analysis of variance. In addition, non-parametric statistics and vectorial statistics are considered, as well as applications of stochastic analysis in modern statistics, e.g., Kolmogorov-Smirnov testing, smoothing techniques, robustness and density estimation. For students with some elementary mathematical background. With many exercises. Prerequisites from measure theory and linear algebra are presented.

  6. Non-parametric estimation of Fisher information from real data

    CERN Document Server

    Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A

    2015-01-01

    The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...

  7. Comparison of non-parametric methods for ungrouping coarsely aggregated data

    DEFF Research Database (Denmark)

    Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda

    2016-01-01

    Background Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-years age classes with an open-ended age...... methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate...... composite link model performs the best. Conclusion We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized...

  8. A non-parametric method for correction of global radiation observations

    DEFF Research Database (Denmark)

    Bacher, Peder; Madsen, Henrik; Perers, Bengt;

    2013-01-01

    in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...

  9. Binary Classifier Calibration Using a Bayesian Non-Parametric Approach.

    Science.gov (United States)

    Naeini, Mahdi Pakdaman; Cooper, Gregory F; Hauskrecht, Milos

    Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in Data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on the Bayes optimal selection and a method based on the Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.

  10. Non-Parametric Statistical Methods and Data Transformations in Agricultural Pest Population Studies Métodos Estadísticos no Paramétricos y Transformaciones de Datos en Estudios de Poblaciones de Plagas Agrícolas

    Directory of Open Access Journals (Sweden)

    Alcides Cabrera Campos

    2012-09-01

    Full Text Available Analyzing data from agricultural pest populations regularly reveals that they do not fulfill the theoretical requirements for classical ANOVA. Box-Cox transformations and non-parametric statistical methods are commonly used as alternatives to solve this problem. In this paper, we describe the results of applying these techniques to data from Thrips palmi Karny sampled in potato (Solanum tuberosum L.) plantations. The X² test was used for goodness-of-fit to the negative binomial distribution and as a test of independence to investigate the relationship between plant strata and insect stages. Seven data transformations were also applied to meet the requirements of classical ANOVA, but they failed to eliminate the relationship between the mean and the variance. Given this negative result, comparisons between insect population densities were made using the non-parametric Kruskal-Wallis ANOVA test. Results from this analysis allowed the insect larval stage and the middle plant stratum to be selected as keys for designing pest sampling plans.
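
    As a generic illustration of the Kruskal-Wallis comparison mentioned above (the counts are invented, not the Thrips data), the test can compare insect densities across plant strata without assuming normality or homogeneous variances:

        from scipy import stats

        # Hypothetical insect counts per leaf in three plant strata.
        lower_stratum = [0, 1, 0, 2, 1, 0, 3, 1]
        middle_stratum = [4, 6, 5, 9, 7, 8, 6, 10]
        upper_stratum = [1, 2, 2, 3, 1, 4, 2, 3]

        h_stat, p_value = stats.kruskal(lower_stratum, middle_stratum, upper_stratum)
        print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.4f}")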

  11. A non-parametric meta-analysis approach for combining independent microarray datasets: application using two microarray datasets pertaining to chronic allograft nephropathy

    Directory of Open Access Journals (Sweden)

    Archer Kellie J

    2008-02-01

    Full Text Available Abstract Background With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN to those with normal functioning allograft. Results The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. Conclusion We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been

  12. A non-parametric meta-analysis approach for combining independent microarray datasets: application using two microarray datasets pertaining to chronic allograft nephropathy.

    Science.gov (United States)

    Kong, Xiangrong; Mas, Valeria; Archer, Kellie J

    2008-02-26

    With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been reported to be relevant to renal diseases. Further study on the

  13. Non-parametric frequency analysis of extreme values for integrated disaster management considering probable maximum events

    Science.gov (United States)

    Takara, K. T.

    2015-12-01

    This paper describes a non-parametric frequency analysis method for hydrological extreme-value samples with a size larger than 100, verifying the estimation accuracy with computer intensive statistics (CIS) resampling such as the bootstrap. Probable maximum values are also incorporated into the analysis for extreme events larger than a design level of flood control. Traditional parametric frequency analysis methods for extreme values include the following steps: Step 1: Collecting and checking extreme-value data; Step 2: Enumerating probability distributions that would fit the data well; Step 3: Parameter estimation; Step 4: Testing goodness of fit; Step 5: Checking the variability of quantile (T-year event) estimates by the jackknife resampling method; and Step 6: Selection of the best distribution (final model). The non-parametric method (NPM) proposed here can skip Steps 2, 3, 4 and 6. Comparing traditional parametric methods (PM) with the NPM, this paper shows that PM often underestimates 100-year quantiles for annual maximum rainfall samples with records of more than 100 years. Overestimation examples are also demonstrated. The bootstrap resampling can correct the bias of the NPM and can also give the estimation accuracy as the bootstrap standard error. The NPM has the advantage of avoiding various difficulties in the above-mentioned steps of the traditional PM. Probable maximum events are also incorporated into the NPM as an upper bound of the hydrological variable. Probable maximum precipitation (PMP) and probable maximum flood (PMF) can be new parameter values combined with the NPM. An idea of how to incorporate these values into frequency analysis is proposed for better management of disasters that exceed the design level. The idea stimulates a more integrated approach by geoscientists and statisticians, and encourages practitioners to consider the worst cases of disasters in their disaster management planning and practices.
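
    A minimal sketch of the non-parametric idea described above, assuming only a sample of annual maxima: the T-year quantile is read from the empirical distribution and its accuracy is assessed by bootstrap resampling. The data are synthetic and the quantile definition is one of several possible plotting conventions.

        import numpy as np

        rng = np.random.default_rng(1)
        annual_max = rng.gumbel(loc=100.0, scale=30.0, size=120)  # synthetic annual maxima

        def t_year_quantile(sample, t_years):
            """Empirical (non-parametric) estimate of the T-year quantile."""
            return np.quantile(sample, 1.0 - 1.0 / t_years)

        estimate = t_year_quantile(annual_max, 100)

        # Bootstrap resampling gives the estimation accuracy as a standard error.
        boot = np.array([
            t_year_quantile(rng.choice(annual_max, size=annual_max.size, replace=True), 100)
            for _ in range(2000)
        ])
        print(f"100-year estimate: {estimate:.1f} (bootstrap SE {boot.std(ddof=1):.1f})")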

  14. A non-parametric approach to investigating fish population dynamics

    National Research Council Canada - National Science Library

    Cook, R.M; Fryer, R.J

    2001-01-01

    .... Using a non-parametric model for the stock-recruitment relationship it is possible to avoid defining specific functions relating recruitment to stock size while also providing a natural framework to model process error...

  15. Comparison of reliability techniques of parametric and non-parametric method

    Directory of Open Access Journals (Sweden)

    C. Kalaiselvan

    2016-06-01

    Full Text Available Reliability of a product or system is the probability that the product performs its intended function adequately for the stated period of time under stated operating conditions; it is a function of time. The widely used nano ceramic capacitors C0G and X7R are used in this reliability study to generate the time-to-failure (TTF) data. The time-to-failure data are identified by Accelerated Life Testing (ALT) and Highly Accelerated Life Testing (HALT). The tests are conducted at high stress levels to generate more failures within a short interval of time. The reliability methods used to convert accelerated conditions to actual conditions are the parametric method and the non-parametric method. In this paper, a comparative study of the parametric and non-parametric methods has been carried out on the failure data. The Weibull distribution is used for the parametric method; the Kaplan-Meier and Simple Actuarial Methods are used for the non-parametric method. The mean time to failure (MTTF) identified under accelerated conditions is similar for the parametric and non-parametric methods, with some relative deviation.
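
    To make the non-parametric side concrete, here is a small hand-rolled Kaplan-Meier estimator applied to made-up time-to-failure data (a real analysis would typically use a dedicated reliability or survival package, and the Weibull fit of the parametric side is not shown):

        import numpy as np

        def kaplan_meier(times, failed):
            """Kaplan-Meier survival estimate; `failed` is 1 for a failure, 0 for censoring."""
            times = np.asarray(times, dtype=float)
            failed = np.asarray(failed, dtype=int)
            order = np.argsort(times)
            times, failed = times[order], failed[order]
            at_risk = len(times)
            curve, s = [], 1.0
            for t, d in zip(times, failed):
                if d:  # the survival curve steps down only at observed failures
                    s *= (at_risk - 1) / at_risk
                curve.append((t, s))
                at_risk -= 1
            return curve

        # Hypothetical capacitor failure times (hours); status 0 marks a censored unit.
        ttf = [120, 340, 560, 610, 780, 900, 1100, 1200]
        status = [1, 1, 1, 0, 1, 1, 0, 1]
        for t, s in kaplan_meier(ttf, status):
            print(f"t = {t:6.0f} h   S(t) = {s:.3f}")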

  16. Trend Analysis of Golestan's Rivers Discharges Using Parametric and Non-parametric Methods

    Science.gov (United States)

    Mosaedi, Abolfazl; Kouhestani, Nasrin

    2010-05-01

    One of the major problems in human life is climate change and its consequences. Climate change will cause changes in river discharges. The aim of this research is to investigate trends in the seasonal and yearly river discharges of Golestan province (Iran). Four trend analysis methods, including the conjunction point, linear regression, Wald-Wolfowitz and Mann-Kendall methods, were applied to river discharges over seasonal and annual periods at significance levels of 95% and 99%. First, daily discharge data of 12 hydrometric stations with a record length of 42 years (1965-2007) were selected; after some common statistical tests, such as homogeneity tests (applying the G-B and M-W tests), the four mentioned trend analysis tests were applied. Results show that for the summer time series at all stations there are decreasing trends at the 99% significance level according to the Mann-Kendall (M-K) test. For the autumn time series, all four methods give similar results. For the other periods, the results of the four tests were more or less similar, while for some stations the results differed. Keywords: Trend Analysis, Discharge, Non-parametric methods, Wald-Wolfowitz, The Mann-Kendall test, Golestan Province.
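
    For readers unfamiliar with the Mann-Kendall (M-K) test, the sketch below implements its basic form (no correction for ties or serial correlation) on a simulated declining discharge series; it is illustrative only and does not reproduce the Golestan analysis.

        import numpy as np
        from scipy import stats

        def mann_kendall(x):
            """Basic Mann-Kendall trend test (no tie or autocorrelation correction)."""
            x = np.asarray(x, dtype=float)
            n = len(x)
            # S counts concordant minus discordant pairs over all i < j.
            s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
            var_s = n * (n - 1) * (2 * n + 5) / 18.0
            z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
            p = 2 * (1 - stats.norm.cdf(abs(z)))  # two-sided p-value
            return s, z, p

        rng = np.random.default_rng(2)
        years = np.arange(1965, 2008)
        discharge = 50 - 0.3 * (years - 1965) + rng.normal(0, 4, size=years.size)
        s, z, p = mann_kendall(discharge)
        print(f"S = {s:.0f}, Z = {z:.2f}, p = {p:.4f}")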

  17. Statistical Test for Bivariate Uniformity

    Directory of Open Access Journals (Sweden)

    Zhenmin Chen

    2014-01-01

    Full Text Available The purpose of the multidimensional uniformity test is to check whether the underlying probability distribution of a multidimensional population differs from the multidimensional uniform distribution. The multidimensional uniformity test has applications in various fields such as biology, astronomy, and computer science. Such a test, however, has received less attention in the literature compared with the univariate case. A new test statistic for checking multidimensional uniformity is proposed in this paper. Some important properties of the proposed test statistic are discussed. As a special case, the bivariate test is discussed in detail in this paper. The Monte Carlo simulation is used to compare the power of the newly proposed test with the distance-to-boundary test, which is a recently published statistical test for multidimensional uniformity. It has been shown that the test proposed in this paper is more powerful than the distance-to-boundary test in some cases.

  18. Evaluation of model-based versus non-parametric monaural noise-reduction approaches for hearing aids.

    Science.gov (United States)

    Harlander, Niklas; Rosenkranz, Tobias; Hohmann, Volker

    2012-08-01

    Single-channel noise reduction has been well investigated and seems to have reached its limits in terms of speech intelligibility improvement; however, the quality of such schemes can still be advanced. This study tests to what extent novel model-based processing schemes might improve performance, in particular for non-stationary noise conditions. Two prototype model-based algorithms, a speech-model-based and an auditory-model-based algorithm, were compared to a state-of-the-art non-parametric minimum statistics algorithm. A speech intelligibility test, preference rating, and listening effort scaling were performed. Additionally, three objective quality measures for the signal, background, and overall distortions were applied. For a better comparison of all algorithms, particular attention was given to the use of a similar Wiener-based gain rule. The perceptual investigation was performed with fourteen hearing-impaired subjects. The results revealed that the non-parametric algorithm and the auditory-model-based algorithm did not affect speech intelligibility, whereas the speech-model-based algorithm slightly decreased intelligibility. In terms of subjective quality, both model-based algorithms perform better than the unprocessed condition and the reference, in particular for highly non-stationary noise environments. The data support the hypothesis that model-based algorithms are promising for improving performance in non-stationary noise conditions.

  19. Variable selection in identification of a high dimensional nonlinear non-parametric system

    Institute of Scientific and Technical Information of China (English)

    Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG

    2015-01-01

    The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.

  20. Measuring the influence of information networks on transaction costs using a non-parametric regression technique

    DEFF Research Database (Denmark)

    Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.

    All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access to no...... are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....

  1. Further Research into a Non-Parametric Statistical Screening System.

    Science.gov (United States)

    1979-12-14

    Let X1 = 0 if birth weight is low and 1 if birth weight is high, and let X2 = 0 if gestation length is short and 1 if gestation length is long. Normal babies have either high birth weight and long gestation length (1, 1) or low birth weight and short gestation length (0, 0). Abnormal babies have either of the other two combinations ((0, 1) or (1, 0)). The LDF

  2. A New Non-Parametric Approach to Galaxy Morphological Classification

    CERN Document Server

    Lotz, J M; Madau, P; Lotz, Jennifer M.; Primack, Joel; Madau, Piero

    2003-01-01

    We present two new non-parametric methods for quantifying galaxy morphology: the relative distribution of the galaxy pixel flux values (the Gini coefficient or G) and the second-order moment of the brightest 20% of the galaxy's flux (M20). We test the robustness of G and M20 to decreasing signal-to-noise and spatial resolution, and find that both measures are reliable to within 10% at average signal-to-noise per pixel greater than 3 and resolutions better than 1000 pc and 500 pc, respectively. We have measured G and M20, as well as concentration (C), asymmetry (A), and clumpiness (S), in the rest-frame near-ultraviolet/optical wavelengths for 150 bright local "normal" Hubble-type galaxies (E-Sd) and 104 0.05 < z < 0.25 ultra-luminous infrared galaxies (ULIRGs). We find that most local galaxies follow a tight sequence in G-M20-C, where early types have high G and C and low M20 and late-type spirals have lower G and C and higher M20. The majority of ULIRGs lie above the normal galaxy G-M20 sequence...
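
    A schematic computation of the two statistics defined above, under the simplifying assumptions that the galaxy pixels are already isolated by a segmentation mask and that the flux-weighted centroid is an adequate centre (the published measurement minimises the total moment over the centre position); it is a sketch, not the authors' code.

        import numpy as np

        def gini(flux):
            """Gini coefficient of the pixel flux values within the galaxy mask."""
            f = np.sort(np.abs(np.ravel(flux)))
            n = f.size
            i = np.arange(1, n + 1)
            return np.sum((2 * i - n - 1) * f) / (f.mean() * n * (n - 1))

        def m20(flux, x, y):
            """Second-order moment of the brightest 20% of the flux, normalised by M_tot."""
            order = np.argsort(flux)[::-1]
            f, xs, ys = flux[order], x[order], y[order]
            xc, yc = np.average(xs, weights=f), np.average(ys, weights=f)
            mu = f * ((xs - xc) ** 2 + (ys - yc) ** 2)
            bright = np.cumsum(f) <= 0.2 * f.sum()
            return np.log10(mu[bright].sum() / mu.sum())

        # Toy image: a single bright clump on a faint background.
        img = np.zeros((32, 32)) + 0.01
        img[14:18, 14:18] = 1.0
        yy, xx = np.indices(img.shape)
        mask = img > 0.05
        print(gini(img[mask]), m20(img[mask], xx[mask], yy[mask]))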

  3. Non-Parametric Bayesian Areal Linguistics

    CERN Document Server

    Daumé, Hal

    2009-01-01

    We describe a statistical model over linguistic areas and phylogeny. Our model recovers known areas and identifies a plausible hierarchy of areal features. The use of areas improves genetic reconstruction of languages both qualitatively and quantitatively according to a variety of metrics. We model linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's coalescent.

  4. rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

    Science.gov (United States)

    Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui

    2015-07-01

    High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to assess statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package with its source code and documentation is freely available at http://www-personal.umich.edu/∼jianghui/rseqnp/. jianghui@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
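
    The rSeqNP package itself implements the approach described above; the snippet below is only a generic illustration of the permutation idea it relies on, applied to invented expression values for a single gene in two conditions.

        import numpy as np

        rng = np.random.default_rng(3)

        # Hypothetical normalised expression values of one gene in two conditions.
        control = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 5.2])
        treatment = np.array([6.0, 6.3, 5.8, 6.1, 6.4, 5.9])

        observed = treatment.mean() - control.mean()
        pooled = np.concatenate([control, treatment])
        n_ctrl = control.size

        n_perm = 10000
        null = np.empty(n_perm)
        for b in range(n_perm):
            perm = rng.permutation(pooled)  # shuffle the group labels
            null[b] = perm[n_ctrl:].mean() - perm[:n_ctrl].mean()

        # Two-sided permutation p-value (+1 correction avoids a p-value of zero).
        p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
        print(f"observed difference = {observed:.2f}, permutation p = {p_value:.4f}")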

  5. Testing statistical hypotheses of equivalence

    CERN Document Server

    Wellek, Stefan

    2010-01-01

    Equivalence testing has grown significantly in importance over the last two decades, especially as its relevance to a variety of applications has become understood. Yet published work on the general methodology remains scattered in specialists' journals, and for the most part, it focuses on the relatively narrow topic of bioequivalence assessment. With a far broader perspective, Testing Statistical Hypotheses of Equivalence provides the first comprehensive treatment of statistical equivalence testing. The author addresses a spectrum of specific, two-sided equivalence testing problems, from the

  6. Non-parametric analysis of rating transition and default data

    DEFF Research Database (Denmark)

    Fledelius, Peter; Lando, David; Perch Nielsen, Jens

    2004-01-01

    We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move b...... but that this dependence vanishes after 2-3 years....

  7. A non-parametric model for the cosmic velocity field

    NARCIS (Netherlands)

    Branchini, E; Teodoro, L; Frenk, CS; Schmoldt, [No Value; Efstathiou, G; White, SDM; Saunders, W; Sutherland, W; Rowan-Robinson, M; Keeble, O; Tadros, H; Maddox, S; Oliver, S

    1999-01-01

    We present a self-consistent non-parametric model of the local cosmic velocity field derived from the distribution of IRAS galaxies in the PSCz redshift survey. The survey has been analysed using two independent methods, both based on the assumptions of gravitational instability and linear biasing.

  8. Non-parametric Bayesian inference for inhomogeneous Markov point processes

    DEFF Research Database (Denmark)

    Berthelsen, Kasper Klitgaard; Møller, Jesper

    With reference to a specific data set, we consider how to perform a flexible non-parametric Bayesian analysis of an inhomogeneous point pattern modelled by a Markov point process, with a location dependent first order term and pairwise interaction only. A priori we assume that the first order term...

  9. Non-parametric analysis of rating transition and default data

    DEFF Research Database (Denmark)

    Fledelius, Peter; Lando, David; Perch Nielsen, Jens

    2004-01-01

    We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move...

  10. Parametric and Non-Parametric System Modelling

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg

    1999-01-01

    other aspects, the properties of a method for parameter estimation in stochastic differential equations are considered within the field of heat dynamics of buildings. In the second paper a lack-of-fit test for stochastic differential equations is presented. The test can be applied to both linear and non-linear...... networks is included. In this paper, neural networks are used for predicting the electricity production of a wind farm. The results are compared with results obtained using an adaptively estimated ARX-model. Finally, two papers on stochastic differential equations are included. In the first paper, among...... stochastic differential equations. Some applications are presented in the papers. In the summary report references are made to a number of other applications. Danish summary: This thesis consists of ten articles published in the period 1996-1999, together with a summary and a perspective on them. ...

  11. (Errors in statistical tests)³

    Directory of Open Access Journals (Sweden)

    Kaufman Jay S

    2008-07-01

    Full Text Available Abstract In 2004, Garcia-Berthou and Alcaraz published "Incongruence between test statistics and P values in medical papers," a critique of statistical errors that received a tremendous amount of attention. One of their observations was that the final reported digit of p-values in articles published in the journal Nature departed substantially from the uniform distribution that they suggested should be expected. In 2006, Jeng critiqued that critique, observing that the statistical analysis of those terminal digits had been based on comparing the actual distribution to a uniform continuous distribution, when digits obviously are discretely distributed. Jeng corrected the calculation and reported statistics that did not so clearly support the claim of a digit preference. However delightful it may be to read a critique of statistical errors in a critique of statistical errors, we nevertheless found several aspects of the whole exchange to be quite troubling, prompting our own meta-critique of the analysis. The previous discussion emphasized statistical significance testing. But there are various reasons to expect departure from the uniform distribution in terminal digits of p-values, so that simply rejecting the null hypothesis is not terribly informative. Much more importantly, Jeng found that the original p-value of 0.043 should have been 0.086, and suggested this represented an important difference because it was on the other side of 0.05. Among the most widely reiterated (though often ignored) tenets of modern quantitative research methods is that we should not treat statistical significance as a bright line test of whether we have observed a phenomenon. Moreover, it sends the wrong message about the role of statistics to suggest that a result should be dismissed because of limited statistical precision when it is so easy to gather more data. In response to these limitations, we gathered more data to improve the statistical precision, and
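
    To illustrate the discrete comparison at issue (with fabricated digits rather than the Nature data), the terminal digits of reported p-values can be tested against a discrete uniform distribution over the ten digits with a chi-square goodness-of-fit test:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(4)

        # Fabricated final digits of reported p-values (0-9); a real analysis would
        # extract them from published articles.
        digits = rng.integers(0, 10, size=500)

        observed = np.bincount(digits, minlength=10)
        expected = np.full(10, digits.size / 10.0)  # discrete uniform over ten digits

        chi2, p_value = stats.chisquare(observed, expected)
        print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")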

  12. Comparison of non-parametric methods for ungrouping coarsely aggregated data

    Directory of Open Access Journals (Sweden)

    Silvia Rizzi

    2016-05-01

    Full Text Available Abstract Background Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-years age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions; can handle unequal interval length; and allow stretches of 0 counts. Results The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-years age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. Conclusion We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.
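
    As a hedged illustration of the spline-interpolation idea (one of the five methods compared; the recommended penalized composite link model is not shown), grouped counts can be ungrouped by interpolating the cumulative counts with a monotone spline and differencing. The age grouping below is invented.

        import numpy as np
        from scipy.interpolate import PchipInterpolator

        # Hypothetical death counts in 5-year age classes 50-54, 55-59, ..., 80-84.
        edges = np.array([50, 55, 60, 65, 70, 75, 80, 85])  # class boundaries
        counts = np.array([120, 180, 260, 390, 520, 610, 580])

        # Interpolate the cumulative counts with a shape-preserving (monotone) spline,
        # then difference to recover approximate single-year-of-age counts.
        cum = np.concatenate([[0], np.cumsum(counts)])
        spline = PchipInterpolator(edges, cum)
        ungrouped = np.diff(spline(np.arange(50, 86)))

        for age, count in zip(np.arange(50, 85), ungrouped):
            print(f"age {age}: {count:.1f}")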

  13. Non-Parametric Evolutionary Algorithm for Estimating Root Zone Soil Moisture

    Science.gov (United States)

    Mohanty, B.; Shin, Y.; Ines, A. M.

    2013-12-01

    Prediction of root zone soil moisture is critical for water resources management. In this study, we explored a non-parametric evolutionary algorithm for estimating root zone soil moisture from a time series of spatially-distributed rainfall across multiple weather locations under two different hydro-climatic regions. A new genetic algorithm-based hidden Markov model (HMMGA) was developed to estimate long-term root zone soil moisture dynamics at different soil depths. Also, we analyzed rainfall occurrence probabilities and dry/wet spell lengths reproduced by this approach. The HMMGA was used to estimate the optimal state sequences (weather states) based on the precipitation history. Historical root zone soil moisture statistics were then determined based on the weather state conditions. To test the new approach, we selected two different soil moisture fields, Oklahoma (130 km x 130 km) and Illinois (300 km x 500 km), during 1995 to 2009 and 1994 to 2010, respectively. We found that the newly developed framework performed well in predicting root zone soil moisture dynamics at both the spatial scales. Also, the reproduced rainfall occurrence probabilities and dry/wet spell lengths matched well with the observations at the spatio-temporal scales. Since the proposed algorithm requires only precipitation and historical soil moisture data from existing, established weather stations, it can serve as an attractive alternative for predicting root zone soil moisture in the future using climate change scenarios and root zone soil moisture history.

  14. A Non-Parametric and Entropy Based Analysis of the Relationship between the VIX and S&P 500

    Directory of Open Access Journals (Sweden)

    Abhay K. Singh

    2013-10-01

    Full Text Available This paper features an analysis of the relationship between the S&P 500 Index and the VIX using daily data obtained from the CBOE website and SIRCA (The Securities Industry Research Centre of the Asia Pacific. We explore the relationship between the S&P 500 daily return series and a similar series for the VIX in terms of a long sample drawn from the CBOE from 1990 to mid 2011 and a set of returns from SIRCA’s TRTH datasets from March 2005 to date. This shorter sample, which captures the behavior of the new VIX, introduced in 2003, is divided into four sub-samples which permit the exploration of the impact of the Global Financial Crisis. We apply a series of non-parametric based tests utilizing entropy based metrics. These suggest that the PDFs and CDFs of these two return distributions change shape in various subsample periods. The entropy and MI statistics suggest that the degree of uncertainty attached to these distributions changes through time and, using the S&P 500 return as the dependent variable, that the amount of information obtained from the VIX changes with time and reaches a relative maximum in the most recent period from 2011 to 2012. The entropy based non-parametric tests of the equivalence of the two distributions and their symmetry all strongly reject their respective nulls. The results suggest that parametric techniques do not adequately capture the complexities displayed in the behavior of these series. This has practical implications for hedging utilizing derivatives written on the VIX.
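
    A rough sketch of the histogram-based entropy and mutual information estimates referred to above, using simulated returns in place of the CBOE/SIRCA data; the bin count is arbitrary and more refined estimators exist.

        import numpy as np

        rng = np.random.default_rng(5)
        sp500 = rng.standard_normal(2500) * 0.01  # simulated daily returns
        vix = -0.7 * sp500 + rng.standard_normal(2500) * 0.008  # negatively related series

        def entropy(p):
            p = p[p > 0]
            return -np.sum(p * np.log(p))

        # Plug-in (histogram) estimates of the marginal and joint entropies.
        joint, _, _ = np.histogram2d(sp500, vix, bins=20)
        p_xy = joint / joint.sum()
        p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

        mutual_information = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
        print(f"MI estimate (nats): {mutual_information:.3f}")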

  15. Non-parametric versus parametric methods in environmental sciences

    Directory of Open Access Journals (Sweden)

    Muhammad Riaz

    2016-01-01

    Full Text Available This report highlights the importance of considering the background assumptions required for the analysis of real datasets in different disciplines. We provide a comparative discussion of parametric methods (which depend on distributional assumptions, such as normality) relative to non-parametric methods (which are free from many distributional assumptions). We have chosen a real dataset from environmental sciences (one of the application areas). The findings may be extended to other disciplines in the same spirit.
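
    In the spirit of the comparison described above, the sketch below analyses the same two skewed samples with a parametric two-sample t-test and with the non-parametric Mann-Whitney U test; the data are simulated, not the environmental dataset used in the report.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(6)

        # Two skewed "concentration" samples with a modest difference in location.
        site_a = rng.lognormal(mean=1.0, sigma=0.8, size=40)
        site_b = rng.lognormal(mean=1.4, sigma=0.8, size=40)

        t_stat, t_p = stats.ttest_ind(site_a, site_b, equal_var=False)  # parametric (Welch)
        u_stat, u_p = stats.mannwhitneyu(site_a, site_b, alternative="two-sided")

        print(f"Welch t-test:   p = {t_p:.3f}")
        print(f"Mann-Whitney U: p = {u_p:.3f}")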

  16. Monotonicity of chi-square test statistics

    OpenAIRE

    Ryu, Keunkwan

    2003-01-01

    This paper establishes monotonicity of the chi-square test statistic. As the more efficient parameter estimator is plugged into the test statistic, the degrees of freedom of the resulting chi-square test statistic monotonically increase.

  17. Non-parametric methods – Tree and P-CFA – for the ecological evaluation and assessment of suitable aquatic habitats: A contribution to fish psychology

    Directory of Open Access Journals (Sweden)

    Andreas H. Melcher

    2012-09-01

    Full Text Available This study analyses the multidimensional spawning habitat suitability of the fish species "Nase" (Latin: Chondrostoma nasus). This is the first time non-parametric methods were used to better understand biotic habitat use in theory and practice. In particular, we tested (1) the Decision Tree technique, Chi-squared Automatic Interaction Detectors (CHAID), to identify specific habitat types and (2) Prediction-Configural Frequency Analysis (P-CFA) to test for statistical significance. The combination of both non-parametric methods, CHAID and P-CFA, enabled the identification, prediction and interpretation of the most typical significant spawning habitats, and we were also able to determine non-typical habitat types, e.g., types in contrast to antitypes. The gradual combination of these two methods underlined three significant habitat types: shaded habitat, and fine and coarse substrate habitats depending on high flow velocity. The study affirmed the importance for fish species of shading and riparian vegetation along river banks. In addition, this method provides a weighting of interactions between specific habitat characteristics. The results demonstrate that efficient river restoration requires re-establishing riparian vegetation as well as the open river continuum and hydro-morphological improvements to habitats.

  18. Non-parametric change-point method for differential gene expression detection.

    Directory of Open Access Journals (Sweden)

    Yao Wang

    Full Text Available BACKGROUND: We propose a non-parametric method, named Non-Parametric Change Point Statistic (NPCPS for short), that uses a single equation for detecting differential gene expression (DGE) in microarray data. NPCPS is based on change point theory to provide effective DGE detection. METHODOLOGY: NPCPS uses the data distribution of the normal samples as input, and detects DGE in the cancer samples by locating the change point of the gene expression profile. An estimate of the change point position generated by NPCPS enables the identification of the samples containing DGE. Monte Carlo simulation and a ROC study were applied to examine the detection accuracy of NPCPS, and an experiment on real microarray data of breast cancer was carried out to compare NPCPS with other methods. CONCLUSIONS: The simulation study indicated that NPCPS was more effective for detecting DGE in the cancer subset compared with five parametric methods and one non-parametric method. When there were more than 8 cancer samples containing DGE, the type I error of NPCPS was below 0.01. Experiment results showed both good accuracy and reliability of NPCPS. Out of the 30 top genes ranked by NPCPS, 16 genes were reported as relevant to cancer. Correlations between the detection results of NPCPS and the compared methods were less than 0.05, while between the other methods the values were from 0.20 to 0.84. This indicates that NPCPS works on different features and thus provides DGE identification from a distinct perspective compared with the other mean- or median-based methods.

  19. Digital spectral analysis parametric, non-parametric and advanced methods

    CERN Document Server

    Castanié, Francis

    2013-01-01

    Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature. The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models. An entire chapter is devoted to the non-parametric methods most widely used in industry. High resolution methods a
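
    A brief example, not drawn from the book, of the most common non-parametric spectral estimate it covers: Welch's averaged periodogram applied to a simulated sinusoid in noise.

        import numpy as np
        from scipy.signal import welch

        fs = 1000.0  # sampling frequency in Hz
        t = np.arange(0, 2.0, 1.0 / fs)
        rng = np.random.default_rng(7)
        x = np.sin(2 * np.pi * 50 * t) + 0.5 * rng.standard_normal(t.size)

        # Welch's method: split the record into overlapping segments, window each one,
        # and average the periodograms to trade frequency resolution for lower variance.
        freqs, psd = welch(x, fs=fs, nperseg=512)
        print(f"spectral peak near {freqs[np.argmax(psd)]:.1f} Hz")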

  20. Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data.

    Science.gov (United States)

    Tan, Qihua; Thomassen, Mads; Burton, Mark; Mose, Kristian Fredløv; Andersen, Klaus Ejner; Hjelmborg, Jacob; Kruse, Torben

    2017-06-06

    Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis could be a useful and efficient tool for analyzing microarray time-course data and for exploring the complex relationships in the omics data for studying their association with disease and health.

  1. Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data

    DEFF Research Database (Denmark)

    Tan, Qihua; Thomassen, Mads; Burton, Mark

    2017-01-01

    Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering...... the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis could be a useful and efficient tool for analyzing microarray...... time-course data and for exploring the complex relationships in the omics data for studying their association with disease and health....

  2. Non-parametric trend analysis of water quality data of rivers in Kansas

    Science.gov (United States)

    Yu, Y.-S.; Zou, S.; Whittemore, D.

    1993-01-01

    Surface water quality data for 15 sampling stations in the Arkansas, Verdigris, Neosho, and Walnut river basins inside the state of Kansas were analyzed to detect trends (or lack of trends) in 17 major constituents by using four different non-parametric methods. The results show that specific conductance, total dissolved solids, calcium, total hardness, sodium, potassium, alkalinity, sulfate, chloride, total phosphorus, ammonia plus organic nitrogen, and suspended sediment generally have downward trends in concentration. Some of the downward trends are related to increases in discharge, while others could be caused by decreases in pollution sources. Homogeneity tests show that both station-wide and basin-wide trends are non-homogeneous. © 1993.
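
    The record does not name the four non-parametric methods used; the Mann-Kendall test with Sen's slope is one commonly used option for this kind of water-quality trend analysis. A minimal sketch on a synthetic series (all names and numbers below are hypothetical, and no tie correction is applied):

        # Hypothetical illustration: Mann-Kendall trend test and Sen's slope for a
        # single evenly spaced water-quality series.
        import numpy as np
        from scipy.stats import norm

        def mann_kendall(x):
            """Return the Mann-Kendall S statistic, its normal-approximation
            two-sided p-value, and Sen's slope (median of pairwise slopes)."""
            x = np.asarray(x, dtype=float)
            n = len(x)
            # S counts concordant minus discordant pairs ordered in time.
            s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
            var_s = n * (n - 1) * (2 * n + 5) / 18.0      # variance under H0, no ties
            z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
            p = 2 * (1 - norm.cdf(abs(z)))
            slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
            return s, p, np.median(slopes)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            chloride = 50 - 0.4 * np.arange(60) + rng.normal(0, 3, 60)  # synthetic downward trend
            print(mann_kendall(chloride))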

  3. Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data

    DEFF Research Database (Denmark)

    Tan, Qihua; Thomassen, Mads; Burton, Mark

    2017-01-01

    Modeling complex time-course patterns is a challenging issue in microarray studies due to the complex gene expression patterns that arise in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering...... the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis makes it a useful and efficient tool for analyzing microarray...

  4. A non-parametric framework for estimating threshold limit values

    Directory of Open Access Journals (Sweden)

    Ulm Kurt

    2005-11-01

    Full Text Available Abstract Background To estimate a threshold limit value for a compound known to have harmful health effects, an 'elbow' threshold model is usually applied. We are interested in flexible non-parametric alternatives. Methods We describe how a step function model fitted by isotonic regression can be used to estimate threshold limit values. This method returns a set of candidate locations, and we discuss two algorithms to select the threshold among them: the reduced isotonic regression and an algorithm considering the closed family of hypotheses. We assess the performance of these two alternative approaches under different scenarios in a simulation study. We illustrate the framework by analysing data from a study conducted by the German Research Foundation aiming to set a threshold limit value for exposure to total dust at the workplace, as a causal agent for developing chronic bronchitis. Results We demonstrate the use and the properties of the proposed methodology along with the results from an application. The method appears to detect the threshold with satisfactory success. However, its performance can be compromised by the low power to reject the constant-risk assumption when the true dose-response relationship is weak. Conclusion The estimation of thresholds based on the isotonic framework is conceptually simple and sufficiently powerful. Given that there is no gold-standard method for threshold value estimation, the proposed model provides a useful non-parametric alternative to the standard approaches and can corroborate or challenge their findings.
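
    A minimal sketch of the first stage of such a framework, assuming a binary health outcome and a one-dimensional exposure: fit a monotone step function with scikit-learn's IsotonicRegression and list the jump locations as candidate thresholds. The paper's selection algorithms (reduced isotonic regression, closed family of hypotheses) are not reproduced, and all data below are synthetic:

        # Fit a monotone step function to dose-response data and list the jump
        # locations as candidate threshold limit values.
        import numpy as np
        from sklearn.isotonic import IsotonicRegression

        rng = np.random.default_rng(1)
        dose = np.sort(rng.uniform(0, 10, 300))          # e.g. dust exposure
        risk = 0.1 + 0.25 * (dose > 4.0)                 # true threshold at 4
        outcome = rng.binomial(1, risk)                  # 0/1 chronic bronchitis indicator

        iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
        fitted = iso.fit_transform(dose, outcome)        # monotone step function

        jumps = np.where(np.diff(fitted) > 0)[0]         # indices where fitted risk steps up
        candidate_thresholds = dose[jumps + 1]
        print("candidate threshold locations:", candidate_thresholds)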

  5. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    2012-01-01

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function, of which the Cobb-Douglas...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...... to estimate production functions without the specification of a functional form. Therefore, they avoid possible misspecification errors due to the use of an unsuitable functional form. In this paper, we use parametric and non-parametric methods to identify the optimal size of Polish crop farms...
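
    As a rough illustration of estimating a production relationship without specifying a functional form, here is a Nadaraya-Watson kernel regression with a single input; the study's actual estimators and the Polish farm data are not reproduced, and all names and numbers below are synthetic:

        # Gaussian-kernel Nadaraya-Watson estimate of E[output | input] on a grid.
        import numpy as np

        def nw_regression(x_train, y_train, x_eval, bandwidth):
            """Nadaraya-Watson estimate of E[y | x] at the points x_eval."""
            x_train = np.asarray(x_train, float)
            y_train = np.asarray(y_train, float)
            u = (np.asarray(x_eval, float)[:, None] - x_train[None, :]) / bandwidth
            w = np.exp(-0.5 * u ** 2)                # Gaussian kernel weights
            return (w @ y_train) / w.sum(axis=1)     # locally weighted average of outputs

        rng = np.random.default_rng(2)
        land = rng.uniform(1, 100, 400)                              # hypothetical farm size (ha)
        output = 5 * land ** 0.7 * np.exp(rng.normal(0, 0.2, 400))   # hypothetical crop output
        grid = np.linspace(5, 95, 10)
        print(nw_regression(land, output, grid, bandwidth=8.0))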

  6. Transit Timing Observations From Kepler: Ii. Confirmation of Two Multiplanet Systems via a Non-Parametric Correlation Analysis

    OpenAIRE

    Ford, Eric B.; Fabrycky, Daniel C.; Steffen, Jason H.; Carter, Joshua A.; Fressin, Francois; Holman, Matthew Jon; Lissauer, Jack J.; Moorhead, Althea V.; Morehead, Robert C.; Ragozzine, Darin; Rowe, Jason F.; Welsh, William F.; Allen, Christopher; Batalha, Natalie M.; Borucki, William J.

    2012-01-01

    We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies is in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data se...
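
    The record does not spell out the correlation statistic; a generic way to quantify the significance of correlated TTVs is a permutation test of the rank correlation between two TTV series sampled at common epochs. A hedged sketch on synthetic data (not the authors' exact statistic, and ignoring their treatment of unevenly sampled transit times):

        # Permutation p-value for the absolute Spearman correlation of two TTV series.
        import numpy as np
        from scipy.stats import spearmanr

        def ttv_correlation_pvalue(ttv_a, ttv_b, n_perm=10000, seed=0):
            rng = np.random.default_rng(seed)
            observed = abs(spearmanr(ttv_a, ttv_b)[0])
            count = 0
            for _ in range(n_perm):
                shuffled = rng.permutation(ttv_b)            # break any physical link
                if abs(spearmanr(ttv_a, shuffled)[0]) >= observed:
                    count += 1
            return (count + 1) / (n_perm + 1)

        rng = np.random.default_rng(3)
        phase = np.linspace(0, 4 * np.pi, 40)
        ttv_inner = 3.0 * np.sin(phase) + rng.normal(0, 1.0, 40)    # synthetic TTVs (minutes)
        ttv_outer = -2.0 * np.sin(phase) + rng.normal(0, 1.0, 40)   # anti-correlated companion
        print(ttv_correlation_pvalue(ttv_inner, ttv_outer))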

  7. Homothetic Efficiency and Test Power: A Non-Parametric Approach

    NARCIS (Netherlands)

    J. Heufer (Jan); P. Hjertstrand (Per)

    2015-01-01

    We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfy the testable conditions. To overcome this we provide a way to estimate homothetic efficiency of

  8. Homothetic Efficiency and Test Power: A Non-Parametric Approach

    NARCIS (Netherlands)

    J. Heufer (Jan); P. Hjertstrand (Per)

    2015-01-01

    We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfy the testable conditions. To overcome this we provide a way to estimate homothetic efficiency of consump

  9. Validation of two (parametric vs non-parametric) daily weather generators

    Science.gov (United States)

    Dubrovsky, M.; Skalak, P.

    2015-12-01

    As the climate models (GCMs and RCMs) fail to satisfactorily reproduce the real-world surface weather regime, various statistical methods are applied to downscale GCM/RCM outputs into site-specific weather series. The stochastic weather generators are among the most favoured downscaling methods, capable of producing realistic (observed-like) meteorological inputs for agrological, hydrological and other impact models used in assessing the sensitivity of various ecosystems to climate change/variability. To name their advantages, the generators may (i) produce arbitrarily long multi-variate synthetic weather series representing both present and changed climates (in the latter case, the generators are commonly modified by GCM/RCM-based climate change scenarios), (ii) be run at various time steps and for multiple weather variables (the generators reproduce the correlations among variables), and (iii) be interpolated (and run also for sites where no weather data are available to calibrate the generator). This contribution will compare two stochastic daily weather generators in terms of their ability to reproduce various features of daily weather series. M&Rfi is a parametric generator: a Markov chain model is used to model precipitation occurrence, precipitation amount is modelled by the Gamma distribution, and a 1st-order autoregressive model is used to generate non-precipitation surface weather variables. The non-parametric GoMeZ generator is based on the nearest-neighbours resampling technique, making no assumption on the distribution of the variables being generated. Various settings of both weather generators will be assumed in the present validation tests. The generators will be validated in terms of (a) extreme temperature and precipitation characteristics (annual and 30-year extremes and maxima of duration of hot/cold/dry/wet spells); and (b) selected validation statistics developed within the frame of the VALUE project. The tests will be based on observational weather series.

  10. A non-parametric Bayesian approach for clustering and tracking non-stationarities of neural spikes.

    Science.gov (United States)

    Shalchyan, Vahid; Farina, Dario

    2014-02-15

    Neural spikes from multiple neurons recorded in a multi-unit signal are usually separated by clustering. Drifts in the position of the recording electrode relative to the neurons over time cause gradual changes in the position and shapes of the clusters, challenging the clustering task. By dividing the data into short time intervals, Bayesian tracking of the clusters based on a Gaussian cluster model has been previously proposed. However, the Gaussian cluster model is often not verified for neural spikes. We present a Bayesian clustering approach that makes no assumptions on the distribution of the clusters and uses kernel-based density estimation of the clusters in every time interval as a prior for Bayesian classification of the data in the subsequent time interval. The proposed method was tested and compared to the Gaussian model-based approach for cluster tracking by using both simulated and experimental datasets. The results showed that the proposed non-parametric kernel-based density estimation of the clusters outperformed the sequential Gaussian model fitting in both simulated and experimental data tests. Using non-parametric kernel density-based clustering that makes no assumptions on the distribution of the clusters enhances the ability to track cluster non-stationarity over time relative to the Gaussian cluster modelling approach. Copyright © 2013 Elsevier B.V. All rights reserved.
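
    A simplified sketch of the core idea, assuming 2-D spike features and two units: estimate each cluster's density with a kernel density estimator in one interval and classify the next interval's spikes by maximum estimated density. The paper's full Bayesian prior/updating scheme and spike-feature extraction are not reproduced, and all data below are synthetic:

        # Kernel-density-based tracking of drifting spike clusters across intervals.
        import numpy as np
        from sklearn.neighbors import KernelDensity

        rng = np.random.default_rng(4)
        # Hypothetical 2-D spike features (e.g. first two PCA scores) for two units.
        interval1 = {
            "unit_a": rng.normal([0, 0], 0.3, size=(200, 2)),
            "unit_b": rng.normal([2, 2], 0.3, size=(200, 2)),
        }
        # In the next interval the clusters have drifted slightly.
        interval2_spikes = np.vstack([
            rng.normal([0.2, 0.1], 0.3, size=(50, 2)),
            rng.normal([2.1, 1.8], 0.3, size=(50, 2)),
        ])

        # One KDE per cluster from interval 1, then classify interval-2 spikes
        # by the cluster with the highest estimated (log) density.
        kdes = {name: KernelDensity(bandwidth=0.25).fit(pts) for name, pts in interval1.items()}
        log_dens = np.column_stack([kdes[name].score_samples(interval2_spikes) for name in kdes])
        labels = np.array(list(kdes))[np.argmax(log_dens, axis=1)]
        print(labels[:5], labels[-5:])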

  11. Multi-Directional Non-Parametric Analysis of Agricultural Efficiency

    DEFF Research Database (Denmark)

    Balezentis, Tomas

    This thesis seeks to develop methodologies for the assessment of agricultural efficiency and to apply them to Lithuanian family farms. In particular, we focus on three objectives throughout the research: (i) to perform a fully non-parametric analysis of efficiency effects, (ii) to extend...... relative to labour, intermediate consumption and land (in some cases land was not treated as a discretionary input). These findings call for further research on the relationships among financial structure, investment decisions, and efficiency in Lithuanian family farms. Application of different techniques...... of stochasticity associated with Lithuanian family farm performance. The former technique showed that the farms differed in terms of the mean values and variance of the efficiency scores over time, with some clear patterns prevailing throughout the whole research period. The fuzzy Free Disposal Hull showed...

  12. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb...... results—including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...... Neither the Cobb-Douglas function nor the Translog function is consistent with the “true” relationship between the inputs and the output in our data set. We solve this problem by using non-parametric regression. This approach delivers reasonable results, which are on average not too different from the results of the parametric...

  13. Homogeneity and change-point detection tests for multivariate data using rank statistics

    CERN Document Server

    Lung-Yut-Fong, Alexandre; Cappé, Olivier

    2011-01-01

    Detecting and locating changes in highly multivariate data is a major concern in several current statistical applications. In this context, the first contribution of the paper is a novel non-parametric two-sample homogeneity test for multivariate data based on the well-known Wilcoxon rank statistic. The proposed two-sample homogeneity test statistic can be extended to deal with ordinal or censored data as well as to test for the homogeneity of more than two samples. The second contribution of the paper concerns the use of the proposed test statistic to perform retrospective change-point analysis. It is first shown that the approach is computationally feasible even when looking for a large number of change-points thanks to the use of dynamic programming. Computable asymptotic p-values for the test are then provided in the case where a single potential change-point is to be detected. Compared to available alternatives, the proposed approach appears to be very reliable and robust. This is particularly true in ...

  14. Application of the LSQR algorithm in non-parametric estimation of aerosol size distribution

    Science.gov (United States)

    He, Zhenzong; Qi, Hong; Lew, Zhongyuan; Ruan, Liming; Tan, Heping; Luo, Kun

    2016-05-01

    Based on the Least Squares QR decomposition (LSQR) algorithm, the aerosol size distribution (ASD) is retrieved with a non-parametric approach. The direct problem is solved by the Anomalous Diffraction Approximation (ADA) and the Lambert-Beer Law. An optimal wavelength selection method is developed to improve the retrieval accuracy of the ASD. The optimal wavelength set is selected so that the measurement signals are sensitive to wavelength and the coefficient matrix of the linear system is better conditioned, which improves the robustness of the retrieval results to interference. Two common kinds of monomodal and bimodal ASDs, log-normal (L-N) and Gamma distributions, are estimated, respectively. Numerical tests show that the LSQR algorithm can be successfully applied to retrieve the ASD with high stability in the presence of random noise and low susceptibility to the shape of the distributions. Finally, the ASD measured experimentally over Harbin, China, is recovered reasonably well. All the results confirm that the LSQR algorithm combined with the optimal wavelength selection method is an effective and reliable technique for non-parametric estimation of the ASD.
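
    A toy illustration of damped LSQR retrieval on a synthetic linear system; the smooth kernel below merely stands in for the ADA/Lambert-Beer forward model, and the optimal wavelength selection scheme is not reproduced:

        # Recover a discretised size distribution from noisy "measurements" with LSQR.
        import numpy as np
        from scipy.sparse.linalg import lsqr

        rng = np.random.default_rng(5)
        radii = np.linspace(0.1, 2.0, 60)                # particle radius grid (um)
        wavelengths = np.linspace(0.4, 1.0, 20)          # measurement wavelengths (um)
        # Hypothetical smooth forward kernel K[i, j] linking radius bin j to signal i.
        K = np.exp(-((wavelengths[:, None] - radii[None, :]) ** 2) / 0.5)

        true_asd = np.exp(-((radii - 0.8) ** 2) / 0.05)  # mono-modal "true" ASD
        signal = K @ true_asd + rng.normal(0, 0.01, len(wavelengths))

        # Damped least squares: the damping term regularises the ill-conditioned system.
        solution = lsqr(K, signal, damp=0.05)[0]
        print("relative error:", np.linalg.norm(solution - true_asd) / np.linalg.norm(true_asd))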

  15. Statistical hypothesis testing with SAS and R

    CERN Document Server

    Taeger, Dirk

    2014-01-01

    A comprehensive guide to statistical hypothesis testing with examples in SAS and R. When analyzing datasets the following questions often arise: Is there a shorthand procedure for a statistical test available in SAS or R? If so, how do I use it? If not, how do I program the test myself? This book answers these questions and provides an overview of the most common statistical test problems in a comprehensive way, making it easy to find and perform an appropriate statistical test. A general summary of statistical test theory is presented, along with a basic description for each test, including the

  16. Non-parametric star formation histories for 5 dwarf spheroidal galaxies of the local group

    CERN Document Server

    Hernández, X; Valls-Gabaud, D; Gilmore, Gerard; Valls-Gabaud, David

    2000-01-01

    We use recent HST colour-magnitude diagrams of the resolved stellar populations of a sample of local dSph galaxies (Carina, LeoI, LeoII, Ursa Minor and Draco) to infer the star formation histories of these systems, SFR(t). Applying a new variational calculus maximum likelihood method which includes a full Bayesian analysis and allows a non-parametric estimate of the function one is solving for, we infer the star formation histories of the systems studied. This method has the advantage of yielding an objective answer, as one need not assume a priori the form of the function one is trying to recover. The results are checked independently using Saha's W statistic. The total luminosities of the systems are used to normalize the results into physical units and derive SN type II rates. We derive the luminosity-weighted mean star formation history of this sample of galaxies.

  17. Assessing T cell clonal size distribution: a non-parametric approach.

    Science.gov (United States)

    Bolkhovskaya, Olesya V; Zorin, Daniil Yu; Ivanchenko, Mikhail V

    2014-01-01

    Clonal structure of the human peripheral T-cell repertoire is shaped by a number of homeostatic mechanisms, including antigen presentation, cytokine and cell regulation. Its accurate tuning leads to a remarkable ability to combat pathogens in all their variety, while systemic failures may lead to severe consequences like autoimmune diseases. Here we develop and make use of a non-parametric statistical approach to assess T cell clonal size distributions from recent next generation sequencing data. For 41 healthy individuals and a patient with ankylosing spondylitis who underwent treatment, we invariably find power law scaling over several decades and for the first time calculate quantitatively meaningful values of the decay exponent. It has proved to be much the same among healthy donors, significantly different for the autoimmune patient before therapy, and converging towards a typical value afterwards. We discuss implications of the findings for theoretical understanding and mathematical modeling of adaptive immunity.

  18. Assessing T cell clonal size distribution: a non-parametric approach.

    Directory of Open Access Journals (Sweden)

    Olesya V Bolkhovskaya

    Full Text Available Clonal structure of the human peripheral T-cell repertoire is shaped by a number of homeostatic mechanisms, including antigen presentation, cytokine and cell regulation. Its accurate tuning leads to a remarkable ability to combat pathogens in all their variety, while systemic failures may lead to severe consequences like autoimmune diseases. Here we develop and make use of a non-parametric statistical approach to assess T cell clonal size distributions from recent next generation sequencing data. For 41 healthy individuals and a patient with ankylosing spondylitis who underwent treatment, we invariably find power law scaling over several decades and for the first time calculate quantitatively meaningful values of the decay exponent. It has proved to be much the same among healthy donors, significantly different for the autoimmune patient before therapy, and converging towards a typical value afterwards. We discuss implications of the findings for theoretical understanding and mathematical modeling of adaptive immunity.
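
    A hedged sketch of estimating a power-law decay exponent by maximum likelihood above a chosen cutoff (a Hill-type estimator for the continuous approximation); the paper's own estimator and cutoff-selection procedure may differ, and the clone sizes below are synthetic:

        # MLE of the exponent alpha for p(x) ~ x^(-alpha), x >= x_min.
        import numpy as np

        def powerlaw_exponent(sizes, x_min):
            """Continuous-approximation MLE: alpha = 1 + n / sum(log(x / x_min))."""
            x = np.asarray(sizes, float)
            x = x[x >= x_min]
            return 1.0 + len(x) / np.sum(np.log(x / x_min))

        rng = np.random.default_rng(6)
        # Synthetic clone sizes from a classical Pareto tail with exponent alpha = 2.5.
        clone_sizes = rng.pareto(1.5, 50000) + 1.0
        print(powerlaw_exponent(clone_sizes, x_min=1.0))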

  19. A non-parametric method for correction of global radiation observations

    DEFF Research Database (Denmark)

    Bacher, Peder; Madsen, Henrik; Perers, Bengt;

    2013-01-01

    This paper presents a method for correction and alignment of global radiation observations based on information obtained from calculated global radiation; in the present study, a one-hour forecast of global radiation from a numerical weather prediction (NWP) model is used. Systematic errors detected...... in the observations are corrected. These are errors such as tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...... University. The method can be useful for optimized use of solar radiation observations for forecasting, monitoring, and modeling of energy production and load which are affected by solar radiation....

  20. Measuring the influence of information networks on transaction costs using a non-parametric regression technique

    DEFF Research Database (Denmark)

    Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.

    All business transactions, as well as achieving innovations, take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence increase the access...... to nonpublic information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not distinguish between these two uses of inputs, high transaction costs...... are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....

  1. Cliff's Delta Calculator: A non-parametric effect size program for two groups of observations

    Directory of Open Access Journals (Sweden)

    Guillermo Macbeth

    2011-05-01

    Full Text Available The Cliff's Delta statistic is an effect size measure that quantifies the amount of difference between two non-parametric variables beyond p-value interpretation. This measure can be understood as a useful complementary analysis for the corresponding hypothesis testing. During the last two decades the use of effect size measures has been strongly encouraged by methodologists and leading institutions of the behavioral sciences. The aim of this contribution is to introduce the Cliff's Delta Calculator software, which performs such analysis and offers some interpretation tips. Differences and similarities with the parametric case are analysed and illustrated. The implementation of this free program is fully described and compared with other calculators. Alternative algorithmic approaches are mathematically analysed and a basic linear algebra proof of their equivalence is formally presented. Two worked examples in cognitive psychology are discussed. A visual interpretation of Cliff's Delta is suggested. Availability, installation and applications of the program are presented and discussed.
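
    A minimal re-implementation of the Cliff's Delta statistic itself (the Calculator program's interpretation aids are not reproduced, and the two groups below are made up):

        # Cliff's delta = P(x > y) - P(x < y), computed over all pairs of observations.
        import numpy as np

        def cliffs_delta(x, y):
            x = np.asarray(x)[:, None]
            y = np.asarray(y)[None, :]
            greater = np.sum(x > y)
            less = np.sum(x < y)
            return (greater - less) / (x.size * y.size)

        group_a = [12, 15, 14, 18, 20, 22]
        group_b = [11, 13, 12, 14, 15, 16]
        print(cliffs_delta(group_a, group_b))   # positive: group_a tends to exceed group_b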

  2. Non-parametric and least squares Langley plot methods

    Directory of Open Access Journals (Sweden)

    P. W. Kiedron

    2015-04-01

    Full Text Available Langley plots are used to calibrate sun radiometers primarily for the measurement of the aerosol component of the atmosphere that attenuates (scatters and absorbs) incoming direct solar radiation. In principle, the calibration of a sun radiometer is a straightforward application of the Bouguer–Lambert–Beer law V = V0 e^(−τ·m), where a plot of ln(V) (voltage) vs. m (air mass) yields a straight line with intercept ln(V0). This ln(V0) subsequently can be used to solve for τ for any measurement of V and calculation of m. This calibration works well on some high mountain sites, but the application of the Langley plot calibration technique is more complicated at other, more interesting, locales. This paper is concerned with ferreting out calibrations at difficult sites and examining and comparing a number of conventional and non-conventional methods for obtaining successful Langley plots. The eleven techniques discussed indicate that both least squares and various non-parametric techniques produce satisfactory calibrations with no significant differences among them when the time series of ln(V0) values are smoothed and interpolated with median and mean moving window filters.
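
    A sketch of the basic Langley calibration on synthetic data: regress ln(V) on air mass m and read ln(V0) off the intercept, once by ordinary least squares and once with the non-parametric Theil-Sen estimator (two of the simpler options; the paper compares eleven techniques):

        # Langley plot: slope = -tau, intercept = ln(V0).
        import numpy as np
        from scipy.stats import theilslopes

        rng = np.random.default_rng(7)
        m = np.linspace(1.5, 6.0, 40)                    # air mass during a clean morning
        ln_v0_true, tau_true = 1.20, 0.15
        ln_v = ln_v0_true - tau_true * m + rng.normal(0, 0.01, m.size)

        slope_ls, intercept_ls = np.polyfit(m, ln_v, 1)          # ordinary least squares
        slope_ts, intercept_ts, _, _ = theilslopes(ln_v, m)      # median of pairwise slopes

        print("least squares ln(V0):", intercept_ls, " tau:", -slope_ls)
        print("Theil-Sen     ln(V0):", intercept_ts, " tau:", -slope_ts)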

  3. Parametric and non-parametric modeling of short-term synaptic plasticity. Part II: Experimental study.

    Science.gov (United States)

    Song, Dong; Wang, Zhuo; Marmarelis, Vasilis Z; Berger, Theodore W

    2009-02-01

    This paper presents a synergistic parametric and non-parametric modeling study of short-term plasticity (STP) in the Schaffer collateral to hippocampal CA1 pyramidal neuron (SC) synapse. Parametric models in the form of sets of differential and algebraic equations have been proposed on the basis of the current understanding of biological mechanisms active within the system. Non-parametric Poisson-Volterra models are obtained herein from broadband experimental input-output data. The non-parametric model is shown to provide better prediction of the experimental output than a parametric model with a single set of facilitation/depression (FD) processes. The parametric model is then validated in terms of its input-output transformational properties using the non-parametric model, since the latter constitutes a canonical and more complete representation of the synaptic nonlinear dynamics. Furthermore, discrepancies between the experimentally derived non-parametric model and the equivalent non-parametric model of the parametric model suggest the presence of multiple FD processes in the SC synapses. Inclusion of an additional set of FD processes in the parametric model makes it better replicate the characteristics of the experimentally derived non-parametric model. This improved parametric model in turn provides the requisite biological interpretability that the non-parametric model lacks.

  4. Non-parametric kernel density estimation of species sensitivity distributions in developing water quality criteria of metals.

    Science.gov (United States)

    Wang, Ying; Wu, Fengchang; Giesy, John P; Feng, Chenglian; Liu, Yuedan; Qin, Ning; Zhao, Yujie

    2015-09-01

    Due to the use of different parametric models for establishing species sensitivity distributions (SSDs), comparison of water quality criteria (WQC) for metals of the same group or period in the periodic table is uncertain and results can be biased. To address this inadequacy, a new probabilistic model based on non-parametric kernel density estimation was developed, and optimal bandwidths and testing methods are proposed. Zinc (Zn), cadmium (Cd), and mercury (Hg) of group IIB of the periodic table are widespread in aquatic environments, mostly at small concentrations, but can exert detrimental effects on aquatic life and human health. With these metals as target compounds, the non-parametric kernel density estimation method and several conventional parametric density estimation methods were used to derive acute WQC of metals for the protection of aquatic species in China, which were compared and contrasted with WQC for other jurisdictions. HC5 values for the protection of different types of species were derived for the three metals by use of non-parametric kernel density estimation. The newly developed probabilistic model was superior to conventional parametric density estimations for constructing SSDs and for deriving WQC for these metals. HC5 values for the three metals were inversely proportional to atomic number, which means that the heavier atoms were more potent toxicants. The proposed method provides a novel alternative approach for developing SSDs that could have wide application prospects in deriving WQC and in assessment of risks to ecosystems.
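
    A sketch of the kernel-density idea for an SSD-based HC5, assuming hypothetical log10-transformed acute toxicity values and SciPy's default (Scott) bandwidth rather than the paper's optimised bandwidths:

        # Fit a Gaussian-kernel SSD to log-toxicity values and read off the 5th percentile (HC5).
        import numpy as np
        from scipy.stats import gaussian_kde

        # Hypothetical acute toxicity values (e.g. LC50 in ug/L) for different species.
        toxicity = np.array([12, 25, 40, 55, 80, 120, 150, 300, 450, 900, 1500, 2600])
        log_tox = np.log10(toxicity)

        kde = gaussian_kde(log_tox)
        grid = np.linspace(log_tox.min() - 1, log_tox.max() + 1, 2000)
        cdf = np.cumsum(kde(grid))
        cdf /= cdf[-1]                              # numerical CDF of the fitted SSD

        hc5_log = grid[np.searchsorted(cdf, 0.05)]  # 5th percentile of the SSD
        print("HC5 ~", 10 ** hc5_log, "ug/L")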

  5. Testing for Subcellular Randomness

    CERN Document Server

    Okunoye, Babatunde O

    2008-01-01

    Statistical tests were conducted on 1,000 numbers generated from the genome of Bacteriophage T4, obtained from GenBank with accession number AF158101. The numbers passed the non-parametric, distribution-free tests. Deoxyribonucleic acid was discovered to be a random number generator, existent in nature.

  6. Modeling critical episodes of air pollution by PM10 in Santiago, Chile: Comparison of the predictive efficiency of parametric and non-parametric statistical models

    Directory of Open Access Journals (Sweden)

    Sergio A. Alvarado

    2010-12-01

    Full Text Available Objective: To evaluate the predictive efficiency of two statistical models (one parametric and the other non-parametric) to predict critical episodes of air pollution exceeding daily air quality standards in Santiago, Chile, by using the next-day PM10 maximum 24-h value. Accurate prediction of such episodes would allow restrictive measures to be applied by health authorities to reduce their seriousness and protect the community's health. Methods: We used the PM10 concentrations registered by a station of the MACAM-2 Air Quality Monitoring Network (152 daily observations of 14 variables) and meteorological information gathered from 2001 to 2004. To construct predictive models, we fitted a parametric Gamma model using STATA v11 software and a non-parametric MARS model using a demo version of MARS v2.0 distributed by Salford Systems. Results: Both modeling methods show a high correlation between observed and predicted values. The Gamma models show better accuracy than MARS for PM10 concentrations with values

  7. SOPIE: an R package for the non-parametric estimation of the off-pulse interval of a pulsar light curve

    Science.gov (United States)

    Schutte, Willem D.; Swanepoel, Jan W. H.

    2016-09-01

    An automated tool to derive the off-pulse interval of a light curve originating from a pulsar is needed. First, we derive a powerful and accurate non-parametric sequential estimation technique to estimate the off-pulse interval of a pulsar light curve in an objective manner. This is in contrast to the subjective 'eye-ball' (visual) technique, and complementary to the Bayesian Block method which is currently used in the literature. The second aim involves the development of a statistical package necessary for the implementation of our new estimation technique. We develop a statistical procedure to estimate the off-pulse interval in the presence of noise. It is based on a sequential application of p-values obtained from goodness-of-fit tests for uniformity. The Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling and Rayleigh test statistics are applied. The details of the newly developed statistical package SOPIE (Sequential Off-Pulse Interval Estimation) are discussed. The developed estimation procedure is applied to simulated and real pulsar data. Finally, the SOPIE-estimated off-pulse intervals of two pulsars are compared to the estimates obtained with the Bayesian Block method and yield very satisfactory results. We provide the code to implement the SOPIE package, which is publicly available at http://CRAN.R-project.org/package=SOPIE (Schutte).

  8. Polarimetric Segmentation Using Wishart Test Statistic

    DEFF Research Database (Denmark)

    Skriver, Henning; Schou, Jesper; Nielsen, Allan Aasbjerg;

    2002-01-01

    A newly developed test statistic for equality of two complex covariance matrices following the complex Wishart distribution, and an associated asymptotic probability for the test statistic, have been used in a segmentation algorithm. The segmentation algorithm is based on the MUM (merge using moments) approach, which is a merging algorithm for single-channel SAR images. The polarimetric version described in this paper uses the above-mentioned test statistic for merging. The segmentation algorithm has been applied to polarimetric SAR data from the Danish dual-frequency, airborne polarimetric SAR, EMISAR.... The results clearly show improved segmentation performance for the fully polarimetric algorithm compared to single-channel approaches....

  9. Teaching Statistics in Language Testing Courses

    Science.gov (United States)

    Brown, James Dean

    2013-01-01

    The purpose of this article is to examine the literature on teaching statistics for useful ideas that teachers of language testing courses can draw on and incorporate into their teaching toolkits as they see fit. To those ends, the article addresses eight questions: What is known generally about teaching statistics? Why are students so anxious…

  11. Continuous/discrete non parametric Bayesian belief nets with UNICORN and UNINET

    NARCIS (Netherlands)

    Cooke, R.M.; Kurowicka, D.; Hanea, A.M.; Morales Napoles, O.; Ababei, D.A.; Ale, B.J.M.; Roelen, A.

    2007-01-01

    Hanea et al. (2006) presented a method for quantifying and computing continuous/discrete non parametric Bayesian Belief Nets (BBN). Influences are represented as conditional rank correlations, and the joint normal copula enables rapid sampling and conditionalization. Further mathematical background

  12. Kernel bandwidth estimation for non-parametric density estimation: a comparative study

    CSIR Research Space (South Africa)

    Van der Walt, CM

    2013-12-01

    Full Text Available We investigate the performance of conventional bandwidth estimators for non-parametric kernel density estimation on a number of representative pattern-recognition tasks, to gain a better understanding of the behaviour of these estimators in high...
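
    A small comparison of two conventional bandwidth choices on synthetic one-dimensional data: Silverman's rule of thumb versus cross-validated likelihood over a bandwidth grid (the study's pattern-recognition benchmarks and estimator set are not reproduced):

        # Silverman rule-of-thumb vs. cross-validated bandwidth for Gaussian-kernel KDE.
        import numpy as np
        from sklearn.neighbors import KernelDensity
        from sklearn.model_selection import GridSearchCV

        rng = np.random.default_rng(8)
        data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.5, 700)])[:, None]

        n, sigma = len(data), data.std()
        iqr = np.percentile(data, 75) - np.percentile(data, 25)
        silverman = 0.9 * min(sigma, iqr / 1.34) * n ** (-0.2)

        search = GridSearchCV(KernelDensity(kernel="gaussian"),
                              {"bandwidth": np.linspace(0.05, 1.5, 30)}, cv=5)
        search.fit(data)

        print("Silverman bandwidth:", silverman)
        print("CV-selected bandwidth:", search.best_params_["bandwidth"])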

  13. Statistics: Notes and Examples. Study Guide for the Doctor of Arts in Computer-Based Learning.

    Science.gov (United States)

    MacFarland, Thomas W.

    This study guide presents lessons on hand-calculating various statistics: Central Tendency and Dispersion; Tips on Data Presentation; Two-Tailed and One-Tailed Tests of Significance; Error Types; Standard Scores; Non-Parametric Tests such as Chi-square, Spearman Rho, Sign Test, Wilcoxon Matched Pairs, Mann-Whitney U, Kruskal-Wallis, and Rank Sums;…

  14. SPSS for applied sciences basic statistical testing

    CERN Document Server

    Davis, Cole

    2013-01-01

    This book offers a quick and basic guide to using SPSS and provides a general approach to solving problems using statistical tests. It is both comprehensive in terms of the tests covered and the applied settings it refers to, and yet is short and easy to understand. Whether you are a beginner or an intermediate level test user, this book will help you to analyse different types of data in applied settings. It will also give you the confidence to use other statistical software and to extend your expertise to more specific scientific settings as required.The author does not use mathematical form

  15. Statistical test of VEP waveform equality.

    Science.gov (United States)

    Young, Rockefeller S L; Kimura, Eiji

    2010-04-01

    The aim of the study was to describe a theory and method for inferring the statistical significance of a visually evoked cortical potential (VEP) recording. The statistical evaluation is predicated on the pre-stimulus VEP as an estimate of the cortical potentials expected when the stimulus does not produce an effect, a mathematical transform to convert the voltages into standard deviations from zero, and a time-series approach for estimating the variability of between-session VEPs under the null hypothesis. Empirical and Monte Carlo analyses address issues of testability and statistical validity, as well as clinical feasibility and limitations of the proposed method. We conclude that visual electrophysiological recordings can be evaluated as a statistical study of n = 1 subject using time-series analysis when confounding effects are adequately controlled. The statistical test can be performed on either a single VEP or the difference between pairs of VEPs.

  16. Analysis of Preference Data Using Intermediate Test Statistic Abstract

    African Journals Online (AJOL)

    PROF. O. E. OSUAGWU

    2013-06-01

    Jun 1, 2013 ... The intermediate statistic is a link between the Friedman test statistic and the multinomial statistic. The statistic is ... The null hypothesis Ho ... [7] Taplin, R.H., The Statistical Analysis of Preference Data, Applied Statistics, No. 4, pp.

  17. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear in marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (non-linear in the markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  18. A Non-parametric Approach to Constrain the Transfer Function in Reverberation Mapping

    Science.gov (United States)

    Li, Yan-Rong; Wang, Jian-Min; Bai, Jin-Ming

    2016-11-01

    Broad emission lines of active galactic nuclei stem from a spatially extended region (broad-line region, BLR) that is composed of discrete clouds and photoionized by the central ionizing continuum. The temporal behaviors of these emission lines are blurred echoes of continuum variations (i.e., reverberation mapping, RM) and directly reflect the structures and kinematic information of BLRs through the so-called transfer function (also known as the velocity-delay map). Based on the previous works of Rybicki and Press and Zu et al., we develop an extended, non-parametric approach to determine the transfer function for RM data, in which the transfer function is expressed as a sum of a family of relatively displaced Gaussian response functions. Therefore, arbitrary shapes of transfer functions associated with complicated BLR geometry can be seamlessly included, enabling us to relax the presumption of a specified transfer function frequently adopted in previous studies and to let it be determined by observation data. We formulate our approach in a previously well-established framework that incorporates the statistical modeling of continuum variations as a damped random walk process and takes into account long-term secular variations which are irrelevant to RM signals. The application to RM data shows the fidelity of our approach.

  19. Dependence between fusion temperatures and chemical components of a certain type of coal using classical, non-parametric and bootstrap techniques

    Energy Technology Data Exchange (ETDEWEB)

    Gonzalez-Manteiga, W.; Prada-Sanchez, J.M.; Fiestras-Janeiro, M.G.; Garcia-Jurado, I. (Universidad de Santiago de Compostela, Santiago de Compostela (Spain). Dept. de Estadistica e Investigacion Operativa)

    1990-11-01

    A statistical study of the dependence between various critical fusion temperatures of a certain kind of coal and its chemical components is carried out. As well as using classical dependence techniques (multiple, stepwise and PLS regression, principal components, canonical correlation, etc.) together with the corresponding inference on the parameters of interest, non-parametric regression and bootstrap inference are also performed. 11 refs., 3 figs., 8 tabs.

  20. Statistical test theory for the behavioral sciences

    CERN Document Server

    de Gruijter, Dato N M

    2007-01-01

    Since the development of the first intelligence test in the early 20th century, educational and psychological tests have become important measurement techniques to quantify human behavior. Focusing on this ubiquitous yet fruitful area of research, Statistical Test Theory for the Behavioral Sciences provides both a broad overview and a critical survey of assorted testing theories and models used in psychology, education, and other behavioral science fields. Following a logical progression from basic concepts to more advanced topics, the book first explains classical test theory, covering true score, measurement error, and reliability. It then presents generalizability theory, which provides a framework to deal with various aspects of test scores. In addition, the authors discuss the concept of validity in testing, offering a strategy for evidence-based validity. In the two chapters devoted to item response theory (IRT), the book explores item response models, such as the Rasch model, and applications, incl...

  1. New Graphical Methods and Test Statistics for Testing Composite Normality

    Directory of Open Access Journals (Sweden)

    Marc S. Paolella

    2015-07-01

    Full Text Available Several graphical methods for testing univariate composite normality from an i.i.d. sample are presented. They are endowed with correct simultaneous error bounds and yield size-correct tests. As all are based on the empirical CDF, they are also consistent for all alternatives. For one test, called the modified stabilized probability test, or MSP, a highly simplified computational method is derived, which delivers the test statistic and also a highly accurate p-value approximation, essentially instantaneously. The MSP test is demonstrated to have higher power against asymmetric alternatives than the well-known and powerful Jarque-Bera test. A further size-correct test, based on combining two test statistics, is shown to have yet higher power. The methodology employed is fully general and can be applied to any i.i.d. univariate continuous distribution setting.

  2. The Non-Parametric Model for Linking Galaxy Luminosity with Halo/Subhalo Mass: Are First Brightest Galaxies Special?

    CERN Document Server

    Vale, A

    2007-01-01

    We revisit the longstanding question of whether first brightest cluster galaxies are statistically drawn from the same distribution as other cluster galaxies or are "special", using the new non-parametric, empirically based model presented in Vale & Ostriker (2006) for associating galaxy luminosity with halo/subhalo masses. We introduce scatter in galaxy luminosity at fixed halo mass into this model, building a conditional luminosity function (CLF) by considering two possible models: a simple lognormal and a model based on the distribution of concentration in haloes of a given mass. We show that this model naturally allows an identification of halo/subhalo systems with groups and clusters of galaxies, giving rise to a clear central/satellite galaxy distinction. We then use these results to build up the dependence of brightest cluster galaxy (BCG) magnitudes on cluster luminosity, focusing on two statistical indicators, the dispersion in BCG magnitude and the magnitude difference between first and second bri...

  3. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Full Text Available Abstract Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculations for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is not sufficient to decide if this motif is significantly more exceptional in one sequence than in the other. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly adapted for small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
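
    A hedged sketch of the exact-binomial idea for comparing one motif's counts in two sequences: conditional on the total count, the count in the first sequence is binomial with success probability given by the relative sequence lengths. The counts and lengths below are hypothetical, and the paper's Poisson-process model additionally handles overlaps and sequence composition:

        # Conditional exact binomial comparison of two motif counts.
        from scipy.stats import binomtest

        count_backbone, len_backbone = 85, 3_800_000     # hypothetical octamer count / length
        count_loops, len_loops = 160, 840_000            # hypothetical strain-specific loops

        total = count_backbone + count_loops
        p0 = len_backbone / (len_backbone + len_loops)   # expected share under equal rates

        result = binomtest(count_backbone, n=total, p=p0, alternative="two-sided")
        print("p-value for equal occurrence rates:", result.pvalue)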

  4. Climatic, parametric and non-parametric analysis of energy performance of double-glazed windows in different climates

    Directory of Open Access Journals (Sweden)

    Saeed Banihashemi

    2015-12-01

    Full Text Available In line with the growing global trend toward energy efficiency in buildings, this paper aims, first, to investigate the energy performance of double-glazed windows in different climates and, second, to analyze the parametric and non-parametric tests most commonly used for dimension reduction when simulating this component. A four-story building representing a conventional type of residential apartment in four climates (cold, temperate, hot-arid and hot-humid) was selected for simulation. Ten variables (U-factor, SHGC, emissivity, visible transmittance, monthly average dry bulb temperature, monthly average percent humidity, monthly average wind speed, monthly average direct solar radiation, monthly average diffuse solar radiation and orientation) constituted the parameters considered in the calculation of the cooling and heating loads of the case. Design of Experiments and Principal Component Analysis methods were applied to find the most significant factors and to reduce the dimension of the initial variables. It was observed that in the temperate and hot-arid climates, using double-glazed windows was beneficial in both cold and hot months, whereas in the cold and hot-humid climates, where heating and cooling loads are dominant respectively, they were advantageous only in those dominant months. Furthermore, an inconsistency was revealed between the parametric and non-parametric tests in terms of identifying the most significant variables.

  5. A non-parametric peak calling algorithm for DamID-Seq.

    Directory of Open Access Journals (Sweden)

    Renhua Li

    Full Text Available Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality checks and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.

  6. A non-parametric peak calling algorithm for DamID-Seq.

    Science.gov (United States)

    Li, Renhua; Hempel, Leonie U; Jiang, Tingbo

    2015-01-01

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality checks and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
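
    A hedged sketch of the background-estimation idea only: bootstrap-resample binned control (Dam-only) counts to set a significance threshold and flag windows exceeding it. Read mapping, scaling, filtering and the IDR analysis from the paper are not shown, and all counts below are synthetic:

        # Bootstrap a background threshold from control counts, then call candidate peaks.
        import numpy as np

        rng = np.random.default_rng(9)
        control_counts = rng.poisson(5, size=20_000)          # hypothetical binned Dam-only reads
        treatment_counts = rng.poisson(5, size=20_000)
        treatment_counts[5_000:5_020] += 40                   # a synthetic "bound" region

        # Bootstrap the 99.9th percentile of control counts as the background threshold.
        boot_percentiles = [
            np.percentile(rng.choice(control_counts, size=control_counts.size, replace=True), 99.9)
            for _ in range(200)
        ]
        threshold = np.mean(boot_percentiles)

        peak_bins = np.where(treatment_counts > threshold)[0]
        print("threshold:", threshold, "candidate peak bins:", peak_bins[:10], "...")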

  7. Cancer driver gene discovery through an integrative genomics approach in a non-parametric Bayesian framework.

    Science.gov (United States)

    Yang, Hai; Wei, Qiang; Zhong, Xue; Yang, Hushan; Li, Bingshan

    2017-02-15

    A comprehensive catalogue of genes that drive tumor initiation and progression in cancer is key to advancing diagnostics, therapeutics and treatment. Given the complexity of cancer, the catalogue is still far from complete. Increasing evidence shows that driver genes exhibit consistent aberration patterns across multiple omics in tumors. In this study, we aim to leverage the complementary information encoded in each of the omics data types to identify novel driver genes through an integrative framework. Specifically, we integrated mutations, gene expression, DNA copy numbers, DNA methylation and protein abundance, all available in The Cancer Genome Atlas (TCGA), and developed iDriver, a non-parametric Bayesian framework based on multivariate statistical modeling, to identify driver genes in an unsupervised fashion. iDriver captures the inherent clusters of gene aberrations and constructs the background distribution that is used to assess and calibrate the confidence of driver genes identified through multi-dimensional genomic data. We applied the method to 4 cancer types in TCGA and identified candidate driver genes that are highly enriched with known drivers (e.g., P < 3.40 × 10^-36 for breast cancer). We are particularly interested in novel genes and observed multiple lines of supporting evidence. Using systematic evaluation from multiple independent aspects, we identified 45 candidate driver genes that were not previously known across these 4 cancer types. The finding has important implications: integrating additional genomic data with multivariate statistics can help identify cancer drivers and guide the next stage of cancer genomics research. The C++ source code is freely available at https://medschool.vanderbilt.edu/cgg/ . hai.yang@vanderbilt.edu or bingshan.li@Vanderbilt.Edu. Supplementary data are available at Bioinformatics online.

  8. Non-Parametric Bayesian Updating within the Assessment of Reliability for Offshore Wind Turbine Support Structures

    DEFF Research Database (Denmark)

    Ramirez, José Rangel; Sørensen, John Dalsgaard

    2011-01-01

    This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbines. The new information, coming from external and condition monitoring, can be used for direct updating of the stochastic variables through a non-parametric Bayesian...... updating approach and can be integrated in the reliability analysis by a third-order polynomial chaos expansion approximation. Although classical Bayesian updating approaches are often used because of their parametric formulation, non-parametric approaches are better alternatives for multi-parametric updating...... with a non-conjugating formulation. The results in this paper show the influence on the time-dependent updated reliability when non-parametric and classical Bayesian approaches are used. Further, the influence of the number of updated parameters on the reliability is illustrated....

  9. Non-parametric seismic hazard analysis in the presence of incomplete data

    Science.gov (United States)

    Yazdani, Azad; Mirzaei, Sajjad; Dadkhah, Koroush

    2017-01-01

    The distribution of earthquake magnitudes plays a crucial role in the estimation of seismic hazard parameters. Due to the complexity of earthquake magnitude distributions, non-parametric approaches are recommended over classical parametric methods. The main deficiency of the non-parametric approach is the lack of complete magnitude data in almost all cases. This study aims to introduce an imputation procedure for completing earthquake catalog data that will allow the catalog to be used for non-parametric density estimation. Using a Monte Carlo simulation, the efficiency of the introduced approach is investigated. This study indicates that when a magnitude catalog is incomplete, the imputation procedure can provide an appropriate tool for seismic hazard assessment. As an illustration, the imputation procedure was applied to estimate the earthquake magnitude distribution in Tehran, the capital city of Iran.

  10. Power of non-parametric linkage analysis in mapping genes contributing to human longevity in long-lived sib-pairs

    DEFF Research Database (Denmark)

    Tan, Qihua; Zhao, J H; Iachine, I

    2004-01-01

    This report investigates the power issue in applying the non-parametric linkage analysis of affected sib-pairs (ASP) [Kruglyak and Lander, 1995: Am J Hum Genet 57:439-454] to localize genes that contribute to human longevity using long-lived sib-pairs. Data were simulated by introducing a recently...... developed statistical model for measuring marker-longevity associations [Yashin et al., 1999: Am J Hum Genet 65:1178-1193], enabling direct power comparison between linkage and association approaches. The non-parametric linkage (NPL) scores estimated in the region harboring the causal allele are evaluated...... in the case of a dominant effect. Although the power issue may depend heavily on the true genetic nature of the maintenance of survival, our study suggests that results from small-scale sib-pair investigations should be interpreted with caution, given the complexity of human longevity....

  11. Application of non-parametric bootstrap methods to estimate confidence intervals for QTL location in a beef cattle QTL experimental population.

    Science.gov (United States)

    Jongjoo, Kim; Davis, Scott K; Taylor, Jeremy F

    2002-06-01

    Empirical confidence intervals (CIs) for the estimated quantitative trait locus (QTL) location from selective and non-selective non-parametric bootstrap resampling methods were compared for a genome scan involving an Angus x Brahman reciprocal fullsib backcross population. Genetic maps, based on 357 microsatellite markers, were constructed for 29 chromosomes using CRI-MAP V2.4. Twelve growth, carcass composition and beef quality traits (n = 527-602) were analysed to detect QTLs utilizing (composite) interval mapping approaches. CIs were investigated for 28 likelihood ratio test statistic (LRT) profiles for the one QTL per chromosome model. The CIs from the non-selective bootstrap method were largest (87.7 cM average or 79.2% coverage of test chromosomes). The Selective II procedure produced the smallest CI size (42.3 cM average). However, CI sizes from the Selective II procedure were more variable than those produced by the two LOD drop method. CI ranges from the Selective II procedure were also asymmetrical (relative to the most likely QTL position) due to the bias caused by the tendency for the estimated QTL position to be at a marker position in the bootstrap samples and due to monotonicity and asymmetry of the LRT curve in the original sample.
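
    The record above relies on percentile-style non-parametric bootstrap resampling to attach an empirical confidence interval to an estimated quantity. A minimal sketch of that general idea is given below, assuming NumPy; the data, the estimator (here the median) and the interval level are illustrative placeholders, not the cattle QTL analysis itself.

      import numpy as np

      rng = np.random.default_rng(0)

      def bootstrap_ci(data, estimator, n_boot=2000, level=0.95):
          """Percentile bootstrap confidence interval for an arbitrary estimator."""
          n = len(data)
          stats = np.empty(n_boot)
          for b in range(n_boot):
              resample = rng.choice(data, size=n, replace=True)  # sample with replacement
              stats[b] = estimator(resample)
          alpha = (1.0 - level) / 2.0
          return np.quantile(stats, [alpha, 1.0 - alpha])

      # toy example: interval for the median of skewed data
      data = rng.lognormal(mean=0.0, sigma=1.0, size=200)
      low, high = bootstrap_ci(data, np.median)
      print(f"95% bootstrap CI for the median: [{low:.3f}, {high:.3f}]")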

  12. Transit Timing Observations from Kepler: II. Confirmation of Two Multiplanet Systems via a Non-parametric Correlation Analysis

    CERN Document Server

    Ford, Eric B; Steffen, Jason H; Carter, Joshua A; Fressin, Francois; Holman, Matthew J; Lissauer, Jack J; Moorhead, Althea V; Morehead, Robert C; Ragozzine, Darin; Rowe, Jason F; Welsh, William F; Allen, Christopher; Batalha, Natalie M; Borucki, William J; Bryson, Stephen T; Buchhave, Lars A; Burke, Christopher J; Caldwell, Douglas A; Charbonneau, David; Clarke, Bruce D; Cochran, William D; Désert, Jean-Michel; Endl, Michael; Everett, Mark E; Fischer, Debra A; Gautier, Thomas N; Gilliland, Ron L; Jenkins, Jon M; Haas, Michael R; Horch, Elliott; Howell, Steve B; Ibrahim, Khadeejah A; Isaacson, Howard; Koch, David G; Latham, David W; Li, Jie; Lucas, Philip; MacQueen, Phillip J; Marcy, Geoffrey W; McCauliff, Sean; Mullally, Fergal R; Quinn, Samuel N; Quintana, Elisa; Shporer, Avi; Still, Martin; Tenenbaum, Peter; Thompson, Susan E; Torres, Guillermo; Twicken, Joseph D; Wohler, Bill

    2012-01-01

    We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets. We apply this method to an analysis of the transit timing variations of two stars with multiple transiting planet candidates identified by Kepler. We confirm four transiting planets in two multiple planet systems based on their TTVs and the constraints imposed by dynamical stability. An additional three candidates in these same systems are not confirmed as planets, but are likely to be validated as real planets once further observations and analyses are possible. If all were confirmed, these systems would be near 4:6:...
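
    In spirit, the non-parametric significance assessment described above is a permutation test on the correlation of two transit timing series. The sketch below illustrates that generic idea with SciPy's Spearman correlation on synthetic data; it is not the authors' exact statistic, and the series length, period and noise level are invented placeholders.

      import numpy as np
      from scipy.stats import spearmanr

      rng = np.random.default_rng(1)

      def permutation_pvalue(ttv_a, ttv_b, n_perm=5000):
          """P-value for (anti-)correlation between two TTV series, estimated
          by shuffling one series relative to the other."""
          observed = abs(spearmanr(ttv_a, ttv_b)[0])
          exceed = 0
          for _ in range(n_perm):
              if abs(spearmanr(ttv_a, rng.permutation(ttv_b))[0]) >= observed:
                  exceed += 1
          return (exceed + 1) / (n_perm + 1)  # add-one rule avoids a zero p-value

      # toy example: two anti-correlated TTV series plus noise
      t = np.arange(30)
      ttv_a = np.sin(2 * np.pi * t / 10) + 0.3 * rng.normal(size=t.size)
      ttv_b = -np.sin(2 * np.pi * t / 10) + 0.3 * rng.normal(size=t.size)
      print("permutation p-value:", permutation_pvalue(ttv_a, ttv_b))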

  13. Conditional statistical inference with multistage testing designs.

    Science.gov (United States)

    Zwitser, Robert J; Maris, Gunter

    2015-03-01

    In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.

  14. Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example, the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  15. Non-parametric system identification from non-linear stochastic response

    DEFF Research Database (Denmark)

    Rüdinger, Finn; Krenk, Steen

    2001-01-01

    An estimation method is proposed for identification of non-linear stiffness and damping of single-degree-of-freedom systems under stationary white noise excitation. Non-parametric estimates of the stiffness and damping along with an estimate of the white noise intensity are obtained by suitable p...

  16. Non-Parametric Bayesian Updating within the Assessment of Reliability for Offshore Wind Turbine Support Structures

    DEFF Research Database (Denmark)

    Ramirez, José Rangel; Sørensen, John Dalsgaard

    2011-01-01

    This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbines. The new information, coming from external and condition monitoring, can be used to direct updating of the stochastic variables through a non-parametric Bayesian u...

  17. Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    2003-01-01

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example, the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  18. Non-parametric production analysis of pesticides use in the Netherlands

    NARCIS (Netherlands)

    Oude Lansink, A.G.J.M.; Silva, E.

    2004-01-01

    Many previous empirical studies on the productivity of pesticides suggest that pesticides are under-utilized in agriculture despite the generally held belief that these inputs are substantially over-utilized. This paper uses data envelopment analysis (DEA) to calculate non-parametric measures of the

  19. Performances and Spending Efficiency in Higher Education: A European Comparison through Non-Parametric Approaches

    Science.gov (United States)

    Agasisti, Tommaso

    2011-01-01

    The objective of this paper is an efficiency analysis concerning higher education systems in European countries. Data have been extracted from OECD data-sets (Education at a Glance, several years), using a non-parametric technique--data envelopment analysis--to calculate efficiency scores. This paper represents the first attempt to conduct such an…

  20. Low default credit scoring using two-class non-parametric kernel density estimation

    CSIR Research Space (South Africa)

    Rademeyer, E

    2016-12-01

    Full Text Available This paper investigates the performance of two-class classification on credit scoring data sets with low default ratios. The standard two-class parametric Gaussian and non-parametric Parzen classifiers are extended, using Bayes’ rule, to include either...
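
    A two-class Parzen (kernel density) classifier combined through Bayes' rule, as in the record above, can be sketched with scikit-learn's KernelDensity as follows; the toy features, bandwidth and class imbalance are assumptions made for illustration, not the paper's credit scoring data.

      import numpy as np
      from sklearn.neighbors import KernelDensity

      rng = np.random.default_rng(2)

      # toy imbalanced data: class 0 = non-default (many), class 1 = default (few)
      X0 = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
      X1 = rng.normal(loc=2.0, scale=1.0, size=(50, 2))

      kde0 = KernelDensity(bandwidth=0.5).fit(X0)   # Parzen estimate of p(x | non-default)
      kde1 = KernelDensity(bandwidth=0.5).fit(X1)   # Parzen estimate of p(x | default)
      prior0, prior1 = len(X0) / 1000, len(X1) / 1000

      def posterior_default(x):
          """P(default | x) via Bayes' rule with Parzen class-conditional densities."""
          x = np.atleast_2d(x)
          p0 = np.exp(kde0.score_samples(x)) * prior0
          p1 = np.exp(kde1.score_samples(x)) * prior1
          return p1 / (p0 + p1)

      print(posterior_default([2.5, 2.0]))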

  1. Measuring the influence of networks on transaction costs using a non-parametric regression technique

    DEFF Research Database (Denmark)

    Henningsen, Géraldine; Henningsen, Arne; Henning, Christian H.C.A.

    We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...

  2. Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    2003-01-01

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example, the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  3. Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example, the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  4. Parametric and Non-Parametric Vibration-Based Structural Identification Under Earthquake Excitation

    Science.gov (United States)

    Pentaris, Fragkiskos P.; Fouskitakis, George N.

    2014-05-01

    The problem of modal identification in civil structures is of crucial importance, and thus has been receiving increasing attention in recent years. Vibration-based methods are quite promising as they are capable of identifying the structure's global characteristics, they are relatively easy to implement and they tend to be time effective and less expensive than most alternatives [1]. This paper focuses on the off-line structural/modal identification of civil (concrete) structures subjected to low-level earthquake excitations, under which they remain within their linear operating regime. Earthquakes and their details are recorded and provided by the seismological network of Crete [2], which 'monitors' the broad region of the south Hellenic arc, an active seismic region which functions as a natural laboratory for earthquake engineering of this kind. A sufficient number of seismic events are analyzed in order to reveal the modal characteristics of the structures under study, which consist of the two concrete buildings of the School of Applied Sciences, Technological Education Institute of Crete, located in Chania, Crete, Hellas. Both buildings are equipped with high-sensitivity and accuracy seismographs - providing acceleration measurements - established at the basement (structure's foundation) presently considered as the ground's acceleration (excitation) and at all levels (ground floor, 1st floor, 2nd floor and terrace). Further details regarding the instrumentation setup and data acquisition may be found in [3]. The present study invokes stochastic, both non-parametric (frequency-based) and parametric methods for structural/modal identification (natural frequencies and/or damping ratios). Non-parametric methods include Welch-based spectrum and Frequency Response Function (FRF) estimation, while parametric methods include AutoRegressive (AR), AutoRegressive with eXogenous input (ARX) and AutoRegressive Moving-Average with eXogenous input (ARMAX) models [4, 5
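
    As a minimal illustration of the non-parametric (Welch-based) part of such an identification, the sketch below estimates a power spectral density from a synthetic acceleration record with SciPy and picks candidate natural frequencies; the sampling rate, mode frequencies and noise level are invented for the example, and the parametric AR/ARX/ARMAX stage is not shown.

      import numpy as np
      from scipy.signal import welch

      fs = 100.0                        # assumed sampling rate of the accelerometers [Hz]
      t = np.arange(0, 60, 1 / fs)
      rng = np.random.default_rng(3)

      # synthetic floor acceleration: two "modes" at 2.1 Hz and 6.4 Hz plus noise
      acc = (np.sin(2 * np.pi * 2.1 * t) + 0.5 * np.sin(2 * np.pi * 6.4 * t)
             + 0.8 * rng.normal(size=t.size))

      f, pxx = welch(acc, fs=fs, nperseg=1024)   # Welch power spectral density estimate
      peaks = np.sort(f[np.argsort(pxx)[-5:]])   # crude peak picking on the PSD
      print("candidate natural frequencies [Hz]:", peaks)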

  5. Statistical analysis of concrete quality testing results

    Directory of Open Access Journals (Sweden)

    Jevtić Dragica

    2014-01-01

    Full Text Available This paper statistically investigates the testing results of compressive strength and density of control concrete specimens tested in the Laboratory for Materials, Faculty of Civil Engineering, University of Belgrade, during 2012. A total of 4420 concrete specimens were tested, sampled at different locations - either at the concrete production site (concrete plant) or at the concrete placement location (construction site). To be exact, these samples were made of concrete which was produced at 15 concrete plants and placed in 50 different reinforced concrete structures, built during 2012 by 22 different contractors. It is a known fact that the achieved values of concrete compressive strength are very important, both for quality and durability assessment of concrete inside the structural elements and for calculation of their load-bearing capacity limit. Together with the compressive strength testing results, the data concerning the requested (designed) concrete class, the matching between the designed and the achieved concrete quality, concrete density values and the frequency of execution of concrete works during 2012 were analyzed.

  6. Revisiting the Distance Duality Relation using a non-parametric regression method

    Science.gov (United States)

    Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha

    2016-07-01

    The interdependence of the luminosity distance DL and the angular diameter distance DA given by the distance duality relation (DDR) is very significant in observational cosmology. It is very closely tied with the temperature-redshift relation of the Cosmic Microwave Background (CMB) radiation. Any deviation from η(z) ≡ DL/[DA (1+z)^2] = 1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η(z) data based on a phenomenological model η(z) = (1+z)^ε. The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxy datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct η(z). It is important to note that with CMB data, we are able to study the evolution of the DDR up to a very high redshift z = 2.418. In this analysis, we find no evidence of deviation from η = 1 within a 1σ region in the entire redshift range used in this analysis (0 < z ≤ 2.418).
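
    A rough sketch of the LOESS part of such a reconstruction of η(z) is given below using the lowess smoother in statsmodels; the simulated data, noise level and smoothing fraction are placeholders, and the SIMEX treatment of measurement error used in the paper is omitted.

      import numpy as np
      from statsmodels.nonparametric.smoothers_lowess import lowess

      rng = np.random.default_rng(4)

      # toy eta(z) observations scattered around the DDR value eta = 1
      z = np.sort(rng.uniform(0.0, 2.4, size=120))
      eta_obs = 1.0 + 0.05 * rng.normal(size=z.size)

      # LOESS reconstruction of eta(z); frac sets the local window width
      smoothed = lowess(eta_obs, z, frac=0.4, return_sorted=True)
      z_grid, eta_hat = smoothed[:, 0], smoothed[:, 1]
      print("reconstructed eta at the highest redshift in the grid:", eta_hat[-1])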

  7. Assessment of water quality trends in the Minnesota River using non-parametric and parametric methods

    Science.gov (United States)

    Johnson, H.O.; Gupta, S.C.; Vecchia, A.V.; Zvomuya, F.

    2009-01-01

    Excessive loading of sediment and nutrients to rivers is a major problem in many parts of the United States. In this study, we tested the non-parametric Seasonal Kendall (SEAKEN) trend model and the parametric USGS Quality of Water trend program (QWTREND) to quantify trends in water quality of the Minnesota River at Fort Snelling from 1976 to 2003. Both methods indicated decreasing trends in flow-adjusted concentrations of total suspended solids (TSS), total phosphorus (TP), and orthophosphorus (OP) and a generally increasing trend in flow-adjusted nitrate plus nitrite-nitrogen (NO3-N) concentration. The SEAKEN results were strongly influenced by the length of the record as well as extreme years (dry or wet) earlier in the record. The QWTREND results, though influenced somewhat by the same factors, were more stable. The magnitudes of trends between the two methods were somewhat different and appeared to be associated with conceptual differences between the flow-adjustment processes used and with data processing methods. The decreasing trends in TSS, TP, and OP concentrations are likely related to conservation measures implemented in the basin. However, dilution effects from wet climate or additional tile drainage cannot be ruled out. The increasing trend in NO3-N concentrations was likely due to increased drainage in the basin. Since the Minnesota River is the main source of sediments to the Mississippi River, this study also addressed the rapid filling of Lake Pepin on the Mississippi River and found the likely cause to be increased flow due to recent wet climate in the region. Copyright © 2009 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
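
    The Seasonal Kendall test builds on the basic Mann-Kendall statistic S computed season by season. A minimal sketch of the plain Mann-Kendall trend test (no seasonal blocking, tie correction or flow adjustment) is shown below, assuming NumPy and SciPy; the synthetic concentration series is a placeholder.

      import numpy as np
      from scipy.stats import norm

      def mann_kendall(x):
          """Basic Mann-Kendall trend test (no tie or seasonal correction)."""
          x = np.asarray(x, dtype=float)
          n = len(x)
          # S counts later-minus-earlier increases (+1) and decreases (-1)
          s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
          var_s = n * (n - 1) * (2 * n + 5) / 18.0
          if s > 0:
              z = (s - 1) / np.sqrt(var_s)
          elif s < 0:
              z = (s + 1) / np.sqrt(var_s)
          else:
              z = 0.0
          p = 2 * (1 - norm.cdf(abs(z)))    # two-sided p-value
          return s, z, p

      rng = np.random.default_rng(5)
      concentration = np.linspace(10, 8, 28) + rng.normal(scale=0.5, size=28)
      print(mann_kendall(concentration))    # expect a significantly negative trend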

  8. Evaluación de la estabilidad de un cultivar de caña de azúcar (Saccharum spp. en diferentes ambientes agroecológicos a través de una técnica no paramétrica en Tucumán, R. Argentina Assessment of the stability of a sugarcane (Saccharum spp. cultivar in different environments by a non-parametric test in Tucumán, Argentina

    Directory of Open Access Journals (Sweden)

    Santiago Ostengo

    2011-12-01

    considered in a breeding program. It is for that reason that in sugar cane breeding, multienvironmental trials (MET) are conducted at the last stage of the selection process. There exist different approaches to study genotype-environment interaction. One of these is the non-parametric technique, a valid and useful tool which allows making an initial exploration that can be easily interpreted. The non-parametric technique called relative consistency of performance enables the classification of genotypes into the following four categories: (i) consistently superior; (ii) inconsistently superior; (iii) inconsistently inferior and (iv) consistently inferior. This work aims to evaluate the consistency of performance of TUC 95-10 variety across different agro-ecological environments in the province of Tucumán (Argentina), as regards the variable tons of sugar per hectare and considering different crop ages. Data were obtained from MET of the Sugarcane Breeding Program of Estación Experimental Agroindustrial Obispo Colombres (EEAOC) from Tucumán (Argentina), conducted at six sites through four crop ages. Results showed that TUC 95-10, recently released by EEAOC, can be labeled as consistently superior at all ages, i.e. it held the top position in sugar production in all tested environments. Therefore, it can be concluded that TUC 95-10 shows an excellent performance and good adaptation to different agro-ecological environments in Tucumán, at all crop ages.

  9. Statistical reasoning in clinical trials: hypothesis testing.

    Science.gov (United States)

    Kelen, G D; Brown, C G; Ashton, J

    1988-01-01

    Hypothesis testing is based on certain statistical and mathematical principles that allow investigators to evaluate data by making decisions based on the probability or implausibility of observing the results obtained. However, classic hypothesis testing has its limitations, and probabilities mathematically calculated are inextricably linked to sample size. Furthermore, the meaning of the p value frequently is misconstrued as indicating that the findings are also of clinical significance. Finally, hypothesis testing allows for four possible outcomes, two of which are errors that can lead to erroneous adoption of certain hypotheses: 1. The null hypothesis is rejected when, in fact, it is false. 2. The null hypothesis is rejected when, in fact, it is true (type I or alpha error). 3. The null hypothesis is conceded when, in fact, it is true. 4. The null hypothesis is conceded when, in fact, it is false (type II or beta error). The implications of these errors, their relation to sample size, the interpretation of negative trials, and strategies related to the planning of clinical trials will be explored in a future article in this journal.

  10. A Statistical Perspective on Highly Accelerated Testing

    Energy Technology Data Exchange (ETDEWEB)

    Thomas, Edward V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2015-02-01

    Highly accelerated life testing has been heavily promoted at Sandia (and elsewhere) as a means to rapidly identify product weaknesses caused by flaws in the product's design or manufacturing process. During product development, a small number of units are forced to fail at high stress. The failed units are then examined to determine the root causes of failure. The identification of the root causes of product failures exposed by highly accelerated life testing can instigate changes to the product's design and/or manufacturing process that result in a product with increased reliability. It is widely viewed that this qualitative use of highly accelerated life testing (often associated with the acronym HALT) can be useful. However, highly accelerated life testing has also been proposed as a quantitative means for "demonstrating" the reliability of a product where unreliability is associated with loss of margin via an identified and dominating failure mechanism. It is assumed that the dominant failure mechanism can be accelerated by changing the level of a stress factor that is assumed to be related to the dominant failure mode. In extreme cases, a minimal number of units (often from a pre-production lot) are subjected to a single highly accelerated stress relative to normal use. If no (or, sufficiently few) units fail at this high stress level, some might claim that a certain level of reliability has been demonstrated (relative to normal use conditions). Underlying this claim are assumptions regarding the level of knowledge associated with the relationship between the stress level and the probability of failure. The primary purpose of this document is to discuss (from a statistical perspective) the efficacy of using accelerated life testing protocols (and, in particular, "highly accelerated" protocols) to make quantitative inferences concerning the performance of a product (e.g., reliability) when in fact there is lack-of-knowledge and uncertainty concerning

  11. A Statistical Perspective on Highly Accelerated Testing.

    Energy Technology Data Exchange (ETDEWEB)

    Thomas, Edward V.

    2015-02-01

    Highly accelerated life testing has been heavily promoted at Sandia (and elsewhere) as a means to rapidly identify product weaknesses caused by flaws in the product's design or manufacturing process. During product development, a small number of units are forced to fail at high stress. The failed units are then examined to determine the root causes of failure. The identification of the root causes of product failures exposed by highly accelerated life testing can instigate changes to the product's design and/or manufacturing process that result in a product with increased reliability. It is widely viewed that this qualitative use of highly accelerated life testing (often associated with the acronym HALT) can be useful. However, highly accelerated life testing has also been proposed as a quantitative means for "demonstrating" the reliability of a product where unreliability is associated with loss of margin via an identified and dominating failure mechanism. It is assumed that the dominant failure mechanism can be accelerated by changing the level of a stress factor that is assumed to be related to the dominant failure mode. In extreme cases, a minimal number of units (often from a pre-production lot) are subjected to a single highly accelerated stress relative to normal use. If no (or, sufficiently few) units fail at this high stress level, some might claim that a certain level of reliability has been demonstrated (relative to normal use conditions). Underlying this claim are assumptions regarding the level of knowledge associated with the relationship between the stress level and the probability of failure. The primary purpose of this document is to discuss (from a statistical perspective) the efficacy of using accelerated life testing protocols (and, in particular, "highly accelerated" protocols) to make quantitative inferences concerning the performance of a product (e.g., reliability) when in fact there is lack-of-knowledge and uncertainty concerning

  12. Non-parametric Bayesian human motion recognition using a single MEMS tri-axial accelerometer.

    Science.gov (United States)

    Ahmed, M Ejaz; Song, Ju Bin

    2012-09-27

    In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM) and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Mean (FCM) technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.
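
    A Dirichlet-process mixture of Gaussians, in which the number of active clusters is inferred from the data, can be sketched with scikit-learn's BayesianGaussianMixture as below. This uses variational inference rather than the collapsed Gibbs sampler of the paper, and the two-dimensional "motion features" are synthetic placeholders.

      import numpy as np
      from sklearn.mixture import BayesianGaussianMixture

      rng = np.random.default_rng(6)

      # toy "motion features" generated from three unknown activities
      features = np.vstack([
          rng.normal(loc=[0, 0], scale=0.3, size=(100, 2)),
          rng.normal(loc=[3, 1], scale=0.3, size=(100, 2)),
          rng.normal(loc=[1, 4], scale=0.3, size=(100, 2)),
      ])

      # truncated Dirichlet-process mixture: n_components is only an upper bound
      dpgmm = BayesianGaussianMixture(
          n_components=10,
          weight_concentration_prior_type="dirichlet_process",
          covariance_type="full",
          random_state=0,
      ).fit(features)

      labels = dpgmm.predict(features)
      print("clusters actually used:", np.unique(labels).size)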

  13. Non-Parametric Bayesian Human Motion Recognition Using a Single MEMS Tri-Axial Accelerometer

    Directory of Open Access Journals (Sweden)

    M. Ejaz Ahmed

    2012-09-01

    Full Text Available In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM) and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Mean (FCM) technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.

  14. The Galker test of speech reception in noise

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend

    2016-01-01

    and daycare teachers completed questionnaires on the children's ability to hear and understand speech. As most of the variables were not assessed using interval scales, non-parametric statistics (Goodman-Kruskal's gamma) were used for analyzing associations with the Galker test score. For comparisons, analysis of variance (ANOVA) was used. Interrelations were adjusted for using a non-parametric graphic model. RESULTS: In unadjusted analyses, the Galker test was associated with gender, age group, language development (Reynell revised scale), audiometry, and tympanometry. The Galker score was also...
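
    Goodman-Kruskal's gamma, the association measure mentioned above, is the number of concordant pairs minus the number of discordant pairs, divided by their sum (ties are ignored). A small self-contained sketch follows; the toy score and age-group vectors are invented for illustration.

      import numpy as np

      def goodman_kruskal_gamma(x, y):
          """Gamma = (concordant - discordant) / (concordant + discordant)."""
          x, y = np.asarray(x), np.asarray(y)
          concordant = discordant = 0
          n = len(x)
          for i in range(n - 1):
              for j in range(i + 1, n):
                  product = (x[j] - x[i]) * (y[j] - y[i])
                  if product > 0:
                      concordant += 1   # pair ordered the same way on both variables
                  elif product < 0:
                      discordant += 1   # pair ordered in opposite ways
          return (concordant - discordant) / (concordant + discordant)

      # toy ordinal data: a test score vs. an ordinal age group
      score = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
      age_group = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
      print(goodman_kruskal_gamma(score, age_group))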

  15. Estimating Financial Risk Measures for Futures Positions:A Non-Parametric Approach

    OpenAIRE

    Cotter, John; Dowd, Kevin

    2011-01-01

    This paper presents non-parametric estimates of spectral risk measures applied to long and short positions in 5 prominent equity futures contracts. It also compares these to estimates of two popular alternative measures, the Value-at-Risk (VaR) and Expected Shortfall (ES). The spectral risk measures are conditioned on the coefficient of absolute risk aversion, and the latter two are conditioned on the confidence level. Our findings indicate that all risk measures increase dramatically and the...

  16. A Non Parametric Study of the Volatility of the Economy as a Country Risk Predictor

    CERN Document Server

    Costanzo, Sabatino; Dominguez, Ramses; Moreno, William

    2007-01-01

    This paper intends to explain Venezuela's country spread behavior through neural-network analysis of a monthly general index of economic activity constructed by the Central Bank of Venezuela, a measure of the shocks affecting the country risk of emerging markets, and the U.S. short-term interest rate. The use of non-parametric methods allowed the finding of a non-linear relationship between these inputs and country risk. The network's performance was evaluated using the method of excess predictability.

  17. Statistical Tests of Galactic Dynamo Theory

    Science.gov (United States)

    Chamandy, Luke; Shukurov, Anvar; Taylor, A. Russ

    2016-12-01

    Mean-field galactic dynamo theory is the leading theory to explain the prevalence of regular magnetic fields in spiral galaxies, but its systematic comparison with observations is still incomplete and fragmentary. Here we compare predictions of mean-field dynamo models to observational data on magnetic pitch angle and the strength of the mean magnetic field. We demonstrate that a standard α²Ω dynamo model produces pitch angles of the regular magnetic fields of nearby galaxies that are reasonably consistent with available data. The dynamo estimates of the magnetic field strength are generally within a factor of a few of the observational values. Reasonable agreement between theoretical and observed pitch angles generally requires the turbulent correlation time τ to be in the range of 10-20 Myr, in agreement with standard estimates. Moreover, good agreement also requires that the ratio of the ionized gas scale height to root-mean-square turbulent velocity increases with radius. Our results thus widen the possibilities to constrain interstellar medium parameters using observations of magnetic fields. This work is a step toward systematic statistical tests of galactic dynamo theory. Such studies are becoming more and more feasible as larger data sets are acquired using current and up-and-coming instruments.

  18. Statistical tests of simple earthquake cycle models

    Science.gov (United States)

    DeVries, Phoebe M. R.; Evans, Eileen L.

    2016-12-01

    A central goal of observing and modeling the earthquake cycle is to forecast when a particular fault may generate an earthquake: a fault late in its earthquake cycle may be more likely to generate an earthquake than a fault early in its earthquake cycle. Models that can explain geodetic observations throughout the entire earthquake cycle may be required to gain a more complete understanding of relevant physics and phenomenology. Previous efforts to develop unified earthquake models for strike-slip faults have largely focused on explaining both preseismic and postseismic geodetic observations available across a few faults in California, Turkey, and Tibet. An alternative approach leverages the global distribution of geodetic and geologic slip rate estimates on strike-slip faults worldwide. Here we use the Kolmogorov-Smirnov test for similarity of distributions to infer, in a statistically rigorous manner, viscoelastic earthquake cycle models that are inconsistent with 15 sets of observations across major strike-slip faults. We reject a large subset of two-layer models incorporating Burgers rheologies at a significance level of α = 0.05 (those with long-term Maxwell viscosities ηM 4.6 × 10^20 Pa s) but cannot reject models on the basis of transient Kelvin viscosity ηK. Finally, we examine the implications of these results for the predicted earthquake cycle timing of the 15 faults considered and compare these predictions to the geologic and historical record.
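
    The two-sample Kolmogorov-Smirnov comparison of distributions used above can be illustrated with SciPy as follows; the two synthetic slip-rate samples are placeholders and do not reproduce the study's model-versus-observation comparison.

      import numpy as np
      from scipy.stats import ks_2samp

      rng = np.random.default_rng(7)

      # stand-ins for two sets of slip-rate estimates on the same faults [mm/yr]
      sample_a = rng.normal(loc=10.0, scale=3.0, size=15)
      sample_b = rng.normal(loc=12.5, scale=3.0, size=15)

      stat, p = ks_2samp(sample_a, sample_b)   # two-sided KS test
      print(f"KS statistic = {stat:.3f}, p = {p:.3f}")
      if p < 0.05:
          print("reject similarity of the two distributions at alpha = 0.05")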

  19. A Comparison of Parametric and Non-Parametric Methods Applied to a Likert Scale.

    Science.gov (United States)

    Mircioiu, Constantin; Atkinson, Jeffrey

    2017-05-10

    A trenchant and passionate dispute over the use of parametric versus non-parametric methods for the analysis of Likert scale ordinal data has raged for the past eight decades. The answer is not a simple "yes" or "no" but is related to hypotheses, objectives, risks, and paradigms. In this paper, we took a pragmatic approach. We applied both types of methods to the analysis of actual Likert data on responses from different professional subgroups of European pharmacists regarding competencies for practice. Results obtained show that with "large" (>15) numbers of responses and similar (but clearly not normal) distributions from different subgroups, parametric and non-parametric analyses give in almost all cases the same significant or non-significant results for inter-subgroup comparisons. Parametric methods were more discriminant in the cases of non-similar conclusions. Considering that the largest differences in opinions occurred in the upper part of the 4-point Likert scale (ranks 3 "very important" and 4 "essential"), a "score analysis" based on this part of the data was undertaken. This transformation of the ordinal Likert data into binary scores produced a graphical representation that was visually easier to understand as differences were accentuated. In conclusion, in this case of Likert ordinal data with high response rates, restraining the analysis to non-parametric methods leads to a loss of information. The addition of parametric methods, graphical analysis, analysis of subsets, and transformation of data leads to more in-depth analyses.
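
    A minimal side-by-side run of a parametric and a non-parametric comparison on ordinal data, plus the binary "score analysis" of the top two ranks, might look like the sketch below; the simulated Likert responses and group probabilities are assumptions, not the pharmacists' data.

      import numpy as np
      from scipy.stats import ttest_ind, mannwhitneyu

      rng = np.random.default_rng(8)

      # toy 4-point Likert responses from two professional subgroups
      group_a = rng.choice([1, 2, 3, 4], size=60, p=[0.05, 0.15, 0.40, 0.40])
      group_b = rng.choice([1, 2, 3, 4], size=60, p=[0.10, 0.30, 0.40, 0.20])

      t_stat, t_p = ttest_ind(group_a, group_b)
      u_stat, u_p = mannwhitneyu(group_a, group_b, alternative="two-sided")
      print(f"t-test p = {t_p:.4f}, Mann-Whitney p = {u_p:.4f}")

      # "score analysis": collapse to a binary indicator for the top two ranks
      top_a, top_b = np.mean(group_a >= 3), np.mean(group_b >= 3)
      print(f"proportion rating 3 or 4: {top_a:.2f} vs {top_b:.2f}")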

  20. Non-parametric foreground subtraction for 21cm epoch of reionization experiments

    CERN Document Server

    Harker, Geraint; Bernardi, Gianni; Brentjens, Michiel A; De Bruyn, A G; Ciardi, Benedetta; Jelic, Vibor; Koopmans, Leon V E; Labropoulos, Panagiotis; Mellema, Garrelt; Offringa, Andre; Pandey, V N; Schaye, Joop; Thomas, Rajat M; Yatawatta, Sarod

    2009-01-01

    An obstacle to the detection of redshifted 21cm emission from the epoch of reionization (EoR) is the presence of foregrounds which exceed the cosmological signal in intensity by orders of magnitude. We argue that in principle it would be better to fit the foregrounds non-parametrically - allowing the data to determine their shape - rather than selecting some functional form in advance and then fitting its parameters. Non-parametric fits often suffer from other problems, however. We discuss these before suggesting a non-parametric method, Wp smoothing, which seems to avoid some of them. After outlining the principles of Wp smoothing we describe an algorithm used to implement it. We then apply Wp smoothing to a synthetic data cube for the LOFAR EoR experiment. The performance of Wp smoothing, measured by the extent to which it is able to recover the variance of the cosmological signal and to which it avoids leakage of power from the foregrounds, is compared to that of a parametric fit, and to another non-parame...

  1. Hypothesis testing and statistical analysis of microbiome

    Directory of Open Access Journals (Sweden)

    Yinglin Xia

    2017-09-01

    Full Text Available After the initiation of the Human Microbiome Project in 2008, various biostatistical and bioinformatic tools for data analysis and computational methods have been developed and applied to microbiome studies. In this review and perspective, we discuss the research and statistical hypotheses in gut microbiome studies, focusing on mechanistic concepts that underlie the complex relationships among host, microbiome, and environment. We review the currently available statistical tools and highlight recent progress of newly developed statistical methods and models. Given the current challenges and limitations in biostatistical approaches and tools, we discuss future directions in developing statistical methods and models for microbiome studies.

  2. Frequency of the adequate use of statistical tests of hypothesis in original articles published in the Revista Brasileira de Anestesiologia between January 2008 and December 2009.

    Science.gov (United States)

    Barbosa, Fabiano Timbó; de Souza, Diego Agra

    2010-01-01

    Statistical analysis is necessary for adequate evaluation of the original article by the reader, allowing him/her to better visualize and comprehend the results. The objective of the present study was to determine the frequency of the adequate use of statistical tests in original articles published in the Revista Brasileira de Anestesiologia from January 2008 to December 2009. Original articles published in the Revista Brasileira de Anestesiologia between January 2008 and December 2009 were selected. The use of statistical tests was deemed appropriate when the selection of the tests was adequate for continuous and categorical variables and for parametric and non-parametric tests, the correction factor was described when the use of multiple comparisons was reported, and the specific use of a statistical test for analysis of one variable was mentioned. Seventy-six original articles, containing a total of 179 statistical tests, were selected. The most frequently used statistical tests were: Chi-square 20.11%, Student t test 19.55%, ANOVA 10.05%, and Fisher exact test 9.49%. The frequency of the adequate use of statistical tests was 56.42% (95% CI 49.16% to 63.68%), erroneous use in 13.41% (95% CI 8.42% to 18.40%), and an inconclusive result in 30.16% (95% CI 23.44% to 36.88%). The frequency of adequate use of statistical tests in the articles published by the Revista Brasileira de Anestesiologia between January 2008 and December 2009 was 56.42%. Copyright © 2010 Elsevier Editora Ltda. All rights reserved.

  3. Testing for Statistical Discrimination based on Gender

    DEFF Research Database (Denmark)

    Lesner, Rune Vammen

    This paper develops a model which incorporates the two most commonly cited strands of the literature on statistical discrimination, namely screening discrimination and stereotyping. The model is used to provide empirical evidence of statistical discrimination based on gender in the labour market. It is shown that the implications of both screening discrimination and stereotyping are consistent with observable wage dynamics. In addition, it is found that the gender wage gap decreases in tenure but increases in job transitions and that the fraction of women in high-ranking positions within a firm does not affect the level of statistical discrimination by gender.

  4. Statistical Decision Theory Estimation, Testing, and Selection

    CERN Document Server

    Liese, Friedrich

    2008-01-01

    Suitable for advanced graduate students and researchers in mathematical statistics and decision theory, this title presents an account of the concepts and a treatment of the major results of classical finite sample size decision theory and modern asymptotic decision theory

  5. Detecting correlation changes in multivariate time series: A comparison of four non-parametric change point detection methods.

    Science.gov (United States)

    Cabrieto, Jedelyn; Tuerlinckx, Francis; Kuppens, Peter; Grassmann, Mariel; Ceulemans, Eva

    2017-06-01

    Change point detection in multivariate time series is a complex task since next to the mean, the correlation structure of the monitored variables may also alter when change occurs. DeCon was recently developed to detect such changes in mean and/or correlation by combining a moving windows approach and robust PCA. However, in the literature, several other methods have been proposed that employ other non-parametric tools: E-divisive, Multirank, and KCP. Since these methods use different statistical approaches, two issues need to be tackled. First, applied researchers may find it hard to appraise the differences between the methods. Second, a direct comparison of the relative performance of all these methods for capturing change points signaling correlation changes is still lacking. Therefore, we present the basic principles behind DeCon, E-divisive, Multirank, and KCP and the corresponding algorithms, to make them more accessible to readers. We further compared their performance through extensive simulations using the settings of Bulteel et al. (Biological Psychology, 98 (1), 29-42, 2014) implying changes in mean and in correlation structure and those of Matteson and James (Journal of the American Statistical Association, 109 (505), 334-345, 2014) implying different numbers of (noise) variables. KCP emerged as the best method in almost all settings. However, in case of more than two noise variables, only DeCon performed adequately in detecting correlation changes.

  6. Scaling of preferential flow in biopores by parametric or non parametric transfer functions

    Science.gov (United States)

    Zehe, E.; Hartmann, N.; Klaus, J.; Palm, J.; Schroeder, B.

    2009-04-01

    finally assign the measured hydraulic capacities to these pores. By combining this population of macropores with observed data on soil hydraulic properties we obtain a virtual reality. Flow and transport is simulated for different rainfall forcings comparing two models, Hydrus 3d and Catflow. The simulated cumulative travel depths distributions for different forcings will be linked to the cumulative depth distribution of connected flow paths. The latter describes the fraction of connected paths - where flow resistance is always below a selected threshold that links the surface to a certain critical depth. Systematic variation of the average number of macropores and their depth distributions will show whether a clear link between the simulated travel depths distributions and the depth distribution of connected paths may be identified. The third essential step is to derive a non-parametric transfer function that predicts travel depth distributions of tracers and on the long term pesticides based on easy-to-assess subsurface characteristics (mainly density and depth distribution of worm burrows, soil matrix properties), initial conditions and rainfall forcing. Such a transfer function is independent of scale, as long as we stay in the same ensemble, i.e. worm population and soil properties stay the same. Shipitalo, M.J. and Butt, K.R. (1999): Occupancy and geometrical properties of Lumbricus terrestris L. burrows affecting infiltration. Pedobiologia 43:782-794 Zehe E, and Fluehler H. (2001b): Slope scale distribution of flow patterns in soil profiles. J. Hydrol. 247: 116-132.

  7. Statistical Tests for Mixed Linear Models

    CERN Document Server

    Khuri, André I; Sinha, Bimal K

    2011-01-01

    An advanced discussion of linear models with mixed or random effects. In recent years a breakthrough has occurred in our ability to draw inferences from exact and optimum tests of variance component models, generating much research activity that relies on linear models with mixed and random effects. This volume covers the most important research of the past decade as well as the latest developments in hypothesis testing. It compiles all currently available results in the area of exact and optimum tests for variance component models and offers the only comprehensive treatment for these models a

  8. A web application for evaluating Phase I methods using a non-parametric optimal benchmark.

    Science.gov (United States)

    Wages, Nolan A; Varhegyi, Nikole

    2017-06-01

    In evaluating the performance of Phase I dose-finding designs, simulation studies are typically conducted to assess how often a method correctly selects the true maximum tolerated dose under a set of assumed dose-toxicity curves. A necessary component of the evaluation process is to have some concept for how well a design can possibly perform. The notion of an upper bound on the accuracy of maximum tolerated dose selection is often omitted from the simulation study, and the aim of this work is to provide researchers with accessible software to quickly evaluate the operating characteristics of Phase I methods using a benchmark. The non-parametric optimal benchmark is a useful theoretical tool for simulations that can serve as an upper limit for the accuracy of maximum tolerated dose identification based on a binary toxicity endpoint. It offers researchers a sense of the plausibility of a Phase I method's operating characteristics in simulation. We have developed an R shiny web application for simulating the benchmark. The web application has the ability to quickly provide simulation results for the benchmark and requires no programming knowledge. The application is free to access and use on any device with an Internet browser. The application provides the percentage of correct selection of the maximum tolerated dose and an accuracy index, operating characteristics typically used in evaluating the accuracy of dose-finding designs. We hope this software will facilitate the use of the non-parametric optimal benchmark as an evaluation tool in dose-finding simulation.

  9. Non-parametric transformation for data correlation and integration: From theory to practice

    Energy Technology Data Exchange (ETDEWEB)

    Datta-Gupta, A.; Xue, Guoping; Lee, Sang Heon [Texas A&M Univ., College Station, TX (United States)]

    1997-08-01

    The purpose of this paper is two-fold. First, we introduce the use of non-parametric transformations for correlating petrophysical data during reservoir characterization. Such transformations are completely data driven and do not require a priori functional relationship between response and predictor variables which is the case with traditional multiple regression. The transformations are very general, computationally efficient and can easily handle mixed data types for example, continuous variables such as porosity, permeability and categorical variables such as rock type, lithofacies. The power of the non-parametric transformation techniques for data correlation has been illustrated through synthetic and field examples. Second, we utilize these transformations to propose a two-stage approach for data integration during heterogeneity characterization. The principal advantages of our approach over traditional cokriging or cosimulation methods are: (1) it does not require a linear relationship between primary and secondary data, (2) it exploits the secondary information to its fullest potential by maximizing the correlation between the primary and secondary data, (3) it can be easily applied to cases where several types of secondary or soft data are involved, and (4) it significantly reduces variance function calculations and thus, greatly facilitates non-Gaussian cosimulation. We demonstrate the data integration procedure using synthetic and field examples. The field example involves estimation of pore-footage distribution using well data and multiple seismic attributes.

  10. A non-parametric approach to estimate the total deviation index for non-normal data.

    Science.gov (United States)

    Perez-Jaume, Sara; Carrasco, Josep L

    2015-11-10

    Concordance indices are used to assess the degree of agreement between different methods that measure the same characteristic. In this context, the total deviation index (TDI) is an unscaled concordance measure that quantifies to which extent the readings from the same subject obtained by different methods may differ with a certain probability. Common approaches to estimate the TDI assume data are normally distributed and linearity between response and effects (subjects, methods and random error). Here, we introduce a new non-parametric methodology for estimation and inference of the TDI that can deal with any kind of quantitative data. The present study introduces this non-parametric approach and compares it with the already established methods in two real case examples that represent situations of non-normal data (more specifically, skewed data and count data). The performance of the already established methodologies and our approach in these contexts is assessed by means of a simulation study. Copyright © 2015 John Wiley & Sons, Ltd.
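
    The basic non-parametric idea behind the TDI, reading off a high quantile of the absolute paired differences between two methods, can be sketched as follows; the paper's estimator and its inference are more involved, and the simulated skewed measurements are placeholders.

      import numpy as np

      def tdi_nonparametric(method_a, method_b, p=0.90):
          """Empirical TDI: the p-quantile of |A - B|, i.e. a bound that the
          paired difference stays below with probability p."""
          diffs = np.abs(np.asarray(method_a) - np.asarray(method_b))
          return np.quantile(diffs, p)

      rng = np.random.default_rng(9)
      true_value = rng.lognormal(mean=2.0, sigma=0.5, size=200)   # skewed characteristic
      method_a = true_value + rng.normal(scale=0.5, size=200)
      method_b = true_value + rng.normal(scale=0.7, size=200)
      print("TDI(0.90):", tdi_nonparametric(method_a, method_b))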

  11. Non-parametric iterative model constraint graph min-cut for automatic kidney segmentation.

    Science.gov (United States)

    Freiman, M; Kronman, A; Esses, S J; Joskowicz, L; Sosna, J

    2010-01-01

    We present a new non-parametric model constraint graph min-cut algorithm for automatic kidney segmentation in CT images. The segmentation is formulated as a maximum a-posteriori estimation of a model-driven Markov random field. A non-parametric hybrid shape and intensity model is treated as a latent variable in the energy functional. The latent model and labeling map that minimize the energy functional are then simultaneously computed with an expectation maximization approach. The main advantages of our method are that it does not assume a fixed parametric prior model, which is subjective to inter-patient variability and registration errors, and that it combines both the model and the image information into a unified graph min-cut based segmentation framework. We evaluated our method on 20 kidneys from 10 CT datasets with and without contrast agent for which ground-truth segmentations were generated by averaging three manual segmentations. Our method yields an average volumetric overlap error of 10.95%, and average symmetric surface distance of 0.79 mm. These results indicate that our method is accurate and robust for kidney segmentation.

  12. MEASURING DARK MATTER PROFILES NON-PARAMETRICALLY IN DWARF SPHEROIDALS: AN APPLICATION TO DRACO

    Energy Technology Data Exchange (ETDEWEB)

    Jardel, John R.; Gebhardt, Karl [Department of Astronomy, The University of Texas, 2515 Speedway, Stop C1400, Austin, TX 78712-1205 (United States); Fabricius, Maximilian H.; Williams, Michael J. [Max-Planck Institut fuer extraterrestrische Physik, Giessenbachstrasse, D-85741 Garching bei Muenchen (Germany); Drory, Niv, E-mail: jardel@astro.as.utexas.edu [Instituto de Astronomia, Universidad Nacional Autonoma de Mexico, Avenida Universidad 3000, Ciudad Universitaria, C.P. 04510 Mexico D.F. (Mexico)

    2013-02-15

    We introduce a novel implementation of orbit-based (or Schwarzschild) modeling that allows dark matter density profiles to be calculated non-parametrically in nearby galaxies. Our models require no assumptions to be made about velocity anisotropy or the dark matter profile. The technique can be applied to any dispersion-supported stellar system, and we demonstrate its use by studying the Local Group dwarf spheroidal galaxy (dSph) Draco. We use existing kinematic data at larger radii and also present 12 new radial velocities within the central 13 pc obtained with the VIRUS-W integral field spectrograph on the 2.7 m telescope at McDonald Observatory. Our non-parametric Schwarzschild models find strong evidence that the dark matter profile in Draco is cuspy for 20 ≤ r ≤ 700 pc. The profile for r ≥ 20 pc is well fit by a power law with slope α = -1.0 ± 0.2, consistent with predictions from cold dark matter simulations. Our models confirm that, despite its low baryon content relative to other dSphs, Draco lives in a massive halo.

  13. A Bayesian non-parametric Potts model with application to pre-surgical FMRI data.

    Science.gov (United States)

    Johnson, Timothy D; Liu, Zhuqing; Bartsch, Andreas J; Nichols, Thomas E

    2013-08-01

    The Potts model has enjoyed much success as a prior model for image segmentation. Given the individual classes in the model, the data are typically modeled as Gaussian random variates or as random variates from some other parametric distribution. In this article, we present a non-parametric Potts model and apply it to a functional magnetic resonance imaging study for the pre-surgical assessment of peritumoral brain activation. In our model, we assume that the Z-score image from a patient can be segmented into activated, deactivated, and null classes, or states. Conditional on the class, or state, the Z-scores are assumed to come from some generic distribution which we model non-parametrically using a mixture of Dirichlet process priors within the Bayesian framework. The posterior distribution of the model parameters is estimated with a Markov chain Monte Carlo algorithm, and Bayesian decision theory is used to make the final classifications. Our Potts prior model includes two parameters, the standard spatial regularization parameter and a parameter that can be interpreted as the a priori probability that each voxel belongs to the null, or background state, conditional on the lack of spatial regularization. We assume that both of these parameters are unknown, and jointly estimate them along with other model parameters. We show through simulation studies that our model performs on par, in terms of posterior expected loss, with parametric Potts models when the parametric model is correctly specified and outperforms parametric models when the parametric model is misspecified.

  14. Caveats for using statistical significance tests in research assessments

    OpenAIRE

    2011-01-01

    This paper raises concerns about the advantages of using statistical significance tests in research assessments as has recently been suggested in the debate about proper normalization procedures for citation indicators. Statistical significance tests are highly controversial and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with s...

  15. Assessing Statistical Aspects of Test Fairness with Structural Equation Modelling

    Science.gov (United States)

    Kline, Rex B.

    2013-01-01

    Test fairness and test bias are not synonymous concepts. Test bias refers to statistical evidence that the psychometrics or interpretation of test scores depend on group membership, such as gender or race, when such differences are not expected. A test that is grossly biased may be judged to be unfair, but test fairness concerns the broader, more…

  16. Similar tests and the standardized log likelihood ratio statistic

    DEFF Research Database (Denmark)

    Jensen, Jens Ledet

    1986-01-01

    When testing an affine hypothesis in an exponential family the 'ideal' procedure is to calculate the exact similar test, or an approximation to this, based on the conditional distribution given the minimal sufficient statistic under the null hypothesis. By contrast to this there is a 'primitive' approach in which the marginal distribution of a test statistic is considered and any nuisance parameter appearing in the test statistic is replaced by an estimate. We show here that when using standardized likelihood ratio statistics the 'primitive' procedure is in fact an 'ideal' procedure to order O(n -3...

  17. A note on measurement scales and statistical testing

    NARCIS (Netherlands)

    Meijer, R.R.; Oosterloo, Sebe J.

    2008-01-01

    In elementary books on applied statistics (e.g., Siegel, 1988; Agresti, 1990) and books on research methodology in psychology and personality assessment (e.g., Aiken, 1999), it is often suggested that the choice of a statistical test and the choice of statistical operations should be determined by

  18. A note on measurement scales and statistical testing

    NARCIS (Netherlands)

    Meijer, Rob R.; Oosterloo, Sebie J.

    2008-01-01

    In elementary books on applied statistics (e.g., Siegel, 1988; Agresti, 1990) and books on research methodology in psychology and personality assessment (e.g., Aiken, 1999), it is often suggested that the choice of a statistical test and the choice of statistical operations should be determined by t

  19. A statistical procedure for testing financial contagion

    Directory of Open Access Journals (Sweden)

    Attilio Gardini

    2013-05-01

    Full Text Available The aim of the paper is to provide an analysis of contagion through the measurement of the risk premia disequilibria dynamics. In order to discriminate among several disequilibrium situations we propose to test contagion on the basis of a two-step procedure: in the first step we estimate the preference parameters of the consumption-based asset pricing model (CCAPM) to control for fundamentals and to measure the equilibrium risk premia in different countries; in the second step we measure the differences among empirical risk premia and equilibrium risk premia in order to test cross-country disequilibrium situations due to contagion. Disequilibrium risk premium measures are modelled by the multivariate DCC-GARCH model including a deterministic crisis variable. The model describes simultaneously the risk premia dynamics due to endogenous amplifications of volatility and to exogenous idiosyncratic shocks (contagion), having controlled for fundamentals effects in the first step. Our approach allows us to achieve two goals: (i) to identify the disequilibria generated by irrational behaviours of the agents, which cause increases in volatility that are not explained by the economic fundamentals but are endogenous to financial markets, and (ii) to assess the existence of a contagion effect defined by an exogenous shift in cross-country return correlations during crisis periods. Our results show evidence of contagion from the United States to United Kingdom, Japan, France, and Italy during the financial crisis which started in 2007-08.

  20. Caveats for using statistical significance tests in research assessments

    CERN Document Server

    Schneider, Jesper W

    2011-01-01

    This paper raises concerns about the advantages of using statistical significance tests in research assessments as has recently been suggested in the debate about proper normalization procedures for citation indicators. Statistical significance tests are highly controversial and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with such tests. The issues specifically discussed are the ritual practice of such tests, their dichotomous application in decision making, the difference between statistical and substantive significance, the implausibility of most null hypotheses, the crucial assumption of randomness, as well as the utility of standard errors and confidence intervals for inferential purposes. We argue that applying statistical significance tests and mechanically adhering to their results is highly problematic and detrimental to critical thinki...

  1. Distinguish Dynamic Basic Blocks by Structural Statistical Testing

    DEFF Research Database (Denmark)

    Petit, Matthieu; Gotlieb, Arnaud

    Statistical testing aims at generating random test data that respect selected probabilistic properties. A distribution probability is associated with the program input space in order to achieve statistical test purpose: to test the most frequent usage of software or to maximize the probability...... of satisfying a structural coverage criterion for instance. In this paper, we propose a new statistical testing method that generates sequences of random test data that respect the following probabilistic properties: 1) each sequence guarantees the uniform selection of feasible paths only and 2) the uniform...... control flow path) during the test data selection. We implemented this algorithm in a statistical test data generator for Java programs. A first experimental validation is presented...

  2. Misuse of statistical test in three decades of psychotherapy research.

    Science.gov (United States)

    Dar, R; Serlin, R C; Omer, H

    1994-02-01

    This article reviews the misuse of statistical tests in psychotherapy research studies published in the Journal of Consulting and Clinical Psychology in the years 1967-1968, 1977-1978, and 1987-1988. It focuses on 3 major problems in statistical practice: inappropriate uses of null hypothesis tests and p values, neglect of effect size, and inflation of Type I error rate. The impressive frequency of these problems is documented, and changes in statistical practices over the past 3 decades are interpreted in light of trends in psychotherapy research. The article concludes with practical suggestions for rational application of statistical tests.

  3. Quantum Hypothesis Testing and Non-Equilibrium Statistical Mechanics

    CERN Document Server

    Jaksic, V; Pillet, C -A; Seiringer, R

    2011-01-01

    We extend the mathematical theory of quantum hypothesis testing to the general $W^*$-algebraic setting and explore its relation with recent developments in non-equilibrium quantum statistical mechanics. In particular, we relate the large deviation principle for the full counting statistics of entropy flow to quantum hypothesis testing of the arrow of time.

  4. CAUSALITY BETWEEN GDP, ENERGY AND COAL CONSUMPTION IN INDIA, 1970-2011: A NON-PARAMETRIC BOOTSTRAP APPROACH

    Directory of Open Access Journals (Sweden)

    Rohin Anhal

    2013-10-01

    Full Text Available The aim of this paper is to examine the direction of causality between real GDP on the one hand and final energy and coal consumption on the other in India, for the period from 1970 to 2011. The methodology adopted is the non-parametric bootstrap procedure, which is used to construct the critical values for the hypothesis of causality. The results of the bootstrap tests show that for total energy consumption, there exists no causal relationship in either direction with GDP of India. However, if coal consumption is considered, we find evidence in support of unidirectional causality running from coal consumption to GDP. This clearly has important implications for the Indian economy. The most important implication is that curbing coal consumption in order to reduce carbon emissions would in turn have a limiting effect on economic growth. Our analysis contributes to the literature in three distinct ways. First, this is the first paper to use the bootstrap method to examine the growth-energy connection for the Indian economy. Second, we analyze data for the time period 1970 to 2011, thereby utilizing recently available data that has not been used by others. Finally, in contrast to the recently done studies, we adopt a disaggregated approach for the analysis of the growth-energy nexus by considering not only aggregate energy consumption, but coal consumption as well.
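
    The bootstrap causality testing described above can be sketched compactly. The snippet below is a minimal, illustrative residual-bootstrap Granger-type test, not the authors' implementation; the lag order, series names, and the fixed-design bootstrap scheme are assumptions.

    ```python
    import numpy as np

    def lagmat(x, p):
        """Matrix of lags 1..p of a 1-D series, aligned with x[p:]."""
        return np.column_stack([x[p - k:-k] for k in range(1, p + 1)])

    def granger_F(y, x, p):
        """F statistic for H0: lags of x do not help predict y (lag order p)."""
        Y = y[p:]
        X_r = np.column_stack([np.ones(len(Y)), lagmat(y, p)])   # restricted: own lags only
        X_u = np.column_stack([X_r, lagmat(x, p)])               # unrestricted: adds lags of x
        b_r, *_ = np.linalg.lstsq(X_r, Y, rcond=None)
        b_u, *_ = np.linalg.lstsq(X_u, Y, rcond=None)
        rss_r = np.sum((Y - X_r @ b_r) ** 2)
        rss_u = np.sum((Y - X_u @ b_u) ** 2)
        F = (rss_r - rss_u) / p / (rss_u / (len(Y) - X_u.shape[1]))
        return F, X_r, b_r, Y - X_r @ b_r

    def bootstrap_granger(y, x, p=2, B=999, seed=0):
        """Bootstrap p-value for 'x does not cause y'. Pseudo-data are generated
        under the null by resampling restricted-model residuals (a simplified
        fixed-design scheme; a recursive scheme is another common choice)."""
        rng = np.random.default_rng(seed)
        F_obs, X_r, b_r, resid = granger_F(y, x, p)
        exceed = 0
        for _ in range(B):
            y_star = y.copy()
            y_star[p:] = X_r @ b_r + rng.choice(resid, size=len(resid), replace=True)
            F_b, *_ = granger_F(y_star, x, p)
            exceed += F_b >= F_obs
        return F_obs, (exceed + 1) / (B + 1)

    # usage with hypothetical series: bootstrap_granger(gdp_growth, coal_growth, p=2)
    ```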

  5. Rural-urban Migration and Dynamics of Income Distribution in China: A Non-parametric Approach

    Institute of Scientific and Technical Information of China (English)

    Yong Liu,; Wei Zou

    2011-01-01

    Extending the income dynamics approach in Quah (2003), the present paper studies the widening income inequality in China over the past three decades from the viewpoint of rural-urban migration and economic transition. We establish non-parametric estimates of rural and urban income distribution functions in China, and aggregate a population-weighted, nationwide income distribution function taking into account rural-urban differences in technological progress and price indexes. We calculate 12 inequality indexes through non-parametric estimation to overcome the biases in existing parametric estimation and, therefore, provide a more accurate measurement of income inequality. Policy implications are drawn based on our research.
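
    As a rough illustration of the non-parametric aggregation step, the sketch below builds a population-weighted national income sample from rural and urban kernel density estimates and computes one inequality index (the Gini coefficient). The sample names, population share, and choice of index are assumptions, not the paper's data or full set of 12 indexes.

    ```python
    import numpy as np
    from scipy.stats import gaussian_kde

    def national_sample(rural_inc, urban_inc, rural_share, size=100_000):
        """Population-weighted national income sample: resample the rural and urban
        kernel density estimates in proportion to their population shares."""
        n_rural = int(round(size * rural_share))
        rural = gaussian_kde(rural_inc).resample(n_rural)[0]
        urban = gaussian_kde(urban_inc).resample(size - n_rural)[0]
        return np.concatenate([rural, urban])

    def gini(x):
        """Discrete Gini coefficient of an income sample."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n
    ```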

  6. Non-parametric co-clustering of large scale sparse bipartite networks on the GPU

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mørup, Morten; Hansen, Lars Kai

    2011-01-01

    Co-clustering is a problem of both theoretical and practical importance, e.g., market basket analysis and collaborative filtering, and in web scale text processing. We state the co-clustering problem in terms of non-parametric generative models which can address the issue of estimating the number...... of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering we exploit that parameter inference for co-clustering is well suited for parallel computing. We develop a generic GPU framework for efficient inference on large scale......-life large scale collaborative filtering data and web scale text corpora, demonstrating that latent mesoscale structures extracted by the co-clustering problem as formulated by the Infinite Relational Model (IRM) are consistent across consecutive runs with different initializations and also relevant...

  7. Non-parametric Reconstruction of Cluster Mass Distribution from Strong Lensing Modelling Abell 370

    CERN Document Server

    Abdel-Salam, H M; Williams, L L R

    1997-01-01

    We describe a new non-parametric technique for reconstructing the mass distribution in galaxy clusters with strong lensing, i.e., from multiple images of background galaxies. The observed positions and redshifts of the images are considered as rigid constraints and through the lens (ray-trace) equation they provide us with linear constraint equations. These constraints confine the mass distribution to some allowed region, which is then found by linear programming. Within this allowed region we study in detail the mass distribution with minimum mass-to-light variation; also some others, such as the smoothest mass distribution. The method is applied to the extensively studied cluster Abell 370, which hosts a giant luminous arc and several other multiply imaged background galaxies. Our mass maps are constrained by the observed positions and redshifts (spectroscopic or model-inferred by previous authors) of the giant arc and multiple image systems. The reconstructed maps obtained for A370 reveal a detailed mass d...

  8. Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling.

    Science.gov (United States)

    Karsch, Kevin; Liu, Ce; Kang, Sing Bing

    2014-11-01

    We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling. We demonstrate our technique in cases where past methods fail (non-translating cameras and dynamic scenes). Our technique is applicable to single images as well as videos. For videos, we use local motion cues to improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we use a Kinect-based system to collect a large data set containing stereoscopic videos with known depths. We show that our depth estimation technique outperforms the state-of-the-art on benchmark databases. Our technique can be used to automatically convert a monoscopic video into stereo for 3D visualization, and we demonstrate this through a variety of visually pleasing results for indoor and outdoor scenes, including results from the feature film Charade.

  9. A multitemporal and non-parametric approach for assessing the impacts of drought on vegetation greenness

    DEFF Research Database (Denmark)

    Carrao, Hugo; Sepulcre, Guadalupe; Horion, Stéphanie Marie Anne F;

    2013-01-01

    for the period between 1998 and 2010. The time-series analysis of vegetation greenness is performed during the growing season with a non-parametric method, namely the seasonal Relative Greenness (RG) of spatially accumulated fAPAR. The Global Land Cover map of 2000 and the GlobCover maps of 2005/2006 and 2009......This study evaluates the relationship between the frequency and duration of meteorological droughts and the subsequent temporal changes on the quantity of actively photosynthesizing biomass (greenness) estimated from satellite imagery on rainfed croplands in Latin America. An innovative non...... Full Data Reanalysis precipitation time-series product, which ranges from January 1901 to December 2010 and is interpolated at the spatial resolution of 1° (decimal degree, DD). Vegetation greenness composites are derived from 10-daily SPOT-VEGETATION images at the spatial resolution of 1/112° DD...

  10. Comparative Study of Parametric and Non-parametric Approaches in Fault Detection and Isolation

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.; Katebi, M.R.

    This report describes a comparative study between two approaches to fault detection and isolation in dynamic systems. The first approach uses a parametric model of the system. The main components of such techniques are residual and signature generation for processing and analyzing. The second...... approach is non-parametric in the sense that the signature analysis is only dependent on the frequency or time domain information extracted directly from the input-output signals. Based on these approaches, two different fault monitoring schemes are developed where the feature extraction and fault decision...... algorithms employed are adopted from the template matching in pattern recognition. Extensive simulation studies are performed to demonstrate satisfactory performance of the proposed techniques. The advantages and disadvantages of each approach are discussed and analyzed....

  11. Developing two non-parametric performance models for higher learning institutions

    Science.gov (United States)

    Kasim, Maznah Mat; Kashim, Rosmaini; Rahim, Rahela Abdul; Khan, Sahubar Ali Muhamed Nadhar

    2016-08-01

    Measuring the performance of higher learning institutions (HLIs) is essential if these institutions are to improve. This paper focuses on the formation of two performance models, an efficiency model and an effectiveness model, using a non-parametric method, Data Envelopment Analysis (DEA). The proposed models are validated by measuring the performance of 16 public universities in Malaysia for the year 2008. However, since data for one of the variables were unavailable, an estimate was used as a proxy for the real data. The results show that the average efficiency and effectiveness scores were 0.817 and 0.900 respectively; six universities were fully efficient and eight universities were fully effective, while six universities were both efficient and effective. It is suggested that the two proposed performance models could serve as complementary or alternative methods to the existing performance appraisal approach for monitoring the performance of HLIs, especially in Malaysia.
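
    A minimal sketch of the kind of DEA model such studies use is shown below: an input-oriented CCR efficiency score computed with scipy's linear programming routine. The toy input/output data are purely illustrative, not the Malaysian university data.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    def dea_ccr_efficiency(X, Y):
        """Input-oriented CCR efficiency scores.
        X: (n_dmu, n_inputs), Y: (n_dmu, n_outputs). Returns theta in (0, 1] per DMU."""
        n, m = X.shape
        s = Y.shape[1]
        scores = np.empty(n)
        for o in range(n):
            c = np.r_[1.0, np.zeros(n)]                     # minimise theta
            # inputs:  sum_j lambda_j x_ij - theta x_io <= 0
            A_in = np.hstack([-X[o].reshape(m, 1), X.T])
            # outputs: -sum_j lambda_j y_rj <= -y_ro
            A_out = np.hstack([np.zeros((s, 1)), -Y.T])
            res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                          b_ub=np.r_[np.zeros(m), -Y[o]],
                          bounds=[(0, None)] * (n + 1), method="highs")
            scores[o] = res.x[0]
        return scores

    # toy example: 4 institutions, 2 inputs (staff, budget), 1 output (graduates)
    X = np.array([[12.0, 400], [15, 520], [10, 330], [18, 610]])
    Y = np.array([[800.0], [900], [700], [950]])
    print(np.round(dea_ccr_efficiency(X, Y), 3))
    ```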

  12. Factors associated with malnutrition among tribal children in India: a non-parametric approach.

    Science.gov (United States)

    Debnath, Avijit; Bhattacharjee, Nairita

    2014-06-01

    The purpose of this study is to identify the determinants of malnutrition among the tribal children in India. The investigation is based on secondary data compiled from the National Family Health Survey-3. We used a classification and regression tree model, a non-parametric approach, to address the objective. Our analysis shows that breastfeeding practice, economic status, antenatal care of mother and women's decision-making autonomy are negatively associated with malnutrition among tribal children. We identify maternal malnutrition and urban concentration of household as the two risk factors for child malnutrition. The identified associated factors may be used for designing and targeting preventive programmes for malnourished tribal children. © The Author [2014]. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
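
    A classification and regression tree of the sort used in this study can be sketched with scikit-learn. The predictor names and simulated data below are hypothetical stand-ins for the NFHS-3 variables, intended only to show the workflow.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # hypothetical predictors echoing the study's factors; the real NFHS-3 data are not reproduced here
    features = ["breastfed_months", "wealth_index", "antenatal_visits",
                "maternal_autonomy", "maternal_bmi", "urban_household"]

    rng = np.random.default_rng(1)
    X = rng.random((500, len(features)))
    # toy outcome: "malnourished" more likely with low maternal BMI and low wealth
    y = (X[:, 4] + 0.5 * X[:, 1] < 0.8).astype(int)

    tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=25).fit(X, y)
    print(export_text(tree, feature_names=features))   # inspect the splits / risk groups
    ```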

  13. Non-parametric method for separating domestic hot water heating spikes and space heating

    DEFF Research Database (Denmark)

    Bacher, Peder; de Saint-Aubain, Philip Anton; Christiansen, Lasse Engbo;

    2016-01-01

    In this paper a method for separating spikes from a noisy data series, where the data change and evolve over time, is presented. The method is applied on measurements of the total heat load for a single family house. It relies on the fact that the domestic hot water heating is a process generating...... short-lived spikes in the time series, while the space heating changes in slower patterns during the day dependent on the climate and user behavior. The challenge is to separate the domestic hot water heating spikes from the space heating without affecting the natural noise in the space heating...... measurements. The assumption behind the developed method is that the space heating can be estimated by a non-parametric kernel smoother, such that every value significantly above this kernel smoother estimate is identified as a domestic hot water heating spike. First, it is showed how a basic kernel smoothing...
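
    The core idea, estimating the slowly varying space heating with a kernel smoother and flagging readings far above it as hot-water spikes, can be sketched as follows. The bandwidth, threshold, and variable names are illustrative assumptions, not the authors' settings.

    ```python
    import numpy as np

    def kernel_smooth(t, y, bandwidth):
        """Nadaraya-Watson estimate of y(t) with a Gaussian kernel."""
        d = (t[:, None] - t[None, :]) / bandwidth
        w = np.exp(-0.5 * d ** 2)
        return (w @ y) / w.sum(axis=1)

    def split_heat_load(t, load, bandwidth=1.5, k=3.0):
        """Separate short-lived hot-water spikes from the slowly varying space heating.
        Points more than k robust standard deviations above the smoother are spikes."""
        base = kernel_smooth(t, load, bandwidth)
        resid = load - base
        sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))   # robust scale (MAD)
        spike = resid > k * sigma
        space_heating = np.where(spike, base, load)
        hot_water = np.where(spike, resid, 0.0)
        return space_heating, hot_water
    ```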

  14. LICORS: Light Cone Reconstruction of States for Non-parametric Forecasting of Spatio-Temporal Systems

    CERN Document Server

    Goerg, Georg M

    2012-01-01

    We present a new, non-parametric forecasting method for data where continuous values are observed discretely in space and time. Our method, "light-cone reconstruction of states" (LICORS), uses physical principles to identify predictive states which are local properties of the system, both in space and time. LICORS discovers the number of predictive states and their predictive distributions automatically, and consistently, under mild assumptions on the data source. We provide an algorithm to implement our method, along with a cross-validation scheme to pick control settings. Simulations show that CV-tuned LICORS outperforms standard methods in forecasting challenging spatio-temporal dynamics. Our work provides applied researchers with a new, highly automatic method to analyze and forecast spatio-temporal data.

  15. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  16. Non-parametric PSF estimation from celestial transit solar images using blind deconvolution

    Science.gov (United States)

    González, Adriana; Delouille, Véronique; Jacques, Laurent

    2016-01-01

    Context: Characterization of instrumental effects in astronomical imaging is important in order to extract accurate physical information from the observations. The measured image in a real optical instrument is usually represented by the convolution of an ideal image with a Point Spread Function (PSF). Additionally, the image acquisition process is also contaminated by other sources of noise (read-out, photon-counting). The problem of estimating both the PSF and a denoised image is called blind deconvolution and is ill-posed. Aims: We propose a blind deconvolution scheme that relies on image regularization. Contrarily to most methods presented in the literature, our method does not assume a parametric model of the PSF and can thus be applied to any telescope. Methods: Our scheme uses a wavelet analysis prior model on the image and weak assumptions on the PSF. We use observations from a celestial transit, where the occulting body can be assumed to be a black disk. These constraints allow us to retain meaningful solutions for the filter and the image, eliminating trivial, translated, and interchanged solutions. Under an additive Gaussian noise assumption, they also enforce noise canceling and avoid reconstruction artifacts by promoting the whiteness of the residual between the blurred observations and the cleaned data. Results: Our method is applied to synthetic and experimental data. The PSF is estimated for the SECCHI/EUVI instrument using the 2007 Lunar transit, and for SDO/AIA using the 2012 Venus transit. Results show that the proposed non-parametric blind deconvolution method is able to estimate the core of the PSF with a similar quality to parametric methods proposed in the literature. We also show that, if these parametric estimations are incorporated in the acquisition model, the resulting PSF outperforms both the parametric and non-parametric methods.

  17. Non-parametric PSF estimation from celestial transit solar images using blind deconvolution

    Directory of Open Access Journals (Sweden)

    González Adriana

    2016-01-01

    Full Text Available Context: Characterization of instrumental effects in astronomical imaging is important in order to extract accurate physical information from the observations. The measured image in a real optical instrument is usually represented by the convolution of an ideal image with a Point Spread Function (PSF. Additionally, the image acquisition process is also contaminated by other sources of noise (read-out, photon-counting. The problem of estimating both the PSF and a denoised image is called blind deconvolution and is ill-posed. Aims: We propose a blind deconvolution scheme that relies on image regularization. Contrarily to most methods presented in the literature, our method does not assume a parametric model of the PSF and can thus be applied to any telescope. Methods: Our scheme uses a wavelet analysis prior model on the image and weak assumptions on the PSF. We use observations from a celestial transit, where the occulting body can be assumed to be a black disk. These constraints allow us to retain meaningful solutions for the filter and the image, eliminating trivial, translated, and interchanged solutions. Under an additive Gaussian noise assumption, they also enforce noise canceling and avoid reconstruction artifacts by promoting the whiteness of the residual between the blurred observations and the cleaned data. Results: Our method is applied to synthetic and experimental data. The PSF is estimated for the SECCHI/EUVI instrument using the 2007 Lunar transit, and for SDO/AIA using the 2012 Venus transit. Results show that the proposed non-parametric blind deconvolution method is able to estimate the core of the PSF with a similar quality to parametric methods proposed in the literature. We also show that, if these parametric estimations are incorporated in the acquisition model, the resulting PSF outperforms both the parametric and non-parametric methods.

  18. A Non-parametric Approach to Measuring the $K^{-}\\pi^{+}$ Amplitudes in $D^{+} \\to K^{-}K^{+}\\pi^{+}$ Decay

    CERN Document Server

    Link, J M; Alimonti, G; Anjos, J C; Arena, V; Barberis, S; Bediaga, I; Benussi, L; Bianco, S; Boca, G; Bonomi, G; Boschini, M; Butler, J N; Carrillo, S; Casimiro, E; Castromonte, C; Cawlfield, C; Cerutti, A; Cheung, H W K; Chiodini, G; Cho, K; Chung, Y S; Cinquini, L; Cuautle, E; Cumalat, J P; D'Angelo, P; Davenport, T F; De Miranda, J M; Di Corato, M; Dini, P; Dos Reis, A C; Edera, L; Engh, D; Erba, S; Fabbri, F L; Frisullo, V; Gaines, I; Garbincius, P H; Gardner, R; Garren, L A; Gianini, G; Gottschalk, E; Göbel, C; Handler, T; Hernández, H; Hosack, M; Inzani, P; Johns, W E; Kang, J S; Kasper, P H; Kim, D Y; Ko, B R; Kreymer, A E; Kryemadhi, A; Kutschke, R; Kwak, J W; Lee, K B; Leveraro, F; Liguori, G; Lopes-Pegna, D; Luiggi, E; López, A M; Machado, A A; Magnin, J; Malvezzi, S; Massafferri, A; Menasce, D; Merlo, M M; Mezzadri, M; Mitchell, R; Moroni, L; Méndez, H; Nehring, M; O'Reilly, B; Otalora, J; Pantea, D; Paris, A; Park, H; Pedrini, D; Pepe, I M; Polycarpo, E; Pontoglio, C; Prelz, F; Quinones, J; Rahimi, A; Ramírez, J E; Ratti, S P; Reyes, M; Riccardi, C; Rovere, M; Sala, S; Segoni, I; Sheaff, M; Sheldon, P D; Stenson, K; Sánchez-Hernández, A; Uribe, C; Vaandering, E W; Vitulo, P; Vázquez, F; Wang, M; Webster, M; Wilson, J R; Wiss, J; Yager, P M; Zallo, A; Zhang, Y

    2007-01-01

    Using a large sample of $D^{+} \\to K^{-}K^{+}\\pi^{+}$ decays collected by the FOCUS photoproduction experiment at Fermilab, we present the first non-parametric analysis of the $K^{-}\\pi^{+}$ amplitudes in $D^{+} \\to K^{-}K^{+}\\pi^{+}$ decay. The technique is similar to that used for our non-parametric measurements of the $D^{+} \\to \\bar{K}^{*0}\\mu^{+}\\nu$ form factors. Although these results are in rough agreement with those of E687, we observe a wider S-wave contribution than the standard PDG Breit-Wigner parameterization. We have some weaker evidence for the existence of a new, D-wave component at low values of the $K^{-}\\pi^{+}$ mass.

  19. Caveats for using statistical significance tests in research assessments

    DEFF Research Database (Denmark)

    Schneider, Jesper Wiborg

    2013-01-01

    This article raises concerns about the advantages of using statistical significance tests in research assessments as has recently been suggested in the debate about proper normalization procedures for citation indicators by Opthof and Leydesdorff (2010). Statistical significance tests are highly...... controversial and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with such tests. The issues specifically discussed are the ritual practice...... of such tests, their dichotomous application in decision making, the difference between statistical and substantive significance, the implausibility of most null hypotheses, the crucial assumption of randomness, as well as the utility of standard errors and confidence intervals for inferential purposes. We...

  20. Caveats for using statistical significance tests in research assessments

    DEFF Research Database (Denmark)

    Schneider, Jesper Wiborg

    2013-01-01

    This article raises concerns about the advantages of using statistical significance tests in research assessments as has recently been suggested in the debate about proper normalization procedures for citation indicators by Opthof and Leydesdorff (2010). Statistical significance tests are highly...... controversial and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with such tests. The issues specifically discussed are the ritual practice...... are important or not. On the contrary their use may be harmful. Like many other critics, we generally believe that statistical significance tests are over- and misused in the empirical sciences including scientometrics and we encourage a reform on these matters....

  1. Two independent pivotal statistics that test location and misspecification and add-up to the Anderson-Rubin statistic

    NARCIS (Netherlands)

    Kleibergen, F.R.

    2002-01-01

    We extend the novel pivotal statistics for testing the parameters in the instrumental variables regression model. We show that these statistics result from a decomposition of the Anderson-Rubin statistic into two independent pivotal statistics. The first statistic is a score statistic that tests loc

  2. Better Statistics for Better Decisions: Rejecting Null Hypotheses Statistical Tests in Favor of Replication Statistics

    Science.gov (United States)

    Sanabria, Federico; Killeen, Peter R.

    2007-01-01

    Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level "p," is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners…

  3. Misuse of statistical tests in Archives of Clinical Neuropsychology publications.

    Science.gov (United States)

    Schatz, Philip; Jay, Kristin A; McComb, Jason; McLaughlin, Jason R

    2005-12-01

    This article reviews the (mis)use of statistical tests in neuropsychology research studies published in the Archives of Clinical Neuropsychology in the years 1990-1992, 1996-2000, and 2001-2004, that is, prior to, concurrent with, and following the internet-based and paper-based release of the report of the American Psychological Association's Task Force on Statistical Inference. The authors focused on four statistical errors: inappropriate use of null hypothesis tests, inappropriate use of P-values, neglect of effect size, and inflation of Type I error rates. Despite the recommendations of the Task Force on Statistical Inference published in 1999, the present study recorded instances of these statistical errors both before and after the APA report, with only the reporting of effect size increasing after the release of the report. Neuropsychologists involved in empirical research should be better aware of the limitations and boundaries of hypothesis testing as well as the theoretical aspects of research methodology.
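
    One of the recurring problems documented here, neglect of effect size, is easy to avoid in practice. The sketch below is an illustrative helper (not part of the article) that reports Cohen's d alongside the two-sample t-test so that the magnitude of a difference accompanies its p-value.

    ```python
    import numpy as np
    from scipy import stats

    def t_test_with_effect_size(a, b):
        """Two-sample t-test plus Cohen's d, so the magnitude of the difference
        is reported and not only its p-value."""
        a, b = np.asarray(a, float), np.asarray(b, float)
        t, p = stats.ttest_ind(a, b)
        pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                            / (len(a) + len(b) - 2))
        d = (a.mean() - b.mean()) / pooled_sd
        return t, p, d
    ```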

  4. The Use of Meta-Analytic Statistical Significance Testing

    Science.gov (United States)

    Polanin, Joshua R.; Pigott, Terri D.

    2015-01-01

    Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…

  5. CUSUM-Based Person-Fit Statistics for Adaptive Testing.

    Science.gov (United States)

    van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.

    2001-01-01

    Proposed person-fit statistics that are designed for use in a computerized adaptive test (CAT) and derived critical values for these statistics using cumulative sum (CUSUM) procedures so that item-score patterns can be classified as fitting or misfitting. Compared nominal Type I errors with empirical Type I errors through simulation studies. (SLD)

  6. Transit Timing Observations from Kepler: II. Confirmation of Two Multiplanet Systems via a Non-parametric Correlation Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Ford, Eric B.; /Florida U.; Fabrycky, Daniel C.; /Lick Observ.; Steffen, Jason H.; /Fermilab; Carter, Joshua A.; /Harvard-Smithsonian Ctr. Astrophys.; Fressin, Francois; /Harvard-Smithsonian Ctr. Astrophys.; Holman, Matthew J.; /Harvard-Smithsonian Ctr. Astrophys.; Lissauer, Jack J.; /NASA, Ames; Moorhead, Althea V.; /Florida U.; Morehead, Robert C.; /Florida U.; Ragozzine, Darin; /Harvard-Smithsonian Ctr. Astrophys.; Rowe, Jason F.; /NASA, Ames /SETI Inst., Mtn. View /San Diego State U., Astron. Dept.

    2012-01-01

    We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets. We apply this method to an analysis of the transit timing variations of two stars with multiple transiting planet candidates identified by Kepler. We confirm four transiting planets in two multiple planet systems based on their TTVs and the constraints imposed by dynamical stability. An additional three candidates in these same systems are not confirmed as planets, but are likely to be validated as real planets once further observations and analyses are possible. If all were confirmed, these systems would be near 4:6:9 and 2:4:6:9 period commensurabilities. Our results demonstrate that TTVs provide a powerful tool for confirming transiting planets, including low-mass planets and planets around faint stars for which Doppler follow-up is not practical with existing facilities. Continued Kepler observations will dramatically improve the constraints on the planet masses and orbits and provide sensitivity for detecting additional non-transiting planets. If Kepler observations were extended to eight years, then a similar analysis could likely confirm systems with multiple closely spaced, small transiting planets in or near the habitable zone of solar-type stars.
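
    A schematic version of quantifying TTV significance non-parametrically is shown below: a permutation test on the correlation between two TTV series evaluated at common epochs. The actual analysis handles unevenly sampled transits and dynamical constraints; the series names and the choice of Spearman correlation are assumptions.

    ```python
    import numpy as np
    from scipy.stats import spearmanr

    def ttv_correlation_pvalue(ttv_a, ttv_b, n_perm=10_000, seed=0):
        """Permutation p-value for (anti)correlated transit timing variations of two
        candidates measured at common epochs; shuffling one series builds the null."""
        rng = np.random.default_rng(seed)
        obs = abs(spearmanr(ttv_a, ttv_b)[0])
        null = np.array([abs(spearmanr(ttv_a, rng.permutation(ttv_b))[0])
                         for _ in range(n_perm)])
        return (np.sum(null >= obs) + 1) / (n_perm + 1)
    ```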

  7. A non-parametric approach for detecting gene-gene interactions associated with age-at-onset outcomes.

    Science.gov (United States)

    Li, Ming; Gardiner, Joseph C; Breslau, Naomi; Anthony, James C; Lu, Qing

    2014-07-01

    Cox-regression-based methods have been commonly used for the analyses of survival outcomes, such as age-at-disease-onset. These methods generally assume the hazard functions are proportional among various risk groups. However, such an assumption may not be valid in genetic association studies, especially when complex interactions are involved. In addition, genetic association studies commonly adopt case-control designs. Direct use of Cox regression to case-control data may yield biased estimators and incorrect statistical inference. We propose a non-parametric approach, the weighted Nelson-Aalen (WNA) approach, for detecting genetic variants that are associated with age-dependent outcomes. The proposed approach can be directly applied to prospective cohort studies, and can be easily extended for population-based case-control studies. Moreover, it does not rely on any assumptions of the disease inheritance models, and is able to capture high-order gene-gene interactions. Through simulations, we show the proposed approach outperforms Cox-regression-based methods in various scenarios. We also conduct an empirical study of progression of nicotine dependence by applying the WNA approach to three independent datasets from the Study of Addiction: Genetics and Environment. In the initial dataset, two SNPs, rs6570989 and rs2930357, located in genes GRIK2 and CSMD1, are found to be significantly associated with the progression of nicotine dependence (ND). The joint association is further replicated in two independent datasets. Further analysis suggests that these two genes may interact and be associated with the progression of ND. As demonstrated by the simulation studies and real data analysis, the proposed approach provides an efficient tool for detecting genetic interactions associated with age-at-onset outcomes.
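
    For orientation, the plain (optionally weighted) Nelson-Aalen cumulative hazard estimator that the WNA approach builds on can be written in a few lines. The paper's weighting scheme and its comparison of curves across genotype groups are more involved, so the helper below is only a sketch with assumed argument names.

    ```python
    import numpy as np

    def nelson_aalen(time, event, weight=None):
        """(Weighted) Nelson-Aalen cumulative hazard estimate.
        time:   age at onset or censoring
        event:  1 = onset observed, 0 = censored
        weight: optional per-subject weights (e.g. to correct case-control sampling)."""
        time, event = np.asarray(time, float), np.asarray(event, int)
        w = np.ones_like(time) if weight is None else np.asarray(weight, float)
        order = np.argsort(time)
        time, event, w = time[order], event[order], w[order]
        uniq = np.unique(time[event == 1])
        H, cum = np.zeros(len(uniq)), 0.0
        for i, t in enumerate(uniq):
            at_risk = w[time >= t].sum()
            d = w[(time == t) & (event == 1)].sum()
            cum += d / at_risk
            H[i] = cum
        return uniq, H
    ```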

  8. BIAZA statistics guidelines: toward a common application of statistical tests for zoo research.

    Science.gov (United States)

    Plowman, Amy B

    2008-05-01

    Zoo research presents many statistical challenges, mostly arising from the need to work with small sample sizes. Efforts to overcome these often lead to the misuse of statistics including pseudoreplication, inappropriate pooling, assumption violation or excessive Type II errors because of using tests with low power to avoid assumption violation. To tackle these issues and make some general statistical recommendations for zoo researchers, the Research Group of the British and Irish Association of Zoos and Aquariums (BIAZA) conducted a workshop. Participants included zoo-based researchers, university academics with zoo interests and three statistical experts. The result was a BIAZA publication Zoo Research Guidelines: Statistics for Typical Zoo Datasets (Plowman [2006] Zoo research guidelines: statistics for zoo datasets. London: BIAZA), which provides advice for zoo researchers on study design and analysis to ensure appropriate and rigorous use of statistics. The main recommendations are: (1) that many typical zoo investigations should be conducted as single case/small N randomized designs, analyzed with randomization tests, (2) that when comparing complete time budgets across conditions in behavioral studies, G tests and their derivatives are the most appropriate statistical tests and (3) that in studies involving multiple dependent and independent variables there are usually no satisfactory alternatives to traditional parametric tests and, despite some assumption violations, it is better to use these tests with careful interpretation, than to lose information through not testing at all. The BIAZA guidelines were recommended by American Association of Zoos and Aquariums (AZA) researchers at the AZA Annual Conference in Tampa, FL, September 2006, and are free to download from www.biaza.org.uk.
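
    Recommendation (1), single case/small-N randomized designs analysed with randomization tests, can be illustrated with a minimal two-condition randomization test on the difference of means; the data layout and number of iterations are assumptions.

    ```python
    import numpy as np

    def randomization_test(x, y, n_iter=10_000, seed=0):
        """Two-condition randomization test on the difference of means, suitable for
        the small samples typical of zoo studies where parametric assumptions fail."""
        rng = np.random.default_rng(seed)
        x, y = np.asarray(x, float), np.asarray(y, float)
        pooled = np.concatenate([x, y])
        obs = abs(x.mean() - y.mean())
        count = 0
        for _ in range(n_iter):
            perm = rng.permutation(pooled)
            count += abs(perm[:len(x)].mean() - perm[len(x):].mean()) >= obs
        return (count + 1) / (n_iter + 1)
    ```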

  9. Evaluation of Multi-parameter Test Statistics for Multiple Imputation.

    Science.gov (United States)

    Liu, Yu; Enders, Craig K

    2017-01-01

    In Ordinary Least Squares regression, researchers are often interested in knowing whether a set of parameters is different from zero. With complete data, this can be achieved using the gain-in-prediction test, hierarchical multiple regression, or an omnibus F test. However, in substantive research scenarios, missing data often exist. In the context of multiple imputation, one of the current state-of-the-art missing data strategies, there are several analogous multi-parameter tests of the joint significance of a set of parameters, and these multi-parameter test statistics can be referenced to various distributions to make statistical inferences. However, little is known about the performance of these tests, and virtually no research study has compared the Type I error rates and statistical power of these tests in scenarios that are typical of behavioral science data (e.g., small to moderate samples). This paper uses Monte Carlo simulation techniques to examine the performance of these multi-parameter test statistics for multiple imputation under a variety of realistic conditions. We provide a number of practical recommendations for substantive researchers based on the simulation results, and illustrate the calculation of these test statistics with an empirical example.

  10. Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

    Science.gov (United States)

    Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

    2013-01-01

    Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is modest. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than on a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials.
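
    The minimum p-value idea can be sketched for a simple two-group comparison; the candidate statistics below (t-test, Wilcoxon-Mann-Whitney, Kolmogorov-Smirnov) are illustrative stand-ins for the trial-specific statistics the authors would pre-specify.

    ```python
    import numpy as np
    from scipy import stats

    def min_p_permutation(x, y, n_perm=2000, seed=0):
        """Permutation test based on the minimum p-value over several candidate
        statistics, so inference does not hinge on one pre-specified statistic."""
        rng = np.random.default_rng(seed)

        def min_p(a, b):
            return min(stats.ttest_ind(a, b).pvalue,
                       stats.mannwhitneyu(a, b).pvalue,
                       stats.ks_2samp(a, b).pvalue)

        x, y = np.asarray(x, float), np.asarray(y, float)
        pooled = np.concatenate([x, y])
        obs = min_p(x, y)
        count = 0
        for _ in range(n_perm):
            perm = rng.permutation(pooled)
            count += min_p(perm[:len(x)], perm[len(x):]) <= obs
        return obs, (count + 1) / (n_perm + 1)
    ```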

  11. Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test

    Science.gov (United States)

    Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph

    1999-01-01

    The purpose of this paper is to evaluate statistical molecular contamination data with the goal of improving spacecraft contamination control. The statistical data were generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test, and detailed qualitative and quantitative measurements were then obtained through chemical analyses. The data used in this study encompass numerous spacecraft tests in recent years.

  12. Generalized Correlation Coefficient Based on Log Likelihood Ratio Test Statistic

    Directory of Open Access Journals (Sweden)

    Liu Hsiang-Chuan

    2016-01-01

    Full Text Available In this paper, I point out that both Joe’s and Ding’s strength statistics can only be used for testing pair-wise independence, and I propose a novel G-square based strength statistic, called Liu’s generalized correlation coefficient, which can be used to detect and compare the strength of not only pair-wise independence but also mutual independence among any multivariate variables. Furthermore, I prove that only Liu’s generalized correlation coefficient is strictly increasing in its number of variables, making it more sensitive and useful than Cramer’s V coefficient. In other words, Liu’s generalized correlation coefficient is not only a G-square based strength statistic but also an improved statistic for detecting and comparing the strengths of different associations among any two or more sets of multivariate variables; moreover, this new strength statistic can also be tested by G2.
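
    For context, the G-square (log-likelihood ratio) statistic underlying this coefficient can be computed directly with scipy; the contingency table below is made up, and the Cramér's V-style quantity is only an illustrative strength measure, not Liu's coefficient itself.

    ```python
    import numpy as np
    from scipy.stats import chi2_contingency

    # made-up 2x3 contingency table of two categorical variables
    table = np.array([[30, 10, 8],
                      [12, 25, 15]])

    # lambda_="log-likelihood" gives the G-square statistic instead of Pearson chi-square
    G2, p, dof, _expected = chi2_contingency(table, lambda_="log-likelihood")

    n = table.sum()
    strength = np.sqrt((G2 / n) / (min(table.shape) - 1))   # Cramer's V-style strength from G2
    print(G2, p, strength)
    ```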

  13. Posterior contraction rate for non-parametric Bayesian estimation of the dispersion coefficient of a stochastic differential equation

    NARCIS (Netherlands)

    Gugushvili, S.; Spreij, P.

    2016-01-01

    We consider the problem of non-parametric estimation of the deterministic dispersion coefficient of a linear stochastic differential equation based on discrete-time observations of its solution. We take a Bayesian approach to the problem and, under suitable regularity assumptions, derive the posterior contraction rate.

  14. Further Empirical Results on Parametric Versus Non-Parametric IRT Modeling of Likert-Type Personality Data

    Science.gov (United States)

    Maydeu-Olivares, Albert

    2005-01-01

    Chernyshenko, Stark, Chan, Drasgow, and Williams (2001) investigated the fit of Samejima's logistic graded model and Levine's non-parametric MFS model to the scales of two personality questionnaires and found that the graded model did not fit well. We attribute the poor fit of the graded model to small amounts of multidimensionality present in…

  15. Adjusting for population heterogeneity: a framework for characterizing statistical information and developing efficient test statistics.

    Science.gov (United States)

    Rabinowitz, Daniel

    2003-05-01

    The focus of this work is the TDT-type and family-based test statistics used for adjusting for potential confounding due to population heterogeneity or misspecified allele frequencies. A variety of heuristics have been used to motivate and derive these statistics, and the statistics have been developed for a variety of analytic goals. There appears to be no general theoretical framework, however, that may be used to evaluate competing approaches. Furthermore, there is no framework to guide the development of efficient TDT-type and family-based methods for analytic goals for which methods have not yet been proposed. The purpose of this paper is to present a theoretical framework that serves both to identify the information which is available to methods that are immune to confounding due to population heterogeneity or misspecified allele frequencies, and to inform the construction of efficient unbiased tests in novel settings. The development relies on the existence of a characterization of the null hypothesis in terms of a completely specified conditional distribution of transmitted genotypes. An important observation is that, with such a characterization, when the conditioning event is unobserved or incomplete, there is statistical information that cannot be exploited by any exact conditional test. The main technical result of this work is an approach to computing test statistics for local alternatives that exploit all of the available statistical information. Copyright 2003 Wiley-Liss, Inc.

  16. Progress of statistical methods for testing interactions in candidate gene association studies based on case-control design

    Institute of Scientific and Technical Information of China (English)

    金如锋

    2011-01-01

    Testing for gene-gene and gene-environment interactions in candidate gene association studies will help to reveal possible mechanisms underlying diseases. This article summarizes the progress of statistical methods for testing interactions in candidate gene association studies based on case-control designs. Both parametric and non-parametric methods can be used to detect interactions. Logistic regression is the most frequently used parametric method, and data mining techniques offer a variety of alternative non-parametric methods. Four classes of data mining techniques can be applied in candidate gene association studies: dimension reduction, tree-based approaches, pattern recognition, and Bayesian methods. We compare the principles, analysis procedures, and advantages and disadvantages of the most popular and reliable of these methods: multifactor dimensionality reduction (MDR), classification and regression trees (CART), random forests, and Bayesian epistasis association mapping (BEAM). Parametric and non-parametric methods each have strengths and weaknesses for analysing interactions; low-dimensional data can be analysed with either approach, whereas high-dimensional data mainly require non-parametric methods. As genotyping technology advances and the number of detectable SNPs grows, non-parametric methods are finding increasingly wide application.

  17. Structuring feature space: a non-parametric method for volumetric transfer function generation.

    Science.gov (United States)

    Maciejewski, Ross; Woo, Insoo; Chen, Wei; Ebert, David S

    2009-01-01

    The use of multi-dimensional transfer functions for direct volume rendering has been shown to be an effective means of extracting materials and their boundaries for both scalar and multivariate data. The most common multi-dimensional transfer function consists of a two-dimensional (2D) histogram with axes representing a subset of the feature space (e.g., value vs. value gradient magnitude), with each entry in the 2D histogram being the number of voxels at a given feature space pair. Users then assign color and opacity to the voxel distributions within the given feature space through the use of interactive widgets (e.g., box, circular, triangular selection). Unfortunately, such tools lead users through a trial-and-error approach as they assess which data values within the feature space map to a given area of interest within the volumetric space. In this work, we propose the addition of non-parametric clustering within the transfer function feature space in order to extract patterns and guide transfer function generation. We apply a non-parametric kernel density estimation to group voxels of similar features within the 2D histogram. These groups are then binned and colored based on their estimated density, and the user may interactively grow and shrink the binned regions to explore feature boundaries and extract regions of interest. We also extend this scheme to temporal volumetric data in which time steps of 2D histograms are composited into a histogram volume. A three-dimensional (3D) density estimation is then applied, and users can explore regions within the feature space across time without adjusting the transfer function at each time step. Our work enables users to effectively explore the structures found within a feature space of the volume and provide a context in which the user can understand how these structures relate to their volumetric data. We provide tools for enhanced exploration and manipulation of the transfer function, and we show that the initial

  18. Accelerated testing statistical models, test plans, and data analysis

    CERN Document Server

    Nelson, Wayne B

    2009-01-01

    The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "". . . a goldmine of knowledge on accelerated life testing principles and practices . . . one of the very few capable of advancing the science of reliability. It definitely belongs in every bookshelf on engineering.""-Dev G.

  19. CUSUM-based person-fit statistics for adaptive testing

    NARCIS (Netherlands)

    Krimpen-Stoop, van Edith M.L.A.; Meijer, Rob R.

    2001-01-01

    Item scores that do not fit an assumed item response theory model may cause the latent trait value to be inaccurately estimated. Several person-fit statistics for detecting nonfitting score patterns for paper-and-pencil tests have been proposed. In the context of computerized adaptive tests (CAT), t

  20. Statistical Measures of Integrity in Online Testing: Empirical Study

    Science.gov (United States)

    Wielicki, Tom

    2016-01-01

    This paper reports on a longitudinal study regarding the integrity of testing in an online format as used by e-learning platforms. Specifically, this study examines whether online testing, which implies an open-book format, is compromising the integrity of assessment by encouraging cheating among students. Statistical experiment designed for this study…

  1. Statistical significance test for transition matrices of atmospheric Markov chains

    Science.gov (United States)

    Vautard, Robert; Mo, Kingtse C.; Ghil, Michael

    1990-01-01

    Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
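
    A Monte Carlo significance test of this kind can be sketched directly from a sequence of regime labels: simulate sequences under the null of no regime memory and compare each observed transition count against its simulated distribution. The null model (i.i.d. sampling of the observed regime frequencies) and the variable names are simplifying assumptions.

    ```python
    import numpy as np

    def transition_counts(seq, k):
        """k x k matrix of observed one-step transition counts."""
        C = np.zeros((k, k), int)
        for a, b in zip(seq[:-1], seq[1:]):
            C[a, b] += 1
        return C

    def transition_significance(seq, k, n_sim=5000, seed=0):
        """Monte Carlo significance of each transition: fraction of sequences,
        simulated under the null of no regime memory, whose transition count is
        at least as large as the observed one (small values = unusually frequent)."""
        rng = np.random.default_rng(seed)
        seq = np.asarray(seq, int)
        obs = transition_counts(seq, k)
        freq = np.bincount(seq, minlength=k) / len(seq)
        exceed = np.zeros((k, k))
        for _ in range(n_sim):
            sim = rng.choice(k, size=len(seq), p=freq)
            exceed += transition_counts(sim, k) >= obs
        return obs, exceed / n_sim
    ```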

  2. Comparative study of species sensitivity distributions based on non-parametric kernel density estimation for some transition metals.

    Science.gov (United States)

    Wang, Ying; Feng, Chenglian; Liu, Yuedan; Zhao, Yujie; Li, Huixian; Zhao, Tianhui; Guo, Wenjing

    2017-02-01

    Transition metals in the fourth period of the periodic table of the elements are widespread in aquatic environments. They often occur at concentrations that cause adverse effects on aquatic life and human health. Parametric models are generally used to construct species sensitivity distributions (SSDs), so comparisons of water quality criteria (WQC) for elements in the same period or group of the periodic table might be inaccurate and the results biased. To address this inadequacy, non-parametric kernel density estimation (NPKDE), with its optimal bandwidths and testing methods, was developed for establishing SSDs. The NPKDE provided a better fit, greater robustness, and better predictions than conventional normal and logistic parametric density estimations for constructing SSDs and deriving acute HC5 values and WQC for transition metals in the fourth period of the periodic table. The decreasing sequence of HC5 values for these metals was Ti > Mn > V > Ni > Zn > Cu > Fe > Co > Cr(VI), which is not proportional to atomic number, and the relatively sensitive species also differed among metals. The results indicate that factors beyond physical and chemical properties affect the toxicity mechanisms of transition metals. The proposed method enriches the methodological foundation for WQC and provides a relatively innovative, accurate approach for WQC derivation and risk assessment of same-group and same-period metals in aquatic environments, supporting the protection of aquatic organisms.
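
    A kernel-density SSD and its HC5 can be sketched in a few lines; the toy toxicity values, the default (Scott) bandwidth, and the grid settings below are assumptions — the paper optimises the bandwidth and applies goodness-of-fit testing.

    ```python
    import numpy as np
    from scipy.stats import gaussian_kde

    def hc5_from_kde(log10_toxicity, grid_pts=2000):
        """Non-parametric species sensitivity distribution: kernel density estimate of
        log10 toxicity values and the hazardous concentration for 5% of species (HC5),
        i.e. the 5th percentile of the fitted distribution."""
        x = np.asarray(log10_toxicity, float)
        kde = gaussian_kde(x)                  # default (Scott) bandwidth
        grid = np.linspace(x.min() - 2, x.max() + 2, grid_pts)
        cdf = np.cumsum(kde(grid))             # approximate CDF on the grid
        cdf /= cdf[-1]
        return 10 ** grid[np.searchsorted(cdf, 0.05)]

    # toy acute toxicity data (ug/L) for one metal across species -- illustrative only
    print(hc5_from_kde(np.log10([120, 340, 95, 800, 1500, 60, 430, 2500, 310, 77])))
    ```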

  3. Non-parametric determination of H and He IS fluxes from cosmic-ray data

    CERN Document Server

    Ghelfi, A; Derome, L; Maurin, D

    2015-01-01

    Top-of-atmosphere (TOA) cosmic-ray (CR) fluxes from satellites and balloon-borne experiments are snapshots of the solar activity imprinted on the interstellar (IS) fluxes. Given a series of snapshots, the unknown IS flux shape and the level of modulation (for each snapshot) can be recovered. We wish (i) to provide the most accurate determination of the IS H and He fluxes from TOA data only, (ii) to obtain the associated modulation levels (and uncertainties) fully accounting for the correlations with the IS flux uncertainties, and (iii) to inspect whether the minimal Force-Field approximation is sufficient to explain all the data at hand. Using H and He TOA measurements, including the recent high precision AMS, BESS-Polar and PAMELA data, we perform a non-parametric fit of the IS fluxes $J^{\\rm IS}_{\\rm H,~He}$ and modulation level $\\phi_i$ for each data taking period. We rely on a Markov Chain Monte Carlo (MCMC) engine to extract the PDF and correlations (hence the credible intervals) of the sought parameters...

  4. THE DARK MATTER PROFILE OF THE MILKY WAY: A NON-PARAMETRIC RECONSTRUCTION

    Energy Technology Data Exchange (ETDEWEB)

    Pato, Miguel [The Oskar Klein Centre for Cosmoparticle Physics, Department of Physics, Stockholm University, AlbaNova, SE-106 91 Stockholm (Sweden); Iocco, Fabio [ICTP South American Institute for Fundamental Research, and Instituto de Física Teórica—Universidade Estadual Paulista (UNESP), Rua Dr. Bento Teobaldo Ferraz 271, 01140-070 São Paulo, SP (Brazil)

    2015-04-10

    We present the results of a new, non-parametric method to reconstruct the Galactic dark matter profile directly from observations. Using the latest kinematic data to track the total gravitational potential and the observed distribution of stars and gas to set the baryonic component, we infer the dark matter contribution to the circular velocity across the Galaxy. The radial derivative of this dynamical contribution is then estimated to extract the dark matter profile. The innovative feature of our approach is that it makes no assumption on the functional form or shape of the profile, thus allowing for a clean determination with no theoretical bias. We illustrate the power of the method by constraining the spherical dark matter profile between 2.5 and 25 kpc away from the Galactic center. The results show that the proposed method, free of widely used assumptions, can already be applied to pinpoint the dark matter distribution in the Milky Way with competitive accuracy, and paves the way for future developments.
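
    The basic decomposition behind this kind of reconstruction can be written compactly: subtract the baryonic contribution from the total circular velocity, convert to an enclosed mass under spherical symmetry, and differentiate to obtain the density. The helper below is only a schematic, with assumed units and array names, and none of the paper's error treatment.

    ```python
    import numpy as np

    G = 4.30091e-6   # gravitational constant in kpc (km/s)^2 / Msun

    def dark_matter_profile(r, v_tot, v_baryon):
        """Schematic non-parametric profile: the dark matter circular-velocity
        contribution is w^2 = v_tot^2 - v_baryon^2; under spherical symmetry the
        enclosed mass is M(<r) = w^2 r / G and the density follows from dM/dr."""
        w2 = v_tot ** 2 - v_baryon ** 2
        M = w2 * r / G
        dMdr = np.gradient(M, r)
        return dMdr / (4.0 * np.pi * r ** 2)   # Msun / kpc^3
    ```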

  5. Non-parametric method for measuring gas inhomogeneities from X-ray observations of galaxy clusters

    CERN Document Server

    Morandi, Andrea; Cui, Wei

    2013-01-01

    We present a non-parametric method to measure inhomogeneities in the intracluster medium (ICM) from X-ray observations of galaxy clusters. Analyzing mock Chandra X-ray observations of simulated clusters, we show that our new method enables the accurate recovery of the 3D gas density and gas clumping factor profiles out to large radii of galaxy clusters. We then apply this method to Chandra X-ray observations of Abell 1835 and present the first determination of the gas clumping factor from the X-ray cluster data. We find that the gas clumping factor in Abell 1835 increases with radius and reaches ~2-3 at r=R_{200}. This is in good agreement with the predictions of hydrodynamical simulations, but it is significantly below the values inferred from recent Suzaku observations. We further show that the radially increasing gas clumping factor causes flattening of the derived entropy profile of the ICM and affects physical interpretation of the cluster gas structure, especially at the large cluster-centric radii. Our...

  6. Modeling the World Health Organization Disability Assessment Schedule II using non-parametric item response models.

    Science.gov (United States)

    Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana

    2015-03-01

    The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (understanding and communicating, getting around, self-care, getting along with others, life activities, and participation in society). The main purpose of this paper is the evaluation of the psychometric properties of each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology.

  7. Non-parametric reconstruction of the galaxy-lens in PG1115+080

    CERN Document Server

    Saha, P; Saha, Prasenjit; Williams, Liliya L. R.

    1997-01-01

    We describe a new, non-parametric, method for reconstructing lensing mass distributions in multiple-image systems, and apply it to PG1115, for which time delays have recently been measured. It turns out that the image positions and the ratio of time delays between different pairs of images constrain the mass distribution in a linear fashion. Since observational errors on image positions and time delay ratios are constantly improving, we use these data as a rigid constraint in our modelling. In addition, we require the projected mass distributions to be inversion-symmetric and to have inward-pointing density gradients. With these realistic yet non-restrictive conditions it is very easy to produce mass distributions that fit the data precisely. We then present models, for $H_0=42$, 63 and 84 \\kmsmpc, that in each case minimize mass-to-light variations while strictly obeying the lensing constraints. (Only a very rough light distribution is available at present.) All three values of $H_0$ are consistent with the ...

  8. Decision making in coal mine planning using a non-parametric technique of indicator kriging

    Energy Technology Data Exchange (ETDEWEB)

    Mamurekli, D. [Hacettepe University, Ankara (Turkey). Mining Engineering Dept.

    1997-03-01

    In countries where low calorific value coal reserves are abundant and oil reserves are scarce or absent, energy production relies mainly on coal-fired power stations. Consequently, planning the mining of low calorific value coal deposits becomes important, given the technical and environmental restrictions. One such mine, in Kangal Town of Sivas City, delivers run-of-mine coal directly to the power station built in the region. If the calorific value of the extracted coal falls below the required limit of 1300 kcal/kg, or its ash content exceeds 21%, the power station may apply penalties to the coal-producing company. Since delivery is continuous and relies on in situ determination of pre-estimated values, these assessments, made without any stated confidence levels, are inevitably subject to inaccuracy. The company should therefore be aware of the uncertainties when making decisions and avoid conceivable risks. In this study, valuable information is provided in the form of conditional distributions to be used during the planning process. Applying the non-parametric technique of indicator kriging to the indicator variograms corresponding to a calorific value of 1300 kcal/kg and an ash content of 21%, it maps the conditional probabilities that the true ash contents are below and the calorific values above the critical limits. In addition, it outlines the areas that are most uncertain for decision making. 4 refs., 8 figs., 3 tabs.
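
    A schematic sketch of the indicator approach described above (not the study's actual geostatistics; coordinates, ash values and the variogram model are invented): drill-hole values are transformed to 0/1 indicators with respect to the cut-off, and ordinary kriging of those indicators yields the conditional probability of being below the cut-off at an unsampled location.

      import numpy as np

      def exp_variogram(h, sill=0.25, rng_par=500.0, nugget=0.0):
          # Assumed exponential variogram model for the 0/1 indicator data
          return nugget + sill * (1.0 - np.exp(-3.0 * h / rng_par))

      def indicator_kriging(xy, values, cutoff, target):
          ind = (values <= cutoff).astype(float)            # indicator transform
          n = len(ind)
          d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
          # Ordinary kriging system with a Lagrange multiplier
          A = np.zeros((n + 1, n + 1))
          A[:n, :n] = exp_variogram(d)
          A[:n, n] = A[n, :n] = 1.0
          b = np.append(exp_variogram(np.linalg.norm(xy - target, axis=1)), 1.0)
          w = np.linalg.solve(A, b)[:n]
          return float(np.clip(w @ ind, 0.0, 1.0))          # P(value <= cutoff at target)

      # Hypothetical drill-hole coordinates (m) and ash contents (%)
      xy = np.array([[0, 0], [400, 100], [250, 600], [800, 300], [600, 700]], float)
      ash = np.array([18.5, 23.0, 20.4, 25.1, 19.2])
      print(indicator_kriging(xy, ash, cutoff=21.0, target=np.array([450.0, 400.0])))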

  9. Non-parametric Deprojection of Surface Brightness Profiles of Galaxies in Generalised Geometries

    CERN Document Server

    Chakrabarty, Dalia

    2009-01-01

    We present a new Bayesian non-parametric deprojection algorithm DOPING (Deprojection of Observed Photometry using an INverse Gambit), that is designed to extract 3-D luminosity density distributions $\rho$ from observed surface brightness maps $I$, in generalised geometries, while taking into account changes in intrinsic shape with radius, using a penalised likelihood approach and an MCMC optimiser. We provide the most likely solution to the integral equation that represents deprojection of the measured $I$ to $\rho$. In order to keep the solution modular, we choose to express $\rho$ as a function of the line-of-sight (LOS) coordinate $z$. We calculate the extent of the system along the ${\bf z}$-axis, for a given point on the image that lies within an identified isophotal annulus. The extent along the LOS is binned and the density is held constant over each such $z$-bin. The code begins with a seed density and at the beginning of an iterative step, the trial $\rho$ is updated. Comparison of the projection of ...

  10. Spectral decompositions of multiple time series: a Bayesian non-parametric approach.

    Science.gov (United States)

    Macaro, Christian; Prado, Raquel

    2014-01-01

    We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated with the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented. We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.

  11. A Non-parametric Approach to the Overall Estimate of Cognitive Load Using NIRS Time Series.

    Science.gov (United States)

    Keshmiri, Soheil; Sumioka, Hidenobu; Yamazaki, Ryuji; Ishiguro, Hiroshi

    2017-01-01

    We present a non-parametric approach to prediction of the n-back n ∈ {1, 2} task as a proxy measure of mental workload using Near Infrared Spectroscopy (NIRS) data. In particular, we focus on measuring the mental workload through hemodynamic responses in the brain induced by these tasks, thereby realizing the potential that they offer for detection in real world scenarios (e.g., difficulty of a conversation). Our approach takes advantage of the intrinsic linearity that is inherent in the components of the NIRS time series to adopt a one-step regression strategy. We demonstrate the correctness of our approach through its mathematical analysis. Furthermore, we study the performance of our model in an inter-subject setting in contrast with state-of-the-art techniques in the literature to show a significant improvement on prediction of these tasks (82.50 and 86.40% for female and male participants, respectively). Moreover, our empirical analysis suggests a gender difference effect on the performance of the classifiers (with male data exhibiting a higher non-linearity) along with the left-lateralized activation in both genders with higher specificity in females.

  12. A Non-Parametric Delphi Approach to Foster Innovation Policy Debate in Spain

    Directory of Open Access Journals (Sweden)

    Juan Carlos Salazar-Elena

    2016-05-01

    Full Text Available The aim of this paper is to identify some changes needed in Spain’s innovation policy to fill the gap between its innovation results and those of other European countries in lieu of sustainable leadership. To do this we apply the Delphi methodology to experts from academia, business, and government. To overcome the shortcomings of traditional descriptive methods, we develop an inferential analysis by following a non-parametric bootstrap method which enables us to identify important changes that should be implemented. Particularly interesting is the support found for improving the interconnections among the relevant agents of the innovation system (instead of focusing exclusively on the provision of knowledge and technological inputs through R&D activities), or the support found for “soft” policy instruments aimed at providing a homogeneous framework to assess the innovation capabilities of firms (e.g., for funding purposes). Attention to potential innovators among small and medium enterprises (SMEs) and traditional industries is particularly encouraged by experts.
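
    A minimal sketch of the non-parametric bootstrap inference underlying such a Delphi analysis (the ratings and the cut-off below are invented): expert ratings are resampled with replacement to build a percentile confidence interval for the mean level of agreement.

      import numpy as np

      rng = np.random.default_rng(42)
      # Hypothetical expert ratings (1-5 Likert scale) for one proposed policy change
      ratings = np.array([4, 5, 3, 4, 4, 5, 2, 4, 5, 4, 3, 5, 4, 4, 5])

      boot_means = np.array([
          rng.choice(ratings, size=ratings.size, replace=True).mean()
          for _ in range(10000)
      ])
      lo, hi = np.percentile(boot_means, [2.5, 97.5])
      print(f"mean = {ratings.mean():.2f}, 95% bootstrap CI = [{lo:.2f}, {hi:.2f}]")
      # A change might be flagged as 'supported' if the whole CI lies above a cut-off, e.g. 3.5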

  13. An artificial neural network architecture for non-parametric visual odometry in wireless capsule endoscopy

    Science.gov (United States)

    Dimas, George; Iakovidis, Dimitris K.; Karargyris, Alexandros; Ciuti, Gastone; Koulaouzidis, Anastasios

    2017-09-01

    Wireless capsule endoscopy is a non-invasive screening procedure of the gastrointestinal (GI) tract performed with an ingestible capsule endoscope (CE) of the size of a large vitamin pill. Such endoscopes are equipped with a usually low-frame-rate color camera which enables the visualization of the GI lumen and the detection of pathologies. The localization of the commercially available CEs is performed in the 3D abdominal space using radio-frequency (RF) triangulation from external sensor arrays, in combination with transit time estimation. State-of-the-art approaches, such as magnetic localization, which have been experimentally proved more accurate than the RF approach, are still at an early stage. Recently, we have demonstrated that CE localization is feasible using solely visual cues and geometric models. However, such approaches depend on camera parameters, many of which are unknown. In this paper the authors propose a novel non-parametric visual odometry (VO) approach to CE localization based on a feed-forward neural network architecture. The effectiveness of this approach in comparison to state-of-the-art geometric VO approaches is validated using a robotic-assisted in vitro experimental setup.

  14. Non-parametric mass reconstruction of A1689 from strong lensing data with SLAP

    CERN Document Server

    Diego-Rodriguez, J M; Protopapas, P; Tegmark, M; Benítez, N; Broadhurst, T J

    2004-01-01

    We present the mass distribution in the central area of the cluster A1689 by fitting over 100 multiply lensed images with the non-parametric Strong Lensing Analysis Package (SLAP, Diego et al. 2004). The surface mass distribution is obtained in a robust way, finding a total mass of 0.25E15 M_sun/h within a circle of 70'' radius around the central peak. Our reconstructed density profile is well fitted by an NFW profile with small perturbations due to substructure and is compatible with the more model dependent analysis of Broadhurst et al. (2004a) based on the same data. Our estimated mass does not rely on any prior information about the distribution of dark matter in the cluster. The peak of the mass distribution falls very close to the central cD and there is substructure near the center suggesting that the cluster is not fully relaxed. We also examine the effect on the recovered mass when we include the uncertainties in the redshift of the sources and in the original shape of the sources. Using simulations designed to mi...

  15. A Non-parametric Approach to Constrain the Transfer Function in Reverberation Mapping

    CERN Document Server

    Li, Yan-Rong; Bai, Jin-Ming

    2016-01-01

    Broad emission lines of active galactic nuclei stem from a spatially extended region (the broad-line region; BLR) that is composed of discrete clouds photoionized by the central ionizing continuum. The temporal behaviors of these emission lines are blurred echoes of the continuum variations (i.e., reverberation mapping; RM) and directly reflect the structure and kinematics of the BLR through the so-called transfer function (also known as the velocity-delay map). Based on the previous works of Rybicki & Press (1992) and Zu et al. (2011), we develop an extended, non-parametric approach to determine the transfer function for RM data, in which the transfer function is expressed as a sum of a family of relatively displaced Gaussian response functions. As such, arbitrary shapes of transfer functions associated with complicated BLR geometry can be seamlessly included, enabling us to relax the presumption of a specified transfer function frequently adopted in previous studies and to let it be determined by obs...
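
    To make the construction concrete (a sketch under assumed units, not the authors' code), a transfer function expressed as a sum of relatively displaced Gaussians maps a continuum light curve into the echoed emission-line light curve through a convolution over time delay:

      import numpy as np

      def transfer_function(tau, centers, widths, weights):
          """Sum of displaced Gaussian response functions Psi(tau)."""
          comps = [w * np.exp(-0.5 * ((tau - c) / s) ** 2) / (s * np.sqrt(2 * np.pi))
                   for c, s, w in zip(centers, widths, weights)]
          return np.sum(comps, axis=0)

      tau = np.arange(0.0, 60.0, 0.5)                       # delay grid (days)
      psi = transfer_function(tau, centers=[10, 25], widths=[4, 6], weights=[0.7, 0.3])

      t = np.arange(0.0, 200.0, 0.5)                        # observation times (days)
      continuum = 1.0 + 0.3 * np.sin(2 * np.pi * t / 50.0)  # toy continuum light curve

      # Line light curve: L(t) = integral of Psi(tau) C(t - tau) dtau  (discretised)
      dtau = tau[1] - tau[0]
      line = np.array([np.sum(psi * np.interp(ti - tau, t, continuum, left=1.0)) * dtau
                       for ti in t])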

  16. Detection of Invalid Test Scores on Admission Tests : A Simulation Study Using Person-Fit Statistics

    NARCIS (Netherlands)

    Tendeiro, Jorge N.; Meijer, Rob R.; Albers, Casper J.

    While an admission test may strongly predict success in university or law school programs for most test takers, there may be some test takers who are mismeasured. To address this issue, a class of statistics called person-fit statistics is used to check the validity of individual test scores.

  17. Clinical methodologies and incidence of appropriate statistical testing in orthopaedic spine literature. Are statistics misleading?

    Science.gov (United States)

    Vrbos, L A; Lorenz, M A; Peabody, E H; McGregor, M

    1993-06-15

    An analysis of 300 randomly drawn orthopaedic spine articles, published between 1970 and 1990, was performed to assess the quality of biostatistical testing and research design reported in the literature. Of the 300 articles, 269 dealt with topics of an experimental nature, while 31 documented descriptive studies. Statistical deficiencies were identified in 54.0% of the total articles. Conclusions drawn as the result of misleading significance values occurred in 124 experimental studies (46%) while 96 failed to document the form of analysis chosen (35.7%). Statistical testing was not documented in 34 studies (12.6%), while 20 (7.4%) employed analyses considered inappropriate for the specific design structure.

  18. Model of risk assessment under ballistic statistical tests

    Science.gov (United States)

    Gabrovski, Ivan; Karakaneva, Juliana

    The material presents the application of a mathematical method for risk assessment under statistical determination of the ballistic limits of protection equipment. The authors have implemented a mathematical model based on Pearson's criteria. The software implementation of the model allows the evaluation of the V50 indicator and the assessment of the reliability of the statistical hypothesis. The results supply the specialists with information about the interval estimates of the probability determined during the testing process.

  19. Distributions of Hardy-Weinberg equilibrium test statistics.

    Science.gov (United States)

    Rohlfs, R V; Weir, B S

    2008-11-01

    It is well established that test statistics and P-values derived from discrete data, such as genetic markers, are also discrete. In most genetic applications, the null distribution for a discrete test statistic is approximated with a continuous distribution, but this approximation may not be reasonable. In some cases using the continuous approximation for the expected null distribution may cause truly null test statistics to appear nonnull. We explore the implications of using continuous distributions to approximate the discrete distributions of Hardy-Weinberg equilibrium test statistics and P-values. We derive exact P-value distributions under the null and alternative hypotheses, enabling a more accurate analysis than is possible with continuous approximations. We apply these methods to biological data and find that using continuous distribution theory with exact tests may underestimate the extent of Hardy-Weinberg disequilibrium in a sample. The implications may be most important for the widespread use of whole-genome case-control association studies and Hardy-Weinberg equilibrium (HWE) testing for data quality control.
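
    As a concrete illustration of the exact approach discussed here (a standard construction, not necessarily the authors' implementation), the exact Hardy-Weinberg p-value can be computed by enumerating all heterozygote counts compatible with the observed allele counts and summing the probabilities no larger than that of the observed configuration:

      from math import lgamma, log, exp

      def log_factorial(n):
          return lgamma(n + 1)

      def hwe_exact_pvalue(n_aa, n_ab, n_bb):
          """Exact HWE test (two-sided, probability ordering) for genotype counts."""
          n = n_aa + n_ab + n_bb
          n_a = 2 * n_aa + n_ab          # count of the 'a' allele
          n_b = 2 * n_bb + n_ab          # count of the 'b' allele

          def log_prob(het):             # log P(het heterozygotes | allele counts)
              hom_a = (n_a - het) // 2
              hom_b = (n_b - het) // 2
              return (log_factorial(n) - log_factorial(hom_a) - log_factorial(het)
                      - log_factorial(hom_b) + het * log(2.0)
                      + log_factorial(n_a) + log_factorial(n_b) - log_factorial(2 * n))

          hets = range(n_a % 2, min(n_a, n_b) + 1, 2)   # feasible heterozygote counts
          probs = {h: exp(log_prob(h)) for h in hets}
          p_obs = probs[n_ab]
          return sum(p for p in probs.values() if p <= p_obs)

      print(hwe_exact_pvalue(n_aa=10, n_ab=21, n_bb=69))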

  20. [Clinical research IV. Relevancy of the statistical test chosen].

    Science.gov (United States)

    Talavera, Juan O; Rivas-Ruiz, Rodolfo

    2011-01-01

    When we look at the difference between two therapies or the association of a risk factor or prognostic indicator with its outcome, we need to evaluate the accuracy of the result. This assessment is based on a judgment that uses information about the study design and statistical management of the information. This paper specifically mentions the relevance of the statistical test selected. Statistical tests are chosen mainly from two characteristics: the objective of the study and the type of variables. The objective can be divided into three test groups: a) those in which you want to show differences between groups or within a group before and after a maneuver, b) those that seek to show the relationship (correlation) between variables, and c) those that aim to predict an outcome. The types of variables are divided into two: quantitative (continuous and discontinuous) and qualitative (ordinal and dichotomous). For example, if we seek to demonstrate differences in age (quantitative variable) among patients with systemic lupus erythematosus (SLE) with and without neurological disease (two groups), the appropriate test is the "Student t test for independent samples." But if the comparison is about the frequency of females (binomial variable), then the appropriate statistical test is the χ² test.
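
    A small hypothetical worked example of the two situations mentioned above (a scipy-based sketch, not part of the original paper): a quantitative variable compared between two groups with Student's t-test, and a dichotomous variable compared with the χ² test.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      # Quantitative outcome (age, years) in SLE patients with / without neurological disease
      age_neuro = rng.normal(38, 9, 40)
      age_no_neuro = rng.normal(42, 9, 55)
      t, p_t = stats.ttest_ind(age_neuro, age_no_neuro)     # independent-samples t-test
      print(f"t = {t:.2f}, p = {p_t:.3f}")

      # Dichotomous outcome (female yes/no) in the same two groups: chi-squared test
      table = np.array([[32, 8],      # neuro:    32 female, 8 male
                        [41, 14]])    # no neuro: 41 female, 14 male
      chi2, p_chi, dof, expected = stats.chi2_contingency(table)
      print(f"chi2 = {chi2:.2f}, p = {p_chi:.3f}")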

  1. On the Correct Use of Statistical Tests: Reply to "Lies, damned lies and statistics (in Geology)"

    CERN Document Server

    Sornette, D

    2010-01-01

    In a recent Forum in EOS entitled "Lies, damned lies and statistics (in Geology)", Vermeesch (2009) claims that statistical significance is not the same as geological significance, in other words, that statistical tests may be misleading. In complete contradiction, we affirm that statistical tests are always informative. We trace the erroneous claim of Vermeesch (2009) to a mistake in the interpretation of the chi-square test. Furthermore, using the same catalog of 118,415 earthquakes of magnitude 4 or greater occurring between Friday 1 January 1999 and Thursday 1 January 2009 (USGS, http://earthquake.usgs.gov), we show that the null hypothesis that "the occurrence of earthquakes does not depend on the day of the week" cannot be rejected (p-value p = 0.46), when taking into account the two well-known effects of (i) catalog incompleteness and (ii) aftershock clustering. This corrects the p-value p = 4.5 × 10^{-18} found by P. Vermeesch (2009), whose implementation of the chi-square test assumes that the 1...
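
    For illustration (the per-day counts below are an invented split of the 118,415 events, and the toy example omits the incompleteness and aftershock-clustering corrections the authors discuss), the weekday hypothesis can be checked with a chi-squared goodness-of-fit test against a uniform expectation:

      import numpy as np
      from scipy import stats

      # Hypothetical number of catalogued events per weekday (Mon..Sun)
      observed = np.array([16907, 16949, 17064, 16757, 16898, 16925, 16915])
      expected = np.full(7, observed.sum() / 7.0)

      chi2, p = stats.chisquare(observed, expected)
      print(f"chi2 = {chi2:.2f}, df = 6, p = {p:.3f}")   # large p: no weekday effect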

  2. 688,112 statistical results : Content mining psychology articles for statistical test results

    NARCIS (Netherlands)

    Hartgerink, C.H.J.

    2016-01-01

    In this data deposit, I describe a dataset that is the result of content mining 167,318 published articles for statistical test results reported according to the standards prescribed by the American Psychological Association (APA). Articles published by the APA, Springer, Sage, and Taylor & Francis
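
    A sketch of the kind of pattern matching such content mining relies on (illustrative only, not the deposit's actual extraction code): APA-style results such as "t(58) = 2.17, p = .034" can be pulled out of article text with a regular expression.

      import re

      APA_PATTERN = re.compile(
          r"(?P<test>[tFr]|chi2|Z)\s*"                  # test statistic symbol
          r"(?:\((?P<df>[^)]*)\))?\s*"                  # optional degrees of freedom
          r"=\s*(?P<value>-?\d*\.?\d+)\s*,\s*"          # statistic value
          r"p\s*(?P<rel>[<>=])\s*(?P<p>\.?\d*\.?\d+)",  # reported p-value
          flags=re.IGNORECASE,
      )

      text = ("The groups differed, t(58) = 2.17, p = .034, and the interaction was "
              "not significant, F(2, 114) = 1.32, p > .05.")

      for m in APA_PATTERN.finditer(text):
          print(m.group("test"), m.group("df"), m.group("value"), m.group("rel"), m.group("p"))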

  3. ROTS: An R package for reproducibility-optimized statistical testing.

    Science.gov (United States)

    Suomi, Tomi; Seyednasrollah, Fatemeh; Jaakkola, Maria K; Faux, Thomas; Elo, Laura L

    2017-05-01

    Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies, involving proteomics and RNA-seq data from public repositories, including both bulk and single cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).

  4. Generalized Hypergeometric Ensembles: Statistical Hypothesis Testing in Complex Networks

    CERN Document Server

    Casiraghi, Giona; Scholtes, Ingo; Schweitzer, Frank

    2016-01-01

    Statistical ensembles define probability spaces of all networks consistent with given aggregate statistics and have become instrumental in the analysis of relational data on networked systems. Their numerical and analytical study provides the foundation for the inference of topological patterns, the definition of network-analytic measures, as well as for model selection and statistical hypothesis testing. Contributing to the foundation of these important data science techniques, in this article we introduce generalized hypergeometric ensembles, a framework of analytically tractable statistical ensembles of finite, directed and weighted networks. This framework can be interpreted as a generalization of the classical configuration model, which is commonly used to randomly generate networks with a given degree sequence or distribution. Our generalization rests on the introduction of dyadic link propensities, which capture the degree-corrected tendencies of pairs of nodes to form edges between each other. Studyin...

  5. Bayesian Semi- and Non-Parametric Models for Longitudinal Data with Multiple Membership Effects in R

    Directory of Open Access Journals (Sweden)

    Terrance Savitsky

    2014-03-01

    Full Text Available We introduce growcurves for R, which performs analysis of repeated measures multiple membership (MM) data. This data structure arises in studies in which an intervention is delivered to each subject through the subject's participation in a set of multiple elements that characterize the intervention. In our motivating study design, under which subjects receive a group cognitive behavioral therapy (CBT) treatment, an element is a group CBT session and each subject attends multiple sessions that, together, comprise the treatment. The sets of elements, or group CBT sessions, attended by subjects will partly overlap with some of those from other subjects, inducing a dependence in their responses. The growcurves package offers two alternative sets of hierarchical models: 1. Separate terms are specified for multivariate subject and MM element random effects, where the subject effects are modeled under a Dirichlet process prior to produce a semi-parametric construction; 2. A single term is employed to model joint subject-by-MM effects. A fully non-parametric dependent Dirichlet process formulation allows exploration of differences in subject responses across different MM elements. This model allows for borrowing information among subjects who express similar longitudinal trajectories for flexible estimation. growcurves deploys estimation functions to perform posterior sampling under a suite of prior options. An accompanying set of plot functions allows the user to readily extract by-subject growth curves. The design approach intends to anticipate inferential goals with tools that fully extract information from repeated measures data. Computational efficiency is achieved by performing the sampling for estimation functions using compiled C++ code.

  6. Population pharmacokinetics of nevirapine in Malaysian HIV patients: a non-parametric approach.

    Science.gov (United States)

    Mustafa, Suzana; Yusuf, Wan Nazirah Wan; Woillard, Jean Baptiste; Choon, Tan Soo; Hassan, Norul Badriah

    2016-07-01

    Nevirapine is the first non-nucleoside reverse-transcriptase inhibitor approved and is widely used in combination therapy to treat HIV-1 infection. The pharmacokinetics of nevirapine has been extensively studied in various populations with a parametric approach. Hence, this study aimed to determine population pharmacokinetic parameters in Malaysian HIV-infected patients with a non-parametric approach, which, unlike the parametric approach, allows the detection of outliers and non-normal distributions. Nevirapine population pharmacokinetics was modelled with Pmetrics. A total of 708 observations from 112 patients were included in the model building and validation analysis. Evaluation of the model was based on a visual inspection of observed versus predicted (population and individual) concentrations and plots of weighted residual error versus concentration. Accuracy and robustness of the model were evaluated by visual predictive check (VPC). The median parameter estimates obtained from the final model were used to predict individual nevirapine plasma area under the curve (AUC) in the validation dataset. The Bland-Altman plot was used to compare the predicted AUC with the trapezoidal AUC. The median nevirapine clearance was 2.92 L/h, the median absorption rate constant was 2.55/h and the volume of distribution was 78.23 L. Nevirapine pharmacokinetics was best described by a one-compartment model with first-order absorption and a lag time. Weighted residuals for the selected model were homogeneously distributed over the concentration and time range. The developed model adequately estimated AUC. In conclusion, a model was developed that adequately describes nevirapine population pharmacokinetics in HIV-infected patients in Malaysia.
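
    To illustrate the structural model selected in this study, a generic one-compartment oral-absorption equation with lag time can be evaluated as below; the clearance, absorption rate constant and volume are the medians quoted in the abstract, while the dose, lag time and bioavailability are assumptions made for the sketch.

      import numpy as np

      def conc_1cpt_oral(t, dose, ka, cl, v, tlag=0.0, f=1.0):
          """One-compartment model, first-order absorption with lag time."""
          ke = cl / v                                    # elimination rate constant (1/h)
          te = np.clip(t - tlag, 0.0, None)              # time after absorption starts
          return (f * dose * ka / (v * (ka - ke))) * (np.exp(-ke * te) - np.exp(-ka * te))

      t = np.linspace(0, 24, 97)                         # hours after a dose
      c = conc_1cpt_oral(t, dose=200.0, ka=2.55, cl=2.92, v=78.23, tlag=0.4)

      # Trapezoidal AUC over 0-24 h (mg/L*h), as used in the Bland-Altman comparison
      auc = np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(t))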

  7. A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.

    Science.gov (United States)

    Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily

    2017-08-04

    The validity of the CAGE using item response theory (IRT) has not yet been examined in an older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity, and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA), dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and those with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for two items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and the precision of the cut-off scores in an older adult population.

  8. Reliability Evaluation of Concentric Butterfly Valve Using Statistical Hypothesis Test

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Mu Seong; Choi, Jong Sik; Choi, Byung Oh; Kim, Do Sik [Korea Institute of Machinery and Materials, Daejeon (Korea, Republic of)

    2015-12-15

    A butterfly valve is a type of flow-control device typically used to regulate a fluid flow. This paper presents an estimation of the shape parameter of the Weibull distribution, characteristic life, and B10 life for a concentric butterfly valve based on a statistical analysis of the reliability test data taken before and after the valve improvement. The difference in the shape and scale parameters between the existing and improved valves is reviewed using a statistical hypothesis test. The test results indicate that the shape parameter of the improved valve is similar to that of the existing valve, and that the scale parameter of the improved valve is found to have increased. These analysis results are particularly useful for a reliability qualification test and the determination of the service life cycles.
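
    As a generic illustration of this kind of analysis (the failure data are invented, not the valve test results), the Weibull shape parameter, characteristic life and B10 life can be estimated from life-test data with scipy:

      import numpy as np
      from scipy import stats

      # Hypothetical cycles-to-failure from a life test of the valve
      cycles = np.array([18500, 23100, 26400, 30800, 33500, 37900, 41200, 46800])

      # Fit a two-parameter Weibull (location fixed at 0)
      shape, loc, scale = stats.weibull_min.fit(cycles, floc=0)

      b10 = scale * (-np.log(0.9)) ** (1.0 / shape)      # life at 10% failure probability
      print(f"shape = {shape:.2f}, characteristic life = {scale:.0f}, B10 = {b10:.0f} cycles")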

  9. Interactive comparison of hypothesis tests for statistical model checking

    NARCIS (Netherlands)

    de Boer, Pieter-Tjerk; Reijsbergen, D.P.; Scheinhardt, Willem R.W.

    2015-01-01

    We present a web-based interactive comparison of hypothesis tests as are used in statistical model checking, providing users and tool developers with more insight into their characteristics. Parameters can be modified easily and their influence is visualized in real time; an integrated simulation

  10. The performance of robust test statistics with categorical data

    NARCIS (Netherlands)

    Savalei, V.; Rhemtulla, M.

    2013-01-01

    This paper reports on a simulation study that evaluated the performance of five structural equation model test statistics appropriate for categorical data. Both Type I error rate and power were investigated. Different model sizes, sample sizes, numbers of categories, and threshold distributions were

  12. Evaluation of the Wishart test statistics for polarimetric SAR data

    DEFF Research Database (Denmark)

    Skriver, Henning; Nielsen, Allan Aasbjerg; Conradsen, Knut

    2003-01-01

    A test statistic for equality of two covariance matrices following the complex Wishart distribution has previously been used in new algorithms for change detection, edge detection and segmentation in polarimetric SAR images. Previously, the results for change detection and edge detection have been...

  13. APPLICATION OF PARAMETRIC AND NON-PARAMETRIC BENCHMARKING METHODS IN COST EFFICIENCY ANALYSIS OF THE ELECTRICITY DISTRIBUTION SECTOR

    Directory of Open Access Journals (Sweden)

    Andrea Furková

    2007-06-01

    Full Text Available This paper explores the application of parametric and non-parametric benchmarking methods in measuring the cost efficiency of Slovak and Czech electricity distribution companies. We compare the relative cost efficiency of Slovak and Czech distribution companies using two benchmarking methods: the non-parametric Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The first part of the analysis was based on DEA models. Traditional cross-section CCR and BCC models were modified for cost efficiency estimation. In the further analysis we focus on two versions of the stochastic frontier cost function using panel data: an MLE model and a GLS model. These models have been applied to an unbalanced panel of 11 regional electricity distribution utilities (3 Slovak and 8 Czech) over the period from 2000 to 2004. The differences in estimated scores, parameters and rankings of utilities were analyzed. We observed significant differences between the parametric methods and the DEA approach.

  14. Equivalence versus classical statistical tests in water quality assessments.

    Science.gov (United States)

    Ngatia, Murage; Gonzalez, David; San Julian, Steve; Conner, Arin

    2010-01-01

    To evaluate whether two unattended field organic carbon instruments could provide data comparable to laboratory-generated data, we needed a practical assessment. Null hypothesis statistical testing (NHST) is commonly utilized for such evaluations in environmental assessments, but researchers in other disciplines have identified weaknesses that may limit NHST's usefulness. For example, in NHST, large sample sizes change p-values and a statistically significant result can be obtained by merely increasing the sample size. In addition, p-values can indicate that observed results are statistically significantly different, but in reality the differences could be trivial in magnitude. Equivalence tests, on the other hand, allow the investigator to incorporate decision criteria that have practical relevance to the study. In this paper, we demonstrate the potential use of equivalence tests as an alternative to NHST. We first compare data between the two field instruments, and then compare the field instruments' data to laboratory-generated data using both NHST and equivalence tests. NHST indicated that the data between the two field instruments and the data between the field instruments and the laboratory were significantly different. Equivalence tests showed that the data were equivalent because they fell within a pre-determined equivalence interval based on our knowledge of laboratory precision. We conclude that equivalence tests provide more useful comparisons and interpretation of water quality data than NHST and should be more widely used in similar environmental assessments.
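
    A minimal sketch of the two one-sided tests (TOST) equivalence procedure the authors advocate, with invented organic-carbon readings and an assumed equivalence margin meant to reflect laboratory precision:

      import numpy as np
      from scipy import stats

      def tost_welch(x, y, delta):
          """Two one-sided Welch t-tests for equivalence of means within +/- delta."""
          m = x.mean() - y.mean()
          se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
          df = se**4 / ((x.var(ddof=1) / x.size) ** 2 / (x.size - 1)
                        + (y.var(ddof=1) / y.size) ** 2 / (y.size - 1))
          p_lower = 1.0 - stats.t.cdf((m + delta) / se, df)   # H0: diff <= -delta
          p_upper = stats.t.cdf((m - delta) / se, df)          # H0: diff >= +delta
          return max(p_lower, p_upper)                         # equivalence if this < alpha

      rng = np.random.default_rng(7)
      field = rng.normal(3.02, 0.12, 60)   # hypothetical field-instrument TOC, mg/L
      lab = rng.normal(3.00, 0.10, 60)     # hypothetical laboratory TOC, mg/L
      print(f"TOST p = {tost_welch(field, lab, delta=0.2):.4f}")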

  15. Non-parametric data-based approach for the quantification and communication of uncertainties in river flood forecasts

    Science.gov (United States)

    Van Steenbergen, N.; Willems, P.

    2012-04-01

    Reliable flood forecasts are the most important non-structural measures to reduce the impact of floods. However flood forecasting systems are subject to uncertainty originating from the input data, model structure and model parameters of the different hydraulic and hydrological submodels. To quantify this uncertainty a non-parametric data-based approach has been developed. This approach analyses the historical forecast residuals (differences between the predictions and the observations at river gauging stations) without using a predefined statistical error distribution. Because the residuals are correlated with the value of the forecasted water level and the lead time, the residuals are split up into discrete classes of simulated water levels and lead times. For each class, percentile values are calculated of the model residuals and stored in a 'three dimensional error' matrix. By 3D interpolation in this error matrix, the uncertainty in new forecasted water levels can be quantified. In addition to the quantification of the uncertainty, the communication of this uncertainty is equally important. The communication has to be done in a consistent way, reducing the chance of misinterpretation. Also, the communication needs to be adapted to the audience; the majority of the larger public is not interested in in-depth information on the uncertainty on the predicted water levels, but only is interested in information on the likelihood of exceedance of certain alarm levels. Water managers need more information, e.g. time dependent uncertainty information, because they rely on this information to undertake the appropriate flood mitigation action. There are various ways in presenting uncertainty information (numerical, linguistic, graphical, time (in)dependent, etc.) each with their advantages and disadvantages for a specific audience. A useful method to communicate uncertainty of flood forecasts is by probabilistic flood mapping. These maps give a representation of the
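
    A compact sketch of how such an error matrix might be assembled (names, classes and data are purely illustrative): historical residuals are binned by forecasted water-level class and lead time, and selected percentiles are stored for later interpolation.

      import numpy as np

      def build_error_matrix(forecast, observed, lead_time, level_edges, lead_edges,
                             pct=(5, 25, 50, 75, 95)):
          """Percentiles of forecast residuals per (water-level class, lead-time class)."""
          resid = forecast - observed
          mat = np.full((len(level_edges) - 1, len(lead_edges) - 1, len(pct)), np.nan)
          lev_idx = np.digitize(forecast, level_edges) - 1
          lead_idx = np.digitize(lead_time, lead_edges) - 1
          for i in range(mat.shape[0]):
              for j in range(mat.shape[1]):
                  r = resid[(lev_idx == i) & (lead_idx == j)]
                  if r.size:
                      mat[i, j] = np.percentile(r, pct)
          return mat

      # Hypothetical archive of forecasts (m), matching observations and lead times (h)
      rng = np.random.default_rng(3)
      fc = rng.uniform(1.0, 6.0, 5000)
      obs = fc + rng.normal(0.0, 0.05 + 0.05 * fc, 5000)
      lead = rng.choice([6, 12, 24, 48], 5000)
      em = build_error_matrix(fc, obs, lead, np.linspace(1, 6, 6), [0, 9, 18, 36, 72])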

  16. A statistical approach to bioclimatic trend detection in the airborne pollen records of Catalonia (NE Spain).

    Science.gov (United States)

    Fernández-Llamazares, Alvaro; Belmonte, Jordina; Delgado, Rosario; De Linares, Concepción

    2014-04-01

    Airborne pollen records are a suitable indicator for the study of climate change. The present work focuses on the role of annual pollen indices for the detection of bioclimatic trends through the analysis of the aerobiological spectra of 11 taxa of great biogeographical relevance in Catalonia over an 18-year period (1994-2011), by means of different parametric and non-parametric statistical methods. Among others, two non-parametric rank-based statistical tests were performed for detecting monotonic trends in time series data of the selected airborne pollen types, and we have observed that they have similar power in detecting trends. Except for those cases in which the pollen data can be well modeled by a normal distribution, it is better to apply non-parametric statistical methods to aerobiological studies. Our results provide a reliable representation of the pollen trends in the region and suggest that greater pollen quantities are being released into the atmosphere in recent years, especially by Mediterranean taxa such as Pinus, Total Quercus and Evergreen Quercus, although the trends may differ geographically. Longer aerobiological monitoring periods are required to corroborate these results and survey the increasing levels of certain pollen types that could exert an impact in terms of public health.
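
    One rank-based trend test commonly used for this purpose is the Mann-Kendall test; the record does not state exactly which two tests were applied, so the self-contained sketch below (no tie correction, invented annual pollen index values) is only indicative.

      import numpy as np
      from scipy import stats

      def mann_kendall(x):
          """Mann-Kendall trend test (no tie correction) for a yearly series x."""
          x = np.asarray(x, float)
          n = x.size
          s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
          var_s = n * (n - 1) * (2 * n + 5) / 18.0
          if s > 0:
              z = (s - 1) / np.sqrt(var_s)
          elif s < 0:
              z = (s + 1) / np.sqrt(var_s)
          else:
              z = 0.0
          p = 2.0 * (1.0 - stats.norm.cdf(abs(z)))       # two-sided p-value
          return s, z, p

      # Hypothetical annual pollen index for one taxon, 1994-2011
      api = [310, 290, 350, 420, 380, 460, 400, 510, 470,
             530, 490, 560, 520, 600, 580, 640, 610, 660]
      print(mann_kendall(api))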

  17. Testing the Weak Form Efficiency of Karachi Stock Exchange

    Directory of Open Access Journals (Sweden)

    Muhammad Arshad Haroon

    2012-12-01

    Full Text Available In an efficient market, share prices reflect all available information. The study of the efficient market hypothesis helps in making sound investment decisions. In this research, the weak form efficiency of the Karachi Stock Exchange (KSE) has been tested over the period 2 November 1991 to 2 November 2011. Descriptive statistics indicated the absence of weak form efficiency, and the results of the non-parametric tests were consistent with this finding. The non-parametric tests employed to examine serial independence of the data were the Kolmogorov-Smirnov goodness-of-fit test, the runs test and the autocorrelation test. The results show that the KSE is not weak-form efficient. This happens because the KSE is an emerging market, where information has been observed to take time to be processed. Thus it can be said that technical analysis may be applied to gain abnormal returns.
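
    For concreteness (a generic sketch, not the paper's code), a runs test for randomness classifies each daily return as above or below the median and compares the observed number of runs with its expectation under independence:

      import numpy as np
      from scipy import stats

      def runs_test(x):
          """Wald-Wolfowitz runs test about the median (normal approximation)."""
          x = np.asarray(x, float)
          signs = x > np.median(x)                   # True/False sequence
          n1, n2 = signs.sum(), (~signs).sum()
          runs = 1 + np.count_nonzero(signs[1:] != signs[:-1])
          mu = 2.0 * n1 * n2 / (n1 + n2) + 1.0
          var = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
                 / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
          z = (runs - mu) / np.sqrt(var)
          return z, 2.0 * (1.0 - stats.norm.cdf(abs(z)))

      rng = np.random.default_rng(0)
      returns = rng.normal(0.0005, 0.012, 2500)      # hypothetical daily index returns
      print(runs_test(returns))                       # large p: randomness not rejected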

  18. Wavelet analysis in ecology and epidemiology: impact of statistical tests.

    Science.gov (United States)

    Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario

    2014-02-06

    Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise, which are nevertheless the methods currently used in the wide majority of wavelet applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used, such as the hidden Markov model algorithm and the 'beta-surrogate' method.

  19. Parametric modeling of DSC-MRI data with stochastic filtration and optimal input design versus non-parametric modeling.

    Science.gov (United States)

    Kalicka, Renata; Pietrenko-Dabrowska, Anna

    2007-03-01

    In the paper MRI measurements are used for assessment of brain tissue perfusion and other features and functions of the brain (cerebral blood flow - CBF, cerebral blood volume - CBV, mean transit time - MTT). Perfusion is an important indicator of tissue viability and functioning as in pathological tissue blood flow, vascular and tissue structure are altered with respect to normal tissue. MRI enables diagnosing diseases at an early stage of their course. The parametric and non-parametric approaches to the identification of MRI models are presented and compared. The non-parametric modeling adopts gamma variate functions. The parametric three-compartmental catenary model, based on the general kinetic model, is also proposed. The parameters of the models are estimated on the basis of experimental data. The goodness of fit of the gamma variate and the three-compartmental models to the data and the accuracy of the parameter estimates are compared. Kalman filtering, smoothing the measurements, was adopted to improve the estimate accuracy of the parametric model. Parametric modeling gives a better fit and better parameter estimates than non-parametric and allows an insight into the functioning of the system. To improve the accuracy optimal experiment design related to the input signal was performed.
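
    As a sketch of the gamma variate modelling mentioned here (invented values; the study's full analysis also involves a three-compartment model and Kalman filtering), a gamma variate function can be fitted to a first-pass concentration-time curve:

      import numpy as np
      from scipy.optimize import curve_fit

      def gamma_variate(t, a, t0, alpha, beta):
          """C(t) = a (t - t0)^alpha exp(-(t - t0)/beta) for t > t0, else 0."""
          dt = np.clip(t - t0, 0.0, None)
          return a * dt**alpha * np.exp(-dt / beta)

      # Hypothetical DSC-MRI first-pass curve (concentration vs time in seconds)
      t = np.arange(0.0, 60.0, 1.5)
      rng = np.random.default_rng(5)
      meas = gamma_variate(t, 0.8, 10.0, 2.5, 4.0) + rng.normal(0.0, 0.5, t.size)

      popt, pcov = curve_fit(gamma_variate, t, meas, p0=[1.0, 8.0, 2.0, 5.0],
                             bounds=([0.0, 0.0, 0.1, 0.1], [10.0, 30.0, 10.0, 20.0]))
      fit = gamma_variate(t, *popt)
      area = np.sum(0.5 * (fit[1:] + fit[:-1]) * np.diff(t))   # proportional to CBV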

  20. Statistical tests for associations between two directed acyclic graphs.

    Directory of Open Access Journals (Sweden)

    Robert Hoehndorf

    Full Text Available Biological data, and particularly annotation data, are increasingly being represented in directed acyclic graphs (DAGs. However, while relevant biological information is implicit in the links between multiple domains, annotations from these different domains are usually represented in distinct, unconnected DAGs, making links between the domains represented difficult to determine. We develop a novel family of general statistical tests for the discovery of strong associations between two directed acyclic graphs. Our method takes the topology of the input graphs and the specificity and relevance of associations between nodes into consideration. We apply our method to the extraction of associations between biomedical ontologies in an extensive use-case. Through a manual and an automatic evaluation, we show that our tests discover biologically relevant relations. The suite of statistical tests we develop for this purpose is implemented and freely available for download.

  1. Mean-squared-displacement statistical test for fractional Brownian motion

    Science.gov (United States)

    Sikora, Grzegorz; Burnecki, Krzysztof; Wyłomańska, Agnieszka

    2017-03-01

    Anomalous diffusion in crowded fluids, e.g., in cytoplasm of living cells, is a frequent phenomenon. A common tool by which the anomalous diffusion of a single particle can be classified is the time-averaged mean square displacement (TAMSD). A classical mechanism leading to the anomalous diffusion is the fractional Brownian motion (FBM). A validation of such process for single-particle tracking data is of great interest for experimentalists. In this paper we propose a rigorous statistical test for FBM based on TAMSD. To this end we analyze the distribution of the TAMSD statistic, which is given by the generalized chi-squared distribution. Next, we study the power of the test by means of Monte Carlo simulations. We show that the test is very sensitive for changes of the Hurst parameter. Moreover, it can easily distinguish between two models of subdiffusion: FBM and continuous-time random walk.
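
    A small sketch of the TAMSD statistic the test is built on (generic computation only; the test itself additionally uses the generalized chi-squared null distribution mentioned above): for a trajectory of length N and lag n, the TAMSD averages squared displacements along the trajectory, and its log-log slope estimates the anomalous exponent (2H for FBM).

      import numpy as np

      def tamsd(x, lags):
          """Time-averaged mean square displacement of a 1-D trajectory x."""
          x = np.asarray(x, float)
          return np.array([np.mean((x[n:] - x[:-n]) ** 2) for n in lags])

      # Hypothetical trajectory: ordinary Brownian motion sampled at unit time steps
      rng = np.random.default_rng(11)
      x = np.cumsum(rng.normal(0.0, 1.0, 10000))

      lags = np.arange(1, 101)
      msd = tamsd(x, lags)
      alpha = np.polyfit(np.log(lags), np.log(msd), 1)[0]   # ~1 for Brownian, 2H for FBM
      print(f"estimated anomalous exponent alpha = {alpha:.2f}")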

  2. Fully Bayesian tests of neutrality using genealogical summary statistics

    Directory of Open Access Journals (Sweden)

    Drummond Alexei J

    2008-10-01

    Full Text Available Abstract Background Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Results Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Conclusion Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously limited by the availability of theory and methods.

  3. Your Chi-Square Test Is Statistically Significant: Now What?

    Directory of Open Access Journals (Sweden)

    Donald Sharpe

    2015-04-01

    Full Text Available Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data from two recent journal articles were used to illustrate these approaches. A call is made for greater consideration of foundational techniques such as the chi-square tests.
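
    Of the four follow-up approaches evaluated, the residual approach is the simplest to sketch (the contingency table below is invented): after a significant omnibus chi-square test, adjusted standardized residuals indicate which cells drive the result.

      import numpy as np
      from scipy import stats

      # Hypothetical 2x3 contingency table (rows: outcome, columns: group)
      table = np.array([[34, 22, 14],
                        [16, 38, 26]], float)

      chi2, p, dof, expected = stats.chi2_contingency(table)

      # Adjusted standardized residuals, approximately N(0,1) under independence
      n = table.sum()
      row = table.sum(axis=1, keepdims=True)
      col = table.sum(axis=0, keepdims=True)
      adj_resid = (table - expected) / np.sqrt(expected * (1 - row / n) * (1 - col / n))
      print(np.round(adj_resid, 2))        # |value| > 2 flags cells driving the result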

  4. Asymptotics of Bonferroni for Dependent Normal Test Statistics.

    Science.gov (United States)

    Proschan, Michael A; Shaw, Pamela A

    2011-07-01

    The Bonferroni adjustment is sometimes used to control the familywise error rate (FWE) when the number of comparisons is huge. In genome wide association studies, researchers compare cases to controls with respect to thousands of single nucleotide polymorphisms. It has been claimed that the Bonferroni adjustment is only slightly conservative if the comparisons are nearly independent. We show that the veracity of this claim depends on how one defines "nearly." Specifically, if the test statistics' pairwise correlations converge to 0 as the number of tests tends to ∞, the conservatism of the Bonferroni procedure depends on their rate of convergence. The type I error rate of Bonferroni can tend to 0 or 1 - exp(-α) ≈ α, depending on that rate. We show using elementary probability theory what happens to the distribution of the number of errors when using Bonferroni, as the number of dependent normal test statistics gets large. We also use the limiting behavior of Bonferroni to shed light on properties of other commonly used test statistics.
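
    A quick Monte Carlo sketch of the phenomenon described (not the paper's derivation; the parameter values are arbitrary): for m equicorrelated standard normal test statistics, the familywise error rate of the Bonferroni procedure depends strongly on the common correlation.

      import numpy as np
      from scipy import stats

      def bonferroni_fwe(m=1000, rho=0.5, alpha=0.05, reps=2000, seed=0):
          """Monte Carlo FWE of Bonferroni for m equicorrelated N(0,1) statistics."""
          rng = np.random.default_rng(seed)
          crit = stats.norm.ppf(1 - alpha / (2 * m))            # two-sided Bonferroni cut-off
          shared = rng.normal(size=(reps, 1))
          indiv = rng.normal(size=(reps, m))
          z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * indiv  # equicorrelated null statistics
          return np.mean(np.any(np.abs(z) > crit, axis=1))

      for rho in (0.0, 0.3, 0.9):
          print(rho, bonferroni_fwe(rho=rho))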

  5. Non-parametric determination of H and He interstellar fluxes from cosmic-ray data

    Science.gov (United States)

    Ghelfi, A.; Barao, F.; Derome, L.; Maurin, D.

    2016-06-01

    Context. Top-of-atmosphere (TOA) cosmic-ray (CR) fluxes from satellites and balloon-borne experiments are snapshots of the solar activity imprinted on the interstellar (IS) fluxes. Given a series of snapshots, the unknown IS flux shape and the level of modulation (for each snapshot) can be recovered. Aims: We wish (i) to provide the most accurate determination of the IS H and He fluxes from TOA data alone; (ii) to obtain the associated modulation levels (and uncertainties) while fully accounting for the correlations with the IS flux uncertainties; and (iii) to inspect whether the minimal force-field approximation is sufficient to explain all the data at hand. Methods: Using H and He TOA measurements, including the recent high-precision AMS, BESS-Polar, and PAMELA data, we performed a non-parametric fit of the IS fluxes JISH,~He and modulation level φi for each data-taking period. We relied on a Markov chain Monte Carlo (MCMC) engine to extract the probability density function and correlations (hence the credible intervals) of the sought parameters. Results: Although H and He are the most abundant and best measured CR species, several datasets had to be excluded from the analysis because of inconsistencies with other measurements. From the subset of data passing our consistency cut, we provide ready-to-use best-fit and credible intervals for the H and He IS fluxes from MeV/n to PeV/n energy (with a relative precision in the range [ 2-10% ] at 1σ). Given the strong correlation between JIS and φi parameters, the uncertainties on JIS translate into Δφ ≈ ± 30 MV (at 1σ) for all experiments. We also find that the presence of 3He in He data biases φ towards higher φ values by ~30 MV. The force-field approximation, despite its limitation, gives an excellent (χ2/d.o.f. = 1.02) description of the recent high-precision TOA H and He fluxes. Conclusions: The analysis must be extended to different charge species and more realistic modulation models. It would benefit
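
    For reference, the force-field approximation mentioned in the abstract relates the interstellar and top-of-atmosphere proton fluxes through a single modulation potential; the sketch below uses the generic Gleeson-Axford formula with an illustrative power-law IS flux, not the fitted JIS of the paper.

      import numpy as np

      M_P = 0.938272  # proton rest mass, GeV

      def force_field_toa(e_toa, j_is, phi):
          """Force-field modulation of an IS proton flux (phi in GV, equal to GeV for H)."""
          e_is = e_toa + phi                                   # mean energy loss in the heliosphere
          return j_is(e_is) * e_toa * (e_toa + 2 * M_P) / (e_is * (e_is + 2 * M_P))

      def j_is(e_kin):
          # Illustrative IS flux shape only (arbitrary units), not the non-parametric fit
          return 1.0e4 * (e_kin + M_P) ** -2.7

      e = np.logspace(-1, 2, 50)                               # kinetic energy, GeV
      j_quiet = force_field_toa(e, j_is, phi=0.40)             # quiet-Sun modulation level
      j_active = force_field_toa(e, j_is, phi=1.00)            # active-Sun modulation level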

  6. Evaluation of world's largest social welfare scheme: An assessment using non-parametric approach.

    Science.gov (United States)

    Singh, Sanjeet

    2016-08-01

    Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA) is the world's largest social welfare scheme in India for poverty alleviation through rural employment generation. This paper aims to evaluate and rank the performance of the states in India under the MGNREGA scheme. A non-parametric approach, Data Envelopment Analysis (DEA), is used to calculate the overall technical, pure technical, and scale efficiencies of states in India. The sample data are drawn from the annual official reports published by the Ministry of Rural Development, Government of India. Based on three selected input parameters (expenditure indicators) and five output parameters (employment generation indicators), I apply both input- and output-oriented DEA models to estimate how well the states utilize their resources and generate outputs during the financial year 2013-14. The relative performance evaluation has been made under the assumption of constant returns and also under variable returns to scale to assess the impact of scale on performance. The results indicate that the main source of inefficiency is both technical and managerial practices adopted. 11 states are overall technically efficient and operate at the optimum scale whereas 18 states are purely technically or managerially efficient. It has been found that for some states it is necessary to alter the scheme size to perform at par with the best performing states. For inefficient states, optimal input and output targets along with the resource savings and output gains are calculated. Analysis shows that if all inefficient states operate at optimal input and output levels, on average 17.89% of total expenditure and a total amount of $780 million could have been saved in a single year. Most of the inefficient states perform poorly when it comes to the participation of women and disadvantaged sections (SC&ST) in the scheme. In order to catch up with the performance of best performing states, inefficient states on an average need to enhance
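
    A compact sketch of an input-oriented, constant-returns-to-scale (CCR) DEA envelopment model of the kind used for such efficiency scores, solved as a linear program (the inputs and outputs below are toy numbers, not the MGNREGA data); the BCC variant adds the convexity constraint sum(lambda) = 1, and scale efficiency is the ratio of the two scores.

      import numpy as np
      from scipy.optimize import linprog

      def dea_ccr_input(X, Y):
          """Input-oriented CCR efficiency per DMU. X: (m inputs x n DMUs), Y: (s outputs x n DMUs)."""
          m, n = X.shape
          s = Y.shape[0]
          scores = np.empty(n)
          for o in range(n):
              c = np.zeros(n + 1)
              c[0] = 1.0                                       # minimise theta
              a_in = np.hstack([-X[:, [o]], X])                # sum_j lam_j x_ij <= theta x_io
              a_out = np.hstack([np.zeros((s, 1)), -Y])        # sum_j lam_j y_rj >= y_ro
              A_ub = np.vstack([a_in, a_out])
              b_ub = np.concatenate([np.zeros(m), -Y[:, o]])
              bounds = [(None, None)] + [(0, None)] * n        # theta free, lambdas >= 0
              res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
              scores[o] = res.x[0]
          return scores

      # Toy example: 2 inputs (expenditure heads) and 1 output (person-days) for 5 units
      X = np.array([[120, 150, 100, 180, 90],
                    [ 30,  25,  40,  20, 35]], float)
      Y = np.array([[500, 540, 480, 600, 420]], float)
      print(np.round(dea_ccr_input(X, Y), 3))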

  7. Statistical testing and distribution for lead chloride toxicity

    Institute of Scientific and Technical Information of China (English)

    John H. Lange

    2005-01-01

    Dear Sir, Graca et al. [1] provided an interesting investigation of the toxicity of lead chloride and sperm development in mice. However, I would like to make a comment on the statistical analysis presented. Table 1 and its results suggest that a comparison of treated (experimental) and control mice was undertaken using the t-test. The authors indicate that they used the t-test along with a complementary ANOVA analysis. It appears that the t-test was used for the analysis in Table 1 and ANOVA, as indicated, for Table 2. Use of the t-test for comparing three or more groups is not appropriate, since this may result in a multiple comparison problem (increasing the type I error rate) [2, 3]. Multiple comparisons can result in the reporting of a P value that is significant (or of lower value) when in actuality it is not. It is better to use, for example, a one-way ANOVA followed by a post-hoc test which can take all comparisons into account. Other statistical tests can also be employed to control the overall type I error, such as the Tukey-HSD (honest significant difference), Scheffé's and Bonferroni-Dunn methods [2].
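
    A small hypothetical worked example of the recommendation in this letter (invented data for three dose groups; scipy >= 1.8 is assumed for tukey_hsd): a one-way ANOVA followed by Tukey's HSD keeps the overall type I error controlled across all pairwise comparisons.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      control = rng.normal(100.0, 12.0, 15)     # hypothetical sperm counts, control mice
      low_dose = rng.normal(92.0, 12.0, 15)     # low lead chloride dose
      high_dose = rng.normal(80.0, 12.0, 15)    # high lead chloride dose

      f, p = stats.f_oneway(control, low_dose, high_dose)       # omnibus one-way ANOVA
      print(f"F = {f:.2f}, p = {p:.4f}")

      if p < 0.05:
          print(stats.tukey_hsd(control, low_dose, high_dose))  # all pairwise comparisons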

  8. Probability and Statistics Questions and Tests : a critical analysis

    Directory of Open Access Journals (Sweden)

    Fabrizio Maturo

    2015-06-01

    Full Text Available In probability and statistics courses, a popular method for evaluating students is to assess them with multiple choice tests. The use of these tests makes it possible to evaluate certain types of skills, such as speed of response, short-term memory, mental clarity and the ability to compete. In our opinion, verification through testing can certainly be useful for analysing certain aspects and for speeding up the assessment process, but we should be aware of the limitations of such a standardized procedure and therefore rule out reducing the assessment of pupils, classes and schools to the processing of test results. To support this thesis, this article discusses the main limitations of tests in detail, presents some recent models proposed in the literature and suggests some alternative assessment methods. Keywords: item response theory, assessment, tests, probability.

  9. Statistical distributions of test statistics used for quantitative trait association mapping in structured populations

    Directory of Open Access Journals (Sweden)

    Teyssèdre Simon

    2012-11-01

    Full Text Available Abstract Background Spurious associations between single nucleotide polymorphisms and phenotypes are a major issue in genome-wide association studies and have led to underestimation of type 1 error rate and overestimation of the number of quantitative trait loci found. Many authors have investigated the influence of population structure on the robustness of methods by simulation. This paper is aimed at further developing the algebraic formalization of power and type 1 error rate for some of the classical statistical methods used: simple regression, two approximate methods of mixed models involving the effect of a single nucleotide polymorphism (SNP) and a random polygenic effect (GRAMMAR and FASTA), and the transmission/disequilibrium test for quantitative traits and nuclear families. Analytical formulae were derived using matrix algebra for the first and second moments of the statistical tests, assuming a true mixed model with a polygenic effect and SNP effects. Results The expectation and variance of the test statistics and their marginal expectations and variances according to the distribution of genotypes and estimators of variance components are given as a function of the relationship matrix and of the heritability of the polygenic effect. These formulae were used to compute the type 1 error rate and power for any kind of relationship matrix between phenotyped and genotyped individuals for any level of heritability. For the regression method, the type 1 error rate increased with the variability of relationships and with heritability, but it decreased with the GRAMMAR method and was not affected for the FASTA and quantitative transmission/disequilibrium test methods. Conclusions The formulae can be easily used to provide the correct threshold of type 1 error rate and to calculate the power when designing experiments or data collection protocols. The results concerning the efficacy of each method agree with simulation results in the literature but were

  10. A new measure for gene expression biclustering based on non-parametric correlation.

    Science.gov (United States)

    Flores, Jose L; Inza, Iñaki; Larrañaga, Pedro; Calvo, Borja

    2013-12-01

    One of the emerging techniques for the analysis of DNA microarray data, known as biclustering, is the search for subsets of genes and conditions which are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean squared residue as a quality measure, but it cannot detect relevant and interesting patterns such as shifting or scaling patterns. Furthermore, recent papers show that there exist new coherence patterns involved in different kinds of cancer and tumors, such as inverse relationships between genes, which cannot be captured either. The proposed measure is called Spearman's biclustering measure (SBM), which estimates the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. The search for biclusters is performed using an evolutionary technique called estimation of distribution algorithms, which uses the SBM measure as its fitness function. This approach has been examined from different points of view using artificial and real microarrays. The assessment process has involved the use of quality indexes, a set of reference bicluster patterns including new patterns, and a set of statistical tests. The performance has also been examined using real microarrays and compared to different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs. SBM shows several advantages, such as the ability to recognize more complex coherence patterns such as shifting, scaling and inversion, and the capability to selectively marginalize genes and conditions depending on the statistical significance. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
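    To make the idea of a correlation-based bicluster quality score concrete, here is a minimal Python sketch that averages absolute Spearman correlations over pairs of genes and pairs of conditions. It is an illustrative simplification, not the exact SBM definition from the paper; the function name, the equal weighting of the two terms and the toy data are assumptions made for the example.

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def spearman_bicluster_score(submatrix):
    """Illustrative bicluster quality score: mean absolute Spearman correlation
    over all pairs of rows (genes) and all pairs of columns (conditions).
    Values near 1 indicate coherent patterns, including shifted, scaled or
    inverted ones, since Spearman correlation depends only on ranks."""
    def mean_abs_rho(m):
        rhos = []
        for i, j in combinations(range(m.shape[0]), 2):
            rho, _ = spearmanr(m[i], m[j])
            rhos.append(abs(rho))
        return float(np.mean(rhos)) if rhos else 0.0

    return 0.5 * (mean_abs_rho(submatrix) + mean_abs_rho(submatrix.T))

# A shifted, scaled and inverted pattern still scores close to 1.
base = np.linspace(0.0, 1.0, 6)
bicluster = np.vstack([2.0 * base + 1.0, -3.0 * base, base + 0.5])
print(spearman_bicluster_score(bicluster))
```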

  11. Quantum Statistical Testing of a Quantum Random Number Generator

    Energy Technology Data Exchange (ETDEWEB)

    Humble, Travis S [ORNL

    2014-01-01

    The unobservable elements in a quantum technology, e.g., the quantum state, complicate system verification against promised behavior. Using model-based system engineering, we present methods for verifying the operation of a prototypical quantum random number generator. We begin with the algorithmic design of the QRNG followed by the synthesis of its physical design requirements. We next discuss how quantum statistical testing can be used to verify device behavior as well as detect device bias. We conclude by highlighting how system design and verification methods must influence effort to certify future quantum technologies.

  13. Quantum statistical testing of a quantum random number generator

    Science.gov (United States)

    Humble, Travis S.

    2014-10-01

    The unobservable elements in a quantum technology, e.g., the quantum state, complicate system verification against promised behavior. Using model-based system engineering, we present methods for verifying the operation of a prototypical quantum random number generator. We begin with the algorithmic design of the QRNG followed by the synthesis of its physical design requirements. We next discuss how quantum statistical testing can be used to verify device behavior as well as detect device bias. We conclude by highlighting how system design and verification methods must influence effort to certify future quantum technologies.

  14. 污染线性模型的非参数估计%NON-PARAMETRIC ESTIMATION IN CONTAMINATED LINEAR MODEL

    Institute of Scientific and Technical Information of China (English)

    柴根象; 孙燕; 杨筱菡

    2001-01-01

    In this paper, the following contaminated linear model is considered: y_i = (1 - ε)x_i^τ β + z_i, 1 ≤ i ≤ n, where the r.v.'s {y_i} are contaminated with errors {z_i}. The errors are only assumed to have finite moments of order 2. Non-parametric estimators of the contamination coefficient ε and the regression parameter β are established, and the strong consistency and almost sure convergence rates of the estimators are obtained. A simulated example is also given to show the visual performance of the estimators.

  15. Statistical tests for power-law cross-correlated processes.

    Science.gov (United States)

    Podobnik, Boris; Jiang, Zhi-Qiang; Zhou, Wei-Xing; Stanley, H Eugene

    2011-12-01

    For stationary time series, the cross-covariance and the cross-correlation as functions of time lag n serve to quantify the similarity of two time series. The latter measure is also used to assess whether the cross-correlations are statistically significant. For nonstationary time series, the analogous measures are detrended cross-correlation analysis (DCCA) and the recently proposed detrended cross-correlation coefficient, ρ(DCCA)(T,n), where T is the total length of the time series and n the window size. For ρ(DCCA)(T,n), we numerically verified the Cauchy inequality -1 ≤ ρ(DCCA)(T,n) ≤ 1. Here we derive -1 ≤ ρ(DCCA)(T,n) ≤ 1 for a standard variance-covariance approach and for a detrending approach. For overlapping windows, we find the range of ρ(DCCA) within which the cross-correlations become statistically significant. For overlapping windows we numerically determine, and for nonoverlapping windows we derive, that the standard deviation of ρ(DCCA)(T,n) tends with increasing T to 1/T. Using ρ(DCCA)(T,n) we show that the Chinese financial market's tendency to follow the U.S. market is extremely weak. We also propose an additional statistical test that can be used to quantify the existence of cross-correlations between two power-law correlated time series.
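    For concreteness, here is a small Python sketch of the detrended cross-correlation coefficient ρ(DCCA)(T,n) using the integrated profiles of the two series, non-overlapping windows and local linear detrending. It is an illustrative implementation under these simplifying choices, not the code used in the paper; the window size and the synthetic series are arbitrary.

```python
import numpy as np

def rho_dcca(x, y, n):
    """Detrended cross-correlation coefficient rho_DCCA(T, n).

    x, y : 1-D series of equal length T; n : window size.
    Uses non-overlapping windows and first-order (linear) detrending."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.cumsum(x - x.mean())          # integrated profiles
    Y = np.cumsum(y - y.mean())
    f2_xy, f2_xx, f2_yy = [], [], []
    for start in range(0, len(X) - n + 1, n):
        t = np.arange(n)
        xs, ys = X[start:start + n], Y[start:start + n]
        # remove the local linear trend in each window
        rx = xs - np.polyval(np.polyfit(t, xs, 1), t)
        ry = ys - np.polyval(np.polyfit(t, ys, 1), t)
        f2_xy.append(np.mean(rx * ry))
        f2_xx.append(np.mean(rx * rx))
        f2_yy.append(np.mean(ry * ry))
    return np.mean(f2_xy) / np.sqrt(np.mean(f2_xx) * np.mean(f2_yy))

rng = np.random.default_rng(0)
common = rng.normal(size=2000)
a = common + rng.normal(size=2000)
b = common + rng.normal(size=2000)
print(rho_dcca(a, b, n=50))   # positive, well below 1
```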

  16. Non parametric deprojection of NIKA SZ observations: pressure distribution in the Planck-discovered cluster PSZ1 G045.85+57.71

    CERN Document Server

    Ruppin, F; Comis, B; Ade, P; André, P; Arnaud, M; Beelen, A; Benoît, A; Bideaud, A; Billot, N; Bourrion, O; Calvo, M; Catalano, A; Coiffard, G; D'Addabbo, A; De Petris, M; Désert, F -X; Doyle, S; Goupy, J; Kramer, C; Leclercq, S; Macías-Pérez, J F; Mauskopf, P; Mayet, F; Monfardini, A; Pajot, F; Pascale, E; Perotto, L; Pisano, G; Pointecouteau, E; Ponthieu, N; Pratt, G W; Revéret, V; Ritacco, A; Rodriguez, L; Romero, C; Schuster, K; Sievers, A; Triqueneaux, S; Tucker, C; Zylka, R

    2016-01-01

    The determination of the thermodynamic properties of clusters of galaxies at intermediate and high redshift can bring new insights into the formation of large scale structures. It is essential for a robust calibration of the mass-observable scaling relations and their scatter, which are key ingredients for precise cosmology using cluster statistics. Here we illustrate an application of high-resolution $(< 20$ arcsec) thermal Sunyaev-Zel'dovich (tSZ) observations by probing the intracluster medium (ICM) of the Planck-discovered galaxy cluster PSZ1 G045.85+57.71 at redshift $z = 0.61$, using tSZ data obtained with the NIKA camera, a dual-band (150 and 260~GHz) instrument operated at the IRAM 30-meter telescope. We deproject jointly NIKA and Planck data to extract the electronic pressure distribution non-parametrically from the cluster core ($R \\sim 0.02\\, R_{500}$) to its outskirts ($R \\sim 3\\, R_{500}$), for the first time at intermediate redshift. The constraints on the resulting pressure profile allow us ...

  17. Statistical Tests of the PTHA Poisson Assumption for Submarine Landslides

    Science.gov (United States)

    Geist, E. L.; Chaytor, J. D.; Parsons, T.; Ten Brink, U. S.

    2012-12-01

    We demonstrate that a sequence of dated mass transport deposits (MTDs) can provide information to statistically test whether or not submarine landslides associated with these deposits conform to a Poisson model of occurrence. Probabilistic tsunami hazard analysis (PTHA) most often assumes Poissonian occurrence for all sources, with an exponential distribution of return times. Using dates that define the bounds of individual MTDs, we first describe likelihood and Monte Carlo methods of parameter estimation for a suite of candidate occurrence models (Poisson, lognormal, gamma, Brownian Passage Time). In addition to age-dating uncertainty, both methods incorporate uncertainty caused by the open time intervals: i.e., before the first and after the last event to the present. Accounting for these open intervals is critical when there are a small number of observed events. The optimal occurrence model is selected according to both the Akaike Information Criteria (AIC) and Akaike's Bayesian Information Criterion (ABIC). In addition, the likelihood ratio test can be performed on occurrence models from the same family: e.g., the gamma model relative to the exponential model of return time distribution. Parameter estimation, model selection, and hypothesis testing are performed on data from two IODP holes in the northern Gulf of Mexico that penetrated a total of 14 MTDs, some of which are correlated between the two holes. Each of these events has been assigned an age based on microfossil zonations and magnetostratigraphic datums. Results from these sites indicate that the Poisson assumption is likely valid. However, parameter estimation results using the likelihood method for one of the sites suggest that the events may have occurred quasi-periodically. Methods developed in this study provide tools with which one can determine both the rate of occurrence and the statistical validity of the Poisson assumption when submarine landslides are included in PTHA.
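    As a much-simplified illustration of the model-selection step (ignoring the age-dating uncertainty and the open intervals before the first and after the last event, which the study does account for), one can fit candidate return-time distributions to the inter-event times by maximum likelihood and compare AIC values. The MTD ages below are invented for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical MTD ages (ka); inter-event times are their successive differences.
ages = np.array([3.1, 7.8, 12.4, 15.0, 21.7, 24.9, 33.2, 38.6])
dt = np.diff(ages)

candidates = {
    "exponential (Poisson)": (stats.expon, dict(floc=0)),
    "gamma": (stats.gamma, dict(floc=0)),
    "lognormal": (stats.lognorm, dict(floc=0)),
}

for name, (dist, fit_kwargs) in candidates.items():
    params = dist.fit(dt, **fit_kwargs)
    loglik = np.sum(dist.logpdf(dt, *params))
    k = len(params) - 1            # location was fixed at zero
    aic = 2 * k - 2 * loglik
    print(f"{name:22s}  AIC = {aic:6.2f}")
```

    Because the exponential model is nested in the gamma model, a likelihood ratio test between the two can complement the AIC comparison, as the abstract notes.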

  18. On The Robustness of z=0-1 Galaxy Size Measurements Through Model and Non-Parametric Fits

    CERN Document Server

    Mosleh, Moein; Franx, Marijn

    2013-01-01

    We present the size-stellar mass relations of nearby (z=0.01-0.02) SDSS galaxies, for samples selected by color, morphology, Sersic index n, and specific star formation rate. Several commonly-employed size measurement techniques are used, including single Sersic fits, two-component Sersic models and a non-parametric method. Through simple simulations we show that the non-parametric and two-component Sersic methods provide the most robust effective radius measurements, while those based on single Sersic profiles are often overestimates, especially for massive red/early-type galaxies. Using our robust sizes, we show that for all sub-samples, the mass-size relations are shallow at low stellar masses and steepen above ~3-4 x 10^{10}\\Msun. The mass-size relations for galaxies classified as late-type, low-n, and star-forming are consistent with each other, while blue galaxies follow a somewhat steeper relation. The mass-size relations of early-type, high-n, red, and quiescent galaxies all agree with each other but ...

  19. Further Empirical Results on Parametric Versus Non-Parametric IRT Modeling of Likert-Type Personality Data.

    Science.gov (United States)

    Maydeu-Olivares, Albert

    2005-04-01

    Chernyshenko, Stark, Chan, Drasgow, and Williams (2001) investigated the fit of Samejima's logistic graded model and Levine's non-parametric MFS model to the scales of two personality questionnaires and found that the graded model did not fit well. We attribute the poor fit of the graded model to small amounts of multidimensionality present in their data. To verify this conjecture, we compare the fit of these models to the Social Problem Solving Inventory-Revised, whose scales were designed to be unidimensional. A calibration and a cross-validation sample of new observations were used. We also included the following parametric models in the comparison: Bock's nominal model, Masters' partial credit model, and Thissen and Steinberg's extension of the latter. All models were estimated using full information maximum likelihood. We also included in the comparison a normal ogive model version of Samejima's model estimated using limited information estimation. We found that for all scales Samejima's model outperformed all other parametric IRT models in both samples, regardless of the estimation method employed. The non-parametric model outperformed all parametric models in the calibration sample. However, the graded model outperformed MFS in the cross-validation sample in some of the scales. We advocate employing the graded model estimated using limited information methods in modeling Likert-type data, as these methods are more versatile than full information methods to capture the multidimensionality that is generally present in personality data.

  20. 'nparACT' package for R: A free software tool for the non-parametric analysis of actigraphy data.

    Science.gov (United States)

    Blume, Christine; Santhi, Nayantara; Schabus, Manuel

    2016-01-01

    For many studies, participants' sleep-wake patterns are monitored and recorded prior to, during and following an experimental or clinical intervention using actigraphy, i.e. the recording of data generated by movements. Often, these data are merely inspected visually without computation of descriptive parameters, in part due to the lack of user-friendly software. To address this deficit, we developed a package for R (R Core Team [6]) that allows the computation of several non-parametric measures from actigraphy data. Specifically, it computes the interdaily stability (IS), intradaily variability (IV) and relative amplitude (RA) of activity and gives the start times and average activity values of M10 (i.e. the ten hours with maximal activity) and L5 (i.e. the five hours with least activity). Two functions compute these 'classical' parameters and handle either single or multiple files. Two other functions additionally allow computing an L-value (i.e. the least activity value) for a user-defined time span, termed the 'Lflex' value. A plotting option is included in all functions. The package can be downloaded from the Comprehensive R Archive Network (CRAN). • The package 'nparACT' for R serves the non-parametric analysis of actigraphy data. • Computed parameters include interdaily stability (IS), intradaily variability (IV) and relative amplitude (RA) as well as start times and average activity during the 10 h with maximal and the 5 h with minimal activity (i.e. M10 and L5).
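    For readers who want the 'classical' measures spelled out, the sketch below computes interdaily stability and intradaily variability from an hourly-binned activity series using their standard definitions. nparACT itself is an R package; this Python version, its function name and the synthetic data are only illustrative, and relative amplitude, M10 and L5 are omitted.

```python
import numpy as np

def is_iv(hourly, period=24):
    """Interdaily stability (IS) and intradaily variability (IV).

    hourly : activity counts binned to one value per hour, with a length that
    is a multiple of `period`; standard non-parametric circadian measures."""
    x = np.asarray(hourly, dtype=float)
    n = x.size
    grand_mean = x.mean()
    # mean 24-h profile: average over days for each hour of the day
    profile = x.reshape(-1, period).mean(axis=0)
    ss_total = np.sum((x - grand_mean) ** 2)
    IS = (n * np.sum((profile - grand_mean) ** 2)) / (period * ss_total)
    IV = (n * np.sum(np.diff(x) ** 2)) / ((n - 1) * ss_total)
    return IS, IV

# Example: 7 days of synthetic activity with a clear day/night rhythm.
rng = np.random.default_rng(1)
day = np.concatenate([np.full(8, 5.0), np.full(16, 50.0)])   # low at night
data = np.tile(day, 7) + rng.normal(0, 5, 7 * 24)
print(is_iv(data))   # IS close to 1, IV small for a regular rhythm
```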

  1. Testing Result Statistics-Based Rapid Testing Method for Safety-Critical System

    Institute of Scientific and Technical Information of China (English)

    Zhi-Yao Deng; Nan Sang

    2008-01-01

    Safety-critical systems (SCS) have high dependability demands, which require plenty of resources to ensure that the system under test (SUT) satisfies the dependability requirement. In this paper, a new SCS rapid testing method is proposed to improve SCS adaptive dependability testing. The result of each test execution is saved in a calculation memory unit and evaluated as an algorithm model. The minimum number of scenario test cases for the next test execution is then calculated according to the SUT's promised confidence level. The feedback data are passed to a weight controller as the guideline for further testing. Finally, a comprehensive experimental study demonstrates that this adaptive testing method works in practice. This rapid testing method, an adaptive control based on testing result statistics, makes SCS dependability testing much more effective.

  2. Comparison of Statistical Methods for Detector Testing Programs

    Energy Technology Data Exchange (ETDEWEB)

    Rennie, John Alan [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Abhold, Mark [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-10-14

    A typical goal for any detector testing program is to ascertain not only the performance of the detector systems under test, but also the confidence that systems accepted using that testing program’s acceptance criteria will exceed a minimum acceptable performance (which is usually expressed as the minimum acceptable success probability, p). A similar problem often arises in statistics, where we would like to ascertain the fraction, p, of a population of items that possess a property that may take one of two possible values. Typically, the problem is approached by drawing a fixed sample of size n, with the number of items out of n that possess the desired property, x, being termed successes. The sample mean gives an estimate of the population mean p ≈ x/n, although usually it is desirable to accompany such an estimate with a statement concerning the range within which p may fall and the confidence associated with that range. Procedures for establishing such ranges and confidence limits are described in detail by Clopper, Brown, and Agresti for two-sided symmetric confidence intervals.
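    A minimal sketch of the exact two-sided (Clopper-Pearson) confidence interval for the success probability p discussed above, using Beta-distribution quantiles; the sample numbers are invented.

```python
from scipy.stats import beta

def clopper_pearson(x, n, confidence=0.95):
    """Exact two-sided confidence interval for a binomial proportion."""
    alpha = 1.0 - confidence
    lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lower, upper

# 48 detectors out of 50 passed: point estimate and 95% bounds on p.
x, n = 48, 50
print(x / n, clopper_pearson(x, n))
```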

  3. Basic statistical tools in research and data analysis

    Science.gov (United States)

    Ali, Zulfiqar; Bhaskar, S Bala

    2016-01-01

    Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

  4. Breakdown of statistical inference from some random experiments

    CERN Document Server

    Kupczynski, Marian

    2014-01-01

    Many experiments can be interpreted in terms of random processes operating according to some internal protocols. When experiments are costly or cannot be repeated as in some clinical trials one has data gathered in only one or in a few long runs of the experiment. In this paper we study data generated by computer experiments operating according to particular internal protocols. We show that the standard statistical analysis of a sample, containing 100 000 data points or more, may sometimes be highly misleading and statistical errors largely underestimated. Our results confirm in a dramatic way the dangers of standard asymptotic statistical inference based on data gathered in one, possibly long run of the experiment. We demonstrate that analyzing various subdivisions of samples by multiple chi-square tests and chi-square frequency graphs is very effective in detecting the anomalies. Therefore to assure correctness of the statistical inference the above mentioned chi-square tests and other non-parametric sample...

  5. Basic statistical tools in research and data analysis.

    Science.gov (United States)

    Ali, Zulfiqar; Bhaskar, S Bala

    2016-09-01

    Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

  6. Basic statistical tools in research and data analysis

    Directory of Open Access Journals (Sweden)

    Zulfiqar Ali

    2016-01-01

    Full Text Available Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

  7. A test statistic for the affected-sib-set method.

    Science.gov (United States)

    Lange, K

    1986-07-01

    This paper discusses generalizations of the affected-sib-pair method. First, the requirement that sib identity-by-descent relations be known unambiguously is relaxed by substituting sib identity-by-state relations. This permits affected sibs to be used even when their parents are unavailable for typing. In the limit of an infinite number of marker alleles each of infinitesimal population frequency, the identity-by-state relations coincide with the usual identity-by-descent relations. Second, a weighted pairs test statistic is proposed that covers affected sib sets of size greater than two. These generalizations make the affected-sib-pair method a more powerful technique for detecting departures from independent segregation of disease and marker phenotypes. A sample calculation suggests such a departure for tuberculoid leprosy and the HLA D locus.

  8. Evaluation of statistical tools used in short-term repeated dose administration toxicity studies with rodents.

    Science.gov (United States)

    Kobayashi, Katsumi; Pillai, K Sadasivan; Sakuratani, Yuki; Abe, Takemaru; Kamata, Eiichi; Hayashi, Makoto

    2008-02-01

    In order to identify the different statistical tools used to analyze the data obtained from twenty-eight-day repeated dose oral toxicity studies with rodents, and the impact of these statistical tools on the interpretation of the data obtained from such studies, 122 study reports of twenty-eight-day repeated dose oral toxicity studies conducted in rats were examined. It was found that both complex and simple decision-tree routes were followed for the analysis of the quantitative data. The tools used include Scheffé's test and non-parametric versions of Dunnett's and Scheffé's tests, which have very low power. Few studies used the non-parametric Dunnett-type test and Mann-Whitney's U test. Though the chi-square and Fisher's tests are widely used for the analysis of qualitative data, their sensitivity to detect a treatment-related effect is questionable. Mann-Whitney's U test has better sensitivity for analyzing qualitative data than the chi-square and Fisher's tests. We propose Dunnett's test for the analysis of quantitative data obtained from twenty-eight-day repeated dose oral toxicity tests and, for qualitative data, Mann-Whitney's U test. For both tests, a one-sided test with p=0.05 may be applied.
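    The proposal can be illustrated with SciPy (assuming a recent release, since scipy.stats.dunnett was added in SciPy 1.11); the dose-group data below are invented, and the one-sided Mann-Whitney comparison mirrors the suggestion for graded findings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(5.0, 1.0, 10)          # e.g. organ weight, control group
low, mid, high = (rng.normal(m, 1.0, 10) for m in (5.1, 5.5, 6.2))

# Quantitative data: Dunnett's test, each treated group versus the control
res = stats.dunnett(low, mid, high, control=control)
print("Dunnett p-values:", res.pvalue)

# Graded (qualitative) findings: one-sided Mann-Whitney U test
control_grade = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
high_grade = [1, 2, 1, 2, 0, 1, 2, 1, 1, 2]
u = stats.mannwhitneyu(high_grade, control_grade, alternative="greater")
print("Mann-Whitney U p-value:", u.pvalue)
```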

  9. SOCR Analyses: Implementation and Demonstration of a New Graphical Statistics Educational Toolkit

    Directory of Open Access Journals (Sweden)

    Annie Chu

    2009-04-01

    Full Text Available The web-based, Java-written SOCR (Statistical Online Computational Resource) tools have been utilized in many undergraduate and graduate level statistics courses for seven years now (Dinov 2006; Dinov et al. 2008b). It has been proven that these resources can successfully improve students' learning (Dinov et al. 2008b). Being first published online in 2005, SOCR Analyses is a somewhat new component and it concentrates on data modeling for both parametric and non-parametric data analyses with graphical model diagnostics. One of the main purposes of SOCR Analyses is to facilitate statistical learning for high school and undergraduate students. As we have already implemented SOCR Distributions and Experiments, SOCR Analyses and Charts fulfill the rest of a standard statistics curriculum. Currently, there are four core components of SOCR Analyses. Linear models included in SOCR Analyses are simple linear regression, multiple linear regression, one-way and two-way ANOVA. Tests for sample comparisons include the t-test in the parametric category. Some examples of SOCR Analyses in the non-parametric category are the Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, Kolmogorov-Smirnov test and Fligner-Killeen test. Hypothesis testing models include the contingency table, Friedman's test and Fisher's exact test. The last component of Analyses is a utility for computing sample sizes for the normal distribution. In this article, we present the design framework, computational implementation and the utilization of SOCR Analyses.

  10. A statistical design for testing apomictic diversification through linkage analysis.

    Science.gov (United States)

    Zeng, Yanru; Hou, Wei; Song, Shuang; Feng, Sisi; Shen, Lin; Xia, Guohua; Wu, Rongling

    2014-03-01

    The capacity of apomixis to generate maternal clones through seed reproduction has made it a useful characteristic for the fixation of heterosis in plant breeding. It has been observed that apomixis displays pronounced intra- and interspecific diversification, but the genetic mechanisms underlying this diversification remains elusive, obstructing the exploitation of this phenomenon in practical breeding programs. By capitalizing on molecular information in mapping populations, we describe and assess a statistical design that deploys linkage analysis to estimate and test the pattern and extent of apomictic differences at various levels from genotypes to species. The design is based on two reciprocal crosses between two individuals each chosen from a hermaphrodite or monoecious species. A multinomial distribution likelihood is constructed by combining marker information from two crosses. The EM algorithm is implemented to estimate the rate of apomixis and test its difference between two plant populations or species as the parents. The design is validated by computer simulation. A real data analysis of two reciprocal crosses between hickory (Carya cathayensis) and pecan (C. illinoensis) demonstrates the utilization and usefulness of the design in practice. The design provides a tool to address fundamental and applied questions related to the evolution and breeding of apomixis.

  11. Development and testing of improved statistical wind power forecasting methods.

    Energy Technology Data Exchange (ETDEWEB)

    Mendes, J.; Bessa, R.J.; Keko, H.; Sumaili, J.; Miranda, V.; Ferreira, C.; Gama, J.; Botterud, A.; Zhou, Z.; Wang, J. (Decision and Information Sciences); (INESC Porto)

    2011-12-06

    Wind power forecasting (WPF) provides important inputs to power system operators and electricity market participants. It is therefore not surprising that WPF has attracted increasing interest within the electric power industry. In this report, we document our research on improving statistical WPF algorithms for point, uncertainty, and ramp forecasting. Below, we provide a brief introduction to the research presented in the following chapters. For a detailed overview of the state-of-the-art in wind power forecasting, we refer to [1]. Our related work on the application of WPF in operational decisions is documented in [2]. Point forecasts of wind power are highly dependent on the training criteria used in the statistical algorithms that are used to convert weather forecasts and observational data to a power forecast. In Chapter 2, we explore the application of information theoretic learning (ITL) as opposed to the classical minimum square error (MSE) criterion for point forecasting. In contrast to the MSE criterion, ITL criteria do not assume a Gaussian distribution of the forecasting errors. We investigate to what extent ITL criteria yield better results. In addition, we analyze time-adaptive training algorithms and how they enable WPF algorithms to cope with non-stationary data and, thus, to adapt to new situations without requiring additional offline training of the model. We test the new point forecasting algorithms on two wind farms located in the U.S. Midwest. Although there have been advancements in deterministic WPF, a single-valued forecast cannot provide information on the dispersion of observations around the predicted value. We argue that it is essential to generate, together with (or as an alternative to) point forecasts, a representation of the wind power uncertainty. Wind power uncertainty representation can take the form of probabilistic forecasts (e.g., probability density function, quantiles), risk indices (e.g., prediction risk index) or scenarios

  12. Testing Result Statistics-Based Rapid Testing Method for Safety-Critical System

    Institute of Scientific and Technical Information of China (English)

    Zhi-Yao Deng; Nan Sang

    2008-01-01

    Safety-critical systems (SCS) have high dependability demands, which require plenty of resources to ensure that the system under test (SUT) satisfies the dependability requirement. In this paper, a new SCS rapid testing method is proposed to improve SCS adaptive dependability testing. The result of each test execution is saved in a calculation memory unit and evaluated as an algorithm model. The minimum number of scenario test cases for the next test execution is then calculated according to the SUT's promised confidence level. The feedback data are passed to a weight controller as the guideline for further testing. Finally, a comprehensive experimental study demonstrates that this adaptive testing method works in practice. This rapid testing method, an adaptive control based on testing result statistics, makes SCS dependability testing much more effective.

  13. Determination of drug absorption rate in time-variant disposition by direct deconvolution using beta clearance correction and end-constrained non-parametric regression.

    Science.gov (United States)

    Neelakantan, S; Veng-Pedersen, P

    2005-11-01

    A novel numerical deconvolution method is presented that enables the estimation of drug absorption rates under time-variant disposition conditions. The method involves two components. (1) A disposition decomposition-recomposition (DDR) enabling exact changes in the unit impulse response (UIR) to be constructed based on centrally based clearance changes iteratively determined. (2) A non-parametric, end-constrained cubic spline (ECS) input response function estimated by cross-validation. The proposed DDR-ECS method compensates for disposition changes between the test and the reference administrations by using a "beta" clearance correction based on DDR analysis. The representation of the input response by the ECS method takes into consideration the complex absorption process and also ensures physiologically realistic approximations of the response. The stability of the new method to noisy data was evaluated by comprehensive simulations that considered different UIRs, various input functions, clearance changes and a novel scaling of the input function that includes the "flip-flop" absorption phenomena. The simulated input response was also analysed by two other methods and all three methods were compared for their relative performances. The DDR-ECS method provides better estimation of the input profile under significant clearance changes but tends to overestimate the input when there were only small changes in the clearance.

  14. Non-parametric study of the evolution of the cosmological equation of state with SNeIa, BAO and high redshift GRBs

    CERN Document Server

    Postnikov, Sergey; Hernandez, Xavier; Capozziello, Salvatore

    2014-01-01

    We study the dark energy equation of state as a function of redshift in a non-parametric way, without imposing any {\\it a priori} $w(z)$ (ratio of pressure over energy density) functional form. As a check of the method, we test our scheme through the use of synthetic data sets produced from different input cosmological models which have the same relative errors and redshift distribution as the real data. Using the luminosity-time $L_{X}-T_{a}$ correlation for GRB X-ray afterglows (the Dainotti et al. correlation), we are able to utilize GRB sample from the {\\it Swift} satellite as probes of the expansion history of the Universe out to $z \\approx 10$. Within the assumption of a flat FLRW universe and combining SNeIa data with BAO constraints, the resulting maximum likelihood solutions are close to a constant $w=-1$. If one imposes the restriction of a constant $w$, we obtain $w=-0.99 \\pm 0.06$ (consistent with a cosmological constant) with the present day Hubble constant as $H_{0}=70.0 \\pm 0.6$ ${\\rm km} \\, {\\...

  15. Material analysis on engineering statistics

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Seung Hun

    2008-03-15

    This book covers material analysis in engineering statistics using Minitab, including technical statistics and the seven QC tools, probability distributions, estimation and testing, regression analysis, time series analysis, control charts, process capability analysis, measurement system analysis, sampling inspection, design of experiments, response surface analysis, compound experiments, the Taguchi method, and non-parametric statistics. It is suitable for use by universities and companies because it presents theory first and then analysis with Minitab for Six Sigma BB and MBB.

  16. The analysis of variance in anaesthetic research: statistics, biography and history.

    Science.gov (United States)

    Pandit, J J

    2010-12-01

    Multiple t-tests (or their non-parametric equivalents) are often used erroneously to compare the means of three or more groups in anaesthetic research. Methods for correcting the p value regarded as significant can be applied to take account of multiple testing, but these are somewhat arbitrary and do not avoid several unwieldy calculations. The appropriate method for most such comparisons is the 'analysis of variance' that not only economises on the number of statistical procedures, but also indicates if underlying factors or sub-groups have contributed to any significant results. This article outlines the history, rationale and method of this analysis.
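    A minimal illustration of the point with invented recovery-time data for three groups: a single one-way analysis of variance (and, for skewed data, its non-parametric counterpart) replaces the three pairwise t-tests and their multiple-testing corrections.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(60, 8, 12)   # e.g. time to recovery, minutes
group_b = rng.normal(63, 8, 12)
group_c = rng.normal(72, 8, 12)

# One analysis of variance instead of three pairwise t-tests
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Non-parametric counterpart for skewed data: Kruskal-Wallis test
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.2f}, p = {p_kw:.4f}")
```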

  17. Comparing non-parametric methods for ungrouping coarsely aggregated age-specific distributions

    DEFF Research Database (Denmark)

    Rizzi, Silvia; Thinggaard, Mikael; Vaupel, James W.

    2016-01-01

    Demographers often have access to vital statistics that are less than ideal for the purpose of their research. In many instances demographic data are reported in coarse histograms, where the values given are only the summation of true latent values, thereby making detailed analysis troublesome. O...

  18. Singular Value Decomposition, Hessian Errors, and Linear Algebra of Non-parametric Extraction of Partons from DIS

    CERN Document Server

    Goshtasbpour, Mehrdad

    2014-01-01

    By singular value decomposition (SVD) of a numerically singular Hessian matrix and a numerically singular system of linear equations for the experimental data (accumulated in the respective ${\\chi ^2}$ function) and constraints, least square solutions and their propagated errors for the non-parametric extraction of Partons from $F_2$ are obtained. SVD and its physical application is phenomenologically described in the two cases. Among the subjects covered are: identification and properties of the boundary between the two subsets of ordered eigenvalues corresponding to range and null space, and the eigenvalue structure of the null space of the singular matrix, including a second boundary separating the smallest eigenvalues of essentially no information, in a particular case. The eigenvector-eigenvalue structure of "redundancy and smallness" of the errors of two pdf sets, in our simplified Hessian model, is described by a secondary manifestation of deeper null space, in the context of SVD.

  19. A non-parametric conditional bivariate reference region with an application to height/weight measurements on normal girls

    DEFF Research Database (Denmark)

    Petersen, Jørgen Holm

    2009-01-01

    A conceptually simple two-dimensional conditional reference curve is described. The curve gives a decision basis for determining whether a bivariate response from an individual is "normal" or "abnormal" when taking into account that a third (conditioning) variable may influence the bivariate response. The reference curve is not only characterized analytically but also by geometric properties that are easily communicated to medical doctors - the users of such curves. The reference curve estimator is completely non-parametric, so no distributional assumptions are needed about the two-dimensional response. An example that will serve to motivate and illustrate the reference is the study of the height/weight distribution of 7-8-year-old Danish school girls born in 1930, 1950, or 1970.

  20. Non-parametric frontier approach to modelling the relationships among population, GDP, energy consumption and CO{sub 2} emissions

    Energy Technology Data Exchange (ETDEWEB)

    Lozano, Sebastian; Gutierrez, Ester [University of Seville, E.S.I., Department of Industrial Management, Camino de los Descubrimientos, s/n, 41092 Sevilla (Spain)

    2008-07-15

    In this paper, a non-parametric approach based in Data Envelopment Analysis (DEA) is proposed as an alternative to the Kaya identity (a.k.a ImPACT). This Frontier Method identifies and extends existing best practices. Population and GDP are considered as input and output, respectively. Both primary energy consumption and Greenhouse Gas (GHG) emissions are considered as undesirable outputs. Several Linear Programming models are formulated with different aims, namely: (a) determine efficiency levels, (b) estimate maximum GDP compatible with given levels of population, energy intensity and carbonization intensity, and (c) estimate the minimum level of GHG emissions compatible with given levels of population, GDP, energy intensity or carbonization index. The United States of America case is used as illustration of the proposed approach. (author)
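    To give a flavour of the linear programs involved, the sketch below solves a deliberately simplified output-oriented, constant-returns-to-scale DEA model with population as the single input and GDP as the single desirable output; the undesirable outputs (energy consumption, GHG emissions) handled in the paper are omitted, and the unit data are invented.

```python
import numpy as np
from scipy.optimize import linprog

# Invented data: population (input) and GDP (desirable output) for 5 units
pop = np.array([50.0, 80.0, 120.0, 30.0, 200.0])
gdp = np.array([2.0, 2.5, 4.0, 1.5, 5.0])

def output_efficiency(o):
    """Output-oriented CRS DEA score phi for unit o (phi = 1 means efficient,
    phi > 1 means output could be expanded with no more input)."""
    n = len(pop)
    c = np.zeros(n + 1)
    c[0] = -1.0                               # maximize phi
    A_ub = np.zeros((2, n + 1))
    A_ub[0, 1:] = pop                         # sum_j lambda_j * pop_j <= pop_o
    A_ub[1, 0] = gdp[o]                       # phi*gdp_o - sum_j lambda_j*gdp_j <= 0
    A_ub[1, 1:] = -gdp
    b_ub = np.array([pop[o], 0.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.x[0]

for o in range(len(pop)):
    print(f"unit {o}: phi = {output_efficiency(o):.3f}")
```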

  1. Adaptive ILC algorithms of nonlinear continuous systems with non-parametric uncertainties for non-repetitive trajectory tracking

    Science.gov (United States)

    Li, Xiao-Dong; Lv, Mang-Mang; Ho, John K. L.

    2016-07-01

    In this article, two adaptive iterative learning control (ILC) algorithms are presented for nonlinear continuous systems with non-parametric uncertainties. Unlike general ILC techniques, the proposed adaptive ILC algorithms allow that both the initial error at each iteration and the reference trajectory are iteration-varying in the ILC process, and can achieve non-repetitive trajectory tracking beyond a small initial time interval. Compared to the neural network or fuzzy system-based adaptive ILC schemes and the classical ILC methods, in which the number of iterative variables is generally larger than or equal to the number of control inputs, the first adaptive ILC algorithm proposed in this paper uses just two iterative variables, while the second even uses a single iterative variable provided that some bound information on system dynamics is known. As a result, the memory space in real-time ILC implementations is greatly reduced.

  2. Detrending the long-term stellar activity and the systematics of the Kepler data with a non-parametric approach

    CERN Document Server

    Danielski, C; Tinetti, G

    2013-01-01

    The NASA Kepler mission is delivering groundbreaking results, with an increasing number of Earth-sized and moon-sized objects being discovered. A high photometric precision can be reached only through a thorough removal of the stellar activity and the instrumental systematics. We have explored here the possibility of using non-parametric methods to analyse the Simple Aperture Photometry data observed by the Kepler mission. We focused on a sample of stellar light curves with different effective temperatures and flux modulations, and we found that Gaussian Process-based techniques can very effectively correct the instrumental systematics along with the long-term stellar activity. Our method can disentangle astrophysical features (events), such as planetary transits, flares or general sudden variations in the intensity, from the star signal and it is very efficient as it requires only a few training iterations of the Gaussian Process model. The results obtained show the potential of our method to isolate the ma...
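    A compact sketch of the idea with scikit-learn's Gaussian process regression: a smooth kernel plus white noise captures the slow modulation, and subtracting the GP prediction leaves short events such as transits in place. The kernel, its bounds and the synthetic light curve are illustrative assumptions, not the configuration used by the authors.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic light curve: slow modulation + noise + a short transit dip
rng = np.random.default_rng(3)
t = np.linspace(0.0, 30.0, 600)                       # days
flux = 1.0 + 0.01 * np.sin(2 * np.pi * t / 12.0) + rng.normal(0, 1e-3, t.size)
flux[(t > 14.9) & (t < 15.1)] -= 0.005                # a small transit

# A smooth kernel (length scale kept above 2 days) plus white noise
kernel = 1.0 * RBF(length_scale=5.0, length_scale_bounds=(2.0, 20.0)) \
    + WhiteKernel(noise_level=1e-6, noise_level_bounds=(1e-8, 1e-4))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t.reshape(-1, 1), flux)
trend = gp.predict(t.reshape(-1, 1))

detrended = flux - trend + 1.0   # modulation removed, transit largely preserved
print(detrended[(t > 14.9) & (t < 15.1)].mean())
```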

  3. Microprocessors as an Adjunct to Statistics Instruction.

    Science.gov (United States)

    Miller, William G.

    Examinations of costs and acquisition of facilities indicate that an Altair 8800A microcomputer with a program library of parametric, non-parametric, mathematical, and teaching programs can be used effectively for teaching college-level statistics. Statistical packages presently in use require extensive computing knowledge beyond the students' and…

  4. New Statistical PDFs: Predictions and Tests up to LHC Energies

    CERN Document Server

    Soffer, Jacques

    2016-01-01

    The quantum statistical parton distributions approach proposed more than one decade ago is revisited by considering a larger set of recent and accurate Deep Inelastic Scattering experimental results. It enables us to improve the description of the data by means of a new determination of the parton distributions. This global next-to-leading order QCD analysis leads to a good description of several structure functions, involving unpolarized parton distributions and helicity distributions, in a broad range of $x$ and $Q^2$ and in terms of a rather small number of free parameters. There are several challenging issues, in particular the behavior of $\\bar d(x) / \\bar u(x)$ at large $x$, a possible large positive gluon helicity distribution, etc.. The predictions of this theoretical approach will be tested for single-jet production and charge asymmetry in $W^{\\pm}$ production in $\\bar p p$ and $p p$ collisions up to LHC energies, using recent data and also for forthcoming experimental results.

  5. Statistical tools for weld defect evaluation in radiographic testing

    Energy Technology Data Exchange (ETDEWEB)

    Nacereddine, N.; Tridi, M. [LTSI, Centre de Recherche en Soudage et Controle, Alger (Algeria); Hamami, L. [Ecole National Polytechnique, Alger (Algeria). Dept. Electronique; Ziou, D. [Sherbrooke Univ., Quebec (Canada). DMI, Faculte des Sciences

    2006-07-01

    Reliable detection of defects in welded joints is one of the most important tasks in non-destructive testing by radiography, since the human factor still has a decisive influence on the evaluation of defects on the film. An incorrect classification may reject a piece in good condition or approve a piece with discontinuities exceeding the limit established by the applicable standards. Progress in computer science and artificial intelligence techniques has allowed welded joint quality interpretation to be carried out using pattern recognition tools, making weld inspection more reliable, reproducible and faster. In this work, we develop and implement algorithms based on statistical approaches for the segmentation and classification of weld defects. Because of the complex nature of the images considered, and so that the extracted defect area represents the real defect as accurately as possible and the detected defect corresponds as closely as possible to its real class, the choice of algorithms must be very judicious. To achieve this, a comparative study of the various segmentation and classification methods was performed to demonstrate their relative advantages and to identify the most effective combinations. (orig.)

  6. 非参数项目反应理论回顾与展望%The Retrospect and Prospect of Non-parametric Item Response Theory

    Institute of Scientific and Technical Information of China (English)

    陈婧; 康春花; 钟晓玲

    2013-01-01

    Compared with parametric item response theory, non-parametric item response theory provides a theoretical framework that better matches practical testing situations. Current research on non-parametric item response theory focuses on parameter estimation methods and their comparison, and on the verification of data-model fit. Its applied research concentrates on scale revision, the analysis of personality data and differential item functioning. Non-parametric cognitive diagnostic theory, which developed on the basis of cognitive diagnostic theory, further highlights the advantages of the non-parametric approach in applications. To give full play to the advantages of non-parametric methods in practice, future studies should place more emphasis on the practical application of non-parametric item response theory, and research on non-parametric cognitive diagnosis also deserves attention.

  7. Statistical tests for taxonomic distinctiveness from observations of monophyly.

    Science.gov (United States)

    Rosenberg, Noah A

    2007-02-01

    The observation of monophyly for a specified set of genealogical lineages is often used to place the lineages into a distinctive taxonomic entity. However, it is sometimes possible that monophyly of the lineages can occur by chance as an outcome of the random branching of lineages within a single taxon. Thus, especially for small samples, an observation of monophyly for a set of lineages--even if strongly supported statistically--does not necessarily indicate that the lineages are from a distinctive group. Here I develop a test of the null hypothesis that monophyly is a chance outcome of random branching. I also compute the sample size required so that the probability of chance occurrence of monophyly of a specified set of lineages lies below a prescribed tolerance. Under the null model of random branching, the probability that monophyly of the lineages in an index group occurs by chance is substantial if the sample is highly asymmetric, that is, if only a few of the sampled lineages are from the index group, or if only a few lineages are external to the group. If sample sizes are similar inside and outside the group of interest, however, chance occurrence of monophyly can be rejected at stringent significance levels (P < 10(-5)) even for quite small samples (approximately 20 total lineages). For a fixed total sample size, rejection of the null hypothesis of random branching in a single taxon occurs at the most stringent level if samples of nearly equal size inside and outside the index group--with a slightly greater size within the index group--are used. Similar results apply, with smaller sample sizes needed, when reciprocal monophyly of two groups, rather than monophyly of a single group, is of interest. The results suggest minimal sample sizes required for inferences to be made about taxonomic distinctiveness from observations of monophyly.

  8. Statistics

    Science.gov (United States)

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  9. Non-parametric deprojection of NIKA SZ observations: Pressure distribution in the Planck-discovered cluster PSZ1 G045.85+57.71

    Science.gov (United States)

    Ruppin, F.; Adam, R.; Comis, B.; Ade, P.; André, P.; Arnaud, M.; Beelen, A.; Benoît, A.; Bideaud, A.; Billot, N.; Bourrion, O.; Calvo, M.; Catalano, A.; Coiffard, G.; D'Addabbo, A.; De Petris, M.; Désert, F.-X.; Doyle, S.; Goupy, J.; Kramer, C.; Leclercq, S.; Macías-Pérez, J. F.; Mauskopf, P.; Mayet, F.; Monfardini, A.; Pajot, F.; Pascale, E.; Perotto, L.; Pisano, G.; Pointecouteau, E.; Ponthieu, N.; Pratt, G. W.; Revéret, V.; Ritacco, A.; Rodriguez, L.; Romero, C.; Schuster, K.; Sievers, A.; Triqueneaux, S.; Tucker, C.; Zylka, R.

    2017-01-01

    The determination of the thermodynamic properties of clusters of galaxies at intermediate and high redshift can bring new insights into the formation of large-scale structures. It is essential for a robust calibration of the mass-observable scaling relations and their scatter, which are key ingredients for precise cosmology using cluster statistics. Here we illustrate an application of high-resolution (<20 arcsec) thermal Sunyaev-Zel'dovich (tSZ) observations by probing the intracluster medium of the Planck-discovered galaxy cluster PSZ1 G045.85+57.71 at redshift z = 0.61, using tSZ data obtained with the NIKA camera. We deproject jointly the NIKA and Planck data to extract the electronic pressure distribution non-parametrically from the cluster core (R ≈ 0.02 R500) to its outskirts (R ≈ 3 R500), for the first time at intermediate redshift. The constraints on the resulting pressure profile allow us to reduce the relative uncertainty on the integrated Compton parameter by a factor of two compared to the Planck value. Combining the tSZ data and the deprojected electronic density profile from XMM-Newton allows us to undertake a hydrostatic mass analysis, for which we study the impact of a spherical model assumption on the total mass estimate. We also investigate the radial temperature and entropy distributions. These data indicate that PSZ1 G045.85+57.71 is a massive (M500 ≈ 5.5 × 10^14 M⊙) cool-core cluster. This work is part of a pilot study aiming at optimizing the treatment of the NIKA2 tSZ large program dedicated to the follow-up of SZ-discovered clusters at intermediate and high redshifts. This study illustrates the potential of NIKA2 to put constraints on the thermodynamic properties and tSZ-scaling relations of these clusters, and demonstrates the excellent synergy between tSZ and X-ray observations of similar angular resolution.

  10. STATISTICAL ANALYSIS OF SOME EXPERIMENTAL FATIGUE TESTS RESULTS

    OpenAIRE

    Adrian Stere PARIS; Gheorghe AMZA; Claudiu BABIŞ; Dan Niţoi

    2012-01-01

    The paper details the results of processing the fatigue data experiments to find the regression function. Application software for statistical processing like ANOVA and regression calculi are properly utilized, with emphasis on popular software like MSExcel and CurveExpert

  11. TESTS OF ELLIPTICAL SYMMETRY AND THE ASYMPTOTIC TAIL BEHAVIOR OF THE STATISTICS

    Institute of Scientific and Technical Information of China (English)

    Jing Ping; Zhu Lixing

    1999-01-01

    In this paper, some test statistics of Kolmogorov type and Cramér-von Mises type, based on the projection pursuit technique, are proposed for testing the sphericity problem of a high-dimensional distribution. The limiting distributions of the test statistics are derived under the null hypothesis. The asymptotic properties of the bootstrap approximation are investigated and the tail behaviors of the statistics are studied.

  12. Wind speed forecasting at different time scales: a non parametric approach

    CERN Document Server

    D'Amico, Guglielmo; Prattico, Flavio

    2013-01-01

    The prediction of wind speed is one of the most important aspects when dealing with renewable energy. In this paper we present a new nonparametric model, based on semi-Markov chains, to predict wind speed. In particular, we use an indexed semi-Markov model, which accurately reproduces the statistical behavior of wind speed, to forecast wind speed one step ahead on different time scales and over very long time horizons while maintaining the goodness of prediction. In order to check the main features of the model we report, as an indicator of goodness, the root mean square error between real and predicted data, and we compare our forecasting results with those of a persistence model.

  13. Non-parametric probabilistic forecasts of wind power: required properties and evaluation

    DEFF Research Database (Denmark)

    Pinson, Pierre; Nielsen, Henrik Aalborg; Møller, Jan Kloppenborg;

    2007-01-01

    Predictions of wind power production for horizons up to 48-72 hour ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the conditional expectation of future generation for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point...

  14. Non-parametric classification of esophagus motility by means of neural networks

    DEFF Research Database (Denmark)

    Thøgersen, C; Rasmussen, C; Rutz, K

    1997-01-01

    The aim of the present work has been to test the ability of neural networks to identify abnormal contraction patterns in patients with non-obstructive dysphagia (NOBD). Nineteen volunteers and 22 patients with NOBD underwent simultaneous recordings of four pressures in the esophagus for at least 23 hours...

  15. The binned bispectrum estimator: template-based and non-parametric CMB non-Gaussianity searches

    CERN Document Server

    Bucher, Martin; van Tent, Bartjan

    2015-01-01

    We describe the details of the binned bispectrum estimator as used for the official 2013 and 2015 analyses of the temperature and polarization CMB maps from the ESA Planck satellite. The defining aspect of this estimator is the determination of a map bispectrum (3-point correlator) that has been binned in harmonic space. For a parametric determination of the non-Gaussianity in the map (the so-called fNL parameters), one takes the inner product of this binned bispectrum with theoretically motivated templates. However, as a complementary approach one can also smooth the binned bispectrum using a variable smoothing scale in order to suppress noise and make coherent features stand out above the noise. This allows one to look in a model-independent way for any statistically significant bispectral signal. This approach is useful for characterizing the bispectral shape of the galactic foreground emission, for which a theoretical prediction of the bispectral anisotropy is lacking, and for detecting a serendipitous pr...

  16. Non-parametric causality detection: An application to social media and financial data

    Science.gov (United States)

    Tsapeli, Fani; Musolesi, Mirco; Tino, Peter

    2017-10-01

    According to behavioral finance, stock market returns are influenced by emotional, social and psychological factors. Several recent works support this theory by providing evidence of correlation between stock market prices and collective sentiment indexes measured using social media data. However, a pure correlation analysis is not sufficient to prove that stock market returns are influenced by such emotional factors since both stock market prices and collective sentiment may be driven by a third unmeasured factor. Controlling for factors that could influence the study by applying multivariate regression models is challenging given the complexity of stock market data. False assumptions about the linearity or non-linearity of the model and inaccuracies on model specification may result in misleading conclusions. In this work, we propose a novel framework for causal inference that does not require any assumption about a particular parametric form of the model expressing statistical relationships among the variables of the study and can effectively control a large number of observed factors. We apply our method in order to estimate the causal impact that information posted in social media may have on stock market returns of four big companies. Our results indicate that social media data not only correlate with stock market returns but also influence them.

  17. NON-PARAMETRIC LEAST SQUARE ESTIMATION OF DISTRIBUTION FUNCTION

    Institute of Scientific and Technical Information of China (English)

    柴根象; 花虹; 尚汉冀

    2002-01-01

    Using the non-parametric least squares method, strongly consistent estimates of the distribution function and the failure function are established, where the distribution function F(x) after a logit transformation is assumed to be approximated by a polynomial. Simulations show that the estimates are highly satisfactory.
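
    As a minimal illustration of the approach described in this record, the following Python sketch fits a polynomial to the logit of an empirical distribution function by ordinary least squares; the sample, the polynomial degree and the plotting positions are assumptions for the example, not details taken from the paper.

        import numpy as np

        def fit_logit_cdf(sample, degree=3):
            # empirical CDF at interior plotting positions i/(n+1), avoiding 0 and 1
            x = np.sort(np.asarray(sample, dtype=float))
            n = len(x)
            p = np.arange(1, n + 1) / (n + 1)
            # least-squares polynomial fit to the logit-transformed CDF
            coef = np.polyfit(x, np.log(p / (1 - p)), degree)

            def F(t):
                z = np.polyval(coef, t)
                return 1.0 / (1.0 + np.exp(-z))   # back-transform to a CDF estimate

            return F  # the failure (survival) function is then 1 - F(t)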

  18. A Non Parametric Model for the Forecasting of the Venezuelan Oil Prices

    CERN Document Server

    Costanzo, Sabatino; Dehne, Wafaa; Prato, Hender

    2007-01-01

    A neural net model for forecasting the prices of Venezuelan crude oil is proposed. The inputs of the neural net are selected by reference to a dynamic system model of oil prices by Mashayekhi (1995, 2001) and its performance is evaluated using two criteria: the Excess Profitability test by Anatoliev and Gerko (2005) and the characteristics of the equity curve generated by a trading strategy based on the neural net predictions. ----- A non-parametric model to forecast Venezuelan oil prices is introduced, whose inputs are selected on the basis of a dynamic system that explains prices in terms of those inputs. The data collection and pre-processing and the running of the network are described, and its forecasts are evaluated by means of a statistical predictability test and the characteristics of the equity curve induced by the trading strategy generated by those forecasts.

  19. "Happiness in Life Domains: Evidence from Bangladesh Based on Parametric and Non-Parametric Models"

    OpenAIRE

    Minhaj Mahmud; Yasuyuki Sawada

    2015-01-01

    This paper applies a two layer approach to explain overall happiness both as a function of happiness in different life-domains and conventional explanatory variables such as income, education and health etc. Then it tests the happiness-income relationship in different happiness domains. Overall, the results suggest that income explains a large part of the variation in total happiness and that income is closely related with domain-specific happiness, even with non-economic domains. This is als...

  20. Statistical significance of trends in monthly heavy precipitation over the US

    KAUST Repository

    Mahajan, Salil

    2011-05-11

    Trends in monthly heavy precipitation, defined by a return period of one year, are assessed for statistical significance in observations and Global Climate Model (GCM) simulations over the contiguous United States using Monte Carlo non-parametric and parametric bootstrapping techniques. The results from the two Monte Carlo approaches are found to be similar to each other, and also to the traditional non-parametric Kendall's τ test, implying the robustness of the approach. Two different observational data-sets are employed to test for trends in monthly heavy precipitation and are found to exhibit consistent results. Both data-sets demonstrate upward trends, one of which is found to be statistically significant at the 95% confidence level. Upward trends similar to observations are observed in some climate model simulations of the twentieth century, but their statistical significance is marginal. For projections of the twenty-first century, a statistically significant upwards trend is observed in most of the climate models analyzed. The change in the simulated precipitation variance appears to be more important in the twenty-first century projections than changes in the mean precipitation. Stochastic fluctuations of the climate system are found to dominate monthly heavy precipitation, as some GCM simulations show a downwards trend even in the twenty-first century projections when the greenhouse gas forcings are strong. © 2011 Springer-Verlag.
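
    For readers who want to reproduce the flavour of such a resampling-based trend assessment, the sketch below implements one simple non-parametric Monte Carlo variant: Kendall's τ of the series against time is the test statistic, and the null distribution is generated by random permutation of the series. The series, the number of resamples and the two-sided rule are assumptions for illustration, not the study's exact procedure.

        import numpy as np
        from scipy import stats

        def trend_pvalue(series, n_perm=2000, seed=0):
            """Two-sided permutation test for a monotone trend using Kendall's tau."""
            rng = np.random.default_rng(seed)
            time = np.arange(len(series))
            tau_obs, _ = stats.kendalltau(time, series)
            exceed = 0
            for _ in range(n_perm):
                tau_perm, _ = stats.kendalltau(time, rng.permutation(series))
                if abs(tau_perm) >= abs(tau_obs):
                    exceed += 1
            return (exceed + 1) / (n_perm + 1)   # Monte Carlo p-value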

  1. Statistical analyses for NANOGrav 5-year timing residuals

    Science.gov (United States)

    Wang, Yan; Cordes, James M.; Jenet, Fredrick A.; Chatterjee, Shami; Demorest, Paul B.; Dolch, Timothy; Ellis, Justin A.; Lam, Michael T.; Madison, Dustin R.; McLaughlin, Maura A.; Perrodin, Delphine; Rankin, Joanna; Siemens, Xavier; Vallisneri, Michele

    2017-02-01

    In pulsar timing, timing residuals are the differences between the observed times of arrival and predictions from the timing model. A comprehensive timing model will produce featureless residuals, which are presumably composed of dominating noise and weak physical effects excluded from the timing model (e.g. gravitational waves). In order to apply optimal statistical methods for detecting weak gravitational wave signals, we need to know the statistical properties of noise components in the residuals. In this paper we utilize a variety of non-parametric statistical tests to analyze the whiteness and Gaussianity of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) 5-year timing data, which are obtained from Arecibo Observatory and Green Bank Telescope from 2005 to 2010. We find that most of the data are consistent with white noise; many data deviate from Gaussianity at different levels, nevertheless, removing outliers in some pulsars will mitigate the deviations.
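
    The kind of whiteness and Gaussianity checks described here can be sketched with standard tools; the snippet below applies a Kolmogorov-Smirnov test against a fitted normal and a Wald-Wolfowitz runs test on the residual signs. These particular tests and the residual vector are illustrative assumptions, not the NANOGrav analysis pipeline.

        import numpy as np
        from scipy import stats

        def residual_diagnostics(res):
            res = np.asarray(res, dtype=float)
            # Gaussianity: KS test against a normal fitted to the residuals
            # (the p-value is approximate because the parameters are estimated)
            ks_stat, ks_p = stats.kstest(res, "norm", args=(res.mean(), res.std(ddof=1)))
            # Whiteness: Wald-Wolfowitz runs test on signs about the median
            signs = res > np.median(res)
            n1, n2 = int(signs.sum()), int((~signs).sum())
            runs = 1 + int(np.sum(signs[1:] != signs[:-1]))
            mean_r = 1 + 2 * n1 * n2 / (n1 + n2)
            var_r = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
            z = (runs - mean_r) / np.sqrt(var_r)
            runs_p = 2 * stats.norm.sf(abs(z))
            return ks_p, runs_p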

  2. Statistical Analyses for NANOGrav 5-year Timing Residuals

    CERN Document Server

    Wang, Y; Jenet, F A; Chatterjee, S; Demorest, P B; Dolch, T; Ellis, J A; Lam, M T; Madison, D R; McLaughlin, M; Perrodin, D; Rankin, J; Siemens, X; Vallisneri, M

    2016-01-01

    In pulsar timing, timing residuals are the differences between the observed times of arrival and the predictions from the timing model. A comprehensive timing model will produce featureless residuals, which are presumably composed of dominating noise and weak physical effects excluded from the timing model (e.g. gravitational waves). In order to apply the optimal statistical methods for detecting the weak gravitational wave signals, we need to know the statistical properties of the noise components in the residuals. In this paper we utilize a variety of non-parametric statistical tests to analyze the whiteness and Gaussianity of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) 5-year timing data which are obtained from the Arecibo Observatory and the Green Bank Telescope from 2005 to 2010 (Demorest et al. 2013). We find that most of the data are consistent with white noise; Many data deviate from Gaussianity at different levels, nevertheless, removing outliers in some pulsars will m...

  3. STATISTICAL ANALYSIS OF SOME EXPERIMENTAL FATIGUE TESTS RESULTS

    Directory of Open Access Journals (Sweden)

    Adrian Stere PARIS

    2012-05-01

    Full Text Available The paper details the results of processing fatigue test data to find the regression function. Application software for statistical processing, such as ANOVA and regression calculations, is properly utilized, with emphasis on popular software like MS Excel and CurveExpert

  4. A non-parametric CUSUM intrusion detection method based on industrial control model

    Institute of Scientific and Technical Information of China (English)

    张云贵; 赵华; 王丽娜

    2012-01-01

    To deal with the increasingly serious information security problems of industrial control systems (ICS), this paper presents a non-parametric cumulative sum (CUSUM) intrusion detection method for industrial control networks. Using the input-output dependence of the ICS, a mathematical model of the ICS is established to predict the output of the system. Once the sensors of the control system are under attack, the actual output will change. At every moment, the difference between the output predicted by the industrial control model and the signal measured by the sensors is calculated, forming a time-based statistical sequence. The non-parametric CUSUM algorithm then detects intrusion attacks online and raises alarms. Simulated detection experiments show that the proposed method has good real-time performance and a low false alarm rate. By choosing appropriate parameters r and β of the non-parametric CUSUM algorithm, the intrusion detection method can accurately detect attacks before they cause substantial damage to the control system, and it also helps in monitoring misoperation.
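
    A one-sided non-parametric CUSUM detector of the general kind described above can be written in a few lines; the drift allowance and alarm threshold below (here called beta and tau) are tuning parameters chosen for illustration, and the reset-after-alarm behaviour is an assumption rather than a detail from the paper.

        def np_cusum(measured, predicted, beta=0.5, tau=5.0):
            """Accumulate model/sensor discrepancies and alarm when the sum exceeds tau."""
            s, alarms = 0.0, []
            for t, (y, y_hat) in enumerate(zip(measured, predicted)):
                s = max(0.0, s + abs(y - y_hat) - beta)   # drift-corrected accumulation
                if s > tau:
                    alarms.append(t)                      # record the alarm time
                    s = 0.0                               # assumed reset after an alarm
            return alarms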

  5. Temporal Expression of Peripheral Blood Leukocyte Biomarkers in a Macaca fascicularis Infection Model of Tuberculosis; Comparison with Human Datasets and Analysis with Parametric/Non-parametric Tools for Improved Diagnostic Biomarker Identification.

    Directory of Open Access Journals (Sweden)

    Sajid Javed

    Full Text Available A temporal study of gene expression in peripheral blood leukocytes (PBLs) from a Mycobacterium tuberculosis primary, pulmonary challenge model Macaca fascicularis has been conducted. PBL samples were taken prior to challenge and at one, two, four and six weeks post-challenge and labelled, purified RNAs hybridised to Operon Human Genome AROS V4.0 slides. Data analyses revealed a large number of differentially regulated gene entities, which exhibited temporal profiles of expression across the time course study. Further data refinements identified groups of key markers showing group-specific expression patterns, with a substantial reprogramming event evident at the four to six week interval. Selected statistically-significant gene entities from this study and other immune and apoptotic markers were validated using qPCR, which confirmed many of the results obtained using microarray hybridisation. These showed evidence of a step-change in gene expression from an 'early' FOS-associated response, to a 'late' predominantly type I interferon-driven response, with coincident reduction of expression of other markers. Loss of T-cell-associated marker expression was observed in responsive animals, with concordant elevation of markers which may be associated with a myeloid suppressor cell phenotype, e.g. CD163. The animals in the study were of different lineages and these Chinese and Mauritian cynomolgus macaque lines showed clear evidence of differing susceptibilities to tuberculosis challenge. We determined a number of key differences in response profiles between the groups, particularly in expression of T-cell and apoptotic markers, amongst others. These have provided interesting insights into innate susceptibility related to different host phenotypes. Using a combination of parametric and non-parametric artificial neural network analyses we have identified key genes and regulatory pathways which may be important in early and adaptive responses to TB. Using

  6. Non parametric denoising methods based on wavelets: Application to electron microscopy images in low exposure time

    Energy Technology Data Exchange (ETDEWEB)

    Soumia, Sid Ahmed, E-mail: samasoumia@hotmail.fr [Science and Technology Faculty, El Bachir El Ibrahimi University, BordjBouArreridj (Algeria); Messali, Zoubeida, E-mail: messalizoubeida@yahoo.fr [Laboratory of Electrical Engineering(LGE), University of M' sila (Algeria); Ouahabi, Abdeldjalil, E-mail: abdeldjalil.ouahabi@univ-tours.fr [Polytechnic School, University of Tours (EPU - PolytechTours), EPU - Energy and Electronics Department (France); Trepout, Sylvain, E-mail: sylvain.trepout@curie.fr, E-mail: cedric.messaoudi@curie.fr, E-mail: sergio.marco@curie.fr; Messaoudi, Cedric, E-mail: sylvain.trepout@curie.fr, E-mail: cedric.messaoudi@curie.fr, E-mail: sergio.marco@curie.fr; Marco, Sergio, E-mail: sylvain.trepout@curie.fr, E-mail: cedric.messaoudi@curie.fr, E-mail: sergio.marco@curie.fr [INSERMU759, University Campus Orsay, 91405 Orsay Cedex (France)

    2015-01-13

    The 3D reconstruction of Cryo-Transmission Electron Microscopy (Cryo-TEM) and Energy Filtering TEM (EFTEM) images is hampered by the noisy nature of these images, which makes their alignment difficult. The noise arises from the interaction between the frozen hydrated biological samples and the electron beam when the specimen is exposed to radiation for a long exposure time. This sensitivity to the electron beam has led specialists to acquire the specimen projection images at very low exposure time, which creates a new problem: an extremely low signal-to-noise ratio (SNR). This paper investigates the problem of denoising TEM images acquired at very low exposure time. Our main objective is to enhance the quality of TEM images in order to improve the alignment process, which will in turn improve the three-dimensional tomography reconstructions. We performed multiple tests on TEM images acquired at different exposure times (0.5 s, 0.2 s, 0.1 s and 1 s, i.e. with different values of SNR) and equipped with gold beads to help in the assessment step. We propose a structure to combine multiple noisy copies of the TEM images, based on four different denoising methods: soft and hard wavelet thresholding, the bilateral filter as a non-linear technique able to preserve edges, and a Bayesian approach in the wavelet domain in which context modeling is used to estimate the parameter for each coefficient. To ensure a high signal-to-noise ratio, we use the appropriate wavelet family at the appropriate level, choosing the "sym8" wavelet at level 3 as the most appropriate parameter. For the bilateral filtering, many tests were done in order to determine the proper filter parameters, represented by the size of the filter, the range parameter and the
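
    A minimal version of the soft/hard wavelet-thresholding step with a "sym8" decomposition at level 3 can be sketched with PyWavelets; the universal threshold and the noise estimate from the finest diagonal band are common defaults assumed here, not necessarily the parameters used by the authors.

        import numpy as np
        import pywt

        def wavelet_denoise(img, wavelet="sym8", level=3, mode="soft"):
            coeffs = pywt.wavedec2(img, wavelet, level=level)
            # noise estimate from the finest diagonal detail band, universal threshold
            sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
            thr = sigma * np.sqrt(2.0 * np.log(img.size))
            shrunk = [coeffs[0]] + [
                tuple(pywt.threshold(c, thr, mode=mode) for c in detail)
                for detail in coeffs[1:]
            ]
            return pywt.waverec2(shrunk, wavelet)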

  7. Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.

    Science.gov (United States)

    Breunig, Nancy A.

    Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…

  8. Alternative methods of marginal abatement cost estimation: Non- parametric distance functions

    Energy Technology Data Exchange (ETDEWEB)

    Boyd, G.; Molburg, J. [Argonne National Lab., IL (United States). Decision and Information Sciences Div.; Prince, R. [USDOE Office of Environmental Analysis, Washington, DC (United States)

    1996-12-31

    This project implements an economic methodology to measure the marginal abatement costs of pollution by measuring the lost revenue implied by an incremental reduction in pollution. It utilizes observed performance, or 'best practice', of facilities to infer the marginal abatement cost. The initial stage of the project is to use data from an earlier published study on productivity trends and pollution in electric utilities to test this approach and to provide insights on its implementation for the cost-benefit analysis studies needed by the Department of Energy. The basis for this marginal abatement cost estimation is a relationship between the outputs and the inputs of a firm or plant. Given a fixed set of input resources, including quasi-fixed inputs like plant and equipment and variable inputs like labor and fuel, a firm is able to produce a mix of outputs. This paper uses this theoretical view of the joint production process to implement a methodology and obtain empirical estimates of marginal abatement costs. These estimates are compared to engineering estimates.

  9. Modular autopilot design and development featuring Bayesian non-parametric adaptive control

    Science.gov (United States)

    Stockton, Jacob

    Over the last few decades, Unmanned Aircraft Systems, or UAS, have become a critical part of the defense of our nation and the growth of the aerospace sector. UAS have a great potential for the agricultural industry, first response, and ecological monitoring. However, the wide range of applications require many mission-specific vehicle platforms. These platforms must operate reliably in a range of environments, and in presence of significant uncertainties. The accepted practice for enabling autonomously flying UAS today relies on extensive manual tuning of the UAS autopilot parameters, or time consuming approximate modeling of the dynamics of the UAS. These methods may lead to overly conservative controllers or excessive development times. A comprehensive approach to the development of an adaptive, airframe-independent controller is presented. The control algorithm leverages a nonparametric, Bayesian approach to adaptation, and is used as a cornerstone for the development of a new modular autopilot. Promising simulation results are presented for the adaptive controller, as well as, flight test results for the modular autopilot.

  10. Reply to 'Statistical testing and distribution for lead chloride toxicity'

    Institute of Scientific and Technical Information of China (English)

    Maria de Lourdes Pereira; J. Ramalho-Santos

    2005-01-01

    Dear Sir, We are very grateful for the letter written by Dr Lange, and indeed apologize for the mistakes noted in the wording of our text regarding statistical analysis. This was due to changes carried out while revising the manuscript at the request of reviewers, whom we thank for pointing out several issues that were actually similar to those noted by Dr. Lange. Unfortunately, we were unable to

  11. Estimation from PET data of transient changes in dopamine concentration induced by alcohol: support for a non-parametric signal estimation method

    Energy Technology Data Exchange (ETDEWEB)

    Constantinescu, C C; Yoder, K K; Normandin, M D; Morris, E D [Department of Radiology, Indiana University School of Medicine, Indianapolis, IN (United States); Kareken, D A [Department of Neurology, Indiana University School of Medicine, Indianapolis, IN (United States); Bouman, C A [Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN (United States); O' Connor, S J [Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN (United States)], E-mail: emorris@iupui.edu

    2008-03-07

    We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest and activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves (F^DA(t)) and the change in binding potential (ΔBP). The veracity of the F^DA(t) curves recovered by nonparametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) ΔBP should decline with increasing DA peak time, (2) ΔBP should increase as the strength of the temporal correlation between F^DA(t) and the free raclopride (F^RAC(t)) curve increases, (3) ΔBP should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [11C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, nonparametric ntPET was applied to recover F^DA(t), and binding potential values were determined. Kendall rank-correlation analysis confirmed that the F^DA(t) data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of F^DA(t). Application of nonparametric ntPET may yield important insights into how alterations in timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.

  12. Non-parametric reconstruction of an inflaton potential from Einstein–Cartan–Sciama–Kibble gravity with particle production

    Directory of Open Access Journals (Sweden)

    Shantanu Desai

    2016-04-01

    Full Text Available The coupling between spin and torsion in the Einstein–Cartan–Sciama–Kibble theory of gravity generates gravitational repulsion at very high densities, which prevents a singularity in a black hole and may create there a new universe. We show that quantum particle production in such a universe near the last bounce, which represents the Big Bang, gives the dynamics that solves the horizon, flatness, and homogeneity problems in cosmology. For a particular range of the particle production coefficient, we obtain a nearly constant Hubble parameter that gives an exponential expansion of the universe with more than 60 e-folds, which lasts about ∼10−42 s. This scenario can thus explain cosmic inflation without requiring a fundamental scalar field and reheating. From the obtained time dependence of the scale factor, we follow the prescription of Ellis and Madsen to reconstruct in a non-parametric way a scalar field potential which gives the same dynamics of the early universe. This potential gives the slow-roll parameters of cosmic inflation, from which we calculate the tensor-to-scalar ratio, the scalar spectral index of density perturbations, and its running as functions of the production coefficient. We find that these quantities do not significantly depend on the scale factor at the Big Bounce. Our predictions for these quantities are consistent with the Planck 2015 observations.

  13. Non-parametric reconstruction of an inflaton potential from Einstein-Cartan-Sciama-Kibble gravity with particle production

    Science.gov (United States)

    Desai, Shantanu; Popławski, Nikodem J.

    2016-04-01

    The coupling between spin and torsion in the Einstein-Cartan-Sciama-Kibble theory of gravity generates gravitational repulsion at very high densities, which prevents a singularity in a black hole and may create there a new universe. We show that quantum particle production in such a universe near the last bounce, which represents the Big Bang, gives the dynamics that solves the horizon, flatness, and homogeneity problems in cosmology. For a particular range of the particle production coefficient, we obtain a nearly constant Hubble parameter that gives an exponential expansion of the universe with more than 60 e-folds, which lasts about ∼10-42 s. This scenario can thus explain cosmic inflation without requiring a fundamental scalar field and reheating. From the obtained time dependence of the scale factor, we follow the prescription of Ellis and Madsen to reconstruct in a non-parametric way a scalar field potential which gives the same dynamics of the early universe. This potential gives the slow-roll parameters of cosmic inflation, from which we calculate the tensor-to-scalar ratio, the scalar spectral index of density perturbations, and its running as functions of the production coefficient. We find that these quantities do not significantly depend on the scale factor at the Big Bounce. Our predictions for these quantities are consistent with the Planck 2015 observations.

  14. Non-parametric reconstruction of an inflaton potential from Einstein-Cartan-Sciama-Kibble gravity with particle production

    CERN Document Server

    Desai, Shantanu

    2015-01-01

    The coupling between spin and torsion in the Einstein-Cartan-Sciama-Kibble theory of gravity generates gravitational repulsion at very high densities, which prevents a singularity in a black hole and may create there a new universe. We show that quantum particle production in such a universe near the last bounce, which represents the Big Bang gives the dynamics that solves the horizon, flatness, and homogeneity problems in cosmology. For a particular range of the particle production coefficient, we obtain a nearly constant Hubble parameter that gives an exponential expansion of the universe with more than 60 $e$-folds, which lasts about $\\sim 10^{-42}$ s. This scenario can thus explain cosmic inflation without requiring a fundamental scalar field and reheating. From the obtained time dependence of the scale factor, we follow the prescription of Ellis and Madsen to reconstruct in a non-parametric way a scalar field potential which gives the same dynamics of the early universe. This potential gives the slow-rol...

  15. Inferring the three-dimensional distribution of dust in the Galaxy with a non-parametric method: Preparing for Gaia

    CERN Document Server

    Kh., S Rezaei; Hanson, R J; Fouesneau, M

    2016-01-01

    We present a non-parametric model for inferring the three-dimensional (3D) distribution of dust density in the Milky Way. Our approach uses the extinction measured towards stars at different locations in the Galaxy at approximately known distances. Each extinction measurement is proportional to the integrated dust density along its line-of-sight. Making simple assumptions about the spatial correlation of the dust density, we can infer the most probable 3D distribution of dust across the entire observed region, including along sight lines which were not observed. This is possible because our model employs a Gaussian Process to connect all lines-of-sight. We demonstrate the capability of our model to capture detailed dust density variations using mock data as well as simulated data from the Gaia Universe Model Snapshot. We then apply our method to a sample of giant stars observed by APOGEE and Kepler to construct a 3D dust map over a small region of the Galaxy. Due to our smoothness constraint and its isotropy,...

  16. Super-resolution non-parametric deconvolution in modelling the radial response function of a parallel plate ionization chamber.

    Science.gov (United States)

    Kulmala, A; Tenhunen, M

    2012-11-07

    The signal of the dosimetric detector is generally dependent on the shape and size of the sensitive volume of the detector. In order to optimize the performance of the detector and reliability of the output signal the effect of the detector size should be corrected or, at least, taken into account. The response of the detector can be modelled using the convolution theorem that connects the system input (actual dose), output (measured result) and the effect of the detector (response function) by a linear convolution operator. We have developed the super-resolution and non-parametric deconvolution method for determination of the cylinder symmetric ionization chamber radial response function. We have demonstrated that the presented deconvolution method is able to determine the radial response for the Roos parallel plate ionization chamber with a better than 0.5 mm correspondence with the physical measures of the chamber. In addition, the performance of the method was proved by the excellent agreement between the output factors of the stereotactic conical collimators (4-20 mm diameter) measured by the Roos chamber, where the detector size is larger than the measured field, and the reference detector (diode). The presented deconvolution method has a potential in providing reference data for more accurate physical models of the ionization chamber as well as for improving and enhancing the performance of the detectors in specific dosimetric problems.
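
    The convolution model mentioned above (measured signal equals the true dose profile convolved with the detector response) can be inverted, in the simplest case, by regularised least squares; the sketch below uses a Tikhonov penalty and an assumed kernel support, and is only a generic stand-in for the super-resolution non-parametric method of the paper.

        import numpy as np

        def estimate_response(measured, dose, support=21, lam=1e-3):
            """Recover a 1-D response kernel r such that (dose convolved with r) approximates measured."""
            measured = np.asarray(measured, dtype=float)
            n, k = len(measured), support
            A = np.zeros((n, k))                 # discrete convolution matrix
            for i in range(n):
                for j in range(k):
                    idx = i - (j - k // 2)       # centred kernel
                    if 0 <= idx < n:
                        A[i, j] = dose[idx]
            # Tikhonov-regularised normal equations
            return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ measured)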

  17. A sharper view of Pal 5's tails: Discovery of stream perturbations with a novel non-parametric technique

    CERN Document Server

    Erkal, Denis; Belokurov, Vasily

    2016-01-01

    Only in the Milky Way is it possible to conduct an experiment which uses stellar streams to detect low-mass dark matter subhaloes. In smooth and static host potentials, tidal tails of disrupting satellites appear highly symmetric. However, dark perturbers induce density fluctuations that destroy this symmetry. Motivated by the recent release of unprecedentedly deep and wide imaging data around the Pal 5 stellar stream, we develop a new probabilistic, adaptive and non-parametric technique which allows us to bring the cluster's tidal tails into clear focus. Strikingly, we uncover a stream whose density exhibits visible changes on a variety of angular scales. We detect significant bumps and dips, both narrow and broad: two peaks on either side of the progenitor, each only a fraction of a degree across, and two gaps, $\\sim2^{\\circ}$ and $\\sim9^{\\circ}$ wide, the latter accompanied by a gargantuan lump of debris. This largest density feature results in a pronounced inter-tail asymmetry which cannot be made consist...

  18. The merger fraction of active and inactive galaxies in the local Universe through an improved non-parametric classification

    CERN Document Server

    Cotini, Stefano; Caccianiga, Alessandro; Colpi, Monica; Della Ceca, Roberto; Mapelli, Michela; Severgnini, Paola; Segreto, Alberto; 10.1093/mnras/stt358

    2013-01-01

    We investigate the possible link between mergers and the enhanced activity of supermassive black holes (SMBHs) at the centre of galaxies, by comparing the merger fraction of a local sample (0.003 ≤ z < 0.03) of active galaxies - 59 active galactic nuclei (AGN) host galaxies selected from the all-sky Swift BAT (Burst Alert Telescope) survey - with an appropriate control sample (247 sources extracted from the Hyperleda catalogue) that has the same redshift distribution as the BAT sample. We detect the interacting systems in the two samples on the basis of non-parametric structural indexes of concentration (C), asymmetry (A), clumpiness (S), Gini coefficient (G) and second order moment of light (M20). In particular, we propose a new morphological criterion, based on a combination of all these indexes, that improves the identification of interacting systems. We also present a new software package - PyCASSo (Python CAS Software) - for the automatic computation of the structural indexes. After correcting for the c

  19. Non-parametric analysis of infrared spectra for recognition of glass and glass ceramic fragments in recycling plants.

    Science.gov (United States)

    Farcomeni, Alessio; Serranti, Silvia; Bonifazi, Giuseppe

    2008-01-01

    Glass ceramic detection in glass recycling plants represents a still unsolved problem, as glass ceramic material looks like normal glass and is usually detected only by specialized personnel. The presence of glass-like contaminants inside waste glass products, resulting from both industrial and differentiated urban waste collection, increases process production costs and reduces final product quality. In this paper an innovative approach for glass ceramic recognition, based on the non-parametric analysis of infrared spectra, is proposed and investigated. The work was specifically addressed to the spectral classification of glass and glass ceramic fragments collected in an actual recycling plant from three different production lines: flat glass, colored container-glass and white container-glass. The analyses, carried out in the near and mid-infrared (NIR-MIR) spectral field (1280-4480 nm), show that glass ceramic and glass fragments can be recognized by applying a wavelet transform, with a small classification error. Moreover, a method for selecting only a small subset of relevant wavelength ratios is suggested, allowing the conduct of a fast recognition of the two classes of materials. The results show how the proposed approach can be utilized to develop a classification engine to be integrated inside a hardware and software sorting architecture for fast "on-line" ceramic glass recognition and separation.

  20. Prediction intervals for future BMI values of individual children: a non-parametric approach by quantile boosting.

    Science.gov (United States)

    Mayr, Andreas; Hothorn, Torsten; Fenske, Nora

    2012-01-25

    The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures.
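
    The idea of modelling the two borders of a prediction interval directly by quantile regression with boosting can be imitated with scikit-learn's gradient boosting and a quantile loss; the quantile levels, estimator and data below are assumptions for illustration, not the additive quantile boosting used in the paper.

        from sklearn.ensemble import GradientBoostingRegressor

        def fit_prediction_interval(X, y, low=0.05, high=0.95):
            """Fit the lower and upper borders of a (high - low) prediction interval."""
            lower = GradientBoostingRegressor(loss="quantile", alpha=low).fit(X, y)
            upper = GradientBoostingRegressor(loss="quantile", alpha=high).fit(X, y)
            return lower, upper   # predict with both models to obtain covariate-specific PIs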

  1. Non-parametric convolution based image-segmentation of ill-posed objects applying context window approach

    CERN Document Server

    Kumar, Upendra; Pal, Manoj Kumar

    2012-01-01

    Context-dependence in the human cognition process is a well-established fact. Following this, we introduce an image segmentation method that can use context to classify a pixel on the basis of its membership in a particular object class of the image concerned. In broad methodological terms, each pixel was defined by the context window (CW) surrounding it, the size of which was fixed heuristically. The CW texture, defined by the intensities of its pixels, was convolved with weights optimized through a non-parametric function supported by a backpropagation network, and the result of the convolution was used to classify the pixel. The training data points (i.e., pixels) were carefully chosen to include all varieties of context: i) points within the object, ii) points near the edge but inside the objects, iii) points at the border of the objects, iv) points near the edge but outside the objects, v) points near or at the edge of the image frame. Moreover, the training data points were selected from all the images within image-d

  2. Statistical Analysis for Test Papers with Software SPSS

    Institute of Scientific and Technical Information of China (English)

    张燕君

    2012-01-01

    Test paper evaluation is an important part of test management, and its results are a significant basis for the scientific assessment of teaching and learning. Taking an English test paper from a monthly examination of high school students as the object, this paper focuses on the interpretation of SPSS output for item-level and whole-paper quantitative analysis. By analyzing and evaluating the papers, it provides feedback for teachers to check students' progress and adjust their teaching process.

  3. A semiparametric Wald statistic for testing logistic regression models based on case-control data

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    We propose a semiparametric Wald statistic to test the validity of logistic regression models based on case-control data. The test statistic is constructed using a semiparametric ROC curve estimator and a nonparametric ROC curve estimator. The statistic has an asymptotic chi-squared distribution and is an alternative to the Kolmogorov-Smirnov-type statistic proposed by Qin and Zhang in 1997, the chi-squared-type statistic proposed by Zhang in 1999 and the information matrix test statistic proposed by Zhang in 2001. The statistic is easy to compute in the sense that it requires none of the following methods: using a bootstrap method to find its critical values, partitioning the sample data or inverting a high-dimensional matrix. We present some results on simulation and on analysis of two real examples. Moreover, we discuss how to extend our statistic to a family of statistics and how to construct its Kolmogorov-Smirnov counterpart.

  4. Statistics of sampling for microbiological testing of foodborne pathogens

    Science.gov (United States)

    Despite the many recent advances in protocols for testing for pathogens in foods, a number of challenges still exist. For example, the microbiological safety of food cannot be completely ensured by testing because microorganisms are not evenly distributed throughout the food. Therefore, since it i...

  5. [Mastered with statistics: perfect eye drops and ideal screening test : Possibilities and limits of statistical methods for glaucoma].

    Science.gov (United States)

    Kotliar, K E; Lanzl, I M

    2016-10-01

    The use and understanding of statistics are very important for biomedical research and for clinical practice. This is particularly true for estimating the possibilities of different diagnostic and therapy options in the field of glaucoma. The apparent complexity and counterintuitiveness of statistics, along with their cautious acceptance by many physicians, may be the cause of conscious and unconscious manipulation of data representation and interpretation. The aim is a comprehensible clarification of some typical errors in the handling of medical statistical data. Using two hypothetical examples from glaucoma diagnostics (the presentation of the effect of a hypotensive drug and the interpretation of the results of a diagnostic test), typical statistical applications and sources of error are analyzed in detail and discussed. Mechanisms of data manipulation and incorrect data interpretation are elucidated. Typical sources of error in statistical analysis and data presentation are explained. The practical examples analyzed demonstrate the need to understand the basics of statistics and to be able to apply them correctly. The lack of basic knowledge or half-knowledge of medical statistics can lead to misunderstandings, confusion and wrong decisions in medical research and also in clinical practice.

  6. Choosing statistical tests: part 12 of a series on evaluation of scientific publications.

    Science.gov (United States)

    du Prel, Jean-Baptist; Röhrig, Bernd; Hommel, Gerhard; Blettner, Maria

    2010-05-01

    The interpretation of scientific articles often requires an understanding of the methods of inferential statistics. This article informs the reader about frequently used statistical tests and their correct application. The most commonly used statistical tests were identified through a selective literature search on the methodology of medical research publications. These tests are discussed in this article, along with a selection of other standard methods of inferential statistics. Readers who are acquainted not just with descriptive methods, but also with Pearson's chi-square test, Fisher's exact test, and Student's t test will be able to interpret a large proportion of medical research articles. Criteria are presented for choosing the proper statistical test to be used out of the most frequently applied tests. An algorithm and a table are provided to facilitate the selection of the appropriate test.
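
    The three tests singled out above are available in any standard statistics library; the short Python/scipy example below runs them on small hypothetical data sets purely to show the calls involved.

        import numpy as np
        from scipy import stats

        table = np.array([[12, 5],      # hypothetical 2x2 table of counts
                          [7, 16]])
        chi2, p_chi2, dof, _ = stats.chi2_contingency(table)   # Pearson's chi-square test
        odds_ratio, p_fisher = stats.fisher_exact(table)        # Fisher's exact test

        a = [5.1, 4.8, 6.0, 5.5, 4.9]   # hypothetical independent samples
        b = [6.2, 6.8, 5.9, 7.1, 6.5]
        t_stat, p_t = stats.ttest_ind(a, b)                      # Student's t test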

  7. STATISTICS OF FUZZY DATA

    Directory of Open Access Journals (Sweden)

    Orlov A. I.

    2016-05-01

    Full Text Available Fuzzy sets are a special form of objects of non-numeric nature. Therefore, in the processing of a sample whose elements are fuzzy sets, a variety of methods for the analysis of statistical data of any nature can be used - the calculation of averages, non-parametric density estimators, construction of diagnostic rules, etc. We describe the development of our work on the theory of fuzziness (1975-2015). In the first of our works on fuzzy sets (1975), the theory of random sets is regarded as a generalization of the theory of fuzzy sets. In the non-fiction series "Mathematics. Cybernetics" (publishing house "Knowledge"), the first book on fuzzy sets by a Soviet author was published in 1980 - our brochure "Optimization problems and fuzzy variables". This book is essentially a "squeeze" of our research of the 1970s, i.e., the research on the theory of stability and in particular on the statistics of objects of non-numeric nature, with a bias towards methodology. The book includes the main results of fuzzy theory and its relation to random set theory, as well as new results (first publication!) of the statistics of fuzzy sets. On the basis of further experience, one can expect that the theory of fuzzy sets will be more actively applied in organizational and economic modeling of industry management processes. We discuss the concept of the average value of a fuzzy set. We have considered a number of formulations of problems of testing statistical hypotheses on fuzzy sets. We have also proposed and justified some algorithms for restoring relationships between fuzzy variables; we give a presentation of various variants of fuzzy cluster analysis of data and variables and describe some methods of collection and description of fuzzy data

  8. Transit timing observations from Kepler. VI. Potentially interesting candidate systems from fourier-based statistical tests

    DEFF Research Database (Denmark)

    Steffen, J.H.; Ford, E.B.; Rowe, J.F.

    2012-01-01

    We analyze the deviations of transit times from a linear ephemeris for the Kepler Objects of Interest (KOI) through quarter six of science data. We conduct two statistical tests for all KOIs and a related statistical test for all pairs of KOIs in multi-transiting systems. These tests identify...

  9. A test on the statistics of derived intensities

    NARCIS (Netherlands)

    de With, G.; Feil, D.

    1976-01-01

    Variances of X-ray reflexions calculated with the procedure proposed by McCandlish, Stout & Andrews [Acta Cryst. (1975), A31, 245-249] have been tested against variances determined in an independent way. Satisfactory agreement is obtained.

  10. The Statistical Assessment of Latent Trait Dimensionality in Psychological Testing

    Science.gov (United States)

    1984-06-01

  11. Monte Carlo testing in spatial statistics, with applications to spatial residuals

    DEFF Research Database (Denmark)

    Mrkvička, Tomáš; Soubeyrand, Samuel; Myllymäki, Mari;

    2016-01-01

    This paper reviews recent advances made in testing in spatial statistics and discussed at the Spatial Statistics conference in Avignon 2015. The rank and directional quantile envelope tests are discussed and practical rules for their use are provided. These tests are global envelope tests with an...... a two-dimensional smoothed residual field. Second, a goodness-of-fit test of a geostatistical model is performed based on two-dimensional raw residuals....
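
    As a much simpler relative of the rank and directional quantile envelope tests discussed here, the sketch below shows a classical Monte Carlo maximum-absolute-deviation test for a summary function evaluated on a grid; the reference curve and the counting rule are standard conventions assumed for the example, not the specific procedures reviewed in the paper.

        import numpy as np

        def mad_envelope_test(t_obs, t_sims):
            """t_obs: observed summary function on a grid; t_sims: (S, grid) null simulations."""
            ref = t_sims.mean(axis=0)                          # reference (null mean) curve
            d_obs = np.max(np.abs(t_obs - ref))
            d_sim = np.max(np.abs(t_sims - ref), axis=1)
            # Monte Carlo p-value counting the observed statistic among the simulations
            return (1 + np.sum(d_sim >= d_obs)) / (len(d_sim) + 1)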

  12. Semi-automatic liver tumor segmentation with hidden Markov measure field model and non-parametric distribution estimation.

    Science.gov (United States)

    Häme, Yrjö; Pollari, Mika

    2012-01-01

    A novel liver tumor segmentation method for CT images is presented. The aim of this work was to reduce the manual labor and time required in the treatment planning of radiofrequency ablation (RFA), by providing accurate and automated tumor segmentations reliably. The developed method is semi-automatic, requiring only minimal user interaction. The segmentation is based on non-parametric intensity distribution estimation and a hidden Markov measure field model, with application of a spherical shape prior. A post-processing operation is also presented to remove the overflow to adjacent tissue. In addition to the conventional approach of using a single image as input data, an approach using images from multiple contrast phases was developed. The accuracy of the method was validated with two sets of patient data, and artificially generated samples. The patient data included preoperative RFA images and a public data set from "3D Liver Tumor Segmentation Challenge 2008". The method achieved very high accuracy with the RFA data, and outperformed other methods evaluated with the public data set, receiving an average overlap error of 30.3% which represents an improvement of 2.3% points to the previously best performing semi-automatic method. The average volume difference was 23.5%, and the average, the RMS, and the maximum surface distance errors were 1.87, 2.43, and 8.09 mm, respectively. The method produced good results even for tumors with very low contrast and ambiguous borders, and the performance remained high with noisy image data.

  13. A sharper view of Pal 5's tails: discovery of stream perturbations with a novel non-parametric technique

    Science.gov (United States)

    Erkal, Denis; Koposov, Sergey E.; Belokurov, Vasily

    2017-09-01

    Only in the Milky Way is it possible to conduct an experiment that uses stellar streams to detect low-mass dark matter subhaloes. In smooth and static host potentials, tidal tails of disrupting satellites appear highly symmetric. However, perturbations from dark subhaloes, as well as from GMCs and the Milky Way bar, can induce density fluctuations that destroy this symmetry. Motivated by the recent release of unprecedentedly deep and wide imaging data around the Pal 5 stellar stream, we develop a new probabilistic, adaptive and non-parametric technique that allows us to bring the cluster's tidal tails into clear focus. Strikingly, we uncover a stream whose density exhibits visible changes on a variety of angular scales. We detect significant bumps and dips, both narrow and broad: two peaks on either side of the progenitor, each only a fraction of a degree across, and two gaps, ∼2° and ∼9° wide, the latter accompanied by a gargantuan lump of debris. This largest density feature results in a pronounced intertail asymmetry which cannot be made consistent with an unperturbed stream according to a suite of simulations we have produced. We conjecture that the sharp peaks around Pal 5 are epicyclic overdensities, while the two dips are consistent with impacts by subhaloes. Assuming an age of 3.4 Gyr for Pal 5, these two gaps would correspond to the characteristic size of gaps created by subhaloes in the mass range of 106-107 M⊙ and 107-108 M⊙, respectively. In addition to dark substructure, we find that the bar of the Milky Way can plausibly produce the asymmetric density seen in Pal 5 and that GMCs could cause the smaller gap.

  14. Mathematical statistics and stochastic processes

    CERN Document Server

    Bosq, Denis

    2013-01-01

    Generally, books on mathematical statistics are restricted to the case of independent identically distributed random variables. In this book however, both this case AND the case of dependent variables, i.e. statistics for discrete and continuous time processes, are studied. This second case is very important for today's practitioners.Mathematical Statistics and Stochastic Processes is based on decision theory and asymptotic statistics and contains up-to-date information on the relevant topics of theory of probability, estimation, confidence intervals, non-parametric statistics and rob

  15. Statistical concepts a second course

    CERN Document Server

    Lomax, Richard G

    2012-01-01

    Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes

  16. Testing the DGP model with gravitational lensing statistics

    Science.gov (United States)

    Zhu, Zong-Hong; Sereno, M.

    2008-09-01

    Aims: The self-accelerating braneworld model (DGP) appears to provide a simple alternative to the standard ΛCDM cosmology to explain the current cosmic acceleration, which is strongly indicated by measurements of type Ia supernovae, as well as other concordant observations. Methods: We investigate observational constraints on this scenario provided by gravitational-lensing statistics using the Cosmic Lens All-Sky Survey (CLASS) lensing sample. Results: We show that a substantial part of the parameter space of the DGP model agrees well with the radio source gravitational lensing sample. Conclusions: In the flat case, Ω_K = 0, the likelihood is maximized, L = L_max, for Ω_M = 0.30 (+0.19, −0.11). If we relax the prior on Ω_K, the likelihood peaks at (Ω_M, Ω_rc) ≃ (0.29, 0.12), slightly in the region of open models. The confidence contours are, however, elongated such that we are unable to discard any of the closed, flat or open models.

  17. Comparison of Three Statistical Classification Techniques for Maser Identification

    Science.gov (United States)

    Manning, Ellen M.; Holland, Barbara R.; Ellingsen, Simon P.; Breen, Shari L.; Chen, Xi; Humphries, Melissa

    2016-04-01

    We applied three statistical classification techniques-linear discriminant analysis (LDA), logistic regression, and random forests-to three astronomical datasets associated with searches for interstellar masers. We compared the performance of these methods in identifying whether specific mid-infrared or millimetre continuum sources are likely to have associated interstellar masers. We also discuss the interpretability of the results of each classification technique. Non-parametric methods have the potential to make accurate predictions when there are complex relationships between critical parameters. We found that for the small datasets the parametric methods logistic regression and LDA performed best, for the largest dataset the non-parametric method of random forests performed with comparable accuracy to parametric techniques, rather than any significant improvement. This suggests that at least for the specific examples investigated here accuracy of the predictions obtained is not being limited by the use of parametric models. We also found that for LDA, transformation of the data to match a normal distribution led to a significant improvement in accuracy. The different classification techniques had significant overlap in their predictions; further astronomical observations will enable the accuracy of these predictions to be tested.
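
    A comparison of the same three classifiers can be reproduced in outline with scikit-learn; the synthetic data, cross-validation scheme and hyper-parameters below are placeholders, not the maser datasets or settings of the study.

        from sklearn.datasets import make_classification
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=500, n_features=8, random_state=0)  # stand-in data
        models = {
            "LDA": LinearDiscriminantAnalysis(),
            "logistic regression": LogisticRegression(max_iter=1000),
            "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
        }
        for name, model in models.items():
            accuracy = cross_val_score(model, X, y, cv=5).mean()
            print(f"{name}: {accuracy:.3f}")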

  18. EDF Statistics for Testing for the Gamma Distribution, with Applications.

    Science.gov (United States)

    1982-08-13

    coordinates and then dividing by the spectral density function, which is assumed known up to some unknown scale factor. This gives rise to scaled... periodogram values, x_1, ..., x_m, which have an approximate G(θ, m) distribution function when the correct spectral density function is used. The unknown... scale parameter θ can be estimated using the technique of section 2.2 and the z_i's found. A test of the correctly specified spectral density function can
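
    In the same spirit, an EDF-type check of a gamma model with known shape and an estimated scale can be set up as below; note that with an estimated scale the standard Kolmogorov-Smirnov p-value is only approximate, which is exactly the situation that special EDF tables address. The shape value and the use of the KS statistic are assumptions for the example, not the report's recommended statistics.

        from scipy import stats

        def gamma_edf_check(x, shape):
            """Fit the scale of a gamma law with fixed shape, then apply a KS-type EDF test."""
            a, loc, scale = stats.gamma.fit(x, fa=shape, floc=0)   # maximum-likelihood scale
            # the p-value is approximate because the scale was estimated from the same data
            return stats.kstest(x, "gamma", args=(a, loc, scale))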

  19. Testing the rate isomorphy hypothesis using five statistical methods

    Institute of Scientific and Technical Information of China (English)

    Xian-Ju Kuang; Megha N. Parajulee; Pei-Jian Shi; Feng Ge; Fang-Sen Xue

    2012-01-01

    Organisms are said to be in developmental rate isomorphy when the proportions of developmental stage durations are unaffected by temperature.Comprehensive stage-specific developmental data were generated on the cabbage beetle,Colaphellus bowringi Baly (Coleoptera:Chrysomelidae),at eight temperatures ranging from 16℃ to 30℃ (in 2℃ increments) and five analytical methods were used to test the rate isomorphy hypothesis,including:(i) direct comparison of lower developmental thresholds with standard errors based on the traditional linear equation describing developmental rate as the linear function of temperature; (ii) analysis of covariance to compare the lower developmental thresholds of different stages based on the Ikemoto-Takai linear equation; (iii)testing the significance of the slope item in the regression line of arcsin(√P) versus temperature,where p is the ratio of the developmental duration of a particular developmental stage to the entire pre-imaginal developmental duration for one insect or mite species; (iv)analysis of variance to test for significant differences between the ratios of developmental stage durations to that of pre-imaginal development; and (v) checking whether there is an element less than a given level of significance in the p-value matrix of rotating regression line.The results revealed no significant difference among the lower developmental thresholds or among the aforementioned ratios,and thus convincingly confirmed the rate isomorphy hypothesis.
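
    Method (iii) above reduces to an ordinary regression of arcsin(√p) on temperature, with rate isomorphy supported when the slope is not significantly different from zero; the sketch below shows that single step only, with hypothetical stage and total durations as inputs.

        import numpy as np
        from scipy import stats

        def arcsine_slope_test(temps, stage_durations, total_durations):
            """Regress arcsin(sqrt(p)) on temperature, p = stage duration / total pre-imaginal duration."""
            p = np.asarray(stage_durations, dtype=float) / np.asarray(total_durations, dtype=float)
            y = np.arcsin(np.sqrt(p))
            fit = stats.linregress(temps, y)
            return fit.slope, fit.pvalue   # a non-significant slope is consistent with isomorphy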

  20. GROUNDWATER MONITORING: Statistical Methods for Testing Special Background Conditions

    Energy Technology Data Exchange (ETDEWEB)

    Chou, Charissa J.

    2004-04-28

    This chapter illustrates application of a powerful intra-well testing method referred as the combined Shewhart-CUSUM control chart approach, which can detect abrupt and gradual changes in groundwater parameter concentrations. This method is broadly applicable to groundwater monitoring situations where there is no clearly defined upgradient well or wells, where spatial variability exists in parameter concentrations, or when groundwater flow rate is extremely slow. Procedures for determining the minimum time needed to acquire independent groundwater samples and useful transformations for obtaining normally distributed data are also provided. The control chart method will be insensitive to detect real changes if a preexisting trend is observed in the background data set. A method and a case study describing how a trend observed in a background data set can be removed using a transformation suggested by Gibbons (1994) are presented to illustrate treatment of a preexisting trend.

  1. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    Science.gov (United States)

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most
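
    The parametric and non-parametric sample-comparison tests listed here have direct counterparts in scipy; the toy data below are invented solely to show the calls side by side.

        from scipy import stats

        g1 = [5.1, 4.8, 6.0, 5.5, 4.9]   # hypothetical measurements, group 1
        g2 = [6.2, 6.8, 5.9, 7.1, 6.5]   # group 2
        g3 = [7.0, 7.4, 6.9, 7.8, 7.2]   # group 3 (treated as related samples below)

        print(stats.ttest_ind(g1, g2))              # parametric two-sample t-test
        print(stats.mannwhitneyu(g1, g2))           # Wilcoxon rank sum / Mann-Whitney U test
        print(stats.kruskal(g1, g2, g3))            # Kruskal-Wallis test for k groups
        print(stats.friedmanchisquare(g1, g2, g3))  # Friedman's test for related samples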

  2. Testing the dark energy with gravitational lensing statistics

    CERN Document Server

    Cao, Shuo; Zhu, Zong-Hong

    2012-01-01

    We study the redshift distribution of two samples of early-type gravitational lenses, extracted from a larger collection of 122 systems, to constrain the cosmological constant in the LCDM model and the parameters of a set of alternative dark energy models (XCDM, Dvali-Gabadadze-Porrati and Ricci dark energy models), under a spatially flat universe. The likelihood is maximized for $\\Omega_\\Lambda= 0.70 \\pm 0.09$ when considering the sample excluding the SLACS systems (known to be biased towards large image-separation lenses) and no-evolution, and $\\Omega_\\Lambda= 0.81\\pm 0.05$ when limiting to gravitational lenses with image separation larger than 2" and no-evolution. In both cases, results accounting for galaxy evolution are consistent within 1$\\sigma$. The present test supports the accelerated expansion, by excluding the null-hypothesis (i.e., $\\Omega_\\Lambda = 0 $) at more than 4$\\sigma$, regardless of the chosen sample and assumptions on the galaxy evolution. A comparison between competitive world models i...

  3. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    Science.gov (United States)

    Ozturk, Elif

    2012-01-01

    The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
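
    The same kind of "what if" computation can be run outside Excel or R. A minimal Python sketch showing how the p-value of a two-sample t-test shrinks as the sample size grows while the observed effect stays fixed (the means and standard deviation are arbitrary illustration values):

      from scipy import stats

      # Fixed observed effect: group means differ by 0.2 standard deviations.
      mean1, mean2, sd = 50.0, 51.0, 5.0

      for n in (10, 50, 200, 1000):
          # t-test computed from summary statistics; only n changes between rows.
          t, p = stats.ttest_ind_from_stats(mean1, sd, n, mean2, sd, n)
          print(f"n per group = {n:5d}   p = {p:.4f}")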

  4. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers

    Directory of Open Access Journals (Sweden)

    Stochl Jan

    2012-06-01

    Full Text Available Abstract Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12) – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech’s “well-being” and “distress” clinical scales). An illustration of ordinal item analysis

  5. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers.

    Science.gov (United States)

    Stochl, Jan; Jones, Peter B; Croudace, Tim J

    2012-06-11

    Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental

  6. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    Science.gov (United States)

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  7. Nonparametric statistical tests for the continuous data: the basic concept and the practical use.

    Science.gov (United States)

    Nahm, Francis Sahngun

    2016-02-01

    Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles because most medical researchers are familiar with them and statistical software packages strongly support them. Parametric tests require an important assumption, the assumption of normality, which means that the distribution of sample means is normally distributed. However, parametric tests can be misleading when this assumption is not satisfied. In this circumstance, nonparametric tests are the alternative methods available, because they do not require the normality assumption. Nonparametric tests are statistical methods based on signs and ranks. In this article, we discuss the basic concepts and practical use of nonparametric tests as a guide to their proper use.
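
    As a concrete illustration of the rank-based alternative described above, the sketch below compares the two-sample t-test with the Wilcoxon-Mann-Whitney test on skewed (log-normal) samples; the data are simulated for illustration and do not come from the article:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(42)
      group1 = rng.lognormal(mean=0.0, sigma=1.0, size=25)   # skewed sample
      group2 = rng.lognormal(mean=0.5, sigma=1.0, size=25)   # shifted skewed sample

      t_stat, t_p = stats.ttest_ind(group1, group2)
      u_stat, u_p = stats.mannwhitneyu(group1, group2, alternative="two-sided")
      print(f"t-test p = {t_p:.4f}   Mann-Whitney p = {u_p:.4f}")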

  8. Non-parametric Bayesian approach to post-translational modification refinement of predictions from tandem mass spectrometry.

    Science.gov (United States)

    Chung, Clement; Emili, Andrew; Frey, Brendan J

    2013-04-01

    Tandem mass spectrometry (MS/MS) is a dominant approach for large-scale high-throughput post-translational modification (PTM) profiling. Although current state-of-the-art blind PTM spectral analysis algorithms can predict thousands of modified peptides (PTM predictions) in an MS/MS experiment, a significant percentage of these predictions have inaccurate modification mass estimates and false modification site assignments. This problem can be addressed by post-processing the PTM predictions with a PTM refinement algorithm. We developed a novel PTM refinement algorithm, iPTMClust, which extends a recently introduced PTM refinement algorithm PTMClust and uses a non-parametric Bayesian model to better account for uncertainties in the quantity and identity of PTMs in the input data. The use of this new modeling approach enables iPTMClust to provide a confidence score per modification site that allows fine-tuning and interpreting resulting PTM predictions. The primary goal behind iPTMClust is to improve the quality of the PTM predictions. First, to demonstrate that iPTMClust produces sensible and accurate cluster assignments, we compare it with k-means clustering, mixtures of Gaussians (MOG) and PTMClust on a synthetically generated PTM dataset. Second, in two separate benchmark experiments using PTM data taken from a phosphopeptide and a yeast proteome study, we show that iPTMClust outperforms state-of-the-art PTM prediction and refinement algorithms, including PTMClust. Finally, we illustrate the general applicability of our new approach on a set of human chromatin protein complex data, where we are able to identify putative novel modified peptides and modification sites that may be involved in the formation and regulation of protein complexes. Our method facilitates accurate PTM profiling, which is an important step in understanding the mechanisms behind many biological processes and should be an integral part of any proteomic study. Our algorithm is implemented in

  9. The Effects of Repeated Cooperative Testing in an Introductory Statistics Course.

    Science.gov (United States)

    Giraud, Gerald; Enders, Craig

    Cooperative testing seems a logical complement to cooperative learning, but it is counter to traditional testing procedures and is viewed by some as an opportunity for cheating and freeloading on the efforts of other test takers. This study examined the practice of cooperative testing in introductory statistics. Findings indicate that students had…

  10. Test Statistics and Confidence Intervals to Establish Noninferiority between Treatments with Ordinal Categorical Data.

    Science.gov (United States)

    Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka

    2015-01-01

    The problem for establishing noninferiority is discussed between a new treatment and a standard (control) treatment with ordinal categorical data. A measure of treatment effect is used and a method of specifying noninferiority margin for the measure is provided. Two Z-type test statistics are proposed where the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of the deviation from nominal level and the power.

  11. Statistical Redundancy Testing for Improved Gene Selection in Cancer Classification Using Microarray Data

    Directory of Open Access Journals (Sweden)

    J. Sunil Rao

    2007-01-01

    Full Text Available In gene selection for cancer classification using microarray data, we define an eigenvalue-ratio statistic to measure a gene’s contribution to the joint discriminability when this gene is included in a set of genes. Based on this eigenvalue-ratio statistic, we define a novel hypothesis test for gene statistical redundancy and propose two gene selection methods. Simulation studies illustrate the agreement between statistical redundancy testing and gene selection methods. Real data examples show the proposed gene selection methods can select a compact gene subset which not only can be used to build high-quality cancer classifiers but also shows biological relevance.

  12. Transit Timing Observations from Kepler. VI. Potentially Interesting Candidate Systems from Fourier-based Statistical Tests

    OpenAIRE

    Steffen, Jason H.; Ford, Eric B.; Rowe, Jason F.; Fabrycky, Daniel C.; Holman, Matthew J.; Welsh, William F.; Borucki, William J.; Batalha, Natalie M.; Bryson, Steve; Caldwell, Douglas A.; Ciardi, David R.; Jenkins, Jon M.; Kjeldsen, Hans; Koch, David G.; Prsa, Andrej

    2012-01-01

    We analyze the deviations of transit times from a linear ephemeris for the Kepler Objects of Interest (KOI) through Quarter six (Q6) of science data. We conduct two statistical tests for all KOIs and a related statistical test for all pairs of KOIs in multi-transiting systems. These tests identify several systems which show potentially interesting transit timing variations (TTVs). Strong TTV systems have been valuable for the confirmation of planets and their mass measurements. Many of the sy...

  13. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    Science.gov (United States)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
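
    A stripped-down sketch of the bootstrap idea with the Euclidean distance as the test statistic; the two ensembles of individual histograms are simulated here, and the resampling scheme below (pooling and resampling individual histograms under the null) is one plausible reading of the procedure rather than the authors' exact implementation:

      import numpy as np

      rng = np.random.default_rng(1)

      def summary_hist(individual):
          """Sum a stack of individual histograms and normalise to frequencies."""
          h = individual.sum(axis=0).astype(float)
          return h / h.sum()

      # Hypothetical ensembles: 80 and 60 individual histograms with 10 bins each.
      ens_a = rng.poisson(lam=np.linspace(5, 14, 10), size=(80, 10))
      ens_b = rng.poisson(lam=np.linspace(6, 13, 10), size=(60, 10))

      observed = np.linalg.norm(summary_hist(ens_a) - summary_hist(ens_b))

      # Bootstrap under the null: resample individual histograms from the pooled set.
      pooled = np.vstack([ens_a, ens_b])
      n_a, n_boot, count = len(ens_a), 2000, 0
      for _ in range(n_boot):
          idx = rng.integers(0, len(pooled), size=len(pooled))
          sample = pooled[idx]
          d = np.linalg.norm(summary_hist(sample[:n_a]) - summary_hist(sample[n_a:]))
          count += d >= observed

      print(f"observed distance = {observed:.4f}, bootstrap p ~ {count / n_boot:.3f}")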

  14. Testing over-representation of observations in subsets of a DEA technology

    DEFF Research Database (Denmark)

    Asmild, Mette; Hougaard, Jens Leth; Olesen, Ole Bent

    2013-01-01

    This paper proposes a test for whether data are over-represented in a given production zone, i.e. a subset of a production possibility set which has been estimated using the non-parametric Data Envelopment Analysis (DEA) approach. A binomial test is used that relates the number of observations inside such a zone to a discrete probability weighted relative volume of that zone. A Monte Carlo simulation illustrates the performance of the proposed test statistic and provides good estimation of both facet probabilities and the assumed common inefficiency distribution in a three dimensional input
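
    The core comparison, an observed count inside a zone against a probability-weighted expected share, can be illustrated with a plain binomial test. The counts and the null probability below are hypothetical, and the SciPy call is simply a generic binomial test, not the authors' implementation:

      from scipy import stats

      # Hypothetical example: 24 of 60 observations fall inside a production zone
      # whose probability-weighted relative volume is 0.25 under the null.
      result = stats.binomtest(k=24, n=60, p=0.25, alternative="greater")   # SciPy >= 1.7
      print(result.pvalue)   # a small p-value suggests over-representation in the zone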

  15. Testing for phylogenetic signal in biological traits: the ubiquity of cross-product statistics.

    Science.gov (United States)

    Pavoine, Sandrine; Ricotta, Carlo

    2013-03-01

    To evaluate rates of evolution, to establish tests of correlation between two traits, or to investigate to what degree the phylogeny of a species assemblage is predictive of a trait value, so-called tests for phylogenetic signal are used. Being based on different approaches, these tests are generally thought to possess quite different statistical performances. In this article, we show that the Blomberg et al. K and K*, the Abouheif index, the Moran's I, and the Mantel correlation are all based on a cross-product statistic, and are thus all related to each other when they are associated with a permutation test of phylogenetic signal. What changes is only the way phylogenetic and trait similarities (or dissimilarities) among the tips of a phylogeny are computed. The definitions of the phylogenetic and trait-based (dis)similarities among tips thus determine the performance of the tests. We briefly discuss the biological and statistical consequences (in terms of power and type I error of the tests) of the observed relatedness among the statistics that allow tests for phylogenetic signal. The Blomberg et al. K* statistic appears as one of the most efficient approaches to test for phylogenetic signal. When branch lengths are not available or not accurate, Abouheif's Cmean statistic is a powerful alternative to K*.

  16. Generalizing Terwilliger's likelihood approach: a new score statistic to test for genetic association

    OpenAIRE

    Hsu Li; Helmer Quinta; de Visser Marieke CH; Uitte de Willige Shirley; el Galta Rachid; Houwing-Duistermaat Jeanine J

    2007-01-01

    Abstract Background: In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known. Results: By means of a simulation study...

  17. A semiparametric Wald statistic for testing logistic regression models based on case-control data

    Institute of Scientific and Technical Information of China (English)

    WAN ShuWen

    2008-01-01

    We propose a semiparametric Wald statistic to test the validity of logistic regression models based on case-control data. The test statistic is constructed using a semiparametric ROC curve estimator and a nonparametric ROC curve estimator. The statistic has an asymptotic chi-squared distribution and is an alternative to the Kolmogorov-Smirnov-type statistic proposed by Qin and Zhang in 1997, the chi-squared-type statistic proposed by Zhang in 1999 and the information matrix test statistic proposed by Zhang in 2001. The statistic is easy to compute in the sense that it requires none of the following methods: using a bootstrap method to find its critical values, partitioning the sample data or inverting a high-dimensional matrix. We present some results on simulation and on analysis of two real examples. Moreover, we discuss how to extend our statistic to a family of statistics and how to construct its Kolmogorov-Smirnov counterpart.

  18. Mnemonic Aids during Tests: Worthless Frivolity or Effective Tool in Statistics Education?

    Science.gov (United States)

    Larwin, Karen H.; Larwin, David A.; Gorman, Jennifer

    2012-01-01

    Researchers have explored many pedagogical approaches in an effort to assist students in finding understanding and comfort in required statistics courses. This study investigates the impact of mnemonic aids used during tests on students' statistics course performance in particular. In addition, the present study explores several hypotheses that…

  19. An Argument Framework for the Application of Null Hypothesis Statistical Testing in Support of Research

    Science.gov (United States)

    LeMire, Steven D.

    2010-01-01

    This paper proposes an argument framework for the teaching of null hypothesis statistical testing and its application in support of research. Elements of the Toulmin (1958) model of argument are used to illustrate the use of p values and Type I and Type II error rates in support of claims about statistical parameters and subject matter research…

  20. Selecting the most appropriate inferential statistical test for your quantitative research study.

    Science.gov (United States)

    Bettany-Saltikov, Josette; Whittaker, Victoria Jane

    2014-06-01

    To discuss the issues and processes relating to the selection of the most appropriate statistical test. A review of the basic research concepts together with a number of clinical scenarios is used to illustrate this. Quantitative nursing research generally features the use of empirical data which necessitates the selection of both descriptive and statistical tests. Different types of research questions can be answered by different types of research designs, which in turn need to be matched to a specific statistical test(s). Discursive paper. This paper discusses the issues relating to the selection of the most appropriate statistical test and makes some recommendations as to how these might be dealt with. When conducting empirical quantitative studies, a number of key issues need to be considered. Considerations for selecting the most appropriate statistical tests are discussed and flow charts provided to facilitate this process. When nursing clinicians and researchers conduct quantitative research studies, it is crucial that the most appropriate statistical test is selected to enable valid conclusions to be made. © 2013 John Wiley & Sons Ltd.

  1. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    Science.gov (United States)

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  2. The Relationship between Test Anxiety and Academic Performance of Students in Vital Statistics Course

    Directory of Open Access Journals (Sweden)

    Shirin Iranfar

    2013-12-01

    Full Text Available Introduction: Test anxiety is a common phenomenon among students and is one of the problems of the educational system. The present study was conducted to investigate test anxiety in the vital statistics course and its association with the academic performance of students at Kermanshah University of Medical Sciences. This study was descriptive-analytical, and the study sample included students from the nursing and midwifery, paramedicine and health faculties who had taken the vital statistics course and were selected through the census method. The Sarason questionnaire was used to assess test anxiety. Data were analyzed by descriptive and inferential statistics. The findings indicated no significant correlation between test anxiety and the score in the vital statistics course.

  3. Statistical Analysis for High-Dimensional Data : The Abel Symposium 2014

    CERN Document Server

    Bühlmann, Peter; Glad, Ingrid; Langaas, Mette; Richardson, Sylvia; Vannucci, Marina

    2016-01-01

    This book features research contributions from The Abel Symposium on Statistical Analysis for High Dimensional Data, held in Nyvågar, Lofoten, Norway, in May 2014. The focus of the symposium was on statistical and machine learning methodologies specifically developed for inference in “big data” situations, with particular reference to genomic applications. The contributors, who are among the most prominent researchers on the theory of statistics for high dimensional inference, present new theories and methods, as well as challenging applications and computational solutions. Specific themes include, among others, variable selection and screening, penalised regression, sparsity, thresholding, low dimensional structures, computational challenges, non-convex situations, learning graphical models, sparse covariance and precision matrices, semi- and non-parametric formulations, multiple testing, classification, factor models, clustering, and preselection. Highlighting cutting-edge research and casting light on...

  4. Statistical Tests for the Reciprocal of a Normal Mean with a Known Coefficient of Variation

    Directory of Open Access Journals (Sweden)

    Wararit Panichkitkosolkul

    2015-01-01

    Full Text Available An asymptotic test and an approximate test for the reciprocal of a normal mean with a known coefficient of variation were proposed in this paper. The asymptotic test was based on the expectation and variance of the estimator of the reciprocal of a normal mean. The approximate test used the approximate expectation and variance of the estimator by Taylor series expansion. A Monte Carlo simulation study was conducted to compare the performance of the two statistical tests. Simulation results showed that the two proposed tests performed well in terms of empirical type I errors and power. Nevertheless, the approximate test was easier to compute than the asymptotic test.

  5. A statistical test for drainage network recognition using MeanStreamDrop analysis

    Directory of Open Access Journals (Sweden)

    Corrado Cencetti

    2015-07-01

    Full Text Available This paper provides a new statistical test to evaluate the threshold of validity for the Mean Stream Drop analysis. In the case of a constant area threshold, the method aims to provide a unique threshold value to extract the drainage network through a statistical test more efficient than those widely used. The proposal starts from the assumption that a minimum threshold value suitable for drainage network extraction exists. Then, the method proceeds with Horton–Strahler ordering of the network and statistically analysing the network geometry. This procedure is repeated for all the threshold values in the set under investigation, using a statistical permutation test, called APTDTM (Adjusted Permutation Test based on the Difference between Trimmed Means). Statistical significance is evaluated by p-values adjusted to account for multiple comparisons. As a final result of the statistical analysis, the right threshold value for the specific basin is identified. Classical procedures are based on a set of two-sample t-tests. However, those procedures rely on the assumptions of normality and homogeneity of variance, which are unlikely to hold in practice. The APTDTM test presented here provides accurate p-values even when the sampling distribution is not close to normal, or there is heteroskedasticity in the data.
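
    The building block of the procedure, a permutation test on the difference between trimmed means, is straightforward to sketch. The stream-drop samples below are invented, and the full APTDTM method repeats this comparison across Strahler orders and thresholds and then adjusts the p-values for multiple comparisons:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)

      def trimmed_mean_diff(x, y, prop=0.1):
          return stats.trim_mean(x, prop) - stats.trim_mean(y, prop)

      # Hypothetical stream drops (in metres) for first- and second-order streams.
      order1 = rng.gamma(shape=2.0, scale=15.0, size=40)
      order2 = rng.gamma(shape=2.0, scale=16.0, size=25)

      observed = abs(trimmed_mean_diff(order1, order2))
      pooled = np.concatenate([order1, order2])
      n1, n_perm, exceed = len(order1), 5000, 0
      for _ in range(n_perm):
          perm = rng.permutation(pooled)
          exceed += abs(trimmed_mean_diff(perm[:n1], perm[n1:])) >= observed

      print(f"permutation p-value ~ {exceed / n_perm:.3f}")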

  6. Average projection type weighted Cramér-von Mises statistics for testing some distributions

    Institute of Scientific and Technical Information of China (English)

    CUI; Hengjian(崔恒建)

    2002-01-01

    This paper addresses the problem of testing goodness-of-fit for several important multivariate distributions: (Ⅰ) Uniform distribution on the p-dimensional unit sphere; (Ⅱ) multivariate standard normal distribution; and (Ⅲ) multivariate normal distribution with unknown mean vector and covariance matrix. The average projection type weighted Cramér-von Mises test statistic as well as estimated and weighted Cramér-von Mises statistics for testing distributions (Ⅰ), (Ⅱ) and (Ⅲ) are constructed via integrating projection direction on the unit sphere, and the asymptotic distributions and the expansions of those test statistics under the null hypothesis are also obtained. Furthermore, the approach of this paper can be applied to testing goodness-of-fit for elliptically contoured distributions.

  7. An investigation of the statistical power of neutrality tests based on comparative and population genetic data

    DEFF Research Database (Denmark)

    Zhai, Weiwei; Nielsen, Rasmus; Slatkin, Montgomery

    2009-01-01

    In this report, we investigate the statistical power of several tests of selective neutrality based on patterns of genetic diversity within and between species. The goal is to compare tests based solely on population genetic data with tests using comparative data or a combination of comparative...... selection. The Hudson-Kreitman-Aguadé test is the most powerful test for detecting positive selection among the population genetic tests investigated, whereas the McDonald-Kreitman test typically has more power to detect negative selection. We discuss our findings in the light of the discordant results obtained...

  8. How well do test case prioritization techniques support statistical fault localization

    OpenAIRE

    Tse, TH; Jiang, B.; Zhang, Z; Chen, TY

    2009-01-01

    In continuous integration, a tight integration of test case prioritization techniques and fault-localization techniques may both expose failures faster and locate faults more effectively. Statistical fault-localization techniques use the execution information collected during testing to locate faults. Executing a small fraction of a prioritized test suite reduces the cost of testing, and yet the subsequent fault localization may suffer. This paper presents the first empirical study to examine...

  9. A non-parametric method for automatic determination of P-wave and S-wave arrival times: application to local micro earthquakes

    Science.gov (United States)

    Rawles, Christopher; Thurber, Clifford

    2015-08-01

    We present a simple, fast, and robust method for automatic detection of P- and S-wave arrivals using a nearest neighbours-based approach. The nearest neighbour algorithm is one of the most popular time-series classification methods in the data mining community and has been applied to time-series problems in many different domains. Specifically, our method is based on the non-parametric time-series classification method developed by Nikolov. Instead of building a model by estimating parameters from the data, the method uses the data itself to define the model. Potential phase arrivals are identified based on their similarity to a set of reference data consisting of positive and negative sets, where the positive set contains examples of analyst identified P- or S-wave onsets and the negative set contains examples that do not contain P waves or S waves. Similarity is defined as the square of the Euclidean distance between vectors representing the scaled absolute values of the amplitudes of the observed signal and a given reference example in time windows of the same length. For both P waves and S waves, a single pass is done through the bandpassed data, producing a score function defined as the ratio of the sum of similarity to positive examples over the sum of similarity to negative examples for each window. A phase arrival is chosen as the centre position of the window that maximizes the score function. The method is tested on two local earthquake data sets, consisting of 98 known events from the Parkfield region in central California and 32 known events from the Alpine Fault region on the South Island of New Zealand. For P-wave picks, using a reference set containing two picks from the Parkfield data set, 98 per cent of Parkfield and 94 per cent of Alpine Fault picks are determined within 0.1 s of the analyst pick. For S-wave picks, 94 per cent and 91 per cent of picks are determined within 0.2 s of the analyst picks for the Parkfield and Alpine Fault data set
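
    A toy sketch of the window-scoring idea, not the authors' implementation: each candidate window is reduced to a scaled absolute-amplitude vector and scored by the ratio of its summed similarity to positive reference windows over its summed similarity to negative ones. Similarity is taken here as an inverse squared Euclidean distance, which is one plausible reading of the description above; the synthetic trace, window length and reference picks are all assumptions:

      import numpy as np

      def window_vector(x):
          """Scaled absolute amplitudes of one window (max-normalised)."""
          v = np.abs(x)
          return v / (v.max() + 1e-12)

      def similarity(a, b):
          # Assumed form: larger when two windows are more alike.
          return 1.0 / (1e-6 + np.sum((a - b) ** 2))

      def score_trace(trace, positives, negatives, win=50):
          """Score every window; the pick is the centre of the best-scoring window."""
          scores = []
          for start in range(len(trace) - win):
              w = window_vector(trace[start:start + win])
              num = sum(similarity(w, window_vector(p)) for p in positives)
              den = sum(similarity(w, window_vector(n)) for n in negatives)
              scores.append(num / den)
          return int(np.argmax(scores)) + win // 2

      # Synthetic example: noise with an abrupt onset at sample 300.
      rng = np.random.default_rng(3)
      trace = rng.normal(0.0, 0.2, 600)
      trace[300:] += 2.0 * np.sin(np.linspace(0, 40, 300))
      positives = [trace[275:325]]                      # one "analyst pick" reference
      negatives = [trace[50:100], trace[150:200]]       # windows without an onset
      print("picked arrival near sample", score_trace(trace, positives, negatives))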

  10. Weighted pedigree-based statistics for testing the association of rare variants.

    Science.gov (United States)

    Shugart, Yin Yao; Zhu, Yun; Guo, Wei; Xiong, Momiao

    2012-11-24

    With the advent of next-generation sequencing (NGS) technologies, researchers are now generating a deluge of data on high dimensional genomic variations, whose analysis is likely to reveal rare variants involved in the complex etiology of disease. Standing in the way of such discoveries, however, is the fact that statistics for rare variants are currently designed for use with population-based data. In this paper, we introduce a pedigree-based statistic specifically designed to test for rare variants in family-based data. The additional power of pedigree-based statistics stems from the fact that while rare variants related to diseases or traits of interest occur only infrequently in populations, in families with multiple affected individuals, such variants are enriched. Note that while the proposed statistic can be applied with and without statistical weighting, our simulations show that its power increases when weighting (WSS and VT) are applied. Our working hypothesis was that, since rare variants are concentrated in families with multiple affected individuals, pedigree-based statistics should detect rare variants more powerfully than population-based statistics. To evaluate how well our new pedigree-based statistics perform in association studies, we develop a general framework for sequence-based association studies capable of handling data from pedigrees of various types and also from unrelated individuals. In short, we developed a procedure for transforming population-based statistics into tests for family-based associations. Furthermore, we modify two existing tests, the weighted sum-square test and the variable-threshold test, and apply both to our family-based collapsing methods. We demonstrate that the new family-based tests are more powerful than the corresponding population-based tests and they generate a reasonable type I error rate. To demonstrate feasibility, we apply the newly developed tests to a pedigree-based GWAS data set from the Framingham Heart

  11. Two-sample density-based empirical likelihood tests for incomplete data in application to a pneumonia study.

    Science.gov (United States)

    Vexler, Albert; Yu, Jihnhee

    2011-07-01

    In clinical trials examining the incidence of pneumonia it is a common practice to measure infection via both invasive and non-invasive procedures. In the context of a recently completed randomized trial comparing two treatments, the invasive procedure was only utilized in certain scenarios due to the added risk involved, and given that the level of the non-invasive procedure surpassed a given threshold. Hence, what was observed was bivariate data with a pattern of missingness in the invasive variable dependent upon the value of the observed non-invasive observation within a given pair. In order to compare two treatments with bivariate observed data exhibiting this pattern of missingness we developed a semi-parametric methodology utilizing the density-based empirical likelihood approach in order to provide a non-parametric approximation to Neyman-Pearson-type test statistics. This novel empirical likelihood approach has both parametric and non-parametric components. The non-parametric component utilizes the observations for the non-missing cases, while the parametric component is utilized to tackle the case where observations are missing with respect to the invasive variable. The method is illustrated through its application to the actual data obtained in the pneumonia study and is shown to be an efficient and practical method. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Integration of association statistics over genomic regions using Bayesian adaptive regression splines

    Directory of Open Access Journals (Sweden)

    Zhang Xiaohua

    2003-11-01

    Full Text Available Abstract In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed, testing single loci or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD) with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS). For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (i.e. it achieves the specified size of the test) and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.

  13. A Note on Three Statistical Tests in the Logistic Regression DIF Procedure

    Science.gov (United States)

    Paek, Insu

    2012-01-01

    Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…

  14. CUSUM-Based Person-Fit Statistics for Adaptive Testing. Research Report 99-05.

    Science.gov (United States)

    van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.

    Item scores that do not fit an assumed item response theory model may cause the latent trait value to be estimated inaccurately. Several person-fit statistics for detecting nonfitting score patterns for paper-and-pencil tests have been proposed. In the context of computerized adaptive tests (CAT), the use of person-fit analysis has hardly been…

  15. The Comparability of the Statistical Characteristics of Test Items Generated by Computer Algorithms.

    Science.gov (United States)

    Meisner, Richard; And Others

    This paper presents a study on the generation of mathematics test items using algorithmic methods. The history of this approach is briefly reviewed and is followed by a survey of the research to date on the statistical parallelism of algorithmically generated mathematics items. Results are presented for 8 parallel test forms generated using 16…

  16. Active Learning and Threshold Concepts in Multiple Testing That Can Further Develop Student Critical Statistical Thinking

    Science.gov (United States)

    White, Desley

    2015-01-01

    Two practical activities are described, which aim to support critical thinking about statistics as they concern multiple outcomes testing. Formulae are presented in Microsoft Excel spreadsheets, which are used to calculate the inflation of error associated with the quantity of tests performed. This is followed by a decision-making exercise, where…
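
    The error-inflation calculation that the spreadsheet activity walks through is short enough to reproduce directly; a hedged Python sketch in which the significance level and the numbers of tests are arbitrary illustration choices:

      alpha = 0.05
      for m in (1, 3, 5, 10, 20):
          familywise = 1 - (1 - alpha) ** m      # chance of at least one false positive
          bonferroni = alpha / m                 # per-test level restoring roughly 0.05 overall
          print(f"{m:2d} tests: familywise error = {familywise:.3f}, "
                f"Bonferroni per-test alpha = {bonferroni:.4f}")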

  17. What Are Null Hypotheses? The Reasoning Linking Scientific and Statistical Hypothesis Testing

    Science.gov (United States)

    Lawson, Anton E.

    2008-01-01

    We should dispense with use of the confusing term "null hypothesis" in educational research reports. To explain why the term should be dropped, the nature of, and relationship between, scientific and statistical hypothesis testing is clarified by explication of (a) the scientific reasoning used by Gregor Mendel in testing specific…

  18. A Note on Three Statistical Tests in the Logistic Regression DIF Procedure

    Science.gov (United States)

    Paek, Insu

    2012-01-01

    Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…

  19. Transit Timing Observations from Kepler: VII. Potentially interesting candidate systems from Fourier-based statistical tests

    Energy Technology Data Exchange (ETDEWEB)

    Steffen, Jason H.; /Fermilab; Ford, Eric B.; /Florida U.; Rowe, Jason F.; /NASA, Ames /SETI Inst., Mtn. View; Fabrycky, Daniel C.; /Lick Observ.; Holman, Matthew J.; /Harvard-Smithsonian Ctr. Astrophys.; Welsh, William F.; /San Diego State U., Astron. Dept.; Borucki, William J.; /NASA, Ames; Batalha, Natalie M.; /San Jose State U.; Bryson, Steve; /NASA, Ames; Caldwell, Douglas A.; /NASA, Ames /SETI Inst., Mtn. View; Ciardi, David R.; /Caltech /NASA, Ames /SETI Inst., Mtn. View

    2012-01-01

    We analyze the deviations of transit times from a linear ephemeris for the Kepler Objects of Interest (KOI) through Quarter six (Q6) of science data. We conduct two statistical tests for all KOIs and a related statistical test for all pairs of KOIs in multi-transiting systems. These tests identify several systems which show potentially interesting transit timing variations (TTVs). Strong TTV systems have been valuable for the confirmation of planets and their mass measurements. Many of the systems identified in this study should prove fruitful for detailed TTV studies.

  20. Transit Timing Observations from Kepler: VII. Potentially interesting candidate systems from Fourier-based statistical tests

    CERN Document Server

    Steffen, Jason H; Rowe, Jason F; Fabrycky, Daniel C; Holman, Matthew J; Welsh, William F; Borucki, William J; Batalha, Natalie M; Bryson, Steve; Caldwell, Douglas A; Ciardi, David R; Jenkins, Jon M; Kjeldsen, Hans; Koch, David G; Prsa, Andrej; Sanderfer, Dwight T; Seader, Shawn; Twicken, Joseph D

    2012-01-01

    We analyze the deviations of transit times from a linear ephemeris for the Kepler Objects of Interest (KOI) through Quarter six (Q6) of science data. We conduct two statistical tests for all KOIs and a related statistical test for all pairs of KOIs in multi-transiting systems. These tests identify several systems which show potentially interesting transit timing variations (TTVs). Strong TTV systems have been valuable for the confirmation of planets and their mass measurements. Many of the systems identified in this study should prove fruitful for detailed TTV studies.

  1. TRANSIT TIMING OBSERVATIONS FROM KEPLER. VI. POTENTIALLY INTERESTING CANDIDATE SYSTEMS FROM FOURIER-BASED STATISTICAL TESTS

    Energy Technology Data Exchange (ETDEWEB)

    Steffen, Jason H. [Fermilab Center for Particle Astrophysics, P.O. Box 500, MS 127, Batavia, IL 60510 (United States); Ford, Eric B. [Astronomy Department, University of Florida, 211 Bryant Space Sciences Center, Gainesville, FL 32111 (United States); Rowe, Jason F.; Borucki, William J.; Bryson, Steve; Caldwell, Douglas A.; Jenkins, Jon M.; Koch, David G.; Sanderfer, Dwight T.; Seader, Shawn; Twicken, Joseph D. [NASA Ames Research Center, Moffett Field, CA 94035 (United States); Fabrycky, Daniel C. [UCO/Lick Observatory, University of California, Santa Cruz, CA 95064 (United States); Holman, Matthew J. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Welsh, William F. [Astronomy Department, San Diego State University, San Diego, CA 92182-1221 (United States); Batalha, Natalie M. [Department of Physics and Astronomy, San Jose State University, San Jose, CA 95192 (United States); Ciardi, David R. [NASA Exoplanet Science Institute/California Institute of Technology, Pasadena, CA 91125 (United States); Kjeldsen, Hans [Department of Physics and Astronomy, Aarhus University, DK-8000 Aarhus C (Denmark); Prsa, Andrej, E-mail: jsteffen@fnal.gov [Department of Astronomy and Astrophysics, Villanova University, 800 East Lancaster Avenue, Villanova, PA 19085 (United States)

    2012-09-10

    We analyze the deviations of transit times from a linear ephemeris for the Kepler Objects of Interest (KOI) through quarter six of science data. We conduct two statistical tests for all KOIs and a related statistical test for all pairs of KOIs in multi-transiting systems. These tests identify several systems which show potentially interesting transit timing variations (TTVs). Strong TTV systems have been valuable for the confirmation of planets and their mass measurements. Many of the systems identified in this study should prove fruitful for detailed TTV studies.

  2. Quantitative Phylogenomics of Within-Species Mitogenome Variation: Monte Carlo and Non-Parametric Analysis of Phylogeographic Structure among Discrete Transatlantic Breeding Areas of Harp Seals (Pagophilus groenlandicus).

    Directory of Open Access Journals (Sweden)

    Steven M Carr

    -stepping-stone biogeographic models, but not a simple 1-step trans-Atlantic model. Plots of the cumulative pairwise sequence difference curves among seals in each of the four populations provide continuous proxies for phylogenetic diversification within each. Non-parametric Kolmogorov-Smirnov (K-S) tests of maximum pairwise differences between these curves indicate that the Greenland Sea population has a markedly younger phylogenetic structure than either the White Sea population or the two Northwest Atlantic populations, which are of intermediate age and homogeneous structure. The Monte Carlo and K-S assessments provide sensitive quantitative tests of within-species mitogenomic phylogeography. This is the first study to indicate that the White Sea and Greenland Sea populations have different population genetic histories. The analysis supports the hypothesis that Harp Seals comprise three genetically distinguishable breeding populations, in the White Sea, Greenland Sea, and Northwest Atlantic. Implications for an ice-dependent species during ongoing climate change are discussed.

  3. Improved Test Planning and Analysis Through the Use of Advanced Statistical Methods

    Science.gov (United States)

    Green, Lawrence L.; Maxwell, Katherine A.; Glass, David E.; Vaughn, Wallace L.; Barger, Weston; Cook, Mylan

    2016-01-01

    The goal of this work is, through computational simulations, to provide statistically-based evidence to convince the testing community that a distributed testing approach is superior to a clustered testing approach for most situations. For clustered testing, numerous, repeated test points are acquired at a limited number of test conditions. For distributed testing, only one or a few test points are requested at many different conditions. The statistical techniques of Analysis of Variance (ANOVA), Design of Experiments (DOE) and Response Surface Methods (RSM) are applied to enable distributed test planning, data analysis and test augmentation. The D-Optimal class of DOE is used to plan an optimally efficient single- and multi-factor test. The resulting simulated test data are analyzed via ANOVA and a parametric model is constructed using RSM. Finally, ANOVA can be used to plan a second round of testing to augment the existing data set with new data points. The use of these techniques is demonstrated through several illustrative examples. To date, many thousands of comparisons have been performed and the results strongly support the conclusion that the distributed testing approach outperforms the clustered testing approach.

  4. Statistical alignment: computational properties, homology testing and goodness-of-fit

    DEFF Research Database (Denmark)

    Hein, J; Wiuf, Carsten; Møller, Martin

    2000-01-01

    The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model. Firstly, we show how to accelerate the statistical... likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data... analysis. Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins. Finally, we describe a goodness-of-fit test...

  5. The relationship between multilevel models and non-parametric multilevel mixture models: Discrete approximation of intraclass correlation, random coefficient distributions, and residual heteroscedasticity.

    Science.gov (United States)

    Rights, Jason D; Sterba, Sonya K

    2016-11-01

    Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed. © 2016 The British Psychological Society.

  6. A Non-Parametric Approach for the Activation Detection of Block Design fMRI Simulated Data Using Self-Organizing Maps and Support Vector Machine.

    Science.gov (United States)

    Bahrami, Sheyda; Shamsi, Mousa

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is a popular method to probe the functional organization of the brain using hemodynamic responses. In this method, volume images of the entire brain are obtained with a very good spatial resolution and low temporal resolution. However, the resulting data are always high-dimensional from the standpoint of classification algorithms. In this work, we combine a support vector machine (SVM) with a self-organizing map (SOM) to obtain a feature-based classification: the SOM is used for feature extraction and for labeling the datasets, and a linear-kernel SVM is then used to detect the active areas. The SOM has two major advantages: (i) it reduces the dimensionality of the data sets, lowering computational complexity, and (ii) it is useful for identifying brain regions with small onset differences in hemodynamic responses. Our non-parametric model is compared with parametric and non-parametric methods. We use simulated fMRI data sets with block-design inputs and set the contrast-to-noise ratio (CNR) to 0.6; the simulated data have 1-4% contrast in active areas. The accuracy of our proposed method is 93.63% and the error rate is 6.37%.

  7. The use of carrier solvents in regulatory aquatic toxicology testing: practical, statistical and regulatory considerations.

    Science.gov (United States)

    Green, John; Wheeler, James R

    2013-11-15

    Solvents are often used to aid test item preparation in aquatic ecotoxicity experiments. This paper discusses the practical, statistical and regulatory considerations. The selection of the appropriate control (if a solvent is used) for statistical analysis is investigated using a database of 141 responses (endpoints) from 71 experiments. The advantages and disadvantages of basing the statistical analysis of treatment effects on the water control alone, the solvent control alone, the combined controls, or a conditional strategy of combining the controls when they are not statistically significantly different, are tested. The latter two approaches are shown to have distinct advantages. It is recommended that this approach continue to be the standard used for regulatory and research aquatic ecotoxicology studies. However, wherever technically feasible, a solvent should not be employed, or at least its concentration should be minimized. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. Statistical tests for differential expression in cDNA microarray experiments

    OpenAIRE

    Cui, Xiangqin; Churchill, Gary A.

    2003-01-01

    Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.
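
    The two designs mentioned above map directly onto standard SciPy calls; a minimal sketch on simulated log-intensities for a single gene (not data from the paper), with the mixed ANOVA model for multi-factor experiments left to dedicated packages:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(11)

      # One gene: log-intensities under two conditions with four replicates each.
      cond_a = rng.normal(8.0, 0.3, 4)
      cond_b = rng.normal(8.6, 0.3, 4)
      print("two conditions, t-test:", stats.ttest_ind(cond_a, cond_b))

      # More than two conditions: one-way ANOVA across three treatments.
      cond_c = rng.normal(8.3, 0.3, 4)
      print("three conditions, ANOVA:", stats.f_oneway(cond_a, cond_b, cond_c))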

  9. Price limits and stock market efficiency: Evidence from rolling bicorrelation test statistic

    Energy Technology Data Exchange (ETDEWEB)

    Lim, Kian-Ping [Labuan School of International Business and Finance, Universiti Malaysia Sabah (Malaysia); Department of Econometrics and Business Statistics, Monash University, P.O. Box 1071, Narre Warren, Victoria 3805 (Australia); Brooks, Robert D. [Department of Econometrics and Business Statistics, Monash University, P.O. Box 1071, Narre Warren, Victoria 3805 (Australia)], E-mail: Robert.brooks@buseco.monash.edu.au

    2009-05-15

    Using the rolling bicorrelation test statistic, the present paper compares the efficiency of stock markets from China, Korea and Taiwan in selected sub-periods with different price limits regimes. The statistical results do not support the claims that restrictive price limits and price limits per se are jeopardizing market efficiency. However, the evidence does not imply that price limits have no effect on the price discovery process, but rather suggests that market efficiency is not merely determined by price limits.

  10. Statistical studies of animal response data from USF toxicity screening test method

    Science.gov (United States)

    Hilado, C. J.; Machado, A. M.

    1978-01-01

    Statistical examination of animal response data obtained using Procedure B of the USF toxicity screening test method indicates that the data deviate only slightly from a normal or Gaussian distribution. This slight departure from normality is not expected to invalidate conclusions based on theoretical statistics. Comparison of times to staggering, convulsions, collapse, and death as endpoints shows that time to death appears to be the most reliable endpoint because it offers the lowest probability of missed observations and premature judgements.
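
    A quick way to check a response-time endpoint for departure from normality, in the spirit of the examination described above, is a Shapiro-Wilk test; the data below are random stand-ins, not the USF screening results.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        time_to_death = rng.normal(loc=12.0, scale=2.0, size=40)   # stand-in endpoint values

        stat, p = stats.shapiro(time_to_death)                     # Shapiro-Wilk test of normality
        print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")
        if p > 0.05:
            print("no evidence of departure from a normal distribution")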

  11. Tests and Confidence Intervals for an Extended Variance Component Using the Modified Likelihood Ratio Statistic

    DEFF Research Database (Denmark)

    Christensen, Ole Fredslund; Frydenberg, Morten; Jensen, Jens Ledet

    2005-01-01

    The large deviation modified likelihood ratio statistic is studied for testing a variance component equal to a specified value. Formulas are presented in the general balanced case, whereas in the unbalanced case only the one-way random effects model is studied. Simulation studies are presented, showing that the normal approximation to the large deviation modified likelihood ratio statistic gives confidence intervals for variance components with coverage probabilities very close to the nominal confidence coefficient.

  12. Theory, Methods and Tools for Statistical Testing of Pseudo and Quantum Random Number Generators

    OpenAIRE

    Jakobsson, Krister Sune

    2014-01-01

    Statistical random number testing is a well studied field focusing on pseudo-random number generators, that is to say algorithms that produce random-looking sequences of numbers. These generators tend to have certain kinds of flaws, which have been exploited through rigorous testing. Such testing has led to advancements, and today pseudo random number generators are both very high-speed and produce seemingly random numbers. Recent advancements in quantum physics have opened up new doors, wher...
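
    One of the simplest checks applied in statistical random number testing is a frequency (uniformity) test: bin the generator's output and compare the observed counts with those expected under a uniform distribution using a chi-square statistic. The sketch below applies it to NumPy's default generator as a stand-in for the generator under test.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        sample = rng.random(100_000)                     # output of the generator under test

        # Frequency (uniformity) test: compare binned counts with the uniform expectation.
        observed, _ = np.histogram(sample, bins=20, range=(0.0, 1.0))
        chi2, p = stats.chisquare(observed)              # equal expected counts by default
        print(f"chi-square = {chi2:.1f}, p = {p:.3f}")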

  13. [Tests of statistical significance in three biomedical journals: a critical review].

    Science.gov (United States)

    Sarria Castro, Madelaine; Silva Ayçaguer, Luis Carlos

    2004-05-01

    To describe the use of conventional tests of statistical significance and the current trends shown by their use in three biomedical journals read in Spanish-speaking countries. All descriptive or explanatory original articles published in the five-year period of 1996 through 2000 were reviewed in three journals: Revista Cubana de Medicina General Integral [Cuban Journal of Comprehensive General Medicine], Revista Panamericana de Salud Pública/Pan American Journal of Public Health, and Medicina Clínica [Clinical Medicine] (which is published in Spain). In the three journals that were reviewed various shortcomings were found in their use of hypothesis tests based on P values and in the limited use of new tools that have been suggested for use in their place: confidence intervals (CIs) and Bayesian inference. The basic findings of our research were: minimal use of CIs, as either a complement to significance tests or as the only statistical tool; mentions of a small sample size as a possible explanation for the lack of statistical significance; a predominant use of rigid alpha values; a lack of uniformity in the presentation of results; and improper reference in the research conclusions to the results of hypothesis tests. Our results indicate the lack of compliance by authors and editors with accepted standards for the use of tests of statistical significance. The findings also highlight that the stagnant use of these tests continues to be a common practice in the scientific literature.

  14. Statistical tests of conditional independence between responses and/or response times on test items

    NARCIS (Netherlands)

    van der Linden, Willem J.; Glas, Cornelis A.W.

    2010-01-01

    Three plausible assumptions of conditional independence in a hierarchical model for responses and response times on test items are identified. For each of the assumptions, a Lagrange multiplier test of the null hypothesis of conditional independence against a parametric alternative is derived. The t

  15. Application of Non-parametric Statistics in Market Research

    Institute of Scientific and Technical Information of China (English)

    曹小敬

    2007-01-01

    Market research is the activity of collecting, recording, organizing and analyzing data and information related to a firm's business activities, taking the market as its object. For an enterprise, market research is like a doctor diagnosing a patient: without it, there is no way to understand the state of the market and no way to formulate the firm's business strategy.

  16. Mediator, Moderator and Intervening Variables in Marketing Researchs: Conceptualization, Differences and Statistical Procedures and Tests

    Directory of Open Access Journals (Sweden)

    shahriar Azizi

    2013-09-01

    Full Text Available In recent years, marketing theories have developed rapidly. Extending marketing research models accordingly requires attention to variables such as moderators and mediators. Marketing and management researchers need to understand the meaning of mediator, moderator and intervening variables and the appropriate statistical procedures and tests for identifying them. With a clear understanding of these variables, it becomes easier to build more accurate marketing models that fit the real world. This paper presents and compares the meanings of these variables; in the subsequent sections, the corresponding statistical procedures and tests are provided.

  17. A new model test in high energy physics in frequentist and Bayesian statistical formalisms

    CERN Document Server

    Kamenshchikov, Andrey

    2016-01-01

    A problem of testing a new physical model given observed experimental data is a typical one for modern experiments in high energy physics (HEP). A solution to the problem may be provided by two alternative statistical formalisms, namely frequentist and Bayesian, which are widespread in contemporary HEP searches. A characteristic experimental situation is modeled from general considerations, and both approaches are utilized in order to test a new model. The results are juxtaposed, which demonstrates their consistency in this work. The effect of systematic uncertainty treatment in the statistical analysis is also considered.

  18. A new model test in high energy physics in frequentist and Bayesian statistical formalisms

    Science.gov (United States)

    Kamenshchikov, A.

    2017-01-01

    A problem of testing a new physical model given observed experimental data is a typical one for modern experiments in high energy physics (HEP). A solution to the problem may be provided by two alternative statistical formalisms, namely frequentist and Bayesian, which are widespread in contemporary HEP searches. A characteristic experimental situation is modeled from general considerations, and both approaches are utilized in order to test a new model. The results are juxtaposed, which demonstrates their consistency in this work. The effect of systematic uncertainty treatment in the statistical analysis is also considered.

  19. Testing University Rankings Statistically: Why this Perhaps is not such a Good Idea after All. Some Reflections on Statistical Power, Effect Size, Random Sampling and Imaginary Populations

    DEFF Research Database (Denmark)

    Schneider, Jesper Wiborg

    2012-01-01

    In this paper we discuss and question the use of statistical significance tests in relation to university rankings, as recently suggested. We outline the assumptions behind and interpretations of statistical significance tests and relate this to examples from the recent SCImago Institutions Ranking. By use of statistical power analyses and demonstrations of effect sizes, we emphasize that the importance of empirical findings lies in "differences that make a difference" and not in statistical significance tests per se. Finally, we discuss the crucial assumption of randomness and question the presumption that randomness is present in the university ranking data. We conclude that the application of statistical significance tests in relation to university rankings, as recently advocated, is problematic and can be misleading.

  20. Empirical Statistical Power for Testing Multilocus Genotypic Effects under Unbalanced Designs Using a Gibbs Sampler

    Directory of Open Access Journals (Sweden)

    Chaeyoung Lee

    2012-11-01

    Full Text Available Epistasis, which may explain a large portion of the phenotypic variation for complex economic traits of animals, has been ignored in many genetic association studies. A Bayesian method was introduced to draw inferences about multilocus genotypic effects based on their marginal posterior distributions obtained by a Gibbs sampler. A simulation study was conducted to provide statistical powers under various unbalanced designs using this method. Data were simulated by combined designs of number of loci, within-genotype variance, and sample size in unbalanced designs with or without null combined-genotype cells. Mean empirical statistical power was estimated for testing the posterior mean estimate of the combined genotype effect. A practical example of obtaining empirical statistical power estimates with a given sample size was provided under unbalanced designs. The empirical statistical powers would be useful for determining an optimal design when interactive associations of multiple loci with complex phenotypes are examined.
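
    The record's Gibbs-sampler machinery is beyond a short example, but the underlying notion of empirical statistical power, i.e. the fraction of simulated datasets in which an effect is detected, can be sketched with a plain simulation. The unbalanced group sizes, effect size and use of a Welch t-test below are illustrative assumptions, not the Bayesian multilocus model of the paper.

        import numpy as np
        from scipy import stats

        def empirical_power(group_sizes, effect, n_sim=2000, alpha=0.05, seed=0):
            """Fraction of simulated unbalanced two-group datasets in which the effect is detected."""
            rng = np.random.default_rng(seed)
            hits = 0
            for _ in range(n_sim):
                a = rng.normal(0.0, 1.0, size=group_sizes[0])
                b = rng.normal(effect, 1.0, size=group_sizes[1])   # shifted group
                _, p = stats.ttest_ind(a, b, equal_var=False)
                hits += p < alpha
            return hits / n_sim

        print("empirical power:", empirical_power((20, 35), effect=0.5))   # unbalanced design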

  1. A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data

    DEFF Research Database (Denmark)

    Conradsen, Knut; Nielsen, Allan Aasbjerg; Schou, Jesper;

    2003-01-01

    Based on the complex Wishart distribution, a test statistic for the equality of two such matrices and an associated asymptotic probability for obtaining a smaller value of the test statistic are derived and applied successfully to change detection in polarimetric SAR data. In a case study, EMISAR L-band data from April 17 … are used. When applied to HH, VV, or HV data alone, the derived test statistic reduces to the well-known gamma likelihood-ratio test statistic. The derived test statistic and the associated significance value can also be applied as a line or edge detector in fully polarimetric SAR data.

  2. Statistical Tests for the Gaussian Nature of Primordial Fluctuations Through CBR Experiments

    CERN Document Server

    Luo, X

    1994-01-01

    Information about the physical processes that generate the primordial fluctuations in the early universe can be gained by testing the Gaussian nature of the fluctuations through cosmic microwave background radiation (CBR) temperature anisotropy experiments. One of the crucial aspects of density perturbations produced by the standard inflation scenario is that they are Gaussian, whereas seeds produced by topological defects left over from an early cosmic phase transition tend to be non-Gaussian. To carry out this test, sophisticated statistical tools are required. In this paper, we discuss several such statistical tools, including multivariate skewness and kurtosis, Euler-Poincare characteristics, the three-point temperature correlation function, and Hotelling's $T^{2}$ statistic defined through bispectral estimates of a one-dimensional dataset. The effect of noise present in the current data is discussed in detail and the COBE 53 GHz dataset is analyzed. Our analysis shows that, on the large...

  3. An Algorithm to Improve Test Answer Copying Detection Using the Omega Statistic

    Science.gov (United States)

    Maeda, Hotaka; Zhang, Bo

    2017-01-01

    The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…

  4. Statistical methods for the detection of answer copying on achievement tests

    NARCIS (Netherlands)

    Sotaridona, Leonardo Sitchirita

    2003-01-01

    This thesis contains a collection of studies where statistical methods for the detection of answer copying on achievement tests in multiple-choice format are proposed and investigated. Although all methods are suited to detect answer copying, each method is designed to address specific characteristi

  5. Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.

    Science.gov (United States)

    Kieffer, Kevin M.; Thompson, Bruce

    As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significant tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…

  6. Recent Literature on Whether Statistical Significance Tests Should or Should Not Be Banned.

    Science.gov (United States)

    Deegear, James

    This paper summarizes the literature regarding statistical significant testing with an emphasis on recent literature in various discipline and literature exploring why researchers have demonstrably failed to be influenced by the American Psychological Association publication manual's encouragement to report effect sizes. Also considered are…

  7. Statistical Testing of Optimality Conditions in Multiresponse Simulation-based Optimization (Revision of 2005-81)

    NARCIS (Netherlands)

    Bettonvil, B.W.M.; Del Castillo, E.; Kleijnen, J.P.C.

    2007-01-01

    This paper studies simulation-based optimization with multiple outputs. It assumes that the simulation model has one random objective function and must satisfy given constraints on the other random outputs. It presents a statistical procedure for testing whether a specific input combination

  8. Statistical Testing of Optimality Conditions in Multiresponse Simulation-based Optimization (Revision of 2005-81)

    NARCIS (Netherlands)

    Bettonvil, B.W.M.; Del Castillo, E.; Kleijnen, J.P.C.

    2007-01-01

    This paper studies simulation-based optimization with multiple outputs. It assumes that the simulation model has one random objective function and must satisfy given constraints on the other random outputs. It presents a statistical procedure for testing whether a specific input combination (propo

  9. A critical discussion of null hypothesis significance testing and statistical power analysis within psychological research

    DEFF Research Database (Denmark)

    Jones, Allan; Sommerlund, Bo

    2007-01-01

    The uses of null hypothesis significance testing (NHST) and statistical power analysis within psychological research are critically discussed. The article looks at the problems of relying solely on NHST when dealing with small and large sample sizes. The use of power-analysis in estimating...

  10. Connecting Science and Mathematics: The Nature of Scientific and Statistical Hypothesis Testing

    Science.gov (United States)

    Lawson, Anton E.; Oehrtman, Michael; Jensen, Jamie

    2008-01-01

    Confusion persists concerning the roles played by scientific hypotheses and predictions in doing science. This confusion extends to the nature of scientific and statistical hypothesis testing. The present paper utilizes the "If/and/then/Therefore" pattern of hypothetico-deductive (HD) reasoning to explicate the nature of both scientific and…

  11. Water quality analysis in rivers with non-parametric probability distributions and fuzzy inference systems: application to the Cauca River, Colombia.

    Science.gov (United States)

    Ocampo-Duque, William; Osorio, Carolina; Piamba, Christian; Schuhmacher, Marta; Domingo, José L

    2013-02-01

    The integration of water quality monitoring variables is essential in environmental decision making. Nowadays, advanced techniques to manage subjectivity, imprecision, uncertainty, vagueness, and variability are required in such a complex evaluation process. We propose here a probabilistic fuzzy hybrid model to assess river water quality. Fuzzy logic reasoning has been used to compute an integrative water quality index. By applying a Monte Carlo technique based on non-parametric probability distributions, the randomness of the model inputs was estimated. Annual histograms of nine water quality variables were built with monitoring data systematically collected in the Colombian Cauca River, and probability density estimates obtained with the kernel smoothing method were used to fit the data. Several years were assessed, and river sectors upstream and downstream of the city of Santiago de Cali, a big city with basic wastewater treatment and high industrial activity, were analyzed. The probabilistic fuzzy water quality index was able to explain the reduction in water quality as the river receives a larger number of agricultural, domestic, and industrial effluents. The results of the hybrid model were compared to traditional water quality indexes. The main advantage of the proposed method is that it considers flexible boundaries between the linguistic qualifiers used to define the water status, with the membership of water quality in the various output fuzzy sets or classes given as percentiles and histograms, which allows the real water condition to be classified better. The results of this study show that fuzzy inference systems integrated with stochastic non-parametric techniques may be used as complementary tools in water quality indexing methodologies.
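
    The non-parametric Monte Carlo ingredient described above can be sketched as follows: fit a kernel density estimate to a year of monitoring data for one variable and resample from it to propagate input randomness. The gamma-distributed stand-in data and the sample sizes are assumptions for illustration only.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(4)
        dissolved_oxygen = rng.gamma(shape=9.0, scale=0.8, size=365)   # stand-in monitoring data (mg/L)

        kde = stats.gaussian_kde(dissolved_oxygen)        # non-parametric density estimate

        # Monte Carlo step: propagate input randomness by resampling from the fitted density.
        mc_inputs = kde.resample(10_000, seed=5).ravel()
        print("median:", round(float(np.median(mc_inputs)), 2),
              "5th percentile:", round(float(np.percentile(mc_inputs, 5)), 2))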

  12. Reproducibility-optimized test statistic for ranking genes in microarray studies.

    Science.gov (United States)

    Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero

    2008-01-01

    A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows consistently good performance under various simulated conditions and on an Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.

  13. Direct Numerical Test of the Statistical Mechanical Theory of Hydrophobic Interactions

    CERN Document Server

    Chaudhari, M I; Ashbaugh, H S; Pratt, L R

    2013-01-01

    This work tests the statistical mechanical theory of hydrophobic interactions, isolates consequences of excluded volume interactions, and obtains B2 for those purposes. Cavity methods that are particularly appropriate for the study of hydrophobic interactions between atomic-size hard spheres in liquid water are developed and applied to test aspects of the Pratt-Chandler (PC) theory that have not been tested. Contact hydrophobic interactions between Ar-size hard spheres in water are significantly more attractive than predicted by the PC theory. The corresponding results for the osmotic second virial coefficient are attractive (B2 < 0), and more attractive with increasing temperature (ΔB2/ΔT < 0) in the temperature range 300 K < T < 360 K. This information has not been available previously, but is essential for development of the molecular-scale statistical mechanical theory of hydrophobic interactions, particularly for better definition of the role of attractive intermolecular interactions assoc...

  14. A review of statistical methods for testing genetic anticipation: looking for an answer in Lynch syndrome

    DEFF Research Database (Denmark)

    Boonstra, Philip S; Gruber, Stephen B; Raymond, Victoria M

    2010-01-01

    Anticipation, manifested through decreasing age of onset or increased severity in successive generations, has been noted in several genetic diseases. Statistical methods for genetic anticipation range from a simple use of the paired t-test for age of onset restricted to affected parent-child pairs to a recently proposed random effects model which includes extended pedigree data and unaffected family members [Larsen et al., 2009]. A naive use of the paired t-test is biased for the simple reason that age of onset has to be less than the age at ascertainment (interview) for both affected parent and child, and this right truncation effect is more pronounced in children than in parents. In this study, we first review different statistical methods for testing genetic anticipation in affected parent-child pairs that address the issue of bias due to right truncation. Using affected parent-child pair data, we compare these methods…

  15. A review of statistical methods for testing genetic anticipation: looking for an answer in Lynch syndrome

    DEFF Research Database (Denmark)

    Boonstra, Philip S; Gruber, Stephen B; Raymond, Victoria M

    2010-01-01

    Anticipation, manifested through decreasing age of onset or increased severity in successive generations, has been noted in several genetic diseases. Statistical methods for genetic anticipation range from a simple use of the paired t-test for age of onset restricted to affected parent-child pairs to a recently proposed random effects model which includes extended pedigree data and unaffected family members [Larsen et al., 2009]. A naive use of the paired t-test is biased for the simple reason that age of onset has to be less than the age at ascertainment (interview) for both affected parent and child, and this right truncation effect is more pronounced in children than in parents. In this study, we first review different statistical methods for testing genetic anticipation in affected parent-child pairs that address the issue of bias due to right truncation. Using affected parent-child pair data, we compare these methods…

  16. Computing Critical Values of Exact Tests by Incorporating Monte Carlo Simulations Combined with Statistical Tables.

    Science.gov (United States)

    Vexler, Albert; Kim, Young Min; Yu, Jihnhee; Lazar, Nicole A; Hutson, Aland

    2014-12-01

    Various exact tests for statistical inference are available for powerful and accurate decision rules provided that corresponding critical values are tabulated or evaluated via Monte Carlo methods. This article introduces a novel hybrid method for computing p-values of exact tests by combining Monte Carlo simulations and statistical tables generated a priori. To use the data from Monte Carlo generations and tabulated critical values jointly, we employ kernel density estimation within Bayesian-type procedures. The p-values are linked to the posterior means of quantiles. In this framework, we present relevant information from the Monte Carlo experiments via likelihood-type functions, whereas tabulated critical values are used to reflect prior distributions. The local maximum likelihood technique is employed to compute functional forms of prior distributions from statistical tables. Empirical likelihood functions are proposed to replace parametric likelihood functions within the structure of the posterior mean calculations to provide a Bayesian-type procedure with a distribution-free set of assumptions. We derive the asymptotic properties of the proposed nonparametric posterior means of quantiles process. Using the theoretical propositions, we calculate the minimum number of needed Monte Carlo resamples for desired level of accuracy on the basis of distances between actual data characteristics (e.g. sample sizes) and characteristics of data used to present corresponding critical values in a table. The proposed approach makes practical applications of exact tests simple and rapid. Implementations of the proposed technique are easily carried out via the recently developed STATA and R statistical packages.
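
    The Monte Carlo side of the hybrid approach can be reduced to its simplest ingredient, an empirical p-value computed from simulated null statistics; the Bayesian combination with tabulated critical values is not reproduced here. The chi-square null distribution and the observed value below are stand-ins.

        import numpy as np

        def mc_p_value(observed_stat, null_stats):
            """Monte Carlo p-value: share of simulated null statistics at least as extreme."""
            null_stats = np.asarray(null_stats)
            return (1 + np.sum(null_stats >= observed_stat)) / (1 + null_stats.size)

        rng = np.random.default_rng(6)
        null_stats = rng.chisquare(df=3, size=50_000)     # stand-in null distribution of the exact test
        print("p =", round(mc_p_value(observed_stat=11.3, null_stats=null_stats), 4))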

  17. A Multi-Core Parallelization Strategy for Statistical Significance Testing in Learning Classifier Systems.

    Science.gov (United States)

    Rudd, James; Moore, Jason H; Urbanowicz, Ryan J

    2013-11-01

    Permutation-based statistics for evaluating the significance of class prediction, predictive attributes, and patterns of association have only appeared within the learning classifier system (LCS) literature since 2012. While still not widely utilized by the LCS research community, formal evaluations of test statistic confidence are imperative to large and complex real world applications such as genetic epidemiology where it is standard practice to quantify the likelihood that a seemingly meaningful statistic could have been obtained purely by chance. LCS algorithms are relatively computationally expensive on their own. The compounding requirements for generating permutation-based statistics may be a limiting factor for some researchers interested in applying LCS algorithms to real world problems. Technology has made LCS parallelization strategies more accessible and thus more popular in recent years. In the present study we examine the benefits of externally parallelizing a series of independent LCS runs such that permutation testing with cross validation becomes more feasible to complete on a single multi-core workstation. We test our python implementation of this strategy in the context of a simulated complex genetic epidemiological data mining problem. Our evaluations indicate that as long as the number of concurrent processes does not exceed the number of CPU cores, the speedup achieved is approximately linear.
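
    A stripped-down sketch of externally parallelizing independent permutation replicates with Python's multiprocessing module; a trivial difference-in-means statistic stands in for an LCS run, and the worker count and replicate number are arbitrary choices.

        import numpy as np
        from multiprocessing import Pool

        # Module-level data are re-created when spawned workers re-import this file,
        # so a fixed seed keeps every process working on the same dataset.
        _rng = np.random.default_rng(7)
        x = np.r_[_rng.normal(0.3, 1.0, 50), _rng.normal(0.0, 1.0, 50)]
        labels = np.r_[np.ones(50), np.zeros(50)]
        observed = x[labels == 1].mean() - x[labels == 0].mean()

        def one_permutation(seed):
            """One independent permutation replicate (stands in for one LCS run)."""
            rng = np.random.default_rng(seed)
            perm = rng.permutation(labels)
            return x[perm == 1].mean() - x[perm == 0].mean()

        if __name__ == "__main__":
            with Pool(processes=4) as pool:               # keep worker count <= number of CPU cores
                null = pool.map(one_permutation, range(5000))
            p = (1 + sum(d >= observed for d in null)) / (1 + len(null))
            print("permutation p-value:", p)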

  18. A critical discussion of null hypothesis significance testing and statistical power analysis within psychological research

    DEFF Research Database (Denmark)

    Jones, Allan; Sommerlund, Bo

    2007-01-01

    The uses of null hypothesis significance testing (NHST) and statistical power analysis within psychological research are critically discussed. The article looks at the problems of relying solely on NHST when dealing with small and large sample sizes. The use of power analysis in estimating the potential error introduced by small and large samples is advocated. Power analysis is not recommended as a replacement for NHST but as an additional source of information about the phenomena under investigation. Moreover, the importance of conceptual analysis in relation to statistical analysis of hypothesis…

  19. A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.

    Science.gov (United States)

    Stern, Hal S

    2016-01-01

    Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.

  20. Statistical power analysis a simple and general model for traditional and modern hypothesis tests

    CERN Document Server

    Murphy, Kevin R; Wolach, Allen

    2014-01-01

    Noted for its accessible approach, this text applies the latest approaches of power analysis to both null hypothesis and minimum-effect testing using the same basic unified model. Through the use of a few simple procedures and examples, the authors show readers with little expertise in statistical analysis how to obtain the values needed to carry out the power analysis for their research. Illustrations of how these analyses work and how they can be used to choose the appropriate criterion for defining statistically significant outcomes are sprinkled throughout. The book presents a simple and g

  1. Inductive inference or inductive behavior: Fisher and Neyman-Pearson approaches to statistical testing in psychological research (1940-1960).

    Science.gov (United States)

    Halpin, Peter F; Stam, Henderikus J

    2006-01-01

    The application of statistical testing in psychological research over the period of 1940-1960 is examined in order to address psychologists' reconciliation of the extant controversy between the Fisher and Neyman-Pearson approaches. Textbooks of psychological statistics and the psychological journal literature are reviewed to examine the presence of what Gigerenzer (1993) called a hybrid model of statistical testing. Such a model is present in the textbooks, although the mathematically incomplete character of this model precludes the appearance of a similarly hybridized approach to statistical testing in the research literature. The implications of this hybrid model for psychological research and the statistical testing controversy are discussed.

  2. Identifiability and inference of non-parametric rates-across-sites models on large-scale phylogenies

    CERN Document Server

    Mossel, Elchanan

    2011-01-01

    Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation. We also derive sequence-length requirements for high-probability reconstruction. Our main contribution is a novel algorithm that clusters sites according to their mutation rate. Following this site clustering step, standard reconstruction techniques can be used to recover the phylogeny. Our results rely on a basic insight: that, for large trees, certain site statistics experience concentration-of-measure phenomena.

  3. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics.

    Science.gov (United States)

    Kim, Junghi; Bai, Yun; Pan, Wei

    2015-12-01

    We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics without access to individual-level genotype and phenotype data. For most published GWASs, because obtaining summary data is substantially easier than accessing individual-level phenotype and genotype data, while often multiple correlated traits have been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.

  4. Minor differences in haplotype frequency estimates can produce very large differences in heterogeneity test statistics

    Directory of Open Access Journals (Sweden)

    Xu Ke

    2007-06-01

    Full Text Available Abstract Background Tests for association between a haplotype and disease are commonly performed using a likelihood ratio test for heterogeneity between case and control haplotype frequencies. Using data from a study of association between heroin dependence and the DRD2 gene, we obtained estimated haplotype frequencies and the associated likelihood ratio statistic using two different computer programs, MLOCUS and GENECOUNTING. We also carried out permutation testing to assess the empirical significance of the results obtained. Results Both programs yielded similar, though not identical, estimates for the haplotype frequencies. MLOCUS produced a p value of 1.8×10^-15 and GENECOUNTING produced a p value of 5.4×10^-4. Permutation testing produced a p value of 2.8×10^-4. Conclusion The very large differences between the likelihood ratio statistics from the two programs may reflect the fact that the haplotype frequencies for the combined group are not constrained to be equal to the weighted averages of the frequencies for the cases and controls, as they would be if they were directly observed rather than being estimated. Minor differences in haplotype frequency estimates can result in very large differences in the likelihood ratio statistic and associated p value.

  5. The Effects of Sample Size on Expected Value, Variance and Fraser Efficiency for Nonparametric Independent Two Sample Tests

    Directory of Open Access Journals (Sweden)

    Ismet DOGAN

    2015-10-01

    Full Text Available Objective: Choosing the most efficient statistical test is one of the essential problems of statistics. Asymptotic relative efficiency is a notion which enables, in large samples, a quantitative comparison of two different tests used for testing the same statistical hypothesis. The notion of the asymptotic efficiency of tests is more complicated than that of the asymptotic efficiency of estimates. This paper discusses the effect of sample size on the expected values and variances of non-parametric tests for two independent samples and determines the most efficient test for different sample sizes using the Fraser efficiency value. Material and Methods: Since calculating power for every comparison of tests is often impractical, using the asymptotic relative efficiency value is favorable. Asymptotic relative efficiency is an indispensable technique for comparing and ordering statistical tests in large samples. It is especially useful in nonparametric statistics, where there exist numerous heuristic tests such as the linear rank tests. In this study, the sample size is set to 2 ≤ n ≤ 50. Results: In both balanced and unbalanced cases, it is found that, as the sample size increases, the expected values and variances of all the tests discussed in this paper increase as well. Additionally, considering the Fraser efficiency, the Mann-Whitney U test is found to be the most efficient of the non-parametric tests used for comparing two independent samples, regardless of their sizes. Conclusion: According to the Fraser efficiency, the Mann-Whitney U test is the most efficient test.
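
    Applying the Mann-Whitney U test to two small, unbalanced, skewed samples is a one-liner with SciPy; the exponential stand-in data below are illustrative only.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(8)
        group_a = rng.exponential(scale=1.0, size=12)     # small, skewed samples
        group_b = rng.exponential(scale=1.6, size=15)     # unbalanced sizes are allowed

        u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
        print(f"U = {u:.1f}, p = {p:.3f}")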

  6. Role of Statistical tests in Estimation of the Security of a New Encryption Algorithm

    CERN Document Server

    Krishna, Addepalli V N

    2010-01-01

    An encryption scheme basically involves three levels of algorithms: the first handles the encryption mechanism, the second the decryption mechanism, and the third the generation of the keys and sub-keys used in the scheme. In this study, a new algorithm is discussed. The algorithm executes a series of steps and generates a sequence, which is used as a sub-key mapped onto the plain text to generate the cipher text. The strength of the encryption and decryption process depends on how well the generated sequence resists cryptanalysis. In this part of the work, statistical tests such as uniformity tests, universal tests and repetition tests are applied to the generated sequence to assess its strength.

  7. Statistical refinements for data analysis of mollusc reproduction tests: an example with Lymnaea stagnalis

    DEFF Research Database (Denmark)

    Holbech, Henrik

    Since 2012, European experts have been working towards the development and validation of an OECD test guideline for mollusc reproductive toxicity with the freshwater gastropod Lymnaea stagnalis. A ring-test involving six laboratories allowed the reproducibility of results to be studied, based on survival and reproduction data of snails monitored over 56 days of exposure to cadmium. A classical statistical analysis of the data was initially conducted using hypothesis tests and fits of parametric concentration-response models. However, as mortality occurred in exposed snails, these analyses needed to be refined. Our contribution was twofold. First, we refined the statistical analyses of the reproduction data to account for mortality throughout the test period. The variable "number of clutches/eggs produced per individual-day" was used for ECx modelling, as classically done in epidemiology in order to account for the time…

  8. Statistical hypothesis testing and common misinterpretations: Should we abandon p-value in forensic science applications?

    Science.gov (United States)

    Taroni, F; Biedermann, A; Bozza, S

    2016-02-01

    Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons of their advantages and drawbacks are widely available and continue to fuel large debates in the literature. More recently, a controversial discussion was initiated by the editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on so-called p-values, it is of interest to bring the discussion of this journal's decision to the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices.

  9. Testing a statistical method of global mean paleotemperature estimations in a long climate simulation

    Energy Technology Data Exchange (ETDEWEB)

    Zorita, E.; Gonzalez-Rouco, F. [GKSS-Forschungszentrum Geesthacht GmbH (Germany). Inst. fuer Hydrophysik

    2001-07-01

    Current statistical methods of reconstructing the climate of the last centuries are based on statistical models linking climate observations (temperature, sea-level pressure) and proxy-climate data (tree-ring chronologies, ice-core isotope concentrations, varved sediments, etc.). These models are calibrated in the instrumental period, and the longer time series of proxy data are then used to estimate the past evolution of the climate variables. Using such methods, the global mean temperature of the last 600 years has recently been estimated. In this work this method of reconstruction is tested using data from a very long simulation with a climate model. The test allows the errors of the estimates to be quantified as a function of the number of proxy records and indicates the time scales at which the estimates are probably reliable. (orig.)
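
    The calibrate-then-reconstruct idea being tested can be sketched on synthetic data: fit a linear model linking proxies to the simulated temperature over a recent "instrumental" window, apply it to the full record, and measure the error outside the calibration period. All numbers below (series length, calibration window, proxy loadings, noise levels) are assumptions for illustration only.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(9)
        years = 600
        true_temp = np.cumsum(rng.normal(0.0, 0.05, size=years))     # simulated global mean anomaly
        proxies = true_temp[:, None] * np.array([0.8, 1.2, 0.5]) + rng.normal(0.0, 0.3, size=(years, 3))

        calib = slice(years - 150, years)            # "instrumental" period: the last 150 model years
        model = LinearRegression().fit(proxies[calib], true_temp[calib])

        reconstruction = model.predict(proxies)      # estimate temperature over the full simulation
        rmse = np.sqrt(np.mean((reconstruction[:calib.start] - true_temp[:calib.start]) ** 2))
        print("reconstruction RMSE outside the calibration period:", round(float(rmse), 3))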

  10. Statistical analysis of the hen's egg test for micronucleus induction (HET-MN assay).

    Science.gov (United States)

    Hothorn, Ludwig A; Reisinger, Kerstin; Wolf, Thorsten; Poth, Albrecht; Fieblinger, Dagmar; Liebsch, Manfred; Pirow, Ralph

    2013-09-18

    The HET-MN assay (hen's egg test for micronucleus induction) is different from other in vitro genotoxicity assays in that it includes toxicologically important features such as absorption, distribution, metabolic activation, and excretion of the test compound. As a promising follow-up to complement existing in vitro test batteries for genotoxicity, the HET-MN is currently undergoing a formal validation. To optimize the validation, the present study describes a critical analysis of previously obtained HET-MN data to check the experimental design and to identify the most appropriate statistical procedure to evaluate treatment effects. Six statistical challenges (I-VI) of general relevance were identified, and remedies were provided which can be transferred to similarly designed test methods: a Williams-type trend test is proposed for overdispersed counts (II) by means of a square-root transformation which is robust for small sample sizes (I), variance heterogeneity (III), and possible downturn effects at high doses (IV). Due to near-to-zero or even zero-count data occurring in the negative control (V), a conditional comparison of the treatment groups against the mean of the historical controls (VI) instead of the concurrent control was proposed, which is in accordance with US-FDA recommendations. For the modified Williams-type tests, the power can be estimated depending on the magnitude and shape of the trend, the number of dose groups, and the magnitude of the MN counts in the negative control. The experimental design used previously (i.e. six eggs per dose group, scoring of 1000 cells per egg) was confirmed. The proposed approaches are easily available in the statistical computing environment R, and the corresponding R-codes are provided.

  11. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    Science.gov (United States)

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.

  12. Do firms share the same functional form of their growth rate distribution? A new statistical test

    CERN Document Server

    Lunardi, Josè T; Lillo, Fabrizio; Mantegna, Rosario N; Gallegati, Mauro

    2011-01-01

    We introduce a new statistical test of the hypothesis that a balanced panel of firms have the same growth rate distribution or, more generally, that they share the same functional form of growth rate distribution. We applied the test to data on European Union and US publicly quoted manufacturing firms, considering functional forms belonging to the Subbotin family of distributions. While our hypotheses are rejected for the vast majority of sets at the sector level, we cannot reject them at the subsector level, indicating that homogeneous panels of firms could be described by a common functional form of growth rate distribution.

  13. Case Studies for the Statistical Design of Experiments Applied to Powered Rotor Wind Tunnel Tests

    Science.gov (United States)

    Overmeyer, Austin D.; Tanner, Philip E.; Martin, Preston B.; Commo, Sean A.

    2015-01-01

    The application of statistical Design of Experiments (DOE) to helicopter wind tunnel testing was explored during two powered rotor wind tunnel entries during the summers of 2012 and 2013. These tests were performed jointly by the U.S. Army Aviation Development Directorate Joint Research Program Office and NASA Rotary Wing Project Office, currently the Revolutionary Vertical Lift Project, at NASA Langley Research Center located in Hampton, Virginia. Both entries were conducted in the 14- by 22-Foot Subsonic Tunnel with a small portion of the overall tests devoted to developing case studies of the DOE approach as it applies to powered rotor testing. A 16-47 times reduction in the number of data points required was estimated by comparing the DOE approach to conventional testing methods. The average error for the DOE surface response model for the OH-58F test was 0.95 percent and 4.06 percent for drag and download, respectively. The DOE surface response model of the Active Flow Control test captured the drag within 4.1 percent of measured data. The operational differences between the two testing approaches are identified, but did not prevent the safe operation of the powered rotor model throughout the DOE test matrices.

  14. Robust Statistical Tests of Dragon-Kings beyond Power Law Distributions

    CERN Document Server

    Pisarenko, V F

    2011-01-01

    We ask the question whether it is possible to diagnose the existence of "Dragon-Kings" (DK), namely anomalous observations compared to a power law background distribution of event sizes. We present two new statistical tests, the U-test and the DK-test, aimed at identifying the existence of even a single anomalous event in the tail of the distribution of just a few tens of observations. The DK-test in particular is derived such that the p-value of its statistic is independent of the exponent characterizing the null hypothesis. We demonstrate how to apply these two tests on the distributions of cities and of agglomerations in a number of countries. We find the following evidence for Dragon-Kings: London in the distribution of city sizes of Great Britain; Moscow and St-Petersburg in the distribution of city sizes in the Russian Federation; and Paris in the distribution of agglomeration sizes in France. True negatives are also reported, for instance the absence of Dragon-Kings in the distribution of cities in Ger...

  15. A Rank Test on Equality of Population Medians

    Directory of Open Access Journals (Sweden)

    Pooi Ah Hin

    2013-02-01

    Full Text Available The Kruskal-Wallis test is a non-parametric test for the equality of K population medians. The test statistic involved is a measure of the overall closeness of the K average ranks in the individual samples to the average rank in the combined sample. The resulting acceptance region of the test, however, may not be the smallest region with the required acceptance probability under the null hypothesis. Here, an alternative acceptance region is constructed such that it has the smallest size while still having the required acceptance probability. Compared to the Kruskal-Wallis test, the alternative test is found to have larger average power, computed from the powers along evenly chosen directions of deviation of the medians.
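
    The standard Kruskal-Wallis test referred to above is available directly in SciPy; the alternative smallest-size acceptance region proposed in the record is not reproduced here. The three stand-in samples are illustrative.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(10)
        samples = [rng.normal(loc, 1.0, size=20) for loc in (0.0, 0.0, 0.4)]   # K = 3 groups

        h, p = stats.kruskal(*samples)
        print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.3f}")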

  16. Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

    Directory of Open Access Journals (Sweden)

    Garner Harold R

    2009-05-01

    Full Text Available Abstract Background Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states. Results The proposed methodology was applied to a pilot narcolepsy study using logistic regression, hierarchical clustering, t-test, and CART. Consensus, differential mass peaks with high predictive power were identified across three of the four statistical platforms. Based on the diagnostic accuracy measures investigated, the performance of the consensus-peak model was a compromise between logistic regression and CART, which produced better models than hierarchical clustering and t-test. However, consensus peaks confer a higher level of confidence in their ability to distinguish between disease states since they do not represent peaks that are a result of biases to a particular statistical algorithm. Instead, they were selected as differential across differing data distribution assumptions, demonstrating their true discriminatory potential. Conclusion The methodology described here is applicable to any high-resolution MALDI mass spectrometry-derived data set with minimal mass drift which is essential for peak-to-peak comparison studies. Four statistical approaches with differing data distribution assumptions were applied to the same raw data set to obtain consensus peaks that were found to be statistically differential between the two groups compared. These consensus peaks demonstrated high diagnostic accuracy when used to form a predictive model as evaluated by receiver operating characteristics

  17. Confidence intervals permit, but do not guarantee, better inference than statistical significance testing.

    Science.gov (United States)

    Coulson, Melissa; Healey, Michelle; Fidler, Fiona; Cumming, Geoff

    2010-01-01

    A statistically significant result and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST) or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs, respondents who mentioned NHST were 60% likely to conclude, unjustifiably, that the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, that the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform also requires that researchers interpret CIs without recourse to NHST.

  18. Confidence intervals permit, but don't guarantee, better inference than statistical significance testing

    Directory of Open Access Journals (Sweden)

    Melissa Coulson

    2010-07-01

    Full Text Available A statistically significant result and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST) or confidence intervals (CIs). Authors of articles published in psychology, behavioural neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs, respondents who mentioned NHST were 60% likely to conclude, unjustifiably, that the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, that the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform also requires that researchers interpret CIs without recourse to NHST.

  19. Rorschach test: Italian calibration update about statistical frequencies of responses and location sheets

    Directory of Open Access Journals (Sweden)

    Stefano Caruson

    2015-12-01

    Full Text Available Abstract The importance of calibrating a test lies in the formalization of useful statistical norms. The determination of these norms is particularly important for the Rorschach Test, because it allows the estimates of the formal qualities of interpretations to be objectified and helps characterize responses consistent with common perception. The aim of this work is to communicate the new results of a study conducted on Rorschach protocols from a sample of "non-clinical" subjects. The team of experts in psychodiagnostics of CIFRIC (Italian Center for Training, Research and Clinic in Medicine and Psychology) carried out this work by identifying the rate at which the details of each card are interpreted by the normative sample. The data obtained are organized into new Location sheets, which will appear in the next edition of the "Updated Manual of Locations and Coding of Responses to the Rorschach Test". Since the Rorschach Test is one of the more effective means of getting to know a personality, it is fundamental to give the professionals who use it access to updated statistical data that reflect the reference population, from which reliable and objectively valid indications can be derived.

  20. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants

    Science.gov (United States)

    Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.

    2016-01-01

    Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286

  1. Statistic Analysis on Quantitative Characteristics for Developing the DUS Test Guideline of Ranunculus asiaticus L.

    Institute of Scientific and Technical Information of China (English)

    LIU Yan-fang; ZHANG Jian-hua; L Bo; YANG Xiao-hong; LI Yan-gang; WANG Ye; WANG Jiang-min; ZHANG Hui; GUAN Jun-jiao

    2013-01-01

    Selection of quantitative characteristics, division of their expression ranges, and selection of example varieties are key issues in developing DUS Test Guidelines, and they are particularly crucial for quantitative characteristics, whose expressions vary to different degrees. Taking the development of the DUS Test Guideline of Ranunculus asiaticus L. as an example, this paper applied statistics-based approaches to the analysis of quantitative characteristics. We selected 9 quantitative characteristics from 18 pre-selected characteristics, based on within-variety uniformity, stability between different growing cycles, and correlation among characteristics, by analyses of the coefficient of variation, paired-samples t-tests and partial correlation. The expression ranges of the 9 selected quantitative characteristics were divided into different states using descriptive statistics and the distribution frequency of varieties. Eight of the 9 selected quantitative characteristics were categorized as standard characteristics, as they showed one peak in the distribution frequency of 120 varieties across the expressions of the characteristic, whereas plant height can be categorized as a grouping characteristic, since it gave two peaks and can group the varieties into pot and cut varieties. Finally, box-plots were applied to visually select the example varieties, and varieties 7, 12, and 28 were determined as the example varieties for plant height. The methods described in this paper are effective for the selection of quantitative characteristics, division of expression ranges, and selection of example varieties in Ranunculus asiaticus L. for DUS testing, and may also be of interest for other plant genera.

  2. Non-parametric Method Used in DNB Propagation Analysis

    Institute of Scientific and Technical Information of China (English)

    刘俊强; 黄禹

    2014-01-01

    Determining the probability distribution of fuel rod internal pressure is a fundamental step in DNB (departure from nucleate boiling) propagation analysis with the Monte Carlo method. The traditional parametric approach assumes that the internal pressure of all rods follows a normal distribution. This is not always the case; the actual distribution can differ substantially from a normal one. To overcome this limitation, a non-parametric method was applied to the rod internal pressure data, since it is generally applicable and achieves good precision for large samples. Applying this treatment to pressurized water reactor fuel rod internal pressure data and carrying the resulting distribution through the DNB propagation analysis shows that the non-parametric method yields more conservative results than the parametric method.
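
    A minimal sketch of the contrast between the parametric and non-parametric treatments described above, using synthetic skewed "rod pressure" data and a Gaussian kernel density estimate as a stand-in for whichever non-parametric estimator the analysis actually employs:

```python
# Contrast a fitted normal with a non-parametric (KDE) estimate of a skewed
# pressure distribution.  Data are synthetic; parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
pressure = rng.gamma(shape=8.0, scale=1.5, size=2000)   # skewed "rod pressure" data

mu, sigma = pressure.mean(), pressure.std(ddof=1)       # parametric (normal) fit
kde = stats.gaussian_kde(pressure)                      # non-parametric fit

threshold = np.quantile(pressure, 0.95)                 # a bounding pressure
print("P(pressure > threshold), normal fit:", stats.norm.sf(threshold, mu, sigma))
print("P(pressure > threshold), KDE fit   :", kde.integrate_box_1d(threshold, np.inf))
```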

  3. Individual Differences in Male Rats in a Behavioral Test Battery: A Multivariate Statistical Approach

    Science.gov (United States)

    Feyissa, Daniel D.; Aher, Yogesh D.; Engidawork, Ephrem; Höger, Harald; Lubec, Gert; Korz, Volker

    2017-01-01

    Animal models for anxiety, depressive-like and cognitive diseases or aging often involve testing of subjects in behavioral test batteries. The large number of test variables, with different mean variations and within- and between-test correlations, often makes it difficult to determine which variables are essential for assessing behavioral patterns and their variation in individual animals, and which statistical treatment is appropriate. We therefore applied a multivariate approach (principal component analysis) to the behavioral data of 162 male adult Sprague-Dawley rats that underwent a behavioral test battery including commonly used tests for spatial learning and memory (holeboard) and different behavioral patterns (open field, elevated plus maze, forced swim test) as well as for motor abilities (Rotarod). The high-dimensional behavioral results were reduced to fewer components associated with spatial cognition, general activity, anxiety- and depression-like behavior, and motor ability. The loading scores of individual rats on these components allow an assessment of the distribution of individual features in a population of animals. The reduced number of components can also be used for statistical calculations, such as determining appropriate sample sizes for valid discrimination between experimental groups, which would otherwise have to be done for each variable. Because the animals were intact, untreated and experimentally naïve, the results reflect trait patterns of behavior and thus individuality. The distribution of animals with high or low levels of anxiety, depressive-like behavior, general activity and cognitive features in a local population provides information on the probability of their appearance in experimental samples and thus may help to avoid biases. However, such an analysis initially requires a large cohort of animals in order to obtain a valid assessment.
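
    A minimal sketch of the dimensionality reduction described above, assuming an animals-by-variables matrix; the variable names and data are hypothetical stand-ins for the battery's measures.

```python
# Standardize the behavioral variables, run PCA, and inspect loadings and
# per-animal component scores.  Data are random placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# rows = animals, columns = behavioral variables (e.g. holeboard errors,
# open-field distance, open-arm time, immobility time, Rotarod latency)
X = rng.normal(size=(162, 5))

Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(Z)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("loadings (components x variables):\n", pca.components_)
scores = pca.transform(Z)          # per-animal scores on the components
```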

  4. Selecting variables in non-parametric regression models for binary response. An application to the computerized detection of breast cancer.

    Science.gov (United States)

    Roca-Pardiñas, Javier; Cadarso-Suárez, Carmen; Tahoces, Pablo G; Lado, María J

    2009-01-30

    In many biomedical applications, interest lies in being able to distinguish between two possible states of a given response variable, depending on the values of certain continuous predictors. If the number of predictors, p, is high, or if there is redundancy among them, it then becomes important to decide on the selection of the best subset of predictors that will be able to obtain the models with greatest discrimination capacity. With this aim in mind, logistic generalized additive models were considered and receiver operating characteristic (ROC) curves were applied in order to determine and compare the discriminatory capacity of such models. This study sought to develop bootstrap-based tests that allow for the following to be ascertained: (a) the optimal number q ≤ p of predictors; and (b) the model or models including q predictors, which display the largest AUC (area under the ROC curve). A simulation study was conducted to verify the behaviour of these tests. Finally, the proposed method was applied to a computer-aided diagnostic system dedicated to early detection of breast cancer. Copyright (c) 2008 John Wiley & Sons, Ltd.
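
    A rough sketch of the kind of bootstrap comparison described above. Logistic regression stands in for the generalized additive models used in the paper, and this simple resampling scheme only approximates the proposed tests; the column subsets and parameters are illustrative.

```python
# Bootstrap distribution of the AUC difference between two predictor subsets.
# Models are refit on each bootstrap sample and scored on the original data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def bootstrap_auc_diff(X, y, subset_a, subset_b, n_boot=500, seed=0):
    """X: numpy array (n, p); y: binary numpy array (n,); subsets: column index lists."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))        # bootstrap resample
        if y[idx].min() == y[idx].max():             # need both classes to fit
            continue
        aucs = []
        for cols in (subset_a, subset_b):
            model = LogisticRegression(max_iter=1000).fit(X[idx][:, cols], y[idx])
            aucs.append(roc_auc_score(y, model.predict_proba(X[:, cols])[:, 1]))
        diffs.append(aucs[0] - aucs[1])
    return np.array(diffs)
```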

  5. Statistical auditing and randomness test of lotto k/N-type games

    CERN Document Server

    Coronel-Brizio, H F; Rapallo, Fabio; Scalas, Enrico

    2008-01-01

    One of the most popular lottery games worldwide is the so-called "lotto k/N". It considers N numbers 1,2,...,N from which k are drawn randomly, without replacement. A player selects k or more numbers and the first prize is shared amongst those players whose selected numbers match all of the k randomly drawn. Exact rules may vary in different countries. In this paper, mean values and covariances for the random variables representing the numbers drawn from this kind of game are presented, with the aim of using them to audit statistically the consistency of a given sample of historical results with theoretical values coming from a hypergeometric statistical model. The method can be adapted to test pseudorandom number generators.
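
    The theoretical moments referred to above follow from standard simple-random-sampling results: each drawn number has mean (N+1)/2 and variance (N²-1)/12, and two distinct drawn numbers have covariance -(N+1)/12. The sketch below checks these against a synthetic "history" of draws; auditing real lottery records would substitute the actual historical sample.

```python
# Theoretical first and second moments for k numbers drawn without replacement
# from 1..N, compared with sample statistics of simulated draws.
import numpy as np

N, k = 49, 6
mean_theory = (N + 1) / 2                    # E[X_i]
var_theory = (N ** 2 - 1) / 12               # Var(X_i)
cov_theory = -(N + 1) / 12                   # Cov(X_i, X_j), i != j

rng = np.random.default_rng(1)
draws = np.array([rng.choice(np.arange(1, N + 1), size=k, replace=False)
                  for _ in range(5000)])

print("mean: theory", mean_theory, "sample", draws.mean())
print("var : theory", var_theory, "sample", draws.var(ddof=1))
```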

  6. Statistical test of Duane-Hunt's law and its comparison with an alternative law

    CERN Document Server

    Perkovac, Milan

    2010-01-01

    Using the Pearson correlation coefficient, a statistical analysis of Duane-Hunt and Kulenkampff's measurement results was performed. This analysis reveals that the empirically based Duane-Hunt law is not entirely consistent with the measurement data. The author has theoretically found the action of electromagnetic oscillators, which corresponds to Planck's constant, and also has found an alternative law based on the classical theory. Using the same statistical method, this alternative law is likewise tested, and it is proved that the alternative law is completely in accordance with the measurements. The alternative law gives a relativistic expression for the energy of an electromagnetic wave emitted or absorbed by atoms and proves that the empirically derived Planck-Einstein expression is only valid for relatively low frequencies. A wave equation, which is similar to the Schrödinger equation, and the wavelength of the standing electromagnetic wave are also established by the author's analysis. For a relatively low energy...

  7. Statistical auditing and randomness test of lotto k/N-type games

    Science.gov (United States)

    Coronel-Brizio, H. F.; Hernández-Montoya, A. R.; Rapallo, F.; Scalas, E.

    2008-11-01

    One of the most popular lottery games worldwide is the so-called “lotto k/N”. It considers N numbers 1,2,…,N from which k are drawn randomly, without replacement. A player selects k or more numbers and the first prize is shared amongst those players whose selected numbers match all of the k randomly drawn. Exact rules may vary in different countries. In this paper, mean values and covariances for the random variables representing the numbers drawn from this kind of game are presented, with the aim of using them to audit statistically the consistency of a given sample of historical results with theoretical values coming from a hypergeometric statistical model. The method can be adapted to test pseudorandom number generators.

  8. Mulcom: a multiple comparison statistical test for microarray data in Bioconductor

    Directory of Open Access Journals (Sweden)

    Renzulli Tommaso

    2011-09-01

    Full Text Available Abstract Background Many microarray experiments search for genes with differential expression between a common "reference" group and multiple "test" groups. In such cases currently employed statistical approaches based on t-tests or close derivatives have limited efficacy, mainly because estimation of the standard error is done on only two groups at a time. Alternative approaches based on ANOVA correctly capture within-group variance from all the groups, but then do not confront single test groups with the reference. Ideally, a t-test better suited for this type of data would compare each test group with the reference, but use within-group variance calculated from all the groups. Results We implemented an R-Bioconductor package named Mulcom, with a statistical test derived from Dunnett's t-test, designed to compare multiple test groups individually against a common reference. Interestingly, Dunnett's test uses, for the denominator of each comparison, a within-group standard error aggregated from all the experimental groups. In addition to the basic Dunnett's t value, the package includes an optional minimal fold-change threshold, m. Due to the automated, permutation-based estimation of the False Discovery Rate (FDR), the package also permits fast optimization of the test, to obtain the maximum number of significant genes at a given FDR value. When applied to a time-course experiment profiled in parallel on two microarray platforms, and compared with two commonly used tests, Mulcom displayed better concordance of significant genes in the two array platforms (39% vs. 26% or 15%) and higher enrichment in functional annotation to categories related to the biology of the experiment (p value Conclusions The Mulcom package provides a powerful tool for the identification of differentially expressed genes when several experimental conditions are compared against a common reference. The results of the practical example presented here show that lists of
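
    The key ingredient, a Dunnett-style contrast of each test group against a common reference with the within-group variance pooled over all groups, can be sketched as follows. This is plain NumPy, not the Mulcom package itself (an R/Bioconductor package), and it omits the fold-change threshold and the permutation-based FDR estimation.

```python
# Dunnett-type t statistics: each test group vs. a common reference, using a
# within-group variance pooled over *all* groups for the standard error.
import numpy as np

def dunnett_like_t(reference, test_groups):
    """reference: 1-D array; test_groups: list of 1-D arrays (one per test group)."""
    groups = [np.asarray(reference, float)] + [np.asarray(g, float) for g in test_groups]
    df = sum(len(g) - 1 for g in groups)
    pooled_var = sum((len(g) - 1) * g.var(ddof=1) for g in groups) / df
    ref_mean, n_ref = groups[0].mean(), len(groups[0])
    t_values = [(g.mean() - ref_mean) /
                np.sqrt(pooled_var * (1 / len(g) + 1 / n_ref))
                for g in groups[1:]]
    return np.array(t_values)
```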

  9. Drug-excipient compatibility testing using a high-throughput approach and statistical design.

    Science.gov (United States)

    Wyttenbach, Nicole; Birringer, Christian; Alsenz, Jochem; Kuentz, Martin

    2005-01-01

    The aim of our research was to develop a miniaturized high-throughput drug-excipient compatibility test. Experiments were planned and evaluated using statistical experimental design. Binary mixtures of a drug (acetylsalicylic acid or fluoxetine hydrochloride) and of excipients commonly used in solid dosage forms were prepared at a ratio of approximately 1:100 in 96-well microtiter plates. Samples were exposed to different temperatures (40 °C / 50 °C) and humidities (10% / 75%) for different times (1 week / 4 weeks), and chemical drug degradation was analyzed using fast-gradient high-pressure liquid chromatography (HPLC). Categorical statistical design was applied to identify the effects and interactions of time, temperature, humidity, and excipient on drug degradation. Acetylsalicylic acid was least stable in the presence of magnesium stearate, dibasic calcium phosphate, or sodium starch glycolate. Fluoxetine hydrochloride exhibited a marked degradation only with lactose. Factor-interaction plots revealed that the relative humidity had the strongest effect on the drug-excipient blends tested. In conclusion, the developed technique enables fast drug-excipient compatibility testing and identification of interactions. Since only 0.1 mg of drug is needed per data point, fast rational preselection of the pharmaceutical additives can be performed early in solid dosage form development.
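
    A minimal sketch of estimating main effects from the 2x2x2 factorial layout described above (time, temperature, humidity), with drug degradation as the response for one drug-excipient blend. The numbers are invented, and a real analysis would also examine factor interactions.

```python
# Main effects from a 2x2x2 full factorial: average response at the high level
# of each factor minus the average at the low level.  Values are illustrative.
import itertools
import numpy as np

levels = {"time": ("1w", "4w"), "temp": ("40C", "50C"), "humidity": ("10%", "75%")}
runs = list(itertools.product(*levels.values()))                   # 8 factorial runs
degradation = np.array([0.2, 0.4, 0.3, 0.7, 1.1, 2.0, 1.5, 4.2])   # % drug degraded

for i, name in enumerate(levels):
    low = degradation[[r[i] == levels[name][0] for r in runs]].mean()
    high = degradation[[r[i] == levels[name][1] for r in runs]].mean()
    print(f"main effect of {name}: {high - low:+.2f} % degradation")
```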

  10. SNSequate: Standard and Nonstandard Statistical Models and Methods for Test Equating

    Directory of Open Access Journals (Sweden)

    Jorge González

    2014-09-01

    Full Text Available Equating is a family of statistical models and methods that are used to adjust scores on two or more versions of a test, so that the scores from different tests may be used interchangeably. In this paper we present the R package SNSequate, which implements both standard and nonstandard statistical models and methods for test equating. The construction of the package was motivated by the need for modular, simple, yet comprehensive and general software that carries out both traditional and new equating methods. SNSequate currently implements the traditional mean, linear and equipercentile equating methods, as well as the mean-mean, mean-sigma, Haebara and Stocking-Lord item response theory linking methods. It also supports the newest methods such as local equating, kernel equating, and item response theory parameter linking methods based on asymmetric item characteristic functions. Practical examples are given to illustrate the capabilities of the software. A list of other programs for equating is presented, highlighting the main differences between them. Future directions for the package are also discussed.
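
    The simplest of the methods listed above, mean and linear equating, reduce to closed-form score transformations. SNSequate itself is an R package, so the following is only a language-agnostic sketch of those classical formulas.

```python
# Classical mean and linear equating of a form-X score to the form-Y scale.
import numpy as np

def mean_equate(x, scores_x, scores_y):
    """Shift a form-X score x so that the two forms have equal means."""
    return x + (np.mean(scores_y) - np.mean(scores_x))

def linear_equate(x, scores_x, scores_y):
    """Match both the mean and the standard deviation of the two forms."""
    mx, my = np.mean(scores_x), np.mean(scores_y)
    sx, sy = np.std(scores_x, ddof=1), np.std(scores_y, ddof=1)
    return my + (sy / sx) * (x - mx)
```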

  11. A Statistical Testing Approach for Quantifying Software Reliability; Application to an Example System

    Energy Technology Data Exchange (ETDEWEB)

    Chu, Tsong-Lun [Brookhaven National Lab. (BNL), Upton, NY (United States); Varuttamaseni, Athi [Brookhaven National Lab. (BNL), Upton, NY (United States); Baek, Joo-Seok [Brookhaven National Lab. (BNL), Upton, NY (United States)

    2016-11-01

    The U.S. Nuclear Regulatory Commission (NRC) encourages the use of probabilistic risk assessment (PRA) technology in all regulatory matters, to the extent supported by the state-of-the-art in PRA methods and data. Although much has been accomplished in the area of risk-informed regulation, risk assessment for digital systems has not been fully developed. The NRC established a plan for research on digital systems to identify and develop methods, analytical tools, and regulatory guidance for (1) including models of digital systems in the PRAs of nuclear power plants (NPPs), and (2) incorporating digital systems in the NRC's risk-informed licensing and oversight activities. Under NRC's sponsorship, Brookhaven National Laboratory (BNL) explored approaches for addressing the failures of digital instrumentation and control (I and C) systems in the current NPP PRA framework. Specific areas investigated included PRA modeling of digital hardware, development of a philosophical basis for defining software failure, and identification of desirable attributes of quantitative software reliability methods. Based on the earlier research, statistical testing is considered a promising method for quantifying software reliability. This paper describes a statistical software testing approach for quantifying software reliability and applies it to the loop-operating control system (LOCS) of an experimental loop of the Advanced Test Reactor (ATR) at Idaho National Laboratory (INL).
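
    A standard quantitative argument in statistical testing of software gives a sense of the numbers involved: if n independent demands sampled from the operational profile all succeed, a (1 - alpha) upper confidence bound on the per-demand failure probability is 1 - alpha^(1/n). This is general background, not necessarily the exact procedure applied to the ATR loop-operating control system.

```python
# Failure-free statistical testing: confidence bound on the per-demand failure
# probability and the number of tests needed for a target bound.
import math

def failure_prob_upper_bound(n_tests, confidence=0.95):
    """Upper bound on failure probability after n_tests failure-free demands."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_tests)

def tests_needed(target_failure_prob, confidence=0.95):
    """Failure-free demands needed to demonstrate the target bound."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(1.0 - target_failure_prob))

print(failure_prob_upper_bound(2995))   # ~1e-3 at 95% confidence
print(tests_needed(1e-3))               # ~2995 failure-free tests
```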

  12. Using Relative Statistics and Approximate Disease Prevalence to Compare Screening Tests.

    Science.gov (United States)

    Samuelson, Frank; Abbey, Craig

    2016-11-01

    Schatzkin et al. and other authors demonstrated that the ratios of some conditional statistics such as the true positive fraction are equal to the ratios of unconditional statistics, such as disease detection rates, and therefore we can calculate these ratios between two screening tests on the same population even if negative test patients are not followed with a reference procedure and the true and false negative rates are unknown. We demonstrate that this same property applies to an expected utility metric. We also demonstrate that while simple estimates of relative specificities and relative areas under ROC curves (AUC) do depend on the unknown negative rates, we can write these ratios in terms of disease prevalence, and the dependence of these ratios on a posited prevalence is often weak particularly if that prevalence is small or the performance of the two screening tests is similar. Therefore we can estimate relative specificity or AUC with little loss of accuracy, if we use an approximate value of disease prevalence.
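
    The two points above follow from elementary probability and can be sketched numerically. With detection rates d = P(test positive and diseased) and overall positive rates r = P(test positive), the ratio of sensitivities of two tests is d1/d2, with prevalence cancelling, while relative specificity needs a posited prevalence. The rates below are invented for illustration.

```python
# Relative sensitivity from detection rates (prevalence-free) and relative
# specificity as a function of a posited prevalence pi.
d1, d2 = 0.004, 0.003      # detection rates of screening tests 1 and 2
r1, r2 = 0.080, 0.060      # overall test-positive (recall) rates

relative_sensitivity = d1 / d2           # = (d1/pi) / (d2/pi)

def relative_specificity(pi):
    spec1 = 1 - (r1 - d1) / (1 - pi)     # specificity = 1 - P(T+ | not diseased)
    spec2 = 1 - (r2 - d2) / (1 - pi)
    return spec1 / spec2

print("relative sensitivity:", relative_sensitivity)
for pi in (0.005, 0.01, 0.02):           # dependence is weak for small pi
    print(pi, round(relative_specificity(pi), 4))
```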

  13. A novel non-parametric method for uncertainty evaluation of correlation-based molecular signatures: its application on PAM50 algorithm.

    Science.gov (United States)

    Fresno, Cristóbal; González, Germán Alexis; Merino, Gabriela Alejandra; Flesia, Ana Georgina; Podhajcer, Osvaldo Luis; Llera, Andrea Sabina; Fernández, Elmer Andrés

    2017-03-01

    The PAM50 classifier is used to assign patients to the most highly correlated breast cancer subtype, irrespective of the obtained correlation value. Nonetheless, all subtype correlations are required to build the risk of recurrence (ROR) score, currently used in therapeutic decisions. Present subtype uncertainty estimations are not accurate, are seldom considered, or require a population-based approach in this context. Here we present a novel single-subject non-parametric uncertainty estimation based on permutations of PAM50's gene labels. Simulation results (n = 5228) showed that only 61% of subjects can be reliably 'Assigned' to a PAM50 subtype, whereas 33% should be 'Not Assigned' (NA), leaving the rest with tight 'Ambiguous' correlations between subtypes. Excluding the NA subjects from the analysis improved the discrimination of subtype survival curves, yielding a higher proportion of low and high ROR values. Conversely, all NA subjects showed similar survival behaviour regardless of the original PAM50 assignment. We propose to incorporate our PAM50 uncertainty estimation to support therapeutic decisions. Source code can be found in the 'pbcmc' R package at Bioconductor. cristobalfresno@gmail.com or efernandez@bdmg.com.ar. Supplementary data are available at Bioinformatics online.
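
    The gene-label permutation idea can be sketched as follows: correlate a single subject's profile with each subtype centroid, then permute the gene labels of the profile to build a null distribution for the winning correlation. Centroids, cut-offs and category names here are placeholders rather than the actual PAM50 values used by 'pbcmc', and the 'Ambiguous' category for tight between-subtype correlations is omitted.

```python
# Single-subject permutation-based uncertainty for a correlation classifier.
import numpy as np

def assign_with_uncertainty(profile, centroids, n_perm=1000, alpha=0.05, seed=0):
    """profile: (genes,) expression vector; centroids: dict subtype -> (genes,) vector."""
    rng = np.random.default_rng(seed)
    corr = {s: np.corrcoef(profile, c)[0, 1] for s, c in centroids.items()}
    best = max(corr, key=corr.get)
    null = np.array([np.corrcoef(rng.permutation(profile), centroids[best])[0, 1]
                     for _ in range(n_perm)])           # gene-label permutations
    p = (1 + np.sum(null >= corr[best])) / (n_perm + 1)
    label = best if p < alpha else "Not Assigned"
    return label, corr, p
```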

  14. A Critical Look at the Mass-Metallicity-SFR Relation in the Local Universe: Non-parametric Analysis Framework and Confounding Systematics

    CERN Document Server

    Salim, Samir; Ly, Chun; Brinchmann, Jarle; Davé, Romeel; Dickinson, Mark; Salzer, John J; Charlot, Stéphane

    2014-01-01

    It has been proposed that the mass-metallicity relation of galaxies exhibits a secondary dependence on star formation rate (SFR), and that the resulting M-Z-SFR relation may be redshift-invariant, i.e., "fundamental." However, conflicting results on the character of the SFR dependence, and whether it exists, have been reported. To gain insight into the origins of the conflicting results, we (a) devise a non-parametric, astrophysically-motivated analysis framework based on the offset from the star-forming ("main") sequence at a given stellar mass (relative specific SFR), (b) apply this methodology and perform a comprehensive re-analysis of the local M-Z-SFR relation, based on SDSS, GALEX, and WISE data, and (c) study the impact of sample selection, and of using different metallicity and SFR indicators. We show that metallicity is anti-correlated with specific SFR regardless of the indicators used. We do not find that the relation is spurious due to correlations arising from biased metallicity measurements, or ...

  15. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

    Directory of Open Access Journals (Sweden)

    von Reumont Björn M

    2010-03-01

    Full Text Available Abstract Background Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstruction, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE), based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold, the choice of which is arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. Results ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. When tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding-window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Conclusions Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment

  16. SPECIES-SPECIFIC FOREST VARIABLE ESTIMATION USING NON-PARAMETRIC MODELING OF MULTI-SPECTRAL PHOTOGRAMMETRIC POINT CLOUD DATA

    Directory of Open Access Journals (Sweden)

    J. Bohlin

    2012-07-01

    Full Text Available The recent development in software for automatic photogrammetric processing of multispectral aerial imagery, and the growing nation-wide availability of Digital Elevation Model (DEM) data, are about to revolutionize data capture for forest management planning in Scandinavia. Using only already available aerial imagery and ALS-assessed DEM data, raster estimates of the forest variables mean tree height, basal area, total stem volume, and species-specific stem volumes were produced and evaluated. The study was conducted at a coniferous hemi-boreal test site in southern Sweden (lat. 58° N, long. 13° E). Digital aerial images from the Zeiss/Intergraph Digital Mapping Camera system were used to produce 3D point-cloud data with spectral information. Metrics were calculated for 696 field plots (10 m radius) from point-cloud data and used in k-MSN to estimate forest variables. For these stands, the tree height ranged from 1.4 to 33.0 m (18.1 m mean), stem volume from 0 to 829 m³ ha⁻¹ (249 m³ ha⁻¹ mean), and basal area from 0 to 62.2 m² ha⁻¹ (26.1 m² ha⁻¹ mean), with a mean stand size of 2.8 ha. Estimates made using digital aerial images corresponding to the standard acquisition of the Swedish National Land Survey (Lantmäteriet) showed RMSEs (in percent of the surveyed stand mean) of 7.5% for tree height, 11.4% for basal area, 13.2% for total stem volume, 90.6% for pine stem volume, 26.4% for spruce stem volume, and 72.6% for deciduous stem volume. The results imply that photogrammetric matching of digital aerial images has significant potential for operational use in forestry.
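
    The accuracy figures quoted above are RMSEs expressed as a percentage of the surveyed stand mean; a small sketch of that computation, with invented numbers:

```python
# Relative RMSE: root-mean-square error of the estimates against surveyed
# values, expressed as a percentage of the surveyed mean.
import numpy as np

def relative_rmse(estimated, observed):
    estimated, observed = np.asarray(estimated, float), np.asarray(observed, float)
    rmse = np.sqrt(np.mean((estimated - observed) ** 2))
    return 100.0 * rmse / observed.mean()

surveyed_height = np.array([17.2, 21.5, 14.8, 19.9, 23.1])    # m, field survey
estimated_height = np.array([16.5, 22.3, 15.6, 19.0, 24.0])   # m, photogrammetric
print(f"RMSE = {relative_rmse(estimated_height, surveyed_height):.1f}% of the mean")
```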

  17. Change detection in a time series of polarimetric SAR data by an omnibus test statistic and its factorization

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Conradsen, Knut; Skriver, Henning

    2016-01-01

    Based on an omnibus likelihood ratio test statistic for the equality of several variance-covariance matrices following the complex Wishart distribution with an associated p-value and a factorization of this test statistic, change analysis in a short sequence of multilook, polarimetric SAR data...

  18. An omnibus likelihood test statistic and its factorization for change detection in time series of polarimetric SAR data

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Conradsen, Knut; Skriver, Henning

    2016-01-01

    Based on an omnibus likelihood ratio test statistic for the equality of several variance-covariance matrices following the complex Wishart distribution with an associated p-value and a factorization of this test statistic, change analysis in a short sequence of multilook, polarimetric SAR data...

  19. A statistical test on the reliability of the non-coevality of stars in binary systems

    CERN Document Server

    Valle, G; Moroni, P G Prada; Degl'Innocenti, S

    2016-01-01

    We develop a statistical test on the difference in age estimates of two coeval stars in detached double-lined eclipsing binary systems that is caused by observational uncertainties alone. We focus on main-sequence stars in the mass range [0.8, 1.6] Msun. The ages were obtained by means of the maximum-likelihood SCEPtER technique. The observational constraints used in the recovery procedure are stellar mass, radius, effective temperature, and metallicity [Fe/H]. We define the statistic W as the ratio of the absolute difference of the estimated ages of the two stars to the age of the older one, and determine the critical values of this statistic above which coevality can be rejected. The median expected difference in reconstructed age between the coeval stars of a binary system, caused by observational uncertainties alone, shows a strong dependence on the evolutionary stage, ranging from about 20% for an evolved primary star to about 75% for a near Z...
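
    A sketch of the W statistic defined above and of how a critical value could be obtained by Monte Carlo, perturbing a common true age with a relative age-recovery error; the 20% error scale and lognormal error model are placeholders, not the paper's calibrated uncertainties.

```python
# W = |age1 - age2| / max(age1, age2) for a truly coeval pair whose ages are
# recovered with noise; the upper quantile of W gives a rejection threshold.
import numpy as np

def W(age1, age2):
    return abs(age1 - age2) / max(age1, age2)

rng = np.random.default_rng(0)
true_age = 4.0                                     # Gyr, hypothetical coeval pair
rel_err = 0.20                                     # assumed relative recovery error
a1 = true_age * np.exp(rel_err * rng.standard_normal(100_000))  # keeps ages positive
a2 = true_age * np.exp(rel_err * rng.standard_normal(100_000))
w_null = np.abs(a1 - a2) / np.maximum(a1, a2)

print("critical value of W at the 5% level:", np.quantile(w_null, 0.95))
```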

  20. Application of a generalized likelihood ratio test statistic to MAGIC data

    CERN Document Server

    Klepser, S; 10.1063/1.4772359

    2012-01-01

    The commonly used detection test statistic for Cherenkov telescope data is Li & Ma (1983), Eq. 17. It evaluates the compatibility of event counts in an on-source region with those in a representative off-region. It does not exploit the typically known gamma-ray point spread function (PSF) of a system, and in practice its application requires either assumptions on the symmetry of the acceptance across the field of view, or Monte Carlo simulations. MAGIC has an azimuth-dependent, asymmetric acceptance, which required a careful review of detection statistics. Besides an adapted Li & Ma based technique, the recently presented generalized LRT statistic of [1] is now in use. It is more flexible, more sensitive, and less affected by systematics, because it is highly customized for multi-pointing Cherenkov telescope data with a known PSF. We present the application of this new method to archival MAGIC data and compare it to the other, Li & Ma based method.
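
    For reference, Li & Ma (1983), Eq. 17, mentioned above, is straightforward to compute from the on-source and off-source counts and the exposure ratio alpha; the counts in the example call are invented.

```python
# Li & Ma (1983), Eq. 17: detection significance from on/off counts with
# exposure ratio alpha = t_on / t_off.
import numpy as np

def li_ma_significance(n_on, n_off, alpha):
    term_on = n_on * np.log((1 + alpha) / alpha * n_on / (n_on + n_off))
    term_off = n_off * np.log((1 + alpha) * n_off / (n_on + n_off))
    return np.sqrt(2.0) * np.sqrt(term_on + term_off)

print(li_ma_significance(n_on=140, n_off=500, alpha=0.2))   # approx. 3.4 sigma
```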