WorldWideScience

Sample records for non-parametric statistics univariate

  1. Using Mathematica to build Non-parametric Statistical Tables

    Directory of Open Access Journals (Sweden)

    Gloria Perez Sainz de Rozas

    2003-01-01

    Full Text Available In this paper, I present computational procedures to obtain statistical tables. Tables are produced for the asymptotic and the exact distribution of the Kolmogorov-Smirnov statistic Dn for one population, the distribution of the number of runs R, the distribution of the Wilcoxon signed-rank statistic W+, and the distribution of the Mann-Whitney statistic Ux, using Mathematica, Version 3.9, under Windows 98. I think this is an interesting question because many statistical packages give only the asymptotic significance level for these tests, and with these procedures one can easily calculate the exact significance levels and the left-tail and right-tail probabilities of the non-parametric distributions. I have used Mathematica for these calculations because its symbolic language can be used to solve recursion relations. It is very easy to generate the format of the tables, and it is possible to obtain any table of the mentioned non-parametric distributions to any precision, not only for the standard parameter values most commonly used in statistics, and without transcription mistakes. Furthermore, using similar procedures, we can generate tables for the following distribution functions: Binomial, Poisson, Hypergeometric, Normal, Chi-square, Student's t, Snedecor's F, Geometric, Gamma and Beta.
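    The record builds exact tables by solving recursion relations in Mathematica. As a hedged illustration of the same idea (not the paper's code), the Python sketch below computes the exact null distribution of the Wilcoxon signed-rank statistic W+ from the recursion behind the generating function prod_{i=1..n} (1 + x^i); the function name and the example values are illustrative.

```python
import numpy as np

def signed_rank_null_pmf(n):
    """Exact null distribution of the Wilcoxon signed-rank statistic W+ for
    sample size n (no ties), built from the subset-sum recursion behind the
    generating function prod_{i=1..n} (1 + x^i)."""
    max_w = n * (n + 1) // 2
    counts = np.zeros(max_w + 1)
    counts[0] = 1.0                        # the empty set of ranks has sum 0
    for i in range(1, n + 1):
        # c_new[w] = c_old[w] + c_old[w - i]; the copy keeps the old values
        counts[i:] += counts[:-i].copy()
    return counts / 2 ** n                 # each of the 2^n sign patterns is equally likely

# Exact right-tail probability P(W+ >= 44) for n = 10
pmf = signed_rank_null_pmf(10)
print(pmf[44:].sum())
```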

  2. Biological parametric mapping with robust and non-parametric statistics.

    Science.gov (United States)

    Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A

    2011-07-15

    Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities. Copyright © 2011 Elsevier Inc. All rights reserved.
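    The record introduces robust regression into the voxelwise general linear model. Below is a minimal numpy sketch of one standard robust estimator, Huber-weighted iteratively reweighted least squares; it is a generic illustration of robust GLM fitting under assumed data, not the cited toolbox or its exact estimator.

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=50):
    """Huber-type robust linear regression via iteratively reweighted least squares.
    Generic sketch: downweights large residuals so outliers do not dominate the fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]                   # ordinary LS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # robust scale (MAD)
        u = r / s
        w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))          # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])   # intercept + one regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
y[:5] += 20                                                  # a few gross outliers
print(huber_irls(X, y))                                      # stays close to [1, 2] despite outliers
```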

  3. Statistic Non-Parametric Methods of Measurement and Interpretation of Existing Statistic Connections within Seaside Hydro Tourism

    OpenAIRE

    MIRELA SECARĂ

    2008-01-01

    Tourism is an important field of economic and social life in our country, and the main sector of the economy of Constanta County is the balneary (spa) tourism capitalization of the Romanian seaside. In order to statistically analyze hydro tourism on the Romanian seaside, we applied non-parametric methods for measuring and interpreting the statistical connections existing within seaside hydro tourism. The major objective of this research is hydro tourism re-establishment on Romanian ...

  4. Non-parametric Estimation approach in statistical investigation of nuclear spectra

    CERN Document Server

    Jafarizadeh, M A; Sabri, H; Maleki, B Rashidian

    2011-01-01

    In this paper, Kernel Density Estimation (KDE) is used as a non-parametric estimation method to investigate the statistical properties of nuclear spectra. The deviation towards regular or chaotic dynamics is exhibited by a closer distance to the Poisson or the Wigner limit, respectively, evaluated with the Kullback-Leibler divergence (KLD) measure. Spectral statistics are analyzed (with pure experimental data) for sequences prepared from nuclei corresponding to the three dynamical symmetry limits of the Interacting Boson Model (IBM), for oblate and prolate nuclei, and for the pairing effect on nuclear level statistics. The KDE-based estimated density function confirms previous predictions with minimum uncertainty, evaluated with the Integrated Absolute Error (IAE), compared to the Maximum Likelihood (ML)-based method. Also, the increase in the regularity of the spectra due to the pairing effect is revealed.
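    As a hedged sketch of the core step described here, the code below estimates the density of nearest-neighbour level spacings with a Gaussian KDE and compares it to the Poisson and Wigner limits via a discretized Kullback-Leibler divergence; the spacings are synthetic stand-ins, not nuclear data.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kld(p, q, s):
    """Discretized Kullback-Leibler divergence D(p || q) on the grid s."""
    p, q = np.maximum(p, 1e-12), np.maximum(q, 1e-12)
    return np.trapz(p * np.log(p / q), s)

# Synthetic nearest-neighbour spacings with unit mean; real spectra would replace this.
rng = np.random.default_rng(1)
spacings = rng.exponential(1.0, size=500)

s = np.linspace(0.01, 5, 400)
f_kde = gaussian_kde(spacings)(s)                      # non-parametric density estimate
poisson = np.exp(-s)                                   # regular (Poisson) limit
wigner = (np.pi / 2) * s * np.exp(-np.pi * s**2 / 4)   # chaotic (Wigner/GOE) limit

print("D(KDE || Poisson) =", kld(f_kde, poisson, s))
print("D(KDE || Wigner)  =", kld(f_kde, wigner, s))    # the smaller divergence marks the closer limit
```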

  5. Patterns of trunk muscle activation during walking and pole walking using statistical non-parametric mapping.

    Science.gov (United States)

    Zoffoli, Luca; Ditroilo, Massimiliano; Federici, Ario; Lucertini, Francesco

    2017-09-09

    This study used surface electromyography (EMG) to investigate the regions and patterns of activity of the external oblique (EO), erector spinae longissimus (ES), multifidus (MU) and rectus abdominis (RA) muscles during walking (W) and pole walking (PW) performed at different speeds and grades. Eighteen healthy adults undertook W and PW on a motorized treadmill at 60% and 100% of their walk-to-run preferred transition speed at 0% and 7% treadmill grade. The Teager-Kaiser energy operator was employed to improve the muscle activity detection and statistical non-parametric mapping based on paired t-tests was used to highlight statistical differences in the EMG patterns corresponding to different trials. The activation amplitude of all trunk muscles increased at high speed, while no differences were recorded at 7% treadmill grade. ES and MU appeared to support the upper body at the heel-strike during both W and PW, with the latter resulting in elevated recruitment of EO and RA as required to control for the longer stride and the push of the pole. Accordingly, the greater activity of the abdominal muscles and the comparable intervention of the spine extensors support the use of poles by walkers seeking higher engagement of the lower trunk region. Copyright © 2017 Elsevier Ltd. All rights reserved.
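    Statistical non-parametric mapping of paired waveforms rests on permutation inference over the whole time series. The sketch below illustrates that idea with sign-flipping permutations of paired differences and a max-statistic threshold; it is a generic illustration on simulated data, not the SPM1D implementation or the study's EMG signals.

```python
import numpy as np

def snpm_paired(cond_a, cond_b, n_perm=2000, alpha=0.05, seed=0):
    """Paired, permutation-based comparison of two 1D waveforms (subjects x time).
    Returns the observed t-curve and a family-wise max-t threshold."""
    rng = np.random.default_rng(seed)
    d = cond_a - cond_b                                  # paired differences
    n = d.shape[0]
    def t_curve(x):
        return x.mean(0) / (x.std(0, ddof=1) / np.sqrt(n))
    t_obs = t_curve(d)
    max_t = np.empty(n_perm)
    for i in range(n_perm):                              # random sign flips per subject
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        max_t[i] = np.abs(t_curve(d * signs)).max()
    threshold = np.quantile(max_t, 1 - alpha)
    return t_obs, threshold                              # |t_obs| > threshold marks significant time points

# Illustrative data: 18 subjects, 101 normalized time points per gait cycle
rng = np.random.default_rng(2)
walk = rng.normal(size=(18, 101))
pole_walk = walk + 1.5 * np.exp(-((np.arange(101) - 30) / 8.0) ** 2)  # localized difference
t_obs, thr = snpm_paired(pole_walk, walk)
print((np.abs(t_obs) > thr).sum(), "significant time points")
```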

  6. A non-parametric statistical test to compare clusters with applications in functional magnetic resonance imaging data.

    Science.gov (United States)

    Fujita, André; Takahashi, Daniel Y; Patriota, Alexandre G; Sato, João R

    2014-12-10

    Statistical inference of functional magnetic resonance imaging (fMRI) data is an important tool in neuroscience investigation. One major hypothesis in neuroscience is that the presence or not of a psychiatric disorder can be explained by the differences in how neurons cluster in the brain. Therefore, it is of interest to verify whether the properties of the clusters change between groups of patients and controls. The usual method to show group differences in brain imaging is to carry out a voxel-wise univariate analysis for a difference between the mean group responses using an appropriate test and to assemble the resulting 'significantly different voxels' into clusters, testing again at cluster level. In this approach, of course, the primary voxel-level test is blind to any cluster structure. Direct assessments of differences between groups at the cluster level seem to be missing in brain imaging. For this reason, we introduce a novel non-parametric statistical test called analysis of cluster structure variability (ANOCVA), which statistically tests whether two or more populations are equally clustered. The proposed method allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering. We illustrate the performance of ANOCVA through simulations and an application to an fMRI dataset composed of children with attention deficit hyperactivity disorder (ADHD) and controls. Results show that there are several differences in the clustering structure of the brain between them. Furthermore, we identify some brain regions previously not described to be involved in the ADHD pathophysiology, generating new hypotheses to be tested. The proposed method is general enough to be applied to other types of datasets, not limited to fMRI, where comparison of clustering structures is of interest. Copyright © 2014 John Wiley & Sons, Ltd.

  7. COLOR IMAGE RETRIEVAL BASED ON NON-PARAMETRIC STATISTICAL TESTS OF HYPOTHESIS

    Directory of Open Access Journals (Sweden)

    R. Shekhar

    2016-09-01

    Full Text Available A novel method for color image retrieval, based on statistical non-parametric tests such as the two-sample Wald test for equality of variance and the Mann-Whitney U test, is proposed in this paper. The proposed method tests the deviation, i.e., the distance in terms of variance, between the query and target images; if the images pass the test, the method proceeds to test the spectrum of energy, i.e., the distance between the mean values of the two images; otherwise, the test is dropped. If the query and target images pass both tests, it is inferred that the two images belong to the same class, i.e., both images are the same; otherwise, it is assumed that the images belong to different classes, i.e., the images are different. The proposed method is robust to scaling and rotation, since it adjusts itself and treats either the query image or the target image as a sample of the other.
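    The two-stage decision rule described here (first spread, then location) can be sketched with off-the-shelf scipy tests. In the hedged example below, Levene's test stands in for the paper's two-sample Wald variance test, and the images and threshold are purely illustrative.

```python
import numpy as np
from scipy.stats import levene, mannwhitneyu

def images_match(query, target, alpha=0.05):
    """Two-stage non-parametric comparison of the intensity samples of two images:
    (1) test equality of spread; (2) if not rejected, test equality of location.
    Levene's test is used here as a generic stand-in for a two-sample variance test."""
    q, t = np.asarray(query, float).ravel(), np.asarray(target, float).ravel()
    if levene(q, t).pvalue < alpha:        # spreads differ: declare the images different
        return False
    p_loc = mannwhitneyu(q, t, alternative="two-sided").pvalue
    return p_loc >= alpha                  # same spread and same location: same class

rng = np.random.default_rng(3)
img_a = rng.integers(0, 256, size=(64, 64)).astype(float)
img_b = img_a + rng.normal(0, 2, size=(64, 64))   # near-duplicate of img_a
img_c = 0.5 * img_a                               # same content, different contrast/spread
print(images_match(img_a, img_b), images_match(img_a, img_c))
```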

  8. t-tests, non-parametric tests, and large studies—a paradox of statistical practice?

    Directory of Open Access Journals (Sweden)

    Fagerland Morten W

    2012-06-01

    Full Text Available Background: During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences. Methods: A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation. Results: The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p [...]. Conclusions: Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
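    A hedged sketch of the flavour of this simulation: skewed samples with equal population means but different spread, compared by both tests as the sample size grows. The distributions and sample sizes below are illustrative (the paper's exact construction also equalizes medians).

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

def rejection_rates(n, n_sim=2000, alpha=0.05, seed=0):
    """Rejection rates of the Welch t-test and the Wilcoxon-Mann-Whitney (WMW) test
    for skewed samples with equal population means but different spread/skewness."""
    rng = np.random.default_rng(seed)
    rej_t = rej_w = 0
    for _ in range(n_sim):
        x = rng.lognormal(0.0, 0.5, n) / np.exp(0.5 ** 2 / 2)   # population mean 1, mild skew
        y = rng.lognormal(0.0, 0.9, n) / np.exp(0.9 ** 2 / 2)   # population mean 1, stronger skew/spread
        rej_t += ttest_ind(x, y, equal_var=False).pvalue < alpha
        rej_w += mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha
    return rej_t / n_sim, rej_w / n_sim

for n in (25, 100, 400):   # the WMW rejection rate grows with n; the t-test's stays near the nominal level
    print(n, rejection_rates(n))
```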

  9. The application of non-parametric statistical techniques to an ALARA programme.

    Science.gov (United States)

    Moon, J H; Cho, Y H; Kang, C S

    2001-01-01

    For the cost-effective reduction of occupational radiation dose (ORD) at nuclear power plants, it is necessary to identify the processes that repeatedly give rise to high ORD during maintenance and repair operations. To identify these processes, point values such as the mean and median are generally used, but they sometimes lead to misjudgment since they cannot show other important characteristics such as dose distributions and the frequencies of radiation jobs. As an alternative, a non-parametric analysis method is proposed, which effectively identifies the processes of repetitive high ORD. As a case study, the method is applied to ORD data from maintenance and repair processes at Kori Units 3 and 4, pressurised water reactors of 950 MWe capacity in Korea that have been operating since 1986 and 1987, respectively, and the method is demonstrated to be an efficient way of analysing the data.

  10. Inferential, non-parametric statistics to assess the quality of probabilistic forecast systems

    NARCIS (Netherlands)

    Maia, A.H.N.; Meinke, H.B.; Lennox, S.; Stone, R.C.

    2007-01-01

    Many statistical forecast systems are available to interested users. To be useful for decision making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and its statistical manifestation have been firmly established, the forecasts must al

  12. Technical Topic 3.2.2.d Bayesian and Non-Parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling

    Science.gov (United States)

    2016-05-31

    Final Report (reporting period 15-Apr-2014 to 14-Jan-2015, report date 31-05-2016; distribution unlimited): Technical Topic 3.2.2.d Bayesian and Non-Parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling.

  13. Non-parametric group-level statistics for source-resolved ERP analysis.

    Science.gov (United States)

    Lee, Clement; Miyakoshi, Makoto; Delorme, Arnaud; Cauwenberghs, Gert; Makeig, Scott

    2015-01-01

    We have developed a new statistical framework for group-level event-related potential (ERP) analysis in EEGLAB. The framework calculates the variance of scalp channel signals accounted for by the activity of homogeneous clusters of sources found by independent component analysis (ICA). When ICA data decomposition is performed on each subject's data separately, functionally equivalent ICs can be grouped into EEGLAB clusters. Here, we report a new addition (statPvaf) to the EEGLAB plug-in std_envtopo to enable inferential statistics on main effects and interactions in event related potentials (ERPs) of independent component (IC) processes at the group level. We demonstrate the use of the updated plug-in on simulated and actual EEG data.

  14. A Java program for non-parametric statistic comparison of community structure

    Directory of Open Access Journals (Sweden)

    WenJun Zhang

    2011-09-01

    Full Text Available A Java algorithm for statistically comparing the structural difference between two communities is presented in this study. Euclidean distance, Manhattan distance, Pearson correlation, point correlation, quadratic correlation and the Jaccard coefficient are included in the algorithm. The algorithm was used to compare rice arthropod communities in the Pearl River Delta, China, and the results showed that the family composition of arthropods for Guangzhou, Zhongshan, Zhuhai, and Dongguan is not significantly different.

  15. Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

    Directory of Open Access Journals (Sweden)

    D. Das

    2014-04-01

    Full Text Available Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application to accurately assessing the effects of climate change on finer, regional-scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large-scale climatic state and regional or local features. The transfer-function approach to SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (the predictand), based on past observations. However, a single regression model is often not sufficient to describe the complex dynamic relationships between the predictors and the predictand. We focus on the covariate-selection part of the transfer-function approach and propose a non-parametric Bayesian mixture of sparse regression models based on the Dirichlet process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and they lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.

  16. Spatial Modeling of Rainfall Patterns over the Ebro River Basin Using Multifractality and Non-Parametric Statistical Techniques

    Directory of Open Access Journals (Sweden)

    José L. Valencia

    2015-11-01

    Full Text Available Rainfall, one of the most important climate variables, is commonly studied due to its great heterogeneity, which occasionally causes negative economic, social, and environmental consequences. Modeling the spatial distribution of rainfall patterns over watersheds has become a major challenge for water resources management. Multifractal analysis can be used to reproduce the scale invariance and intermittency of rainfall processes. To identify which factors are the most influential on the variability of the multifractal parameters and, consequently, on the spatial distribution of rainfall patterns at different time scales, in this study universal multifractal (UM) analysis (the C1, α, and γs UM parameters) was combined with non-parametric statistical techniques that allow spatial-temporal comparisons of distributions by gradients. The proposed combined approach was applied to a daily rainfall dataset of 132 time series from 1931 to 2009, homogeneously spatially distributed across a 25 km × 25 km grid covering the Ebro River Basin. A homogeneous increase in C1 over the watershed and a decrease in α, mainly in the western regions, were detected, suggesting an increase in the frequency of dry periods at different scales and an increase in rainfall-process variability over the last decades.

  17. When the Single Matters more than the Group (II): Addressing the Problem of High False Positive Rates in Single Case Voxel Based Morphometry Using Non-parametric Statistics.

    Science.gov (United States)

    Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea

    2016-01-01

    In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that family-wise false positive error rates (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM are much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used

  18. An exercise in model validation: Comparing univariate statistics and Monte Carlo-based multivariate statistics

    Energy Technology Data Exchange (ETDEWEB)

    Weathers, J.B. [Shock, Noise, and Vibration Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: James.Weathers@ngc.com; Luck, R. [Department of Mechanical Engineering, Mississippi State University, 210 Carpenter Engineering Building, P.O. Box ME, Mississippi State, MS 39762-5925 (United States)], E-mail: Luck@me.msstate.edu; Weathers, J.W. [Structural Analysis Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: Jeffrey.Weathers@ngc.com

    2009-11-15

    The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exists in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
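    The abstract describes estimating the covariance matrix of comparison errors by Monte Carlo analysis and drawing approximate 95% constant-probability contours. The hedged sketch below illustrates that construction for two quantities of interest with invented uncertainty magnitudes; it is a generic example, not the paper's validation exercise.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)

# Monte Carlo samples of the comparison error E = experiment - model for two quantities
# of interest; in practice these come from propagating random and systematic uncertainties.
n_mc = 20000
sys_err = rng.normal(0.0, 0.5, size=(n_mc, 1))             # shared systematic component
rand_err = rng.normal(0.0, [0.8, 0.3], size=(n_mc, 2))     # independent random components
E = sys_err + rand_err                                      # correlated comparison errors

cov = np.cov(E, rowvar=False)                               # estimated covariance matrix
mean = E.mean(axis=0)

# 95% constant-probability contour of the fitted bivariate normal:
# (e - mean)^T cov^{-1} (e - mean) = chi2_{2, 0.95}
r2 = chi2.ppf(0.95, df=2)
vals, vecs = np.linalg.eigh(cov)
semi_axes = np.sqrt(vals * r2)                              # semi-axes of the noise-level ellipse
print("covariance:\n", cov)
print("95% ellipse semi-axes:", semi_axes)
print("ellipse orientation (columns are axes):\n", vecs)
```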

  19. Characterizing Ipomopsis rubra (Polemoniaceae) germination under various thermal scenarios with non-parametric and semi-parametric statistical methods.

    Science.gov (United States)

    Pérez, Hector E; Kettner, Keith

    2013-10-01

    Time-to-event analysis represents a collection of relatively new, flexible, and robust statistical techniques for investigating the incidence and timing of transitions from one discrete condition to another. Plant biology is replete with examples of such transitions occurring from the cellular to population levels. However, application of these statistical methods has been rare in botanical research. Here, we demonstrate the use of non- and semi-parametric time-to-event and categorical data analyses to address questions regarding seed to seedling transitions of Ipomopsis rubra propagules exposed to various doses of constant or simulated seasonal diel temperatures. Seeds were capable of germinating rapidly to >90 % at 15-25 or 22/11-29/19 °C. Optimum temperatures for germination occurred at 25 or 29/19 °C. Germination was inhibited and seed viability decreased at temperatures ≥30 or 33/24 °C. Kaplan-Meier estimates of survivor functions indicated highly significant differences in temporal germination patterns for seeds exposed to fluctuating or constant temperatures. Extended Cox regression models specified an inverse relationship between temperature and the hazard of germination. Moreover, temperature and the temperature × day interaction had significant effects on germination response. Comparisons to reference temperatures and linear contrasts suggest that summer temperatures (33/24 °C) play a significant role in differential germination responses. Similarly, simple and complex comparisons revealed that the effects of elevated temperatures predominate in terms of components of seed viability. In summary, the application of non- and semi-parametric analyses provides appropriate, powerful data analysis procedures to address various topics in seed biology and more widespread use is encouraged.
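    As a hedged sketch of the non-parametric component described here, the code below computes a Kaplan-Meier survivor-function estimate for time-to-germination data with right-censored seeds; the day counts are invented for illustration, not the paper's data, and the Cox regression step is omitted.

```python
import numpy as np

def kaplan_meier(time, germinated):
    """Kaplan-Meier estimate of S(t) = P(not yet germinated by day t).
    `time` is days to germination or to end of trial; `germinated` is 1 if the seed
    germinated, 0 if it was right-censored (trial ended before germination)."""
    time = np.asarray(time, dtype=float)
    germinated = np.asarray(germinated, dtype=int)
    surv, curve = 1.0, []
    for t in np.unique(time[germinated == 1]):          # event times only
        at_risk = np.sum(time >= t)
        events = np.sum((time == t) & (germinated == 1))
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve

# Illustrative: days to germination at one temperature; last two seeds censored at day 28
days = [3, 3, 4, 4, 4, 5, 6, 7, 9, 28, 28]
event = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
for t, s in kaplan_meier(days, event):
    print(f"day {t:>2.0f}: S(t) = {s:.3f}")
```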

  20. Non-parametric asymptotic statistics for the Palm mark distribution of β-mixing marked point processes

    CERN Document Server

    Heinrich, Lothar; Schmidt, Volker

    2012-01-01

    We consider spatially homogeneous marked point patterns in an unboundedly expanding convex sampling window. Our main objective is to identify the distribution of the typical mark by constructing an asymptotic χ²-goodness-of-fit test. The corresponding test statistic is based on a natural empirical version of the Palm mark distribution and a smoothed covariance estimator which turns out to be mean-square consistent. Our approach does not require independent marks and allows dependences between the mark field and the point pattern. Instead we impose a suitable β-mixing condition on the underlying stationary marked point process which can be checked for a number of Poisson-based models and, in particular, in the case of geostatistical marking. Our method needs a central limit theorem for β-mixing random fields which is proved by extending Bernstein's blocking technique to non-cubic index sets and seems to be of interest in its own right. By large-scale model-based simulations the performance of our t...

  1. A simple 2D non-parametric resampling statistical approach to assess confidence in species identification in DNA barcoding--an alternative to likelihood and bayesian approaches.

    Science.gov (United States)

    Jin, Qian; He, Li-Jun; Zhang, Ai-Bing

    2012-01-01

    In the recent worldwide campaign for the global biodiversity inventory via DNA barcoding, a simple and easily used measure of confidence for assigning sequences to species has not been established so far, although the likelihood ratio test and the Bayesian approach have been proposed to address this issue from a statistical point of view. The TDR (Two-Dimensional non-parametric Resampling) measure newly proposed in this study offers users a simple and easy approach to evaluate the confidence of species membership in DNA barcoding projects. We assessed the validity and robustness of the TDR approach using datasets simulated under coalescent models, and an empirical dataset, and found that the TDR measure is very robust in assessing species membership in DNA barcoding. In contrast to the likelihood ratio test and the Bayesian approach, the TDR method stands out due to its simplicity in both concepts and calculations, with little in the way of restrictive population genetic assumptions. To implement this approach we have developed a computer program package (TDR1.0beta) freely available from ftp://202.204.209.200/education/video/TDR1.0beta.rar.

  2. A comparison of statistical selection strategies for univariate and bivariate log-linear models.

    Science.gov (United States)

    Moses, Tim; Holland, Paul W

    2010-11-01

    In this study, eight statistical selection strategies were evaluated for selecting the parameterizations of log-linear models used to model the distributions of psychometric tests. The selection strategies included significance tests based on four chi-squared statistics (likelihood ratio, Pearson, Freeman-Tukey, and Cressie-Read) and four additional strategies (Akaike information criterion (AIC), Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC), and a measure attributed to Goodman). The strategies were evaluated in simulations for different log-linear models of univariate and bivariate test-score distributions and two sample sizes. Results showed that all eight selection strategies were most accurate for the largest sample size considered. For univariate distributions, the AIC selection strategy was especially accurate for selecting the correct parameterization of a complex log-linear model and the likelihood ratio chi-squared selection strategy was the most accurate strategy for selecting the correct parameterization of a relatively simple log-linear model. For bivariate distributions, the likelihood ratio chi-squared, Freeman-Tukey chi-squared, BIC, and CAIC selection strategies had similarly high selection accuracies.
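    A hedged sketch of the kind of selection the study evaluates: polynomial log-linear models of a univariate test-score frequency distribution are fitted by Poisson maximum likelihood and compared by AIC and BIC. The data, degrees and fitting code are illustrative, not the authors' simulation design; constant terms in the likelihood cancel when comparing models on the same data.

```python
import numpy as np
from scipy.optimize import minimize

def fit_loglinear(counts, degree):
    """Fit log mu_j = sum_k beta_k * x_j^k (polynomial log-linear model for a test-score
    frequency distribution) by Poisson maximum likelihood; return (AIC, BIC)."""
    x = np.arange(len(counts), dtype=float)
    x = (x - x.mean()) / x.std()                        # standardize scores for stability
    X = np.vander(x, degree + 1, increasing=True)       # 1, x, x^2, ..., x^degree
    def negloglik(beta):
        eta = np.clip(X @ beta, -30, 30)                # guard against overflow in exp
        return np.sum(np.exp(eta) - counts * eta)       # Poisson -loglik up to a constant
    res = minimize(negloglik, np.zeros(degree + 1), method="BFGS")
    ll = -negloglik(res.x)
    k, n = degree + 1, counts.sum()
    return 2 * k - 2 * ll, k * np.log(n) - 2 * ll        # AIC, BIC

rng = np.random.default_rng(5)
true_mu = 400 * np.exp(-0.5 * ((np.arange(41) - 22) / 7.0) ** 2)  # smooth score distribution
counts = rng.poisson(true_mu)
for d in (2, 3, 4, 6):
    aic, bic = fit_loglinear(counts, d)
    print(f"degree {d}: AIC = {aic:.1f}, BIC = {bic:.1f}")         # smaller is better
```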

  3. Non-Parametric Statistical Analysis of the Factors Generating College Students' Math Anxiety

    Institute of Scientific and Technical Information of China (English)

    范大付; 李春红

    2012-01-01

    Non-parametric statistics comprises test methods that do not involve the population parameters and do not depend on the form of the distribution. Using non-parametric statistics to analyze the factors behind college students' math anxiety, we try to counter the negative effect of math anxiety on learning and to increase the academic achievement of college students. The Wilcoxon rank-sum test, the Friedman test and the Mann-Whitney U test were applied to quantitatively analyze and evaluate five main factors influencing college students' math anxiety; the resulting non-parametric statistical results on the factors generating math anxiety provide a reference for addressing the negative learning effects of math anxiety.
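    The three tests named in the abstract are available directly in scipy. The hedged sketch below runs them on invented Likert-style anxiety ratings, purely to show the calls; the groups, factors and sample sizes are not the study's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ranksums, friedmanchisquare

rng = np.random.default_rng(6)
# Illustrative 1-5 anxiety ratings for two independent groups of students
group_a = rng.integers(1, 6, size=40)
group_b = np.clip(rng.integers(1, 6, size=40) + 1, 1, 5)

print("Wilcoxon rank-sum:", ranksums(group_a, group_b))
print("Mann-Whitney U:   ", mannwhitneyu(group_a, group_b, alternative="two-sided"))

# Friedman test: the same 30 students rate five different anxiety factors
factors = [np.clip(rng.normal(mu, 1.0, 30).round(), 1, 5) for mu in (2.5, 3.0, 3.2, 3.8, 4.0)]
print("Friedman:         ", friedmanchisquare(*factors))
```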

  4. Application of Statistical Software R in the Teaching of Non-Parametric Statistics

    Institute of Scientific and Technical Information of China (English)

    王志刚; 冯利英; 刘勇

    2012-01-01

    This paper introduces the application of the statistical software R in the teaching of non-parametric statistics, an important branch of statistics. In particular, it describes in detail the use of R in exploratory data analysis, inferential statistics and stochastic simulation. The flexible, open-source character of R makes data processing more efficient; the software can implement all the methods covered in the teaching process and makes it convenient for learners to optimize and improve methods on the basis of previous work. R is therefore well suited to the teaching of non-parametric statistics.

  5. Non-Parametric Inference in Astrophysics

    CERN Document Server

    Wasserman, L H; Nichol, R C; Genovese, C; Jang, W; Connolly, A J; Moore, A W; Schneider, J; Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; group, the PICA

    2001-01-01

    We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.
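    One simple way to obtain a non-parametric confidence interval for the location of a peak, in the spirit of this record, is to bootstrap a kernel density estimate; the sketch below does exactly that on synthetic data. This is a hedged illustration, not necessarily the authors' construction, and the names and sample are invented.

```python
import numpy as np
from scipy.stats import gaussian_kde

def peak_location(sample, grid):
    """Location of the mode of a Gaussian KDE evaluated on a fixed grid."""
    return grid[np.argmax(gaussian_kde(sample)(grid))]

rng = np.random.default_rng(7)
data = rng.normal(3.0, 1.0, 300)                      # stand-in for a measured quantity
grid = np.linspace(data.min(), data.max(), 512)

boots = np.array([
    peak_location(rng.choice(data, size=data.size, replace=True), grid)
    for _ in range(1000)                              # bootstrap resamples of the data
])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"peak at {peak_location(data, grid):.2f}, 95% bootstrap CI ({lo:.2f}, {hi:.2f})")
```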

  6. Development of an univariate method for predicting traffic behaviour in wireless networks through statistical models

    Directory of Open Access Journals (Sweden)

    Jorge E Salamanca Céspedes

    2015-02-01

    Full Text Available It has been shown that modern traffic in data networks is highly correlated, making it necessary to select models that capture the autocorrelation characteristics governing the data flows traversing the network [1]. Accurate forecasting of traffic on communication networks is of great importance at present, since it influences decisions as important as network sizing and predestination. The main purpose of this paper is to put the reader into context about the importance of statistical time-series models, which enable the estimation of future traffic forecasts in modern communication networks and become an essential tool for traffic prediction. Depending on the individual needs of each network, these predictions are grouped into estimates with long-range dependence (LRD) and short-range dependence (SRD), each providing specific, appropriate and efficient control integrated at different levels of the network's functional hierarchy [2]. For traffic forecasting in modern communication networks, the type of network to be studied and the time-series model that fits it must first be defined. In this case study the network is a Wi-Fi network, whose traffic behaviour requires the development of a time-series model with advanced statistics that allows integrated observation of the network and thus provides a tool to facilitate its monitoring and management. Accordingly, the type of time-series model used in this case is the ARIMA model.
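    A hedged sketch of the modelling step described here: fit an ARIMA model to an autocorrelated traffic series and forecast ahead with statsmodels. The synthetic series and the (p, d, q) order are illustrative only and are not derived from the Wi-Fi data the article analyses.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
# Synthetic hourly traffic (arbitrary units): trend + daily cycle + AR(1) noise
t = np.arange(24 * 14)
noise = np.zeros(t.size)
for i in range(1, t.size):
    noise[i] = 0.7 * noise[i - 1] + rng.normal(0, 1.0)
traffic = 50 + 0.02 * t + 10 * np.sin(2 * np.pi * t / 24) + noise

model = ARIMA(traffic, order=(2, 1, 1))     # (p, d, q) chosen for illustration only
fit = model.fit()
print(fit.forecast(steps=24))               # next 24 hours of predicted traffic
```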

  7. Univariate description and bivariate statistical inference: the first step delving into data.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-03-01

    In observational studies, the first step is usually to explore data distribution and the baseline differences between groups. Data description includes their central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, range, interquartile range). There are varieties of bivariate statistical inference methods such as Student's t-test, Mann-Whitney U test and Chi-square test, for normal, skewed and categorical data, respectively. The article shows how to perform these analyses with R codes. Furthermore, I believe that the automation of the whole workflow is of paramount importance in that (I) it allows for others to repeat your results; (II) you can easily find out how you performed analysis during revision; (III) it spares data input by hand and is less error-prone; and (IV) when you correct your original dataset, the final result can be automatically corrected by executing the codes. Therefore, the process of making a publication quality table incorporating all abovementioned statistics and P values is provided, allowing readers to customize these codes to their own needs.
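    The article demonstrates this workflow with R code; below is a hedged Python analogue of the same steps (central tendency, dispersion, then t-test, Mann-Whitney U and chi-square) on invented two-group data, purely to illustrate the calls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
age_a, age_b = rng.normal(60, 10, 80), rng.normal(63, 10, 90)            # roughly normal variable
los_a, los_b = rng.lognormal(2.0, 0.6, 80), rng.lognormal(2.3, 0.6, 90)  # skewed variable
sex_table = np.array([[35, 45],    # group A: male, female
                      [50, 40]])   # group B: male, female

print("age:  mean/SD =", age_a.mean(), age_a.std(ddof=1),
      " t-test p =", stats.ttest_ind(age_a, age_b).pvalue)
print("LOS:  median/IQR =", np.median(los_a), np.subtract(*np.percentile(los_a, [75, 25])),
      " Mann-Whitney p =", stats.mannwhitneyu(los_a, los_b, alternative="two-sided").pvalue)
chi2_stat, chi2_p, dof, expected = stats.chi2_contingency(sex_table)
print("sex:  chi-square p =", chi2_p)
```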

  8. Non-parametric partitioning of SAR images

    Science.gov (United States)

    Delyon, G.; Galland, F.; Réfrégier, Ph.

    2006-09-01

    We describe and analyse a generalization of a parametric segmentation technique adapted to Gamma-distributed SAR images to a simple non-parametric noise model. The partition is obtained by minimizing the stochastic complexity of a version of the SAR image quantized on Q levels, and leads to a criterion without parameters to be tuned by the user. We analyse the reliability of the proposed approach on synthetic images. The quality of the obtained partition is studied for different possible strategies; in particular, we discuss the reliability of the proposed optimization procedure. Finally, we study in detail the performance of the proposed approach in comparison with the statistical parametric technique adapted to Gamma noise. These studies are carried out by analysing the number of misclassified pixels, the standard Hausdorff distance and the number of estimated regions.

  9. Estimation of the limit of detection with a bootstrap-derived standard error by a partly non-parametric approach. Application to HPLC drug assays

    DEFF Research Database (Denmark)

    Linnet, Kristian

    2005-01-01

    Keywords: bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors.
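    This record is reduced to keywords, so as a hedged illustration of the topic only: a bootstrap-derived standard error for a simple limit-of-detection style estimate (mean blank + 1.645 × SD of blanks, one common definition). The estimator, data and numbers below are illustrative and may differ from the paper's partly non-parametric approach.

```python
import numpy as np

def lod(blanks):
    """Simple limit-of-blank/detection style estimate: mean + 1.645 * SD of the blanks.
    Illustrative definition only; the cited work's estimator may differ."""
    return np.mean(blanks) + 1.645 * np.std(blanks, ddof=1)

rng = np.random.default_rng(10)
blanks = rng.gamma(shape=4.0, scale=0.05, size=25)    # skewed blank responses (arbitrary units)

boot = np.array([lod(rng.choice(blanks, size=blanks.size, replace=True))
                 for _ in range(2000)])               # bootstrap resamples of the blanks
print(f"LOD estimate = {lod(blanks):.3f}, bootstrap SE = {boot.std(ddof=1):.3f}")
```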

  10. ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH

    Directory of Open Access Journals (Sweden)

    I. C. A. OYEKA

    2012-02-01

    Full Text Available This paper presents a non-parametric statistical method for analyzing two-sample data that makes provision for the possibility of ties in the data. A test statistic is developed and shown to be free of the effect of any possible ties in the data. An illustrative example is provided, and the method is shown to compare favourably with its competitor, the Mann-Whitney test; it is more powerful than the latter when there are ties.

  11. Parametric and Non-Parametric System Modelling

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg

    1999-01-01

    [...] considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well-known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how [...]. For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods a paper dealing with neural [...]. The focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where, e.g., one or more non-parametric terms are added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients [...].
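    The record mentions local polynomial regression as one of its building blocks. Below is a hedged numpy sketch of local linear (kernel-weighted) regression for a single covariate; the adaptive recursive-least-squares part with forgetting is omitted, and the data and bandwidth are illustrative.

```python
import numpy as np

def local_linear(x, y, x0, bandwidth):
    """Local linear regression estimate of E[y | x = x0] with a Gaussian kernel:
    a weighted least-squares line is fitted around each evaluation point."""
    out = np.empty(len(x0), dtype=float)
    for i, c in enumerate(x0):
        w = np.exp(-0.5 * ((x - c) / bandwidth) ** 2)       # kernel weights
        X = np.column_stack([np.ones_like(x), x - c])       # local intercept + slope
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        out[i] = beta[0]                                     # fitted value at x0 = c
    return out

rng = np.random.default_rng(11)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + 0.3 * rng.normal(size=300)
grid = np.linspace(0, 10, 9)
print(np.round(local_linear(x, y, grid, bandwidth=0.8), 2))  # tracks sin(x) without a parametric form
```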

  12. Bayesian non parametric modelling of Higgs pair production

    Science.gov (United States)

    Scarpa, Bruno; Dorigo, Tommaso

    2017-03-01

    Statistical classification models are commonly used to separate a signal from a background. In this talk we face the problem of isolating the signal of Higgs pair production using the decay channel in which each boson decays into a pair of b-quarks. Typically in this context non parametric methods are used, such as Random Forests or different types of boosting tools. We remain in the same non-parametric framework, but we propose to face the problem following a Bayesian approach. A Dirichlet process is used as prior for the random effects in a logit model which is fitted by leveraging the Polya-Gamma data augmentation. Refinements of the model include the insertion in the simple model of P-splines to relate explanatory variables with the response and the use of Bayesian trees (BART) to describe the atoms in the Dirichlet process.

  13. Bayesian non parametric modelling of Higgs pair production

    Directory of Open Access Journals (Sweden)

    Scarpa Bruno

    2017-01-01

    Full Text Available Statistical classification models are commonly used to separate a signal from a background. In this talk we face the problem of isolating the signal of Higgs pair production using the decay channel in which each boson decays into a pair of b-quarks. Typically in this context non parametric methods are used, such as Random Forests or different types of boosting tools. We remain in the same non-parametric framework, but we propose to face the problem following a Bayesian approach. A Dirichlet process is used as prior for the random effects in a logit model which is fitted by leveraging the Polya-Gamma data augmentation. Refinements of the model include the insertion in the simple model of P-splines to relate explanatory variables with the response and the use of Bayesian trees (BART) to describe the atoms in the Dirichlet process.

  14. Parametric versus non-parametric simulation

    OpenAIRE

    Dupeux, Bérénice; Buysse, Jeroen

    2014-01-01

    Most ex-ante impact assessment policy models have been based on a parametric approach. We develop a novel non-parametric approach, called Inverse DEA. We use non-parametric efficiency analysis to determine the farm's technology and behaviour. Then, we compare the parametric approach and the Inverse DEA models to a known data-generating process. We use a bio-economic model as a data-generating process reflecting a real-world situation where non-linear relationships often exist. Results s...

  15. Non-parametric Morphologies of Mergers in the Illustris Simulation

    CERN Document Server

    Bignone, Lucas A; Sillero, Emanuel; Pedrosa, Susana E; Pellizza, Leonardo J; Lambas, Diego G

    2016-01-01

    We study the non-parametric morphologies of merger events in a cosmological context, using the Illustris project. We produce mock g-band images comparable to observational surveys from the publicly available Illustris idealized mock images at $z=0$. We then measure non-parametric indicators: asymmetry, Gini, $M_{20}$, clumpiness and concentration for a set of galaxies with $M_* >10^{10}$ M$_\odot$. We correlate these automatic statistics with the recent merger history of the galaxies and with the presence of close companions. Our main contribution is to assess, in a cosmological framework, the empirically derived non-parametric demarcation line and the average time-scales used to determine the merger rate observationally. We find that 98 per cent of galaxies above the demarcation line have a close companion or have experienced a recent merger event. On average, merger signatures obtained from the $G-M_{20}$ criteria anticorrelate clearly with the time elapsed since the last merger event. We also find that the a...

  16. Comparison of two atmospheric sampling methodologies with non-parametric statistical tools

    Directory of Open Access Journals (Sweden)

    Maria João Nunes

    2005-03-01

    Full Text Available In atmospheric aerosol sampling, it is inevitable that the air that carries the particles is in motion, as a result of both externally driven wind and the suction of the sampler itself. High or low air-flow sampling speeds may lead to significant particle-size bias. The objective of this work is the validation of measurements enabling the comparison of species concentrations obtained with the two air-flow sampling techniques. The presence of several outliers and an increase of the residuals with concentration become obvious, requiring non-parametric methods, which are recommended for handling data that may not be normally distributed. In this way, conversion factors are obtained for each of the species under study using Kendall regression.
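    "Kendall regression" is commonly read as a Kendall-tau-based robust line fit, for which scipy provides the Theil-Sen slope estimator; the hedged sketch below derives a conversion factor between two samplers from invented concentration pairs with outliers. The study's exact procedure and data may differ.

```python
import numpy as np
from scipy.stats import kendalltau, theilslopes

rng = np.random.default_rng(12)
low_flow = rng.lognormal(1.0, 0.6, 60)                  # species concentration, sampler A
high_flow = 0.85 * low_flow + rng.normal(0, 0.2, 60)    # sampler B: proportional + noise
high_flow[:4] *= 3                                      # a few outliers

tau, p = kendalltau(low_flow, high_flow)
slope, intercept, lo, hi = theilslopes(high_flow, low_flow)   # median of pairwise slopes
print(f"Kendall tau = {tau:.2f} (p = {p:.3g})")
print(f"robust conversion factor (Theil-Sen slope) = {slope:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```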

  18. Noncentral Chi-Square Versus Normal Distributions in Describing the Likelihood Ratio Statistic: The Univariate Case and Its Multivariate Implication.

    Science.gov (United States)

    Yuan, Ke-Hai

    2008-01-01

    In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the noncentral chi-square distribution is justified by statistical theory. Actually, when the null hypothesis is not trivially violated, the noncentral chi-square distribution cannot describe the LR statistic well even when data are normally distributed and the sample size is large. Using the one-dimensional case, this article provides the details showing that the LR statistic asymptotically follows a normal distribution, which also leads to an asymptotically correct confidence interval for the discrepancy between the null hypothesis/model and the population. For each one-dimensional result, the corresponding results in the higher dimensional case are pointed out and references are provided. Examples with real data illustrate the difference between the noncentral chi-square distribution and the normal distribution. Monte Carlo results compare the strength of the normal distribution against that of the noncentral chi-square distribution. The implication to data analysis is discussed whenever relevant. The development is built upon the concepts of basic calculus, linear algebra, and introductory probability and statistics. The aim is to provide the least technical material for quantitative graduate students in social science to understand the condition and limitation of the noncentral chi-square distribution.

  19. Non-Parametric Estimation of Correlation Functions

    DEFF Research Database (Denmark)

    Brincker, Rune; Rytter, Anders; Krenk, Steen

    In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and, finally, estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise.
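    A hedged sketch of the FFT route mentioned here (the Wiener-Khinchin relation with zero-padding and per-lag normalization), applied to a synthetic lightly damped response; the AR(2) surrogate signal is illustrative, not the paper's test case, and the Random Decrement technique is not shown.

```python
import numpy as np

def autocorrelation_fft(x):
    """Auto-correlation function estimate via FFT -> power spectrum -> inverse FFT,
    zero-padded to avoid circular wrap-around and divided by (N - lag) per lag."""
    x = np.asarray(x, float) - np.mean(x)
    n = x.size
    f = np.fft.rfft(x, 2 * n)                 # zero-padded transform
    acf = np.fft.irfft(f * np.conj(f))[:n]    # raw lagged sums for lags 0..n-1
    return acf / (n - np.arange(n))           # normalize by the number of products per lag

# Response of a lightly damped single degree-of-freedom-like system, approximated by AR(2)
rng = np.random.default_rng(13)
x = np.zeros(4096)
for i in range(2, x.size):
    x[i] = 1.95 * x[i - 1] - 0.96 * x[i - 2] + rng.normal()
print(autocorrelation_fft(x)[:5])
```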

  20. Lottery spending: a non-parametric analysis.

    Science.gov (United States)

    Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody

    2015-01-01

    We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.

  1. Lottery spending: a non-parametric analysis.

    Directory of Open Access Journals (Sweden)

    Skip Garibaldi

    Full Text Available We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.

  2. A non-parametric peak finder algorithm and its application in searches for new physics

    CERN Document Server

    Chekanov, S

    2011-01-01

    We have developed an algorithm for non-parametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm for analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using pp Monte Carlo simulations.

  3. On Parametric (and Non-Parametric Variation

    Directory of Open Access Journals (Sweden)

    Neil Smith

    2009-11-01

    Full Text Available This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.

  4. Non-parametric estimation of Fisher information from real data

    CERN Document Server

    Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A

    2015-01-01

    The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...

  5. A Non-Parametric Spatial Independence Test Using Symbolic Entropy

    Directory of Open Access Journals (Sweden)

    López Hernández, Fernando

    2008-01-01

    Full Text Available In the present paper, we construct a new, simple, consistent and powerful test for spatial independence, called the SG test, by using symbolic dynamics and symbolic entropy as a measure of spatial dependence. We also give a standard asymptotic distribution of an affine transformation of the symbolic entropy under the null hypothesis of independence in the spatial process. The test statistic and its standard limit distribution, with the proposed symbolization, are invariant to any monotonic transformation of the data. The test applies to discrete or continuous distributions. Given that the test is based on entropy measures, it avoids smoothed non-parametric estimation. We include a Monte Carlo study of our test, together with the well-known Moran's I, the SBDS test (de Graaff et al., 2001) and the non-parametric test of Brett and Pinkse (1997), in order to illustrate our approach.

  6. A non-parametric method for correction of global radiation observations

    DEFF Research Database (Denmark)

    Bacher, Peder; Madsen, Henrik; Perers, Bengt;

    2013-01-01

    in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...

  7. Statistics for quantifying heterogeneity in univariate and bivariate meta-analyses of binary data: the case of meta-analyses of diagnostic accuracy.

    Science.gov (United States)

    Zhou, Yan; Dendukuri, Nandini

    2014-07-20

    Heterogeneity in diagnostic meta-analyses is common because of the observational nature of diagnostic studies and the lack of standardization in the positivity criterion (cut-off value) for some tests. So far the unexplained heterogeneity across studies has been quantified by either using the I² statistic for a single parameter (i.e. either the sensitivity or the specificity) or visually examining the data in a receiver-operating characteristic space. In this paper, we derive improved I² statistics measuring heterogeneity for dichotomous outcomes, with a focus on diagnostic tests. We show that the currently used estimate of the 'typical' within-study variance proposed by Higgins and Thompson is not able to properly account for the variability of the within-study variance across studies for dichotomous variables. Therefore, when the between-study variance is large, the 'typical' within-study variance underestimates the expected within-study variance, and the corresponding I² is overestimated. We propose to use the expected value of the within-study variation in the construction of I² in cases of univariate and bivariate diagnostic meta-analyses. For bivariate diagnostic meta-analyses, we derive a bivariate version of I² that is able to account for the correlation between sensitivity and specificity. We illustrate the performance of these new estimators using simulated data as well as two real data sets.
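    For reference, the conventional Higgins-Thompson construction that the paper improves upon is Cochran's Q with I² = (Q - df)/Q. The hedged sketch below computes it from invented study-level log-odds estimates and within-study variances; the paper's improved estimators for dichotomous outcomes are not reproduced here.

```python
import numpy as np

def higgins_thompson_I2(effects, variances):
    """Cochran's Q and the conventional Higgins-Thompson I^2 = (Q - df) / Q, computed
    from per-study effect estimates and their within-study variances."""
    effects = np.asarray(effects, float)
    w = 1.0 / np.asarray(variances, float)
    pooled = np.sum(w * effects) / np.sum(w)        # fixed-effect pooled estimate
    Q = np.sum(w * (effects - pooled) ** 2)
    df = effects.size - 1
    return Q, max(0.0, (Q - df) / Q) * 100.0

# Illustrative log-odds estimates (e.g., logit sensitivities) from 6 diagnostic studies
log_odds = [1.9, 2.4, 1.2, 2.8, 1.6, 2.1]
var = [0.10, 0.15, 0.08, 0.20, 0.12, 0.09]
Q, I2 = higgins_thompson_I2(log_odds, var)
print(f"Q = {Q:.2f}, I^2 = {I2:.1f}%")
```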

  8. Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: an SPSS method to analyze univariate data.

    Science.gov (United States)

    Maric, Marija; de Haan, Else; Hogendoorn, Sanne M; Wolters, Lidewij H; Huizenga, Hilde M

    2015-03-01

    Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a data-analytic method to analyze univariate (i.e., one symptom) single-case data using the common package SPSS. This method can help the clinical researcher to investigate whether an intervention works as compared with a baseline period or another intervention type, and to determine whether symptom improvement is clinically significant. First, we describe the statistical method in a conceptual way and show how it can be implemented in SPSS. Simulation studies were performed to determine the number of observation points required per intervention phase. Second, to illustrate this method and its implications, we present a case study of an adolescent with anxiety disorders treated with cognitive-behavioral therapy techniques in an outpatient psychotherapy clinic, whose symptoms were regularly assessed before each session. We provide a description of the data analyses and results of this case study. Finally, we discuss the advantages and shortcomings of the proposed method. Copyright © 2014. Published by Elsevier Ltd.

  9. A non-parametric approach to investigating fish population dynamics

    National Research Council Canada - National Science Library

    Cook, R.M; Fryer, R.J

    2001-01-01

    .... Using a non-parametric model for the stock-recruitment relationship it is possible to avoid defining specific functions relating recruitment to stock size while also providing a natural framework to model process error...

  10. Non-parametric approach to the study of phenotypic stability.

    Science.gov (United States)

    Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P

    2016-02-19

    The aim of this study was to undertake the theoretical derivation of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods are extensions of different parametric methods used for stability analyses, and the results were compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.

  11. Non-Parametric Statistical Methods and Data Transformations in Agricultural Pest Population Studies

    Directory of Open Access Journals (Sweden)

    Alcides Cabrera Campos

    2012-09-01

    Full Text Available Analysis of data from agricultural pest populations regularly shows that they do not fulfil the theoretical requirements for classical ANOVA. Box-Cox transformations and non-parametric statistical methods are commonly used as alternatives to solve this problem. In this paper, we describe the results of applying these techniques to data from Thrips palmi Karny sampled in potato (Solanum tuberosum L.) plantations during the period of pest incidence. The X² test was used to assess the goodness of fit of the negative binomial distribution and, as a test of independence, to investigate the relationship between plant strata and insect stages. Seven data transformations were applied to meet the requirements of classical ANOVA, but they failed to eliminate the relationship between the mean and the variance. Given this negative result, comparisons between insect population densities were made using the non-parametric Kruskal-Wallis ANOVA test. Results from this analysis allowed selecting the insect larval stage and the middle plant stratum as keys for designing pest sampling plans.
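
    A minimal sketch of the Kruskal-Wallis comparison mentioned above, with purely illustrative insect counts per plant stratum (the numbers are not from the study):

        from scipy.stats import kruskal

        # Hypothetical Thrips palmi counts per leaf in three plant strata
        lower  = [12, 8, 15, 9, 11, 7]
        middle = [25, 30, 22, 28, 35, 27]
        upper  = [14, 10, 18, 12, 16, 13]

        h, p = kruskal(lower, middle, upper)
        print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")   # a small p suggests densities differ among strata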

  12. Variable selection in identification of a high dimensional nonlinear non-parametric system

    Institute of Scientific and Technical Information of China (English)

    Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG

    2015-01-01

    The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.

  13. Measuring the influence of information networks on transaction costs using a non-parametric regression technique

    DEFF Research Database (Denmark)

    Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.

    All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence increase the access to non-public information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not separate between these two usages of inputs, high transaction costs are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant.

  14. Further Research into a Non-Parametric Statistical Screening System.

    Science.gov (United States)

    1979-12-14

    Let X1 = 0 if birth weight is low and X1 = 1 if birth weight is high; let X2 = 0 if gestation length is short and X2 = 1 if gestation length is long. Normal babies have high birth weight and long gestation length or low birth weight and short gestation length, i.e. (1, 1) or (0, 0). Abnormal babies have either of the other two combinations ((0, 1) or (1, 0)). The LDF

  15. Non-Parametric Bayesian Areal Linguistics

    CERN Document Server

    Daumé, Hal

    2009-01-01

    We describe a statistical model over linguistic areas and phylogeny. Our model recovers known areas and identifies a plausible hierarchy of areal features. The use of areas improves genetic reconstruction of languages both qualitatively and quantitatively according to a variety of metrics. We model linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's coalescent.

  16. Non-Parametric Tests of Structure for High Angular Resolution Diffusion Imaging in Q-Space

    CERN Document Server

    Olhede, Sofia C

    2010-01-01

    High angular resolution diffusion imaging data is the observed characteristic function for the local diffusion of water molecules in tissue. This data is used to infer structural information in brain imaging. Non-parametric scalar measures are proposed to summarize such data, and to locally characterize spatial features of the diffusion probability density function (PDF), relying on the geometry of the characteristic function. Summary statistics are defined so that their distributions are, to first order, both independent of nuisance parameters and also analytically tractable. The dominant direction of the diffusion at a spatial location (voxel) is determined, and a new set of axes are introduced in Fourier space. Variation quantified in these axes determines the local spatial properties of the diffusion density. Non-parametric hypothesis tests for determining whether the diffusion is unimodal, isotropic or multi-modal are proposed. More subtle characteristics of white-matter microstructure, such as the degre...

  17. Non-parametric analysis of rating transition and default data

    DEFF Research Database (Denmark)

    Fledelius, Peter; Lando, David; Perch Nielsen, Jens

    2004-01-01

    We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move, but that this dependence vanishes after 2-3 years.

  18. A non-parametric model for the cosmic velocity field

    NARCIS (Netherlands)

    Branchini, E; Teodoro, L; Frenk, CS; Schmoldt, [No Value; Efstathiou, G; White, SDM; Saunders, W; Sutherland, W; Rowan-Robinson, M; Keeble, O; Tadros, H; Maddox, S; Oliver, S

    1999-01-01

    We present a self-consistent non-parametric model of the local cosmic velocity field derived from the distribution of IRAS galaxies in the PSCz redshift survey. The survey has been analysed using two independent methods, both based on the assumptions of gravitational instability and linear biasing.

  19. Non-parametric Bayesian inference for inhomogeneous Markov point processes

    DEFF Research Database (Denmark)

    Berthelsen, Kasper Klitgaard; Møller, Jesper

    With reference to a specific data set, we consider how to perform a flexible non-parametric Bayesian analysis of an inhomogeneous point pattern modelled by a Markov point process, with a location dependent first order term and pairwise interaction only. A priori we assume that the first order term...

  20. Non-parametric analysis of rating transition and default data

    DEFF Research Database (Denmark)

    Fledelius, Peter; Lando, David; Perch Nielsen, Jens

    2004-01-01

    We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move, but that this dependence vanishes after 2-3 years.

  1. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    2012-01-01

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function, of which the Cobb-Douglas and the Translog are typical choices. Misspecification of the functional form, however, results not only in biased parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used to estimate production functions. We illustrate this by investigating the relationship between the elasticity of scale and the farm size, using a balanced panel data set of 371 specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function is consistent with the "true" relationship between the inputs and the output in our data set.

  2. Non-parametric versus parametric methods in environmental sciences

    Directory of Open Access Journals (Sweden)

    Muhammad Riaz

    2016-01-01

    Full Text Available This report intends to highlight the importance of considering the background assumptions required for the analysis of real datasets in different disciplines. We provide a comparative discussion of parametric methods (which depend on distributional assumptions, such as normality) relative to non-parametric methods (which are free from many distributional assumptions). We have chosen a real dataset from environmental sciences (one of the application areas). The findings may be extended to other disciplines in the same spirit.

  3. Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: An SPSS method to analyze univariate data

    NARCIS (Netherlands)

    Maric, M.; de Haan, M.; Hogendoorn, S.M.; Wolters, L.H.; Huizenga, H.M.

    2015-01-01

    Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a

  4. Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: An SPSS method to analyze univariate data

    NARCIS (Netherlands)

    M. Maric; M. de Haan; S.M. Hogendoorn; L.H. Wolters; H.M. Huizenga

    2015-01-01

    Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a data-

  5. A note on the use of the non-parametric Wilcoxon-Mann-Whitney test in the analysis of medical studies

    Directory of Open Access Journals (Sweden)

    Kühnast, Corinna

    2008-04-01

    Full Text Available Background: Although non-normal data are widespread in biomedical research, parametric tests unnecessarily predominate in statistical analyses. Methods: We surveyed five biomedical journals and, for all studies which contained at least the unpaired t-test or the non-parametric Wilcoxon-Mann-Whitney test, investigated the relationship between the choice of a statistical test and other variables such as type of journal, sample size, randomization, sponsoring, etc. Results: The non-parametric Wilcoxon-Mann-Whitney test was used in 30% of the studies. In a multivariable logistic regression, the type of journal, the test object, the scale of measurement and the statistical software were significant. The non-parametric test was more common in the case of non-continuous data, in high-impact journals, in studies in humans, and when the statistical software was specified, in particular when SPSS was used.
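
    For reference, a minimal example of the Wilcoxon-Mann-Whitney test discussed above, using hypothetical scores for two independent groups:

        from scipy.stats import mannwhitneyu

        # Hypothetical outcome scores in two independent groups (illustrative data)
        treatment = [3.1, 4.7, 2.8, 5.2, 4.1, 3.9]
        control   = [2.0, 2.9, 3.3, 1.8, 2.5, 2.2]

        u, p = mannwhitneyu(treatment, control, alternative="two-sided")
        print(f"Mann-Whitney U = {u}, p = {p:.4f}")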

  6. Non-parametric change-point method for differential gene expression detection.

    Directory of Open Access Journals (Sweden)

    Yao Wang

    Full Text Available BACKGROUND: We proposed a non-parametric method, named Non-Parametric Change Point Statistic (NPCPS for short), which uses a single equation for detecting differential gene expression (DGE) in microarray data. NPCPS is based on change point theory to provide effective DGE detecting ability. METHODOLOGY: NPCPS uses the data distribution of the normal samples as input, and detects DGE in the cancer samples by locating the change point of the gene expression profile. An estimate of the change point position generated by NPCPS enables the identification of the samples containing DGE. Monte Carlo simulation and an ROC study were applied to examine the detection accuracy of NPCPS, and an experiment on real microarray data of breast cancer was carried out to compare NPCPS with other methods. CONCLUSIONS: The simulation study indicated that NPCPS was more effective for detecting DGE in the cancer subset compared with five parametric methods and one non-parametric method. When there were more than 8 cancer samples containing DGE, the type I error of NPCPS was below 0.01. Experimental results showed both good accuracy and reliability of NPCPS. Out of the 30 top genes ranked by using NPCPS, 16 genes were reported as relevant to cancer. Correlations between the detection results of NPCPS and those of the compared methods were less than 0.05, while between the other methods the values were from 0.20 to 0.84. This indicates that NPCPS works on different features and thus provides DGE identification from a distinct perspective compared with the other mean- or median-based methods.

  7. Digital spectral analysis parametric, non-parametric and advanced methods

    CERN Document Server

    Castanié, Francis

    2013-01-01

    Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature. The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models. An entire chapter is devoted to the non-parametric methods most widely used in industry. High resolution methods a

  8. A non-parametric framework for estimating threshold limit values

    Directory of Open Access Journals (Sweden)

    Ulm Kurt

    2005-11-01

    Full Text Available Abstract Background: To estimate a threshold limit value for a compound known to have harmful health effects, an 'elbow' threshold model is usually applied. We are interested in flexible non-parametric alternatives. Methods: We describe how a step function model fitted by isotonic regression can be used to estimate threshold limit values. This method returns a set of candidate locations, and we discuss two algorithms to select the threshold among them: the reduced isotonic regression and an algorithm considering the closed family of hypotheses. We assess the performance of these two alternative approaches under different scenarios in a simulation study. We illustrate the framework by analysing the data from a study conducted by the German Research Foundation aiming to set a threshold limit value for exposure to total dust at the workplace, as a causal agent for developing chronic bronchitis. Results: In the paper we demonstrate the use and the properties of the proposed methodology along with the results from an application. The method appears to detect the threshold with satisfactory success. However, its performance can be compromised by the low power to reject the constant risk assumption when the true dose-response relationship is weak. Conclusion: The estimation of thresholds based on the isotonic framework is conceptually simple and sufficiently powerful. Given that in the threshold value estimation context there is no gold standard method, the proposed model provides a useful non-parametric alternative to the standard approaches and can corroborate or challenge their findings.
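
    The following is a minimal sketch of the isotonic-regression step underlying the framework described above: a monotone step function is fitted to dose-group risks and its jump locations serve as candidate thresholds. The exposure and risk values are hypothetical, and the reduced isotonic regression and closed-testing selection rules of the paper are not reproduced.

        import numpy as np
        from sklearn.isotonic import IsotonicRegression

        # Hypothetical exposure (dust, mg/m^3) and observed risk (proportion with bronchitis) per dose group
        exposure = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
        risk     = np.array([0.08, 0.07, 0.09, 0.10, 0.16, 0.18, 0.17, 0.21])

        iso = IsotonicRegression(increasing=True)
        fitted = iso.fit_transform(exposure, risk)        # monotone step function of exposure

        # Candidate threshold locations are the exposures where the fitted step function jumps
        jumps = exposure[1:][np.diff(fitted) > 0]
        print("candidate thresholds:", jumps)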

  9. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    2012-01-01

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function, of which the Cobb-Douglas is the most common. Misspecification of the functional form, however, results not only in biased parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used to estimate production functions without the specification of a functional form. Therefore, they avoid possible misspecification errors due to the use of an unsuitable functional form. In this paper, we use parametric and non-parametric methods to identify the optimal size of Polish crop farms...

  10. Transit Timing Observations From Kepler: Ii. Confirmation of Two Multiplanet Systems via a Non-Parametric Correlation Analysis

    OpenAIRE

    Ford, Eric B.; Fabrycky, Daniel C.; Steffen, Jason H.; Carter, Joshua A.; Fressin, Francois; Holman, Matthew Jon; Lissauer, Jack J.; Moorhead, Althea V.; Morehead, Robert C.; Ragozzine, Darin; Rowe, Jason F.; Welsh, William F.; Allen, Christopher; Batalha, Natalie M.; Borucki, William J.

    2012-01-01

    We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies is in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets...

  11. Multi-Directional Non-Parametric Analysis of Agricultural Efficiency

    DEFF Research Database (Denmark)

    Balezentis, Tomas

    This thesis seeks to develop methodologies for the assessment of agricultural efficiency and apply them to Lithuanian family farms. In particular, we focus on three objectives throughout the research: (i) to perform a fully non-parametric analysis of efficiency effects, (ii) to extend...... relative to labour, intermediate consumption and land (in some cases land was not treated as a discretionary input). These findings call for further research on relationships among financial structure, investment decisions, and efficiency in Lithuanian family farms. Application of different techniques...... of stochasticity associated with Lithuanian family farm performance. The former technique showed that the farms differed in terms of the mean values and variance of the efficiency scores over time, with some clear patterns prevailing throughout the whole research period. The fuzzy Free Disposal Hull showed...

  12. Binary Classifier Calibration Using a Bayesian Non-Parametric Approach.

    Science.gov (United States)

    Naeini, Mahdi Pakdaman; Cooper, Gregory F; Hauskrecht, Milos

    Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on Bayes optimal selection and a method based on Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.
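
    The two Bayesian non-parametric calibration methods of the paper are not reproduced here; as an illustration of the same post-processing idea, the sketch below applies a standard non-parametric (isotonic) calibration step to a fitted classifier on synthetic data.

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.calibration import CalibratedClassifierCV
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        base = LogisticRegression(max_iter=1000)
        # Post-processing calibration step, independent of how the base model was learned
        calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5).fit(X_tr, y_tr)
        print(calibrated.predict_proba(X_te)[:5, 1])      # calibrated probabilities for a few test cases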

  13. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb-Douglas or the Translog function is chosen, but misspecification of the functional form leads to biased results—including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used to estimate production functions. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function is consistent with the "true" relationship between the inputs and the output in our data set. We solve this problem by using non-parametric regression. This approach delivers reasonable results, which are on average not too different from the results of the parametric estimation...

  14. Non-parametric star formation histories for 5 dwarf spheroidal galaxies of the local group

    CERN Document Server

    Hernández, X; Valls-Gabaud, D; Gilmore, Gerard; Valls-Gabaud, David

    2000-01-01

    We use recent HST colour-magnitude diagrams of the resolved stellar populations of a sample of local dSph galaxies (Carina, Leo I, Leo II, Ursa Minor and Draco) to infer the star formation histories of these systems, $SFR(t)$. Applying a new variational calculus maximum likelihood method which includes a full Bayesian analysis and allows a non-parametric estimate of the function one is solving for, we infer the star formation histories of the systems studied. This method has the advantage of yielding an objective answer, as one need not assume a priori the form of the function one is trying to recover. The results are checked independently using Saha's $W$ statistic. The total luminosities of the systems are used to normalize the results into physical units and derive SN type II rates. We derive the luminosity weighted mean star formation history of this sample of galaxies.

  15. Assessing T cell clonal size distribution: a non-parametric approach.

    Science.gov (United States)

    Bolkhovskaya, Olesya V; Zorin, Daniil Yu; Ivanchenko, Mikhail V

    2014-01-01

    Clonal structure of the human peripheral T-cell repertoire is shaped by a number of homeostatic mechanisms, including antigen presentation, cytokine and cell regulation. Its accurate tuning leads to a remarkable ability to combat pathogens in all their variety, while systemic failures may lead to severe consequences like autoimmune diseases. Here we develop and make use of a non-parametric statistical approach to assess T cell clonal size distributions from recent next generation sequencing data. For 41 healthy individuals and a patient with ankylosing spondylitis, who underwent treatment, we invariably find power law scaling over several decades and for the first time calculate quantitatively meaningful values of the decay exponent. It has proved to be much the same among healthy donors, significantly different for an autoimmune patient before the therapy, and converging towards a typical value afterwards. We discuss implications of the findings for theoretical understanding and mathematical modeling of adaptive immunity.
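
    As a rough illustration of estimating a power-law decay exponent for clone sizes, the sketch below uses a common maximum-likelihood (Hill-type) estimator on synthetic clone sizes; this is not necessarily the estimator used in the paper.

        import numpy as np

        def power_law_exponent(sizes, xmin=1):
            """Continuous MLE (Hill-type) estimate of alpha for P(x) ~ x^(-alpha), x >= xmin."""
            x = np.asarray([s for s in sizes if s >= xmin], dtype=float)
            return 1.0 + len(x) / np.sum(np.log(x / xmin))

        # Hypothetical clone sizes (numbers of reads per T cell clonotype)
        rng = np.random.default_rng(1)
        clone_sizes = np.round(rng.pareto(2.0, size=5000) + 1)
        print(f"estimated decay exponent: {power_law_exponent(clone_sizes):.2f}")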

  16. Assessing T cell clonal size distribution: a non-parametric approach.

    Directory of Open Access Journals (Sweden)

    Olesya V Bolkhovskaya

    Full Text Available Clonal structure of the human peripheral T-cell repertoire is shaped by a number of homeostatic mechanisms, including antigen presentation, cytokine and cell regulation. Its accurate tuning leads to a remarkable ability to combat pathogens in all their variety, while systemic failures may lead to severe consequences like autoimmune diseases. Here we develop and make use of a non-parametric statistical approach to assess T cell clonal size distributions from recent next generation sequencing data. For 41 healthy individuals and a patient with ankylosing spondylitis, who underwent treatment, we invariably find power law scaling over several decades and for the first time calculate quantitatively meaningful values of the decay exponent. It has proved to be much the same among healthy donors, significantly different for an autoimmune patient before the therapy, and converging towards a typical value afterwards. We discuss implications of the findings for theoretical understanding and mathematical modeling of adaptive immunity.

  17. A non-parametric method for correction of global radiation observations

    DEFF Research Database (Denmark)

    Bacher, Peder; Madsen, Henrik; Perers, Bengt

    2013-01-01

    This paper presents a method for correction and alignment of global radiation observations based on information obtained from calculated global radiation; in the present study a one-hour forecast of global radiation from a numerical weather prediction (NWP) model is used. Systematic errors detected in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...... University. The method can be useful for optimized use of solar radiation observations for forecasting, monitoring, and modeling of energy production and load which are affected by solar radiation.

  18. Measuring the influence of information networks on transaction costs using a non-parametric regression technique

    DEFF Research Database (Denmark)

    Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.

    All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence increase the access to non-public information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not separate between these two usages of inputs, high transaction costs are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant.

  19. Comparison of non-parametric methods for ungrouping coarsely aggregated data

    DEFF Research Database (Denmark)

    Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda

    2016-01-01

    Background: Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods: From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions, can handle unequal interval lengths, and allow stretches of 0 counts. Results: The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. Conclusion: We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.

  20. A New Non-Parametric Approach to Galaxy Morphological Classification

    CERN Document Server

    Lotz, J M; Madau, P; Lotz, Jennifer M.; Primack, Joel; Madau, Piero

    2003-01-01

    We present two new non-parametric methods for quantifying galaxy morphology: the relative distribution of the galaxy pixel flux values (the Gini coefficient or G) and the second-order moment of the brightest 20% of the galaxy's flux (M20). We test the robustness of G and M20 to decreasing signal-to-noise and spatial resolution, and find that both measures are reliable to within 10% at average signal-to-noise per pixel greater than 3 and resolutions better than 1000 pc and 500 pc, respectively. We have measured G and M20, as well as concentration (C), asymmetry (A), and clumpiness (S), in the rest-frame near-ultraviolet/optical wavelengths for 150 bright local "normal" Hubble-type galaxies (E-Sd) and 104 0.05 < z < 0.25 ultra-luminous infrared galaxies (ULIRGs). We find that most local galaxies follow a tight sequence in G-M20-C, where early-types have high G and C and low M20 and late-type spirals have lower G and C and higher M20. The majority of ULIRGs lie above the normal galaxy G-M20 sequence...

  1. Non-parametric and least squares Langley plot methods

    Directory of Open Access Journals (Sweden)

    P. W. Kiedron

    2015-04-01

    Full Text Available Langley plots are used to calibrate sun radiometers primarily for the measurement of the aerosol component of the atmosphere that attenuates (scatters and absorbs) incoming direct solar radiation. In principle, the calibration of a sun radiometer is a straightforward application of the Bouguer–Lambert–Beer law V = V0 exp(−τ·m), where a plot of ln(V) (voltage) vs. m (air mass) yields a straight line with intercept ln(V0). This ln(V0) subsequently can be used to solve for τ for any measurement of V and calculation of m. This calibration works well at some high mountain sites, but the application of the Langley plot calibration technique is more complicated at other, more interesting, locales. This paper is concerned with ferreting out calibrations at difficult sites and examining and comparing a number of conventional and non-conventional methods for obtaining successful Langley plots. The eleven techniques discussed indicate that both least squares and various non-parametric techniques produce satisfactory calibrations with no significant differences among them when the time series of ln(V0) values are smoothed and interpolated with median and mean moving window filters.
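
    A minimal sketch of the Langley-plot idea described above: ln V is regressed on air mass m to recover the intercept ln(V0), once by least squares and once by a non-parametric Theil-Sen fit. The voltages below are synthetic, and the eleven techniques compared in the paper are not reproduced.

        import numpy as np
        from scipy.stats import linregress, theilslopes

        # Hypothetical clear-morning Langley data: air mass m and radiometer voltage V
        m = np.linspace(1.5, 6.0, 25)
        true_lnV0, tau = 1.20, 0.15
        lnV = true_lnV0 - tau * m + np.random.default_rng(0).normal(0, 0.01, m.size)

        ls = linregress(m, lnV)                            # least-squares Langley fit
        ts_slope, ts_intercept, *_ = theilslopes(lnV, m)   # non-parametric (Theil-Sen) alternative
        print(f"least squares  ln(V0) = {ls.intercept:.3f}, tau = {-ls.slope:.3f}")
        print(f"Theil-Sen      ln(V0) = {ts_intercept:.3f}, tau = {-ts_slope:.3f}")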

  2. Parametric and non-parametric modeling of short-term synaptic plasticity. Part II: Experimental study.

    Science.gov (United States)

    Song, Dong; Wang, Zhuo; Marmarelis, Vasilis Z; Berger, Theodore W

    2009-02-01

    This paper presents a synergistic parametric and non-parametric modeling study of short-term plasticity (STP) in the Schaffer collateral to hippocampal CA1 pyramidal neuron (SC) synapse. Parametric models in the form of sets of differential and algebraic equations have been proposed on the basis of the current understanding of biological mechanisms active within the system. Non-parametric Poisson-Volterra models are obtained herein from broadband experimental input-output data. The non-parametric model is shown to provide better prediction of the experimental output than a parametric model with a single set of facilitation/depression (FD) processes. The parametric model is then validated in terms of its input-output transformational properties using the non-parametric model, since the latter constitutes a canonical and more complete representation of the synaptic nonlinear dynamics. Furthermore, discrepancies between the experimentally derived non-parametric model and the equivalent non-parametric model of the parametric model suggest the presence of multiple FD processes in the SC synapses. Inclusion of an additional set of FD processes in the parametric model makes it better replicate the characteristics of the experimentally derived non-parametric model. This improved parametric model in turn provides the requisite biological interpretability that the non-parametric model lacks.

  3. Non-parametric frequency analysis of extreme values for integrated disaster management considering probable maximum events

    Science.gov (United States)

    Takara, K. T.

    2015-12-01

    This paper describes a non-parametric frequency analysis method for hydrological extreme-value samples with a size larger than 100, verifying the estimation accuracy with computer-intensive statistics (CIS) resampling, such as the bootstrap. Probable maximum values are also incorporated into the analysis for extreme events larger than the design level of flood control. Traditional parametric frequency analysis methods for extreme values include the following steps: Step 1: Collecting and checking extreme-value data; Step 2: Enumerating probability distributions that would be fitted well to the data; Step 3: Parameter estimation; Step 4: Testing goodness of fit; Step 5: Checking the variability of quantile (T-year event) estimates by the jackknife resampling method; and Step 6: Selection of the best distribution (final model). The non-parametric method (NPM) proposed here can skip Steps 2, 3, 4 and 6. Comparing traditional parametric methods (PM) with the NPM, this paper shows that PM often underestimates 100-year quantiles for annual maximum rainfall samples with records of more than 100 years. Overestimation examples are also demonstrated. The bootstrap resampling can perform bias correction for the NPM and can also give the estimation accuracy as the bootstrap standard error. This NPM has the advantage of avoiding various difficulties in the above-mentioned steps of the traditional PM. Probable maximum events are also incorporated into the NPM as an upper bound of the hydrological variable. Probable maximum precipitation (PMP) and probable maximum flood (PMF) can provide such an upper bound when combined with the NPM. An idea of how to incorporate these values into frequency analysis is proposed for better management of disasters that exceed the design level. The idea stimulates a more integrated approach by geoscientists and statisticians, as well as encouraging practitioners to consider the worst cases of disasters in their disaster management planning and practices.
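
    A minimal sketch of a non-parametric quantile estimate with a bootstrap assessment of its accuracy, in the spirit of the approach described above; the annual maxima are synthetic and the probable-maximum upper bound is not included.

        import numpy as np

        rng = np.random.default_rng(42)
        annual_max = rng.gumbel(loc=100.0, scale=30.0, size=120)   # hypothetical 120-year annual maxima

        T = 100.0
        p = 1.0 - 1.0 / T                                          # non-exceedance probability of the T-year event

        def np_quantile(sample, p):
            return np.quantile(sample, p)                          # empirical (non-parametric) quantile

        estimate = np_quantile(annual_max, p)
        boot = np.array([np_quantile(rng.choice(annual_max, annual_max.size, replace=True), p)
                         for _ in range(2000)])
        print(f"100-year estimate = {estimate:.1f}, bootstrap SE = {boot.std(ddof=1):.1f}")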

  4. Non-parametric combination and related permutation tests for neuroimaging.

    Science.gov (United States)

    Winkler, Anderson M; Webster, Matthew A; Brooks, Jonathan C; Tracey, Irene; Smith, Stephen M; Nichols, Thomas E

    2016-04-01

    In this work, we show how permutation methods can be applied to combination analyses such as those that include multiple imaging modalities, multiple data acquisitions of the same modality, or simply multiple hypotheses on the same data. Using the well-known definition of union-intersection tests and closed testing procedures, we use synchronized permutations to correct for such multiplicity of tests, allowing flexibility to integrate imaging data with different spatial resolutions, surface and/or volume-based representations of the brain, including non-imaging data. For the problem of joint inference, we propose and evaluate a modification of the recently introduced non-parametric combination (NPC) methodology, such that instead of a two-phase algorithm and large data storage requirements, the inference can be performed in a single phase, with reasonable computational demands. The method compares favorably to classical multivariate tests (such as MANCOVA), even when the latter is assessed using permutations. We also evaluate, in the context of permutation tests, various combining methods that have been proposed in the past decades, and identify those that provide the best control over error rate and power across a range of situations. We show that one of these, the method of Tippett, provides a link between correction for the multiplicity of tests and their combination. Finally, we discuss how the correction can solve certain problems of multiple comparisons in one-way ANOVA designs, and how the combination is distinguished from conjunctions, even though both can be assessed using permutation tests. We also provide a common algorithm that accommodates combination and correction.

  5. Characterizations of univariate continuous distributions

    CERN Document Server

    Ahsanullah, Mohammad

    2017-01-01

    Provides in an organized manner characterizations of univariate probability distributions with many new results published in this area since the 1978 work of Galambos & Kotz "Characterizations of Probability Distributions" (Springer), together with applications of the theory in model fitting and predictions.

  6. Modeling critical episodes of air pollution by PM10 in Santiago, Chile: Comparison of the predictive efficiency of parametric and non-parametric statistical models

    Directory of Open Access Journals (Sweden)

    Sergio A. Alvarado

    2010-12-01

    Full Text Available Objective: To evaluate the predictive efficiency of two statistical models (one parametric and the other non-parametric) for predicting critical episodes of air pollution exceeding daily air quality standards in Santiago, Chile, using the next-day PM10 maximum 24 h value. Accurate prediction of such episodes would allow restrictive measures to be applied by health authorities to reduce their seriousness and protect the community's health. Methods: We used the PM10 concentrations registered by a station of the MACAM-2 Air Quality Monitoring Network (152 daily observations of 14 variables) and meteorological information gathered from 2001 to 2004. To construct predictive models, we fitted a parametric Gamma model using STATA v11 software and a non-parametric MARS model using a demo version of MARS v 2.0 distributed by Salford-Systems. Results: Both modeling approaches show a high correlation between observed and predicted values. The Gamma models achieve better hits than MARS for PM10 concentrations with values ...

  7. Non-parametric three-way mixed ANOVA with aligned rank tests.

    Science.gov (United States)

    Oliver-Rodríguez, Juan C; Wang, X T

    2015-02-01

    Research problems that require a non-parametric analysis of multifactor designs with repeated measures arise in the behavioural sciences. There is, however, a lack of available procedures in commonly used statistical packages. In the present study, a generalization of the aligned rank test for the two-way interaction is proposed for the analysis of the typical sources of variation in a three-way analysis of variance (ANOVA) with repeated measures. It can be implemented in the usual statistical packages. Its statistical properties are tested by using simulation methods with two sample sizes (n = 30 and n = 10) and three distributions (normal, exponential and double exponential). Results indicate substantial increases in power for non-normal distributions in comparison with the usual parametric tests. Similar levels of Type I error for both parametric and aligned rank ANOVA were obtained with non-normal distributions and large sample sizes. Degrees-of-freedom adjustments for Type I error control in small samples are proposed. The procedure is applied to a case study with 30 participants per group where it detects gender differences in linguistic abilities in blind children not shown previously by other methods.
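
    The three-way generalization proposed in the paper is not reproduced here; the sketch below only illustrates the basic aligned-rank idea for a two-way interaction (align by removing the main effects, rank the aligned values, then run an ordinary ANOVA on the ranks), using toy data and statsmodels.

        import numpy as np
        import pandas as pd
        from scipy.stats import rankdata
        import statsmodels.formula.api as smf
        from statsmodels.stats.anova import anova_lm

        rng = np.random.default_rng(0)
        df = pd.DataFrame({"a": np.repeat(["a1", "a2"], 20),
                           "b": np.tile(np.repeat(["b1", "b2"], 10), 2),
                           "y": rng.normal(size=40)})

        # Align for the A x B interaction: strip both main effects, keep the grand mean
        df["aligned"] = (df["y"]
                         - df.groupby("a")["y"].transform("mean")
                         - df.groupby("b")["y"].transform("mean")
                         + df["y"].mean())
        df["aligned_rank"] = rankdata(df["aligned"])

        # Ordinary ANOVA on the aligned ranks; only the A:B interaction term is interpreted
        model = smf.ols("aligned_rank ~ C(a) * C(b)", data=df).fit()
        print(anova_lm(model, typ=2).loc["C(a):C(b)"])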

  8. Continuous/discrete non parametric Bayesian belief nets with UNICORN and UNINET

    NARCIS (Netherlands)

    Cooke, R.M.; Kurowicka, D.; Hanea, A.M.; Morales Napoles, O.; Ababei, D.A.; Ale, B.J.M.; Roelen, A.

    2007-01-01

    Hanea et al. (2006) presented a method for quantifying and computing continuous/discrete non parametric Bayesian Belief Nets (BBN). Influences are represented as conditional rank correlations, and the joint normal copula enables rapid sampling and conditionalization. Further mathematical background

  9. Kernel bandwidth estimation for non-parametric density estimation: a comparative study

    CSIR Research Space (South Africa)

    Van der Walt, CM

    2013-12-01

    Full Text Available We investigate the performance of conventional bandwidth estimators for non-parametric kernel density estimation on a number of representative pattern-recognition tasks, to gain a better understanding of the behaviour of these estimators in high...
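
    For context, a minimal comparison of a rule-of-thumb bandwidth (Silverman's normal reference) with a likelihood cross-validated bandwidth on synthetic bimodal data; this is only a generic illustration, not the study's benchmark.

        import numpy as np
        from sklearn.neighbors import KernelDensity
        from sklearn.model_selection import GridSearchCV

        rng = np.random.default_rng(0)
        x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 1.0, 300)])[:, None]

        # Silverman's rule-of-thumb bandwidth
        n, sigma = x.shape[0], x.std(ddof=1)
        h_silverman = 1.06 * sigma * n ** (-1 / 5)

        # Likelihood cross-validated bandwidth
        grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                            {"bandwidth": np.linspace(0.05, 1.0, 20)}, cv=5).fit(x)
        print(f"Silverman h = {h_silverman:.3f}, CV h = {grid.best_params_['bandwidth']:.3f}")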

  10. Zero- vs. one-dimensional, parametric vs. non-parametric, and confidence interval vs. hypothesis testing procedures in one-dimensional biomechanical trajectory analysis.

    Science.gov (United States)

    Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A

    2015-05-01

    Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one's hypothesis explicitly or implicitly pertains to whole 1D trajectories.

  11. Trend Analysis of Golestan's Rivers Discharges Using Parametric and Non-parametric Methods

    Science.gov (United States)

    Mosaedi, Abolfazl; Kouhestani, Nasrin

    2010-05-01

    One of the major problems facing human life is climate change and its consequences, which include changes in river discharges. The aim of this research is to analyse trends in the seasonal and annual river discharges of Golestan province (Iran). Four trend analysis methods, including conjunction point, linear regression, Wald-Wolfowitz and Mann-Kendall, were applied to the seasonal and annual river discharges at significance levels of 95% and 99%. First, daily discharge data of 12 hydrometric stations with a length of 42 years (1965-2007) were selected; after some common statistical tests, such as homogeneity tests (the G-B and M-W tests), the four mentioned trend analysis tests were applied. Results show that in all stations, for the summer time series, there are decreasing trends at the 99% significance level according to the Mann-Kendall (M-K) test. For the autumn time series, all four methods give similar results. For the other periods, the results of the four tests were more or less similar, while for some stations the results differed. Keywords: Trend Analysis, Discharge, Non-parametric methods, Wald-Wolfowitz, The Mann-Kendall test, Golestan Province.
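
    A minimal sketch of the Mann-Kendall-type trend assessment mentioned above: the test statistic is equivalent to Kendall's tau between the series and time, so scipy's kendalltau conveys the flavour of the procedure (synthetic discharge series; the full study protocol is not reproduced).

        import numpy as np
        from scipy.stats import kendalltau

        rng = np.random.default_rng(3)
        years = np.arange(1965, 2007)
        discharge = 50 - 0.3 * (years - 1965) + rng.normal(0, 4, years.size)   # hypothetical summer discharge

        tau, p = kendalltau(years, discharge)
        print(f"Kendall's tau = {tau:.2f}, p = {p:.4f}")   # negative tau with small p indicates a decreasing trend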

  12. Comparison of non-parametric methods for ungrouping coarsely aggregated data

    Directory of Open Access Journals (Sweden)

    Silvia Rizzi

    2016-05-01

    Full Text Available Abstract Background: Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods: From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions, can handle unequal interval lengths, and allow stretches of 0 counts. Results: The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. Conclusion: We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.
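
    As a minimal illustration of one of the simpler approaches compared above (spline interpolation of the cumulative counts), not the recommended penalized composite link model, with hypothetical grouped death counts:

        import numpy as np
        from scipy.interpolate import PchipInterpolator

        # Hypothetical deaths grouped in 5-year age classes starting at the given lower bounds
        lower_bounds = np.array([50, 55, 60, 65, 70, 75, 80])
        counts = np.array([120, 180, 260, 340, 410, 380, 290])

        # Interpolate the cumulative counts with a monotone spline, then difference to single years of age
        edges = np.append(lower_bounds, 85)                 # assume the last class closes at age 85
        cum = np.append(0, np.cumsum(counts))
        spline = PchipInterpolator(edges, cum)
        ages = np.arange(50, 85)
        single_year = np.diff(spline(np.append(ages, 85)))
        print(np.round(single_year[:5], 1))                 # estimated counts for ages 50..54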

  13. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
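
    The sketch below is only a stand-in for the RKHS idea discussed above: kernel ridge regression with an RBF kernel versus a linear ridge model on a synthetic marker matrix; it does not reproduce the Bayesian models, the neural networks, or the CIMMYT data of the study.

        import numpy as np
        from sklearn.kernel_ridge import KernelRidge
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        markers = rng.integers(0, 2, size=(306, 500)).astype(float)   # hypothetical 0/1 marker matrix
        yield_ = (markers[:, :20] @ rng.normal(size=20)               # additive marker effects
                  + 0.5 * markers[:, 0] * markers[:, 1]               # a small non-additive (epistatic) effect
                  + rng.normal(0, 1.0, 306))

        linear = Ridge(alpha=1.0)
        rkhs_like = KernelRidge(kernel="rbf", alpha=1.0, gamma=1.0 / markers.shape[1])

        for name, model in [("linear ridge", linear), ("kernel ridge (RBF)", rkhs_like)]:
            r = cross_val_score(model, markers, yield_, cv=5, scoring="r2")
            print(f"{name}: mean CV R^2 = {r.mean():.2f}")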

  14. Non-Parametric Evolutionary Algorithm for Estimating Root Zone Soil Moisture

    Science.gov (United States)

    Mohanty, B.; Shin, Y.; Ines, A. M.

    2013-12-01

    Prediction of root zone soil moisture is critical for water resources management. In this study, we explored a non-parametric evolutionary algorithm for estimating root zone soil moisture from a time series of spatially distributed rainfall across multiple weather locations in two different hydro-climatic regions. A new genetic algorithm-based hidden Markov model (HMMGA) was developed to estimate long-term root zone soil moisture dynamics at different soil depths. Also, we analyzed rainfall occurrence probabilities and dry/wet spell lengths reproduced by this approach. The HMMGA was used to estimate the optimal state sequences (weather states) based on the precipitation history. Historical root zone soil moisture statistics were then determined based on the weather state conditions. To test the new approach, we selected two different soil moisture fields, Oklahoma (130 km x 130 km) and Illinois (300 km x 500 km), during 1995 to 2009 and 1994 to 2010, respectively. We found that the newly developed framework performed well in predicting root zone soil moisture dynamics at both spatial scales. Also, the reproduced rainfall occurrence probabilities and dry/wet spell lengths matched well with the observations at the spatio-temporal scales. Since the proposed algorithm requires only precipitation and historical soil moisture data from existing, established weather stations, it can serve as an attractive alternative for predicting root zone soil moisture in the future using climate change scenarios and root zone soil moisture history.

  15. A Non-parametric Approach to Constrain the Transfer Function in Reverberation Mapping

    Science.gov (United States)

    Li, Yan-Rong; Wang, Jian-Min; Bai, Jin-Ming

    2016-11-01

    Broad emission lines of active galactic nuclei stem from a spatially extended region (broad-line region, BLR) that is composed of discrete clouds and photoionized by the central ionizing continuum. The temporal behaviors of these emission lines are blurred echoes of continuum variations (i.e., reverberation mapping, RM) and directly reflect the structures and kinematic information of BLRs through the so-called transfer function (also known as the velocity-delay map). Based on the previous works of Rybicki and Press and Zu et al., we develop an extended, non-parametric approach to determine the transfer function for RM data, in which the transfer function is expressed as a sum of a family of relatively displaced Gaussian response functions. Therefore, arbitrary shapes of transfer functions associated with complicated BLR geometry can be seamlessly included, enabling us to relax the presumption of a specified transfer function frequently adopted in previous studies and to let it be determined by observation data. We formulate our approach in a previously well-established framework that incorporates the statistical modeling of continuum variations as a damped random walk process and takes into account long-term secular variations which are irrelevant to RM signals. The application to RM data shows the fidelity of our approach.

  16. Statistical inference and experimental design of univariate quantitative data of single-group design (part one)

    Institute of Scientific and Technical Information of China (English)

    胡良平; 鲍晓蕾

    2010-01-01

    In the former two issues of this periodical, we introduced how to correctly express and describe univariate quantitative data of a single-group design, including expressing the data by frequency distribution tables and describing the data by some important indexes, such as the mean, the degree of dispersion, and the maximum and minimum values, etc.

  17. Dependence between fusion temperatures and chemical components of a certain type of coal using classical, non-parametric and bootstrap techniques

    Energy Technology Data Exchange (ETDEWEB)

    Gonzalez-Manteiga, W.; Prada-Sanchez, J.M.; Fiestras-Janeiro, M.G.; Garcia-Jurado, I. (Universidad de Santiago de Compostela, Santiago de Compostela (Spain). Dept. de Estadistica e Investigacion Operativa)

    1990-11-01

    A statistical study of the dependence between various critical fusion temperatures of a certain kind of coal and its chemical components is carried out. As well as using classical dependence techniques (multiple, stepwise and PLS regression, principal components, canonical correlation, etc.) together with the corresponding inference on the parameters of interest, non-parametric regression and bootstrap inference are also performed. 11 refs., 3 figs., 8 tabs.
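
    A minimal sketch of bootstrap inference for a regression dependence of the kind studied above, using synthetic ash content and fusion temperature values (not the coal data of the paper):

        import numpy as np
        from scipy.stats import linregress

        rng = np.random.default_rng(7)
        ash_content = rng.uniform(5, 25, 60)                              # hypothetical chemical component (%)
        fusion_temp = 1200 + 8.0 * ash_content + rng.normal(0, 30, 60)    # hypothetical fusion temperature (deg C)

        slope_hat = linregress(ash_content, fusion_temp).slope
        idx = np.arange(ash_content.size)
        boot = []
        for _ in range(2000):
            i = rng.choice(idx, idx.size, replace=True)                   # resample observation pairs
            boot.append(linregress(ash_content[i], fusion_temp[i]).slope)
        boot = np.array(boot)
        print(f"slope = {slope_hat:.2f}, 95% bootstrap CI = {np.percentile(boot, [2.5, 97.5]).round(2)}")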

  18. Evaluation of model-based versus non-parametric monaural noise-reduction approaches for hearing aids.

    Science.gov (United States)

    Harlander, Niklas; Rosenkranz, Tobias; Hohmann, Volker

    2012-08-01

    Single channel noise reduction has been well investigated and seems to have reached its limits in terms of speech intelligibility improvement; however, the quality of such schemes can still be advanced. This study tests to what extent novel model-based processing schemes might improve performance, in particular for non-stationary noise conditions. Two prototype model-based algorithms, a speech-model-based and an auditory-model-based algorithm, were compared to a state-of-the-art non-parametric minimum statistics algorithm. A speech intelligibility test, preference rating, and listening effort scaling were performed. Additionally, three objective quality measures for the signal, background, and overall distortions were applied. For a better comparison of all algorithms, particular attention was given to the use of a similar Wiener-based gain rule. The perceptual investigation was performed with fourteen hearing-impaired subjects. The results revealed that the non-parametric algorithm and the auditory-model-based algorithm did not affect speech intelligibility, whereas the speech-model-based algorithm slightly decreased intelligibility. In terms of subjective quality, both model-based algorithms performed better than the unprocessed condition and the reference, in particular for highly non-stationary noise environments. Data support the hypothesis that model-based algorithms are promising for improving performance in non-stationary noise conditions.

  19. The Non-Parametric Model for Linking Galaxy Luminosity with Halo/Subhalo Mass: Are First Brightest Galaxies Special?

    CERN Document Server

    Vale, A

    2007-01-01

    We revisit the longstanding question of whether first brightest cluster galaxies are statistically drawn from the same distribution as other cluster galaxies or are "special", using the new non-parametric, empirically based model presented in Vale & Ostriker (2006) for associating galaxy luminosity with halo/subhalo masses. We introduce scatter in galaxy luminosity at fixed halo mass into this model, building a conditional luminosity function (CLF) by considering two possible models: a simple lognormal and a model based on the distribution of concentration in haloes of a given mass. We show that this model naturally allows an identification of halo/subhalo systems with groups and clusters of galaxies, giving rise to a clear central/satellite galaxy distinction. We then use these results to build up the dependence of brightest cluster galaxy (BCG) magnitudes on cluster luminosity, focusing on two statistical indicators, the dispersion in BCG magnitude and the magnitude difference between first and second bri...

  20. A non-parametric meta-analysis approach for combining independent microarray datasets: application using two microarray datasets pertaining to chronic allograft nephropathy

    Directory of Open Access Journals (Sweden)

    Archer Kellie J

    2008-02-01

    Full Text Available Abstract Background: With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. Results: The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. Conclusion: We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been reported to be relevant to renal diseases.

  1. A non-parametric meta-analysis approach for combining independent microarray datasets: application using two microarray datasets pertaining to chronic allograft nephropathy.

    Science.gov (United States)

    Kong, Xiangrong; Mas, Valeria; Archer, Kellie J

    2008-02-26

    With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, the sample size for each individual microarray experiment is often small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normally functioning allografts. The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that were differentially expressed in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among the genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist-diagnosed class labels. We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been reported to be relevant to renal diseases. Further study on the

  2. A non-parametric peak calling algorithm for DamID-Seq.

    Directory of Open Access Journals (Sweden)

    Renhua Li

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality checks and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.

  3. A non-parametric peak calling algorithm for DamID-Seq.

    Science.gov (United States)

    Li, Renhua; Hempel, Leonie U; Jiang, Tingbo

    2015-01-01

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX)-an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality checks and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
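
    The bootstrap-based background estimate and fold-change thresholding described above can be illustrated with a toy sketch (the bin counts, the number of resamples, and the threshold rule below are made-up stand-ins, not the actual NPPC implementation, which works on mapped reads):

        import numpy as np

        rng = np.random.default_rng(1)

        # per-bin read counts for the Dam-only control and the Dam-fusion sample (synthetic)
        control = rng.poisson(5.0, size=1000)
        sample = rng.poisson(5.0, size=1000)
        sample[400:410] += 30                      # a planted "peak"

        # 1) bootstrap-resample the control to estimate the background level
        boot_means = np.array([rng.choice(control, size=control.size, replace=True).mean()
                               for _ in range(2000)])
        background = boot_means.mean()

        # 2) scale (library-size normalise) and compute signal-to-noise fold changes
        scale = control.sum() / sample.sum()
        fold_change = (sample * scale + 1.0) / (background + 1.0)

        # 3)-4) filter and call peaks above a threshold derived from the bootstrap spread
        threshold = 1.0 + 3.0 * boot_means.std() / (background + 1.0)
        peaks = np.flatnonzero(fold_change > threshold)
        print(peaks[:10])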

  4. Cancer driver gene discovery through an integrative genomics approach in a non-parametric Bayesian framework.

    Science.gov (United States)

    Yang, Hai; Wei, Qiang; Zhong, Xue; Yang, Hushan; Li, Bingshan

    2017-02-15

    A comprehensive catalogue of the genes that drive tumor initiation and progression in cancer is key to advancing diagnostics, therapeutics and treatment. Given the complexity of cancer, the catalogue is still far from complete. Increasing evidence shows that driver genes exhibit consistent aberration patterns across multiple omics in tumors. In this study, we aim to leverage the complementary information encoded in each of the omics data types to identify novel driver genes through an integrative framework. Specifically, we integrated mutations, gene expression, DNA copy numbers, DNA methylation and protein abundance, all available in The Cancer Genome Atlas (TCGA), and developed iDriver, a non-parametric Bayesian framework based on multivariate statistical modeling, to identify driver genes in an unsupervised fashion. iDriver captures the inherent clusters of gene aberrations and constructs the background distribution that is used to assess and calibrate the confidence of driver genes identified through multi-dimensional genomic data. We applied the method to 4 cancer types in TCGA and identified candidate driver genes that are highly enriched with known drivers (e.g., P < 3.40 × 10^-36 for breast cancer). We are particularly interested in novel genes and observed multiple lines of supporting evidence. Using systematic evaluation from multiple independent aspects, we identified 45 candidate driver genes that were not previously known across these 4 cancer types. The finding has important implications: integrating additional genomic data with multivariate statistics can help identify cancer drivers and guide the next stage of cancer genomics research. The C++ source code is freely available at https://medschool.vanderbilt.edu/cgg/. hai.yang@vanderbilt.edu or bingshan.li@Vanderbilt.Edu. Supplementary data are available at Bioinformatics online.

  5. Non-Parametric Bayesian Updating within the Assessment of Reliability for Offshore Wind Turbine Support Structures

    DEFF Research Database (Denmark)

    Ramirez, José Rangel; Sørensen, John Dalsgaard

    2011-01-01

    This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbines. The new information, coming from external and condition monitoring, can be used for direct updating of the stochastic variables through a non-parametric Bayesian...... updating approach and can be integrated in the reliability analysis by a third-order polynomial chaos expansion approximation. Although classical Bayesian updating approaches are often used because of their parametric formulation, non-parametric approaches are better alternatives for multi-parametric updating...... with a non-conjugating formulation. The results in this paper show the influence on the time-dependent updated reliability when non-parametric and classical Bayesian approaches are used. Further, the influence of the number of updated parameters on the reliability is illustrated....

  6. Non-parametric seismic hazard analysis in the presence of incomplete data

    Science.gov (United States)

    Yazdani, Azad; Mirzaei, Sajjad; Dadkhah, Koroush

    2017-01-01

    The distribution of earthquake magnitudes plays a crucial role in the estimation of seismic hazard parameters. Due to the complexity of the earthquake magnitude distribution, non-parametric approaches are recommended over classical parametric methods. The main deficiency of the non-parametric approach is the lack of complete magnitude data in almost all cases. This study aims to introduce an imputation procedure for completing earthquake catalog data that will allow the catalog to be used for non-parametric density estimation. Using a Monte Carlo simulation, the efficiency of the introduced approach is investigated. This study indicates that when a magnitude catalog is incomplete, the imputation procedure can provide an appropriate tool for seismic hazard assessment. As an illustration, the imputation procedure was applied to estimate the earthquake magnitude distribution in Tehran, the capital city of Iran.
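
    The record does not specify the density estimator, so the following is only a generic illustration of non-parametric magnitude density estimation on a synthetic catalog (the completeness cut, the imputation stand-in, and all parameters are assumptions):

        import numpy as np
        from scipy.stats import gaussian_kde

        rng = np.random.default_rng(2)

        # synthetic catalog: exponentially distributed magnitudes above a completeness cut of 4.0
        mags = 4.0 + rng.exponential(scale=0.5, size=500)

        # crude "imputation" stand-in: fill missing events by resampling observed ones
        n_missing = 50
        imputed = np.concatenate([mags, rng.choice(mags, size=n_missing, replace=True)])

        # non-parametric magnitude density and a simple exceedance estimate
        density = gaussian_kde(imputed)
        grid = np.linspace(4.0, 7.5, 200)
        pdf = density(grid)
        p_exceed_6 = pdf[grid >= 6.0].sum() * (grid[1] - grid[0])
        print(round(p_exceed_6, 4))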

  7. Power of non-parametric linkage analysis in mapping genes contributing to human longevity in long-lived sib-pairs

    DEFF Research Database (Denmark)

    Tan, Qihua; Zhao, J H; Iachine, I

    2004-01-01

    This report investigates the power issue in applying the non-parametric linkage analysis of affected sib-pairs (ASP) [Kruglyak and Lander, 1995: Am J Hum Genet 57:439-454] to localize genes that contribute to human longevity using long-lived sib-pairs. Data were simulated by introducing a recently...... developed statistical model for measuring marker-longevity associations [Yashin et al., 1999: Am J Hum Genet 65:1178-1193], enabling direct power comparison between linkage and association approaches. The non-parametric linkage (NPL) scores estimated in the region harboring the causal allele are evaluated...... in case of a dominant effect. Although the power issue may depend heavily on the true genetic nature in maintaining survival, our study suggests that results from small-scale sib-pair investigations should be interpreted with caution, given the complexity of human longevity....

  8. Transit Timing Observations from Kepler: II. Confirmation of Two Multiplanet Systems via a Non-parametric Correlation Analysis

    CERN Document Server

    Ford, Eric B; Steffen, Jason H; Carter, Joshua A; Fressin, Francois; Holman, Matthew J; Lissauer, Jack J; Moorhead, Althea V; Morehead, Robert C; Ragozzine, Darin; Rowe, Jason F; Welsh, William F; Allen, Christopher; Batalha, Natalie M; Borucki, William J; Bryson, Stephen T; Buchhave, Lars A; Burke, Christopher J; Caldwell, Douglas A; Charbonneau, David; Clarke, Bruce D; Cochran, William D; Désert, Jean-Michel; Endl, Michael; Everett, Mark E; Fischer, Debra A; Gautier, Thomas N; Gilliland, Ron L; Jenkins, Jon M; Haas, Michael R; Horch, Elliott; Howell, Steve B; Ibrahim, Khadeejah A; Isaacson, Howard; Koch, David G; Latham, David W; Li, Jie; Lucas, Philip; MacQueen, Phillip J; Marcy, Geoffrey W; McCauliff, Sean; Mullally, Fergal R; Quinn, Samuel N; Quintana, Elisa; Shporer, Avi; Still, Martin; Tenenbaum, Peter; Thompson, Susan E; Torres, Guillermo; Twicken, Joseph D; Wohler, Bill

    2012-01-01

    We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets. We apply this method to an analysis of the transit timing variations of two stars with multiple transiting planet candidates identified by Kepler. We confirm four transiting planets in two multiple planet systems based on their TTVs and the constraints imposed by dynamical stability. An additional three candidates in these same systems are not confirmed as planets, but are likely to be validated as real planets once further observations and analyses are possible. If all were confirmed, these systems would be near 4:6:...

  9. rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

    Science.gov (United States)

    Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui

    2015-07-01

    High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to assess statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package with its source code and documentation are freely available at http://www-personal.umich.edu/∼jianghui/rseqnp/. jianghui@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
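
    The permutation idea behind rSeqNP can be illustrated for a single gene with a minimal sketch (a generic two-sample permutation test on synthetic expression values; rSeqNP's actual statistic combines isoform-level information and supports several designs):

        import numpy as np

        def permutation_pvalue(x, y, n_perm=10000, rng=None):
            """Two-sample permutation p-value for a difference in group means."""
            rng = rng or np.random.default_rng(0)
            pooled = np.concatenate([x, y])
            observed = abs(x.mean() - y.mean())
            count = 0
            for _ in range(n_perm):
                perm = rng.permutation(pooled)
                stat = abs(perm[:len(x)].mean() - perm[len(x):].mean())
                count += stat >= observed
            return (count + 1) / (n_perm + 1)

        rng = np.random.default_rng(3)
        control = rng.normal(5.0, 1.0, size=8)      # expression in condition A
        treated = rng.normal(6.5, 1.0, size=8)      # expression in condition B
        print(permutation_pvalue(control, treated, rng=rng))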

  10. Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  11. Non-parametric system identification from non-linear stochastic response

    DEFF Research Database (Denmark)

    Rüdinger, Finn; Krenk, Steen

    2001-01-01

    An estimation method is proposed for identification of non-linear stiffness and damping of single-degree-of-freedom systems under stationary white noise excitation. Non-parametric estimates of the stiffness and damping along with an estimate of the white noise intensity are obtained by suitable p...

  12. Non-parametric tests of productive efficiency with errors-in-variables

    NARCIS (Netherlands)

    Kuosmanen, T.K.; Post, T.; Scholtes, S.

    2007-01-01

    We develop a non-parametric test of productive efficiency that accounts for errors-in-variables, following the approach of Varian [1985. Nonparametric analysis of optimizing behavior with measurement error. Journal of Econometrics 30(1/2), 445-458]. The test is based on the general Pareto-Koopmans

  13. Non-Parametric Bayesian Updating within the Assessment of Reliability for Offshore Wind Turbine Support Structures

    DEFF Research Database (Denmark)

    Ramirez, José Rangel; Sørensen, John Dalsgaard

    2011-01-01

    This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbines. The new information, coming from external and condition monitoring, can be used for direct updating of the stochastic variables through a non-parametric Bayesian u...

  14. Comparison of reliability techniques of parametric and non-parametric method

    Directory of Open Access Journals (Sweden)

    C. Kalaiselvan

    2016-06-01

    Reliability of a product or system is the probability that the product performs its intended function adequately for a stated period of time under stated operating conditions; it is a function of time. The widely used nano-ceramic capacitors C0G and X7R are used in this reliability study to generate time-to-failure (TTF) data. The time-to-failure data are obtained by Accelerated Life Testing (ALT) and Highly Accelerated Life Testing (HALT). The tests are conducted at high stress levels to generate more failures within a short interval of time. The reliability methods used to convert accelerated conditions to actual conditions are a parametric method and a non-parametric method. In this paper, a comparative study of the parametric and non-parametric methods has been done on the failure data. The Weibull distribution is used for the parametric method; the Kaplan-Meier and Simple Actuarial methods are used for the non-parametric method. The mean time to failure (MTTF) identified under accelerated conditions agrees between the parametric and non-parametric methods to within a small relative deviation.
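
    A minimal numpy sketch of the non-parametric side (a Kaplan-Meier survival estimate and an MTTF derived from it); the failure times below are invented for illustration and are not the capacitor TTF data of the study:

        import numpy as np

        def kaplan_meier(times, events):
            """Return (unique failure times, survival estimate) for right-censored data."""
            order = np.argsort(times)
            times, events = np.asarray(times)[order], np.asarray(events)[order]
            surv, out_t, out_s = 1.0, [], []
            for t in np.unique(times[events == 1]):
                at_risk = np.sum(times >= t)
                deaths = np.sum((times == t) & (events == 1))
                surv *= 1.0 - deaths / at_risk
                out_t.append(t)
                out_s.append(surv)
            return np.array(out_t), np.array(out_s)

        # synthetic accelerated-test data: failure times in hours, 1 = failed, 0 = censored
        t = [120, 150, 150, 200, 240, 300, 360, 400]
        e = [1,   1,   0,   1,   1,   0,   1,   1]
        ft, S = kaplan_meier(t, e)

        # MTTF as the area under the survival step function up to the last observed failure
        mttf = ft[0] * 1.0 + np.sum(np.diff(ft) * S[:-1])
        print(round(mttf, 1))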

  15. Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    2003-01-01

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  16. Non-parametric production analysis of pesticides use in the Netherlands

    NARCIS (Netherlands)

    Oude Lansink, A.G.J.M.; Silva, E.

    2004-01-01

    Many previous empirical studies on the productivity of pesticides suggest that pesticides are under-utilized in agriculture despite the generally held belief that these inputs are substantially over-utilized. This paper uses data envelopment analysis (DEA) to calculate non-parametric measures of the

  17. Performances and Spending Efficiency in Higher Education: A European Comparison through Non-Parametric Approaches

    Science.gov (United States)

    Agasisti, Tommaso

    2011-01-01

    The objective of this paper is an efficiency analysis concerning higher education systems in European countries. Data have been extracted from OECD data-sets (Education at a Glance, several years), using a non-parametric technique--data envelopment analysis--to calculate efficiency scores. This paper represents the first attempt to conduct such an…
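
    The record does not spell out the DEA formulation; as a generic illustration, here is a standard input-oriented, constant-returns-to-scale (CCR) DEA efficiency score computed with linear programming (the three decision-making units and their inputs and outputs are invented):

        import numpy as np
        from scipy.optimize import linprog

        def dea_efficiency(X, Y, j0):
            """Input-oriented CRS (CCR) efficiency score of unit j0.
            X: (n_units, n_inputs), Y: (n_units, n_outputs)."""
            n, m = X.shape
            s = Y.shape[1]
            # decision variables: [theta, lambda_1..lambda_n]; minimise theta
            c = np.concatenate([[1.0], np.zeros(n)])
            # inputs:  sum_j lambda_j * x_ij - theta * x_i,j0 <= 0
            A_in = np.hstack([-X[j0].reshape(m, 1), X.T])
            b_in = np.zeros(m)
            # outputs: -sum_j lambda_j * y_rj <= -y_r,j0
            A_out = np.hstack([np.zeros((s, 1)), -Y.T])
            b_out = -Y[j0]
            res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                          b_ub=np.concatenate([b_in, b_out]),
                          bounds=[(0, None)] * (n + 1), method="highs")
            return res.x[0]

        # three hypothetical decision-making units with two inputs and one output
        X = np.array([[4.0, 3.0], [7.0, 3.0], [8.0, 1.0]])
        Y = np.array([[1.0], [1.0], [1.0]])
        print([round(dea_efficiency(X, Y, j), 3) for j in range(3)])

    Units on the efficient frontier score 1; the dominated unit receives a score below 1, which is the kind of efficiency measure such studies compare across systems.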

  18. Low default credit scoring using two-class non-parametric kernel density estimation

    CSIR Research Space (South Africa)

    Rademeyer, E

    2016-12-01

    This paper investigates the performance of two-class classification credit scoring data sets with low default ratios. The standard two-class parametric Gaussian and non-parametric Parzen classifiers are extended, using Bayes’ rule, to include either...

  19. Measuring the influence of networks on transaction costs using a non-parametric regression technique

    DEFF Research Database (Denmark)

    Henningsen, Géraldine; Henningsen, Arne; Henning, Christian H.C.A.

    We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...

  20. Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    2003-01-01

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  1. Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  2. Parametric and Non-Parametric Vibration-Based Structural Identification Under Earthquake Excitation

    Science.gov (United States)

    Pentaris, Fragkiskos P.; Fouskitakis, George N.

    2014-05-01

    The problem of modal identification in civil structures is of crucial importance, and thus has been receiving increasing attention in recent years. Vibration-based methods are quite promising as they are capable of identifying the structure's global characteristics, they are relatively easy to implement, and they tend to be time effective and less expensive than most alternatives [1]. This paper focuses on the off-line structural/modal identification of civil (concrete) structures subjected to low-level earthquake excitations, under which they remain within their linear operating regime. Earthquakes and their details are recorded and provided by the seismological network of Crete [2], which 'monitors' the broad region of the south Hellenic arc, an active seismic region which functions as a natural laboratory for earthquake engineering of this kind. A sufficient number of seismic events are analyzed in order to reveal the modal characteristics of the structures under study, which consist of the two concrete buildings of the School of Applied Sciences, Technological Education Institute of Crete, located in Chania, Crete, Hellas. Both buildings are equipped with high-sensitivity and high-accuracy seismographs - providing acceleration measurements - installed at the basement (the structure's foundation), presently taken as the ground acceleration (excitation), and at all levels (ground floor, 1st floor, 2nd floor and terrace). Further details regarding the instrumentation setup and data acquisition may be found in [3]. The present study invokes stochastic methods, both non-parametric (frequency-based) and parametric, for structural/modal identification (natural frequencies and/or damping ratios). Non-parametric methods include Welch-based spectrum and Frequency Response Function (FRF) estimation, while parametric methods include AutoRegressive (AR), AutoRegressive with eXogenous input (ARX) and AutoRegressive Moving-Average with eXogenous input (ARMAX) models [4, 5

  3. Validation of two (parametric vs non-parametric) daily weather generators

    Science.gov (United States)

    Dubrovsky, M.; Skalak, P.

    2015-12-01

    As the climate models (GCMs and RCMs) fail to satisfactorily reproduce the real-world surface weather regime, various statistical methods are applied to downscale GCM/RCM outputs into site-specific weather series. Stochastic weather generators are among the most favoured downscaling methods, capable of producing realistic (observed-like) meteorological inputs for agrological, hydrological and other impact models used in assessing the sensitivity of various ecosystems to climate change/variability. To name their advantages, the generators may (i) produce arbitrarily long multi-variate synthetic weather series representing both present and changed climates (in the latter case, the generators are commonly modified by GCM/RCM-based climate change scenarios), (ii) be run at various time steps and for multiple weather variables (the generators reproduce the correlations among variables), (iii) be interpolated (and run also for sites where no weather data are available to calibrate the generator). This contribution will compare two stochastic daily weather generators in terms of their ability to reproduce various features of daily weather series. M&Rfi is a parametric generator: a Markov chain model is used to model precipitation occurrence, precipitation amount is modelled by the Gamma distribution, and a 1st order autoregressive model is used to generate non-precipitation surface weather variables. The non-parametric GoMeZ generator is based on the nearest-neighbours resampling technique, making no assumption on the distribution of the variables being generated. Various settings of both weather generators will be assumed in the present validation tests. The generators will be validated in terms of (a) extreme temperature and precipitation characteristics (annual and 30-year extremes and maxima of duration of hot/cold/dry/wet spells); (b) selected validation statistics developed within the frame of the VALUE project. The tests will be based on observational weather series
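
    A minimal sketch of the parametric building blocks named above (Markov-chain occurrence, Gamma amounts, AR(1) for a non-precipitation variable); the transition probabilities and distribution parameters are made-up values, not those of the M&Rfi generator:

        import numpy as np

        rng = np.random.default_rng(4)
        n_days = 365

        # 1) precipitation occurrence: first-order two-state Markov chain
        p_wet_given_dry, p_wet_given_wet = 0.25, 0.60
        wet = np.zeros(n_days, dtype=bool)
        for t in range(1, n_days):
            p = p_wet_given_wet if wet[t - 1] else p_wet_given_dry
            wet[t] = rng.random() < p

        # 2) precipitation amount on wet days: Gamma distribution
        precip = np.where(wet, rng.gamma(shape=0.8, scale=6.0, size=n_days), 0.0)

        # 3) a non-precipitation variable (e.g. a temperature anomaly): AR(1) process
        phi, sigma = 0.7, 2.0
        temp = np.zeros(n_days)
        for t in range(1, n_days):
            temp[t] = phi * temp[t - 1] + rng.normal(0.0, sigma)

        print(precip[:10].round(1), temp[:5].round(1))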

  4. A Non-Parametric and Entropy Based Analysis of the Relationship between the VIX and S&P 500

    Directory of Open Access Journals (Sweden)

    Abhay K. Singh

    2013-10-01

    This paper features an analysis of the relationship between the S&P 500 Index and the VIX using daily data obtained from the CBOE website and SIRCA (The Securities Industry Research Centre of the Asia Pacific). We explore the relationship between the S&P 500 daily return series and a similar series for the VIX in terms of a long sample drawn from the CBOE from 1990 to mid 2011 and a set of returns from SIRCA's TRTH datasets from March 2005 to date. This shorter sample, which captures the behavior of the new VIX, introduced in 2003, is divided into four sub-samples which permit the exploration of the impact of the Global Financial Crisis. We apply a series of non-parametric tests utilizing entropy-based metrics. These suggest that the PDFs and CDFs of these two return distributions change shape in various subsample periods. The entropy and mutual information (MI) statistics suggest that the degree of uncertainty attached to these distributions changes through time and, using the S&P 500 return as the dependent variable, that the amount of information obtained from the VIX changes with time and reaches a relative maximum in the most recent period from 2011 to 2012. The entropy-based non-parametric tests of the equivalence of the two distributions and their symmetry all strongly reject their respective nulls. The results suggest that parametric techniques do not adequately capture the complexities displayed in the behavior of these series. This has practical implications for hedging utilizing derivatives written on the VIX.
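
    A rough sketch of one such entropy-based metric, a histogram (plug-in) estimate of the mutual information between two return series (the bin count and the synthetic, negatively related returns are assumptions; the paper's exact estimators are not reproduced):

        import numpy as np

        def mutual_information(x, y, bins=20):
            """Plug-in MI estimate (in nats) from a 2-D histogram of the two series."""
            joint, _, _ = np.histogram2d(x, y, bins=bins)
            pxy = joint / joint.sum()
            px = pxy.sum(axis=1, keepdims=True)
            py = pxy.sum(axis=0, keepdims=True)
            nz = pxy > 0
            return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

        rng = np.random.default_rng(5)
        spx_ret = rng.normal(0.0, 0.01, size=2000)                    # stand-in S&P 500 returns
        vix_ret = -0.7 * spx_ret + rng.normal(0.0, 0.02, size=2000)   # negatively related VIX returns
        print(round(mutual_information(spx_ret, vix_ret), 4))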

  5. Mathematical statistics

    CERN Document Server

    Pestman, Wiebe R

    2009-01-01

    This textbook provides a broad and solid introduction to mathematical statistics, including the classical subjects hypothesis testing, normal regression analysis, and normal analysis of variance. In addition, non-parametric statistics and vectorial statistics are considered, as well as applications of stochastic analysis in modern statistics, e.g., Kolmogorov-Smirnov testing, smoothing techniques, robustness and density estimation. For students with some elementary mathematical background. With many exercises. Prerequisites from measure theory and linear algebra are presented.

  6. Applications of non-parametric statistics and analysis of variance on sample variances

    Science.gov (United States)

    Myers, R. H.

    1981-01-01

    Nonparametric methods that are available for NASA-type applications are discussed. An attempt is made here to survey what can be used, to offer recommendations as to when each method would be applicable, and to compare the methods, when possible, with the usual normal-theory procedures that are available for the Gaussian analog. It is important here to point out the hypotheses that are being tested, the assumptions that are being made, and the limitations of the nonparametric procedures. The appropriateness of performing analysis of variance on sample variances is also discussed and studied. This procedure is followed in several NASA simulation projects. On the surface this would appear to be a reasonably sound procedure. However, the difficulties involved center around the normality problem and the basic homogeneous-variance assumption that is made in usual analysis-of-variance problems. These difficulties are discussed and guidelines are given for using the methods.

  7. Non-parametric Bayesian human motion recognition using a single MEMS tri-axial accelerometer.

    Science.gov (United States)

    Ahmed, M Ejaz; Song, Ju Bin

    2012-09-27

    In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM) and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Mean (FCM) technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.
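
    A brief sketch of non-parametric clustering in the same spirit, using a truncated Dirichlet-process Gaussian mixture via scikit-learn's variational BayesianGaussianMixture rather than the paper's IGMM with collapsed Gibbs sampling; the accelerometer-like features are synthetic:

        import numpy as np
        from sklearn.mixture import BayesianGaussianMixture

        rng = np.random.default_rng(6)

        # synthetic 3-D accelerometer features from three unknown motions
        features = np.vstack([
            rng.normal([0.0, 0.0, 9.8], 0.3, size=(100, 3)),   # standing
            rng.normal([1.5, 0.2, 9.0], 0.4, size=(100, 3)),   # walking
            rng.normal([3.0, 1.0, 7.5], 0.6, size=(100, 3)),   # running
        ])

        # truncated Dirichlet-process mixture: unused components get near-zero weight
        dpgmm = BayesianGaussianMixture(
            n_components=10,
            weight_concentration_prior_type="dirichlet_process",
            covariance_type="full",
            random_state=0,
        ).fit(features)

        labels = dpgmm.predict(features)
        n_motions = np.sum(dpgmm.weights_ > 0.01)
        print(n_motions, np.bincount(labels))

    The number of components with non-negligible weight plays the role of the "number of motions" inferred without training data.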

  8. Non-Parametric Bayesian Human Motion Recognition Using a Single MEMS Tri-Axial Accelerometer

    Directory of Open Access Journals (Sweden)

    M. Ejaz Ahmed

    2012-09-01

    In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM) and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Mean (FCM) technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.

  9. Statistical expression and description of univariate quantitative data of single-group design (part one)

    Institute of Scientific and Technical Information of China (English)

    胡良平; 胡纯严; 鲍晓蕾

    2010-01-01

    In order to obtain precise and reliable research data, we have to bear in mind a scientific and well-designed research plan as well as rigorous quality control measures. Before carrying out a comprehensive statistical analysis of the data, we have to perform statistical expression and description, and sometimes even an exploratory analysis, in order to provide the necessary clues for the formal statistical analysis. This article aims to tell readers how to correctly express and describe research data [1,2]. First of all, let's look at some questions and data.

  10. Estimating Financial Risk Measures for Futures Positions:A Non-Parametric Approach

    OpenAIRE

    Cotter, John; dowd, kevin

    2011-01-01

    This paper presents non-parametric estimates of spectral risk measures applied to long and short positions in 5 prominent equity futures contracts. It also compares these to estimates of two popular alternative measures, the Value-at-Risk (VaR) and Expected Shortfall (ES). The spectral risk measures are conditioned on the coefficient of absolute risk aversion, and the latter two are conditioned on the confidence level. Our findings indicate that all risk measures increase dramatically and the...

  11. A Non Parametric Study of the Volatility of the Economy as a Country Risk Predictor

    CERN Document Server

    Costanzo, Sabatino; Dominguez, Ramses; Moreno, William

    2007-01-01

    This paper intends to explain Venezuela's country spread behavior through neural network analysis of a monthly general index of economic activity indicators constructed by the Central Bank of Venezuela, a measure of the shocks affecting the country risk of emerging markets, and the U.S. short-term interest rate. The use of non-parametric methods allowed the finding of a non-linear relationship between these inputs and country risk. The networks' performance was evaluated using the method of excess predictability.

  12. A Comparison of Parametric and Non-Parametric Methods Applied to a Likert Scale.

    Science.gov (United States)

    Mircioiu, Constantin; Atkinson, Jeffrey

    2017-05-10

    A trenchant and passionate dispute over the use of parametric versus non-parametric methods for the analysis of Likert scale ordinal data has raged for the past eight decades. The answer is not a simple "yes" or "no" but is related to hypotheses, objectives, risks, and paradigms. In this paper, we took a pragmatic approach. We applied both types of methods to the analysis of actual Likert data on responses from different professional subgroups of European pharmacists regarding competencies for practice. Results obtained show that with "large" (>15) numbers of responses and similar (but clearly not normal) distributions from different subgroups, parametric and non-parametric analyses give in almost all cases the same significant or non-significant results for inter-subgroup comparisons. Parametric methods were more discriminant in the cases of non-similar conclusions. Considering that the largest differences in opinions occurred in the upper part of the 4-point Likert scale (ranks 3 "very important" and 4 "essential"), a "score analysis" based on this part of the data was undertaken. This transformation of the ordinal Likert data into binary scores produced a graphical representation that was visually easier to understand as differences were accentuated. In conclusion, in this case of Likert ordinal data with high response rates, restraining the analysis to non-parametric methods leads to a loss of information. The addition of parametric methods, graphical analysis, analysis of subsets, and transformation of data leads to more in-depth analyses.
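
    The contrast between the two families of methods can be sketched on synthetic 4-point Likert responses (the group sizes, response probabilities, and the binary "score" cut at ranks 3-4 are illustrative assumptions, not the pharmacists' survey data):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(7)

        # two professional subgroups rating one competency on a 1-4 Likert scale
        group_a = rng.choice([1, 2, 3, 4], size=60, p=[0.05, 0.15, 0.40, 0.40])
        group_b = rng.choice([1, 2, 3, 4], size=60, p=[0.10, 0.30, 0.40, 0.20])

        t_stat, p_param = stats.ttest_ind(group_a, group_b)        # parametric
        u_stat, p_nonpar = stats.mannwhitneyu(group_a, group_b)    # non-parametric

        # "score analysis": collapse to a binary score for ranks 3-4 ("very important"/"essential")
        score_a, score_b = (group_a >= 3).mean(), (group_b >= 3).mean()

        print(round(p_param, 4), round(p_nonpar, 4), round(score_a, 2), round(score_b, 2))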

  13. Non-parametric foreground subtraction for 21cm epoch of reionization experiments

    CERN Document Server

    Harker, Geraint; Bernardi, Gianni; Brentjens, Michiel A; De Bruyn, A G; Ciardi, Benedetta; Jelic, Vibor; Koopmans, Leon V E; Labropoulos, Panagiotis; Mellema, Garrelt; Offringa, Andre; Pandey, V N; Schaye, Joop; Thomas, Rajat M; Yatawatta, Sarod

    2009-01-01

    An obstacle to the detection of redshifted 21cm emission from the epoch of reionization (EoR) is the presence of foregrounds which exceed the cosmological signal in intensity by orders of magnitude. We argue that in principle it would be better to fit the foregrounds non-parametrically - allowing the data to determine their shape - rather than selecting some functional form in advance and then fitting its parameters. Non-parametric fits often suffer from other problems, however. We discuss these before suggesting a non-parametric method, Wp smoothing, which seems to avoid some of them. After outlining the principles of Wp smoothing we describe an algorithm used to implement it. We then apply Wp smoothing to a synthetic data cube for the LOFAR EoR experiment. The performance of Wp smoothing, measured by the extent to which it is able to recover the variance of the cosmological signal and to which it avoids leakage of power from the foregrounds, is compared to that of a parametric fit, and to another non-parame...

  14. Non-parametric Tuning of PID Controllers A Modified Relay-Feedback-Test Approach

    CERN Document Server

    Boiko, Igor

    2013-01-01

    The relay feedback test (RFT) has become a popular and efficient tool used in process identification and automatic controller tuning. Non-parametric Tuning of PID Controllers couples new modifications of the classical RFT with application-specific optimal tuning rules to form a non-parametric method of test-and-tuning. Test and tuning are coordinated through a set of common parameters so that a PID controller can obtain the desired gain or phase margins in a system exactly, even with unknown process dynamics. The concept of process-specific optimal tuning rules in the nonparametric setup, with corresponding tuning rules for flow, level, pressure, and temperature control loops, is presented in the text. Common problems of tuning accuracy based on parametric and non-parametric approaches are addressed. In addition, the text treats the parametric approach to tuning based on the modified RFT approach and the exact model of oscillations in the system under test using the locus of a perturbed relay system (LPRS) meth...

  15. Log-concave Probability Distributions: Theory and Statistical Testing

    DEFF Research Database (Denmark)

    An, Mark Yuing

    1996-01-01

    This paper studies the broad class of log-concave probability distributions that arise in economics of uncertainty and information. For univariate, continuous, and log-concave random variables we prove useful properties without imposing the differentiability of density functions. Discrete...... and multivariate distributions are also discussed. We propose simple non-parametric testing procedures for log-concavity. The test statistics are constructed to test one of the two implications of log-concavity: increasing hazard rates and the new-is-better-than-used (NBU) property. The tests for increasing hazard...... rates are based on normalized spacings of the sample order statistics. The tests for the NBU property fall into the category of Hoeffding's U-statistics...

  16. There Is More Than One Univariate Normal Distribution: What Is the Normal Distribution, Really?

    Science.gov (United States)

    Team, Rachel M.

    Many univariate statistical methods, such as the analysis of variance, t-test, and regression, assume that the dependent variable data have a univariate normal distribution (Hinkle, Weirsma, and Jurs, 1998). Various other statistical methods assume that the error scores are normally distributed (Thompson, 1992). Violating this assumption can be…

  17. Detecting correlation changes in multivariate time series: A comparison of four non-parametric change point detection methods.

    Science.gov (United States)

    Cabrieto, Jedelyn; Tuerlinckx, Francis; Kuppens, Peter; Grassmann, Mariel; Ceulemans, Eva

    2017-06-01

    Change point detection in multivariate time series is a complex task since, next to the mean, the correlation structure of the monitored variables may also alter when change occurs. DeCon was recently developed to detect such changes in mean and/or correlation by combining a moving-window approach and robust PCA. However, in the literature, several other methods have been proposed that employ other non-parametric tools: E-divisive, Multirank, and KCP. Since these methods use different statistical approaches, two issues need to be tackled. First, applied researchers may find it hard to appraise the differences between the methods. Second, a direct comparison of the relative performance of all these methods for capturing change points signaling correlation changes is still lacking. Therefore, we present the basic principles behind DeCon, E-divisive, Multirank, and KCP and the corresponding algorithms, to make them more accessible to readers. We further compared their performance through extensive simulations using the settings of Bulteel et al. (Biological Psychology, 98 (1), 29-42, 2014), implying changes in mean and in correlation structure, and those of Matteson and James (Journal of the American Statistical Association, 109 (505), 334-345, 2014), implying different numbers of (noise) variables. KCP emerged as the best method in almost all settings. However, in case of more than two noise variables, only DeCon performed adequately in detecting correlation changes.

  18. Two new non-parametric tests to the distance duality relation with galaxy clusters

    CERN Document Server

    Costa, S S; Holanda, R F L

    2015-01-01

    The cosmic distance duality relation is a milestone of cosmology involving the luminosity and angular diameter distances. Any departure from the relation points to new physics or systematic errors in the observations, so tests of the relation are extremely important for building a consistent cosmological framework. Here, two new tests are proposed based on galaxy cluster observations (angular diameter distance and gas mass fraction) and $H(z)$ measurements. By applying Gaussian Processes, a non-parametric method, we are able to derive constraints on departures from the relation; no evidence of deviation is found with either method, reinforcing the cosmological and astrophysical hypotheses adopted so far.
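
    As a generic illustration of the Gaussian Process step (not the paper's analysis), the following smooths synthetic H(z)-like points with scikit-learn; the RBF kernel choice, noise level, and the fake flat-ΛCDM data are all assumptions:

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, ConstantKernel

        rng = np.random.default_rng(8)

        # synthetic H(z)-like measurements with noise (made-up cosmological parameters)
        z = np.sort(rng.uniform(0.05, 2.0, size=25))
        hz_true = 70.0 * np.sqrt(0.3 * (1 + z) ** 3 + 0.7)
        hz_obs = hz_true + rng.normal(0.0, 5.0, size=z.size)

        kernel = ConstantKernel(1e4) * RBF(length_scale=1.0)
        gp = GaussianProcessRegressor(kernel=kernel, alpha=25.0, normalize_y=True)
        gp.fit(z.reshape(-1, 1), hz_obs)

        # non-parametric reconstruction with pointwise uncertainty on a fine grid
        z_grid = np.linspace(0.05, 2.0, 100).reshape(-1, 1)
        hz_mean, hz_std = gp.predict(z_grid, return_std=True)
        print(hz_mean[:3].round(1), hz_std[:3].round(1))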

  19. Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data.

    Science.gov (United States)

    Tan, Qihua; Thomassen, Mads; Burton, Mark; Mose, Kristian Fredløv; Andersen, Klaus Ejner; Hjelmborg, Jacob; Kruse, Torben

    2017-06-06

    Modeling complex time-course patterns is a challenging issue in microarray studies due to the complex gene expression patterns that arise in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that, owing to its non-parametric nature, the generalized correlation analysis could be a useful and efficient tool for analyzing microarray time-course data and for exploring the complex relationships in omics data for studying their association with disease and health.

  20. Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data

    DEFF Research Database (Denmark)

    Tan, Qihua; Thomassen, Mads; Burton, Mark

    2017-01-01

    Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering...... the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis could be a useful and efficient tool for analyzing microarray...... time-course data and for exploring the complex relationships in the omics data for studying their association with disease and health....

  1. Non-parametric trend analysis of water quality data of rivers in Kansas

    Science.gov (United States)

    Yu, Y.-S.; Zou, S.; Whittemore, D.

    1993-01-01

    Surface water quality data for 15 sampling stations in the Arkansas, Verdigris, Neosho, and Walnut river basins inside the state of Kansas were analyzed to detect trends (or lack of trends) in 17 major constituents by using four different non-parametric methods. The results show that concentrations of specific conductance, total dissolved solids, calcium, total hardness, sodium, potassium, alkalinity, sulfate, chloride, total phosphorus, ammonia plus organic nitrogen, and suspended sediment generally have downward trends. Some of the downward trends are related to increases in discharge, while others could be caused by decreases in pollution sources. Homogeneity tests show that both station-wide trends and basin-wide trends are non-homogeneous.
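
    The record does not name the four non-parametric methods; one standard choice for such trend detection is the Mann-Kendall test, sketched here on a synthetic declining chloride series (no tie correction, for brevity):

        import numpy as np
        from scipy.stats import norm

        def mann_kendall(x):
            """Return (S statistic, two-sided p-value) of the Mann-Kendall trend test."""
            x = np.asarray(x, dtype=float)
            n = x.size
            s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
            var_s = n * (n - 1) * (2 * n + 5) / 18.0
            if s > 0:
                z = (s - 1) / np.sqrt(var_s)
            elif s < 0:
                z = (s + 1) / np.sqrt(var_s)
            else:
                z = 0.0
            return s, 2 * (1 - norm.cdf(abs(z)))

        rng = np.random.default_rng(9)
        years = np.arange(1975, 1993)
        chloride = 120 - 1.5 * (years - years[0]) + rng.normal(0, 5, size=years.size)  # declining
        print(mann_kendall(chloride))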

  2. Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data

    DEFF Research Database (Denmark)

    Tan, Qihua; Thomassen, Mads; Burton, Mark

    2017-01-01

    Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering...... the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis could be a useful and efficient tool for analyzing microarray...

  3. The geometry of distributional preferences and a non-parametric identification approach: The Equality Equivalence Test.

    Science.gov (United States)

    Kerschbamer, Rudolf

    2015-05-01

    This paper proposes a geometric delineation of distributional preference types and a non-parametric approach for their identification in a two-person context. It starts with a small set of assumptions on preferences and shows that this set (i) naturally results in a taxonomy of distributional archetypes that nests all empirically relevant types considered in previous work; and (ii) gives rise to a clean experimental identification procedure - the Equality Equivalence Test - that discriminates between archetypes according to core features of preferences rather than properties of specific modeling variants. As a by-product the test yields a two-dimensional index of preference intensity.

  4. Scaling of preferential flow in biopores by parametric or non parametric transfer functions

    Science.gov (United States)

    Zehe, E.; Hartmann, N.; Klaus, J.; Palm, J.; Schroeder, B.

    2009-04-01

    ... finally assign the measured hydraulic capacities to these pores. By combining this population of macropores with observed data on soil hydraulic properties we obtain a virtual reality. Flow and transport are simulated for different rainfall forcings comparing two models, Hydrus 3D and Catflow. The simulated cumulative travel depth distributions for different forcings will be linked to the cumulative depth distribution of connected flow paths. The latter describes the fraction of connected paths (where flow resistance is always below a selected threshold) that link the surface to a certain critical depth. Systematic variation of the average number of macropores and their depth distributions will show whether a clear link between the simulated travel depth distributions and the depth distribution of connected paths may be identified. The third essential step is to derive a non-parametric transfer function that predicts travel depth distributions of tracers and, in the long term, pesticides based on easy-to-assess subsurface characteristics (mainly density and depth distribution of worm burrows, soil matrix properties), initial conditions and rainfall forcing. Such a transfer function is independent of scale, as long as we stay in the same ensemble, i.e. worm population and soil properties stay the same. References: Shipitalo, M.J. and Butt, K.R. (1999): Occupancy and geometrical properties of Lumbricus terrestris L. burrows affecting infiltration. Pedobiologia 43:782-794. Zehe, E. and Fluehler, H. (2001b): Slope scale distribution of flow patterns in soil profiles. J. Hydrol. 247:116-132.

  5. Cliff's Delta Calculator: A non-parametric effect size program for two groups of observations

    Directory of Open Access Journals (Sweden)

    Guillermo Macbeth

    2011-05-01

    The Cliff's Delta statistic is an effect size measure that quantifies the amount of difference between two non-parametric variables beyond p-value interpretation. This measure can be understood as a useful complementary analysis to the corresponding hypothesis testing. During the last two decades the use of effect size measures has been strongly encouraged by methodologists and leading institutions of behavioral sciences. The aim of this contribution is to introduce the Cliff's Delta Calculator software, which performs such analysis and offers some interpretation tips. Differences and similarities with the parametric case are analysed and illustrated. The implementation of this free program is fully described and compared with other calculators. Alternative algorithmic approaches are mathematically analysed and a basic linear algebra proof of their equivalence is formally presented. Two worked examples in cognitive psychology are commented on. A visual interpretation of Cliff's Delta is suggested. Availability, installation and applications of the program are presented and discussed.
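
    For two samples x and y of sizes n_x and n_y, Cliff's delta is d = (#{x_i > y_j} - #{x_i < y_j}) / (n_x n_y); a small sketch on invented data (not the worked examples from the paper):

        import numpy as np

        def cliffs_delta(x, y):
            """Cliff's delta: P(X > Y) - P(X < Y), estimated over all pairs."""
            x = np.asarray(x)[:, None]
            y = np.asarray(y)[None, :]
            greater = np.sum(x > y)
            less = np.sum(x < y)
            return (greater - less) / (x.size * y.size)

        a = [3, 4, 4, 5, 5, 6]      # e.g. ratings of group A
        b = [2, 3, 3, 4, 4, 4]      # e.g. ratings of group B
        print(cliffs_delta(a, b))   # positive: values in A tend to exceed those in B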

  6. A web application for evaluating Phase I methods using a non-parametric optimal benchmark.

    Science.gov (United States)

    Wages, Nolan A; Varhegyi, Nikole

    2017-06-01

    In evaluating the performance of Phase I dose-finding designs, simulation studies are typically conducted to assess how often a method correctly selects the true maximum tolerated dose under a set of assumed dose-toxicity curves. A necessary component of the evaluation process is to have some concept for how well a design can possibly perform. The notion of an upper bound on the accuracy of maximum tolerated dose selection is often omitted from the simulation study, and the aim of this work is to provide researchers with accessible software to quickly evaluate the operating characteristics of Phase I methods using a benchmark. The non-parametric optimal benchmark is a useful theoretical tool for simulations that can serve as an upper limit for the accuracy of maximum tolerated dose identification based on a binary toxicity endpoint. It offers researchers a sense of the plausibility of a Phase I method's operating characteristics in simulation. We have developed an R shiny web application for simulating the benchmark. The web application has the ability to quickly provide simulation results for the benchmark and requires no programming knowledge. The application is free to access and use on any device with an Internet browser. The application provides the percentage of correct selection of the maximum tolerated dose and an accuracy index, operating characteristics typically used in evaluating the accuracy of dose-finding designs. We hope this software will facilitate the use of the non-parametric optimal benchmark as an evaluation tool in dose-finding simulation.

  7. Application of the LSQR algorithm in non-parametric estimation of aerosol size distribution

    Science.gov (United States)

    He, Zhenzong; Qi, Hong; Lew, Zhongyuan; Ruan, Liming; Tan, Heping; Luo, Kun

    2016-05-01

    Based on the Least Squares QR decomposition (LSQR) algorithm, the aerosol size distribution (ASD) is retrieved in a non-parametric approach. The direct problem is solved by the Anomalous Diffraction Approximation (ADA) and the Lambert-Beer Law. An optimal wavelength selection method is developed to improve the retrieval accuracy of the ASD. The optimal wavelength set is selected so that the measurement signals are sensitive to wavelength and the ill-conditioning of the coefficient matrix of the linear system is reduced, which enhances the robustness of the retrieval results to interference. Two common kinds of monomodal and bimodal ASDs, log-normal (L-N) and Gamma distributions, are estimated, respectively. Numerical tests show that the LSQR algorithm can be successfully applied to retrieve the ASD with high stability in the presence of random noise and low susceptibility to the shape of distributions. Finally, the ASD measured experimentally over Harbin, China, is recovered reasonably well. All the results confirm that the LSQR algorithm combined with the optimal wavelength selection method is an effective and reliable technique for non-parametric estimation of the ASD.
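
    A tiny sketch of the kind of damped linear inversion LSQR performs, using scipy.sparse.linalg.lsqr on a made-up smoothing-kernel problem (the toy forward model below is not the ADA/Lambert-Beer model of the paper, and the damping value is arbitrary):

        import numpy as np
        from scipy.sparse.linalg import lsqr

        rng = np.random.default_rng(10)

        # discretised forward model: measured extinction = kernel matrix @ size distribution
        n_wavelengths, n_size_bins = 12, 40
        radii = np.linspace(0.1, 2.0, n_size_bins)
        wavelengths = np.linspace(0.3, 1.0, n_wavelengths)
        A = np.exp(-((radii[None, :] - 1.2 * wavelengths[:, None]) ** 2) / 0.1)  # toy kernel

        true_asd = np.exp(-((radii - 0.8) ** 2) / 0.05)          # monomodal "true" distribution
        b = A @ true_asd + rng.normal(0.0, 0.01, size=n_wavelengths)

        # LSQR with a damping term acting as simple regularisation
        solution = lsqr(A, b, damp=0.05)[0]
        print(np.round(solution[:8], 3))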

  8. Non-parametric transformation for data correlation and integration: From theory to practice

    Energy Technology Data Exchange (ETDEWEB)

    Datta-Gupta, A.; Xue, Guoping; Lee, Sang Heon [Texas A&M Univ., College Station, TX (United States)]

    1997-08-01

    The purpose of this paper is two-fold. First, we introduce the use of non-parametric transformations for correlating petrophysical data during reservoir characterization. Such transformations are completely data driven and do not require an a priori functional relationship between response and predictor variables, which is the case with traditional multiple regression. The transformations are very general, computationally efficient, and can easily handle mixed data types, for example, continuous variables such as porosity and permeability, and categorical variables such as rock type and lithofacies. The power of the non-parametric transformation techniques for data correlation has been illustrated through synthetic and field examples. Second, we utilize these transformations to propose a two-stage approach for data integration during heterogeneity characterization. The principal advantages of our approach over traditional cokriging or cosimulation methods are: (1) it does not require a linear relationship between primary and secondary data, (2) it exploits the secondary information to its fullest potential by maximizing the correlation between the primary and secondary data, (3) it can be easily applied to cases where several types of secondary or soft data are involved, and (4) it significantly reduces variance function calculations and thus greatly facilitates non-Gaussian cosimulation. We demonstrate the data integration procedure using synthetic and field examples. The field example involves estimation of pore-footage distribution using well data and multiple seismic attributes.

  9. Robust non-parametric one-sample tests for the analysis of recurrent events.

    Science.gov (United States)

    Rebora, Paola; Galimberti, Stefania; Valsecchi, Maria Grazia

    2010-12-30

    One-sample non-parametric tests are proposed here for inference on recurring events. The focus is on the marginal mean function of events and the basis for inference is the standardized distance between the observed and the expected number of events under a specified reference rate. Different weights are considered in order to account for various types of alternative hypotheses on the mean function of the recurrent events process. A robust version and a stratified version of the test are also proposed. The performance of these tests was investigated through simulation studies under various underlying event generation processes, such as homogeneous and nonhomogeneous Poisson processes, autoregressive and renewal processes, with and without frailty effects. The robust versions of the test have been shown to be suitable in a wide variety of event generating processes. The motivating context is a study on gene therapy in a very rare immunodeficiency in children, where a major end-point is the recurrence of severe infections. Robust non-parametric one-sample tests for recurrent events can be useful to assess efficacy and especially safety in non-randomized studies or in epidemiological studies for comparison with a standard population.

  10. A non-parametric approach to estimate the total deviation index for non-normal data.

    Science.gov (United States)

    Perez-Jaume, Sara; Carrasco, Josep L

    2015-11-10

    Concordance indices are used to assess the degree of agreement between different methods that measure the same characteristic. In this context, the total deviation index (TDI) is an unscaled concordance measure that quantifies the extent to which the readings from the same subject obtained by different methods may differ with a certain probability. Common approaches to estimating the TDI assume that data are normally distributed and that there is linearity between the response and the effects (subjects, methods and random error). Here, we introduce a new non-parametric methodology for estimation and inference of the TDI that can deal with any kind of quantitative data. The present study introduces this non-parametric approach and compares it with the already established methods in two real case examples that represent situations of non-normal data (more specifically, skewed data and count data). The performance of the already established methodologies and our approach in these contexts is assessed by means of a simulation study. Copyright © 2015 John Wiley & Sons, Ltd.
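
    The idea of a non-parametric TDI can be sketched as the empirical quantile of the absolute paired differences (this illustrates the concept only; the inference procedure of the paper is not reproduced, and the device readings are synthetic skewed data):

        import numpy as np

        def tdi_nonparametric(method_a, method_b, prob=0.90):
            """Empirical TDI: the prob-quantile of the absolute paired differences."""
            diffs = np.abs(np.asarray(method_a) - np.asarray(method_b))
            return np.quantile(diffs, prob)

        rng = np.random.default_rng(11)
        true_val = rng.gamma(shape=2.0, scale=10.0, size=200)        # skewed "true" readings
        device_a = true_val + rng.normal(0.0, 1.0, size=200)
        device_b = true_val + rng.normal(0.5, 1.5, size=200)

        # with probability 0.9, readings of the two devices differ by at most this amount
        print(round(tdi_nonparametric(device_a, device_b, 0.90), 2))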

  11. A non-parametric Bayesian approach for clustering and tracking non-stationarities of neural spikes.

    Science.gov (United States)

    Shalchyan, Vahid; Farina, Dario

    2014-02-15

    Neural spikes from multiple neurons recorded in a multi-unit signal are usually separated by clustering. Drifts in the position of the recording electrode relative to the neurons over time cause gradual changes in the positions and shapes of the clusters, challenging the clustering task. By dividing the data into short time intervals, Bayesian tracking of the clusters based on a Gaussian cluster model has been previously proposed. However, the Gaussian cluster model is often not verified for neural spikes. We present a Bayesian clustering approach that makes no assumptions on the distribution of the clusters and uses kernel-based density estimation of the clusters in every time interval as a prior for Bayesian classification of the data in the subsequent time interval. The proposed method was tested and compared to the Gaussian model-based approach for cluster tracking by using both simulated and experimental datasets. The results showed that the proposed non-parametric kernel-based density estimation of the clusters outperformed the sequential Gaussian model fitting in both simulated and experimental data tests. Using non-parametric kernel density-based clustering that makes no assumptions on the distribution of the clusters enhances the ability to track cluster non-stationarity over time with respect to the Gaussian cluster modeling approach. Copyright © 2013 Elsevier B.V. All rights reserved.
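
    A minimal Python sketch of the core idea, assuming scipy is available: fit a kernel density estimate per cluster in one time interval and use the prior-weighted densities to classify spikes in the next interval. The two-dimensional waveform features are invented, and the full Bayesian tracking machinery of the paper is not reproduced.

        # Sketch: kernel-density-based classification of spikes in consecutive intervals.
        import numpy as np
        from scipy.stats import gaussian_kde

        rng = np.random.default_rng(2)
        # Interval t: two clusters of spike features (e.g., first two PCA scores).
        c1_t = rng.normal([0, 0], 0.4, size=(150, 2))
        c2_t = rng.normal([2, 2], 0.4, size=(150, 2))
        kde1 = gaussian_kde(c1_t.T)
        kde2 = gaussian_kde(c2_t.T)
        prior = np.array([len(c1_t), len(c2_t)], dtype=float)
        prior /= prior.sum()

        # Interval t+1: clusters have drifted slightly; classify the new spikes.
        new_spikes = np.vstack([rng.normal([0.3, 0.1], 0.4, size=(50, 2)),
                                rng.normal([1.8, 2.2], 0.4, size=(50, 2))])
        post = np.vstack([prior[0] * kde1(new_spikes.T),
                          prior[1] * kde2(new_spikes.T)])
        labels = post.argmax(axis=0)          # 0 -> cluster 1, 1 -> cluster 2
        print(np.bincount(labels))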

  12. Non-parametric iterative model constraint graph min-cut for automatic kidney segmentation.

    Science.gov (United States)

    Freiman, M; Kronman, A; Esses, S J; Joskowicz, L; Sosna, J

    2010-01-01

    We present a new non-parametric model constraint graph min-cut algorithm for automatic kidney segmentation in CT images. The segmentation is formulated as a maximum a posteriori estimation of a model-driven Markov random field. A non-parametric hybrid shape and intensity model is treated as a latent variable in the energy functional. The latent model and labeling map that minimize the energy functional are then simultaneously computed with an expectation maximization approach. The main advantages of our method are that it does not assume a fixed parametric prior model, which is subject to inter-patient variability and registration errors, and that it combines both the model and the image information into a unified graph min-cut based segmentation framework. We evaluated our method on 20 kidneys from 10 CT datasets with and without contrast agent for which ground-truth segmentations were generated by averaging three manual segmentations. Our method yields an average volumetric overlap error of 10.95% and an average symmetric surface distance of 0.79 mm. These results indicate that our method is accurate and robust for kidney segmentation.

  13. MEASURING DARK MATTER PROFILES NON-PARAMETRICALLY IN DWARF SPHEROIDALS: AN APPLICATION TO DRACO

    Energy Technology Data Exchange (ETDEWEB)

    Jardel, John R.; Gebhardt, Karl [Department of Astronomy, The University of Texas, 2515 Speedway, Stop C1400, Austin, TX 78712-1205 (United States); Fabricius, Maximilian H.; Williams, Michael J. [Max-Planck Institut fuer extraterrestrische Physik, Giessenbachstrasse, D-85741 Garching bei Muenchen (Germany); Drory, Niv, E-mail: jardel@astro.as.utexas.edu [Instituto de Astronomia, Universidad Nacional Autonoma de Mexico, Avenida Universidad 3000, Ciudad Universitaria, C.P. 04510 Mexico D.F. (Mexico)

    2013-02-15

    We introduce a novel implementation of orbit-based (or Schwarzschild) modeling that allows dark matter density profiles to be calculated non-parametrically in nearby galaxies. Our models require no assumptions to be made about velocity anisotropy or the dark matter profile. The technique can be applied to any dispersion-supported stellar system, and we demonstrate its use by studying the Local Group dwarf spheroidal galaxy (dSph) Draco. We use existing kinematic data at larger radii and also present 12 new radial velocities within the central 13 pc obtained with the VIRUS-W integral field spectrograph on the 2.7 m telescope at McDonald Observatory. Our non-parametric Schwarzschild models find strong evidence that the dark matter profile in Draco is cuspy for 20 ≤ r ≤ 700 pc. The profile for r ≥ 20 pc is well fit by a power law with slope α = -1.0 ± 0.2, consistent with predictions from cold dark matter simulations. Our models confirm that, despite its low baryon content relative to other dSphs, Draco lives in a massive halo.

  14. A Bayesian non-parametric Potts model with application to pre-surgical FMRI data.

    Science.gov (United States)

    Johnson, Timothy D; Liu, Zhuqing; Bartsch, Andreas J; Nichols, Thomas E

    2013-08-01

    The Potts model has enjoyed much success as a prior model for image segmentation. Given the individual classes in the model, the data are typically modeled as Gaussian random variates or as random variates from some other parametric distribution. In this article, we present a non-parametric Potts model and apply it to a functional magnetic resonance imaging study for the pre-surgical assessment of peritumoral brain activation. In our model, we assume that the Z-score image from a patient can be segmented into activated, deactivated, and null classes, or states. Conditional on the class, or state, the Z-scores are assumed to come from some generic distribution which we model non-parametrically using a mixture of Dirichlet process priors within the Bayesian framework. The posterior distribution of the model parameters is estimated with a Markov chain Monte Carlo algorithm, and Bayesian decision theory is used to make the final classifications. Our Potts prior model includes two parameters, the standard spatial regularization parameter and a parameter that can be interpreted as the a priori probability that each voxel belongs to the null, or background state, conditional on the lack of spatial regularization. We assume that both of these parameters are unknown, and jointly estimate them along with other model parameters. We show through simulation studies that our model performs on par, in terms of posterior expected loss, with parametric Potts models when the parametric model is correctly specified and outperforms parametric models when the parametric model is misspecified.

  15. Properties of generalized univariate hypergeometric functions

    NARCIS (Netherlands)

    van de Bult, F.J.; Rains, E.M.; Stokman, J.V.

    2007-01-01

    Abstract: Based on Spiridonov’s analysis of elliptic generalizations of the Gauss hypergeometric function, we develop a common framework for 7-parameter families of generalized elliptic, hyperbolic and trigonometric univariate hypergeometric functions. In each case we derive the symmetries of the ge

  17. Non-parametric methods – Tree and P-CFA – for the ecological evaluation and assessment of suitable aquatic habitats: A contribution to fish psychology

    Directory of Open Access Journals (Sweden)

    Andreas H. Melcher

    2012-09-01

    This study analyses the multidimensional spawning habitat suitability of the fish species "Nase" (Latin: Chondrostoma nasus). This is the first time non-parametric methods were used to better understand biotic habitat use in theory and practice. In particular, we tested (1) the Decision Tree technique, Chi-squared Automatic Interaction Detectors (CHAID), to identify specific habitat types and (2) Prediction-Configural Frequency Analysis (P-CFA) to test for statistical significance. The combination of both non-parametric methods, CHAID and P-CFA, enabled the identification, prediction and interpretation of the most typical significant spawning habitats, and we were also able to determine non-typical habitat types, e.g., types in contrast to antitypes. The gradual combination of these two methods underlined three significant habitat types: shaded habitat, and fine and coarse substrate habitats depending on high flow velocity. The study affirmed the importance of shading and riparian vegetation along river banks for fish species. In addition, this method provides a weighting of interactions between specific habitat characteristics. The results demonstrate that efficient river restoration requires re-establishing riparian vegetation as well as the open river continuum and hydro-morphological improvements to habitats.

  18. Univariate normalization of bispectrum using Hölder's inequality.

    Science.gov (United States)

    Shahbazi, Forooz; Ewald, Arne; Nolte, Guido

    2014-08-15

    Considering that many biological systems including the brain are complex non-linear systems, suitable methods capable of detecting these non-linearities are required to study the dynamical properties of these systems. One of these tools is the third-order cumulant or cross-bispectrum, which is a measure of interfrequency interactions between three signals. For convenient interpretation, interaction measures are most commonly normalized to be independent of constant scales of the signals such that their absolute values are bounded by one, with this limit reflecting perfect coupling. Although many different normalization factors for cross-bispectra have been suggested in the literature, these either do not lead to bounded measures or are themselves dependent on the coupling and not only on the scale of the signals. In this paper we suggest a normalization factor which is univariate, i.e., dependent only on the amplitude of each signal and not on the interactions between signals. Using a generalization of Hölder's inequality it is proven that the absolute value of this univariate bicoherence is bounded by zero and one. We compared three widely used normalizations to the univariate normalization concerning the significance of bicoherence values gained from resampling tests. Bicoherence values are calculated from real EEG data recorded in an eyes-closed experiment from 10 subjects. The results show slightly more significant values for the univariate normalization but in general, the differences are very small or even vanishing in some subjects. Therefore, we conclude that the normalization factor does not play an important role in the bicoherence values with regard to statistical power, although a univariate normalization is the only normalization factor which fulfills all the required conditions of a proper normalization.
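
    The Python sketch below estimates a segment-averaged bispectrum of a single signal and divides it by a univariate normalization built from third-order absolute moments of the individual spectra, i.e. the (3, 3, 3) exponents for which the generalized Hölder inequality gives a bound of one. It is a plausible reading of the normalization discussed above on a toy signal, not the authors' code.

        # Sketch: bispectrum at (f1, f2) with a univariate, Hoelder-type normalization.
        import numpy as np

        def univariate_bicoherence(x, fs, f1, f2, seg_len=256):
            """Single-signal bicoherence estimate at frequencies f1, f2 (Hz)."""
            n_seg = len(x) // seg_len
            segs = x[:n_seg * seg_len].reshape(n_seg, seg_len)
            spec = np.fft.rfft(segs * np.hanning(seg_len), axis=1)
            freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
            i1 = np.argmin(np.abs(freqs - f1))
            i2 = np.argmin(np.abs(freqs - f2))
            i3 = np.argmin(np.abs(freqs - (f1 + f2)))
            num = np.mean(spec[:, i1] * spec[:, i2] * np.conj(spec[:, i3]))
            # Univariate normalization: product of third-order moments of each factor,
            # so |bicoherence| <= 1 by the generalized Hoelder inequality (p = q = r = 3).
            den = (np.mean(np.abs(spec[:, i1]) ** 3) *
                   np.mean(np.abs(spec[:, i2]) ** 3) *
                   np.mean(np.abs(spec[:, i3]) ** 3)) ** (1.0 / 3.0)
            return np.abs(num) / den

        rng = np.random.default_rng(3)
        t = np.arange(60 * 200) / 200.0                       # 60 s of toy data at 200 Hz
        phi1, phi2 = rng.uniform(0, 2 * np.pi, 2)
        x = (np.cos(2 * np.pi * 6 * t + phi1) + np.cos(2 * np.pi * 10 * t + phi2)
             + 0.5 * np.cos(2 * np.pi * 16 * t + phi1 + phi2)   # phase-coupled triplet
             + rng.normal(0, 0.5, t.size))
        print(univariate_bicoherence(x, fs=200, f1=6.0, f2=10.0))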

  19. Handbook of univariate and multivariate data analysis with IBM SPSS

    CERN Document Server

    Ho, Robert

    2013-01-01

    Using the same accessible, hands-on approach as its best-selling predecessor, the Handbook of Univariate and Multivariate Data Analysis with IBM SPSS, Second Edition explains how to apply statistical tests to experimental findings, identify the assumptions underlying the tests, and interpret the findings. This second edition now covers more topics and has been updated with the SPSS statistical package for Windows. New to the Second Edition: three new chapters on multiple discriminant analysis, logistic regression, and canonical correlation; a new section on how to deal with missing data; coverage of te…

  20. Evaluation of droplet size distributions using univariate and multivariate approaches

    DEFF Research Database (Denmark)

    Gauno, M.H.; Larsen, C.C.; Vilhelmsen, T.

    2013-01-01

    Pharmaceutically relevant material characteristics are often analyzed based on univariate descriptors, for example the median droplet size and the width of the distribution. The current study was aiming to compare univariate and multivariate approaches in evaluating droplet size distributions. As a model system, the atomization of a coating solution from a two-fluid nozzle was investigated. The effect of three process parameters (concentration of ethyl cellulose in ethanol, atomizing air pressure, and flow rate of coating solution) on the droplet size and droplet size distribution was studied. Investigation of loading and score plots from principal component analysis (PCA) revealed additional information on the droplet size distributions, and it was possible to identify univariate statistics (volume median droplet size) which were similar but originated from varying droplet size distributions. The multivariate data analysis was proven to be an efficient tool for evaluating the full information contained in a distribution. © 2013 Informa Healthcare USA, Inc.

  1. Rural-urban Migration and Dynamics of Income Distribution in China: A Non-parametric Approach

    Institute of Scientific and Technical Information of China (English)

    Yong Liu; Wei Zou

    2011-01-01

    Extending the income dynamics approach in Quah (2003), the present paper studies the enlarging income inequality in China over the past three decades from the viewpoint of rural-urban migration and economic transition. We establish non-parametric estimations of rural and urban income distribution functions in China, and aggregate a population-weighted, nationwide income distribution function taking into account rural-urban differences in technological progress and price indexes. We calculate 12 inequality indexes through non-parametric estimation to overcome the biases in existing parametric estimation and, therefore, provide more accurate measurement of income inequality. Policy implications have been drawn based on our research.

  2. Univariate time series forecasting algorithm validation

    Science.gov (United States)

    Ismail, Suzilah; Zakaria, Rohaiza; Muda, Tuan Zalizam Tuan

    2014-12-01

    Forecasting is a complex process which requires expert tacit knowledge in producing accurate forecast values. This complexity contributes to the gaps between end users and experts. Automating this process by using an algorithm can act as a bridge between them. An algorithm is a well-defined rule for solving a problem. In this study a univariate time series forecasting algorithm was developed in JAVA and validated using SPSS and Excel. Two sets of simulated data (yearly and non-yearly), several univariate forecasting techniques (i.e., Moving Average, Decomposition, Exponential Smoothing, Time Series Regression and ARIMA) and a recent forecasting process (data partition, several error measures, recursive evaluation, etc.) were employed. The results of the algorithm tally with those of SPSS and Excel. This algorithm will benefit not just forecasters but also end users lacking in-depth knowledge of the forecasting process.
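
    A toy Python illustration, assuming nothing about the original JAVA implementation: a single exponential smoothing forecast with a hold-out evaluation using MAPE, which mirrors the data-partition, error-measure and recursive-evaluation steps described above. The series and smoothing parameter are made up.

        # Sketch: simple exponential smoothing with hold-out evaluation (toy data).
        import numpy as np

        def ses_forecast(y, alpha=0.3):
            """Return the one-step-ahead forecast after smoothing the series y."""
            level = y[0]
            for obs in y[1:]:
                level = alpha * obs + (1 - alpha) * level
            return level

        series = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118], float)
        train, test = series[:-3], series[-3:]             # simple data partition

        forecasts = []
        history = list(train)
        for actual in test:                                 # recursive evaluation
            forecasts.append(ses_forecast(np.array(history)))
            history.append(actual)

        mape = np.mean(np.abs((test - np.array(forecasts)) / test)) * 100
        print(f"hold-out MAPE: {mape:.1f}%")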

  3. Non-parametric co-clustering of large scale sparse bipartite networks on the GPU

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mørup, Morten; Hansen, Lars Kai

    2011-01-01

    Co-clustering is a problem of both theoretical and practical importance, e.g., in market basket analysis, collaborative filtering, and web scale text processing. We state the co-clustering problem in terms of non-parametric generative models which can address the issue of estimating the number of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering we exploit that parameter inference for co-clustering is well suited for parallel computing. We develop a generic GPU framework for efficient inference on large scale problems and apply it to real-life large scale collaborative filtering data and web scale text corpora, demonstrating that latent mesoscale structures extracted by the co-clustering problem as formulated by the Infinite Relational Model (IRM) are consistent across consecutive runs with different initializations and also relevant…

  4. Non-parametric Reconstruction of Cluster Mass Distribution from Strong Lensing Modelling Abell 370

    CERN Document Server

    Abdel-Salam, H M; Williams, L L R

    1997-01-01

    We describe a new non-parametric technique for reconstructing the mass distribution in galaxy clusters with strong lensing, i.e., from multiple images of background galaxies. The observed positions and redshifts of the images are considered as rigid constraints and through the lens (ray-trace) equation they provide us with linear constraint equations. These constraints confine the mass distribution to some allowed region, which is then found by linear programming. Within this allowed region we study in detail the mass distribution with minimum mass-to-light variation; also some others, such as the smoothest mass distribution. The method is applied to the extensively studied cluster Abell 370, which hosts a giant luminous arc and several other multiply imaged background galaxies. Our mass maps are constrained by the observed positions and redshifts (spectroscopic or model-inferred by previous authors) of the giant arc and multiple image systems. The reconstructed maps obtained for A370 reveal a detailed mass d...

  5. Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling.

    Science.gov (United States)

    Karsch, Kevin; Liu, Ce; Kang, Sing Bing

    2014-11-01

    We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling. We demonstrate our technique in cases where past methods fail (non-translating cameras and dynamic scenes). Our technique is applicable to single images as well as videos. For videos, we use local motion cues to improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we use a Kinect-based system to collect a large data set containing stereoscopic videos with known depths. We show that our depth estimation technique outperforms the state-of-the-art on benchmark databases. Our technique can be used to automatically convert a monoscopic video into stereo for 3D visualization, and we demonstrate this through a variety of visually pleasing results for indoor and outdoor scenes, including results from the feature film Charade.

  6. A multitemporal and non-parametric approach for assessing the impacts of drought on vegetation greenness

    DEFF Research Database (Denmark)

    Carrao, Hugo; Sepulcre, Guadalupe; Horion, Stéphanie Marie Anne F;

    2013-01-01

    This study evaluates the relationship between the frequency and duration of meteorological droughts and the subsequent temporal changes on the quantity of actively photosynthesizing biomass (greenness) estimated from satellite imagery on rainfed croplands in Latin America for the period between 1998 and 2010. The time-series analysis of vegetation greenness is performed during the growing season with a non-parametric method, namely the seasonal Relative Greenness (RG) of spatially accumulated fAPAR. Rainfed croplands are identified from the Global Land Cover map of 2000 and the GlobCover maps of 2005/2006 and 2009. Precipitation data are taken from a Full Data Reanalysis time-series product, which ranges from January 1901 to December 2010 and is interpolated at the spatial resolution of 1° (decimal degree, DD). Vegetation greenness composites are derived from 10-daily SPOT-VEGETATION images at the spatial resolution of 1/112° DD.

  7. Comparative Study of Parametric and Non-parametric Approaches in Fault Detection and Isolation

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.; Katebi, M.R.

    This report describes a comparative study between two approaches to fault detection and isolation in dynamic systems. The first approach uses a parametric model of the system. The main components of such techniques are residual and signature generation for processing and analyzing. The second approach is non-parametric in the sense that the signature analysis is only dependent on the frequency or time domain information extracted directly from the input-output signals. Based on these approaches, two different fault monitoring schemes are developed, where the feature extraction and fault decision algorithms employed are adopted from template matching in pattern recognition. Extensive simulation studies are performed to demonstrate satisfactory performance of the proposed techniques. The advantages and disadvantages of each approach are discussed and analyzed.

  8. Developing two non-parametric performance models for higher learning institutions

    Science.gov (United States)

    Kasim, Maznah Mat; Kashim, Rosmaini; Rahim, Rahela Abdul; Khan, Sahubar Ali Muhamed Nadhar

    2016-08-01

    Measuring the performance of higher learning institutions (HLIs) is a must for these institutions to improve their excellence. This paper focuses on the formation of two performance models, an efficiency model and an effectiveness model, by utilizing a non-parametric method, Data Envelopment Analysis (DEA). The proposed models are validated by measuring the performance of 16 public universities in Malaysia for the year 2008. However, since data for one of the variables were unavailable, an estimate was used as a proxy to represent the real data. The results show that the average efficiency and effectiveness scores were 0.817 and 0.900 respectively, while six universities were fully efficient and eight universities were fully effective. A total of six universities were both efficient and effective. It is suggested that the two proposed performance models would work as complementary methods to the existing performance appraisal method or as alternative methods in monitoring the performance of HLIs, especially in Malaysia.
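
    A compact Python sketch of an input-oriented CCR efficiency score solved as a linear program; the university data are invented and scipy's linprog is assumed available, so this illustrates the DEA machinery in general rather than the authors' specific efficiency and effectiveness models.

        # Sketch: input-oriented CCR DEA efficiency scores via linear programming.
        import numpy as np
        from scipy.optimize import linprog

        # Hypothetical data: rows = decision-making units (e.g., universities).
        X = np.array([[40.0, 12.0], [55.0, 20.0], [30.0, 10.0], [50.0, 15.0]])      # inputs
        Y = np.array([[300.0, 20.0], [400.0, 35.0], [250.0, 15.0], [380.0, 30.0]])  # outputs
        n, m, s = X.shape[0], X.shape[1], Y.shape[1]

        for o in range(n):
            # Variables: [theta, lambda_1, ..., lambda_n]; minimize theta.
            c = np.r_[1.0, np.zeros(n)]
            # Inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
            A_in = np.c_[-X[o].reshape(m, 1), X.T]
            # Outputs: -sum_j lambda_j * y_rj <= -y_ro
            A_out = np.c_[np.zeros((s, 1)), -Y.T]
            res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                          b_ub=np.r_[np.zeros(m), -Y[o]],
                          bounds=[(None, None)] + [(0, None)] * n, method="highs")
            print(f"DMU {o}: efficiency = {res.x[0]:.3f}")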

  9. Factors associated with malnutrition among tribal children in India: a non-parametric approach.

    Science.gov (United States)

    Debnath, Avijit; Bhattacharjee, Nairita

    2014-06-01

    The purpose of this study is to identify the determinants of malnutrition among the tribal children in India. The investigation is based on secondary data compiled from the National Family Health Survey-3. We used a classification and regression tree model, a non-parametric approach, to address the objective. Our analysis shows that breastfeeding practice, economic status, antenatal care of mother and women's decision-making autonomy are negatively associated with malnutrition among tribal children. We identify maternal malnutrition and urban concentration of household as the two risk factors for child malnutrition. The identified associated factors may be used for designing and targeting preventive programmes for malnourished tribal children. © The Author [2014]. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
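
    For readers unfamiliar with classification and regression trees, the Python snippet below shows the general pattern with scikit-learn on made-up covariates; the actual NFHS-3 variables, coding and model settings of the study are not reproduced.

        # Sketch: classification tree on hypothetical child-nutrition covariates.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier, export_text

        rng = np.random.default_rng(4)
        n = 500
        X = np.column_stack([
            rng.integers(0, 2, n),        # breastfed (0/1)
            rng.integers(1, 6, n),        # household wealth quintile
            rng.integers(0, 2, n),        # antenatal care received (0/1)
            rng.integers(0, 2, n),        # mother undernourished (0/1)
        ])
        # Toy outcome loosely tied to the covariates (for illustration only).
        risk = 0.5 - 0.1 * X[:, 0] - 0.05 * X[:, 1] - 0.1 * X[:, 2] + 0.2 * X[:, 3]
        y = (rng.uniform(size=n) < risk).astype(int)      # 1 = malnourished

        tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=30).fit(X, y)
        print(export_text(tree, feature_names=["breastfed", "wealth", "anc", "mother_malnourished"]))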

  10. Non-parametric method for separating domestic hot water heating spikes and space heating

    DEFF Research Database (Denmark)

    Bacher, Peder; de Saint-Aubain, Philip Anton; Christiansen, Lasse Engbo;

    2016-01-01

    In this paper a method for separating spikes from a noisy data series, where the data change and evolve over time, is presented. The method is applied on measurements of the total heat load for a single family house. It relies on the fact that the domestic hot water heating is a process generating short-lived spikes in the time series, while the space heating changes in slower patterns during the day dependent on the climate and user behavior. The challenge is to separate the domestic hot water heating spikes from the space heating without affecting the natural noise in the space heating measurements. The assumption behind the developed method is that the space heating can be estimated by a non-parametric kernel smoother, such that every value significantly above this kernel smoother estimate is identified as a domestic hot water heating spike. First, it is shown how a basic kernel smoothing…
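
    A minimal Python sketch of the idea under stated assumptions: a Gaussian (Nadaraya-Watson) kernel smoother estimates the slowly varying space-heating load, and readings far above the smooth are flagged as hot-water spikes. The data and the simple three-sigma rule are stand-ins for the paper's significance criterion.

        # Sketch: separating short-lived spikes from a slowly varying heat-load signal.
        import numpy as np

        def kernel_smooth(t, y, bandwidth):
            """Nadaraya-Watson smoother with a Gaussian kernel."""
            w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
            return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

        rng = np.random.default_rng(5)
        t = np.arange(0, 24, 0.25)                              # one day, 15-min readings
        space_heating = 2 + np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.15, t.size)
        load = space_heating.copy()
        load[[30, 31, 60, 61, 62]] += 3.0                       # hot-water tapping spikes

        smooth = kernel_smooth(t, load, bandwidth=1.0)
        resid = load - smooth
        spikes = resid > 3 * np.std(resid)                      # crude significance threshold
        print("spike indices:", np.where(spikes)[0])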

  11. LICORS: Light Cone Reconstruction of States for Non-parametric Forecasting of Spatio-Temporal Systems

    CERN Document Server

    Goerg, Georg M

    2012-01-01

    We present a new, non-parametric forecasting method for data where continuous values are observed discretely in space and time. Our method, "light-cone reconstruction of states" (LICORS), uses physical principles to identify predictive states which are local properties of the system, both in space and time. LICORS discovers the number of predictive states and their predictive distributions automatically, and consistently, under mild assumptions on the data source. We provide an algorithm to implement our method, along with a cross-validation scheme to pick control settings. Simulations show that CV-tuned LICORS outperforms standard methods in forecasting challenging spatio-temporal dynamics. Our work provides applied researchers with a new, highly automatic method to analyze and forecast spatio-temporal data.

  12. On the interpolation of univariate distributions

    CERN Document Server

    Dembinski, Hans P

    2011-01-01

    This note discusses an interpolation technique for univariate distributions. In other words, the question is how to obtain a good approximation for f(x|a) if a0 < a < a1 is a control variable and f(x|a0) and f(x|a1) are known. The technique presented here is based on the interpolation of the quantile function, i.e. the inverse of the cumulative distribution function.
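
    The recipe is easy to reproduce in Python: invert the two known CDFs, interpolate the quantile functions linearly in the control variable, and map back by inverse-transform sampling. The sketch below, with invented Gaussian endpoints, is only an illustration of this quantile-interpolation idea; for Gaussians the result should again be Gaussian with interpolated mean and width.

        # Sketch: interpolating a univariate distribution via its quantile function.
        import numpy as np
        from scipy.stats import norm

        a0, a1, a = 0.0, 1.0, 0.4                 # control variable values
        dist0 = norm(loc=0.0, scale=1.0)          # f(x | a0)
        dist1 = norm(loc=5.0, scale=2.0)          # f(x | a1)

        p = np.linspace(1e-4, 1 - 1e-4, 2001)     # probability grid
        w = (a - a0) / (a1 - a0)
        q_interp = (1 - w) * dist0.ppf(p) + w * dist1.ppf(p)   # interpolated quantile function

        # Draw samples from the interpolated distribution by inverse-transform sampling.
        u = np.random.default_rng(6).uniform(size=10000)
        samples = np.interp(u, p, q_interp)
        print(f"mean ~ {samples.mean():.2f} (expected 2.0), std ~ {samples.std():.2f} (expected 1.4)")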

  13. Non-parametric PSF estimation from celestial transit solar images using blind deconvolution

    Science.gov (United States)

    González, Adriana; Delouille, Véronique; Jacques, Laurent

    2016-01-01

    Context: Characterization of instrumental effects in astronomical imaging is important in order to extract accurate physical information from the observations. The measured image in a real optical instrument is usually represented by the convolution of an ideal image with a Point Spread Function (PSF). Additionally, the image acquisition process is also contaminated by other sources of noise (read-out, photon-counting). The problem of estimating both the PSF and a denoised image is called blind deconvolution and is ill-posed. Aims: We propose a blind deconvolution scheme that relies on image regularization. Contrarily to most methods presented in the literature, our method does not assume a parametric model of the PSF and can thus be applied to any telescope. Methods: Our scheme uses a wavelet analysis prior model on the image and weak assumptions on the PSF. We use observations from a celestial transit, where the occulting body can be assumed to be a black disk. These constraints allow us to retain meaningful solutions for the filter and the image, eliminating trivial, translated, and interchanged solutions. Under an additive Gaussian noise assumption, they also enforce noise canceling and avoid reconstruction artifacts by promoting the whiteness of the residual between the blurred observations and the cleaned data. Results: Our method is applied to synthetic and experimental data. The PSF is estimated for the SECCHI/EUVI instrument using the 2007 Lunar transit, and for SDO/AIA using the 2012 Venus transit. Results show that the proposed non-parametric blind deconvolution method is able to estimate the core of the PSF with a similar quality to parametric methods proposed in the literature. We also show that, if these parametric estimations are incorporated in the acquisition model, the resulting PSF outperforms both the parametric and non-parametric methods.

  14. Non-parametric PSF estimation from celestial transit solar images using blind deconvolution

    Directory of Open Access Journals (Sweden)

    González Adriana

    2016-01-01

    Context: Characterization of instrumental effects in astronomical imaging is important in order to extract accurate physical information from the observations. The measured image in a real optical instrument is usually represented by the convolution of an ideal image with a Point Spread Function (PSF). Additionally, the image acquisition process is also contaminated by other sources of noise (read-out, photon-counting). The problem of estimating both the PSF and a denoised image is called blind deconvolution and is ill-posed. Aims: We propose a blind deconvolution scheme that relies on image regularization. Contrarily to most methods presented in the literature, our method does not assume a parametric model of the PSF and can thus be applied to any telescope. Methods: Our scheme uses a wavelet analysis prior model on the image and weak assumptions on the PSF. We use observations from a celestial transit, where the occulting body can be assumed to be a black disk. These constraints allow us to retain meaningful solutions for the filter and the image, eliminating trivial, translated, and interchanged solutions. Under an additive Gaussian noise assumption, they also enforce noise canceling and avoid reconstruction artifacts by promoting the whiteness of the residual between the blurred observations and the cleaned data. Results: Our method is applied to synthetic and experimental data. The PSF is estimated for the SECCHI/EUVI instrument using the 2007 Lunar transit, and for SDO/AIA using the 2012 Venus transit. Results show that the proposed non-parametric blind deconvolution method is able to estimate the core of the PSF with a similar quality to parametric methods proposed in the literature. We also show that, if these parametric estimations are incorporated in the acquisition model, the resulting PSF outperforms both the parametric and non-parametric methods.

  15. A Non-parametric Approach to Measuring the $K^{-}\pi^{+}$ Amplitudes in $D^{+} \to K^{-}K^{+}\pi^{+}$ Decay

    CERN Document Server

    Link, J M; Alimonti, G; Anjos, J C; Arena, V; Barberis, S; Bediaga, I; Benussi, L; Bianco, S; Boca, G; Bonomi, G; Boschini, M; Butler, J N; Carrillo, S; Casimiro, E; Castromonte, C; Cawlfield, C; Cerutti, A; Cheung, H W K; Chiodini, G; Cho, K; Chung, Y S; Cinquini, L; Cuautle, E; Cumalat, J P; D'Angelo, P; Davenport, T F; De Miranda, J M; Di Corato, M; Dini, P; Dos Reis, A C; Edera, L; Engh, D; Erba, S; Fabbri, F L; Frisullo, V; Gaines, I; Garbincius, P H; Gardner, R; Garren, L A; Gianini, G; Gottschalk, E; Göbel, C; Handler, T; Hernández, H; Hosack, M; Inzani, P; Johns, W E; Kang, J S; Kasper, P H; Kim, D Y; Ko, B R; Kreymer, A E; Kryemadhi, A; Kutschke, R; Kwak, J W; Lee, K B; Leveraro, F; Liguori, G; Lopes-Pegna, D; Luiggi, E; López, A M; Machado, A A; Magnin, J; Malvezzi, S; Massafferri, A; Menasce, D; Merlo, M M; Mezzadri, M; Mitchell, R; Moroni, L; Méndez, H; Nehring, M; O'Reilly, B; Otalora, J; Pantea, D; Paris, A; Park, H; Pedrini, D; Pepe, I M; Polycarpo, E; Pontoglio, C; Prelz, F; Quinones, J; Rahimi, A; Ramírez, J E; Ratti, S P; Reyes, M; Riccardi, C; Rovere, M; Sala, S; Segoni, I; Sheaff, M; Sheldon, P D; Stenson, K; Sánchez-Hernández, A; Uribe, C; Vaandering, E W; Vitulo, P; Vázquez, F; Wang, M; Webster, M; Wilson, J R; Wiss, J; Yager, P M; Zallo, A; Zhang, Y

    2007-01-01

    Using a large sample of $D^{+} \to K^{-}K^{+}\pi^{+}$ decays collected by the FOCUS photoproduction experiment at Fermilab, we present the first non-parametric analysis of the $K^{-}\pi^{+}$ amplitudes in $D^{+} \to K^{-}K^{+}\pi^{+}$ decay. The technique is similar to that used for our non-parametric measurements of the $D^{+} \to \bar{K}^{*0}\mu^{+}\nu$ form factors. Although these results are in rough agreement with those of E687, we observe a wider S-wave contribution for the $K^{-}\pi^{+}$ system than the standard PDG Breit-Wigner parameterization. We have some weaker evidence for the existence of a new, D-wave component at low values of the $K^{-}\pi^{+}$ mass.

  16. Evaluation of droplet size distributions using univariate and multivariate approaches.

    Science.gov (United States)

    Gaunø, Mette Høg; Larsen, Crilles Casper; Vilhelmsen, Thomas; Møller-Sonnergaard, Jørn; Wittendorff, Jørgen; Rantanen, Jukka

    2013-01-01

    Pharmaceutically relevant material characteristics are often analyzed based on univariate descriptors instead of utilizing the whole information available in the full distribution. One example is droplet size distribution, which is often described by the median droplet size and the width of the distribution. The current study aimed to compare univariate and multivariate approaches in evaluating droplet size distributions. As a model system, the atomization of a coating solution from a two-fluid nozzle was investigated. The effect of three process parameters (concentration of ethyl cellulose in ethanol, atomizing air pressure, and flow rate of coating solution) on the droplet size and droplet size distribution was investigated using a full mixed factorial design. The droplet size produced by the two-fluid nozzle was measured by laser diffraction and reported as a volume-based size distribution. Investigation of loading and score plots from principal component analysis (PCA) revealed additional information on the droplet size distributions, and it was possible to identify univariate statistics (volume median droplet size) which were similar but originated from varying droplet size distributions. The multivariate data analysis was proven to be an efficient tool for evaluating the full information contained in a distribution.
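
    As a generic Python illustration of the multivariate route (full distributions into PCA) versus a univariate summary (the volume median), the sketch below builds a few synthetic size distributions and compares the two views; the process parameters and measured data of the study are not reproduced.

        # Sketch: univariate summary vs. PCA on full (synthetic) droplet size distributions.
        import numpy as np
        from sklearn.decomposition import PCA

        sizes = np.linspace(1, 100, 200)                        # droplet diameter grid (um)

        def lognormal_dist(median, sigma):
            pdf = np.exp(-((np.log(sizes) - np.log(median)) ** 2) / (2 * sigma ** 2)) / sizes
            return pdf / pdf.sum()

        # Two runs with the same volume median but different widths, one with a different median.
        runs = np.vstack([lognormal_dist(20, 0.3),
                          lognormal_dist(20, 0.6),
                          lognormal_dist(35, 0.3)])

        medians = [sizes[np.searchsorted(np.cumsum(r), 0.5)] for r in runs]
        scores = PCA(n_components=2).fit_transform(runs)
        print("volume medians:", np.round(medians, 1))          # first two look identical
        print("PCA scores:\n", np.round(scores, 4))             # but separate in the score plot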

  17. Transit Timing Observations from Kepler: II. Confirmation of Two Multiplanet Systems via a Non-parametric Correlation Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Ford, Eric B.; /Florida U.; Fabrycky, Daniel C.; /Lick Observ.; Steffen, Jason H.; /Fermilab; Carter, Joshua A.; /Harvard-Smithsonian Ctr. Astrophys.; Fressin, Francois; /Harvard-Smithsonian Ctr. Astrophys.; Holman, Matthew J.; /Harvard-Smithsonian Ctr. Astrophys.; Lissauer, Jack J.; /NASA, Ames; Moorhead, Althea V.; /Florida U.; Morehead, Robert C.; /Florida U.; Ragozzine, Darin; /Harvard-Smithsonian Ctr. Astrophys.; Rowe, Jason F.; /NASA, Ames /SETI Inst., Mtn. View /San Diego State U., Astron. Dept.

    2012-01-01

    We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets. We apply this method to an analysis of the transit timing variations of two stars with multiple transiting planet candidates identified by Kepler. We confirm four transiting planets in two multiple planet systems based on their TTVs and the constraints imposed by dynamical stability. An additional three candidates in these same systems are not confirmed as planets, but are likely to be validated as real planets once further observations and analyses are possible. If all were confirmed, these systems would be near 4:6:9 and 2:4:6:9 period commensurabilities. Our results demonstrate that TTVs provide a powerful tool for confirming transiting planets, including low-mass planets and planets around faint stars for which Doppler follow-up is not practical with existing facilities. Continued Kepler observations will dramatically improve the constraints on the planet masses and orbits and provide sensitivity for detecting additional non-transiting planets. If Kepler observations were extended to eight years, then a similar analysis could likely confirm systems with multiple closely spaced, small transiting planets in or near the habitable zone of solar-type stars.

  18. A non-parametric approach for detecting gene-gene interactions associated with age-at-onset outcomes.

    Science.gov (United States)

    Li, Ming; Gardiner, Joseph C; Breslau, Naomi; Anthony, James C; Lu, Qing

    2014-07-01

    Cox-regression-based methods have been commonly used for the analyses of survival outcomes, such as age at disease onset. These methods generally assume the hazard functions are proportional among various risk groups. However, such an assumption may not be valid in genetic association studies, especially when complex interactions are involved. In addition, genetic association studies commonly adopt case-control designs. Direct application of Cox regression to case-control data may yield biased estimators and incorrect statistical inference. We propose a non-parametric approach, the weighted Nelson-Aalen (WNA) approach, for detecting genetic variants that are associated with age-dependent outcomes. The proposed approach can be directly applied to prospective cohort studies, and can be easily extended for population-based case-control studies. Moreover, it does not rely on any assumptions of the disease inheritance models, and is able to capture high-order gene-gene interactions. Through simulations, we show the proposed approach outperforms Cox-regression-based methods in various scenarios. We also conduct an empirical study of progression of nicotine dependence by applying the WNA approach to three independent datasets from the Study of Addiction: Genetics and Environment. In the initial dataset, two SNPs, rs6570989 and rs2930357, located in genes GRIK2 and CSMD1, are found to be significantly associated with the progression of nicotine dependence (ND). The joint association is further replicated in two independent datasets. Further analysis suggests that these two genes may interact and be associated with the progression of ND. As demonstrated by the simulation studies and real data analysis, the proposed approach provides an efficient tool for detecting genetic interactions associated with age-at-onset outcomes.
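
    For orientation, the Python sketch below computes the ordinary (unweighted) Nelson-Aalen estimate of the cumulative hazard from made-up right-censored ages at onset; the weighting scheme and the gene-gene interaction machinery of the WNA approach itself are not reproduced here.

        # Sketch: Nelson-Aalen cumulative hazard estimate from right-censored data.
        import numpy as np

        ages = np.array([12.0, 15.5, 15.5, 18.0, 21.0, 22.5, 25.0, 27.0])  # age at onset or censoring
        observed = np.array([1, 1, 0, 1, 1, 0, 1, 0])                      # 1 = onset observed

        event_times = np.unique(ages[observed == 1])
        cum_hazard = 0.0
        for t in event_times:
            at_risk = np.sum(ages >= t)            # subjects still under observation just before t
            events = np.sum((ages == t) & (observed == 1))
            cum_hazard += events / at_risk
            print(f"t = {t:5.1f}  at risk = {at_risk}  H(t) = {cum_hazard:.3f}")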

  19. Posterior contraction rate for non-parametric Bayesian estimation of the dispersion coefficient of a stochastic differential equation

    NARCIS (Netherlands)

    Gugushvili, S.; Spreij, P.

    2016-01-01

    We consider the problem of non-parametric estimation of the deterministic dispersion coefficient of a linear stochastic differential equation based on discrete-time observations of its solution. We take a Bayesian approach to the problem and under suitable regularity assumptions derive the posterior contraction rate.

  20. Further Empirical Results on Parametric Versus Non-Parametric IRT Modeling of Likert-Type Personality Data

    Science.gov (United States)

    Maydeu-Olivares, Albert

    2005-01-01

    Chernyshenko, Stark, Chan, Drasgow, and Williams (2001) investigated the fit of Samejima's logistic graded model and Levine's non-parametric MFS model to the scales of two personality questionnaires and found that the graded model did not fit well. We attribute the poor fit of the graded model to small amounts of multidimensionality present in…

  1. Structuring feature space: a non-parametric method for volumetric transfer function generation.

    Science.gov (United States)

    Maciejewski, Ross; Woo, Insoo; Chen, Wei; Ebert, David S

    2009-01-01

    The use of multi-dimensional transfer functions for direct volume rendering has been shown to be an effective means of extracting materials and their boundaries for both scalar and multivariate data. The most common multi-dimensional transfer function consists of a two-dimensional (2D) histogram with axes representing a subset of the feature space (e.g., value vs. value gradient magnitude), with each entry in the 2D histogram being the number of voxels at a given feature space pair. Users then assign color and opacity to the voxel distributions within the given feature space through the use of interactive widgets (e.g., box, circular, triangular selection). Unfortunately, such tools lead users through a trial-and-error approach as they assess which data values within the feature space map to a given area of interest within the volumetric space. In this work, we propose the addition of non-parametric clustering within the transfer function feature space in order to extract patterns and guide transfer function generation. We apply a non-parametric kernel density estimation to group voxels of similar features within the 2D histogram. These groups are then binned and colored based on their estimated density, and the user may interactively grow and shrink the binned regions to explore feature boundaries and extract regions of interest. We also extend this scheme to temporal volumetric data in which time steps of 2D histograms are composited into a histogram volume. A three-dimensional (3D) density estimation is then applied, and users can explore regions within the feature space across time without adjusting the transfer function at each time step. Our work enables users to effectively explore the structures found within a feature space of the volume and provide a context in which the user can understand how these structures relate to their volumetric data. We provide tools for enhanced exploration and manipulation of the transfer function, and we show that the initial
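
    A small Python sketch of the density-estimation step on an invented (value, gradient-magnitude) feature space; the interactive widgets, the binning and coloring of groups, and the temporal extension described above are beyond the scope of this snippet.

        # Sketch: kernel density estimation over a 2D (value, gradient magnitude) feature space.
        import numpy as np
        from scipy.stats import gaussian_kde

        rng = np.random.default_rng(7)
        # Two hypothetical materials: one low-value homogeneous, one high-value boundary-rich.
        value = np.r_[rng.normal(0.2, 0.05, 2000), rng.normal(0.7, 0.05, 2000)]
        grad_mag = np.r_[rng.normal(0.05, 0.02, 2000), rng.normal(0.30, 0.05, 2000)]

        kde = gaussian_kde(np.vstack([value, grad_mag]))
        # Evaluate the density on a grid of the feature space (the "2D histogram").
        v_grid, g_grid = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 0.5, 64))
        density = kde(np.vstack([v_grid.ravel(), g_grid.ravel()])).reshape(64, 64)
        # Voxels falling in high-density regions can then be grouped and colored together.
        print("density grid shape:", density.shape, "peak density:", density.max().round(2))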

  2. Non-parametric determination of H and He IS fluxes from cosmic-ray data

    CERN Document Server

    Ghelfi, A; Derome, L; Maurin, D

    2015-01-01

    Top-of-atmosphere (TOA) cosmic-ray (CR) fluxes from satellites and balloon-borne experiments are snapshots of the solar activity imprinted on the interstellar (IS) fluxes. Given a series of snapshots, the unknown IS flux shape and the level of modulation (for each snapshot) can be recovered. We wish (i) to provide the most accurate determination of the IS H and He fluxes from TOA data only, (ii) to obtain the associated modulation levels (and uncertainties) fully accounting for the correlations with the IS flux uncertainties, and (iii) to inspect whether the minimal Force-Field approximation is sufficient to explain all the data at hand. Using H and He TOA measurements, including the recent high precision AMS, BESS-Polar and PAMELA data, we perform a non-parametric fit of the IS fluxes $J^{\\rm IS}_{\\rm H,~He}$ and modulation level $\\phi_i$ for each data taking period. We rely on a Markov Chain Monte Carlo (MCMC) engine to extract the PDF and correlations (hence the credible intervals) of the sought parameters...

  3. THE DARK MATTER PROFILE OF THE MILKY WAY: A NON-PARAMETRIC RECONSTRUCTION

    Energy Technology Data Exchange (ETDEWEB)

    Pato, Miguel [The Oskar Klein Centre for Cosmoparticle Physics, Department of Physics, Stockholm University, AlbaNova, SE-106 91 Stockholm (Sweden); Iocco, Fabio [ICTP South American Institute for Fundamental Research, and Instituto de Física Teórica—Universidade Estadual Paulista (UNESP), Rua Dr. Bento Teobaldo Ferraz 271, 01140-070 São Paulo, SP (Brazil)

    2015-04-10

    We present the results of a new, non-parametric method to reconstruct the Galactic dark matter profile directly from observations. Using the latest kinematic data to track the total gravitational potential and the observed distribution of stars and gas to set the baryonic component, we infer the dark matter contribution to the circular velocity across the Galaxy. The radial derivative of this dynamical contribution is then estimated to extract the dark matter profile. The innovative feature of our approach is that it makes no assumption on the functional form or shape of the profile, thus allowing for a clean determination with no theoretical bias. We illustrate the power of the method by constraining the spherical dark matter profile between 2.5 and 25 kpc away from the Galactic center. The results show that the proposed method, free of widely used assumptions, can already be applied to pinpoint the dark matter distribution in the Milky Way with competitive accuracy, and paves the way for future developments.

  4. Non-parametric method for measuring gas inhomogeneities from X-ray observations of galaxy clusters

    CERN Document Server

    Morandi, Andrea; Cui, Wei

    2013-01-01

    We present a non-parametric method to measure inhomogeneities in the intracluster medium (ICM) from X-ray observations of galaxy clusters. Analyzing mock Chandra X-ray observations of simulated clusters, we show that our new method enables the accurate recovery of the 3D gas density and gas clumping factor profiles out to large radii of galaxy clusters. We then apply this method to Chandra X-ray observations of Abell 1835 and present the first determination of the gas clumping factor from the X-ray cluster data. We find that the gas clumping factor in Abell 1835 increases with radius and reaches ~2-3 at r=R_{200}. This is in good agreement with the predictions of hydrodynamical simulations, but it is significantly below the values inferred from recent Suzaku observations. We further show that the radially increasing gas clumping factor causes flattening of the derived entropy profile of the ICM and affects physical interpretation of the cluster gas structure, especially at the large cluster-centric radii. Our...

  5. Modeling the World Health Organization Disability Assessment Schedule II using non-parametric item response models.

    Science.gov (United States)

    Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana

    2015-03-01

    The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (understanding and communicating, getting around, self-care, getting along with others, life activities and participation in society). The main purpose of this paper is the evaluation of the psychometric properties of each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology.

  6. Non-parametric reconstruction of the galaxy-lens in PG1115+080

    CERN Document Server

    Saha, P; Saha, Prasenjit; Williams, Liliya L. R.

    1997-01-01

    We describe a new, non-parametric, method for reconstructing lensing mass distributions in multiple-image systems, and apply it to PG1115, for which time delays have recently been measured. It turns out that the image positions and the ratio of time delays between different pairs of images constrain the mass distribution in a linear fashion. Since observational errors on image positions and time delay ratios are constantly improving, we use these data as a rigid constraint in our modelling. In addition, we require the projected mass distributions to be inversion-symmetric and to have inward-pointing density gradients. With these realistic yet non-restrictive conditions it is very easy to produce mass distributions that fit the data precisely. We then present models, for $H_0=42$, 63 and 84 km s$^{-1}$ Mpc$^{-1}$, that in each case minimize mass-to-light variations while strictly obeying the lensing constraints. (Only a very rough light distribution is available at present.) All three values of $H_0$ are consistent with the ...

  7. Decision making in coal mine planning using a non-parametric technique of indicator kriging

    Energy Technology Data Exchange (ETDEWEB)

    Mamurekli, D. [Hacettepe University, Ankara (Turkey). Mining Engineering Dept.

    1997-03-01

    In countries where low calorific value coal reserves are abundant and oil reserves are scarce or absent, energy production is mainly supported by coal-fired power stations. Consequently, planning to mine low calorific value coal deposits gains much importance considering the technical and environmental restrictions. Such a mine in Kangal Town of Sivas City is one that delivers run-of-mine coal directly to the power station built in the region. In case the calorific value and the ash content of the extracted coal are lower and higher than the required limits, 1300 kcal/kg and 21%, respectively, the power station may apply penalties to the coal producing company. Since delivery is continuous and relies on in situ determination of pre-estimated values, these assessments, made without defining any confidence levels, are inevitably subject to inaccuracy. Thus, the company should be aware of uncertainties in making decisions and avoid conceivable risks. In this study, valuable information is provided in the form of conditional distributions to be used during the planning process. It maps the indicator variograms corresponding to a calorific value of 1300 kcal/kg and an ash content of 21%, estimating the conditional probabilities that the true ash contents are less and the calorific values are higher than the critical limits by the application of the non-parametric technique of indicator kriging. In addition, it outlines the areas that are most uncertain for decision making. 4 refs., 8 figs., 3 tabs.

  8. Non-parametric Deprojection of Surface Brightness Profiles of Galaxies in Generalised Geometries

    CERN Document Server

    Chakrabarty, Dalia

    2009-01-01

    We present a new Bayesian non-parametric deprojection algorithm DOPING (Deprojection of Observed Photometry using an INverse Gambit), that is designed to extract 3-D luminosity density distributions $\rho$ from observed surface brightness maps $I$, in generalised geometries, while taking into account changes in intrinsic shape with radius, using a penalised likelihood approach and an MCMC optimiser. We provide the most likely solution to the integral equation that represents deprojection of the measured $I$ to $\rho$. In order to keep the solution modular, we choose to express $\rho$ as a function of the line-of-sight (LOS) coordinate $z$. We calculate the extent of the system along the ${\bf z}$-axis, for a given point on the image that lies within an identified isophotal annulus. The extent along the LOS is binned and the density is held constant over each such $z$-bin. The code begins with a seed density and at the beginning of an iterative step, the trial $\rho$ is updated. Comparison of the projection of ...

  9. Spectral decompositions of multiple time series: a Bayesian non-parametric approach.

    Science.gov (United States)

    Macaro, Christian; Prado, Raquel

    2014-01-01

    We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated to the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented. We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.

  10. A Non-parametric Approach to the Overall Estimate of Cognitive Load Using NIRS Time Series.

    Science.gov (United States)

    Keshmiri, Soheil; Sumioka, Hidenobu; Yamazaki, Ryuji; Ishiguro, Hiroshi

    2017-01-01

    We present a non-parametric approach to prediction of the n-back n ∈ {1, 2} task as a proxy measure of mental workload using Near Infrared Spectroscopy (NIRS) data. In particular, we focus on measuring the mental workload through hemodynamic responses in the brain induced by these tasks, thereby realizing the potential that they can offer for their detection in real world scenarios (e.g., difficulty of a conversation). Our approach takes advantage of intrinsic linearity that is inherent in the components of the NIRS time series to adopt a one-step regression strategy. We demonstrate the correctness of our approach through its mathematical analysis. Furthermore, we study the performance of our model in an inter-subject setting in contrast with state-of-the-art techniques in the literature to show a significant improvement on prediction of these tasks (82.50 and 86.40% for female and male participants, respectively). Moreover, our empirical analysis suggests a gender difference effect on the performance of the classifiers (with male data exhibiting a higher non-linearity) along with the left-lateralized activation in both genders with higher specificity in females.

  11. A Non-Parametric Delphi Approach to Foster Innovation Policy Debate in Spain

    Directory of Open Access Journals (Sweden)

    Juan Carlos Salazar-Elena

    2016-05-01

    The aim of this paper is to identify some changes needed in Spain’s innovation policy to fill the gap between its innovation results and those of other European countries in pursuit of sustainable leadership. To do this we apply the Delphi methodology to experts from academia, business, and government. To overcome the shortcomings of traditional descriptive methods, we develop an inferential analysis by following a non-parametric bootstrap method which enables us to identify important changes that should be implemented. Particularly interesting is the support found for improving the interconnections among the relevant agents of the innovation system (instead of focusing exclusively on the provision of knowledge and technological inputs through R&D activities), or the support found for “soft” policy instruments aimed at providing a homogeneous framework to assess the innovation capabilities of firms (e.g., for funding purposes). Attention to potential innovators among small and medium enterprises (SMEs) and traditional industries is particularly encouraged by experts.

  12. An artificial neural network architecture for non-parametric visual odometry in wireless capsule endoscopy

    Science.gov (United States)

    Dimas, George; Iakovidis, Dimitris K.; Karargyris, Alexandros; Ciuti, Gastone; Koulaouzidis, Anastasios

    2017-09-01

    Wireless capsule endoscopy is a non-invasive screening procedure of the gastrointestinal (GI) tract performed with an ingestible capsule endoscope (CE) of the size of a large vitamin pill. Such endoscopes are equipped with a usually low-frame-rate color camera which enables the visualization of the GI lumen and the detection of pathologies. The localization of the commercially available CEs is performed in the 3D abdominal space using radio-frequency (RF) triangulation from external sensor arrays, in combination with transit time estimation. State-of-the-art approaches, such as magnetic localization, which have been experimentally proved more accurate than the RF approach, are still at an early stage. Recently, we have demonstrated that CE localization is feasible using solely visual cues and geometric models. However, such approaches depend on camera parameters, many of which are unknown. In this paper the authors propose a novel non-parametric visual odometry (VO) approach to CE localization based on a feed-forward neural network architecture. The effectiveness of this approach in comparison to state-of-the-art geometric VO approaches is validated using a robotic-assisted in vitro experimental setup.

  13. Non-parametric mass reconstruction of A1689 from strong lensing data with SLAP

    CERN Document Server

    Diego-Rodriguez, J M; Protopapas, P; Tegmark, M; Benítez, N; Broadhurst, T J

    2004-01-01

    We present the mass distribution in the central area of the cluster A1689 by fitting over 100 multiply lensed images with the non-parametric Strong Lensing Analysis Package (SLAP, Diego et al. 2004). The surface mass distribution is obtained in a robust way finding a total mass of 0.25E15 M_sun/h within a 70'' circle radius from the central peak. Our reconstructed density profile fits well an NFW profile with small perturbations due to substructure and is compatible with the more model dependent analysis of Broadhurst et al. (2004a) based on the same data. Our estimated mass does not rely on any prior information about the distribution of dark matter in the cluster. The peak of the mass distribution falls very close to the central cD and there is substructure near the center suggesting that the cluster is not fully relaxed. We also examine the effect on the recovered mass when we include the uncertainties in the redshift of the sources and in the original shape of the sources. Using simulations designed to mi...

  14. A Non-parametric Approach to Constrain the Transfer Function in Reverberation Mapping

    CERN Document Server

    Li, Yan-Rong; Bai, Jin-Ming

    2016-01-01

    Broad emission lines of active galactic nuclei stem from a spatially extended region (the broad-line region; BLR) that is composed of discrete clouds and photoionized by the central ionizing continuum. The temporal behaviors of these emission lines are blurred echoes of the continuum variations (i.e., reverberation mapping; RM) and directly reflect the structural and kinematic information of BLRs through the so-called transfer function (also known as the velocity-delay map). Based on the previous works of Rybicki & Press (1992) and Zu et al. (2011), we develop an extended, non-parametric approach to determine the transfer function for RM data, in which the transfer function is expressed as a sum of a family of relatively displaced Gaussian response functions. As such, arbitrary shapes of transfer functions associated with complicated BLR geometry can be seamlessly included, enabling us to relax the presumption of a specified transfer function frequently adopted in previous studies and to let it be determined by obs...
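    Schematically, the parameterization described above can be written as follows (our notation, not necessarily that of the paper): the emission-line light curve l(t) is the continuum c(t) convolved with a transfer function built from displaced Gaussians,

$$
\Psi(\tau)=\sum_{k=1}^{K} f_k\,\exp\!\left[-\frac{(\tau-\tau_k)^2}{2\omega^2}\right],
\qquad
l(t)=\int_0^{\infty}\Psi(\tau)\,c(t-\tau)\,\mathrm{d}\tau ,
$$

    where the centers τ_k sit on a grid of lags, ω sets their width, and the non-negative weights f_k are inferred from the data, so that no particular transfer-function shape is imposed in advance.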

  15. Revisiting the Distance Duality Relation using a non-parametric regression method

    Science.gov (United States)

    Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha

    2016-07-01

    The interdependence of luminosity distance, D_L, and angular diameter distance, D_A, given by the distance duality relation (DDR) is very significant in observational cosmology. It is very closely tied with the temperature-redshift relation of the Cosmic Microwave Background (CMB) radiation. Any deviation from η(z) ≡ D_L/[D_A (1+z)^2] = 1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η(z) data based on a phenomenological model η(z) = (1+z)^ε. The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxies datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct η(z). It is important to note that with CMB data, we are able to study the evolution of the DDR up to a very high redshift, z = 2.418. In this analysis, we find no evidence of deviation from η = 1 within the 1σ region in the entire redshift range used in this analysis (0 < z ≤ 2.418).
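    As a minimal illustration of the kind of non-parametric regression used here (plain LOWESS, without the SIMEX measurement-error correction, applied to purely synthetic η(z) values), one could proceed as follows:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)

# Hypothetical eta(z) measurements scattered around the DDR value of 1; illustrative only.
z = np.sort(rng.uniform(0.0, 2.4, 60))
eta_obs = 1.0 + rng.normal(0.0, 0.05, z.size)

# Local regression (LOESS/LOWESS); frac controls the smoothing span.
fit = lowess(eta_obs, z, frac=0.5, return_sorted=True)
z_fit, eta_fit = fit[:, 0], fit[:, 1]

# Largest departure of the reconstruction from eta = 1 over the redshift range
print(np.max(np.abs(eta_fit - 1.0)))
```

    The reconstructed curve can then be compared against η = 1 within its estimated confidence band, which is the model-independent consistency check described above.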

  16. Does sunspot numbers cause global temperatures? A reconsideration using non-parametric causality tests

    Science.gov (United States)

    Hassani, Hossein; Huang, Xu; Gupta, Rangan; Ghodsi, Mansi

    2016-10-01

    In a recent paper, Gupta et al. (2015) analyzed whether sunspot numbers cause global temperatures based on monthly data covering the period 1880:1-2013:9. The authors find that the standard time-domain Granger causality test fails to reject the null hypothesis that sunspot numbers do not cause global temperatures for both the full sample and the sub-samples, namely 1880:1-1936:2, 1936:3-1986:11 and 1986:12-2013:9 (identified based on tests of structural breaks). However, a frequency-domain causality test detects predictability for the full sample at short (2-2.6 months) cycle lengths, but not for the sub-samples. But since full-sample causality cannot be relied upon due to structural breaks, Gupta et al. (2015) conclude that the evidence of causality running from sunspot numbers to global temperatures is weak and inconclusive. Given the importance of the issue of global warming, our current paper aims to revisit the question of whether sunspot numbers cause global temperatures, using the same data set and sub-samples used by Gupta et al. (2015), based on a non-parametric Singular Spectrum Analysis (SSA)-based causality test. Based on this test, we show, however, that sunspot numbers have predictive ability for global temperatures for the three sub-samples, over and above the full sample. Thus, generally speaking, our non-parametric SSA-based causality test outperformed both the time-domain and frequency-domain causality tests and highlighted that sunspot numbers have always been important in predicting global temperatures.

  17. Bayesian Semi- and Non-Parametric Models for Longitudinal Data with Multiple Membership Effects in R

    Directory of Open Access Journals (Sweden)

    Terrance Savitsky

    2014-03-01

    Full Text Available We introduce growcurves for R that performs analysis of repeated measures multiple membership (MM) data. This data structure arises in studies under which an intervention is delivered to each subject through the subject's participation in a set of multiple elements that characterize the intervention. In our motivating study design, under which subjects receive a group cognitive behavioral therapy (CBT) treatment, an element is a group CBT session and each subject attends multiple sessions that, together, comprise the treatment. The sets of elements, or group CBT sessions, attended by subjects will partly overlap with some of those from other subjects to induce a dependence in their responses. The growcurves package offers two alternative sets of hierarchical models: 1. Separate terms are specified for multivariate subject and MM element random effects, where the subject effects are modeled under a Dirichlet process prior to produce a semi-parametric construction; 2. A single term is employed to model joint subject-by-MM effects. A fully non-parametric dependent Dirichlet process formulation allows exploration of differences in subject responses across different MM elements. This model allows for borrowing information among subjects who express similar longitudinal trajectories for flexible estimation. growcurves deploys estimation functions to perform posterior sampling under a suite of prior options. An accompanying set of plot functions allows the user to readily extract by-subject growth curves. The design approach intends to anticipate inferential goals with tools that fully extract information from repeated measures data. Computational efficiency is achieved by performing the sampling for estimation functions using compiled C++ code.

  18. Population pharmacokinetics of nevirapine in Malaysian HIV patients: a non-parametric approach.

    Science.gov (United States)

    Mustafa, Suzana; Yusuf, Wan Nazirah Wan; Woillard, Jean Baptiste; Choon, Tan Soo; Hassan, Norul Badriah

    2016-07-01

    Nevirapine is the first non-nucleoside reverse-transcriptase inhibitor approved and is widely used in combination therapy to treat HIV-1 infection. The pharmacokinetics of nevirapine has been extensively studied in various populations with a parametric approach. Hence, this study aimed to determine population pharmacokinetic parameters in Malaysian HIV-infected patients with a non-parametric approach, which allows detection of outliers or non-normal distributions, contrary to the parametric approach. Nevirapine population pharmacokinetics was modelled with Pmetrics. A total of 708 observations from 112 patients were included in the model building and validation analysis. Evaluation of the model was based on a visual inspection of observed versus predicted (population and individual) concentrations and plots of weighted residual error versus concentration. Accuracy and robustness of the model were evaluated by visual predictive check (VPC). The median parameter estimates obtained from the final model were used to predict individual nevirapine plasma area-under-the-curve (AUC) in the validation dataset. The Bland-Altman plot was used to compare the predicted AUC with the trapezoidal AUC. The median nevirapine clearance was 2.92 L/h, the median rate of absorption was 2.55/h and the volume of distribution was 78.23 L. Nevirapine pharmacokinetics was best described by a one-compartment model with first-order absorption and a lag time. Weighted residuals for the selected model were homogeneously distributed over the concentration and time range. The developed model adequately estimated AUC. In conclusion, a model to describe the pharmacokinetics of nevirapine was developed. The developed model adequately describes nevirapine population pharmacokinetics in HIV-infected patients in Malaysia.
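    For reference, a one-compartment model with first-order absorption and a lag time (the structural model selected above) has the standard closed-form concentration profile (generic notation; the symbols are not taken from the paper):

$$
C(t)=\frac{F\,D\,k_a}{V\,(k_a-k_e)}\left[e^{-k_e\,(t-t_{\mathrm{lag}})}-e^{-k_a\,(t-t_{\mathrm{lag}})}\right],
\qquad t>t_{\mathrm{lag}},\qquad k_e=\frac{CL}{V},
$$

    with C(t) = 0 for t ≤ t_lag, where D is the dose, F the bioavailability, k_a the absorption rate constant, V the volume of distribution and CL the clearance (the reported medians correspond to CL ≈ 2.92 L/h, k_a ≈ 2.55 h⁻¹ and V ≈ 78.23 L).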

  19. Assessment of water quality trends in the Minnesota River using non-parametric and parametric methods

    Science.gov (United States)

    Johnson, H.O.; Gupta, S.C.; Vecchia, A.V.; Zvomuya, F.

    2009-01-01

    Excessive loading of sediment and nutrients to rivers is a major problem in many parts of the United States. In this study, we tested the non-parametric Seasonal Kendall (SEAKEN) trend model and the parametric USGS Quality of Water trend program (QWTREND) to quantify trends in water quality of the Minnesota River at Fort Snelling from 1976 to 2003. Both methods indicated decreasing trends in flow-adjusted concentrations of total suspended solids (TSS), total phosphorus (TP), and orthophosphorus (OP) and a generally increasing trend in flow-adjusted nitrate plus nitrite-nitrogen (NO3-N) concentration. The SEAKEN results were strongly influenced by the length of the record as well as extreme years (dry or wet) earlier in the record. The QWTREND results, though influenced somewhat by the same factors, were more stable. The magnitudes of trends between the two methods were somewhat different and appeared to be associated with conceptual differences between the flow-adjustment processes used and with data processing methods. The decreasing trends in TSS, TP, and OP concentrations are likely related to conservation measures implemented in the basin. However, dilution effects from wet climate or additional tile drainage cannot be ruled out. The increasing trend in NO3-N concentrations was likely due to increased drainage in the basin. Since the Minnesota River is the main source of sediments to the Mississippi River, this study also addressed the rapid filling of Lake Pepin on the Mississippi River and found the likely cause to be increased flow due to recent wet climate in the region. Copyright © 2009 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
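    The Seasonal Kendall test used above is, at its core, the Mann-Kendall trend statistic computed within each season (e.g. each calendar month) and then summed, so that regular seasonal cycles do not masquerade as monotonic trends. A minimal sketch of that statistic on synthetic concentrations (not the SEAKEN program itself, and without the flow adjustment or the variance/p-value computation) is:

```python
import numpy as np

def mann_kendall_s(x):
    """Mann-Kendall S: sum of signs of all pairwise later-minus-earlier differences."""
    x = np.asarray(x, dtype=float)
    s = 0.0
    for i in range(x.size - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()
    return s

def seasonal_kendall_s(values, seasons):
    """Sum Mann-Kendall S over seasons so seasonality does not look like a trend."""
    values, seasons = np.asarray(values, float), np.asarray(seasons)
    return sum(mann_kendall_s(values[seasons == s]) for s in np.unique(seasons))

# Hypothetical monthly flow-adjusted concentrations over 10 years (illustrative only)
rng = np.random.default_rng(2)
years = np.repeat(np.arange(10), 12)
months = np.tile(np.arange(12), 10)
conc = 5.0 - 0.1 * years + rng.normal(0, 0.5, years.size)   # weak downward trend

print(seasonal_kendall_s(conc, months))   # negative S suggests a decreasing trend
```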

  20. A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.

    Science.gov (United States)

    Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily

    2017-08-04

    The validity of the CAGE using item response theory (IRT) has not yet been examined in the older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity, and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA), dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed an overall scalability H index of 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and those with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for 2 items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and the precision of the cut-off scores in the older adult population.

  1. SOPIE: an R package for the non-parametric estimation of the off-pulse interval of a pulsar light curve

    Science.gov (United States)

    Schutte, Willem D.; Swanepoel, Jan W. H.

    2016-09-01

    An automated tool to derive the off-pulse interval of a light curve originating from a pulsar is needed. First, we derive a powerful and accurate non-parametric sequential estimation technique to estimate the off-pulse interval of a pulsar light curve in an objective manner. This is in contrast to the subjective `eye-ball' (visual) technique, and complementary to the Bayesian Block method which is currently used in the literature. The second aim involves the development of a statistical package, necessary for the implementation of our new estimation technique. We develop a statistical procedure to estimate the off-pulse interval in the presence of noise. It is based on a sequential application of p-values obtained from goodness-of-fit tests for uniformity. The Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling and Rayleigh test statistics are applied. The details of the newly developed statistical package SOPIE (Sequential Off-Pulse Interval Estimation) are discussed. The developed estimation procedure is applied to simulated and real pulsar data. Finally, the SOPIE estimated off-pulse intervals of two pulsars are compared to the estimates obtained with the Bayesian Block method and yield very satisfactory results. We provide the code to implement the SOPIE package, which is publicly available at http://CRAN.R-project.org/package=SOPIE (Schutte).
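    The core of the procedure described above is to test candidate off-pulse phase intervals for uniformity and to combine the resulting p-values sequentially. A bare-bones sketch of the per-interval test (using only the Kolmogorov-Smirnov statistic on synthetic phases; SOPIE additionally uses the Cramér-von Mises, Anderson-Darling and Rayleigh statistics and a sequential decision rule) might look like:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical pulse-phase values in [0, 1): a narrow pulse on top of a uniform background.
phases = np.concatenate([rng.uniform(0, 1, 800),
                         rng.normal(0.3, 0.02, 200) % 1.0])

# Test the phases falling inside each candidate off-pulse interval for uniformity.
candidates = [(0.4, 1.0), (0.5, 1.0), (0.45, 0.95)]
for a, b in candidates:
    inside = phases[(phases >= a) & (phases < b)]
    stat, p = stats.kstest(inside, 'uniform', args=(a, b - a))
    print(f"interval [{a:.2f}, {b:.2f}): n={inside.size}, KS p-value = {p:.3f}")
```

    Intervals containing pulsed emission yield small p-values and are rejected; the estimated off-pulse interval is the largest interval consistent with uniformity.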

  2. APPLICATION OF PARAMETRIC AND NON-PARAMETRIC BENCHMARKING METHODS IN COST EFFICIENCY ANALYSIS OF THE ELECTRICITY DISTRIBUTION SECTOR

    Directory of Open Access Journals (Sweden)

    Andrea Furková

    2007-06-01

    Full Text Available This paper explores the application of parametric and non-parametric benchmarking methods in measuring the cost efficiency of Slovak and Czech electricity distribution companies. We compare the relative cost efficiency of Slovak and Czech distribution companies using two benchmarking methods: the non-parametric Data Envelopment Analysis (DEA) and the Stochastic Frontier Analysis (SFA) as the parametric approach. The first part of the analysis was based on DEA models. Traditional cross-section CCR and BCC models were modified for cost efficiency estimation. In further analysis we focus on two versions of the stochastic frontier cost function using panel data: the MLE model and the GLS model. These models have been applied to an unbalanced panel of 11 (Slovakia 3 and Czech Republic 8) regional electricity distribution utilities over the period from 2000 to 2004. The differences in estimated scores, parameters and ranking of utilities were analyzed. We observed significant differences between the parametric methods and the DEA approach.
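    To make the non-parametric side of this comparison concrete, the input-oriented CCR efficiency of each unit is the optimum of a small linear program. The sketch below uses a generic LP solver and entirely made-up data (one cost input, two outputs for five utilities); it is not the models or data of the paper:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for 5 utilities: one input (total cost) and two outputs.
X = np.array([[100.0], [120.0], [90.0], [150.0], [110.0]])      # inputs, shape (n, m)
Y = np.array([[500.0, 40.0], [520.0, 45.0], [430.0, 38.0],
              [610.0, 50.0], [480.0, 42.0]])                     # outputs, shape (n, s)

def ccr_input_efficiency(X, Y, j0):
    """Input-oriented CCR efficiency of unit j0: minimize theta such that a
    lambda-weighted composite of peers uses at most theta*x_j0 and produces at least y_j0."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.zeros(n + 1)
    c[0] = 1.0                                   # variables: [theta, lambda_1..lambda_n]
    A_in = np.hstack([-X[j0].reshape(m, 1), X.T])   # sum_j lam_j x_ij - theta x_i,j0 <= 0
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])      # -sum_j lam_j y_rj <= -y_r,j0
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([np.zeros(m), -Y[j0]]),
                  bounds=[(None, None)] + [(0, None)] * n,
                  method="highs")
    return res.x[0]

for j in range(X.shape[0]):
    print(f"unit {j}: CCR efficiency = {ccr_input_efficiency(X, Y, j):.3f}")
```

    Efficient units obtain a score of 1; scores below 1 indicate the proportional input reduction a unit would need to reach the frontier.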

  3. Non-parametric data-based approach for the quantification and communication of uncertainties in river flood forecasts

    Science.gov (United States)

    Van Steenbergen, N.; Willems, P.

    2012-04-01

    Reliable flood forecasts are the most important non-structural measures to reduce the impact of floods. However, flood forecasting systems are subject to uncertainty originating from the input data, model structure and model parameters of the different hydraulic and hydrological submodels. To quantify this uncertainty, a non-parametric data-based approach has been developed. This approach analyses the historical forecast residuals (differences between the predictions and the observations at river gauging stations) without using a predefined statistical error distribution. Because the residuals are correlated with the value of the forecasted water level and the lead time, the residuals are split up into discrete classes of simulated water levels and lead times. For each class, percentile values of the model residuals are calculated and stored in a 'three-dimensional error' matrix. By 3D interpolation in this error matrix, the uncertainty in newly forecasted water levels can be quantified. In addition to the quantification of the uncertainty, the communication of this uncertainty is equally important. The communication has to be done in a consistent way, reducing the chance of misinterpretation. Also, the communication needs to be adapted to the audience; the majority of the larger public is not interested in in-depth information on the uncertainty of the predicted water levels, but only in information on the likelihood of exceedance of certain alarm levels. Water managers need more information, e.g. time-dependent uncertainty information, because they rely on this information to undertake the appropriate flood mitigation action. There are various ways of presenting uncertainty information (numerical, linguistic, graphical, time (in)dependent, etc.), each with their advantages and disadvantages for a specific audience. A useful method to communicate uncertainty of flood forecasts is probabilistic flood mapping. These maps give a representation of the
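    The error-matrix idea above is simple to sketch: bin an archive of forecast residuals by forecast level and lead time, then store empirical percentiles per cell. The illustration below uses synthetic residuals and arbitrary bin edges, and replaces the 3D interpolation step with a nearest-cell lookup:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical archive of forecast residuals (observed minus forecast water level, in m),
# with the forecast value and the lead time of each forecast. Illustrative only.
forecast = rng.uniform(1.0, 6.0, 5000)                     # forecasted water level [m]
lead = rng.choice([6, 12, 24, 48], size=5000)              # lead time [h]
residual = rng.normal(0.0, 0.05 * forecast * (lead / 24))  # errors grow with level and lead

level_bins = np.array([1, 2, 3, 4, 5, 6])
lead_times = np.array([6, 12, 24, 48])
quantiles = [5, 25, 50, 75, 95]

# "Error matrix": residual percentiles per (level class, lead time) cell.
error_matrix = np.full((len(level_bins) - 1, len(lead_times), len(quantiles)), np.nan)
for i in range(len(level_bins) - 1):
    for j, lt in enumerate(lead_times):
        sel = (forecast >= level_bins[i]) & (forecast < level_bins[i + 1]) & (lead == lt)
        if sel.any():
            error_matrix[i, j] = np.percentile(residual[sel], quantiles)

# Uncertainty band for a new 24-h-ahead forecast of 3.4 m (nearest cell, no interpolation).
print(error_matrix[np.searchsorted(level_bins, 3.4) - 1, 2])
```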

  4. Parametric modeling of DSC-MRI data with stochastic filtration and optimal input design versus non-parametric modeling.

    Science.gov (United States)

    Kalicka, Renata; Pietrenko-Dabrowska, Anna

    2007-03-01

    In this paper, MRI measurements are used for the assessment of brain tissue perfusion and other features and functions of the brain (cerebral blood flow - CBF, cerebral blood volume - CBV, mean transit time - MTT). Perfusion is an important indicator of tissue viability and functioning, as in pathological tissue the blood flow and the vascular and tissue structure are altered with respect to normal tissue. MRI enables diagnosing diseases at an early stage of their course. The parametric and non-parametric approaches to the identification of MRI models are presented and compared. The non-parametric modeling adopts gamma variate functions. The parametric three-compartmental catenary model, based on the general kinetic model, is also proposed. The parameters of the models are estimated on the basis of experimental data. The goodness of fit of the gamma variate and the three-compartmental models to the data and the accuracy of the parameter estimates are compared. Kalman filtering, smoothing the measurements, was adopted to improve the estimate accuracy of the parametric model. Parametric modeling gives a better fit and better parameter estimates than non-parametric modeling and allows an insight into the functioning of the system. To improve the accuracy, optimal experiment design related to the input signal was performed.
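    For context, the gamma variate function commonly used for such first-pass bolus curves can be written as (generic notation, not necessarily that of the paper):

$$
C(t)=K\,(t-t_0)^{\alpha}\,e^{-(t-t_0)/\beta},\qquad t>t_0,
$$

    with C(t) = 0 for t ≤ t_0, where t_0 is the bolus arrival time and K, α, β are constants fitted to the concentration-time curve.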

  5. Non-parametric kernel density estimation of species sensitivity distributions in developing water quality criteria of metals.

    Science.gov (United States)

    Wang, Ying; Wu, Fengchang; Giesy, John P; Feng, Chenglian; Liu, Yuedan; Qin, Ning; Zhao, Yujie

    2015-09-01

    Due to the use of different parametric models for establishing species sensitivity distributions (SSDs), comparison of water quality criteria (WQC) for metals of the same group or period in the periodic table is uncertain and results can be biased. To address this inadequacy, a new probabilistic model based on non-parametric kernel density estimation was developed, and optimal bandwidths and testing methods are proposed. Zinc (Zn), cadmium (Cd), and mercury (Hg) of group IIB of the periodic table are widespread in aquatic environments, mostly at small concentrations, but can exert detrimental effects on aquatic life and human health. With these metals as target compounds, the non-parametric kernel density estimation method and several conventional parametric density estimation methods were used to derive acute WQC of metals for the protection of aquatic species in China, which were compared and contrasted with WQC for other jurisdictions. HC5 values for the protection of different types of species were derived for the three metals by use of non-parametric kernel density estimation. The newly developed probabilistic model was superior to conventional parametric density estimations for constructing SSDs and for deriving WQC for these metals. HC5 values for the three metals were inversely proportional to atomic number, which means that the heavier atoms were more potent toxicants. The proposed method provides a novel alternative approach for developing SSDs that could have wide application prospects in deriving WQC and use in assessment of risks to ecosystems.
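    A stripped-down version of the kernel-density SSD idea (using the default Scott's-rule bandwidth rather than the optimized bandwidths proposed in the paper, and invented toxicity values) can be sketched as follows:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical acute toxicity values (e.g. LC50 in ug/L) for a set of species;
# values are illustrative only, not the data used in the paper.
toxicity = np.array([12., 25., 40., 55., 80., 120., 160., 210., 300., 450., 700., 1100.])
log_tox = np.log10(toxicity)

# Non-parametric SSD: Gaussian kernel density estimate of the log-toxicity distribution.
kde = gaussian_kde(log_tox)            # bandwidth chosen by Scott's rule by default

# HC5 = concentration protecting 95% of species, i.e. the 5th percentile of the SSD.
grid = np.linspace(log_tox.min() - 1, log_tox.max() + 1, 2000)
cdf = np.cumsum(kde(grid))
cdf /= cdf[-1]
hc5 = 10 ** np.interp(0.05, cdf, grid)
print(f"HC5 ~ {hc5:.1f} ug/L")
```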

  6. Non-parametric determination of H and He interstellar fluxes from cosmic-ray data

    Science.gov (United States)

    Ghelfi, A.; Barao, F.; Derome, L.; Maurin, D.

    2016-06-01

    Context. Top-of-atmosphere (TOA) cosmic-ray (CR) fluxes from satellites and balloon-borne experiments are snapshots of the solar activity imprinted on the interstellar (IS) fluxes. Given a series of snapshots, the unknown IS flux shape and the level of modulation (for each snapshot) can be recovered. Aims: We wish (i) to provide the most accurate determination of the IS H and He fluxes from TOA data alone; (ii) to obtain the associated modulation levels (and uncertainties) while fully accounting for the correlations with the IS flux uncertainties; and (iii) to inspect whether the minimal force-field approximation is sufficient to explain all the data at hand. Methods: Using H and He TOA measurements, including the recent high-precision AMS, BESS-Polar, and PAMELA data, we performed a non-parametric fit of the IS fluxes J_IS(H, He) and of the modulation level φ_i for each data-taking period. We relied on a Markov chain Monte Carlo (MCMC) engine to extract the probability density function and correlations (hence the credible intervals) of the sought parameters. Results: Although H and He are the most abundant and best measured CR species, several datasets had to be excluded from the analysis because of inconsistencies with other measurements. From the subset of data passing our consistency cut, we provide ready-to-use best-fit and credible intervals for the H and He IS fluxes from MeV/n to PeV/n energy (with a relative precision in the range [2-10%] at 1σ). Given the strong correlation between the J_IS and φ_i parameters, the uncertainties on J_IS translate into Δφ ≈ ±30 MV (at 1σ) for all experiments. We also find that the presence of 3He in He data biases φ towards higher values by ~30 MV. The force-field approximation, despite its limitations, gives an excellent (χ²/d.o.f. = 1.02) description of the recent high-precision TOA H and He fluxes. Conclusions: The analysis must be extended to different charge species and more realistic modulation models. It would benefit

  7. Evaluation of world's largest social welfare scheme: An assessment using non-parametric approach.

    Science.gov (United States)

    Singh, Sanjeet

    2016-08-01

    Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA) is the world's largest social welfare scheme in India for poverty alleviation through rural employment generation. This paper aims to evaluate and rank the performance of the states in India under the MGNREGA scheme. A non-parametric approach, Data Envelopment Analysis (DEA), is used to calculate the overall technical, pure technical, and scale efficiencies of states in India. The sample data is drawn from the annual official reports published by the Ministry of Rural Development, Government of India. Based on three selected input parameters (expenditure indicators) and five output parameters (employment generation indicators), I apply both input- and output-oriented DEA models to estimate how well the states utilize their resources and generate outputs during the financial year 2013-14. The relative performance evaluation has been made under the assumption of constant returns and also under variable returns to scale to assess the impact of scale on performance. The results indicate that the main sources of inefficiency are both the technical and the managerial practices adopted. 11 states are overall technically efficient and operate at the optimum scale whereas 18 states are pure technical or managerially efficient. It has been found that for some states it is necessary to alter the scheme size to perform at par with the best performing states. For inefficient states, optimal input and output targets along with the resource savings and output gains are calculated. Analysis shows that if all inefficient states operated at optimal input and output levels, on average 17.89% of total expenditure and a total amount of $780 million could have been saved in a single year. Most of the inefficient states perform poorly when it comes to the participation of women and disadvantaged sections (SC&ST) in the scheme. In order to catch up with the performance of best performing states, inefficient states on average need to enhance

  8. Non-parametric estimation in contaminated linear model (污染线性模型的非参数估计)

    Institute of Scientific and Technical Information of China (English)

    柴根象; 孙燕; 杨筱菡

    2001-01-01

    In this paper, the following contaminated linear model is considered: y_i = (1 − ε)x_i^τ β + z_i, 1 ≤ i ≤ n, where the r.v.'s {y_i} are contaminated with errors {z_i}. The errors are assumed to have finite moments of order 2 only. Non-parametric estimators of the contamination coefficient ε and the regression parameter β are established, and the strong consistency and almost-sure convergence rates of the estimators are obtained. A simulated example is also given to show the visual performance of the estimators.

  9. Non parametric deprojection of NIKA SZ observations: pressure distribution in the Planck-discovered cluster PSZ1 G045.85+57.71

    CERN Document Server

    Ruppin, F; Comis, B; Ade, P; André, P; Arnaud, M; Beelen, A; Benoît, A; Bideaud, A; Billot, N; Bourrion, O; Calvo, M; Catalano, A; Coiffard, G; D'Addabbo, A; De Petris, M; Désert, F -X; Doyle, S; Goupy, J; Kramer, C; Leclercq, S; Macías-Pérez, J F; Mauskopf, P; Mayet, F; Monfardini, A; Pajot, F; Pascale, E; Perotto, L; Pisano, G; Pointecouteau, E; Ponthieu, N; Pratt, G W; Revéret, V; Ritacco, A; Rodriguez, L; Romero, C; Schuster, K; Sievers, A; Triqueneaux, S; Tucker, C; Zylka, R

    2016-01-01

    The determination of the thermodynamic properties of clusters of galaxies at intermediate and high redshift can bring new insights into the formation of large scale structures. It is essential for a robust calibration of the mass-observable scaling relations and their scatter, which are key ingredients for precise cosmology using cluster statistics. Here we illustrate an application of high-resolution (<20 arcsec) thermal Sunyaev-Zel'dovich (tSZ) observations by probing the intracluster medium (ICM) of the Planck-discovered galaxy cluster PSZ1 G045.85+57.71 at redshift z = 0.61, using tSZ data obtained with the NIKA camera, a dual-band (150 and 260 GHz) instrument operated at the IRAM 30-meter telescope. We deproject jointly NIKA and Planck data to extract the electronic pressure distribution non-parametrically from the cluster core (R ~ 0.02 R500) to its outskirts (R ~ 3 R500), for the first time at intermediate redshift. The constraints on the resulting pressure profile allow us ...

  10. Application of non-parametric bootstrap methods to estimate confidence intervals for QTL location in a beef cattle QTL experimental population.

    Science.gov (United States)

    Jongjoo, Kim; Davis, Scott K; Taylor, Jeremy F

    2002-06-01

    Empirical confidence intervals (CIs) for the estimated quantitative trait locus (QTL) location from selective and non-selective non-parametric bootstrap resampling methods were compared for a genome scan involving an Angus x Brahman reciprocal fullsib backcross population. Genetic maps, based on 357 microsatellite markers, were constructed for 29 chromosomes using CRI-MAP V2.4. Twelve growth, carcass composition and beef quality traits (n = 527-602) were analysed to detect QTLs utilizing (composite) interval mapping approaches. CIs were investigated for 28 likelihood ratio test statistic (LRT) profiles for the one-QTL-per-chromosome model. The CIs from the non-selective bootstrap method were largest (87.7 cM average, or 79.2% coverage of test chromosomes). The Selective II procedure produced the smallest CI size (42.3 cM average). However, CI sizes from the Selective II procedure were more variable than those produced by the two-LOD drop method. CI ranges from the Selective II procedure were also asymmetrical (relative to the most likely QTL position) due to the bias caused by the tendency for the estimated QTL position to be at a marker position in the bootstrap samples and due to monotonicity and asymmetry of the LRT curve in the original sample.

  11. On The Robustness of z=0-1 Galaxy Size Measurements Through Model and Non-Parametric Fits

    CERN Document Server

    Mosleh, Moein; Franx, Marijn

    2013-01-01

    We present the size-stellar mass relations of nearby (z=0.01-0.02) SDSS galaxies, for samples selected by color, morphology, Sersic index n, and specific star formation rate. Several commonly-employed size measurement techniques are used, including single Sersic fits, two-component Sersic models and a non-parametric method. Through simple simulations we show that the non-parametric and two-component Sersic methods provide the most robust effective radius measurements, while those based on single Sersic profiles are often overestimates, especially for massive red/early-type galaxies. Using our robust sizes, we show that for all sub-samples, the mass-size relations are shallow at low stellar masses and steepen above ~3-4 x 10^10 M_sun. The mass-size relations for galaxies classified as late-type, low-n, and star-forming are consistent with each other, while blue galaxies follow a somewhat steeper relation. The mass-size relations of early-type, high-n, red, and quiescent galaxies all agree with each other but ...

  12. Further Empirical Results on Parametric Versus Non-Parametric IRT Modeling of Likert-Type Personality Data.

    Science.gov (United States)

    Maydeu-Olivares, Albert

    2005-04-01

    Chernyshenko, Stark, Chan, Drasgow, and Williams (2001) investigated the fit of Samejima's logistic graded model and Levine's non-parametric MFS model to the scales of two personality questionnaires and found that the graded model did not fit well. We attribute the poor fit of the graded model to small amounts of multidimensionality present in their data. To verify this conjecture, we compare the fit of these models to the Social Problem Solving Inventory-Revised, whose scales were designed to be unidimensional. A calibration and a cross-validation sample of new observations were used. We also included the following parametric models in the comparison: Bock's nominal model, Masters' partial credit model, and Thissen and Steinberg's extension of the latter. All models were estimated using full information maximum likelihood. We also included in the comparison a normal ogive model version of Samejima's model estimated using limited information estimation. We found that for all scales Samejima's model outperformed all other parametric IRT models in both samples, regardless of the estimation method employed. The non-parametric model outperformed all parametric models in the calibration sample. However, the graded model outperformed MFS in the cross-validation sample in some of the scales. We advocate employing the graded model estimated using limited information methods in modeling Likert-type data, as these methods are more versatile than full information methods to capture the multidimensionality that is generally present in personality data.

  13. Climatic, parametric and non-parametric analysis of energy performance of double-glazed windows in different climates

    Directory of Open Access Journals (Sweden)

    Saeed Banihashemi

    2015-12-01

    Full Text Available In line with the growing global trend toward energy efficiency in buildings, this paper aims first to investigate the energy performance of double-glazed windows in different climates and second to analyze the most dominant parametric and non-parametric tests used in dimension reduction for simulating this component. A four-story building representing the conventional type of residential apartment for four climates (cold, temperate, hot-arid and hot-humid) was selected for simulation. Ten variables (U-factor, SHGC, emissivity, visible transmittance, monthly average dry bulb temperature, monthly average percent humidity, monthly average wind speed, monthly average direct solar radiation, monthly average diffuse solar radiation and orientation) constituted the parameters considered in the calculation of the cooling and heating loads of the case. Design of Experiments and Principal Component Analysis methods were applied to find the most significant factors and to reduce the dimension of the initial variables. It was observed that in the two climates of temperate and hot-arid, using double-glazed windows was beneficial in both cold and hot months, whereas in the cold and hot-humid climates, where heating and cooling loads are dominant respectively, they were advantageous only in those dominant months. Furthermore, an inconsistency was revealed between the parametric and non-parametric tests in terms of identifying the most significant variables.

  14. 'nparACT' package for R: A free software tool for the non-parametric analysis of actigraphy data.

    Science.gov (United States)

    Blume, Christine; Santhi, Nayantara; Schabus, Manuel

    2016-01-01

    For many studies, participants' sleep-wake patterns are monitored and recorded prior to, during and following an experimental or clinical intervention using actigraphy, i.e. the recording of data generated by movements. Often, these data are merely inspected visually without computation of descriptive parameters, in part due to the lack of user-friendly software. To address this deficit, we developed a package for R (R Core Team [6]) that allows computing several non-parametric measures from actigraphy data. Specifically, it computes the interdaily stability (IS), intradaily variability (IV) and relative amplitude (RA) of activity and gives the start times and average activity values of M10 (i.e. the ten hours with maximal activity) and L5 (i.e. the five hours with least activity). Two functions compute these 'classical' parameters and handle either single or multiple files. Two other functions additionally allow computing an L-value (i.e. the least activity value) for a user-defined time span, termed the 'Lflex' value. A plotting option is included in all functions. The package can be downloaded from the Comprehensive R Archive Network (CRAN). •The package 'nparACT' for R serves the non-parametric analysis of actigraphy data.•Computed parameters include interdaily stability (IS), intradaily variability (IV) and relative amplitude (RA) as well as start times and average activity during the 10 h with maximal and the 5 h with minimal activity (i.e. M10 and L5).
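    The IS and IV measures mentioned above have simple closed forms in the actigraphy literature (hourly means versus the overall mean, and hour-to-hour differences, respectively). The sketch below implements those textbook formulas on synthetic minute-level counts; it is not the nparACT code, and the package documentation should be consulted for the exact definitions it uses:

```python
import numpy as np

def interdaily_stability(activity, samples_per_hour):
    """IS: variance of the average 24-h profile relative to the overall hourly variance."""
    x = np.asarray(activity, float)
    per_hour = x.reshape(-1, samples_per_hour).mean(axis=1)   # hourly means
    profile = per_hour.reshape(-1, 24).mean(axis=0)           # average 24-h profile
    n, p = per_hour.size, 24
    return (n * np.sum((profile - per_hour.mean()) ** 2)) / (
            p * np.sum((per_hour - per_hour.mean()) ** 2))

def intradaily_variability(activity, samples_per_hour):
    """IV: mean squared hour-to-hour difference relative to the overall hourly variance."""
    x = np.asarray(activity, float)
    per_hour = x.reshape(-1, samples_per_hour).mean(axis=1)
    n = per_hour.size
    return (n * np.sum(np.diff(per_hour) ** 2)) / (
            (n - 1) * np.sum((per_hour - per_hour.mean()) ** 2))

# Hypothetical 7 days of activity counts sampled every minute (illustrative only)
rng = np.random.default_rng(5)
t = np.arange(7 * 24 * 60)
activity = 50 + 40 * np.sin(2 * np.pi * t / (24 * 60)) + rng.normal(0, 10, t.size)

print(interdaily_stability(activity, 60), intradaily_variability(activity, 60))
```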

  15. Automatic Image Segmentation Using Active Contours with Univariate Marginal Distribution

    Directory of Open Access Journals (Sweden)

    I. Cruz-Aceves

    2013-01-01

    Full Text Available This paper presents a novel automatic image segmentation method based on the theory of active contour models and estimation of distribution algorithms. The proposed method uses the univariate marginal distribution model to infer statistical dependencies between the control points on different active contours. These contours have been generated through an alignment process of reference shape priors, in order to increase the exploration and exploitation capabilities regarding different interactive segmentation techniques. This proposed method is applied in the segmentation of the hollow core in microscopic images of photonic crystal fibers and it is also used to segment the human heart and ventricular areas from datasets of computed tomography and magnetic resonance images, respectively. Moreover, to evaluate the performance of the medical image segmentations compared to regions outlined by experts, a set of similarity measures has been adopted. The experimental results suggest that the proposed image segmentation method outperforms the traditional active contour model and the interactive Tseng method in terms of segmentation accuracy and stability.

  16. Material analysis on engineering statistics

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Seung Hun

    2008-03-15

    This book presents material on engineering statistics using Minitab, covering technical statistics and the seven QC tools, probability distributions, estimation and testing, regression analysis, time series analysis, control charts, process capability analysis, measurement system analysis, sampling inspection, experimental design, response surface analysis, compound experiments, the Taguchi method, and non-parametric statistics. It is suitable for use in universities and companies because it treats theory first and then analysis with Minitab for Six Sigma BB and MBB.

  17. Comparing non-parametric methods for ungrouping coarsely aggregated age-specific distributions

    DEFF Research Database (Denmark)

    Rizzi, Silvia; Thinggaard, Mikael; Vaupel, James W.

    2016-01-01

    Demographers often have access to vital statistics that are less than ideal for the purpose of their research. In many instances demographic data are reported in coarse histograms, where the values given are only the summation of true latent values, thereby making detailed analysis troublesome. O...

  18. Singular Value Decomposition, Hessian Errors, and Linear Algebra of Non-parametric Extraction of Partons from DIS

    CERN Document Server

    Goshtasbpour, Mehrdad

    2014-01-01

    By singular value decomposition (SVD) of a numerically singular Hessian matrix and a numerically singular system of linear equations for the experimental data (accumulated in the respective χ² function) and constraints, least-squares solutions and their propagated errors for the non-parametric extraction of partons from F_2 are obtained. SVD and its physical application are phenomenologically described in the two cases. Among the subjects covered are: identification and properties of the boundary between the two subsets of ordered eigenvalues corresponding to the range and the null space, and the eigenvalue structure of the null space of the singular matrix, including a second boundary separating the smallest eigenvalues of essentially no information, in a particular case. The eigenvector-eigenvalue structure of "redundancy and smallness" of the errors of two pdf sets, in our simplified Hessian model, is described by a secondary manifestation of a deeper null space, in the context of SVD.

  19. Detection of Bistability in Phase Space of a Real Galaxy, using a New Non-parametric Bayesian Test of Hypothesis

    CERN Document Server

    Chakrabarty, Dalia

    2013-01-01

    In lieu of direct detection of dark matter, estimation of the distribution of the gravitational mass in distant galaxies is of crucial importance in astrophysics. Typically, such estimation is performed using small samples of noisy, partially missing measurements - only some of the three components of the velocity and location vectors of individual particles that live in the galaxy are measurable. Such limitations of the available data in turn demand that simplifying model assumptions be undertaken. Thus, assuming that the phase space of a galaxy manifests simple symmetries - such as isotropy - allows for the learning of the density of the gravitational mass in galaxies. This is equivalent to assuming that the phase space pdf from which the velocity and location vectors of galactic particles are sampled is an isotropic function of these vectors. We present a new non-parametric test of hypothesis that tests for relative support in two or more measured data sets of disparate sizes, for the undertaken m...

  20. A non-parametric conditional bivariate reference region with an application to height/weight measurements on normal girls

    DEFF Research Database (Denmark)

    Petersen, Jørgen Holm

    2009-01-01

    A conceptually simple two-dimensional conditional reference curve is described. The curve gives a decision basis for determining whether a bivariate response from an individual is "normal" or "abnormal" when taking into account that a third (conditioning) variable may influence the bivariate response. The reference curve is not only characterized analytically but also by geometric properties that are easily communicated to medical doctors - the users of such curves. The reference curve estimator is completely non-parametric, so no distributional assumptions are needed about the two-dimensional response. An example that will serve to motivate and illustrate the reference is the study of the height/weight distribution of 7-8-year-old Danish school girls born in 1930, 1950, or 1970.

  1. Non-parametric frontier approach to modelling the relationships among population, GDP, energy consumption and CO{sub 2} emissions

    Energy Technology Data Exchange (ETDEWEB)

    Lozano, Sebastian; Gutierrez, Ester [University of Seville, E.S.I., Department of Industrial Management, Camino de los Descubrimientos, s/n, 41092 Sevilla (Spain)

    2008-07-15

    In this paper, a non-parametric approach based on Data Envelopment Analysis (DEA) is proposed as an alternative to the Kaya identity (a.k.a. ImPACT). This frontier method identifies and extends existing best practices. Population and GDP are considered as input and output, respectively. Both primary energy consumption and Greenhouse Gas (GHG) emissions are considered as undesirable outputs. Several Linear Programming models are formulated with different aims, namely: (a) determine efficiency levels; (b) estimate the maximum GDP compatible with given levels of population, energy intensity and carbonization intensity; and (c) estimate the minimum level of GHG emissions compatible with given levels of population, GDP, energy intensity or carbonization index. The case of the United States of America is used as an illustration of the proposed approach. (author)

  2. Adaptive ILC algorithms of nonlinear continuous systems with non-parametric uncertainties for non-repetitive trajectory tracking

    Science.gov (United States)

    Li, Xiao-Dong; Lv, Mang-Mang; Ho, John K. L.

    2016-07-01

    In this article, two adaptive iterative learning control (ILC) algorithms are presented for nonlinear continuous systems with non-parametric uncertainties. Unlike general ILC techniques, the proposed adaptive ILC algorithms allow both the initial error at each iteration and the reference trajectory to be iteration-varying in the ILC process, and can achieve non-repetitive trajectory tracking beyond a small initial time interval. Compared to the neural network or fuzzy system-based adaptive ILC schemes and the classical ILC methods, in which the number of iterative variables is generally larger than or equal to the number of control inputs, the first adaptive ILC algorithm proposed in this paper uses just two iterative variables, while the second even uses a single iterative variable provided that some bound information on the system dynamics is known. As a result, the memory space in real-time ILC implementations is greatly reduced.

  3. Detrending the long-term stellar activity and the systematics of the Kepler data with a non-parametric approach

    CERN Document Server

    Danielski, C; Tinetti, G

    2013-01-01

    The NASA Kepler mission is delivering groundbreaking results, with an increasing number of Earth-sized and moon-sized objects being discovered. A high photometric precision can be reached only through a thorough removal of the stellar activity and the instrumental systematics. We have explored here the possibility of using non-parametric methods to analyse the Simple Aperture Photometry data observed by the Kepler mission. We focused on a sample of stellar light curves with different effective temperatures and flux modulations, and we found that Gaussian Process-based techniques can very effectively correct the instrumental systematics along with the long-term stellar activity. Our method can disentangle astrophysical features (events), such as planetary transits, flares or general sudden variations in the intensity, from the stellar signal, and it is very efficient as it requires only a few training iterations of the Gaussian Process model. The results obtained show the potential of our method to isolate the ma...

  4. Microprocessors as an Adjunct to Statistics Instruction.

    Science.gov (United States)

    Miller, William G.

    Examinations of costs and acquisition of facilities indicate that an Altair 8800A microcomputer with a program library of parametric, non-parametric, mathematical, and teaching programs can be used effectively for teaching college-level statistics. Statistical packages presently in use require extensive computing knowledge beyond the students' and…

  5. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  6. Recursive least squares background prediction of univariate syndromic surveillance data

    OpenAIRE

    Burkom Howard; Najmi Amir-Homayoon

    2009-01-01

    Abstract Background Surveillance of univariate syndromic data as a potential indicator of developing public health conditions has been used extensively. This paper aims to improve the performance of detecting outbreaks by using a background forecasting algorithm based on the adaptive recursive least squares method combined with a novel treatment of the day-of-the-week effect. Methods Previous work by the first author has suggested that univariate recursive least squares analysis of s...
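    As background, recursive least squares with exponential forgetting updates a small coefficient vector and an inverse-correlation matrix at every time step, so the forecast adapts to slowly drifting baselines. The sketch below is a generic textbook RLS one-step-ahead forecaster applied to synthetic daily counts; it is not the paper's algorithm and, in particular, it omits the day-of-week treatment described there:

```python
import numpy as np

def rls_forecast(y, order=7, lam=0.98, delta=100.0):
    """One-step-ahead background forecast of a univariate count series with
    recursive least squares and exponential forgetting (factor lam)."""
    w = np.zeros(order)                 # AR-style coefficient vector
    P = delta * np.eye(order)           # inverse correlation matrix estimate
    preds = np.full(y.size, np.nan)
    for t in range(order, y.size):
        x = y[t - order:t][::-1]        # most recent 'order' observations
        preds[t] = w @ x                # forecast made before seeing y[t]
        k = P @ x / (lam + x @ P @ x)   # gain vector
        w = w + k * (y[t] - w @ x)      # coefficient update
        P = (P - np.outer(k, x @ P)) / lam
    return preds

# Hypothetical daily syndromic counts with a weekly pattern; illustrative only.
rng = np.random.default_rng(6)
days = np.arange(365)
counts = rng.poisson(20 + 8 * (days % 7 < 5)).astype(float)   # weekdays busier
forecast = rls_forecast(counts)
residual = counts - forecast            # unusually large residuals flag potential outbreaks
print(np.nanstd(residual))
```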

  7. The Retrospect and Prospect of Non-parametric Item Response Theory (非参数项目反应理论回顾与展望)

    Institute of Scientific and Technical Information of China (English)

    陈婧; 康春花; 钟晓玲

    2013-01-01

      Compared with parametric item response theory, non-parametric item response theory provides a theoretical framework that better fits practical settings. Current research on non-parametric item response theory focuses mainly on parameter estimation methods and their comparison and on verifying data-model fit, while applied research concentrates on scale revision, the analysis of individual response data, and differential item functioning. Non-parametric cognitive diagnostic theory, which has developed on the basis of cognitive diagnostic theory, further highlights the practical advantages of the approach. Future research should place more emphasis on practical applications of non-parametric item response theory, and non-parametric cognitive diagnosis also deserves attention, so that the advantages of non-parametric methods can be fully exploited in applied settings.

  8. A new non-parametric stationarity test of time series in the time domain

    KAUST Repository

    Jin, Lei

    2014-11-07

    © 2015 The Royal Statistical Society and Blackwell Publishing Ltd. We propose a new double-order selection test for checking second-order stationarity of a time series. To develop the test, a sequence of systematic samples is defined via Walsh functions. Then the deviations of the autocovariances based on these systematic samples from the corresponding autocovariances of the whole time series are calculated and the uniform asymptotic joint normality of these deviations over different systematic samples is obtained. With a double-order selection scheme, our test statistic is constructed by combining the deviations at different lags in the systematic samples. The null asymptotic distribution of the statistic proposed is derived and the consistency of the test is shown under fixed and local alternatives. Simulation studies demonstrate well-behaved finite sample properties of the method proposed. Comparisons with some existing tests in terms of power are given both analytically and empirically. In addition, the method proposed is applied to check the stationarity assumption of a chemical process viscosity readings data set.

  9. Statistics

    Science.gov (United States)

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  10. Non-parametric deprojection of NIKA SZ observations: Pressure distribution in the Planck-discovered cluster PSZ1 G045.85+57.71

    Science.gov (United States)

    Ruppin, F.; Adam, R.; Comis, B.; Ade, P.; André, P.; Arnaud, M.; Beelen, A.; Benoît, A.; Bideaud, A.; Billot, N.; Bourrion, O.; Calvo, M.; Catalano, A.; Coiffard, G.; D'Addabbo, A.; De Petris, M.; Désert, F.-X.; Doyle, S.; Goupy, J.; Kramer, C.; Leclercq, S.; Macías-Pérez, J. F.; Mauskopf, P.; Mayet, F.; Monfardini, A.; Pajot, F.; Pascale, E.; Perotto, L.; Pisano, G.; Pointecouteau, E.; Ponthieu, N.; Pratt, G. W.; Revéret, V.; Ritacco, A.; Rodriguez, L.; Romero, C.; Schuster, K.; Sievers, A.; Triqueneaux, S.; Tucker, C.; Zylka, R.

    2017-01-01

    The determination of the thermodynamic properties of clusters of galaxies at intermediate and high redshift can bring new insights into the formation of large-scale structures. It is essential for a robust calibration of the mass-observable scaling relations and their scatter, which are key ingredients for precise cosmology using cluster statistics. Here we illustrate an application of high-resolution (<20 arcsec) thermal Sunyaev-Zel'dovich (tSZ) observations by probing the intracluster medium of the Planck-discovered galaxy cluster PSZ1 G045.85+57.71 at redshift z = 0.61. We deproject jointly NIKA and Planck data to extract the electronic pressure distribution from the cluster core (R ~ 0.02 R500) to its outskirts (R ~ 3 R500) non-parametrically for the first time at intermediate redshift. The constraints on the resulting pressure profile allow us to reduce the relative uncertainty on the integrated Compton parameter by a factor of two compared to the Planck value. Combining the tSZ data and the deprojected electronic density profile from XMM-Newton allows us to undertake a hydrostatic mass analysis, for which we study the impact of a spherical model assumption on the total mass estimate. We also investigate the radial temperature and entropy distributions. These data indicate that PSZ1 G045.85+57.71 is a massive (M500 ~ 5.5 × 10^14 M⊙) cool-core cluster. This work is part of a pilot study aiming at optimizing the treatment of the NIKA2 tSZ large program dedicated to the follow-up of SZ-discovered clusters at intermediate and high redshifts. This study illustrates the potential of NIKA2 to put constraints on the thermodynamic properties and tSZ-scaling relations of these clusters, and demonstrates the excellent synergy between tSZ and X-ray observations of similar angular resolution.

  11. Wind speed forecasting at different time scales: a non parametric approach

    CERN Document Server

    D'Amico, Guglielmo; Prattico, Flavio

    2013-01-01

    The prediction of wind speed is one of the most important aspects when dealing with renewable energy. In this paper we show a new non-parametric model, based on semi-Markov chains, to predict wind speed. In particular, we use an indexed semi-Markov model, which reproduces accurately the statistical behavior of wind speed, to forecast wind speed one step ahead for different time scales and for very long time horizons while maintaining the goodness of prediction. In order to check the main features of the model we show, as an indicator of goodness, the root mean square error between real data and predicted values, and we compare our forecasting results with those of a persistence model.

  12. Non-parametric probabilistic forecasts of wind power: required properties and evaluation

    DEFF Research Database (Denmark)

    Pinson, Pierre; Nielsen, Henrik Aalborg; Møller, Jan Kloppenborg;

    2007-01-01

    Predictions of wind power production for horizons up to 48-72 hours ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the conditional expectation of future generation for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from non-parametric methods, and then take the form of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point predictions.

  13. A new measure for gene expression biclustering based on non-parametric correlation.

    Science.gov (United States)

    Flores, Jose L; Inza, Iñaki; Larrañaga, Pedro; Calvo, Borja

    2013-12-01

    One of the emerging techniques for the analysis of DNA microarray data, known as biclustering, is the search for subsets of genes and conditions which are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean squared residue as quality measure, but relevant and interesting patterns, such as shifting or scaling patterns, cannot be detected. Furthermore, recent papers show that there exist new coherence patterns involved in different kinds of cancer and tumors, such as inverse relationships between genes, which cannot be captured. The proposed measure is called Spearman's biclustering measure (SBM), which performs an estimation of the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. The search for biclusters is performed by using an evolutionary technique called estimation of distribution algorithms, which uses the SBM measure as fitness function. This approach has been examined from different points of view by using artificial and real microarrays. The assessment process has involved the use of quality indexes, a set of bicluster patterns of reference including new patterns, and a set of statistical tests. The performance has also been examined using real microarrays and comparing to different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs. SBM shows several advantages, such as the ability to recognize more complex coherence patterns such as shifting, scaling and inversion, and the capability to selectively marginalize genes and conditions depending on the statistical significance. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  14. FUNSTAT and statistical image representations

    Science.gov (United States)

    Parzen, E.

    1983-01-01

    General ideas of functional statistical inference for the analysis of one sample and two samples, univariate and bivariate, are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.

  15. Mass univariate analysis of event-related brain potentials/fields I: a critical tutorial review.

    Science.gov (United States)

    Groppe, David M; Urbach, Thomas P; Kutas, Marta

    2011-12-01

    Event-related potentials (ERPs) and magnetic fields (ERFs) are typically analyzed via ANOVAs on mean activity in a priori windows. Advances in computing power and statistics have produced an alternative, mass univariate analyses consisting of thousands of statistical tests and powerful corrections for multiple comparisons. Such analyses are most useful when one has little a priori knowledge of effect locations or latencies, and for delineating effect boundaries. Mass univariate analyses complement and, at times, obviate traditional analyses. Here we review this approach as applied to ERP/ERF data and four methods for multiple comparison correction: strong control of the familywise error rate (FWER) via permutation tests, weak control of FWER via cluster-based permutation tests, false discovery rate control, and control of the generalized FWER. We end with recommendations for their use and introduce free MATLAB software for their implementation.
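    One of the corrections reviewed above, strong FWER control via permutation, is often implemented as a "tmax" test: on each permutation the condition assignment (here, the sign of each subject's difference wave) is shuffled and the maximum absolute t-value across all time points is recorded. A self-contained sketch on synthetic ERP-like data (not the authors' software) is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical ERP amplitudes: subjects x time points for two conditions (illustrative only).
n_sub, n_time = 20, 200
cond_a = rng.normal(0, 1, (n_sub, n_time))
cond_b = rng.normal(0, 1, (n_sub, n_time))
cond_b[:, 80:120] += 0.8                      # injected effect between samples 80 and 120

diff = cond_a - cond_b                        # within-subject difference waves
t_obs = stats.ttest_1samp(diff, 0, axis=0).statistic

# tmax permutation test: flip the sign of each subject's difference wave at random and
# record the maximum absolute t over all time points; this controls the FWER strongly.
n_perm = 2000
t_max = np.empty(n_perm)
for p in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=(n_sub, 1))
    t_max[p] = np.abs(stats.ttest_1samp(diff * signs, 0, axis=0).statistic).max()

crit = np.percentile(t_max, 95)               # two-sided alpha = .05 threshold
print(np.where(np.abs(t_obs) > crit)[0])      # time points surviving the correction
```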

  16. The binned bispectrum estimator: template-based and non-parametric CMB non-Gaussianity searches

    CERN Document Server

    Bucher, Martin; van Tent, Bartjan

    2015-01-01

    We describe the details of the binned bispectrum estimator as used for the official 2013 and 2015 analyses of the temperature and polarization CMB maps from the ESA Planck satellite. The defining aspect of this estimator is the determination of a map bispectrum (3-point correlator) that has been binned in harmonic space. For a parametric determination of the non-Gaussianity in the map (the so-called fNL parameters), one takes the inner product of this binned bispectrum with theoretically motivated templates. However, as a complementary approach one can also smooth the binned bispectrum using a variable smoothing scale in order to suppress noise and make coherent features stand out above the noise. This allows one to look in a model-independent way for any statistically significant bispectral signal. This approach is useful for characterizing the bispectral shape of the galactic foreground emission, for which a theoretical prediction of the bispectral anisotropy is lacking, and for detecting a serendipitous pr...

  17. Non-parametric causality detection: An application to social media and financial data

    Science.gov (United States)

    Tsapeli, Fani; Musolesi, Mirco; Tino, Peter

    2017-10-01

    According to behavioral finance, stock market returns are influenced by emotional, social and psychological factors. Several recent works support this theory by providing evidence of correlation between stock market prices and collective sentiment indexes measured using social media data. However, a pure correlation analysis is not sufficient to prove that stock market returns are influenced by such emotional factors since both stock market prices and collective sentiment may be driven by a third unmeasured factor. Controlling for factors that could influence the study by applying multivariate regression models is challenging given the complexity of stock market data. False assumptions about the linearity or non-linearity of the model and inaccuracies on model specification may result in misleading conclusions. In this work, we propose a novel framework for causal inference that does not require any assumption about a particular parametric form of the model expressing statistical relationships among the variables of the study and can effectively control a large number of observed factors. We apply our method in order to estimate the causal impact that information posted in social media may have on stock market returns of four big companies. Our results indicate that social media data not only correlate with stock market returns but also influence them.

  18. Non-parametric least squares estimation of a distribution function

    Institute of Scientific and Technical Information of China (English)

    柴根象; 花虹; 尚汉冀

    2002-01-01

    Using the non-parametric least squares method, strongly consistent estimators of the distribution function and the failure function are established, where the distribution function F(x), after a logistic transformation, is assumed to be well approximated by a polynomial. Simulation results show that the estimators are highly satisfactory.
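
    The abstract's idea of approximating the logit-transformed distribution function by a polynomial fitted with least squares can be sketched as follows. This is a rough illustration on simulated lifetimes, not the authors' estimator; the sample, plotting positions and polynomial degree are assumptions.

        # Least-squares polynomial fit to the logit of the empirical CDF (illustrative).
        import numpy as np

        rng = np.random.default_rng(1)
        x = np.sort(rng.weibull(1.5, size=200) * 10.0)     # hypothetical lifetimes
        n = x.size
        F_emp = (np.arange(1, n + 1) - 0.5) / n            # empirical CDF at the order statistics

        logit = np.log(F_emp / (1.0 - F_emp))              # logistic transformation
        coef = np.polyfit(x, logit, deg=3)                 # least-squares polynomial fit

        def F_hat(t):
            """Smoothed estimate of the distribution function F(t)."""
            return 1.0 / (1.0 + np.exp(-np.polyval(coef, t)))

        def failure_hat(t, eps=1e-3):
            """Crude failure-rate estimate f(t) / (1 - F(t)) by numerical differentiation."""
            f = (F_hat(t + eps) - F_hat(t - eps)) / (2.0 * eps)
            return f / np.clip(1.0 - F_hat(t), 1e-12, None)

        grid = np.linspace(x.min(), x.max(), 5)
        print(np.round(F_hat(grid), 3))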

  19. Evaluation dam overtopping risk based on univariate and bivariate flood frequency analysis

    OpenAIRE

    Goodarzi, E.; M. Mirzaei; L. T. Shui; Ziaei, M.

    2011-01-01

    There is a growing tendency to assess the safety levels of existing dams based on risk and uncertainty analysis using mathematical and statistical methods. This research presents the application of risk and uncertainty analysis to dam overtopping based on univariate and bivariate flood frequency analyses by applying Gumbel logistic distribution for the Doroudzan earth-fill dam in south of Iran. The bivariate frequency analysis resulted in six inflow hydrographs with a joint return period of 1...

  20. A novel scan statistics approach for clustering identification and comparison in binary genomic data.

    Science.gov (United States)

    Pellin, Danilo; Di Serio, Clelia

    2016-09-22

    In biomedical research a relevant issue is to identify time intervals or portions of an n-dimensional support where a particular event of interest is more likely to occur than expected. Algorithms that require the number, dimension or length of clusters to be specified a priori suffer from a high degree of arbitrariness whenever no precise information is available, and this may strongly affect the final parameter estimates. Within this framework, spatial scan statistics have been proposed in the literature, representing a valid non-parametric alternative. We adapt the so-called Bernoulli-model scan statistic to the genomic field and propose a multivariate extension, named Relative Scan Statistics, for the comparison of two series of Bernoulli random variables defined over a common support, with the final goal of highlighting unshared event rate variations. Using a probabilistic approach based on success probability estimates and their comparison (likelihood based), we exploit a hypothesis testing procedure to identify clusters and relative clusters. Both the univariate and the novel multivariate extension of the scan statistic confirm previously published findings. The method described in the paper represents a challenging application of the scan statistics framework to problems related to genomic data. From a biological perspective, these tools offer clinicians and researchers the possibility of improving their knowledge of the viral vector integration process, allowing them to focus on restricted, over-targeted portions of the genome.

  1. A non-parametric CUSUM intrusion detection method based on an industrial control model

    Institute of Scientific and Technical Information of China (English)

    张云贵; 赵华; 王丽娜

    2012-01-01

    To deal with the increasingly serious information security problems of industrial control systems (ICS), this paper presents a non-parametric cumulative sum (CUSUM) intrusion detection method for industrial control networks. Exploiting the fact that the output of an ICS is determined by its input, a mathematical model of the ICS is established to predict the system output. Once the sensors of the control system are under attack, the actual output will change. At every instant, the difference between the output predicted by the industrial control model and the signal measured by the sensors is calculated, forming a time-based statistical sequence. The non-parametric CUSUM algorithm is then applied to this sequence to detect intrusions online and raise alarms. Simulated detection experiments show that the proposed method offers good real-time performance and a low false-alarm rate. By choosing appropriate values of the non-parametric CUSUM algorithm's parameters, the method can accurately detect attacks before they cause substantial damage to the control system, and it is also helpful for monitoring misoperation.
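
    A generic one-sided non-parametric CUSUM on the residuals between a plant model's predicted output and the measured sensor signal can be sketched as below. The first-order "plant model", the drift term and the alarm threshold are illustrative assumptions, not the ICS model or the tuned parameters of the paper.

        # One-sided non-parametric CUSUM change detection on prediction residuals (illustrative).
        import numpy as np

        rng = np.random.default_rng(2)

        def predicted_output(u):
            # stand-in for the industrial control model: a simple first-order response
            y = np.zeros_like(u)
            for k in range(1, len(u)):
                y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1]
            return y

        u = np.ones(600)                           # constant set-point input
        y_model = predicted_output(u)
        y_meas = y_model + rng.normal(0, 0.02, size=u.size)
        y_meas[400:] += 0.15                       # sensor readings manipulated from sample 400

        residual = np.abs(y_meas - y_model)        # time-based statistical sequence

        beta, tau = 0.05, 1.0                      # drift term and alarm threshold (assumed)
        S, alarm_at = 0.0, None
        for k, r in enumerate(residual):
            S = max(0.0, S + r - beta)             # CUSUM recursion
            if S > tau:
                alarm_at = k
                break

        print("alarm raised at sample:", alarm_at)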

  2. Univariate and multivariate Chen-Stein characterizations -- a parametric approach

    CERN Document Server

    Ley, Christophe

    2011-01-01

    We provide a general framework for characterizing families of (univariate, multivariate, discrete and continuous) distributions in terms of a parameter of interest. We show how this allows for recovering known Chen-Stein characterizations, and for constructing many more. Several examples are worked out in full, and different potential applications are discussed.

  3. Non-parametric reconstruction of an inflaton potential from Einstein–Cartan–Sciama–Kibble gravity with particle production

    Directory of Open Access Journals (Sweden)

    Shantanu Desai

    2016-04-01

    The coupling between spin and torsion in the Einstein–Cartan–Sciama–Kibble theory of gravity generates gravitational repulsion at very high densities, which prevents a singularity in a black hole and may create there a new universe. We show that quantum particle production in such a universe near the last bounce, which represents the Big Bang, gives the dynamics that solves the horizon, flatness, and homogeneity problems in cosmology. For a particular range of the particle production coefficient, we obtain a nearly constant Hubble parameter that gives an exponential expansion of the universe with more than 60 e-folds, which lasts about ∼10^−42 s. This scenario can thus explain cosmic inflation without requiring a fundamental scalar field and reheating. From the obtained time dependence of the scale factor, we follow the prescription of Ellis and Madsen to reconstruct in a non-parametric way a scalar field potential which gives the same dynamics of the early universe. This potential gives the slow-roll parameters of cosmic inflation, from which we calculate the tensor-to-scalar ratio, the scalar spectral index of density perturbations, and its running as functions of the production coefficient. We find that these quantities do not significantly depend on the scale factor at the Big Bounce. Our predictions for these quantities are consistent with the Planck 2015 observations.

  4. Non-parametric reconstruction of an inflaton potential from Einstein-Cartan-Sciama-Kibble gravity with particle production

    Science.gov (United States)

    Desai, Shantanu; Popławski, Nikodem J.

    2016-04-01

    The coupling between spin and torsion in the Einstein-Cartan-Sciama-Kibble theory of gravity generates gravitational repulsion at very high densities, which prevents a singularity in a black hole and may create there a new universe. We show that quantum particle production in such a universe near the last bounce, which represents the Big Bang, gives the dynamics that solves the horizon, flatness, and homogeneity problems in cosmology. For a particular range of the particle production coefficient, we obtain a nearly constant Hubble parameter that gives an exponential expansion of the universe with more than 60 e-folds, which lasts about ∼10^-42 s. This scenario can thus explain cosmic inflation without requiring a fundamental scalar field and reheating. From the obtained time dependence of the scale factor, we follow the prescription of Ellis and Madsen to reconstruct in a non-parametric way a scalar field potential which gives the same dynamics of the early universe. This potential gives the slow-roll parameters of cosmic inflation, from which we calculate the tensor-to-scalar ratio, the scalar spectral index of density perturbations, and its running as functions of the production coefficient. We find that these quantities do not significantly depend on the scale factor at the Big Bounce. Our predictions for these quantities are consistent with the Planck 2015 observations.

  5. Non-parametric reconstruction of an inflaton potential from Einstein-Cartan-Sciama-Kibble gravity with particle production

    CERN Document Server

    Desai, Shantanu

    2015-01-01

    The coupling between spin and torsion in the Einstein-Cartan-Sciama-Kibble theory of gravity generates gravitational repulsion at very high densities, which prevents a singularity in a black hole and may create there a new universe. We show that quantum particle production in such a universe near the last bounce, which represents the Big Bang, gives the dynamics that solves the horizon, flatness, and homogeneity problems in cosmology. For a particular range of the particle production coefficient, we obtain a nearly constant Hubble parameter that gives an exponential expansion of the universe with more than 60 $e$-folds, which lasts about $\sim 10^{-42}$ s. This scenario can thus explain cosmic inflation without requiring a fundamental scalar field and reheating. From the obtained time dependence of the scale factor, we follow the prescription of Ellis and Madsen to reconstruct in a non-parametric way a scalar field potential which gives the same dynamics of the early universe. This potential gives the slow-rol...

  6. Inferring the three-dimensional distribution of dust in the Galaxy with a non-parametric method: Preparing for Gaia

    CERN Document Server

    Kh., S Rezaei; Hanson, R J; Fouesneau, M

    2016-01-01

    We present a non-parametric model for inferring the three-dimensional (3D) distribution of dust density in the Milky Way. Our approach uses the extinction measured towards stars at different locations in the Galaxy at approximately known distances. Each extinction measurement is proportional to the integrated dust density along its line-of-sight. Making simple assumptions about the spatial correlation of the dust density, we can infer the most probable 3D distribution of dust across the entire observed region, including along sight lines which were not observed. This is possible because our model employs a Gaussian Process to connect all lines-of-sight. We demonstrate the capability of our model to capture detailed dust density variations using mock data as well as simulated data from the Gaia Universe Model Snapshot. We then apply our method to a sample of giant stars observed by APOGEE and Kepler to construct a 3D dust map over a small region of the Galaxy. Due to our smoothness constraint and its isotropy,...

  7. Super-resolution non-parametric deconvolution in modelling the radial response function of a parallel plate ionization chamber.

    Science.gov (United States)

    Kulmala, A; Tenhunen, M

    2012-11-07

    The signal of a dosimetric detector generally depends on the shape and size of its sensitive volume. In order to optimize the performance of the detector and the reliability of the output signal, the effect of the detector size should be corrected or, at least, taken into account. The response of the detector can be modelled using the convolution theorem, which connects the system input (actual dose), output (measured result) and the effect of the detector (response function) by a linear convolution operator. We have developed a super-resolution, non-parametric deconvolution method for determining the radial response function of a cylindrically symmetric ionization chamber. We have demonstrated that the presented deconvolution method is able to determine the radial response of the Roos parallel plate ionization chamber with better than 0.5 mm correspondence to the physical dimensions of the chamber. In addition, the performance of the method was confirmed by the excellent agreement between the output factors of the stereotactic conical collimators (4-20 mm diameter) measured by the Roos chamber, where the detector size is larger than the measured field, and those measured by the reference detector (diode). The presented deconvolution method has potential for providing reference data for more accurate physical models of the ionization chamber, as well as for improving and enhancing the performance of detectors in specific dosimetric problems.

  8. A sharper view of Pal 5's tails: Discovery of stream perturbations with a novel non-parametric technique

    CERN Document Server

    Erkal, Denis; Belokurov, Vasily

    2016-01-01

    Only in the Milky Way is it possible to conduct an experiment which uses stellar streams to detect low-mass dark matter subhaloes. In smooth and static host potentials, tidal tails of disrupting satellites appear highly symmetric. However, dark perturbers induce density fluctuations that destroy this symmetry. Motivated by the recent release of unprecedentedly deep and wide imaging data around the Pal 5 stellar stream, we develop a new probabilistic, adaptive and non-parametric technique which allows us to bring the cluster's tidal tails into clear focus. Strikingly, we uncover a stream whose density exhibits visible changes on a variety of angular scales. We detect significant bumps and dips, both narrow and broad: two peaks on either side of the progenitor, each only a fraction of a degree across, and two gaps, ∼2° and ∼9° wide, the latter accompanied by a gargantuan lump of debris. This largest density feature results in a pronounced inter-tail asymmetry which cannot be made consist...

  9. The merger fraction of active and inactive galaxies in the local Universe through an improved non-parametric classification

    CERN Document Server

    Cotini, Stefano; Caccianiga, Alessandro; Colpi, Monica; Della Ceca, Roberto; Mapelli, Michela; Severgnini, Paola; Segreto, Alberto; 10.1093/mnras/stt358

    2013-01-01

    We investigate the possible link between mergers and the enhanced activity of supermassive black holes (SMBHs) at the centre of galaxies, by comparing the merger fraction of a local sample (0.003 ≤ z < 0.03) of active galaxies - 59 active galactic nuclei (AGN) host galaxies selected from the all-sky Swift BAT (Burst Alert Telescope) survey - with an appropriate control sample (247 sources extracted from the Hyperleda catalogue) that has the same redshift distribution as the BAT sample. We detect the interacting systems in the two samples on the basis of non-parametric structural indexes of concentration (C), asymmetry (A), clumpiness (S), Gini coefficient (G) and second-order moment of light (M20). In particular, we propose a new morphological criterion, based on a combination of all these indexes, that improves the identification of interacting systems. We also present a new software package - PyCASSo (Python CAS Software) - for the automatic computation of the structural indexes. After correcting for the c...

  10. Non-parametric analysis of infrared spectra for recognition of glass and glass ceramic fragments in recycling plants.

    Science.gov (United States)

    Farcomeni, Alessio; Serranti, Silvia; Bonifazi, Giuseppe

    2008-01-01

    Glass ceramic detection in glass recycling plants represents a still unsolved problem, as glass ceramic material looks like normal glass and is usually detected only by specialized personnel. The presence of glass-like contaminants inside waste glass products, resulting from both industrial and differentiated urban waste collection, increases process production costs and reduces final product quality. In this paper an innovative approach for glass ceramic recognition, based on the non-parametric analysis of infrared spectra, is proposed and investigated. The work was specifically addressed to the spectral classification of glass and glass ceramic fragments collected in an actual recycling plant from three different production lines: flat glass, colored container-glass and white container-glass. The analyses, carried out in the near and mid-infrared (NIR-MIR) spectral field (1280-4480 nm), show that glass ceramic and glass fragments can be recognized by applying a wavelet transform, with a small classification error. Moreover, a method for selecting only a small subset of relevant wavelength ratios is suggested, allowing fast recognition of the two classes of materials. The results show how the proposed approach can be utilized to develop a classification engine to be integrated inside a hardware and software sorting architecture for fast "on-line" ceramic glass recognition and separation.

  11. Prediction intervals for future BMI values of individual children: a non-parametric approach by quantile boosting.

    Science.gov (United States)

    Mayr, Andreas; Hothorn, Torsten; Fenske, Nora

    2012-01-25

    The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures.
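
    The idea of modelling the borders of a prediction interval directly by quantile regression can be sketched with boosted trees and a pinball loss. This uses scikit-learn's gradient boosting rather than the additive quantile boosting of the paper, and the simulated age/BMI data, quantile levels and number of trees are assumptions.

        # 90% prediction intervals from boosted quantile regression (illustrative).
        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(3)
        age = rng.uniform(2, 10, size=1000)                       # hypothetical ages (years)
        bmi = 15 + 0.4 * age + rng.gamma(2.0, 0.5 + 0.1 * age)    # skewed, age-dependent "BMI"
        X = age.reshape(-1, 1)

        lower = GradientBoostingRegressor(loss="quantile", alpha=0.05, n_estimators=300).fit(X, bmi)
        upper = GradientBoostingRegressor(loss="quantile", alpha=0.95, n_estimators=300).fit(X, bmi)

        X_new = np.array([[3.0], [6.0], [9.0]])
        for a, lo, hi in zip(X_new.ravel(), lower.predict(X_new), upper.predict(X_new)):
            print(f"age {a:4.1f}: 90% prediction interval [{lo:5.2f}, {hi:5.2f}]")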

  12. Non-parametric convolution based image-segmentation of ill-posed objects applying context window approach

    CERN Document Server

    Kumar, Upendra; Pal, Manoj Kumar

    2012-01-01

    Context-dependence in the human cognition process is a well-established fact. Following this, we introduce an image segmentation method that uses context to classify a pixel on the basis of its membership of a particular object class of the image. In broad methodological steps, each pixel was defined by the context window (CW) surrounding it, the size of which was fixed heuristically. The CW texture, defined by the intensities of its pixels, was convolved with weights optimized through a non-parametric function supported by a backpropagation network. The result of the convolution was used to classify the pixels. The training data points (i.e., pixels) were carefully chosen to include all varieties of contexts: i) points within the object, ii) points near the edge but inside the objects, iii) points at the border of the objects, iv) points near the edge but outside the objects, v) points near or at the edge of the image frame. Moreover, the training data points were selected from all the images within image-d...

  13. CAUSALITY BETWEEN GDP, ENERGY AND COAL CONSUMPTION IN INDIA, 1970-2011: A NON-PARAMETRIC BOOTSTRAP APPROACH

    Directory of Open Access Journals (Sweden)

    Rohin Anhal

    2013-10-01

    The aim of this paper is to examine the direction of causality between real GDP on the one hand and final energy and coal consumption on the other in India, for the period from 1970 to 2011. The methodology adopted is the non-parametric bootstrap procedure, which is used to construct the critical values for the hypothesis of causality. The results of the bootstrap tests show that for total energy consumption there exists no causal relationship in either direction with India's GDP. However, if coal consumption is considered, we find evidence in support of unidirectional causality running from coal consumption to GDP. This clearly has important implications for the Indian economy. The most important implication is that curbing coal consumption in order to reduce carbon emissions would in turn have a limiting effect on economic growth. Our analysis contributes to the literature in three distinct ways. First, this is the first paper to use the bootstrap method to examine the growth-energy connection for the Indian economy. Second, we analyze data for the period 1970 to 2011, thereby utilizing recently available data that have not been used by others. Finally, in contrast to recently conducted studies, we adopt a disaggregated approach to the analysis of the growth-energy nexus by considering not only aggregate energy consumption, but coal consumption as well.

  14. Univariate Niho Bent Functions from o-Polynomials

    OpenAIRE

    Budaghyan, Lilya; Kholosha, Alexander; Carlet, Claude; Helleseth, Tor

    2014-01-01

    In this paper, we discover that any univariate Niho bent function is a sum of functions having the form of Leander-Kholosha bent functions with extra coefficients of the power terms. This allows one, knowing the terms of an o-polynomial, to immediately obtain the powers of the additive terms in the polynomial representing the corresponding bent function. However, the coefficients are calculated ambiguously. The explicit form is given for the bent functions obtained from quadratic and cubic o-polynomi...

  15. Semi-automatic liver tumor segmentation with hidden Markov measure field model and non-parametric distribution estimation.

    Science.gov (United States)

    Häme, Yrjö; Pollari, Mika

    2012-01-01

    A novel liver tumor segmentation method for CT images is presented. The aim of this work was to reduce the manual labor and time required in the treatment planning of radiofrequency ablation (RFA), by providing accurate and automated tumor segmentations reliably. The developed method is semi-automatic, requiring only minimal user interaction. The segmentation is based on non-parametric intensity distribution estimation and a hidden Markov measure field model, with application of a spherical shape prior. A post-processing operation is also presented to remove the overflow to adjacent tissue. In addition to the conventional approach of using a single image as input data, an approach using images from multiple contrast phases was developed. The accuracy of the method was validated with two sets of patient data, and artificially generated samples. The patient data included preoperative RFA images and a public data set from "3D Liver Tumor Segmentation Challenge 2008". The method achieved very high accuracy with the RFA data, and outperformed other methods evaluated with the public data set, with an average overlap error of 30.3%, which represents an improvement of 2.3 percentage points over the previously best-performing semi-automatic method. The average volume difference was 23.5%, and the average, the RMS, and the maximum surface distance errors were 1.87, 2.43, and 8.09 mm, respectively. The method produced good results even for tumors with very low contrast and ambiguous borders, and the performance remained high with noisy image data.

  16. A sharper view of Pal 5's tails: discovery of stream perturbations with a novel non-parametric technique

    Science.gov (United States)

    Erkal, Denis; Koposov, Sergey E.; Belokurov, Vasily

    2017-09-01

    Only in the Milky Way is it possible to conduct an experiment that uses stellar streams to detect low-mass dark matter subhaloes. In smooth and static host potentials, tidal tails of disrupting satellites appear highly symmetric. However, perturbations from dark subhaloes, as well as from GMCs and the Milky Way bar, can induce density fluctuations that destroy this symmetry. Motivated by the recent release of unprecedentedly deep and wide imaging data around the Pal 5 stellar stream, we develop a new probabilistic, adaptive and non-parametric technique that allows us to bring the cluster's tidal tails into clear focus. Strikingly, we uncover a stream whose density exhibits visible changes on a variety of angular scales. We detect significant bumps and dips, both narrow and broad: two peaks on either side of the progenitor, each only a fraction of a degree across, and two gaps, ∼2° and ∼9° wide, the latter accompanied by a gargantuan lump of debris. This largest density feature results in a pronounced intertail asymmetry which cannot be made consistent with an unperturbed stream according to a suite of simulations we have produced. We conjecture that the sharp peaks around Pal 5 are epicyclic overdensities, while the two dips are consistent with impacts by subhaloes. Assuming an age of 3.4 Gyr for Pal 5, these two gaps would correspond to the characteristic size of gaps created by subhaloes in the mass range of 106-107 M⊙ and 107-108 M⊙, respectively. In addition to dark substructure, we find that the bar of the Milky Way can plausibly produce the asymmetric density seen in Pal 5 and that GMCs could cause the smaller gap.

  17. Comparative study of species sensitivity distributions based on non-parametric kernel density estimation for some transition metals.

    Science.gov (United States)

    Wang, Ying; Feng, Chenglian; Liu, Yuedan; Zhao, Yujie; Li, Huixian; Zhao, Tianhui; Guo, Wenjing

    2017-02-01

    Transition metals in the fourth period of the periodic table are widespread in aquatic environments and often occur at concentrations high enough to cause adverse effects on aquatic life and human health. Parametric models are generally used to construct species sensitivity distributions (SSDs); as a result, comparisons of water quality criteria (WQC) for elements in the same period or group of the periodic table may be inaccurate and the results biased. To address this inadequacy, non-parametric kernel density estimation (NPKDE), together with optimal bandwidth selection and testing methods, was developed for establishing SSDs. The NPKDE gave a better fit, more robustness and better predictions than the conventional normal and logistic parametric density estimations for constructing SSDs and deriving acute HC5 values and WQC for transition metals of the fourth period. The decreasing sequence of HC5 values for these metals was Ti > Mn > V > Ni > Zn > Cu > Fe > Co > Cr(VI), which is not proportional to atomic number, and the relatively sensitive species also differed between metals. The results indicate that, besides physical and chemical properties, other factors affect the toxicity mechanisms of transition metals. The proposed method enriches the methodological foundation for WQC and also provides a relatively innovative, accurate approach to WQC derivation and risk assessment of same-group and same-period metals in aquatic environments, supporting the protection of aquatic organisms.
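
    The kernel-density route to an SSD and an HC5 value can be sketched with SciPy's Gaussian KDE: fit the density to log10 acute toxicity values, build a numerical CDF and read off the 5th percentile. The toxicity values below are made up, and the paper's optimal bandwidth selection and goodness-of-fit testing are not reproduced; SciPy's default (Scott) bandwidth is used instead.

        # Non-parametric SSD via Gaussian KDE and the HC5 as its 5th percentile (illustrative).
        import numpy as np
        from scipy.stats import gaussian_kde

        tox_ugL = np.array([12., 30., 55., 80., 150., 240., 400., 800., 1500., 5200.])  # hypothetical
        log_tox = np.log10(tox_ugL)

        kde = gaussian_kde(log_tox)                    # default Scott bandwidth

        grid = np.linspace(log_tox.min() - 1, log_tox.max() + 1, 2000)
        cdf = np.cumsum(kde(grid))
        cdf /= cdf[-1]                                 # numerical CDF of the SSD

        hc5_log = np.interp(0.05, cdf, grid)           # 5th percentile on the log10 scale
        print(f"HC5 approx. {10 ** hc5_log:.1f} ug/L")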

  18. Mathematical statistics and stochastic processes

    CERN Document Server

    Bosq, Denis

    2013-01-01

    Generally, books on mathematical statistics are restricted to the case of independent identically distributed random variables. In this book however, both this case AND the case of dependent variables, i.e. statistics for discrete and continuous time processes, are studied. This second case is very important for today's practitioners.Mathematical Statistics and Stochastic Processes is based on decision theory and asymptotic statistics and contains up-to-date information on the relevant topics of theory of probability, estimation, confidence intervals, non-parametric statistics and rob

  19. Statistical concepts a second course

    CERN Document Server

    Lomax, Richard G

    2012-01-01

    Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes

  20. Short-term monitoring of benzene air concentration in an urban area: a preliminary study of application of Kruskal-Wallis non-parametric test to assess pollutant impact on global environment and indoor.

    Science.gov (United States)

    Mura, Maria Chiara; De Felice, Marco; Morlino, Roberta; Fuselli, Sergio

    2010-01-01

    In step with the need to develop statistical procedures to manage small-size environmental samples, in this work we have used concentration values of benzene (C6H6), concurrently detected by seven outdoor and indoor monitoring stations over 12 000 minutes, in order to assess the representativeness of collected data and the impact of the pollutant on the indoor environment. Clearly, the former issue is strictly connected to sampling-site geometry, which proves critical to correctly retrieving information from analysis of pollutants of sanitary interest. Therefore, according to current criteria for network planning, single stations have been interpreted as nodes of a set of adjoining triangles; then, a) node pairs have been taken into account in order to estimate pollutant stationarity on triangle sides, as well as b) node triplets, to statistically associate data from air monitoring with the corresponding territory area, and c) node sextuplets, to assess the impact probability of the outdoor pollutant on the indoor environment for each area. Distributions from the various node combinations are all non-Gaussian; consequently, Kruskal-Wallis (KW) non-parametric statistics has been exploited to test variability of the continuous density function from each pair, triplet and sextuplet. Results from the above-mentioned statistical analysis have shown randomness of site selection, which has not allowed a reliable generalization of monitoring data to the entire selected territory, except for a single "forced" case (70%); most important, they suggest a possible procedure to optimize network design.
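
    A minimal example of the Kruskal-Wallis test used above, applied to benzene concentrations from three monitoring stations, is given below; the values are simulated stand-ins, not the study's data.

        # Kruskal-Wallis H-test across monitoring stations (illustrative).
        import numpy as np
        from scipy.stats import kruskal

        rng = np.random.default_rng(4)
        station_a = rng.lognormal(mean=1.0, sigma=0.4, size=60)   # ug/m3, hypothetical
        station_b = rng.lognormal(mean=1.1, sigma=0.5, size=60)
        station_c = rng.lognormal(mean=1.5, sigma=0.4, size=60)

        H, p = kruskal(station_a, station_b, station_c)
        print(f"H = {H:.2f}, p = {p:.4f}")
        if p < 0.05:
            print("Reject homogeneity: at least one station differs in distribution.")
        else:
            print("No evidence against homogeneity across stations.")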

  1. Short-term monitoring of benzene air concentration in an urban area: a preliminary study of application of Kruskal-Wallis non-parametric test to assess pollutant impact on global environment and indoor

    Directory of Open Access Journals (Sweden)

    Maria Chiara Mura

    2010-12-01

    In step with the need to develop statistical procedures to manage small-size environmental samples, in this work we have used concentration values of benzene (C6H6), concurrently detected by seven outdoor and indoor monitoring stations over 12 000 minutes, in order to assess the representativeness of collected data and the impact of the pollutant on the indoor environment. Clearly, the former issue is strictly connected to sampling-site geometry, which proves critical to correctly retrieving information from analysis of pollutants of sanitary interest. Therefore, according to current criteria for network planning, single stations have been interpreted as nodes of a set of adjoining triangles; then, a) node pairs have been taken into account in order to estimate pollutant stationarity on triangle sides, as well as b) node triplets, to statistically associate data from air monitoring with the corresponding territory area, and c) node sextuplets, to assess the impact probability of the outdoor pollutant on the indoor environment for each area. Distributions from the various node combinations are all non-Gaussian; consequently, Kruskal-Wallis (KW) non-parametric statistics has been exploited to test variability of the continuous density function from each pair, triplet and sextuplet. Results from the above-mentioned statistical analysis have shown randomness of site selection, which has not allowed a reliable generalization of monitoring data to the entire selected territory, except for a single "forced" case (70%); most important, they suggest a possible procedure to optimize network design.

  2. Compounding approach for univariate time series with non-stationary variances

    CERN Document Server

    Schäfer, Rudi; Guhr, Thomas; Stöckmann, Hans-Jürgen; Kuhl, Ulrich

    2015-01-01

    A defining feature of non-stationary systems is the time dependence of their statistical parameters. Measured time series may exhibit Gaussian statistics on short time horizons, due to the central limit theorem. The sample statistics for long time horizons, however, averages over the time-dependent parameters. To model the long-term statistical behavior, we compound the local distribution with the distribution of its parameters. Here we consider two concrete, but diverse examples of such non-stationary systems, the turbulent air flow of a fan and a time series of foreign exchange rates. Our main focus is to empirically determine the appropriate parameter distribution for the compounding approach. To this end we have to estimate the parameter distribution for univariate time series in a highly non-stationary situation.

  3. Effect Sizes for Research Univariate and Multivariate Applications

    CERN Document Server

    Grissom, Robert J

    2011-01-01

    Noted for its comprehensive coverage, this greatly expanded new edition now covers the use of univariate and multivariate effect sizes. Many measures and estimators are reviewed along with their application, interpretation, and limitations. Noted for its practical approach, the book features numerous examples using real data for a variety of variables and designs, to help readers apply the material to their own data. Tips on the use of SPSS, SAS, R, and S-Plus are provided. The book's broad disciplinary appeal results from its inclusion of a variety of examples from psychology, medicine, educa

  4. Univariate real root isolation in an extension field

    DEFF Research Database (Denmark)

    Strzebonski, Adam; Tsigaridas, Elias

    2011-01-01

    We present algorithmic, complexity and implementation results for the problem of isolating the real roots of a univariate polynomial B_α ∈ L[y], where L = Q(α) is a simple algebraic extension of the rational numbers. We revisit two approaches for the problem. In the first approach, using resultant... a complexity bound of O_B(N^8) and for the latter a bound of O_B(N^7). We implemented the algorithms in C as part of the core library of Mathematica and we illustrate their efficiency over various data sets. Finally, we present complexity results for the general case of the first approach, where the coefficients...

  5. Non-parametric Bayesian approach to post-translational modification refinement of predictions from tandem mass spectrometry.

    Science.gov (United States)

    Chung, Clement; Emili, Andrew; Frey, Brendan J

    2013-04-01

    Tandem mass spectrometry (MS/MS) is a dominant approach for large-scale high-throughput post-translational modification (PTM) profiling. Although current state-of-the-art blind PTM spectral analysis algorithms can predict thousands of modified peptides (PTM predictions) in an MS/MS experiment, a significant percentage of these predictions have inaccurate modification mass estimates and false modification site assignments. This problem can be addressed by post-processing the PTM predictions with a PTM refinement algorithm. We developed a novel PTM refinement algorithm, iPTMClust, which extends a recently introduced PTM refinement algorithm PTMClust and uses a non-parametric Bayesian model to better account for uncertainties in the quantity and identity of PTMs in the input data. The use of this new modeling approach enables iPTMClust to provide a confidence score per modification site that allows fine-tuning and interpreting resulting PTM predictions. The primary goal behind iPTMClust is to improve the quality of the PTM predictions. First, to demonstrate that iPTMClust produces sensible and accurate cluster assignments, we compare it with k-means clustering, mixtures of Gaussians (MOG) and PTMClust on a synthetically generated PTM dataset. Second, in two separate benchmark experiments using PTM data taken from a phosphopeptide and a yeast proteome study, we show that iPTMClust outperforms state-of-the-art PTM prediction and refinement algorithms, including PTMClust. Finally, we illustrate the general applicability of our new approach on a set of human chromatin protein complex data, where we are able to identify putative novel modified peptides and modification sites that may be involved in the formation and regulation of protein complexes. Our method facilitates accurate PTM profiling, which is an important step in understanding the mechanisms behind many biological processes and should be an integral part of any proteomic study. Our algorithm is implemented in

  6. Selectivity in analytical chemistry: two interpretations for univariate methods.

    Science.gov (United States)

    Dorkó, Zsanett; Verbić, Tatjana; Horvai, George

    2015-01-01

    Selectivity is extremely important in analytical chemistry but its definition is elusive despite continued efforts by professional organizations and individual scientists. This paper shows that the existing selectivity concepts for univariate analytical methods broadly fall in two classes: selectivity concepts based on measurement error and concepts based on response surfaces (the response surface being the 3D plot of the univariate signal as a function of analyte and interferent concentration, respectively). The strengths and weaknesses of the different definitions are analyzed and contradictions between them unveiled. The error based selectivity is very general and very safe but its application to a range of samples (as opposed to a single sample) requires the knowledge of some constraint about the possible sample compositions. The selectivity concepts based on the response surface are easily applied to linear response surfaces but may lead to difficulties and counterintuitive results when applied to nonlinear response surfaces. A particular advantage of this class of selectivity is that with linear response surfaces it can provide a concentration independent measure of selectivity. In contrast, the error based selectivity concept allows only yes/no type decision about selectivity.

  7. Recursive least squares background prediction of univariate syndromic surveillance data

    Directory of Open Access Journals (Sweden)

    Burkom Howard

    2009-01-01

    Background: Surveillance of univariate syndromic data as a potential indicator of developing public health conditions has been used extensively. This paper aims to improve the performance of detecting outbreaks by using a background forecasting algorithm based on the adaptive recursive least squares method combined with a novel treatment of the day-of-the-week effect. Methods: Previous work by the first author has suggested that univariate recursive least squares analysis of syndromic data can be used to characterize the background upon which a prediction and detection component of a biosurveillance system may be built. An adaptive implementation is used to deal with data non-stationarity. In this paper we develop and implement the RLS method for background estimation of univariate data. The distinctly dissimilar distribution of data for different days of the week, however, can affect filter implementations adversely, and so a novel procedure based on linear transformations of the sorted values of the daily counts is introduced. Seven-day-ahead daily predicted counts are used as background estimates. A signal injection procedure is used to examine the integrated algorithm's ability to detect synthetic anomalies in real syndromic time series. We compare the method to a baseline CDC forecasting algorithm known as the W2 method. Results: We present detection results in the form of Receiver Operating Characteristic (ROC) curve values for four different injected signal-to-noise ratios using 16 sets of syndromic data. We find improvements in the false alarm probabilities when compared to the baseline W2 background forecasts. Conclusion: The current paper introduces a prediction approach for city-level biosurveillance data streams such as time series of outpatient clinic visits and sales of over-the-counter remedies. This approach uses RLS filters modified by a correction for the weekly patterns often seen in these data series, and a threshold
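
    The core of the background-prediction step, an adaptive (exponentially weighted) recursive least squares filter that predicts the next daily count from the previous p counts, can be sketched as below. The AR order, forgetting factor and synthetic series are assumptions, and the paper's day-of-the-week transformation is not reproduced.

        # Adaptive RLS one-step-ahead prediction of a daily count series (illustrative).
        import numpy as np

        rng = np.random.default_rng(5)
        days = np.arange(400)
        counts = 50 + 10 * np.sin(2 * np.pi * days / 7) + rng.poisson(5, size=days.size)

        p, lam = 7, 0.98                     # AR order and forgetting factor (assumed)
        w = np.zeros(p)                      # filter weights
        P = np.eye(p) * 1000.0               # inverse correlation matrix estimate

        preds = np.full(days.size, np.nan)
        for t in range(p, days.size):
            x = counts[t - p:t][::-1].astype(float)   # most recent p observations
            preds[t] = w @ x                           # one-step-ahead background estimate
            err = counts[t] - preds[t]
            k = P @ x / (lam + x @ P @ x)              # gain vector
            w = w + k * err                            # weight update
            P = (P - np.outer(k, x @ P)) / lam         # inverse-correlation update

        rmse = np.sqrt(np.nanmean((counts - preds) ** 2))
        print(f"one-step-ahead RMSE: {rmse:.2f}")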

  8. Forecasting inflation in Montenegro using univariate time series models

    Directory of Open Access Journals (Sweden)

    Milena Lipovina-Božović

    2015-04-01

    The analysis of price trends and their forecasting is one of the key tasks of the economic authorities in each country. Because the Montenegrin economy is small and open and uses the euro as its currency, forecasting inflation is very specific and is made more difficult by the low quality of the data. This paper analyzes the utility and applicability of univariate time series models for forecasting the price index in Montenegro. Analysis of key macroeconomic movements in previous decades indicates the presence of many possible determinants that could influence the forecasting result. The paper concludes that forecasting models (ARIMA) based only on the series' own previous values cannot adequately capture the key factors that determine the future price level, probably because of the numerous external factors that influence price movements in Montenegro.

  9. Forecasting electricity usage using univariate time series models

    Science.gov (United States)

    Hock-Eam, Lim; Chee-Yin, Yip

    2014-12-01

    Electricity is an important energy source. A sufficient supply of electricity is vital to support a country's development and growth. Due to changing socio-economic characteristics, increasing competition and the deregulation of the electricity supply industry, electricity demand forecasting is even more important than before. It is imperative to evaluate and compare the predictive performance of various forecasting methods, as this provides further insight into the weaknesses and strengths of each method. In the literature, there is mixed evidence on the best forecasting methods for electricity demand. This paper compares the predictive performance of univariate time series models for forecasting electricity demand using monthly data on maximum electricity load in Malaysia from January 2003 to December 2013. Results reveal that the Box-Jenkins method produces the best out-of-sample predictive performance, while the Holt-Winters exponential smoothing method is a good forecasting method in terms of in-sample predictive performance.
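
    A hedged sketch of such a comparison, using statsmodels on a synthetic monthly load series: a seasonal Box-Jenkins (ARIMA) fit against Holt-Winters exponential smoothing, scored out of sample by MAPE. The model orders, seasonal settings and data are illustrative, not those estimated for the Malaysian series.

        # Box-Jenkins (seasonal ARIMA) vs Holt-Winters on a synthetic monthly load series.
        import numpy as np
        import pandas as pd
        from statsmodels.tsa.arima.model import ARIMA
        from statsmodels.tsa.holtwinters import ExponentialSmoothing

        rng = np.random.default_rng(6)
        idx = pd.date_range("2003-01-01", periods=132, freq="MS")
        trend = np.linspace(10000, 16000, idx.size)
        season = 800 * np.sin(2 * np.pi * np.arange(idx.size) / 12)
        load = pd.Series(trend + season + rng.normal(0, 200, idx.size), index=idx)

        train, test = load[:-12], load[-12:]

        arima_fc = ARIMA(train, order=(1, 1, 1),
                         seasonal_order=(1, 1, 0, 12)).fit().forecast(steps=12)
        hw_fc = ExponentialSmoothing(train, trend="add", seasonal="add",
                                     seasonal_periods=12).fit().forecast(12)

        for name, fc in [("ARIMA", arima_fc), ("Holt-Winters", hw_fc)]:
            mape = float(np.mean(np.abs((test.values - fc.values) / test.values))) * 100
            print(f"{name:>12s} out-of-sample MAPE: {mape:.2f}%")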

  10. Certified counting of roots of random univariate polynomials

    CERN Document Server

    Cleveland, Joseph; Hauenstein, Jonathan D; Haywood, Ian; Mehta, Dhagash; Morse, Anthony; Robol, Leonardo; Schlenk, Taylor

    2014-01-01

    A challenging problem in computational mathematics is to compute roots of a high-degree univariate random polynomial. We combine an efficient multiprecision implementation for solving high-degree random polynomials with two certification methods, namely Smale's α-theory and one based on Gerschgorin's theorem, for showing that a given numerical approximation is in the quadratic convergence region of Newton's method of some exact solution. With this combination, we can certifiably count the number of real roots of random polynomials. We quantify the difference between the two certification procedures and list the salient features of both of them. After benchmarking on random polynomials where the coefficients are drawn from the Gaussian distribution, we obtain novel experimental results for the Cauchy distribution case.

  11. Jelly pineapple syneresis assessment via univariate and multivariate analysis

    Directory of Open Access Journals (Sweden)

    Carlos Alberto da Silva Ledo

    2010-09-01

    The aim of this evaluation of pineapple jelly was to analyze the occurrence of syneresis by univariate and multivariate analysis. Pineapple jelly has a low pectin concentration; therefore, high-methoxyl pectin was added at concentrations of 0.50%, 0.75% and 1.00%, corresponding to slow, medium and fast gel formation, respectively. The pH, acidity, °Brix and syneresis of the jelly were measured. The highest pectin concentration showed a decrease in water release (syneresis). This result shows that 1.00% pectin in the jelly is necessary to form the gel and to obtain a suitable texture.

  12. A comparison of bivariate and univariate QTL mapping in livestock populations

    Directory of Open Access Journals (Sweden)

    Sorensen Daniel

    2003-11-01

    This study presents a multivariate, variance component-based QTL mapping model implemented via restricted maximum likelihood (REML). The method was applied to investigate bivariate and univariate QTL mapping analyses, using simulated data. Specifically, we report results on the statistical power to detect a QTL and on the precision of parameter estimates using univariate and bivariate approaches. The model and methodology were also applied to study the effectiveness of partitioning the overall genetic correlation between two traits into a component due to many genes of small effect, and one due to the QTL. It is shown that when the QTL has a pleiotropic effect on two traits, a bivariate analysis leads to a higher statistical power of detecting the QTL and to a more precise estimate of the QTL's map position, in particular in the case when the QTL has a small effect on the trait. The increase in power is most marked in cases where the contributions of the QTL and of the polygenic components to the genetic correlation have opposite signs. The bivariate REML analysis can successfully partition the two components contributing to the genetic correlation between traits.

  13. Multivariate spatial Gaussian mixture modeling for statistical clustering of hemodynamic parameters in functional MRI

    Energy Technology Data Exchange (ETDEWEB)

    Fouque, A.L.; Ciuciu, Ph.; Risser, L. [NeuroSpin/CEA, F-91191 Gif-sur-Yvette (France); Fouque, A.L.; Ciuciu, Ph.; Risser, L. [IFR 49, Institut d' Imagerie Neurofonctionnelle, Paris (France)

    2009-07-01

    In this paper, a novel statistical parcellation of intra-subject functional MRI (fMRI) data is proposed. The key idea is to identify functionally homogeneous regions of interest from their hemodynamic parameters. To this end, a non-parametric voxel-based estimation of the hemodynamic response function is performed as a prerequisite. Then, the extracted hemodynamic features are entered as the input data of a Multivariate Spatial Gaussian Mixture Model (MSGMM) to be fitted. The goal of the spatial aspect is to favor the recovery of connected components in the mixture. Our statistical clustering approach is original in the sense that it extends existing work on univariate spatially regularized Gaussian mixtures. A specific Gibbs sampler is derived to account for different covariance structures in the feature space. On realistic artificial fMRI datasets, it is shown that our algorithm is helpful for identifying a parsimonious functional parcellation required in the context of joint detection-estimation of brain activity. This allows us to overcome the classical assumption of spatial stationarity of the BOLD signal model. (authors)

  14. The relationship between multilevel models and non-parametric multilevel mixture models: Discrete approximation of intraclass correlation, random coefficient distributions, and residual heteroscedasticity.

    Science.gov (United States)

    Rights, Jason D; Sterba, Sonya K

    2016-11-01

    Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed. © 2016 The British Psychological Society.

  15. A Non-Parametric Approach for the Activation Detection of Block Design fMRI Simulated Data Using Self-Organizing Maps and Support Vector Machine.

    Science.gov (United States)

    Bahrami, Sheyda; Shamsi, Mousa

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is a popular method for probing the functional organization of the brain using hemodynamic responses. In this method, volume images of the entire brain are obtained with very good spatial resolution but low temporal resolution. However, such data always suffer from high dimensionality in the face of classification algorithms. In this work, we combine a support vector machine (SVM) with a self-organizing map (SOM) to obtain a feature-based classification: the SOM is used for feature extraction and labeling of the data sets, and a linear-kernel SVM is then used for detecting the active areas. The SOM has two major advantages: (i) it reduces the dimension of the data sets, lowering computational complexity, and (ii) it is useful for identifying brain regions with small onset differences in hemodynamic responses. Our non-parametric model is compared with parametric and non-parametric methods. We use simulated fMRI data sets with block-design inputs and consider a contrast-to-noise ratio (CNR) of 0.6; the simulated data sets have 1-4% contrast in active areas. The accuracy of our proposed method is 93.63% and the error rate is 6.37%.
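
    The SOM-for-features plus linear-SVM pipeline can be sketched as below. It relies on the third-party minisom package for the self-organizing map (an assumption; the authors' own SOM implementation is not specified) and scikit-learn for the SVM, with random voxel time courses standing in for simulated block-design fMRI data.

        # SOM feature extraction followed by a linear SVM (illustrative sketch).
        import numpy as np
        from minisom import MiniSom                      # third-party package, assumed available
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(7)
        n_voxels, n_scans = 2000, 120
        X = rng.normal(size=(n_voxels, n_scans))         # voxel time courses
        labels = np.zeros(n_voxels, dtype=int)
        labels[:200] = 1                                 # "active" voxels
        block = np.tile(np.r_[np.zeros(10), np.ones(10)], 6)
        X[labels == 1] += 0.6 * block                    # add a block-design response

        som = MiniSom(8, 8, n_scans, sigma=1.0, learning_rate=0.5, random_seed=0)
        som.train_random(X, 5000)
        features = np.array([som.winner(x) for x in X])  # winning-node coordinates as features

        X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.3,
                                                  stratify=labels, random_state=0)
        clf = SVC(kernel="linear").fit(X_tr, y_tr)
        print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")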

  16. Univariate/multivariate genome-wide association scans using data from families and unrelated samples.

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    As genome-wide association studies (GWAS) are becoming more popular, two approaches, among others, could be considered in order to improve statistical power for identifying genes contributing subtle to moderate effects to human diseases. The first approach is to increase sample size, which could be achieved by combining both unrelated and familial subjects together. The second approach is to jointly analyze multiple correlated traits. In this study, by extending generalized estimating equations (GEEs), we propose a simple approach for performing univariate or multivariate association tests for the combined data of unrelated subjects and nuclear families. In particular, we correct for population stratification by integrating principal component analysis and transmission disequilibrium test strategies. The proposed method allows for multiple siblings as well as missing parental information. Simulation studies show that the proposed test has improved power compared to two popular methods, EIGENSTRAT and FBAT, by analyzing the combined data, while correcting for population stratification. In addition, joint analysis of bivariate traits has improved power over univariate analysis when pleiotropic effects are present. Application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility and applicability of the proposed method.

  17. Evaluation dam overtopping risk based on univariate and bivariate flood frequency analysis

    Science.gov (United States)

    Goodarzi, E.; Mirzaei, M.; Shui, L. T.; Ziaei, M.

    2011-11-01

    There is a growing tendency to assess the safety levels of existing dams based on risk and uncertainty analysis using mathematical and statistical methods. This research presents the application of risk and uncertainty analysis to dam overtopping based on univariate and bivariate flood frequency analyses by applying Gumbel logistic distribution for the Doroudzan earth-fill dam in south of Iran. The bivariate frequency analysis resulted in six inflow hydrographs with a joint return period of 100-yr. The overtopping risks were computed for all of those hydrographs considering quantile of flood peak discharge (in particular 100-yr), initial depth of water in the reservoir, and discharge coefficient of spillway as uncertain variables. The maximum height of the water, as most important factor in the overtopping analysis, was evaluated using reservoir routing and the Monte Carlo and Latin hypercube techniques were applied for uncertainty analysis. Finally, the achieved results using both univariate and bivariate frequency analysis have been compared to show the significance of bivariate analyses on dam overtopping.
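
    A much-simplified, univariate illustration of the workflow (not the paper's bivariate Gumbel logistic analysis or full reservoir routing): fit a Gumbel distribution to annual peak inflows, take the 100-yr quantile, and propagate an uncertain initial water level and spillway coefficient by Monte Carlo to estimate an overtopping probability. All numbers, and the crude level-rise relation used in place of routing, are assumptions.

        # Gumbel flood frequency fit plus Monte Carlo overtopping estimate (illustrative).
        import numpy as np
        from scipy.stats import gumbel_r

        rng = np.random.default_rng(8)
        peaks = rng.gumbel(loc=900.0, scale=250.0, size=40)     # hypothetical annual peaks (m3/s)
        loc, scale = gumbel_r.fit(peaks)
        q100 = gumbel_r.ppf(1 - 1 / 100, loc=loc, scale=scale)  # 100-yr peak inflow quantile
        print(f"estimated 100-yr peak inflow: {q100:.0f} m3/s")

        crest = 12.0                                            # crest level above datum (m), assumed
        n = 100_000
        h0 = rng.normal(8.0, 0.5, size=n)                       # uncertain initial water level (m)
        cd = rng.normal(2.1, 0.1, size=n)                       # uncertain spillway coefficient

        # crude stand-in for reservoir routing: level rise grows with the inflow
        # quantile and shrinks with spillway capacity
        rise = q100 / (cd * 150.0)
        p_overtop = np.mean(h0 + rise > crest)
        print(f"Monte Carlo overtopping probability: {p_overtop:.4f}")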

  18. Application of Non-parametric Statistics in Market Research%非参数统计分析方法在市场调查中的应用

    Institute of Scientific and Technical Information of China (English)

    曹小敬

    2007-01-01

    Market research takes the market as its object and consists of collecting, recording, organizing, and analyzing data and materials related to a firm's business activities. For an enterprise, market research is like a doctor diagnosing a patient: without it, there is no way to understand market conditions, and no way to formulate a business strategy.

  19. Forecasting electric vehicles sales with univariate and multivariate time series models: The case of China.

    Science.gov (United States)

    Zhang, Yong; Zhong, Miner; Geng, Nana; Jiang, Yunjian

    2017-01-01

    The market demand for electric vehicles (EVs) has increased in recent years. Suitable models are necessary to understand and forecast EV sales. This study presents a singular spectrum analysis (SSA) as a univariate time-series model and vector autoregressive model (VAR) as a multivariate model. Empirical results suggest that SSA satisfactorily indicates the evolving trend and provides reasonable results. The VAR model, which comprised exogenous parameters related to the market on a monthly basis, can significantly improve the prediction accuracy. The EV sales in China, which are categorized into battery and plug-in EVs, are predicted in both short term (up to December 2017) and long term (up to 2020), as statistical proofs of the growth of the Chinese EV industry.

  20. Effect sizes for research univariate and multivariate applications

    CERN Document Server

    Grissom, Robert J

    2005-01-01

    The goal of this book is to inform a broad readership about a variety of measures and estimators of effect sizes for research, their proper applications and interpretations, and their limitations. Its focus is on analyzing post-research results. The book provides an evenhanded account of controversial issues in the field, such as the role of significance testing. Consistent with the trend toward greater use of robust statistical methods, the book pays much attention to the statistical assumptions of the methods and to robust measures of effect size. Effect Sizes for Research

  1. Water quality analysis in rivers with non-parametric probability distributions and fuzzy inference systems: application to the Cauca River, Colombia.

    Science.gov (United States)

    Ocampo-Duque, William; Osorio, Carolina; Piamba, Christian; Schuhmacher, Marta; Domingo, José L

    2013-02-01

    The integration of water quality monitoring variables is essential in environmental decision making. Nowadays, advanced techniques to manage subjectivity, imprecision, uncertainty, vagueness, and variability are required in such a complex evaluation process. We here propose a probabilistic fuzzy hybrid model to assess river water quality. Fuzzy logic reasoning has been used to compute an integrative water quality index. By applying a Monte Carlo technique, based on non-parametric probability distributions, the randomness of the model inputs was estimated. Annual histograms of nine water quality variables were built with monitoring data systematically collected in the Colombian Cauca River, and probability density estimates obtained with the kernel smoothing method were used to fit the data. Several years were assessed, and river sectors upstream and downstream of the city of Santiago de Cali, a large city with basic wastewater treatment and high industrial activity, were analyzed. The probabilistic fuzzy water quality index was able to explain the reduction in water quality as the river receives a larger number of agricultural, domestic, and industrial effluents. The results of the hybrid model were compared to traditional water quality indexes. The main advantage of the proposed method is that it considers flexible boundaries between the linguistic qualifiers used to define the water status, with the membership of water quality in the various output fuzzy sets or classes reported through percentiles and histograms, which allows a better classification of the real water condition. The results of this study show that fuzzy inference systems integrated with stochastic non-parametric techniques may be used as complementary tools in water quality indexing methodologies.
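
A small illustration of the non-parametric kernel-smoothing step described above: fit a kernel density to monitoring data for one water quality variable and draw Monte Carlo samples from it. The dissolved-oxygen values are synthetic stand-ins, and the fuzzy inference step that combines variables into an index is not shown.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Synthetic stand-in for one year of dissolved-oxygen monitoring data (mg/L)
observed_do = np.clip(rng.normal(6.5, 1.2, size=120), 0, None)

kde = gaussian_kde(observed_do)               # non-parametric density estimate
mc_samples = kde.resample(10_000, seed=2)[0]  # Monte Carlo inputs for the downstream index

print("observed mean:", round(float(observed_do.mean()), 2))
print("resampled 5th/95th percentiles:", np.percentile(mc_samples, [5, 95]).round(2))
```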

  2. The issue of multiple univariate comparisons in the context of neuroelectric brain mapping: an application in a neuromarketing experiment.

    Science.gov (United States)

    Vecchiato, G; De Vico Fallani, F; Astolfi, L; Toppi, J; Cincotti, F; Mattia, D; Salinari, S; Babiloni, F

    2010-08-30

    This paper presents some considerations about the use of adequate statistical techniques in the framework of neuroelectromagnetic brain mapping. With the use of advanced EEG/MEG recording setups involving hundreds of sensors, the issue of protection against the type I errors that can occur during the execution of hundreds of univariate statistical tests has gained interest. In the present experiment, we investigated the EEG signals from a mannequin acting as an experimental subject. Data were collected while performing a neuromarketing experiment and analyzed with state-of-the-art computational tools adopted in the specialized literature. Results showed that the electric data from the mannequin's head present statistically significant differences in power spectra during the visualization of a commercial advertisement when compared to the power spectra gathered during a documentary, when no adjustments were made to the alpha level of the multiple univariate tests performed. The use of the Bonferroni or Bonferroni-Holm adjustments correctly returned no differences between the signals gathered from the mannequin in the two experimental conditions. A partial sample of recently published literature in different neuroscience journals suggested that at least 30% of the papers do not use statistical protection against type I errors. While the occurrence of type I errors can easily be managed with appropriate statistical techniques, the use of such techniques is still not widely adopted in the literature. Copyright (c) 2010 Elsevier B.V. All rights reserved.
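
The alpha-level adjustments discussed above take one line with standard tooling; a sketch using statsmodels' multipletests on a vector of hypothetical per-sensor p-values (the values below are invented, not the mannequin data).

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from many univariate, sensor-wise spectral tests
pvals = np.array([0.001, 0.004, 0.012, 0.03, 0.2, 0.5, 0.8] + [0.04] * 50)

for method in ("bonferroni", "holm"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method}: {int(reject.sum())} of {len(pvals)} tests significant after adjustment")
```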

  3. Identifiability and inference of non-parametric rates-across-sites models on large-scale phylogenies

    CERN Document Server

    Mossel, Elchanan

    2011-01-01

    Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation. We also derive sequence-length requirements for high-probability reconstruction. Our main contribution is a novel algorithm that clusters sites according to their mutation rate. Following this site clustering step, standard reconstruction techniques can be used to recover the phylogeny. Our results rely on a basic insight: that, for large trees, certain site statistics experience concentration-of-measure phenomena.

  4. 非参数化方法在 DNB 传递分析中的应用%Non-parametric Method Used in DNB Propagation Analysis

    Institute of Scientific and Technical Information of China (English)

    刘俊强; 黄禹

    2014-01-01

    Determining the probability distribution of fuel rod internal pressure is a fundamental task in DNB propagation analysis using the Monte Carlo method. The traditional parametric method assumes that the internal pressure of all rods can be characterized by a normal distribution, but this is not always the case; sometimes the real distribution differs considerably from the normal one. To overcome this limitation, a new method, the non-parametric method, was used to treat the rod internal pressure data, because it is applicable regardless of the underlying distribution and offers good precision for large samples. The internal pressure data of fuel rods from a pressurized water reactor plant were processed non-parametrically to obtain the probability distribution of rod internal pressure, which was then used in the DNB propagation analysis. The results show that using the non-parametric method in DNB propagation analysis is more conservative, and therefore safer, than using the parametric method.

  5. Determination of drug absorption rate in time-variant disposition by direct deconvolution using beta clearance correction and end-constrained non-parametric regression.

    Science.gov (United States)

    Neelakantan, S; Veng-Pedersen, P

    2005-11-01

    A novel numerical deconvolution method is presented that enables the estimation of drug absorption rates under time-variant disposition conditions. The method involves two components. (1) A disposition decomposition-recomposition (DDR) enabling exact changes in the unit impulse response (UIR) to be constructed based on centrally based clearance changes iteratively determined. (2) A non-parametric, end-constrained cubic spline (ECS) input response function estimated by cross-validation. The proposed DDR-ECS method compensates for disposition changes between the test and the reference administrations by using a "beta" clearance correction based on DDR analysis. The representation of the input response by the ECS method takes into consideration the complex absorption process and also ensures physiologically realistic approximations of the response. The stability of the new method to noisy data was evaluated by comprehensive simulations that considered different UIRs, various input functions, clearance changes and a novel scaling of the input function that includes the "flip-flop" absorption phenomena. The simulated input response was also analysed by two other methods and all three methods were compared for their relative performances. The DDR-ECS method provides better estimation of the input profile under significant clearance changes but tends to overestimate the input when there were only small changes in the clearance.

  6. A novel non-parametric method for uncertainty evaluation of correlation-based molecular signatures: its application on PAM50 algorithm.

    Science.gov (United States)

    Fresno, Cristóbal; González, Germán Alexis; Merino, Gabriela Alejandra; Flesia, Ana Georgina; Podhajcer, Osvaldo Luis; Llera, Andrea Sabina; Fernández, Elmer Andrés

    2017-03-01

    The PAM50 classifier is used to assign patients to the most highly correlated breast cancer subtype, irrespective of the magnitude of that correlation. Nonetheless, all subtype correlations are required to build the risk of recurrence (ROR) score, currently used in therapeutic decisions. Existing subtype uncertainty estimates are inaccurate, seldom considered, or require a population-based approach in this context. Here we present a novel single-subject non-parametric uncertainty estimation based on permutations of PAM50's gene labels. Simulation results (n = 5228) showed that only 61% of subjects can be reliably 'Assigned' to a PAM50 subtype, whereas 33% should be 'Not Assigned' (NA), leaving the rest with tight, 'Ambiguous' correlations between subtypes. Excluding the NA subjects from the analysis improved the discrimination of the subtype survival curves, yielding a higher proportion of low and high ROR values. Conversely, all NA subjects showed similar survival behaviour regardless of the original PAM50 assignment. We propose to incorporate our PAM50 uncertainty estimation to support therapeutic decisions. Source code can be found in the 'pbcmc' R package at Bioconductor. cristobalfresno@gmail.com or efernandez@bdmg.com.ar. Supplementary data are available at Bioinformatics online.
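
A toy sketch of the gene-label permutation idea: the correlation between a sample's expression profile and a subtype centroid is compared against the distribution obtained after permuting the centroid's gene labels. The 50-gene centroid and single sample below are invented and much smaller than PAM50; this is not the pbcmc implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes = 50

centroid = rng.normal(size=n_genes)                       # hypothetical subtype centroid
sample = centroid + rng.normal(scale=1.0, size=n_genes)   # one patient's expression profile

obs_r = np.corrcoef(sample, centroid)[0, 1]

# Null distribution: correlations after permuting the gene labels of the centroid
perm_r = np.array([np.corrcoef(sample, rng.permutation(centroid))[0, 1]
                   for _ in range(5000)])
p_value = np.mean(np.abs(perm_r) >= abs(obs_r))

print(f"observed r = {obs_r:.2f}, permutation p = {p_value:.4f}")
```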

  7. Non-parametric study of the evolution of the cosmological equation of state with SNeIa, BAO and high redshift GRBs

    CERN Document Server

    Postnikov, Sergey; Hernandez, Xavier; Capozziello, Salvatore

    2014-01-01

    We study the dark energy equation of state as a function of redshift in a non-parametric way, without imposing any a priori functional form for $w(z)$ (the ratio of pressure over energy density). As a check of the method, we test our scheme through the use of synthetic data sets produced from different input cosmological models which have the same relative errors and redshift distribution as the real data. Using the luminosity-time $L_{X}-T_{a}$ correlation for GRB X-ray afterglows (the Dainotti et al. correlation), we are able to utilize the GRB sample from the Swift satellite as probes of the expansion history of the Universe out to $z \approx 10$. Within the assumption of a flat FLRW universe and combining SNeIa data with BAO constraints, the resulting maximum likelihood solutions are close to a constant $w=-1$. If one imposes the restriction of a constant $w$, we obtain $w=-0.99 \pm 0.06$ (consistent with a cosmological constant) with the present-day Hubble constant $H_{0}=70.0 \pm 0.6$ km s$^{-1}$ Mpc$^{-1}$ ...

  8. A Critical Look at the Mass-Metallicity-SFR Relation in the Local Universe: Non-parametric Analysis Framework and Confounding Systematics

    CERN Document Server

    Salim, Samir; Ly, Chun; Brinchmann, Jarle; Davé, Romeel; Dickinson, Mark; Salzer, John J; Charlot, Stéphane

    2014-01-01

    It has been proposed that the mass-metallicity relation of galaxies exhibits a secondary dependence on star formation rate (SFR), and that the resulting M-Z-SFR relation may be redshift-invariant, i.e., "fundamental." However, conflicting results on the character of the SFR dependence, and whether it exists, have been reported. To gain insight into the origins of the conflicting results, we (a) devise a non-parametric, astrophysically-motivated analysis framework based on the offset from the star-forming ("main") sequence at a given stellar mass (relative specific SFR), (b) apply this methodology and perform a comprehensive re-analysis of the local M-Z-SFR relation, based on SDSS, GALEX, and WISE data, and (c) study the impact of sample selection, and of using different metallicity and SFR indicators. We show that metallicity is anti-correlated with specific SFR regardless of the indicators used. We do not find that the relation is spurious due to correlations arising from biased metallicity measurements, or ...

  9. Statistics: Notes and Examples. Study Guide for the Doctor of Arts in Computer-Based Learning.

    Science.gov (United States)

    MacFarland, Thomas W.

    This study guide presents lessons on hand calculating various statistics: Central Tendency and Dispersion; Tips on Data Presentation; Two-Tailed and One-Tailed Tests of Significance; Error Types; Standard Scores; Non-Parametric Tests such as Chi-square, Spearman Rho, Sign Test, Wilcoxon Matched Pairs, Mann-Whitney U, Kruskal-Wallis, and Rank Sums;…

  10. Estimation from PET data of transient changes in dopamine concentration induced by alcohol: support for a non-parametric signal estimation method

    Energy Technology Data Exchange (ETDEWEB)

    Constantinescu, C C; Yoder, K K; Normandin, M D; Morris, E D [Department of Radiology, Indiana University School of Medicine, Indianapolis, IN (United States); Kareken, D A [Department of Neurology, Indiana University School of Medicine, Indianapolis, IN (United States); Bouman, C A [Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN (United States); O' Connor, S J [Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN (United States)], E-mail: emorris@iupui.edu

    2008-03-07

    We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest and activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves (F^DA(t)) and the change in binding potential (ΔBP). The veracity of the F^DA(t) curves recovered by non-parametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) ΔBP should decline with increasing DA peak time, (2) ΔBP should increase as the strength of the temporal correlation between F^DA(t) and the free raclopride (F^RAC(t)) curve increases, (3) ΔBP should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [11C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, non-parametric ntPET was applied to recover F^DA(t), and binding potential values were determined. Kendall rank-correlation analysis confirmed that the F^DA(t) data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of F^DA(t). Application of non-parametric ntPET may yield important insights into how alterations in the timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.

  11. Temporal Expression of Peripheral Blood Leukocyte Biomarkers in a Macaca fascicularis Infection Model of Tuberculosis; Comparison with Human Datasets and Analysis with Parametric/Non-parametric Tools for Improved Diagnostic Biomarker Identification.

    Directory of Open Access Journals (Sweden)

    Sajid Javed

    Full Text Available A temporal study of gene expression in peripheral blood leukocytes (PBLs) from a Mycobacterium tuberculosis primary, pulmonary challenge model in Macaca fascicularis has been conducted. PBL samples were taken prior to challenge and at one, two, four and six weeks post-challenge, and labelled, purified RNAs were hybridised to Operon Human Genome AROS V4.0 slides. Data analyses revealed a large number of differentially regulated gene entities, which exhibited temporal profiles of expression across the time course study. Further data refinements identified groups of key markers showing group-specific expression patterns, with a substantial reprogramming event evident at the four to six week interval. Selected statistically significant gene entities from this study and other immune and apoptotic markers were validated using qPCR, which confirmed many of the results obtained using microarray hybridisation. These showed evidence of a step-change in gene expression from an 'early' FOS-associated response to a 'late' predominantly type I interferon-driven response, with a coincident reduction in the expression of other markers. Loss of T-cell-associated marker expression was observed in responsive animals, with concordant elevation of markers which may be associated with a myeloid suppressor cell phenotype, e.g. CD163. The animals in the study were of different lineages, and these Chinese and Mauritian cynomolgus macaque lines showed clear evidence of differing susceptibilities to tuberculosis challenge. We determined a number of key differences in response profiles between the groups, particularly in the expression of T-cell and apoptotic markers, amongst others. These have provided interesting insights into innate susceptibility related to different host phenotypes. Using a combination of parametric and non-parametric artificial neural network analyses we have identified key genes and regulatory pathways which may be important in early and adaptive responses to TB. Using

  12. Detection of patient subgroups with differential expression in omics data: a comprehensive comparison of univariate measures.

    Directory of Open Access Journals (Sweden)

    Maike Ahrens

    Full Text Available Detection of yet unknown subgroups showing differential gene or protein expression is a frequent goal in the analysis of modern molecular data. Applications range from cancer biology over developmental biology to toxicology. Often a control and an experimental group are compared, and subgroups can be characterized by differential expression for only a subgroup-specific set of genes or proteins. Finding such genes and corresponding patient subgroups can help in understanding pathological pathways, diagnosis and defining drug targets. The size of the subgroup and the type of differential expression determine the optimal strategy for subgroup identification. To date, commonly used software packages hardly provide statistical tests and methods for the detection of such subgroups. Different univariate methods for subgroup detection are characterized and compared, both on simulated and on real data. We present an advanced design for simulation studies: Data is simulated under different distributional assumptions for the expression of the subgroup, and performance results are compared against theoretical upper bounds. For each distribution, different degrees of deviation from the majority of observations are considered for the subgroup. We evaluate classical approaches as well as various new suggestions in the context of omics data, including outlier sum, PADGE, and kurtosis. We also propose the new FisherSum score. ROC curve analysis and AUC values are used to quantify the ability of the methods to distinguish between genes or proteins with and without certain subgroup patterns. In general, FisherSum for small subgroups and t-test for large subgroups achieve best results. We apply each method to a case-control study on Parkinson's disease and underline the biological benefit of the new method.

  13. Functional summary statistics for the Johnson-Mehl model

    DEFF Research Database (Denmark)

    Møller, Jesper; Ghorbani, Mohammad

    of functional summary statistics. This paper therefore invents four functional summary statistics adapted to the Johnson-Mehl model, with two of them based on the second-order properties and the other two on the nuclei-boundary distances for the associated Johnson-Mehl tessellation. The functional summary statistics' theoretical properties are investigated, non-parametric estimators are suggested, and their usefulness for model checking is examined in a simulation study. The functional summary statistics are also used for checking fitted parametric Johnson-Mehl models for a neurotransmitters dataset.

  14. Statistical Theory for the "RCT-YES" Software: Design-Based Causal Inference for RCTs. NCEE 2015-4011

    Science.gov (United States)

    Schochet, Peter Z.

    2015-01-01

    This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…

  15. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers

    Directory of Open Access Journals (Sweden)

    Stochl Jan

    2012-06-01

    Full Text Available Abstract Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than, for example, the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12) - when binary scored - were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis

  16. The signaling petri net-based simulator: a non-parametric strategy for characterizing the dynamics of cell-specific signaling networks.

    Directory of Open Access Journals (Sweden)

    Derek Ruths

    2008-02-01

    Full Text Available Reconstructing cellular signaling networks and understanding how they work are major endeavors in cell biology. The scale and complexity of these networks, however, render their analysis using experimental biology approaches alone very challenging. As a result, computational methods have been developed and combined with experimental biology approaches, producing powerful tools for the analysis of these networks. These computational methods mostly fall on either end of a spectrum of model parameterization. On one end is a class of structural network analysis methods; these typically use the network connectivity alone to generate hypotheses about global properties. On the other end is a class of dynamic network analysis methods; these use, in addition to the connectivity, kinetic parameters of the biochemical reactions to predict the network's dynamic behavior. These predictions provide detailed insights into the properties that determine aspects of the network's structure and behavior. However, the difficulty of obtaining numerical values of kinetic parameters is widely recognized to limit the applicability of this latter class of methods. Several researchers have observed that the connectivity of a network alone can provide significant insights into its dynamics. Motivated by this fundamental observation, we present the signaling Petri net, a non-parametric model of cellular signaling networks, and the signaling Petri net-based simulator, a Petri net execution strategy for characterizing the dynamics of signal flow through a signaling network using token distribution and sampling. The result is a very fast method, which can analyze large-scale networks, and provide insights into the trends of molecules' activity-levels in response to an external stimulus, based solely on the network's connectivity. We have implemented the signaling Petri net-based simulator in the PathwayOracle toolkit, which is publicly available at http

  17. Non-parametric linear regression of discrete Fourier transform convoluted chromatographic peak responses under non-ideal conditions of internal standard method.

    Science.gov (United States)

    Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A

    2010-11-15

    This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of the chromatographic response data of the analyte and internal standard was followed by convolution of the resulting derivative curves using 8-point sin(x_i) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by those of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interferences. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely overlapping chromatographic peaks and very low analyte concentrations. For example, in the case of overlapping peaks the correlation coefficient of sodium benzoate improved from 0.9975, using the conventional peak area method, to 0.9998, using the first derivative under Fourier functions method. A significant improvement in the precision and accuracy of the determination of synthetic mixtures and dosage forms in non-ideal cases was also achieved. For example, in the case of overlapping peaks the mean recovery% and RSD% of guaiphenesin improved from 91.57 and 9.83, with the conventional peak area method, to 100.04 and 0.78 with the first derivative under Fourier functions method. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method used for regression. Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and
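
For the regression comparison mentioned at the end of the abstract, SciPy ships a Theil-Sen estimator; a brief sketch contrasting it with ordinary least squares on calibration-style data containing one outlier (the numbers are invented, not the paper's chromatographic ratios).

```python
import numpy as np
from scipy import stats

conc = np.arange(1.0, 9.0)             # analyte concentrations
ratio = 0.50 * conc + 0.05             # ideal response ratios
ratio[5] += 1.2                        # one outlying point

slope_ts, intercept_ts, _, _ = stats.theilslopes(ratio, conc)      # non-parametric (Theil)
slope_ls, intercept_ls, r, _, _ = stats.linregress(conc, ratio)    # parametric least squares

print(f"Theil:         slope={slope_ts:.3f}, intercept={intercept_ts:.3f}")
print(f"Least squares: slope={slope_ls:.3f}, intercept={intercept_ls:.3f}, r={r:.4f}")
```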

  18. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers.

    Science.gov (United States)

    Stochl, Jan; Jones, Peter B; Croudace, Tim J

    2012-06-11

    Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental

  19. A univariate analysis of variance design for multiple-choice feeding-preference experiments: A hypothetical example with fruit-eating birds

    Science.gov (United States)

    Larrinaga, Asier R.

    2010-01-01

    I consider statistical problems in the analysis of multiple-choice food-preference experiments, and propose a univariate analysis of variance design for experiments of this type. I present an example experimental design, for a hypothetical comparison of fruit colour preferences between two frugivorous bird species. In each fictitious trial, four trays each containing a known weight of artificial fruits (red, blue, black, or green) are introduced into the cage, while four equivalent trays are left outside the cage, to control for tray weight loss due to other factors (notably desiccation). The proposed univariate approach allows data from such designs to be analysed with adequate power and no major violations of statistical assumptions. Nevertheless, there is no single "best" approach for experiments of this type: the best analysis in each case will depend on the particular aims and nature of the experiments.

  20. Comparison of three Statistical Classification Techniques for Maser Identification

    CERN Document Server

    Manning, Ellen M; Ellingsen, Simon P; Breen, Shari L; Chen, Xi; Humphries, Melissa

    2016-01-01

    We applied three statistical classification techniques - linear discriminant analysis (LDA), logistic regression and random forests - to three astronomical datasets associated with searches for interstellar masers. We compared the performance of these methods in identifying whether specific mid-infrared or millimetre continuum sources are likely to have associated interstellar masers. We also discuss the ease, or otherwise, with which the results of each classification technique can be interpreted. Non-parametric methods have the potential to make accurate predictions when there are complex relationships between critical parameters. We found that for the small datasets the parametric methods, logistic regression and LDA, performed best; for the largest dataset the non-parametric method of random forests performed with accuracy comparable to the parametric techniques, rather than offering any significant improvement. This suggests that, at least for the specific examples investigated here, accuracy of the predictions obtained ...
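
A compact sketch of the three-way comparison on a synthetic two-class dataset, using scikit-learn stand-ins rather than the maser catalogues analysed in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, n_informative=4, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold CV accuracy = {acc:.3f}")
```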

  1. Univariate and multivariate general linear models theory and applications with SAS

    CERN Document Server

    Kim, Kevin

    2006-01-01

    Reviewing the theory of the general linear model (GLM) using a general framework, Univariate and Multivariate General Linear Models: Theory and Applications with SAS, Second Edition presents analyses of simple and complex models, both univariate and multivariate, that employ data sets from a variety of disciplines, such as the social and behavioral sciences.With revised examples that include options available using SAS 9.0, this expanded edition divides theory from applications within each chapter. Following an overview of the GLM, the book introduces unrestricted GLMs to analyze multiple regr

  2. Cellulose I crystallinity determination using FT-Raman spectroscopy : univariate and multivariate methods

    Science.gov (United States)

    Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph

    2010-01-01

    Two new methods based on FT–Raman spectroscopy, one simple, based on band intensity ratio, and the other using a partial least squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in cellulose I samples was determined based on univariate regression that was first developed using the Raman band...

  3. Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.

    Science.gov (United States)

    Vidal, Sherry

    Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the mean differences of independent variables as they are related to a dependent variable. From a computer printout…

  4. Combinatorial bounds on the α-divergence of univariate mixture models

    KAUST Repository

    Nielsen, Frank

    2017-06-20

    We derive lower- and upper-bounds of α-divergence between univariate mixture models with components in the exponential family. Three pairs of bounds are presented in order with increasing quality and increasing computational cost. They are verified empirically through simulated Gaussian mixture models. The presented methodology generalizes to other divergence families relying on Hellinger-type integrals.

  5. A non-parametric method for automatic determination of P-wave and S-wave arrival times: application to local micro earthquakes

    Science.gov (United States)

    Rawles, Christopher; Thurber, Clifford

    2015-08-01

    We present a simple, fast, and robust method for automatic detection of P- and S-wave arrivals using a nearest neighbours-based approach. The nearest neighbour algorithm is one of the most popular time-series classification methods in the data mining community and has been applied to time-series problems in many different domains. Specifically, our method is based on the non-parametric time-series classification method developed by Nikolov. Instead of building a model by estimating parameters from the data, the method uses the data itself to define the model. Potential phase arrivals are identified based on their similarity to a set of reference data consisting of positive and negative sets, where the positive set contains examples of analyst identified P- or S-wave onsets and the negative set contains examples that do not contain P waves or S waves. Similarity is defined as the square of the Euclidean distance between vectors representing the scaled absolute values of the amplitudes of the observed signal and a given reference example in time windows of the same length. For both P waves and S waves, a single pass is done through the bandpassed data, producing a score function defined as the ratio of the sum of similarity to positive examples over the sum of similarity to negative examples for each window. A phase arrival is chosen as the centre position of the window that maximizes the score function. The method is tested on two local earthquake data sets, consisting of 98 known events from the Parkfield region in central California and 32 known events from the Alpine Fault region on the South Island of New Zealand. For P-wave picks, using a reference set containing two picks from the Parkfield data set, 98 per cent of Parkfield and 94 per cent of Alpine Fault picks are determined within 0.1 s of the analyst pick. For S-wave picks, 94 per cent and 91 per cent of picks are determined within 0.2 s of the analyst picks for the Parkfield and Alpine Fault data set
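
A stripped-down sketch of the windowed scoring idea described above: each candidate window of scaled absolute amplitudes is compared against small positive and negative reference sets, and the pick is placed at the window maximising the positive-to-negative similarity ratio. The synthetic trace, the tiny reference sets, and the Gaussian-kernel similarity are placeholders and simplifications, not the authors' exact definition or the Parkfield/Alpine Fault data.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic trace: background noise with a jump in amplitude acting as a crude "arrival" at sample 500
trace = rng.normal(scale=0.2, size=1000)
trace[500:] += rng.normal(scale=1.0, size=500)

win = 40
def window(sig, i):
    w = np.abs(sig[i:i + win])
    return w / (np.linalg.norm(w) + 1e-12)           # scaled absolute amplitudes

# Tiny reference sets (in practice these come from analyst-picked examples)
positives = [window(trace, 480), window(trace, 490)]  # windows containing an onset
negatives = [window(trace, 100), window(trace, 200)]  # windows without an onset

def similarity(a, b):
    return np.exp(-np.sum((a - b) ** 2))              # simplification: kernel of the squared distance

scores = []
for i in range(len(trace) - win):
    w = window(trace, i)
    pos = sum(similarity(w, r) for r in positives)
    neg = sum(similarity(w, r) for r in negatives)
    scores.append(pos / (neg + 1e-12))

pick = int(np.argmax(scores)) + win // 2              # centre of the best-scoring window
print("picked onset near sample:", pick)
```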

  6. Quantitative Phylogenomics of Within-Species Mitogenome Variation: Monte Carlo and Non-Parametric Analysis of Phylogeographic Structure among Discrete Transatlantic Breeding Areas of Harp Seals (Pagophilus groenlandicus).

    Directory of Open Access Journals (Sweden)

    Steven M Carr

    -stepping-stone biogeographic models, but not a simple 1-step trans-Atlantic model. Plots of the cumulative pairwise sequence difference curves among seals in each of the four populations provide continuous proxies for the phylogenetic diversification within each. Non-parametric Kolmogorov-Smirnov (K-S) tests of the maximum pairwise differences between these curves indicate that the Greenland Sea population has a markedly younger phylogenetic structure than either the White Sea population or the two Northwest Atlantic populations, which are of intermediate age and homogeneous structure. The Monte Carlo and K-S assessments provide sensitive quantitative tests of within-species mitogenomic phylogeography. This is the first study to indicate that the White Sea and Greenland Sea populations have different population genetic histories. The analysis supports the hypothesis that the Harp Seal comprises three genetically distinguishable breeding populations, in the White Sea, Greenland Sea, and Northwest Atlantic. Implications for an ice-dependent species during ongoing climate change are discussed.
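
In its simplest form, the Kolmogorov-Smirnov comparison described above reduces to a two-sample K-S test on the pairwise-difference values of two populations; a sketch with synthetic counts standing in for the mitogenomic data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Synthetic pairwise sequence-difference counts for two hypothetical breeding populations
greenland_sea = rng.poisson(lam=12, size=300)   # "younger" structure: fewer pairwise differences
white_sea = rng.poisson(lam=20, size=300)       # "older" structure: more pairwise differences

stat, p = stats.ks_2samp(greenland_sea, white_sea)
print(f"K-S statistic = {stat:.3f}, p = {p:.2e}")
```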

  7. Anthropometry of Women of the U.S. Army -- 1977. Report Number 2. The Basic Univariate Statistics

    Science.gov (United States)

    1977-06-01

    Administrative support, without which the survey would have been impossible, was provided at Fort Sam Houston by Colonel Maurice H. Henaley, Colonel George... [The remainder of this record is an OCR rendering of the Women's Army Corps Anthropometric Survey blank (1976/1977) data-collection form; no further abstract text is recoverable.]

  8. Statistical Analysis of Data for Timber Strengths

    DEFF Research Database (Denmark)

    Sørensen, John Dalsgaard

    2003-01-01

    Statistical analyses are performed for material strength parameters from a large number of specimens of structural timber. Non-parametric statistical analysis and fits have been investigated for the following distribution types: Normal, Lognormal, 2-parameter Weibull and 3-parameter Weibull. The statistical fits have generally been made using all data and the lower tail of the data. The Maximum Likelihood Method and the Least Square Technique have been used to estimate the statistical parameters in the selected distributions. The results show that the 2-parameter Weibull distribution gives the best fits to the data available, especially if tail fits are used, whereas the Lognormal distribution generally gives a poor fit and larger coefficients of variation, especially if tail fits are used. The implications on the reliability level of typical structural elements and on partial safety factors
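
A brief sketch of the distribution-fitting comparison described above on synthetic strength data. The "lower tail" fit is approximated crudely here by refitting to the lowest 30% of the sample; the study's actual tail-fitting procedure is not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
strengths = stats.weibull_min.rvs(c=4.0, scale=40.0, size=2000, random_state=rng)  # synthetic strengths (MPa)

for name, dist in [("2-parameter Weibull", stats.weibull_min), ("Lognormal", stats.lognorm)]:
    params_all = dist.fit(strengths, floc=0)                      # fit using all data
    tail = np.sort(strengths)[: int(0.3 * len(strengths))]
    params_tail = dist.fit(tail, floc=0)                          # crude lower-tail refit
    ll_all = np.sum(dist.logpdf(strengths, *params_all))
    print(f"{name}: log-likelihood (all data) = {ll_all:.1f}, "
          f"5th percentile (all-data fit / tail fit) = "
          f"{dist.ppf(0.05, *params_all):.1f} / {dist.ppf(0.05, *params_tail):.1f}")
```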

  9. Statistical Analysis of Data for Timber Strengths

    DEFF Research Database (Denmark)

    Sørensen, John Dalsgaard; Hoffmeyer, P.

    Statistical analyses are performed for material strength parameters from approximately 6700 specimens of structural timber. Non-parametric statistical analyses and fits to the following distribution types have been investigated: Normal, Lognormal, 2-parameter Weibull and 3-parameter Weibull. The statistical fits have generally been made using all data (100%) and the lower tail (30%) of the data. The Maximum Likelihood Method and the Least Square Technique have been used to estimate the statistical parameters in the selected distributions. 8 different databases are analysed. The results show that 2-parameter Weibull (and Normal) distributions give the best fits to the data available, especially if tail fits are used, whereas the Lognormal distribution generally gives a poor fit and larger coefficients of variation, especially if tail fits are used.

  10. Statistical Analysis Of Reconnaissance Geochemical Data From ...

    African Journals Online (AJOL)

    Statistical Analysis Of Reconnaissance Geochemical Data From Orle District, ... The univariate methods used include frequency distribution and cumulative ... The possible mineral potential of the area include base metals (Pb, Zn, Cu, Mo, etc.) ...

  11. A Simple Universal Generator for Continuous and Discrete Univariate T-concave Distributions

    OpenAIRE

    Leydold, Josef

    2000-01-01

    We use inequalities to design short universal algorithms that can be used to generate random variates from large classes of univariate continuous or discrete distributions (including all log-concave distributions). The expected time is uniformly bounded over all these distributions. The algorithms can be implemented in a few lines of high-level language code. In contrast to other black-box algorithms, hardly any setup step is required, and the method is therefore superior in the changing-parameter case. (...

  12. An improved estimate for the condition number anomaly of univariate Gaussian correlation matrices

    DEFF Research Database (Denmark)

    Zimmermann, Ralf

    2015-01-01

    In this short note, it is proved that the derivatives of the parametrized univariate Gaussian correlation matrix R_g(θ) = (exp(−θ(x_i − x_j)^2))_{i,j} ∈ R^{n×n} are rank-deficient in the limit θ = 0 up to any order m < (n − 1)/2. This result generalizes the rank deficiency theorem for Euclidean d
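
The condition-number behaviour can be observed numerically in a few lines; a sketch building the univariate Gaussian correlation matrix on a small 1-D design and reporting its condition number as θ shrinks (the design points are arbitrary).

```python
import numpy as np

x = np.linspace(0.0, 1.0, 12)                        # 1-D sample sites
sq_dist = (x[:, None] - x[None, :]) ** 2             # squared pairwise distances

for theta in (10.0, 1.0, 0.1, 0.01, 0.001):
    R = np.exp(-theta * sq_dist)                     # univariate Gaussian correlation matrix R_g(theta)
    print(f"theta = {theta:7.3f}, cond(R) = {np.linalg.cond(R):.3e}")
```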

  13. Treatment effect heterogeneity for univariate subgroups in clinical trials: Shrinkage, standardization, or else.

    Science.gov (United States)

    Varadhan, Ravi; Wang, Sue-Jane

    2016-01-01

    Treatment effect heterogeneity is a well-recognized phenomenon in randomized controlled clinical trials. In this paper, we discuss subgroup analyses with prespecified subgroups of clinical or biological importance. We explore various alternatives to the naive (the traditional univariate) subgroup analyses to address the issues of multiplicity and confounding. Specifically, we consider a model-based Bayesian shrinkage (Bayes-DS) and a nonparametric, empirical Bayes shrinkage approach (Emp-Bayes) to temper the optimism of traditional univariate subgroup analyses; a standardization approach (standardization) that accounts for correlation between baseline covariates; and a model-based maximum likelihood estimation (MLE) approach. The Bayes-DS and Emp-Bayes methods model the variation in subgroup-specific treatment effect rather than testing the null hypothesis of no difference between subgroups. The standardization approach addresses the issue of confounding in subgroup analyses. The MLE approach is considered only for comparison in simulation studies as the "truth" since the data were generated from the same model. Using the characteristics of a hypothetical large outcome trial, we perform simulation studies and articulate the utilities and potential limitations of these estimators. Simulation results indicate that Bayes-DS and Emp-Bayes can protect against optimism present in the naïve approach. Due to its simplicity, the naïve approach should be the reference for reporting univariate subgroup-specific treatment effect estimates from exploratory subgroup analyses. Standardization, although it tends to have a larger variance, is suggested when it is important to address the confounding of univariate subgroup effects due to correlation between baseline covariates. The Bayes-DS approach is available as an R package (DSBayes).

  14. The Stellar Initial Mass Function in Early-type Galaxies from Absorption Line Spectroscopy. IV. A Super-Salpeter IMF in the Center of NGC 1407 from Non-parametric Models

    Science.gov (United States)

    Conroy, Charlie; van Dokkum, Pieter G.; Villaume, Alexa

    2017-03-01

    It is now well-established that the stellar initial mass function (IMF) can be determined from the absorption line spectra of old stellar systems, and this has been used to measure the IMF and its variation across the early-type galaxy population. Previous work focused on measuring the slope of the IMF over one or more stellar mass intervals, implicitly assuming that this is a good description of the IMF and that the IMF has a universal low-mass cutoff. In this work we consider more flexible IMFs, including two-component power laws with a variable low-mass cutoff and a general non-parametric model. We demonstrate with mock spectra that the detailed shape of the IMF can be accurately recovered as long as the data quality is high (S/N ≳ 300 Å‑1) and cover a wide wavelength range (0.4–1.0 μm). We apply these flexible IMF models to a high S/N spectrum of the center of the massive elliptical galaxy NGC 1407. Fitting the spectrum with non-parametric IMFs, we find that the IMF in the center shows a continuous rise extending toward the hydrogen-burning limit, with a behavior that is well-approximated by a power law with an index of ‑2.7. These results provide strong evidence for the existence of extreme (super-Salpeter) IMFs in the cores of massive galaxies.

  15. The Stellar Initial Mass Function in Early-Type Galaxies From Absorption Line Spectroscopy. IV. A Super-Salpeter IMF in the center of NGC 1407 from Non-Parametric Models

    CERN Document Server

    Conroy, Charlie; Villaume, Alexa

    2016-01-01

    It is now well-established that the stellar initial mass function (IMF) can be determined from the absorption line spectra of old stellar systems, and this has been used to measure the IMF and its variation across the early-type galaxy population. Previous work focused on measuring the slope of the IMF over one or more stellar mass intervals, implicitly assuming that this is a good description of the IMF and that the IMF has a universal low-mass cutoff. In this work we consider more flexible IMFs, including two-component power-laws with a variable low-mass cutoff and a general non-parametric model. We demonstrate with mock spectra that the detailed shape of the IMF can be accurately recovered as long as the data quality are high (S/N$\\gtrsim300$) and cover a wide wavelength range (0.4um-1.0um). We apply these flexible IMF models to a high S/N spectrum of the center of the massive elliptical galaxy NGC 1407. Fitting the spectrum with non-parametric IMFs, we find that the IMF in the center shows a continuous ri...

  16. STATISTICS OF FUZZY DATA

    Directory of Open Access Journals (Sweden)

    Orlov A. I.

    2016-05-01

    Full Text Available Fuzzy sets are a special form of objects of non-numeric nature. Therefore, when processing a sample whose elements are fuzzy sets, a variety of methods for the statistical analysis of data of arbitrary nature can be used - calculation of averages, non-parametric density estimators, construction of diagnostic rules, etc. We describe the development of our work on the theory of fuzziness (1975-2015). In our first work on fuzzy sets (1975), the theory of random sets is regarded as a generalization of the theory of fuzzy sets. In the popular-science series "Mathematics. Cybernetics" (publishing house "Knowledge"), the first book on fuzzy sets by a Soviet author was published in 1980 - our brochure "Optimization problems and fuzzy variables". This book is essentially a condensation of our research of the 1970s, i.e., the research on the theory of stability and, in particular, on the statistics of objects of non-numeric nature, with an emphasis on methodology. The book includes the main results of fuzzy theory and its relation to random set theory, as well as new results (first publication!) on the statistics of fuzzy sets. On the basis of further experience, one can expect that the theory of fuzzy sets will be applied more actively in organizational and economic modelling of industry management processes. We discuss the concept of the average value of a fuzzy set. We consider a number of formulations of problems of testing statistical hypotheses on fuzzy sets. We also propose and justify some algorithms for restoring relationships between fuzzy variables, present various variants of fuzzy cluster analysis of data and variables, and describe some methods for the collection and description of fuzzy data.

  17. Function of cancer associated genes revealed by modern univariate and multivariate association tests.

    Directory of Open Access Journals (Sweden)

    Malka Gorfine

    Full Text Available Copy number variation (CNV) plays a role in the pathogenesis of many human diseases, especially cancer. Several whole genome CNV association studies have been performed for the purpose of identifying cancer associated CNVs. Here we undertook a novel approach to whole genome CNV analysis, with the goal being identification of associations between the CNV of different genes (CNV-CNV) across 60 human cancer cell lines. We hypothesize that these associations point to the roles of the associated genes in cancer, and can be indicators of their position in gene networks of cancer-driving processes. Recent studies show that gene associations are often non-linear and non-monotone. In order to obtain a more complete picture of all CNV associations, we performed omnibus univariate analysis by utilizing dCov, MIC, and HHG association tests, which are capable of detecting any type of association, including non-monotone relationships. For comparison we used Spearman and Pearson association tests, which detect only linear or monotone relationships. Application of the dCov, MIC and HHG tests resulted in identification of twice as many associations compared to those found by Spearman and Pearson alone. Interestingly, most of the new associations were detected by the HHG test. Next, we utilized dCov's and HHG's ability to perform multivariate analysis. We tested for association between genes of unknown function and known cancer-related pathways. Our results indicate that multivariate analysis is much more effective than univariate analysis for the purpose of ascribing biological roles to genes of unknown function. We conclude that a combination of multivariate and univariate omnibus association tests can reveal significant information about gene networks of disease-driving processes. These methods can be applied to any large gene or pathway dataset, allowing more comprehensive analysis of biological processes.
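
A self-contained sketch of a dCov-type omnibus statistic (sample distance correlation via double-centred distance matrices), which detects the kind of non-monotone dependence that Pearson and Spearman miss. The two "CNV" vectors below are synthetic; this is not the study's pipeline.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays (V-statistic form)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(7)
cnv_gene1 = rng.normal(size=200)
cnv_gene2 = np.cos(3 * cnv_gene1) + rng.normal(scale=0.3, size=200)   # non-monotone dependence

print("Pearson r:            ", round(float(np.corrcoef(cnv_gene1, cnv_gene2)[0, 1]), 3))
print("distance correlation: ", round(float(distance_correlation(cnv_gene1, cnv_gene2)), 3))
```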

  18. Nuisance forecasting. Univariate modelling and very-short-term forecasting of winter smog episodes; Immissionsprognose. Univariate Modellierung und Kuerzestfristvorhersage von Wintersmogsituationen

    Energy Technology Data Exchange (ETDEWEB)

    Schlink, U.

    1996-12-31

    The work specifically evaluates the pollution data provided by the measuring station in the centre of Leipzig during the period from 1980 to 1993, with the aim of developing an algorithm for very short-term forecasts of exceedance situations. Forecasting was to be univariate, i.e. based exclusively on the half-hourly readings of SO2 concentration taken in the past. As shown by Fourier analysis, there exist three main and mutually independent spectral regions: the high-frequency sector (period < 12 hours) of unstable irregularities, the seasonal sector with periods of 24 and 12 hours, and the low-frequency sector (period > 24 hours). After decomposing the measurement series into components, the low-frequency sector is termed the trend component, or trend for short. A Kalman filter is used to obtain the components. It was found that smog episodes are described most clearly by the trend component, which is therefore investigated more closely. The phase representation then shows characteristic trajectories of the trends. (orig./KW)

  19. A statistical approach to bioclimatic trend detection in the airborne pollen records of Catalonia (NE Spain).

    Science.gov (United States)

    Fernández-Llamazares, Alvaro; Belmonte, Jordina; Delgado, Rosario; De Linares, Concepción

    2014-04-01

    Airborne pollen records are a suitable indicator for the study of climate change. The present work focuses on the role of annual pollen indices in the detection of bioclimatic trends through the analysis of the aerobiological spectra of 11 taxa of great biogeographical relevance in Catalonia over an 18-year period (1994-2011), by means of different parametric and non-parametric statistical methods. Among others, two non-parametric rank-based statistical tests were performed for detecting monotonic trends in the time series of the selected airborne pollen types, and we observed that they have similar power in detecting trends. Except for those cases in which the pollen data can be well modeled by a normal distribution, it is better to apply non-parametric statistical methods in aerobiological studies. Our results provide a reliable representation of the pollen trends in the region and suggest that greater pollen quantities are being released into the atmosphere in recent years, especially by Mediterranean taxa such as Pinus, Total Quercus and Evergreen Quercus, although the trends may differ geographically. Longer aerobiological monitoring periods are required to corroborate these results and survey the increasing levels of certain pollen types that could have an impact on public health.
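
A minimal implementation of the Mann-Kendall test, one kind of non-parametric rank-based monotonic-trend test of the sort applied in the study (no tie correction; the annual pollen-index series is synthetic).

```python
import numpy as np
from scipy import stats

def mann_kendall(series):
    """Mann-Kendall trend test: S statistic, Z score and two-sided p-value (ties ignored)."""
    x = np.asarray(series, float)
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return s, z, p

rng = np.random.default_rng(8)
years = np.arange(1994, 2012)
pollen_index = 1000 + 40 * (years - 1994) + rng.normal(scale=150, size=len(years))  # synthetic upward trend

s, z, p = mann_kendall(pollen_index)
print(f"S = {s:.0f}, Z = {z:.2f}, p = {p:.4f}")
```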

  20. Wind Speed Prediction Using a Univariate ARIMA Model and a Multivariate NARX Model

    Directory of Open Access Journals (Sweden)

    Erasmo Cadenas

    2016-02-01

    Full Text Available Two one-step-ahead wind speed forecasting models were compared. A univariate model was developed using a linear autoregressive integrated moving average (ARIMA); this method’s performance is well studied for a large number of prediction problems. The other is a multivariate model developed using a nonlinear autoregressive exogenous artificial neural network (NARX), which uses the variables barometric pressure, air temperature, wind direction and solar radiation or relative humidity, as well as delayed wind speed. Both models were developed from databases from two sites: an hourly average measurements database from La Mata, Oaxaca, Mexico, and a ten-minute average measurements database from Metepec, Hidalgo, Mexico. The main objective was to compare the impact of the various meteorological variables on the performance of the multivariate model of wind speed prediction with respect to the high-performance univariate linear model. The NARX model gave better results, with improvements over the ARIMA model of between 5.5% and 10.6% for the hourly database and of between 2.3% and 12.8% for the ten-minute database, for mean absolute error and mean squared error, respectively.
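
    For readers who want to reproduce the univariate baseline, a minimal sketch of one-step-ahead ARIMA forecasting with statsmodels is shown below. The ARIMA order (2, 0, 1), the simulated wind series and the 24-step test window are illustrative assumptions, not the configuration identified in the paper.

      import numpy as np
      import pandas as pd
      from statsmodels.tsa.arima.model import ARIMA
      from sklearn.metrics import mean_absolute_error, mean_squared_error

      # hypothetical hourly wind speeds; replace with the site measurements
      rng = np.random.default_rng(0)
      wind = pd.Series(5 + np.cumsum(rng.normal(0, 0.2, 500))).clip(lower=0)

      train, test = wind.iloc[:-24], wind.iloc[-24:]
      history = list(train)
      preds = []
      for actual in test:
          # re-fit and forecast one step ahead; order (2, 0, 1) is illustrative
          model = ARIMA(history, order=(2, 0, 1)).fit()
          preds.append(model.forecast(steps=1)[0])
          history.append(actual)

      print("MAE:", mean_absolute_error(test, preds))
      print("MSE:", mean_squared_error(test, preds))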

  1. Nonlinear reconstruction of single-molecule free-energy surfaces from univariate time series.

    Science.gov (United States)

    Wang, Jiang; Ferguson, Andrew L

    2016-03-01

    The stable conformations and dynamical fluctuations of polymers and macromolecules are governed by the underlying single-molecule free energy surface. By integrating ideas from dynamical systems theory with nonlinear manifold learning, we have recovered single-molecule free energy surfaces from univariate time series in a single coarse-grained system observable. Using Takens' Delay Embedding Theorem, we expand the univariate time series into a high dimensional space in which the dynamics are equivalent to those of the molecular motions in real space. We then apply the diffusion map nonlinear manifold learning algorithm to extract a low-dimensional representation of the free energy surface that is diffeomorphic to that computed from a complete knowledge of all system degrees of freedom. We validate our approach in molecular dynamics simulations of a C(24)H(50) n-alkane chain to demonstrate that the two-dimensional free energy surface extracted from the atomistic simulation trajectory is - subject to spatial and temporal symmetries - geometrically and topologically equivalent to that recovered from a knowledge of only the head-to-tail distance of the chain. Our approach lays the foundations to extract empirical single-molecule free energy surfaces directly from experimental measurements.
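
    The delay-embedding step can be sketched in a few lines of numpy, as below. The embedding dimension and delay used here are illustrative (in practice they are chosen, e.g., via average mutual information and false-nearest-neighbour criteria), the surrogate head-to-tail distance series is simulated, and the subsequent diffusion-map step is omitted.

      import numpy as np

      def delay_embed(x, dim=5, tau=3):
          """Takens delay embedding of a univariate series x.
          Each row is the vector (x[t], x[t-tau], ..., x[t-(dim-1)*tau])."""
          x = np.asarray(x, dtype=float)
          n = len(x) - (dim - 1) * tau
          if n <= 0:
              raise ValueError("series too short for this (dim, tau)")
          return np.column_stack([x[(dim - 1 - k) * tau: (dim - 1 - k) * tau + n]
                                  for k in range(dim)])

      # example: surrogate head-to-tail distance signal (a noisy oscillation)
      t = np.linspace(0, 50, 2000)
      r_ht = 1.5 + 0.3 * np.sin(t) + 0.05 * np.random.randn(t.size)
      Y = delay_embed(r_ht, dim=5, tau=10)   # rows live in the reconstructed space
      print(Y.shape)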

  2. Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier

    Directory of Open Access Journals (Sweden)

    Ryden Tobias

    2010-10-01

    Full Text Available Abstract Background Genome-wide gene expression data is a rich source for the identification of gene signatures suitable for clinical purposes, and a number of statistical algorithms have been described for both identification and evaluation of such signatures. Some employed algorithms are fairly complex and hence sensitive to over-fitting, whereas others are simpler and more straightforward. Here we present a new type of simple algorithm based on ROC analysis and the use of metagenes that we believe will be a good complement to existing algorithms. Results The basis for the proposed approach is the use of metagenes, instead of collections of individual genes, and a feature selection using AUC values obtained by ROC analysis. Each gene in a data set is assigned an AUC value relative to the tumor class under investigation and the genes are ranked according to these values. Metagenes are then formed by calculating the mean expression level for an increasing number of ranked genes, and the metagene expression value that optimally discriminates tumor classes in the training set is used for classification of new samples. The performance of the metagene is then evaluated using LOOCV and balanced accuracies. Conclusions We show that the simple uni-variate gene expression average algorithm performs as well as several alternative algorithms such as discriminant analysis and more complex approaches such as SVM and neural networks. The R package rocc is freely available at http://cran.r-project.org/web/packages/rocc/index.html.
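
    A minimal numpy/scikit-learn sketch of the metagene idea described above follows: rank genes by their individual AUC for the tumor class, average the top-ranked genes into a metagene, and choose the training-set cutoff that best separates the classes. This is only an illustration on simulated data, not the authors' rocc implementation; plain accuracy is used in place of balanced accuracy and LOOCV.

      import numpy as np
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(1)
      n_samples, n_genes = 60, 500
      X = rng.normal(size=(n_samples, n_genes))       # expression matrix
      y = np.repeat([0, 1], n_samples // 2)           # tumour class labels
      X[y == 1, :20] += 1.0                           # 20 informative genes

      # 1. rank genes by their individual AUC for the class of interest
      auc = np.array([roc_auc_score(y, X[:, j]) for j in range(n_genes)])
      order = np.argsort(auc)[::-1]

      # 2. metagene = mean expression over the k top-ranked genes
      k = 10
      metagene = X[:, order[:k]].mean(axis=1)

      # 3. choose the cutoff that best separates classes on the training data
      cuts = np.sort(metagene)
      acc = [((metagene > c) == y).mean() for c in cuts]
      best_cut = cuts[int(np.argmax(acc))]
      print("metagene AUC:", roc_auc_score(y, metagene), "cutoff:", best_cut)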

  3. Influence Analysis of Non-parametric Regression Model with Random Right Censorship

    Institute of Scientific and Technical Information of China (English)

    王淑玲; 冯予; 刘刚

    2012-01-01

    In this paper, the random right-censored non-parametric fixed-design regression model is first transformed into a non-parametric regression model. Local influence analysis is then carried out for this model, yielding concise formulas for the influence matrix and for the direction of maximum influence curvature. Finally, a worked example illustrates and verifies the effectiveness of the approach.

  4. Non-Parametric Cell-Based Photometric Proxies for Galaxy Morphology: Methodology and Application to the Morphologically-Defined Star Formation -- Stellar Mass Relation of Spiral Galaxies in the Local Universe

    CERN Document Server

    Grootes, M W; Popescu, C C; Robotham, A S G; Seibert, M; Kelvin, L S

    2013-01-01

    (Abridged) We present a non-parametric cell-based method of selecting highly pure and largely complete samples of spiral galaxies using photometric and structural parameters as provided by standard photometric pipelines and simple shape fitting algorithms, demonstrably superior to commonly used proxies. Furthermore, we find structural parameters derived using passbands longwards of the $g$ band and linked to older stellar populations, especially the stellar mass surface density $\mu_*$ and the $r$ band effective radius $r_e$, to perform at least equally well as parameters more traditionally linked to the identification of spirals by means of their young stellar populations. In particular the distinct bimodality in the parameter $\mu_*$, consistent with expectations of different evolutionary paths for spirals and ellipticals, represents an often overlooked yet powerful parameter in differentiating between spiral and non-spiral/elliptical galaxies. We investigate the intrinsic specific star-formation rate - ste...

  5. Applications of quantum entropy to statistics

    Energy Technology Data Exchange (ETDEWEB)

    Silver, R.N.; Martz, H.F.

    1994-07-01

    This paper develops two generalizations of the maximum entropy (ME) principle. First, Shannon classical entropy is replaced by von Neumann quantum entropy to yield a broader class of information divergences (or penalty functions) for statistics applications. Negative relative quantum entropy enforces convexity, positivity, non-local extensivity and prior correlations such as smoothness. This enables the extension of ME methods from their traditional domain of ill-posed inverse problems to new applications such as non-parametric density estimation. Second, given a choice of information divergence, a combination of ME and Bayes rule is used to assign both prior and posterior probabilities. Hyperparameters are interpreted as Lagrange multipliers enforcing constraints. Conservation principles, such as conservation of information and smoothness, are proposed for setting statistical regularization and other hyperparameters. ME provides an alternative to hierarchical Bayes methods.

  6. Trend and forecasting rate of cancer deaths at a public university hospital using univariate modeling

    Science.gov (United States)

    Ismail, A.; Hassan, Noor I.

    2013-09-01

    Cancer is one of the principal causes of death in Malaysia. This study was performed to determine the pattern of the rate of cancer deaths at a public hospital in Malaysia over an 11-year period from 2001 to 2011, to determine the best-fitting univariate model for forecasting the rate of cancer deaths, and to forecast the rates for the next two years (2012 to 2013). The medical records of patients with cancer who died at this hospital over the 11-year period were reviewed, with a total of 663 cases. The cancers were classified according to the 10th Revision of the International Classification of Diseases (ICD-10). Data collected include the socio-demographic background of patients such as registration number, age, gender, ethnicity, ward and diagnosis. Data entry and analysis were accomplished using SPSS 19.0 and Minitab 16.0. The five univariate models used were the Naïve with Trend Model, Average Percent Change Model (ACPM), Single Exponential Smoothing, Double Exponential Smoothing and Holt's Method. Over the 11 years, Malay patients had the highest percentage of cancer deaths at this hospital (88.10%) compared to other ethnic groups, with more males (51.30%) than females. Lung and breast cancers accounted for the largest numbers of cancer deaths. About 29.60% of the patients who died of cancer were aged 61 years and above. The best univariate model for forecasting the rate of cancer deaths is the Single Exponential Smoothing technique with an alpha of 0.10. The forecast of the rate of cancer deaths is flat: the forecasted mortality rate remains at 6.84% from January 2012 to December 2013. Government and private sectors and non-governmental organizations need to highlight cancer issues, especially lung and breast cancers, to the public through campaigns using mass media, electronic media, posters and pamphlets in an attempt to decrease the rate of cancer deaths in Malaysia.
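
    Single exponential smoothing with alpha = 0.10, the model the abstract reports as best, can be sketched as follows; with this model every future forecast equals the last smoothed level, which explains the flat forecast. The monthly rates in the example are placeholders, not the study data.

      import numpy as np

      def single_exponential_smoothing(y, alpha=0.10):
          """Return the smoothed series and the one-step-ahead forecast.
          With single exponential smoothing every future forecast equals the
          last smoothed level, hence the flat forecast noted in the abstract."""
          level = y[0]
          smoothed = [level]
          for obs in y[1:]:
              level = alpha * obs + (1 - alpha) * level
              smoothed.append(level)
          return np.array(smoothed), level

      # hypothetical monthly cancer death rates (%); placeholders, not study data
      rates = np.array([7.1, 6.8, 7.3, 6.5, 7.0, 6.9, 6.7, 7.2, 6.6, 6.8, 7.0, 6.9])
      fitted, forecast = single_exponential_smoothing(rates, alpha=0.10)
      print("forecast for every future month:", round(forecast, 2))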

  7. Use of statistical tests and statistical software choice in 2014: tale from three Medline indexed Pakistani journals.

    Science.gov (United States)

    Shaikh, Masood Ali

    2016-04-01

    Statistical tests help infer meaningful conclusions from studies conducted and data collected. This descriptive study analyzed the type of statistical tests used and the statistical software utilized for analysis reported in the original articles published in 2014 by the three Medline-indexed journals of Pakistan. Cumulatively, 466 original articles were published in 2014. The most frequently reported statistical tests for original articles by all three journals were bivariate parametric and non-parametric tests i.e. involving comparisons between two groups e.g. Chi-square test, t-test, and various types of correlations. Cumulatively, 201 (43.1%) articles used these tests. SPSS was the primary choice for statistical analysis, as it was exclusively used in 374 (80.3%) original articles. There has been a substantial increase in the number of articles published, and in the sophistication of statistical tests used in the articles published in the Pakistani Medline indexed journals in 2014, compared to 2007.

  8. Robust statistical approaches to assess the degree of agreement of clinical data

    Science.gov (United States)

    Grilo, Luís M.; Grilo, Helena L.

    2016-06-01

    To analyze the blood of patients who took vitamin B12 for a period of time, two different methods of measuring the medicine were used (the established method, with more human intervention, and another that relies essentially on machines). Given the non-normality of the differences between the two measurement methods, the limits of agreement are also estimated using a non-parametric approach to assess the degree of agreement of the clinical data. The bootstrap resampling method is applied in order to obtain robust confidence intervals for the mean and median of the differences. The approaches used are easy to apply with user-friendly software, and their outputs are also easy to interpret. In this case study the results obtained with the parametric and non-parametric approaches lead to different statistical conclusions, but the decision whether agreement is acceptable or not is always a clinical judgment.
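
    A minimal numpy sketch of percentile-bootstrap confidence intervals for the mean and median of paired differences, in the spirit of the approach described, is given below; the paired vitamin B12 measurements are simulated placeholders and the interval construction is the basic percentile method rather than whatever variant the authors used.

      import numpy as np

      def bootstrap_ci(data, stat=np.mean, n_boot=10000, alpha=0.05, seed=0):
          """Percentile bootstrap confidence interval for a statistic of `data`."""
          rng = np.random.default_rng(seed)
          boot = np.array([stat(rng.choice(data, size=data.size, replace=True))
                           for _ in range(n_boot)])
          lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
          return lo, hi

      # hypothetical paired B12 measurements from the two methods
      rng = np.random.default_rng(42)
      method_a = rng.lognormal(mean=6.0, sigma=0.3, size=80)     # established method
      method_b = method_a + rng.normal(10, 25, size=80)          # automated method
      diff = method_b - method_a

      print("95% CI for mean difference:  ", bootstrap_ci(diff, np.mean))
      print("95% CI for median difference:", bootstrap_ci(diff, np.median))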

  9. GPGCD, an Iterative Method for Calculating Approximate GCD of Univariate Polynomials, with the Complex Coefficients

    CERN Document Server

    Terui, Akira

    2010-01-01

    We present an extension of our GPGCD method, an iterative method for calculating approximate greatest common divisor (GCD) of univariate polynomials, to polynomials with complex coefficients. For a given pair of polynomials and a degree, our algorithm finds a pair of polynomials which has a GCD of the given degree and whose coefficients are perturbed from those in the original inputs, making the perturbations as small as possible, along with the GCD. In our GPGCD method, the problem of approximate GCD is transferred to a constrained minimization problem, then solved with a so-called modified Newton method, which is a generalization of the gradient-projection method, by searching for the solution iteratively. While our original method is designed for polynomials with real coefficients, we extend it to accept polynomials with complex coefficients in this paper.

  10. GPGCD, an Iterative Method for Calculating Approximate GCD, for Multiple Univariate Polynomials

    CERN Document Server

    Terui, Akira

    2010-01-01

    We present an extension of our GPGCD method, an iterative method for calculating approximate greatest common divisor (GCD) of univariate polynomials, to multiple polynomial inputs. For a given pair of polynomials and a degree, our algorithm finds a pair of polynomials which has a GCD of the given degree and whose coefficients are perturbed from those in the original inputs, making the perturbations as small as possible, along with the GCD. In our GPGCD method, the problem of approximate GCD is transferred to a constrained minimization problem, then solved with the so-called modified Newton method, which is a generalization of the gradient-projection method, by searching the solution iteratively. In this paper, we extend our method to accept more than two polynomials with the real coefficients as an input.

  11. GPGCD, an Iterative Method for Calculating Approximate GCD, for Multiple Univariate Polynomials

    Science.gov (United States)

    Terui, Akira

    We present an extension of our GPGCD method, an iterative method for calculating approximate greatest common divisor (GCD) of univariate polynomials, to multiple polynomial inputs. For a given pair of polynomials and a degree, our algorithm finds a pair of polynomials which has a GCD of the given degree and whose coefficients are perturbed from those in the original inputs, making the perturbations as small as possible, along with the GCD. In our GPGCD method, the problem of approximate GCD is transferred to a constrained minimization problem, then solved with the so-called modified Newton method, which is a generalization of the gradient-projection method, by searching the solution iteratively. In this paper, we extend our method to accept more than two polynomials with the real coefficients as an input.

  12. Univariate Risk Factors for Prolonged Mechanical Ventilation in Patients Undergoing Prosthetic Heart Valves Replacement Surgery

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Data from 736 patients undergoing prosthetic heart valve replacement surgery and concomitant (combined) surgery from January 1998 to January 2004 at Union Hospital were retrospectively reviewed. Univariate logistic regression analyses were conducted to identify risk factors for prolonged mechanical ventilation. The results showed that prolonged cardiopulmonary bypass duration, prolonged aortic cross-clamp time and an ejection fraction of less than 50% were independent predictors of prolonged mechanical ventilation, whereas age, weight and preoperative hospital stay (days) were not associated with it. It was concluded that, for age and weight, this may be due to the small number of elderly patients (70 years and above) included in the study and to the body build of the majority Chinese population, which favours normal weight.

  13. Nonparametric statistical structuring of knowledge systems using binary feature matches

    DEFF Research Database (Denmark)

    Mørup, Morten; Glückstad, Fumiko Kano; Herlau, Tue

    2014-01-01

    Structuring knowledge systems with binary features is often based on imposing a similarity measure and clustering objects according to this similarity. Unfortunately, such analyses can be heavily influenced by the choice of similarity measure. Furthermore, it is unclear at which level clusters have statistical support and how this approach generalizes to the structuring and alignment of knowledge systems. We propose a non-parametric Bayesian generative model for structuring binary feature data that does not depend on a specific choice of similarity measure. We jointly model all combinations of binary...

  14. Detection of biomarkers for Hepatocellular Carcinoma using a hybrid univariate gene selection methods

    Directory of Open Access Journals (Sweden)

    Abdel Samee Nagwan M

    2012-08-01

    Full Text Available Abstract Background Discovering new biomarkers has a great role in improving early diagnosis of Hepatocellular carcinoma (HCC). The experimental determination of biomarkers needs a lot of time and money. This motivates this work to use in-silico prediction of biomarkers to reduce the number of experiments required for detecting new ones. This is achieved by extracting the most representative genes in microarrays of HCC. Results In this work, we provide a method for extracting the differentially expressed genes, up-regulated ones, that can be considered candidate biomarkers in high-throughput microarrays of HCC. We examine the power of several gene selection methods (such as Pearson’s correlation coefficient, Cosine coefficient, Euclidean distance, Mutual information and Entropy with different estimators) in selecting informative genes. A biological interpretation of the highly ranked genes is done using KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, ENTREZ and DAVID (Database for Annotation, Visualization, and Integrated Discovery) databases. The top ten genes selected using Pearson’s correlation coefficient and Cosine coefficient contained six genes that have been implicated in cancer (often multiple cancers) genesis in previous studies. Fewer genes were obtained by the other methods (4 genes using Mutual information, 3 genes using Euclidean distance and only one gene using Entropy). A better result was obtained by the utilization of a hybrid approach based on intersecting the highly ranked genes in the output of all investigated methods. This hybrid combination yielded seven genes (2 genes for HCC and 5 genes in different types of cancer) in the top ten genes of the list of intersected genes. Conclusions To strengthen the effectiveness of the univariate selection methods, we propose a hybrid approach that intersects several of these methods in a cascaded manner. This approach surpasses all of the univariate selection methods when
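
    The cascaded-intersection idea can be sketched as follows: rank the genes under several univariate scores and intersect the top-ranked lists. The scoring functions shown (absolute Pearson correlation, cosine similarity and negative Euclidean distance to the class-label vector) mirror some of the measures named above, but the data, the top-k cut-off and the helper rank_by are illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(7)
      n_samples, n_genes, top_k = 40, 300, 20
      X = rng.normal(size=(n_samples, n_genes))        # expression matrix
      y = np.repeat([0.0, 1.0], n_samples // 2)        # HCC vs. control labels
      X[y == 1, :15] += 1.2                            # up-regulated genes

      def rank_by(score):
          """Return the indices of the top_k genes under a per-gene scoring function."""
          scores = np.array([score(X[:, j], y) for j in range(X.shape[1])])
          return set(np.argsort(scores)[::-1][:top_k])

      pearson = rank_by(lambda g, t: abs(np.corrcoef(g, t)[0, 1]))
      cosine = rank_by(lambda g, t: np.dot(g, t) / (np.linalg.norm(g) * np.linalg.norm(t)))
      euclid = rank_by(lambda g, t: -np.linalg.norm(g - t))

      candidates = pearson & cosine & euclid           # cascaded intersection
      print("candidate biomarker genes:", sorted(candidates))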

  15. Micro-Raman Spectroscopy and Univariate Analysis for Monitoring Disease Follow-Up

    Directory of Open Access Journals (Sweden)

    Vito Capozzi

    2011-08-01

    Full Text Available Micro-Raman spectroscopy is a very promising tool for medical applications, thanks to its sensitivity to subtle changes in the chemical and structural characteristics of biological specimens. To fully exploit this promise, building a method of data analysis properly suited to the case under study is crucial. Here, a linear or univariate approach using an R2 determination coefficient is proposed for discriminating Raman spectra even with small differences. The validity of the proposed approach has been tested using Raman spectra of high-purity glucose solutions collected in the 600 to 1,600 cm−1 region and also from solutions with two known solutes at different concentrations. After this validation step, the proposed analysis has been applied to Raman spectra from oral human tissues affected by Pemphigus Vulgaris (PV), a rare life-threatening autoimmune disease, for monitoring disease follow-up. Raman spectra have been obtained in the wavenumber regions from 1,050 to 1,700 cm−1 and 2,700 to 3,200 cm−1 from tissues of patients at different stages of pathology (active PV, under therapy and PV in remission stage), as confirmed by histopathological and immunofluorescence analysis. Differences in the spectra depending on tissue illness stage have been detected in the 1,150–1,250 cm−1 (amide III) and 1,420–1,450 cm−1 (CH3 deformation) regions and around 1,650 cm−1 (amide I) and 2,930 cm−1 (CH3 symmetric stretch). The analysis of tissue Raman spectra by the proposed univariate method has allowed us to effectively differentiate tissues at different stages of pathology.

  16. The Effects of Statistical Analysis Software and Calculators on Statistics Achievement

    Science.gov (United States)

    Christmann, Edwin P.

    2009-01-01

    This study compared the effects of microcomputer-based statistical software and hand-held calculators on the statistics achievement of university males and females. The subjects, 73 graduate students enrolled in univariate statistics classes at a public comprehensive university, were randomly assigned to groups that used either microcomputer-based…

  17. Analysis of meteorological droughts for the Saskatchewan River Basin using univariate and bivariate approaches

    Science.gov (United States)

    Masud, M. B.; Khaliq, M. N.; Wheater, H. S.

    2015-03-01

    This study is focused on the Saskatchewan River Basin (SRB) that spans southern parts of Alberta, Saskatchewan and Manitoba, the three Prairie Provinces of Canada, where most of the country's agricultural activities are concentrated. The SRB is confronted with immense water-related challenges and is now one of the ten GEWEX (Global Energy and Water Exchanges) Regional Hydroclimate Projects in the world. In the past, various multi-year droughts have been observed in this part of Canada that impacted agriculture, energy and socio-economic sectors. Therefore, proper understanding of the spatial and temporal characteristics of historical droughts is important for many water resources planning and management related activities across the basin. In the study, observed gridded data of daily precipitation and temperature and conventional univariate and copula-based bivariate frequency analyses are used to characterize drought events in terms of drought severity and duration on the basis of two drought indices, the Standardized Precipitation Index (SPI) and the Standardized Precipitation Evapotranspiration Index (SPEI). Within the framework of univariate and bivariate analyses, drought risk indicators are developed and mapped across the SRB to delineate the most vulnerable parts of the basin. Based on the results obtained, southern parts of the SRB (i.e., western part of the South Saskatchewan River, Seven Persons Creek and Bigstick Lake watersheds) are associated with a higher drought risk, while moderate risk is noted for the North Saskatchewan River (except its eastern parts), Red Deer River, Oldman River, Bow River, Sounding Creek, Carrot River and Battle River watersheds. Lower drought risk is found for the areas surrounding the Saskatchewan-Manitoba border (particularly, the Saskatchewan River watershed). It is also found that the areas characterized with higher drought severity are also associated with higher drought duration. A comparison of SPI- and SPEI

  18. Genetic parameters for growth characteristics of free-range chickens under univariate random regression models.

    Science.gov (United States)

    Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B

    2016-09-01

    Repeated measures from the same individual have been analyzed using repeatability and finite-dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data has become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned to each of six subclasses of age at measurement. Random regression curves were modeled using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h(2) = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that body weight at any of these ages can be used as a selection criterion. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in the 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that

  19. Handbook of univariate and multivariate data analysis and interpretation with SPSS

    CERN Document Server

    Ho, Robert

    2006-01-01

    Many statistics texts tend to focus more on the theory and mathematics underlying statistical tests than on their applications and interpretation. This can leave readers with little understanding of how to apply statistical tests or how to interpret their findings. While the SPSS statistical software has done much to alleviate the frustrations of social science professionals and students who must analyze data, they still face daunting challenges in selecting the proper tests, executing the tests, and interpreting the test results.With emphasis firmly on such practical matters, this handbook se

  20. Improving the performance of univariate control charts for abnormal detection and classification

    Science.gov (United States)

    Yiakopoulos, Christos; Koutsoudaki, Maria; Gryllias, Konstantinos; Antoniadis, Ioannis

    2017-03-01

    Bearing failures in rotating machinery can cause machine breakdown and economic loss if no effective actions are taken in time. Therefore, it is of prime importance to detect accurately the presence of faults, especially at their early stage, to prevent subsequent damage and reduce costly downtime. Machinery fault diagnosis follows a roadmap of data acquisition, feature extraction and diagnostic decision making, in which mechanical vibration fault feature extraction is the foundation and the key to obtaining an accurate diagnostic result. A challenge in this area is the selection of the most sensitive features for various types of fault, especially when the characteristics of failures are difficult to extract. Thus, a plethora of complex data-driven fault diagnosis methods are fed by prominent features, which are extracted and reduced through traditional or modern algorithms. Since most of the available datasets are captured during normal operating conditions, in the last decade a number of novelty detection methods, able to work when only normal data are available, have been developed. In this study, a hybrid method combining univariate control charts and a feature extraction scheme is introduced, focusing on abnormal change detection and classification, under the assumption that measurements under normal operating conditions of the machinery are available. The feature extraction method integrates morphological operators and Morlet wavelets. The effectiveness of the proposed methodology is validated on two different experimental cases with bearing faults, demonstrating that the proposed approach can improve the fault detection and classification performance of conventional control charts.
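
    As a minimal illustration of the univariate control-chart component, the sketch below builds a Shewhart individuals chart with 3-sigma limits estimated from feature values recorded under normal operating conditions and flags new observations outside the limits. The simulated feature values are placeholders; the paper's morphological/Morlet-wavelet feature extraction is not reproduced here.

      import numpy as np

      rng = np.random.default_rng(3)
      healthy = rng.normal(1.0, 0.1, 200)           # feature under normal operation
      new_obs = np.r_[rng.normal(1.0, 0.1, 30),     # still healthy
                      rng.normal(1.6, 0.15, 10)]    # developing bearing fault

      # Shewhart individuals chart: centre line and 3-sigma control limits
      centre = healthy.mean()
      sigma = healthy.std(ddof=1)
      ucl, lcl = centre + 3 * sigma, centre - 3 * sigma

      out_of_control = (new_obs > ucl) | (new_obs < lcl)
      print("alarm at samples:", np.where(out_of_control)[0])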

  1. Segmentation of Coronary Angiograms Using Gabor Filters and Boltzmann Univariate Marginal Distribution Algorithm

    Science.gov (United States)

    Cervantes-Sanchez, Fernando; Hernandez-Aguirre, Arturo; Solorio-Meza, Sergio; Ornelas-Rodriguez, Manuel; Torres-Cisneros, Miguel

    2016-01-01

    This paper presents a novel method for improving the training step of the single-scale Gabor filters by using the Boltzmann univariate marginal distribution algorithm (BUMDA) in X-ray angiograms. Since the single-scale Gabor filters (SSG) are governed by three parameters, the optimal selection of the SSG parameters is highly desirable in order to maximize the detection performance of coronary arteries while reducing the computational time. To obtain the best set of parameters for the SSG, the area (Az) under the receiver operating characteristic curve is used as fitness function. Moreover, to classify vessel and nonvessel pixels from the Gabor filter response, the interclass variance thresholding method has been adopted. The experimental results using the proposed method obtained the highest detection rate with Az = 0.9502 over a training set of 40 images and Az = 0.9583 with a test set of 40 images. In addition, the experimental results of vessel segmentation provided an accuracy of 0.944 with the test set of angiograms. PMID:27738422

  2. Univariate and multivariate analysis on processing tomato quality under different mulches

    Directory of Open Access Journals (Sweden)

    Carmen Moreno

    2014-04-01

    Full Text Available The use of eco-friendly mulch materials as alternatives to the standard polyethylene (PE has become increasingly prevalent worldwide. Consequently, a comparison of mulch materials from different origins is necessary to evaluate their feasibility. Several researchers have compared the effects of mulch materials on each crop variable through univariate analysis (ANOVA. However, it is important to focus on the effect of these materials on fruit quality, because this factor decisively influences the acceptance of the final product by consumers and the industrial sector. This study aimed to analyze the information supplied by a randomized complete block experiment combined over two seasons, a principal component analysis (PCA and a cluster analysis (CA when studying the effects of mulch materials on the quality of processing tomato (Lycopersicon esculentum Mill.. The study focused on the variability in the quality measurements and on the determination of mulch materials with a similar response to them. A comparison of the results from both types of analysis yielded complementary information. ANOVA showed the similarity of certain materials. However, considering the totality of the variables analyzed, the final interpretation was slightly complicated. PCA indicated that the juice color, the fruit firmness and the soluble solid content were the most influential factors in the total variability of a set of 12 juice and fruit variables, and CA allowed us to establish four categories of treatment: plastics (polyethylene - PE, oxo- and biodegradable materials, papers, manual weeding and barley (Hordeum vulgare L. straw. Oxobiodegradable and PE were most closely related based on CA.

  3. Segmentation of Coronary Angiograms Using Gabor Filters and Boltzmann Univariate Marginal Distribution Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando Cervantes-Sanchez

    2016-01-01

    Full Text Available This paper presents a novel method for improving the training step of the single-scale Gabor filters by using the Boltzmann univariate marginal distribution algorithm (BUMDA in X-ray angiograms. Since the single-scale Gabor filters (SSG are governed by three parameters, the optimal selection of the SSG parameters is highly desirable in order to maximize the detection performance of coronary arteries while reducing the computational time. To obtain the best set of parameters for the SSG, the area (Az under the receiver operating characteristic curve is used as fitness function. Moreover, to classify vessel and nonvessel pixels from the Gabor filter response, the interclass variance thresholding method has been adopted. The experimental results using the proposed method obtained the highest detection rate with Az=0.9502 over a training set of 40 images and Az=0.9583 with a test set of 40 images. In addition, the experimental results of vessel segmentation provided an accuracy of 0.944 with the test set of angiograms.

  4. Non-parametric inferences for the generalized Lorenz curve

    Institute of Scientific and Technical Information of China (English)

    杨宝莹; 秦更生; BELINGA-HILL Nelly E.

    2012-01-01

    In this paper, we discuss the empirical likelihood-based inferences for the generalized Lorenz (GL) curve. In the settings of simple random sampling, stratified random sampling and cluster random sampling, it is shown that the limiting distributions of the empirical likelihood ratio statistics for the GL ordinate are scaled chi-square distributions with one degree of freedom. We also derive the limiting processes of the associated empirical likelihood-based GL processes. Various confidence intervals for the GL ordinate are proposed based on the bootstrap method and the newly developed empirical likelihood theory. Extensive simulation studies are conducted to compare the relative performance of the various confidence intervals for GL ordinates in terms of coverage probability and average interval length. The finite-sample performance of the empirical likelihood-based confidence bands is also illustrated in simulation studies. Finally, a real example is used to illustrate the application of the recommended intervals.

  5. Basic statistical tools in research and data analysis

    Science.gov (United States)

    Ali, Zulfiqar; Bhaskar, S Bala

    2016-01-01

    Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting of the research findings. The statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

  6. Breakdown of statistical inference from some random experiments

    CERN Document Server

    Kupczynski, Marian

    2014-01-01

    Many experiments can be interpreted in terms of random processes operating according to some internal protocols. When experiments are costly or cannot be repeated as in some clinical trials one has data gathered in only one or in a few long runs of the experiment. In this paper we study data generated by computer experiments operating according to particular internal protocols. We show that the standard statistical analysis of a sample, containing 100 000 data points or more, may sometimes be highly misleading and statistical errors largely underestimated. Our results confirm in a dramatic way the dangers of standard asymptotic statistical inference based on data gathered in one, possibly long run of the experiment. We demonstrate that analyzing various subdivisions of samples by multiple chi-square tests and chi-square frequency graphs is very effective in detecting the anomalies. Therefore to assure correctness of the statistical inference the above mentioned chi-square tests and other non-parametric sample...
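
    The subdivision-plus-chi-square idea mentioned above can be sketched as follows: split one long run into subsamples and apply a chi-square frequency test to each, so that a protocol change confined to part of the run shows up as anomalous chunks. The simulated six-outcome experiment, the drifting second half and the chunk count are illustrative assumptions.

      import numpy as np
      from scipy.stats import chisquare

      rng = np.random.default_rng(11)
      # one long run of a supposedly uniform 6-outcome experiment, with a drifting
      # protocol in the second half (an anomaly a single global test may miss)
      first_half = rng.integers(0, 6, 50_000)
      second_half = rng.choice(6, 50_000, p=[0.19, 0.17, 0.17, 0.17, 0.15, 0.15])
      run = np.concatenate([first_half, second_half])

      # subdivide the run and chi-square test each subsample against uniformity
      n_chunks = 20
      for i, chunk in enumerate(np.array_split(run, n_chunks)):
          observed = np.bincount(chunk, minlength=6)
          stat, p = chisquare(observed)            # expected: equal frequencies
          if p < 0.01:
              print(f"chunk {i:2d}: chi2 = {stat:6.1f}, p = {p:.3g}  <-- anomalous")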

  7. Basic statistical tools in research and data analysis.

    Science.gov (United States)

    Ali, Zulfiqar; Bhaskar, S Bala

    2016-09-01

    Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting of the research findings. The statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

  8. Basic statistical tools in research and data analysis

    Directory of Open Access Journals (Sweden)

    Zulfiqar Ali

    2016-01-01

    Full Text Available Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting of the research findings. The statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

  9. Female children with autism spectrum disorder: an insight from mass-univariate and pattern classification analyses.

    Science.gov (United States)

    Calderoni, Sara; Retico, Alessandra; Biagi, Laura; Tancredi, Raffaella; Muratori, Filippo; Tosetti, Michela

    2012-01-16

    Several studies on structural MRI in children with autism spectrum disorders (ASD) have mainly focused on samples consisting predominantly of males. Sex differences in brain structure are observable from infancy and therefore caution is required in transferring to females the results obtained for males. The neuroanatomical phenotype of female children with ASD (ASDf) indeed represents a neglected area of research. In this study, we investigated for the first time the anatomic brain structures of a sample entirely composed of ASDf (n=38; 2-7 years of age; mean=53 months; SD=18) with respect to 38 age- and non-verbal-IQ-matched female controls, using both mass-univariate and pattern classification approaches. The whole brain volumes of each group were compared using voxel-based morphometry (VBM) with the diffeomorphic anatomical registration through exponentiated lie algebra (DARTEL) procedure, allowing us to build a study-specific template. Significantly more gray matter (GM) was found in the left superior frontal gyrus (SFG) in ASDf subjects compared to controls. The GM segments obtained in the VBM-DARTEL preprocessing are also classified with a support vector machine (SVM), using the leave-pair-out cross-validation protocol. Then, the recursive feature elimination (SVM-RFE) approach allows for the identification of the most discriminating voxels in the GM segments, and these prove extremely consistent with the SFG region identified by the VBM analysis. Furthermore, the SVM-RFE map obtained with the most discriminating set of voxels corresponding to the maximum Area Under the Receiver Operating Characteristic Curve (AUC(max)=0.80) highlighted a more complex circuitry of increased cortical volume in ASDf, involving bilaterally the SFG and the right temporo-parietal junction (TPJ). The SFG and TPJ abnormalities may be relevant to the pathophysiology of ASDf, since these structures participate in some core atypical features of autism.

  10. Pleiotropic locus for emotion recognition and amygdala volume identified using univariate and bivariate linkage

    Science.gov (United States)

    Knowles, Emma E. M.; McKay, D. Reese; Kent, Jack W.; Sprooten, Emma; Carless, Melanie A.; Curran, Joanne E.; de Almeida, Marcio A. A.; Dyer, Thomas D.; Göring, Harald H. H.; Olvera, Rene; Duggirala, Ravi; Fox, Peter; Almasy, Laura; Blangero, John; Glahn, David. C.

    2014-01-01

    The role of the amygdala in emotion recognition is well established and separately each trait has been shown to be highly heritable, but the potential role of common genetic influences on both traits has not been explored. Here we present an investigation of the pleiotropic influences of amygdala and emotion recognition in a sample of randomly selected, extended pedigrees (N = 858). Using a combination of univariate and bivariate linkage we found a pleiotropic region for amygdala and emotion recognition on 4q26 (LOD = 4.34). Association analysis conducted in the region underlying the bivariate linkage peak revealed a variant meeting the corrected significance level (pBonferroni = 5.01×10−05) within an intron of PDE5A (rs2622497, Χ2 =16.67, p = 4.4×10−05) as being jointly influential on both traits. PDE5A has been implicated previously in recognition-memory deficits and is expressed in subcortical structures that are thought to underlie memory ability including the amygdala. The present paper extends our understanding of the shared etiology between amygdala and emotion recognition by showing that the overlap between the two traits is due, at least in part, to common genetic influences. Moreover, the present paper identifies a pleiotropic locus for the two traits and an associated variant, which localizes the genetic signal even more precisely. These results, when taken in the context of previous research, highlight the potential utility of PDE5-inhibitors for ameliorating emotion-recognition deficits in populations including, but not exclusively, those individuals suffering from mental or neurodegenerative illness. PMID:25322361

  11. Climatic spatialization and analyses of longitudinal data of beef cattle Nellore raising Maranhão, Pará and Tocantins using univariate and multivariate approach

    Directory of Open Access Journals (Sweden)

    Jorge Luís Ferreira

    2014-09-01

    Full Text Available This study was carried out to spatialize the climatic factors that best discriminate the states of Maranhão, Pará and Tocantins, to analyze the structure of the phenotypic correlations between standardized body weights at 120, 210, 365, 450 and 550 days of age, and to propose phenotypic indices for animal selection in these states. The climatic variables analyzed were maximum temperature, minimum temperature, average temperature, precipitation, normalized difference vegetation index, humidity, altitude and the temperature and humidity index. Univariate and multivariate approaches were applied, using procedures of the Statistical Analysis System (SAS), to explain the intra-variable relationships and the phenotypic and environmental variation. The expected progeny differences (EPDs) were predicted using the MTDFREML software. All climatic and phenotypic variables were effective in discriminating the Maranhão, Pará and Tocantins states. Thus, we suggest the use of phenotypic indices for classification and selection of animals within each state.

  12. The use of principal components and univariate charts to control multivariate processes

    Directory of Open Access Journals (Sweden)

    Marcela A. G. Machado

    2008-04-01

    Full Text Available In this article, we evaluate the performance of the T² chart based on principal components (PC chart), and of the simultaneous univariate control charts based on the original variables (SU X charts) or on the principal components (SUPC charts). The main reason to consider the PC chart lies in the dimensionality reduction. However, depending on the disturbance and on the way the original variables are related, the chart is very slow in signaling, except when all variables are negatively correlated and the principal component is wisely selected. Comparing the SU X, SUPC and T² charts, we conclude that the SU X charts (respectively, the SUPC charts) have a better overall performance when the variables are positively (respectively, negatively) correlated. We also develop an expression for the power of two S² charts designed for monitoring the covariance matrix. These joint S² charts are, in the majority of cases, more efficient than the generalized variance chart.
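
    A minimal numpy sketch of the SUPC idea, simultaneous univariate monitoring of principal-component scores, is shown below: the principal directions and 3-sigma limits are estimated from in-control (phase I) data and each score of a new observation is checked against its own limits. The two-variable process and the mean shift are simulated assumptions, not the cases studied in the article.

      import numpy as np

      rng = np.random.default_rng(5)

      # phase I: in-control observations of two positively correlated variables
      cov = np.array([[1.0, 0.7], [0.7, 1.0]])
      X0 = rng.multivariate_normal([0, 0], cov, size=500)

      mean = X0.mean(axis=0)
      eigval, eigvec = np.linalg.eigh(np.cov(X0, rowvar=False))
      order = np.argsort(eigval)[::-1]
      eigval, eigvec = eigval[order], eigvec[:, order]

      def pc_scores(X):
          return (X - mean) @ eigvec          # project onto the principal components

      limits = 3 * np.sqrt(eigval)            # 3-sigma limits per component score

      # phase II: new observations with a shifted mean in the original variables
      X1 = rng.multivariate_normal([1.0, 1.0], cov, size=20)
      scores = pc_scores(X1)
      alarms = np.any(np.abs(scores) > limits, axis=1)
      print("signalling observations:", np.where(alarms)[0])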

  13. Corporate failure: a non parametric method

    Directory of Open Access Journals (Sweden)

    Ben Jabeur Sami

    2013-07-01

    Full Text Available A number of authors have suggested that macroeconomic factors affect the incidence of financial distress and, subsequently, corporate failure. However, macroeconomic factors rarely, if ever, appear as variables in predictive models that seek to identify distress and failure; modellers generally suggest that the impact of macroeconomic factors has already been taken into account by financial ratio variables. This article presents a systematic study of this domain by examining the link between corporate failure and macroeconomic factors for French companies, in order to identify the most important variables and to estimate their utility in a predictive context. The results of the study suggest that several macroeconomic variables are strongly associated with failure and have predictive value, clarifying the relation between financial distress and failure.

  14. Parametric and Non-Parametric System Modelling

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg

    1999-01-01

    A paper on neural networks is included; in this paper, neural networks are used for predicting the electricity production of a wind farm, and the results are compared with results obtained using an adaptively estimated ARX-model. Finally, two papers on stochastic differential equations are included. In the first paper, among other aspects, the properties of a method for parameter estimation in stochastic differential equations are considered within the field of heat dynamics of buildings. In the second paper a lack-of-fit test for stochastic differential equations is presented; the test can be applied to both linear and non-linear stochastic differential equations. Some applications are presented in the papers, and in the summary report references are made to a number of other applications. (Danish summary: the present thesis consists of ten papers published in the period 1996-1999, together with a summary and a perspective on them.)

  15. Non-Parametric Model Drift Detection

    Science.gov (United States)

    2016-07-01

    Experiments took place on datasets made up of text documents. The difference between datasets is used to estimate the potential error (drop in accuracy) that the model...

  16. Correlated Non-Parametric Latent Feature Models

    CERN Document Server

    Doshi-Velez, Finale

    2012-01-01

    We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in a dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many real-world problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on real-world datasets.

  17. Non Parametric Classification Using Learning Vector Quantization

    Science.gov (United States)

    1990-08-21

    θ(0) = a. Then for every finite T and γ > 0, lim P{ sup_{t_n ≤ T} |θ_n − θ(t_n)| > γ } = 0. (2.18) This result is proved in Section 2.3. The second...

  18. Two Dimensions Are Not Better than One: STREAK and the Univariate Signal Detection Model of Remember/Know Performance

    Science.gov (United States)

    Starns, Jeffrey J.; Ratcliff, Roger

    2008-01-01

    We evaluated STREAK and the univariate signal detection model of Remember/Know (RK) judgments in terms of their ability to fit empirical data and produce psychologically meaningful parameter estimates. Participants studied pairs of words and completed item recognition tests with RK judgments as well as associative recognition tests. Fits to the RK…

  19. Statistical analyses for NANOGrav 5-year timing residuals

    Science.gov (United States)

    Wang, Yan; Cordes, James M.; Jenet, Fredrick A.; Chatterjee, Shami; Demorest, Paul B.; Dolch, Timothy; Ellis, Justin A.; Lam, Michael T.; Madison, Dustin R.; McLaughlin, Maura A.; Perrodin, Delphine; Rankin, Joanna; Siemens, Xavier; Vallisneri, Michele

    2017-02-01

    In pulsar timing, timing residuals are the differences between the observed times of arrival and predictions from the timing model. A comprehensive timing model will produce featureless residuals, which are presumably composed of dominating noise and weak physical effects excluded from the timing model (e.g. gravitational waves). In order to apply optimal statistical methods for detecting weak gravitational wave signals, we need to know the statistical properties of noise components in the residuals. In this paper we utilize a variety of non-parametric statistical tests to analyze the whiteness and Gaussianity of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) 5-year timing data, which are obtained from Arecibo Observatory and Green Bank Telescope from 2005 to 2010. We find that most of the data are consistent with white noise; many data deviate from Gaussianity at different levels, nevertheless, removing outliers in some pulsars will mitigate the deviations.
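
    The kind of whiteness and Gaussianity checks described can be sketched with scipy and statsmodels as below: a Ljung-Box test for residual autocorrelation and Shapiro-Wilk and Kolmogorov-Smirnov tests for normality. These particular tests and the simulated residuals are illustrative assumptions; they are not necessarily the exact battery of non-parametric tests used in the paper.

      import numpy as np
      from scipy import stats
      from statsmodels.stats.diagnostic import acorr_ljungbox

      rng = np.random.default_rng(2)
      residuals = rng.normal(0, 1e-7, 300)        # toy timing residuals (seconds)

      # whiteness: Ljung-Box test for residual autocorrelation
      lb = acorr_ljungbox(residuals, lags=[10, 20], return_df=True)
      print(lb)

      # Gaussianity: Shapiro-Wilk and a KS test against a fitted normal
      # (note: estimating the mean and std affects the exact KS null distribution)
      z = (residuals - residuals.mean()) / residuals.std(ddof=1)
      print("Shapiro-Wilk:", stats.shapiro(z))
      print("KS vs N(0,1):", stats.kstest(z, "norm"))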

  20. Statistical Analyses for NANOGrav 5-year Timing Residuals

    CERN Document Server

    Wang, Y; Jenet, F A; Chatterjee, S; Demorest, P B; Dolch, T; Ellis, J A; Lam, M T; Madison, D R; McLaughlin, M; Perrodin, D; Rankin, J; Siemens, X; Vallisneri, M

    2016-01-01

    In pulsar timing, timing residuals are the differences between the observed times of arrival and the predictions from the timing model. A comprehensive timing model will produce featureless residuals, which are presumably composed of dominating noise and weak physical effects excluded from the timing model (e.g. gravitational waves). In order to apply the optimal statistical methods for detecting the weak gravitational wave signals, we need to know the statistical properties of the noise components in the residuals. In this paper we utilize a variety of non-parametric statistical tests to analyze the whiteness and Gaussianity of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) 5-year timing data which are obtained from the Arecibo Observatory and the Green Bank Telescope from 2005 to 2010 (Demorest et al. 2013). We find that most of the data are consistent with white noise; Many data deviate from Gaussianity at different levels, nevertheless, removing outliers in some pulsars will m...

  1. Univariate and multivariate models of positive and negative networks : Liking, disliking, and bully-victim relationships

    NARCIS (Netherlands)

    Huitsing, Gijs; van Duijn, Marijtje; Snijders, Thomas; Wang, P.; Sainio, Miia; Salmivalli, Christina; Veenstra, René

    2012-01-01

    Three relations between elementary school children were investigated: networks of general dislike and bullying were related to networks of general like. These were modeled using multivariate cross-sectional (statistical) network models. Exponential random graph models for a sample of 18 classrooms,

  2. Basic elements of computational statistics

    CERN Document Server

    Härdle, Wolfgang Karl; Okhrin, Yarema

    2017-01-01

    This textbook on computational statistics presents tools and concepts of univariate and multivariate statistical data analysis with a strong focus on applications and implementations in the statistical software R. It covers mathematical, statistical as well as programming problems in computational statistics and contains a wide variety of practical examples. In addition to the numerous R snippets presented in the text, all computer programs (quantlets) and data sets for the book are available on GitHub and referred to in the book. This enables the reader to fully reproduce as well as modify and adjust all examples to their needs. The book is intended for advanced undergraduate and first-year graduate students as well as for data analysts new to the job who would like a tour of the various statistical tools in a data analysis workshop. The experienced reader with a good knowledge of statistics and programming might skip some sections on univariate models and enjoy the various mathematical roots of multivariate ...

  3. Comparison of Three Statistical Classification Techniques for Maser Identification

    Science.gov (United States)

    Manning, Ellen M.; Holland, Barbara R.; Ellingsen, Simon P.; Breen, Shari L.; Chen, Xi; Humphries, Melissa

    2016-04-01

    We applied three statistical classification techniques, linear discriminant analysis (LDA), logistic regression, and random forests, to three astronomical datasets associated with searches for interstellar masers. We compared the performance of these methods in identifying whether specific mid-infrared or millimetre continuum sources are likely to have associated interstellar masers. We also discuss the interpretability of the results of each classification technique. Non-parametric methods have the potential to make accurate predictions when there are complex relationships between critical parameters. We found that for the small datasets the parametric methods, logistic regression and LDA, performed best; for the largest dataset the non-parametric method, random forests, performed with accuracy comparable to the parametric techniques rather than offering any significant improvement. This suggests that, at least for the specific examples investigated here, the accuracy of the predictions obtained is not being limited by the use of parametric models. We also found that for LDA, transformation of the data to match a normal distribution led to a significant improvement in accuracy. The different classification techniques had significant overlap in their predictions; further astronomical observations will enable the accuracy of these predictions to be tested.
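
    A minimal scikit-learn sketch of such a comparison is given below, cross-validating the three classifiers on a synthetic, class-imbalanced dataset standing in for a catalogue of continuum sources with and without associated masers. The dataset, class weights and hyperparameters are illustrative assumptions.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.linear_model import LogisticRegression
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      # stand-in for a catalogue of continuum sources with/without associated masers
      X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                                 weights=[0.8, 0.2], random_state=0)

      models = {
          "LDA": LinearDiscriminantAnalysis(),
          "Logistic regression": LogisticRegression(max_iter=1000),
          "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
      }

      for name, model in models.items():
          scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
          print(f"{name:20s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")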

  4. On a New Class of Univariate Continuous Distributions that are Closed Under Inversion

    Directory of Open Access Journals (Sweden)

    Saleha Naghmi Habibullah

    2006-07-01

    Full Text Available Inverted probability distributions find applications in various real-life situations including econometrics, survey sampling, biological sciences and life-testing. Closure under inversion implies that the reciprocal of a continuous random variable X has the same probability function as the original random variable, allowing for a possible change in parameter values. To date, only a very few probability distributions have been found to possess the closure property. In this paper, an attempt has been made to generate a class of distributions that are closed under inversion, and to develop some statistical properties of this class of distributions.
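    For concreteness (this is the standard change-of-variables statement of the property, not a formula quoted from the paper), if X has density f_X, the density of its reciprocal is

```latex
f_{1/X}(y) \;=\; \frac{1}{y^{2}}\, f_X\!\left(\frac{1}{y}\right), \qquad y \neq 0,
```

    and closure under inversion requires that $f_{1/X}(\cdot\,;\theta)$ coincide with $f_X(\cdot\,;\theta')$ for some (possibly different) parameter value $\theta'$ within the same family.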

  5. Ecotoxicology is not normal: A comparison of statistical approaches for analysis of count and proportion data in ecotoxicology.

    Science.gov (United States)

    Szöcs, Eduard; Schäfer, Ralf B

    2015-09-01

    Ecotoxicologists often encounter count and proportion data that are rarely normally distributed. To meet the assumptions of the linear model, such data are usually transformed, or non-parametric methods are used if the transformed data still violate the assumptions. Generalized linear models (GLMs) allow such data to be modeled directly, without the need for transformation. Here, we compare the performance of two parametric methods, i.e., (1) the linear model (assuming normality of transformed data) and (2) GLMs (assuming a Poisson, negative binomial, or binomially distributed response), with (3) non-parametric methods. We simulated typical data mimicking low replicated ecotoxicological experiments of two common data types (counts and proportions from counts). We compared the performance of the different methods in terms of statistical power and Type I error for detecting a general treatment effect and determining the lowest observed effect concentration (LOEC). In addition, we outlined differences on a real-world mesocosm data set. For count data, we found that the quasi-Poisson model yielded the highest power. The negative binomial GLM resulted in increased Type I errors, which could be fixed using the parametric bootstrap. For proportions, binomial GLMs performed better than the linear model, except to determine LOEC at extremely low sample sizes. The compared non-parametric methods had generally lower power. We recommend that counts in one-factorial experiments should be analyzed using quasi-Poisson models and proportions from counts by binomial GLMs. These methods should become standard in ecotoxicology.
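    The recommended quasi-Poisson analysis can be sketched in Python as follows. statsmodels has no separate quasi-Poisson family; the usual route, assumed here, is to fit a Poisson GLM and estimate the dispersion from the Pearson chi-square via fit(scale='X2'). The design and counts are simulated placeholders, not the study's data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
# Simulated low-replicate ecotox design: 4 treatment groups, 5 replicates each
groups = np.repeat([0, 1, 2, 3], 5)
mu = np.array([20.0, 18.0, 12.0, 6.0])[groups]            # abundance decreasing with dose
counts = rng.negative_binomial(n=5, p=5.0 / (5.0 + mu))   # overdispersed counts

# Dummy-coded treatment effects relative to the control group (group 0)
dummies = np.column_stack([(groups == g).astype(float) for g in (1, 2, 3)])
X = sm.add_constant(dummies)

model = sm.GLM(counts, X, family=sm.families.Poisson())
# scale='X2' estimates the dispersion from the Pearson chi-square statistic,
# which yields quasi-Poisson-style standard errors
result = model.fit(scale='X2')
print(result.summary())
```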

  6. Injury Statistics

    Science.gov (United States)


  7. Cosmic Statistics of Statistics

    OpenAIRE

    Szapudi, I.; Colombi, S.; Bernardeau, F.

    1999-01-01

    The errors on statistics measured in finite galaxy catalogs are exhaustively investigated. The theory of errors on factorial moments by Szapudi & Colombi (1996) is applied to cumulants via a series expansion method. All results are subsequently extended to the weakly non-linear regime. Together with previous investigations this yields an analytic theory of the errors for moments and connected moments of counts in cells from highly nonlinear to weakly nonlinear scales. The final analytic formu...

  8. An updated weight of evidence approach to the aquatic hazard assessment of Bisphenol A and the derivation a new predicted no effect concentration (Pnec) using a non-parametric methodology.

    Science.gov (United States)

    Wright-Walters, Maxine; Volz, Conrad; Talbott, Evelyn; Davis, Devra

    2011-01-15

    An aquatic hazard assessment establishes a derived predicted no effect concentration (PNEC) below which it is assumed that aquatic organisms will not suffer adverse effects from exposure to a chemical. An aquatic hazard assessment of the endocrine disruptor Bisphenol A [BPA; 2, 2-bis (4-hydroxyphenyl) propane] was conducted using a weight of evidence approach, using the ecotoxicological endpoints of survival, growth and development and reproduction. New evidence has emerged that suggests that the aquatic system may not be sufficiently protected from adverse effects of BPA exposure at the current PNEC value of 100 μg/L. It is with this background that: 1) an aquatic hazard assessment for BPA was conducted using a weight of evidence approach; 2) a PNEC value was derived using a non-parametric hazardous concentration for 5% of species (HC5) approach; and 3) the derived BPA hazard assessment values were compared to aquatic environmental concentrations for BPA to determine sufficient protectiveness from BPA exposure for aquatic species. A total of 61 studies yielded 94 no observed effect concentrations (NOECs) and a toxicity dataset which suggests that the aquatic effects of mortality, growth and development and reproduction are most likely to occur between the concentrations of 0.0483 μg/L and 2280 μg/L. This finding is within the range for aquatic adverse estrogenic effects reported in the literature. A PNEC of 0.06 μg/L was calculated; the 95% confidence interval was found to be (0.02, 3.40) μg/L. Thus, using the weight of evidence approach based on repeated measurements of these endpoints, the results indicate that currently observed BPA concentrations in surface waters exceed this newly derived PNEC value of 0.06 μg/L. This indicates that some aquatic receptors may be at risk for adverse effects on survival, growth and development and reproduction from BPA exposure at environmentally relevant concentrations. Copyright © 2010 Elsevier B.V. All rights reserved.
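    A non-parametric HC5 of this kind is, at its core, a lower percentile of the species sensitivity distribution built from NOEC values. The sketch below, which uses invented NOEC values rather than the study's 94-value dataset, computes the empirical 5th percentile and a bootstrap confidence interval:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical NOEC values in ug/L (placeholders, not the study's dataset)
noec = np.array([0.05, 0.1, 0.5, 1.2, 3.4, 8.0, 15.0, 40.0, 120.0, 500.0, 2280.0])

def hc5(values):
    """Non-parametric HC5: the empirical 5th percentile of the NOEC distribution."""
    return np.percentile(values, 5)

point = hc5(noec)
boot = np.array([hc5(rng.choice(noec, size=noec.size, replace=True)) for _ in range(10000)])
ci = np.percentile(boot, [2.5, 97.5])   # 95% bootstrap confidence interval
print(f"HC5 = {point:.3f} ug/L, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}) ug/L")
```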

  9. Statistical significance of trends in monthly heavy precipitation over the US

    KAUST Repository

    Mahajan, Salil

    2011-05-11

    Trends in monthly heavy precipitation, defined by a return period of one year, are assessed for statistical significance in observations and Global Climate Model (GCM) simulations over the contiguous United States using Monte Carlo non-parametric and parametric bootstrapping techniques. The results from the two Monte Carlo approaches are found to be similar to each other, and also to the traditional non-parametric Kendall's τ test, implying the robustness of the approach. Two different observational data-sets are employed to test for trends in monthly heavy precipitation and are found to exhibit consistent results. Both data-sets demonstrate upward trends, one of which is found to be statistically significant at the 95% confidence level. Upward trends similar to observations are observed in some climate model simulations of the twentieth century, but their statistical significance is marginal. For projections of the twenty-first century, a statistically significant upward trend is observed in most of the climate models analyzed. The change in the simulated precipitation variance appears to be more important in the twenty-first century projections than changes in the mean precipitation. Stochastic fluctuations of the climate system are found to dominate monthly heavy precipitation, as some GCM simulations show a downward trend even in the twenty-first century projections when the greenhouse gas forcings are strong. © 2011 Springer-Verlag.
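    A bare-bones resampling version of such a trend significance test, shown here as a permutation variant on a synthetic series rather than the paper's exact bootstrap procedure or its GCM output, builds a null distribution for the fitted slope from shuffled residuals:

```python
import numpy as np

def resampled_trend_test(y, n_resamples=5000, seed=0):
    """Test the least-squares trend slope of series y against a resampled null."""
    rng = np.random.default_rng(seed)
    t = np.arange(y.size)
    coeffs = np.polyfit(t, y, 1)
    slope = coeffs[0]
    residuals = y - np.polyval(coeffs, t)                       # detrended series
    null = np.array([np.polyfit(t, rng.permutation(residuals), 1)[0]
                     for _ in range(n_resamples)])
    p_value = np.mean(np.abs(null) >= abs(slope))               # two-sided p-value
    return slope, p_value

rng = np.random.default_rng(1)
# Synthetic "monthly heavy precipitation" series with a weak imposed trend
series = rng.gumbel(loc=50, scale=10, size=100) + 0.08 * np.arange(100)
print(resampled_trend_test(series))
```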

  10. The analysis of variance in anaesthetic research: statistics, biography and history.

    Science.gov (United States)

    Pandit, J J

    2010-12-01

    Multiple t-tests (or their non-parametric equivalents) are often used erroneously to compare the means of three or more groups in anaesthetic research. Methods for correcting the p value regarded as significant can be applied to take account of multiple testing, but these are somewhat arbitrary and do not avoid several unwieldy calculations. The appropriate method for most such comparisons is the 'analysis of variance', which not only economises on the number of statistical procedures but also indicates whether underlying factors or sub-groups have contributed to any significant results. This article outlines the history, rationale and method of this analysis.

  11. Probability theory for 3-layer remote sensing radiative transfer model: univariate case.

    Science.gov (United States)

    Ben-David, Avishai; Davidson, Charles E

    2012-04-23

    A probability model for a 3-layer radiative transfer model (foreground layer, cloud layer, background layer, and an external source at the end of the line of sight) has been developed. The 3-layer model is fundamentally important as the primary physical model in passive infrared remote sensing. The probability model is described by the Johnson family of distributions that are used as a fit for theoretically computed moments of the radiative transfer model. From the Johnson family we use the SU distribution that can address a wide range of skewness and kurtosis values (in addition to addressing the first two moments, mean and variance). In the limit, SU can also describe lognormal and normal distributions. With the probability model one can evaluate the potential for detecting a target (vapor cloud layer), the probability of observing thermal contrast, and evaluate performance (receiver operating characteristic curves) in clutter-noise limited scenarios. This is (to our knowledge) the first probability model for the 3-layer remote sensing geometry that treats all parameters as random variables and includes higher-order statistics.
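    SciPy exposes the Johnson SU family directly, so a fit of this kind can be sketched as follows. The simulated sample and the use of scipy.stats.johnsonsu's maximum-likelihood fit (rather than the authors' moment-matching procedure) are assumptions made purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated skewed, heavy-tailed "radiance" sample standing in for the 3-layer model output
sample = rng.lognormal(mean=0.0, sigma=0.6, size=2000)

# Maximum-likelihood fit of the Johnson SU distribution (shape parameters a, b; location; scale)
a, b, loc, scale = stats.johnsonsu.fit(sample)
fitted = stats.johnsonsu(a, b, loc=loc, scale=scale)

# Compare a few moments of the data and the fitted distribution
print("data mean/var/skew:", sample.mean(), sample.var(), stats.skew(sample))
print("fit  mean/var/skew:", fitted.mean(), fitted.var(), fitted.stats(moments='s'))
```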

  12. Comparing two methods of univariate discriminant analysis for sex discrimination in an Iberian population.

    Science.gov (United States)

    Jiménez-Arenas, Juan Manuel; Esquivel, José Antonio

    2013-05-10

    This study assesses the performance of two analytical approaches to sex discrimination based on single linear variables: discriminant analysis and the Lubischew's test. Ninety individuals from an archaeological population (La Torrecilla-Arenas del Rey, Granada, southern Spain) and 17 craniometrical variables were included in the analyses. Most craniometrical variables were higher for men. The bizygomatic breadth enabled the highest level of discrimination: 87.5% and 88.5%, using discriminant analysis and Lubischew's test, respectively. Bizygomatic breadth proved highly dimorphic in comparison to other populations reported in the literature. Lubischew's test raised the discrimination percentage in specific craniometrical variables, while others showed a superior performance by means of the discriminant analysis. The inconsistent results across statistical methods resulted from the specific formulation of each procedure. Discriminant analysis accounts both for within-group and between-group variance, while Lubischew's test emphasizes between-group variation only. Therefore, both techniques are recommended, as they provide different means of achieving optimal discrimination percentages.

  13. Establishment of a non-parametric probabilistic model for evaluation of Chinese dietary exposure

    Institute of Scientific and Technical Information of China (English)

    孙金芳; 刘沛; 陈炳为; 陈启光; 余小金; 王灿楠; 李靖欣

    2010-01-01

    Objective: To improve assessment accuracy and align with international food safety risk assessment techniques by establishing a non-parametric probabilistic model for Chinese dietary exposure assessment. Methods: Empirical distributions of dietary consumption and chemical contaminant concentrations were built from Chinese dietary survey data, contaminant monitoring data and the corresponding demographic information; the variability and uncertainty of population dietary exposure were obtained by Monte Carlo simulation and bootstrap sampling. Consumption and demographic data came from the 2002 national diet and nutrition survey, collected by the 24-hour recall method over three consecutive days from 22 567 households and 66 172 individuals, amounting to 193 814 person-days and 1 810 703 records. Contamination data came from the national food contamination monitoring program covering 14 provinces or regions during 2000-2006, together with 2005-2006 customs monitoring of exported agricultural products, comprising 135 contaminants (heavy metals, pesticides and mycotoxins such as aflatoxin) across 499 commodities and 487 819 samples. Results: Non-parametric probabilistic dietary exposure assessment models covering heavy metals, pesticides and several toxins were established for the Chinese population, yielding summary statistics and 95% confidence intervals for the exposure distributions of the different contaminants. For acephate exposure in children aged 7-10 years, the median dietary exposure was 1.77 μg·kg⁻¹·d⁻¹ for urban children and 2.48 μg·kg⁻¹·d⁻¹ for rural children, with 95% confidence intervals of (1.59-2.06) μg·kg⁻¹·d⁻¹ and (2.33-2.80) μg·kg⁻¹·d⁻¹, respectively. Conclusion: The constructed non-parametric probabilistic model can quantify the variability and uncertainty in exposure assessment and improves the accuracy of dietary exposure assessment.
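    The core of such a non-parametric exposure model is to resample consumption and concentration values from their empirical distributions, multiply them, and then bootstrap the whole procedure to separate variability from uncertainty. A schematic sketch with made-up arrays (not the survey or monitoring data) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical empirical data (placeholders for the survey and monitoring records)
consumption = rng.lognormal(0.5, 0.8, size=2000)    # food intake, g per kg body weight per day
concentration = rng.lognormal(-3.0, 1.0, size=800)  # contaminant level, ug per g food

def mc_exposure(cons, conc, rng, n_sim=20000):
    """Monte Carlo variability: resample intake and concentration and multiply."""
    c = rng.choice(cons, n_sim, replace=True)
    k = rng.choice(conc, n_sim, replace=True)
    return c * k                                     # exposure, ug per kg body weight per day

# Point estimate of the median exposure
median_exposure = np.median(mc_exposure(consumption, concentration, rng))

# Bootstrap uncertainty: redo the Monte Carlo step on resampled input data sets
boot_medians = np.array([
    np.median(mc_exposure(rng.choice(consumption, consumption.size, replace=True),
                          rng.choice(concentration, concentration.size, replace=True), rng))
    for _ in range(500)
])
ci = np.percentile(boot_medians, [2.5, 97.5])
print(f"median exposure = {median_exposure:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```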

  14. Stochastic univariate and multivariate time series analysis of PM2.5 and PM10 air pollution: A comparative case study for Plovdiv and Asenovgrad, Bulgaria

    Science.gov (United States)

    Gocheva-Ilieva, S.; Stoimenova, M.; Ivanov, A.; Voynikova, D.; Iliev, I.

    2016-10-01

    Fine particulate matter PM2.5 and PM10 air pollutants are a serious problem in many urban areas, affecting both the health of the population and the environment as a whole. The availability of large data arrays for the levels of these pollutants makes it possible to perform statistical analysis, to obtain relevant information, and to find patterns within the data. Research in this field is particularly topical for a number of cities in Bulgaria, a European country where regulatory air pollution health limits have been constantly exceeded in recent years. This paper examines average daily data for air pollution with PM2.5 and PM10, collected by 3 monitoring stations in the cities of Plovdiv and Asenovgrad between 2011 and 2016. The goal is to find and analyze actual relationships in data time series, to build adequate mathematical models, and to develop short-term forecasts. Modeling is carried out by stochastic univariate and multivariate time series analysis, based on Box-Jenkins methodology. The best models are selected following initial transformation of the data and using a set of standard and robust statistical criteria. The Mathematica and SPSS software were used to perform calculations. This examination showed that measured concentrations of PM2.5 and PM10 in the region of Plovdiv and Asenovgrad regularly exceed permissible European and national health and safety thresholds. We obtained adequate stochastic models with a high statistical fit to the data and good forecasting quality when compared against actual measurements. The mathematical approach applied provides an independent alternative to standard official monitoring and control means for air pollution in urban areas.
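    Box-Jenkins modelling of a daily pollutant series of this kind can be sketched with statsmodels; the example below fits a low-order ARIMA to a synthetic PM10-like series and produces a short-term forecast. The data and the (1, 0, 1) order are illustrative assumptions, not the models selected in the paper.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Synthetic daily PM10-like series: smoothed noise around a mean level of 40
noise = rng.normal(size=400)
series = pd.Series(40 + 10 * np.convolve(noise, [0.6, 0.3, 0.1], mode='same'),
                   index=pd.date_range("2015-01-01", periods=400, freq="D"))

model = ARIMA(series, order=(1, 0, 1))   # ARMA(1,1); order chosen for illustration only
result = model.fit()
print(result.summary())
print(result.forecast(steps=7))          # one-week-ahead forecast
```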

  15. SPECTRAL ANALYSIS OF SEA WAVES: UNIVARIATE MODELS

    Directory of Open Access Journals (Sweden)

    Nestor Escudero Mora

    2015-12-01

    periods of 20 minutes, for a total of 244 studied periods. It was thus determined that part of the record adds no information to the problem and represents noise, amounting to 3% of the energy, which was not used in the study. The spectrum was divided into 12 subintervals. Autoregressive models were fitted following the Box-Jenkins and Reinsel scheme, and models were refined until the best possible approximation for each subinterval was found, judged by the statistical properties of each model together with the respective forecasts.

  16. Statistical Test for Bivariate Uniformity

    Directory of Open Access Journals (Sweden)

    Zhenmin Chen

    2014-01-01

    Full Text Available The purpose of the multidimensional uniformity test is to check whether the underlying probability distribution of a multidimensional population differs from the multidimensional uniform distribution. The multidimensional uniformity test has applications in various fields such as biology, astronomy, and computer science. Such a test, however, has received less attention in the literature compared with the univariate case. A new test statistic for checking multidimensional uniformity is proposed in this paper. Some important properties of the proposed test statistic are discussed. As a special case, the bivariate test statistic is discussed in detail in this paper. The Monte Carlo simulation is used to compare the power of the newly proposed test with the distance-to-boundary test, which is a recently published statistical test for multidimensional uniformity. It has been shown that the test proposed in this paper is more powerful than the distance-to-boundary test in some cases.

  17. SOCR Analyses: Implementation and Demonstration of a New Graphical Statistics Educational Toolkit

    Directory of Open Access Journals (Sweden)

    Annie Chu

    2009-04-01

    Full Text Available The web-based, Java-written SOCR (Statistical Online Computational Resource) tools have been utilized in many undergraduate and graduate level statistics courses for seven years now (Dinov 2006; Dinov et al. 2008b). It has been proven that these resources can successfully improve students' learning (Dinov et al. 2008b). First published online in 2005, SOCR Analyses is a somewhat new component and it concentrates on data modeling for both parametric and non-parametric data analyses with graphical model diagnostics. One of the main purposes of SOCR Analyses is to facilitate statistical learning for high school and undergraduate students. As we have already implemented SOCR Distributions and Experiments, SOCR Analyses and Charts fulfill the rest of a standard statistics curriculum. Currently, there are four core components of SOCR Analyses. Linear models included in SOCR Analyses are simple linear regression, multiple linear regression, one-way and two-way ANOVA. Tests for sample comparisons include the t-test in the parametric category. Some examples of SOCR Analyses in the non-parametric category are the Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, Kolmogorov-Smirnov test and Fligner-Killeen test. Hypothesis testing models include the contingency table, Friedman's test and Fisher's exact test. The last component of Analyses is a utility for computing sample sizes for the normal distribution. In this article, we present the design framework, computational implementation and the utilization of SOCR Analyses.

  18. Choosing the best non-parametric richness estimator for benthic macroinvertebrate databases

    Directory of Open Access Journals (Sweden)

    Carola V. Basualdo

    2011-06-01

    Full Text Available Non-parametric estimators allow comparison of richness estimates among data sets from heterogeneous sources. However, since estimator performance depends on the species-abundance distribution of the sample, preference for one or another is a difficult issue. The present study recovers and revalues some criteria already present in the literature in order to choose the most suitable estimator for stream macroinvertebrates, and provides some tools to apply them. Two abundance and four incidence estimators were applied to a regional database at family and genus level. They were evaluated under four criteria: sub-sample size required to estimate the observed richness; constancy of the sub-sample size; lack of erratic behavior; and similarity in curve shape across different data sets. Among the incidence estimators, Jack1 had the best performance. Among the abundance estimators, ACE was the best when the observed richness was small and Chao1 when the observed richness was high. The uniformity of curve shapes allowed describing general sequences of curve behavior that could act as references for comparing estimates from small databases and for inferring the likely behavior of the curve (i.e., the expected richness if the sample were larger). These results can be very useful for environmental management, and update the state of knowledge of regional macroinvertebrates.
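    For reference, the two families of estimators compared in the abstract can be computed from simple data structures; the sketch below implements Chao1 (abundance-based) and first-order jackknife (incidence-based), with example data invented purely for illustration:

```python
import numpy as np

def chao1(abundances):
    """Chao1 richness estimator from a vector of species abundances."""
    a = np.asarray(abundances)
    s_obs = np.sum(a > 0)
    f1 = np.sum(a == 1)          # singletons
    f2 = np.sum(a == 2)          # doubletons
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2.0   # bias-corrected form when no doubletons
    return s_obs + f1 ** 2 / (2.0 * f2)

def jackknife1(incidence):
    """First-order jackknife from a sites x species presence/absence matrix."""
    m = np.asarray(incidence, dtype=bool)
    n_sites = m.shape[0]
    s_obs = np.sum(m.any(axis=0))
    q1 = np.sum(m.sum(axis=0) == 1)          # species found in exactly one site
    return s_obs + q1 * (n_sites - 1) / n_sites

abund = [12, 5, 3, 1, 1, 1, 2, 2, 7, 1]                       # invented abundances
sites = np.random.default_rng(0).random((8, 15)) < 0.2         # invented incidence matrix
print("Chao1:", chao1(abund), " Jackknife1:", jackknife1(sites))
```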

  19. Suppressing the charged coupled device noise in univariate thin-layer videoscans: a comparison of several algorithms.

    Science.gov (United States)

    Komsta, Lukasz

    2009-03-20

    The digital processing of chromatographic thin-layer plate images has gained increasing popularity in recent years. When using a camera instead of a flatbed scanner, the charge-coupled device (CCD) noise is a well-known problem, especially when scanning dark plates with weakly fluorescing spots. Various techniques are proposed to denoise (smooth) univariate signals in chemometric processing, but the choice can be difficult. In the current paper the classical filters (Savitzky-Golay, adaptive degree polynomial filter, Fourier denoising, Butterworth and Chebyshev infinite impulse response filters) were compared with wavelet shrinkage (31 mother wavelets, 3 thresholding techniques and 8 decomposition levels). The signal obtained from 256 averaged videoscans was treated as the reference signal (with noise naturally suppressed, and found to be almost white). The best choice for denoising was the Haar mother wavelet with soft thresholding and any decomposition level larger than 1. Satisfactory similarity to the reference signal was also observed in the case of the Butterworth filter, Savitzky-Golay smoothing, the ADPF filter, Fourier denoising and soft-thresholded wavelet shrinkage with any mother wavelet and a middle to high decomposition level. The Chebyshev filters, the Whittaker smoother and wavelet shrinkage with hard thresholding were found to be less efficient. The results obtained can be used as general recommendations for univariate denoising of such signals.
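    The winning approach (Haar wavelet, soft thresholding) can be sketched with PyWavelets. The noisy signal, the universal threshold and the decomposition level below are illustrative choices, not taken from the paper:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 1024)
clean = np.exp(-((x - 0.3) / 0.02) ** 2) + 0.6 * np.exp(-((x - 0.7) / 0.03) ** 2)  # two "spots"
noisy = clean + rng.normal(scale=0.05, size=x.size)                                 # CCD-like noise

# Haar wavelet decomposition, soft thresholding of detail coefficients, reconstruction
coeffs = pywt.wavedec(noisy, 'haar', level=4)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745               # robust noise estimate (MAD)
threshold = sigma * np.sqrt(2 * np.log(noisy.size))          # universal threshold
denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode='soft') for c in coeffs[1:]]
denoised = pywt.waverec(denoised_coeffs, 'haar')

print("RMS error, noisy   :", np.sqrt(np.mean((noisy - clean) ** 2)))
print("RMS error, denoised:", np.sqrt(np.mean((denoised[:clean.size] - clean) ** 2)))
```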

  20. Simultaneous Segmentation and Statistical Label Fusion.

    Science.gov (United States)

    Asman, Andrew J; Landman, Bennett A

    2012-02-23

    Labeling or segmentation of structures of interest in medical imaging plays an essential role in both clinical and scientific understanding. Two of the common techniques to obtain these labels are through either fully automated segmentation or through multi-atlas based segmentation and label fusion. Fully automated techniques often result in highly accurate segmentations but lack the robustness to be viable in many cases. On the other hand, label fusion techniques are often extremely robust, but lack the accuracy of automated algorithms for specific classes of problems. Herein, we propose to perform simultaneous automated segmentation and statistical label fusion through the reformulation of a generative model to include a linkage structure that explicitly estimates the complex global relationships between labels and intensities. These relationships are inferred from the atlas labels and intensities and applied to the target using a non-parametric approach. The novelty of this approach lies in the combination of previously exclusive techniques and attempts to combine the accuracy benefits of automated segmentation with the robustness of a multi-atlas based approach. The accuracy benefits of this simultaneous approach are assessed using a multi-label multi-atlas whole-brain segmentation experiment and the segmentation of the highly variable thyroid on computed tomography images. The results demonstrate that this technique has major benefits for certain types of problems and has the potential to provide a paradigm shift in which the lines between statistical label fusion and automated segmentation are dramatically blurred.

  1. Algebraic Statistics

    OpenAIRE

    Norén, Patrik

    2013-01-01

    Algebraic statistics brings together ideas from algebraic geometry, commutative algebra, and combinatorics to address problems in statistics and its applications. Computer algebra provides powerful tools for the study of algorithms and software. However, these tools are rarely prepared to address statistical challenges and therefore new algebraic results need often be developed. This way of interplay between algebra and statistics fertilizes both disciplines. Algebraic statistics is a relativ...

  2. The power and statistical behaviour of allele-sharing statistics when applied to models with two disease loci

    Indian Academy of Sciences (India)

    Yin Y. Shugart; Bing-Jian Feng; Andrew Collins

    2002-11-01

    We have evaluated the power for detecting a common trait determined by two loci, using seven statistics, of which five are implemented in the computer program SimWalk2, and two are implemented in GENEHUNTER. Unlike most previous reports, which involve evaluations of the power of allele-sharing statistics for a single disease locus, we have used a simulated data set of general pedigrees in which a two-locus disease is segregating, and evaluated several non-parametric linkage statistics implemented in the two programs. We found that the power for detecting linkage using the $S_{\text{all}}$ statistic in GENEHUNTER (GH, version 2.1), implemented as statistic in SimWalk2 (version 2.82), differs between the two. The values associated with the statistic output by SimWalk2 are consistently more conservative than those from GENEHUNTER, except when the underlying model includes heterogeneity at a level of 50%, where the output values are very comparable. On the other hand, when the thresholds are determined empirically under the null hypothesis, $S_{\text{all}}$ in GENEHUNTER and the statistic have similar power.

  3. Applied multivariate statistics with R

    CERN Document Server

    Zelterman, Daniel

    2015-01-01

    This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...

  4. Guaranteed Bounds on Information-Theoretic Measures of Univariate Mixtures Using Piecewise Log-Sum-Exp Inequalities

    KAUST Repository

    Nielsen, Frank

    2016-12-09

    Information-theoretic measures, such as the entropy, the cross-entropy and the Kullback-Leibler divergence between two mixture models, are core primitives in many signal processing tasks. Since the Kullback-Leibler divergence of mixtures provably does not admit a closed-form formula, it is in practice either estimated using costly Monte Carlo stochastic integration, approximated, or bounded using various techniques. We present a fast and generic method that builds algorithmically closed-form lower and upper bounds on the entropy, the cross-entropy, the Kullback-Leibler and the α-divergences of mixtures. We illustrate the versatile method by reporting our experiments for approximating the Kullback-Leibler and the α-divergences between univariate exponential mixtures, Gaussian mixtures, Rayleigh mixtures and Gamma mixtures.
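    Since the abstract contrasts the proposed bounds with Monte Carlo estimation, a baseline Monte Carlo estimator of the Kullback-Leibler divergence between two univariate Gaussian mixtures (with illustrative mixture parameters, not those of the paper) looks like this:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mixture_logpdf(x, weights, means, sigmas):
    """Log-density of a univariate Gaussian mixture evaluated at points x."""
    comp = np.stack([np.log(w) + norm.logpdf(x, m, s)
                     for w, m, s in zip(weights, means, sigmas)])
    return logsumexp(comp, axis=0)

def kl_mc(p, q, n=200000, seed=0):
    """Monte Carlo estimate of KL(p || q) by sampling from mixture p."""
    rng = np.random.default_rng(seed)
    w, mu, sd = p
    comps = rng.choice(len(w), size=n, p=w)                   # pick mixture components
    x = rng.normal(np.asarray(mu)[comps], np.asarray(sd)[comps])
    return np.mean(mixture_logpdf(x, *p) - mixture_logpdf(x, *q))

p = ([0.3, 0.7], [-1.0, 2.0], [0.5, 1.0])    # illustrative mixture p
q = ([0.5, 0.5], [0.0, 1.5], [0.8, 0.8])     # illustrative mixture q
print("KL(p||q) ~", kl_mc(p, q))
```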

  5. Data quality assurance in monitoring of wastewater quality: Univariate on-line and off-line methods

    DEFF Research Database (Denmark)

    Alferes, J.; Poirier, P.; Lamaire-Chad, C.;

    To make water quality monitoring networks useful for practice, the automation of data collection and data validation still represents an important challenge. Efficient monitoring depends on careful quality control and quality assessment. With a practical orientation, a data quality assurance procedure is presented that combines univariate off-line and on-line methods to assess water quality sensors and to detect and replace doubtful data. While the off-line concept uses control charts for quality control, the on-line methods aim at outlier and fault detection by using autoregressive models. The proposed tools were successfully tested with data sets collected at the inlet of a primary clarifier, where probably the toughest measurement conditions are found in wastewater treatment plants.

  6. Evaluation of standard and advanced preprocessing methods for the univariate analysis of blood serum 1H-NMR spectra.

    Science.gov (United States)

    De Meyer, Tim; Sinnaeve, Davy; Van Gasse, Bjorn; Rietzschel, Ernst-R; De Buyzere, Marc L; Langlois, Michel R; Bekaert, Sofie; Martins, José C; Van Criekinge, Wim

    2010-10-01

    Proton nuclear magnetic resonance ((1)H-NMR)-based metabolomics enables the high-resolution and high-throughput assessment of a broad spectrum of metabolites in biofluids. Despite the straightforward character of the experimental methodology, the analysis of spectral profiles is rather complex, particularly due to the requirement of numerous data preprocessing steps. Here, we evaluate how several of the most common preprocessing procedures affect the subsequent univariate analyses of blood serum spectra, with a particular focus on how the standard methods perform compared to more advanced examples. Carr-Purcell-Meiboom-Gill 1D (1)H spectra were obtained for 240 serum samples from healthy subjects of the Asklepios study. We studied the impact of different preprocessing steps--integral (standard method) and probabilistic quotient normalization; no, equidistant (standard), and adaptive-intelligent binning; mean (standard) and maximum bin intensity data summation--on the resonance intensities of three different types of metabolites: triglycerides, glucose, and creatinine. The effects were evaluated by correlating the differently preprocessed NMR data with the independently measured metabolite concentrations. The analyses revealed that the standard methods performed inferiorly and that a combination of probabilistic quotient normalization after adaptive-intelligent binning and maximum intensity variable definition yielded the best overall results (triglycerides, R = 0.98; glucose, R = 0.76; creatinine, R = 0.70). Therefore, at least in the case of serum metabolomics, these or equivalent methods should be preferred above the standard preprocessing methods, particularly for univariate analyses. Additional optimization of the normalization procedure might further improve the analyses.
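    Probabilistic quotient normalization, which the study found to work best, is straightforward to implement. The sketch below assumes a spectra matrix with one row per sample and one column per bin (synthetic data, not the Asklepios spectra):

```python
import numpy as np

def probabilistic_quotient_normalize(spectra):
    """PQN: integral-normalize, compute quotients against the median spectrum,
    then divide each spectrum by the median of its quotients."""
    x = np.asarray(spectra, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)            # preliminary integral normalization
    reference = np.median(x, axis=0)                # median (reference) spectrum
    quotients = x / reference                       # bin-wise quotients
    dilution = np.median(quotients, axis=1)         # most probable dilution factor per sample
    return x / dilution[:, None]

rng = np.random.default_rng(0)
base = np.abs(rng.normal(1.0, 0.2, size=200))
spectra = np.outer(rng.uniform(0.5, 2.0, size=30), base)   # 30 samples with varying dilution
normalized = probabilistic_quotient_normalize(spectra)
print(normalized.std(axis=0).mean())   # dilution-driven variation is largely removed
```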

  7. Evaluation of statistical tools used in short-term repeated dose administration toxicity studies with rodents.

    Science.gov (United States)

    Kobayashi, Katsumi; Pillai, K Sadasivan; Sakuratani, Yuki; Abe, Takemaru; Kamata, Eiichi; Hayashi, Makoto

    2008-02-01

    To identify the statistical tools used to analyze data from twenty-eight-day repeated dose oral toxicity studies with rodents, and to assess the impact of these tools on the interpretation of the data, study reports of 122 twenty-eight-day repeated dose oral toxicity studies conducted in rats were examined. It was found that both complex and simple decision-tree routes were followed for the analysis of the quantitative data. The tools used include Scheffe's test and non-parametric versions of Dunnett's and Scheffe's tests, which have very low power. Few studies used the non-parametric Dunnett-type test and the Mann-Whitney U test. Though the chi-square and Fisher's tests are widely used for the analysis of qualitative data, their sensitivity to detect a treatment-related effect is questionable; the Mann-Whitney U test has better sensitivity for qualitative data than the chi-square and Fisher's tests. We propose Dunnett's test for the analysis of quantitative data obtained from twenty-eight-day repeated dose oral toxicity studies, and the Mann-Whitney U test for qualitative data. For both, a one-sided test with p=0.05 may be applied.
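    The Mann-Whitney U comparison recommended for graded (qualitative) findings is available directly in SciPy; the sketch below compares invented severity scores for a control and a treated group with a one-sided alternative, mirroring the one-sided p = 0.05 criterion mentioned above:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Invented histopathology severity grades (0 = none ... 3 = marked), 10 animals per group
control = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0])
treated = np.array([0, 1, 1, 2, 1, 0, 2, 1, 3, 1])

# One-sided test: are grades in the treated group stochastically greater than in the control?
stat, p_value = mannwhitneyu(treated, control, alternative='greater')
print(f"U = {stat}, one-sided p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```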

  8. Bayesian statistics

    OpenAIRE

    新家, 健精

    2013-01-01

    © 2012 Springer Science+Business Media, LLC. All rights reserved. Article outline: Glossary; Definition of the Subject and Introduction; The Bayesian Statistical Paradigm; Three Examples; Comparison with the Frequentist Statistical Paradigm; Future Directions; Bibliography.

  9. Statistical Analysis for High-Dimensional Data : The Abel Symposium 2014

    CERN Document Server

    Bühlmann, Peter; Glad, Ingrid; Langaas, Mette; Richardson, Sylvia; Vannucci, Marina

    2016-01-01

    This book features research contributions from The Abel Symposium on Statistical Analysis for High Dimensional Data, held in Nyvågar, Lofoten, Norway, in May 2014. The focus of the symposium was on statistical and machine learning methodologies specifically developed for inference in “big data” situations, with particular reference to genomic applications. The contributors, who are among the most prominent researchers on the theory of statistics for high dimensional inference, present new theories and methods, as well as challenging applications and computational solutions. Specific themes include, among others, variable selection and screening, penalised regression, sparsity, thresholding, low dimensional structures, computational challenges, non-convex situations, learning graphical models, sparse covariance and precision matrices, semi- and non-parametric formulations, multiple testing, classification, factor models, clustering, and preselection. Highlighting cutting-edge research and casting light on...

  10. Harmonic statistics

    Science.gov (United States)

    Eliazar, Iddo

    2017-05-01

    The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their 'public relations' for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford's law, and 1/f noise.

  11. Statistical physics

    CERN Document Server

    Sadovskii, Michael V

    2012-01-01

    This volume provides a compact presentation of modern statistical physics at an advanced level. Beginning with questions on the foundations of statistical mechanics all important aspects of statistical physics are included, such as applications to ideal gases, the theory of quantum liquids and superconductivity and the modern theory of critical phenomena. Beyond that attention is given to new approaches, such as quantum field theory methods and non-equilibrium problems.

  12. Statistical methods

    CERN Document Server

    Szulc, Stefan

    1965-01-01

    Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field.Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then

  13. Statistical optics

    CERN Document Server

    Goodman, Joseph W

    2015-01-01

    This book discusses statistical methods that are useful for treating problems in modern optics, and the application of these methods to solving a variety of such problems This book covers a variety of statistical problems in optics, including both theory and applications.  The text covers the necessary background in statistics, statistical properties of light waves of various types, the theory of partial coherence and its applications, imaging with partially coherent light, atmospheric degradations of images, and noise limitations in the detection of light. New topics have been introduced i

  14. Histoplasmosis Statistics

    Science.gov (United States)

    How common is histoplasmosis? In the United States, an estimated 60% to ...

  15. Statistical distributions

    CERN Document Server

    Forbes, Catherine; Hastings, Nicholas; Peacock, Brian J.

    2010-01-01

    A new edition of the trusted guide on commonly used statistical distributions Fully updated to reflect the latest developments on the topic, Statistical Distributions, Fourth Edition continues to serve as an authoritative guide on the application of statistical methods to research across various disciplines. The book provides a concise presentation of popular statistical distributions along with the necessary knowledge for their successful use in data modeling and analysis. Following a basic introduction, forty popular distributions are outlined in individual chapters that are complete with re

  16. Harmonic statistics

    Energy Technology Data Exchange (ETDEWEB)

    Eliazar, Iddo, E-mail: eliazar@post.tau.ac.il

    2017-05-15

    The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their ‘public relations’ for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford’s law, and 1/f noise. - Highlights: • Harmonic statistics are described and reviewed in detail. • Connections to various statistical laws are established. • Connections to perturbation, renormalization and dynamics are established.

  17. What do differences between multi-voxel and univariate analysis mean? How subject-, voxel-, and trial-level variance impact fMRI analysis.

    Science.gov (United States)

    Davis, Tyler; LaRocque, Karen F; Mumford, Jeanette A; Norman, Kenneth A; Wagner, Anthony D; Poldrack, Russell A

    2014-08-15

    Multi-voxel pattern analysis (MVPA) has led to major changes in how fMRI data are analyzed and interpreted. Many studies now report both MVPA results and results from standard univariate voxel-wise analysis, often with the goal of drawing different conclusions from each. Because MVPA results can be sensitive to latent multidimensional representations and processes whereas univariate voxel-wise analysis cannot, one conclusion that is often drawn when MVPA and univariate results differ is that the activation patterns underlying MVPA results contain a multidimensional code. In the current study, we conducted simulations to formally test this assumption. Our findings reveal that MVPA tests are sensitive to the magnitude of voxel-level variability in the effect of a condition within subjects, even when the same linear relationship is coded in all voxels. We also find that MVPA is insensitive to subject-level variability in mean activation across an ROI, which is the primary variance component of interest in many standard univariate tests. Together, these results illustrate that differences between MVPA and univariate tests do not afford conclusions about the nature or dimensionality of the neural code. Instead, targeted tests of the informational content and/or dimensionality of activation patterns are critical for drawing strong conclusions about the representational codes that are indicated by significant MVPA results.

  18. Scan Statistics

    CERN Document Server

    Glaz, Joseph

    2009-01-01

    Suitable for graduate students and researchers in applied probability and statistics, as well as for scientists in biology, computer science, pharmaceutical science and medicine, this title brings together a collection of chapters illustrating the depth and diversity of theory, methods and applications in the area of scan statistics.

  19. Statistical Diversions

    Science.gov (United States)

    Petocz, Peter; Sowey, Eric

    2008-01-01

    In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…

  20. Practical Statistics

    CERN Document Server

    Lyons, L

    2016-01-01

    Accelerators and detectors are expensive, both in terms of money and human effort. It is thus important to invest effort in performing a good statistical analysis of the data, in order to extract the best information from it. This series of five lectures deals with practical aspects of statistical issues that arise in typical High Energy Physics analyses.

  1. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    Science.gov (United States)

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category; and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most

  2. Predicting rice hybrid performance using univariate and multivariate GBLUP models based on North Carolina mating design II.

    Science.gov (United States)

    Wang, X; Li, L; Yang, Z; Zheng, X; Yu, S; Xu, C; Hu, Z

    2017-03-01

    Genomic selection (GS) is more efficient than traditional phenotype-based methods in hybrid breeding. The present study investigated the predictive ability of genomic best linear unbiased prediction models for rice hybrids based on the North Carolina mating design II, in which a total of 115 inbred rice lines were crossed with 5 male sterile lines. Using 8 traits of the 575 (115 × 5) hybrids from two environments, both univariate (UV) and multivariate (MV) prediction analyses, including additive and dominance effects, were performed. Using UV models, the prediction results of cross-validation indicated that including dominance effects could improve the predictive ability for some traits in rice hybrids. Additionally, we could take advantage of GS even for a low-heritability trait, such as grain yield per plant (GY), because a modest increase in the number of top selections could generate a higher, more stable mean phenotypic value for rice hybrids. Thus this strategy was used to select superior potential crosses between the 115 inbred lines and those between the 5 male sterile lines and other genotyped varieties. In our MV research, an MV model (MV-ADV) was developed utilizing an MV relationship matrix constructed with auxiliary variates. Based on joint analysis with multiple traits (MT) or multiple environments, the prediction results confirmed the superiority of MV-ADV over a UV model, particularly in the MT scenario for a low-heritability target trait (such as GY) with highly correlated auxiliary traits. For a high-heritability trait (such as thousand-grain weight), MT prediction is unnecessary, and UV prediction is sufficient.

  3. Introductory statistics

    CERN Document Server

    Ross, Sheldon M

    2005-01-01

    In this revised text, master expositor Sheldon Ross has produced a unique work in introductory statistics. The text's main merits are the clarity of presentation, contemporary examples and applications from diverse areas, and an explanation of intuition and ideas behind the statistical methods. To quote from the preface, "It is only when a student develops a feel or intuition for statistics that she or he is really on the path toward making sense of data." Ross achieves this goal through a coherent mix of mathematical analysis, intuitive discussions and examples.* Ross's clear writin

  4. Introductory statistics

    CERN Document Server

    Ross, Sheldon M

    2010-01-01

    In this 3rd edition revised text, master expositor Sheldon Ross has produced a unique work in introductory statistics. The text's main merits are the clarity of presentation, contemporary examples and applications from diverse areas, and an explanation of intuition and ideas behind the statistical methods. Concepts are motivated, illustrated and explained in a way that attempts to increase one's intuition. To quote from the preface, "It is only when a student develops a feel or intuition for statistics that she or he is really on the path toward making sense of data." Ross achieves this

  5. Statistics Clinic

    Science.gov (United States)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  6. Statistical physics

    CERN Document Server

    Wannier, Gregory H

    2010-01-01

    Until recently, the field of statistical physics was traditionally taught as three separate subjects: thermodynamics, statistical mechanics, and kinetic theory. This text, a forerunner in its field and now a classic, was the first to recognize the outdated reasons for their separation and to combine the essentials of the three subjects into one unified presentation of thermal physics. It has been widely adopted in graduate and advanced undergraduate courses, and is recommended throughout the field as an indispensable aid to the independent study and research of statistical physics.Designed for

  7. Semiconductor statistics

    CERN Document Server

    Blakemore, J S

    1962-01-01

    Semiconductor Statistics presents statistics aimed at complementing existing books on the relationships between carrier densities and transport effects. The book is divided into two parts. Part I provides introductory material on the electron theory of solids, and then discusses carrier statistics for semiconductors in thermal equilibrium. Of course a solid cannot be in true thermodynamic equilibrium if any electrical current is passed; but when currents are reasonably small the distribution function is but little perturbed, and the carrier distribution for such a "quasi-equilibrium" co

  8. SEER Statistics

    Science.gov (United States)

    The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population.

  9. Cancer Statistics

    Science.gov (United States)


  10. CMS Statistics

    Data.gov (United States)

    U.S. Department of Health & Human Services — The CMS Center for Strategic Planning produces an annual CMS Statistics reference booklet that provides a quick reference for summary information about health...

  11. Reversible Statistics

    DEFF Research Database (Denmark)

    Tryggestad, Kjell

    2004-01-01

    The study aims to describe how the inclusion and exclusion of materials and calculative devices construct the boundaries and distinctions between statistical facts and artifacts in economics. My methodological approach is inspired by John Graunt's (1667) Political arithmetic and more recent work within constructivism and the field of Science and Technology Studies (STS). The result of this approach is here termed reversible statistics, reconstructing the findings of a statistical study within economics in three different ways. It is argued that all three accounts are quite normal, albeit in different ways. The presence and absence of diverse materials, both natural and political, is what distinguishes them from each other. Arguments are presented for a more symmetric relation between the scientific statistical text and the reader. I will argue that a more symmetric relation can be achieved...

  12. Image Statistics

    Energy Technology Data Exchange (ETDEWEB)

    Wendelberger, Laura Jean [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-08-08

    In large datasets, it is time consuming or even impossible to pick out interesting images. Our proposed solution is to find statistics to quantify the information in each image and use those to identify and pick out images of interest.

  13. Accident Statistics

    Data.gov (United States)

    Department of Homeland Security — Accident statistics available on the Coast Guard’s website by state, year, and one variable to obtain tables and/or graphs. Data from reports has been loaded for...

  14. Multiparametric statistics

    CERN Document Server

    Serdobolskii, Vadim Ivanovich

    2007-01-01

    This monograph presents the mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size but possibly much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way for the solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of an infinite sampling often break down in the solution of real problems, and, dependent on data, can be inefficient, unstable and even not applicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...

  15. Rainfall statistics changes in Sicily

    Directory of Open Access Journals (Sweden)

    E. Arnone

    2013-02-01

    Full Text Available Changes in rainfall characteristics are one of the most relevant signs of current climate alterations. Many studies have demonstrated an increase in rainfall intensity and a reduction of frequency in several areas of the world, including Mediterranean areas. Rainfall characteristics may be crucial for vegetation patterns formation and evolution in Mediterranean ecosystems, with important implications, for example, in vegetation water stress or coexistence and competition dynamics. At the same time, characteristics of extreme rainfall events are fundamental for the estimation of flood peaks and quantiles which can be used in many hydrological applications, such as design of the most common hydraulic structures, or planning and management of flood prone areas.

    In the past, Sicily has been screened for several signals of possible climate change. Annual, seasonal and monthly rainfall data in the entire Sicilian region have been analyzed, showing a global reduction of total annual rainfall. Moreover, annual maximum rainfall series for different durations have been rarely analyzed in order to detect the presence of trends. Results indicated that for short durations, historical series generally exhibit increasing trends while for longer durations the trends are mainly negative.

    Starting from these premises, the aim of this study is to investigate and quantify changes in rainfall statistics in Sicily during the second half of the last century. Time series of about 60 stations over the region have been processed and screened by using the non-parametric Mann–Kendall test.

    Particularly, extreme events have been analyzed using annual maximum rainfall series at 1, 3, 6, 12 and 24 h duration while daily rainfall properties have been analyzed in term of frequency and intensity, also characterizing seasonal rainfall features. Results of extreme events analysis confirmed an increasing trend for rainfall of short durations
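    The Mann–Kendall test used for the station series can be implemented in a few lines. The sketch below (a normal-approximation version without tie correction, applied to a synthetic annual-maximum series rather than the Sicilian records) returns Kendall's S statistic, the standardized Z and a two-sided p-value:

```python
import numpy as np
from scipy.stats import norm

def mann_kendall(x):
    """Mann-Kendall trend test (normal approximation, no tie correction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # S statistic: sum of signs of all pairwise later-minus-earlier differences
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0   # continuity correction
    p = 2 * (1 - norm.cdf(abs(z)))
    return s, z, p

rng = np.random.default_rng(0)
# Synthetic 1-hour annual maximum rainfall series with a weak imposed trend
annual_max_1h = rng.gumbel(loc=25, scale=8, size=60) + 0.15 * np.arange(60)
print(mann_kendall(annual_max_1h))
```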

  16. Use of beta regression for statistical downscaling of precipitation in the Campbell River basin, British Columbia, Canada

    Science.gov (United States)

    Mandal, Sohom; Srivastav, Roshan K.; Simonovic, Slobodan P.

    2016-07-01

    Impacts of global climate change on water resources systems are assessed by downscaling coarse scale climate variables into regional scale hydro-climate variables. In this study, a new multisite statistical downscaling method based on beta regression (BR) is developed for generating synthetic precipitation series, which can preserve temporal and spatial dependence along with other historical statistics. The beta regression based downscaling method includes two main steps: (1) prediction of precipitation states for the study area using classification and regression trees, and (2) generation of precipitation at different stations in the study area conditioned on the precipitation states. Daily precipitation data for 53 years from the ANUSPLIN data set are used to predict precipitation states of the study area, while predictor variables are extracted from the NCEP/NCAR reanalysis data set for the same interval. The proposed model is applied to downscaling daily precipitation at ten different stations in the Campbell River basin, British Columbia, Canada. Results show that the proposed downscaling model can capture the spatial and temporal variability of local precipitation very well at various locations. The performance of the model is compared with a recently developed non-parametric kernel regression based downscaling model, and the BR model performs better regarding extrapolation than the non-parametric kernel regression model. Future precipitation changes under different GHG (greenhouse gas) emission scenarios are also projected with the developed downscaling model, which reveals significant changes in future seasonal precipitation and in the number of wet days in the river basin.
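    Beta regression models a response restricted to (0, 1), for example a precipitation amount rescaled to that interval, through a logit-linked mean and a precision parameter. A self-contained maximum-likelihood sketch with SciPy (illustrative covariates and simulated data, not those of the study) makes the mechanics explicit:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def beta_reg_negloglik(params, X, y):
    """Negative log-likelihood of a beta regression with a logit mean link."""
    beta, log_phi = params[:-1], params[-1]
    mu = expit(X @ beta)                         # mean in (0, 1)
    phi = np.exp(log_phi)                        # precision > 0
    a, b = mu * phi, (1 - mu) * phi              # beta distribution shape parameters (a + b = phi)
    ll = (gammaln(phi) - gammaln(a) - gammaln(b)
          + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))
    return -np.sum(ll)

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])        # intercept + one predictor
true_mu = expit(X @ np.array([-0.5, 1.0]))
y = rng.beta(true_mu * 20, (1 - true_mu) * 20)               # simulated (0, 1) response
y = np.clip(y, 1e-6, 1 - 1e-6)                               # keep strictly inside (0, 1)

fit = minimize(beta_reg_negloglik, x0=np.zeros(X.shape[1] + 1), args=(X, y), method='BFGS')
print("estimated coefficients:", fit.x[:-1], " precision:", np.exp(fit.x[-1]))
```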

  17. Detection of Change Points in Volatility of Non-Parametric Regression by Wavelets

    Institute of Scientific and Technical Information of China (English)

    王景乐; 郑明

    2012-01-01

    This paper studies the detection and estimation of change points in volatility under non-parametric regression models. Wavelet methods are applied to construct test statistics that can be used to detect change points in volatility. The asymptotic distributions of the test statistics are established. We also use the test statistics to construct estimators for the locations and jump sizes of the change points in volatility, and derive the asymptotic properties of these estimators. Simulation studies are conducted to assess the finite-sample performance of the proposed procedures.
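
    The sketch below illustrates the general idea behind wavelet-based detection of a variance change, not the estimator studied in the paper: level-1 detail coefficients of a noisy regression signal track the local noise level, so a CUSUM over their squares localizes the jump. The signal and change location are invented.

```python
import numpy as np
import pywt

rng = np.random.default_rng(2)
n = 1024
x = np.linspace(0, 1, n)
signal = np.sin(2 * np.pi * x)                 # smooth regression function
sigma = np.where(x < 0.6, 0.2, 0.6)            # noise level jumps at x = 0.6
y = signal + sigma * rng.normal(size=n)

cA, cD = pywt.dwt(y, "db4")                    # level-1 approximation / detail coefficients
sq = cD ** 2                                   # squared details ~ local variance
k = np.arange(1, len(sq) + 1)
cusum = np.abs(np.cumsum(sq) - k * sq.mean())  # CUSUM-type deviation from a constant variance
k_hat = int(np.argmax(cusum))
print("estimated change point near x =", x[int(k_hat * n / len(sq))])
```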

  18. Statistical Modeling of Bivariate Data.

    Science.gov (United States)

    1982-08-01

    joint density-quantile function, dependence-density, non-parametric bivariate density estimation, entropy, exponential... estimated, by autoregressive or exponential model estimators with maximum entropy properties, is investigated in this thesis. The results provide... important and useful procedures for nonparametric bivariate density estimation. The thesis discusses estimators of the entropy H(d) of ul2) which seem to me

  19. Statistical mechanics

    CERN Document Server

    Jana, Madhusudan

    2015-01-01

    Statistical mechanics is self-sufficient and written in a lucid manner, keeping in mind the examination system of the universities. The need to study this subject and its relation to thermodynamics is discussed in detail. Starting from the Liouville theorem, statistical mechanics is developed thoroughly. All three types of statistical distribution functions are derived separately, with their scope of applications and limitations. Non-interacting ideal Bose and Fermi gases are discussed thoroughly. Properties of liquid He-II and the corresponding models are depicted. White dwarfs and condensed matter physics, and transport phenomena - thermal and electrical conductivity, Hall effect, magnetoresistance, viscosity, diffusion, etc. - are discussed. A basic understanding of the Ising model is given to explain the phase transition. The book ends with detailed coverage of the method of ensembles (namely microcanonical, canonical and grand canonical) and their applications. Various numerical and conceptual problems ar...

  20. Statistical mechanics

    CERN Document Server

    Schwabl, Franz

    2006-01-01

    The completely revised new edition of the classical book on Statistical Mechanics covers the basic concepts of equilibrium and non-equilibrium statistical physics. In addition to a deductive approach to equilibrium statistics and thermodynamics based on a single hypothesis - the form of the microcanonical density matrix - this book treats the most important elements of non-equilibrium phenomena. Intermediate calculations are presented in complete detail. Problems at the end of each chapter help students to consolidate their understanding of the material. Beyond the fundamentals, this text demonstrates the breadth of the field and its great variety of applications. Modern areas such as renormalization group theory, percolation, stochastic equations of motion and their applications to critical dynamics, kinetic theories, as well as fundamental considerations of irreversibility, are discussed. The text will be useful for advanced students of physics and other natural sciences; a basic knowledge of quantum mechan...

  1. Statistical inference

    CERN Document Server

    Rohatgi, Vijay K

    2003-01-01

    Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth

  2. Statistical Physics

    CERN Document Server

    Mandl, Franz

    1988-01-01

    The Manchester Physics Series General Editors: D. J. Sandiford; F. Mandl; A. C. Phillips Department of Physics and Astronomy, University of Manchester Properties of Matter B. H. Flowers and E. Mendoza Optics Second Edition F. G. Smith and J. H. Thomson Statistical Physics Second Edition F. Mandl Electromagnetism Second Edition I. S. Grant and W. R. Phillips Statistics R. J. Barlow Solid State Physics Second Edition J. R. Hook and H. E. Hall Quantum Mechanics F. Mandl Particle Physics Second Edition B. R. Martin and G. Shaw The Physics of Stars Second Edition A. C. Phillips Computing for Scient

  3. AP statistics

    CERN Document Server

    Levine-Wissing, Robin

    2012-01-01

    All Access for the AP® Statistics Exam Book + Web + Mobile Everything you need to prepare for the Advanced Placement® exam, in a study system built around you! There are many different ways to prepare for an Advanced Placement® exam. What's best for you depends on how much time you have to study and how comfortable you are with the subject matter. To score your highest, you need a system that can be customized to fit you: your schedule, your learning style, and your current level of knowledge. This book, and the online tools that come with it, will help you personalize your AP® Statistics prep

  4. Statistical methods

    CERN Document Server

    Freund, Rudolf J; Wilson, William J

    2010-01-01

    Statistical Methods, 3e provides students with a working introduction to statistical methods offering a wide range of applications that emphasize the quantitative skills useful across many academic disciplines. This text takes a classic approach emphasizing concepts and techniques for working out problems and interpreting results. The book includes research projects, real-world case studies, numerous examples and data exercises organized by level of difficulty. This text requires that a student be familiar with algebra. New to this edition: NEW expansion of exercises a

  5. Statistical mechanics

    CERN Document Server

    Davidson, Norman

    2003-01-01

    Clear and readable, this fine text assists students in achieving a grasp of the techniques and limitations of statistical mechanics. The treatment follows a logical progression from elementary to advanced theories, with careful attention to detail and mathematical development, and is sufficiently rigorous for introductory or intermediate graduate courses.Beginning with a study of the statistical mechanics of ideal gases and other systems of non-interacting particles, the text develops the theory in detail and applies it to the study of chemical equilibrium and the calculation of the thermody

  6. Statistics; Tilastot

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1998-12-31

    For the years 1997 and 1998, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1997, Statistics Finland, Helsinki 1998, ISSN 0784-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO{sub 2}-emissions, Electricity supply, Energy imports by country of origin in January-September 1998, Energy exports by recipient country in January-September 1998, Consumer prices of liquid fuels, Consumer prices of hard coal, Natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, Value added taxes and fiscal charges and fees included in consumer prices of some energy sources, Energy taxes and precautionary stock fees, pollution fees on oil products

  7. Statistics; Tilastot

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1998-12-31

    For the years 1997 and 1998, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1996, Statistics Finland, Helsinki 1997, ISSN 0784-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO{sub 2}-emissions, Electricity supply, Energy imports by country of origin in January-June 1998, Energy exports by recipient country in January-June 1998, Consumer prices of liquid fuels, Consumer prices of hard coal, Natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, Value added taxes and fiscal charges and fees included in consumer prices of some energy sources, Energy taxes and precautionary stock fees, pollution fees on oil products

  8. Statistical Mechanics

    CERN Document Server

    Gallavotti, Giovanni

    2011-01-01

    C. Cercignani: A sketch of the theory of the Boltzmann equation.- O.E. Lanford: Qualitative and statistical theory of dissipative systems.- E.H. Lieb: many particle Coulomb systems.- B. Tirozzi: Report on renormalization group.- A. Wehrl: Basic properties of entropy in quantum mechanics.

  9. Parametric statistical change point analysis

    CERN Document Server

    Chen, Jie

    2000-01-01

    This work is an in-depth study of the change point problem from a general point of view and a further examination of change point analysis of the most commonly used statistical models. Change point problems are encountered in such disciplines as economics, finance, medicine, psychology, signal processing, and geology, to mention only several. The exposition is clear and systematic, with a great deal of introductory material included. Different models are presented in each chapter, including gamma and exponential models, rarely examined thus far in the literature. Other models covered in detail are the multivariate normal, univariate normal, regression, and discrete models. Extensive examples throughout the text emphasize key concepts, and different methodologies are used, namely the likelihood ratio criterion and the Bayesian and information criterion approaches. A comprehensive bibliography and two indices complete the study.

  10. Comparison of univariate and multivariate models for prediction of major and minor elements from laser-induced breakdown spectra with and without masking

    Science.gov (United States)

    Dyar, M. Darby; Fassett, Caleb I.; Giguere, Stephen; Lepore, Kate; Byrne, Sarah; Boucher, Thomas; Carey, CJ; Mahadevan, Sridhar

    2016-09-01

    This study uses 1356 spectra from 452 geologically-diverse samples, the largest suite of LIBS rock spectra ever assembled, to compare the accuracy of elemental predictions in models that use only spectral regions thought to contain peaks arising from the element of interest versus those that use information in the entire spectrum. Results show that for the elements Si, Al, Ti, Fe, Mg, Ca, Na, K, Ni, Mn, Cr, Co, and Zn, univariate predictions based on single emission lines are by far the least accurate, no matter how carefully the region of channels/wavelengths is chosen and despite the prominence of the selected emission lines. An automated iterative algorithm was developed to sweep through all 5485 channels of data and select the single region that produces the optimal prediction accuracy for each element using univariate analysis. For the eight major elements, use of this technique results in a 35% improvement in prediction accuracy; for minors, the improvement is 13%. The best wavelength region choice for any given univariate analysis is likely to be an inherent property of the specific training set that cannot be generalized. In comparison, multivariate analysis using partial least-squares (PLS) almost universally outperforms univariate analysis. PLS using all the same wavelength regions from the univariate analysis produces results that improve in accuracy by 63% for major elements and 3% for minor elements. This difference is likely a reflection of signal-to-noise ratios, which are far better for major elements than for minor elements and likely limit the prediction accuracy of minor elements by any technique. We also compare predictions using specific wavelength ranges for each element against those employing all channels. Masking out channels to focus on emission lines from a specific element decreases prediction accuracy for major elements but is useful for minor elements with low signals and proportionally much higher noise; use of PLS rather than univariate
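
    A minimal sketch of the kind of comparison described above, contrasting a single-line univariate calibration with a full-spectrum PLS model; the spectra, concentrations and channel window are synthetic stand-ins, not the study's data.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n_samples, n_channels = 300, 500
conc = rng.uniform(0, 10, n_samples)                       # element concentration
spectra = rng.normal(0, 0.05, (n_samples, n_channels))
spectra[:, 240:260] += conc[:, None] * 0.1                 # an "emission line" region
spectra += rng.normal(0, 0.02, (n_samples, n_channels))    # extra channel noise

X_tr, X_te, y_tr, y_te = train_test_split(spectra, conc, random_state=0)

# Univariate model: area of one narrow channel window around the chosen line
uni_tr = X_tr[:, 240:260].sum(axis=1, keepdims=True)
uni_te = X_te[:, 240:260].sum(axis=1, keepdims=True)
print("univariate R2:", LinearRegression().fit(uni_tr, y_tr).score(uni_te, y_te))

# Multivariate model: PLS over every channel
pls = PLSRegression(n_components=8).fit(X_tr, y_tr)
print("PLS R2:", pls.score(X_te, y_te))
```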

  11. Experimental statistics

    CERN Document Server

    Natrella, Mary Gibbons

    2005-01-01

    Formulated to assist scientists and engineers engaged in army ordnance research and development programs, this well-known and highly regarded handbook is a ready reference for advanced undergraduate and graduate students as well as for professionals seeking engineering information and quantitative data for designing, developing, constructing, and testing equipment. Topics include characterizing and comparing the measured performance of a material, product, or process; general considerations in planning experiments; statistical techniques for analyzing extreme-value data; use of transformations

  12. Arc Statistics

    CERN Document Server

    Meneghetti, M; Dahle, H; Limousin, M

    2013-01-01

    The existence of an arc statistics problem was at the center of a strong debate in the last fifteen years. With the aim to clarify if the optical depth for giant gravitational arcs by galaxy clusters in the so called concordance model is compatible with observations, several studies were carried out which helped to significantly improve our knowledge of strong lensing clusters, unveiling their extremely complex internal structure. In particular, the abundance and the frequency of strong lensing events like gravitational arcs turned out to be a potentially very powerful tool to trace the structure formation. However, given the limited size of observational and theoretical data-sets, the power of arc statistics as a cosmological tool has been only minimally exploited so far. On the other hand, the last years were characterized by significant advancements in the field, and several cluster surveys that are ongoing or planned for the near future seem to have the potential to make arc statistics a competitive cosmo...

  13. A Comparative Analysis of Multivariate Statistical Detection Methods Applied to Syndromic Surveillance

    Science.gov (United States)

    2007-06-01

    ... the observed system. Our research involved a comparative analysis of two multivariate statistical methods, the multivariate CUSUM (MCUSUM) and the multivariate EWMA (MEWMA)... outbreaks. We found that, similar to results for the univariate CUSUM and EWMA, the directionally-sensitive MCUSUM and MEWMA perform very similarly. Subject terms: biosurveillance, multivariate CUSUM, multivariate EWMA, statistical process control, syndromic surveillance.

  14. Reciprocal Benefits of Mass-Univariate and Multivariate Modeling in Brain Mapping: Applications to Event-Related Functional MRI, H215O-, and FDG-PET

    Directory of Open Access Journals (Sweden)

    James R. Moeller

    2006-01-01

    Full Text Available In brain mapping studies of sensory, cognitive, and motor operations, specific waveforms of dynamic neural activity are predicted based on theoretical models of human information processing. For example, in event-related functional MRI (fMRI), the general linear model (GLM) is employed in mass-univariate analyses to identify the regions whose dynamic activity closely matches the expected waveforms. By comparison, multivariate analyses based on PCA or ICA provide greater flexibility in detecting spatiotemporal properties of experimental data that may strongly support alternative neuroscientific explanations. We investigated conjoint multivariate and mass-univariate analyses that combine the capabilities to (1) verify activation of neural machinery we already understand and (2) discover reliable signatures of new neural machinery. We examined combinations of GLM and PCA that recover latent neural signals (waveforms and footprints) with greater accuracy than either method alone. Comparative results are illustrated with analyses of real fMRI data, adding to Monte Carlo simulation support.

  15. Assessment of brown trout habitat suitability in the Jucar River Basin (SPAIN): comparison of data-driven approaches with fuzzy-logic models and univariate suitability curves.

    Science.gov (United States)

    Muñoz-Mas, Rafael; Martínez-Capel, Francisco; Schneider, Matthias; Mouton, Ans M

    2012-12-01

    The implementation of the Water Framework Directive implies the determination of an environmental flow (E-flow) in each running water body. In Spain, many of the minimum flow assessments were determined with the physical habitat simulation system based on univariate habitat suitability curves. Multivariate habitat suitability models, widely applied in habitat assessment, are potentially more accurate than univariate suitability models. This article analyses the microhabitat selection by medium-sized (10-20 cm) brown trout (Salmo trutta fario) in three streams of the Jucar River Basin District (eastern Iberian Peninsula). The data were collected with an equal effort sampling approach. Univariate habitat suitability curves were built with a data-driven process for depth, mean velocity and substrate classes; three types of data-driven fuzzy models were generated with the FISH software: two models of presence-absence and a model of abundance. FISH applies a hill-climbing algorithm to optimize the fuzzy rules. A hydraulic model was calibrated with the tool River-2D in a segment of the Cabriel River (Jucar River Basin). The fuzzy-logic models and three methods to produce a suitability index from the three univariate curves were applied to evaluate the river habitat in the tool CASiMiR©. The comparison of results was based on the spatial arrangement of habitat suitability and the curves of weighted usable area versus discharge. The differences were relevant in different aspects, e.g. in the estimated minimum environmental flow according to the Spanish legal norm for hydrological planning. This work demonstrates the impact of the model's selection on the habitat suitability modelling and the assessment of environmental flows, based on an objective data-driven procedure; the conclusions are important for the water management in the Jucar River Basin and other river systems in Europe, where the environmental flows are a keystone for the achievement of the goals established

  16. 1st Conference of the International Society for Nonparametric Statistics

    CERN Document Server

    Lahiri, S; Politis, Dimitris

    2014-01-01

    This volume is composed of peer-reviewed papers that have developed from the First Conference of the International Society for NonParametric Statistics (ISNPS). This inaugural conference took place in Chalkidiki, Greece, June 15-19, 2012. It was organized with the co-sponsorship of the IMS, the ISI, and other organizations. M.G. Akritas, S.N. Lahiri, and D.N. Politis are the first executive committee members of ISNPS, and the editors of this volume. ISNPS has a distinguished Advisory Committee that includes Professors R.Beran, P.Bickel, R. Carroll, D. Cook, P. Hall, R. Johnson, B. Lindsay, E. Parzen, P. Robinson, M. Rosenblatt, G. Roussas, T. SubbaRao, and G. Wahba. The Charting Committee of ISNPS consists of more than 50 prominent researchers from all over the world.   The chapters in this volume bring forth recent advances and trends in several areas of nonparametric statistics. In this way, the volume facilitates the exchange of research ideas, promotes collaboration among researchers from all over the wo...

  17. Statistical Scalability Analysis of Communication Operations in Distributed Applications

    Energy Technology Data Exchange (ETDEWEB)

    Vetter, J S; McCracken, M O

    2001-02-27

    Current trends in high performance computing suggest that users will soon have widespread access to clusters of multiprocessors with hundreds, if not thousands, of processors. This unprecedented degree of parallelism will undoubtedly expose scalability limitations in existing applications, where scalability is the ability of a parallel algorithm on a parallel architecture to effectively utilize an increasing number of processors. Users will need precise and automated techniques for detecting the cause of limited scalability. This paper addresses this dilemma. First, we argue that users face numerous challenges in understanding application scalability: managing substantial amounts of experiment data, extracting useful trends from this data, and reconciling performance information with their application's design. Second, we propose a solution to automate this data analysis problem by applying fundamental statistical techniques to scalability experiment data. Finally, we evaluate our operational prototype on several applications, and show that statistical techniques offer an effective strategy for assessing application scalability. In particular, we find that non-parametric correlation of the number of tasks to the ratio of the time for individual communication operations to overall communication time provides a reliable measure for identifying communication operations that scale poorly.
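
    As a sketch of the statistical step described above, the snippet below computes a non-parametric (Spearman) correlation between the number of tasks and the fraction of communication time spent in one operation; a rank correlation near one flags an operation that scales poorly. The timing numbers are invented for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

tasks = np.array([16, 32, 64, 128, 256, 512])
allreduce_time = np.array([0.8, 1.1, 1.9, 3.5, 6.8, 13.0])    # time in one operation, grows with tasks
total_comm_time = np.array([5.0, 5.6, 6.9, 9.0, 13.5, 21.0])  # overall communication time

ratio = allreduce_time / total_comm_time
rho, p = spearmanr(tasks, ratio)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")   # rho near 1 -> poorly scaling operation
```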

  18. USING ARTIFICIAL NEURAL NETWORKS AS STATISTICAL TOOLS FOR ANALYSIS OF MEDICAL DATA

    Directory of Open Access Journals (Sweden)

    ANOUSHIRAVAN KAZEMNEZHAD

    2003-06-01

    Full Text Available Introduction: Artificial neural networks mimic the behaviour of the brain. They are able to perform prediction, feature recognition and classification. Therefore, neural networks seem to be serious rivals for statistical models like regression and discriminant analysis. Methods: We introduce the biological neuron, generalize its function for artificial neurons, and describe the back-propagation error algorithm for training networks in detail. Results: Based on two simulated data sets and one real data set, we built neural networks using back-propagation and compared them with regression models. Discussion: Neural networks can be considered a non-parametric method for data modeling; they are potentially more powerful than regression for modeling, but more ambiguous in notation.
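
    A hedged sketch of the kind of comparison described above: a small feed-forward network trained by back-propagation versus ordinary linear regression on a simulated non-linear data set. scikit-learn's MLPRegressor stands in for a hand-written network, and all data are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, (600, 2))
y = np.sin(X[:, 0]) * X[:, 1] + rng.normal(0, 0.1, 600)   # non-linear response

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lin = LinearRegression().fit(X_tr, y_tr)
net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=5000,
                   random_state=0).fit(X_tr, y_tr)        # trained by back-propagation
print("linear regression R2:", round(lin.score(X_te, y_te), 3))
print("neural network R2:   ", round(net.score(X_te, y_te), 3))
```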

  19. Statistical analysis used in the nutritional assessment of novel food using the proof of safety.

    Science.gov (United States)

    Hothorn, Ludwig A; Oberdoerfer, Regina

    2006-03-01

    The safety assessment of Novel Food, including GM biotechnology-derived crops, starts with the comparison of the Novel Food with a traditional counterpart that is generally accepted as safe based on a history of human food use. Substantial equivalence is established if no meaningful difference from the conventional counterpart is found, leading to the conclusion that the Novel Food is as safe and nutritious as its traditional counterpart. In general, the non-significance of the p-value is used as the proof of safety. From a statistical perspective, the problems connected with such an approach are demonstrated, namely that quite different component-specific false negative error rates result. As an alternative, the proof of safety is discussed together with the inherently related definition of safety thresholds. Moreover, parametric and non-parametric confidence intervals for the difference and the ratio to the control (conventional line) are described in detail. Finally, the treatment of multiple components for a global proof of safety is explained.
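
    The following is an illustrative sketch of the "proof of safety" idea rather than the paper's procedure: a confidence interval for the difference to the conventional counterpart is compared with a pre-defined safety threshold, instead of relying on a non-significant p-value. The data, sample sizes and threshold are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
novel = rng.normal(10.2, 1.0, 30)          # component level in the novel food
conventional = rng.normal(10.0, 1.0, 30)   # conventional counterpart
threshold = 1.5                            # assumed safety threshold for the difference

diff = novel.mean() - conventional.mean()
se = np.sqrt(novel.var(ddof=1) / len(novel) + conventional.var(ddof=1) / len(conventional))
df = len(novel) + len(conventional) - 2    # simple pooled approximation
lo, hi = diff + np.array([-1, 1]) * stats.t.ppf(0.95, df) * se   # 90% two-sided CI

# Safety is "proved" (at the chosen level) only if the whole CI lies inside the margin
inside = (-threshold < lo) and (hi < threshold)
print(f"90% CI for difference: ({lo:.2f}, {hi:.2f})",
      "-> within threshold" if inside else "-> not shown safe")
```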

  20. Depth statistics

    OpenAIRE

    2012-01-01

    In 1975 John Tukey proposed a multivariate median which is the 'deepest' point in a given data cloud in R^d. Later, in measuring the depth of an arbitrary point z with respect to the data, David Donoho and Miriam Gasko considered hyperplanes through z and determined its 'depth' by the smallest portion of data that are separated by such a hyperplane. Since then, these ideas have proved extremely fruitful. A rich statistical methodology has developed that is based on data depth and, more general...

  1. Statistical mechanics

    CERN Document Server

    Sheffield, Scott

    2009-01-01

    In recent years, statistical mechanics has been increasingly recognized as a central domain of mathematics. Major developments include the Schramm-Loewner evolution, which describes two-dimensional phase transitions, random matrix theory, renormalization group theory and the fluctuations of random surfaces described by dimers. The lectures contained in this volume present an introduction to recent mathematical progress in these fields. They are designed for graduate students in mathematics with a strong background in analysis and probability. This book will be of particular interest to graduate students and researchers interested in modern aspects of probability, conformal field theory, percolation, random matrices and stochastic differential equations.

  2. Integration of association statistics over genomic regions using Bayesian adaptive regression splines

    Directory of Open Access Journals (Sweden)

    Zhang Xiaohua

    2003-11-01

    Full Text Available Abstract In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed: testing single loci, or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD) with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS). For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (i.e., it achieves the specified size of the test) and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.

  3. Statistical time series methods for damage diagnosis in a scale aircraft skeleton structure: loosened bolts damage scenarios

    Energy Technology Data Exchange (ETDEWEB)

    Kopsaftopoulos, Fotis P; Fassois, Spilios D, E-mail: fkopsaf@mech.upatras.gr, E-mail: fassois@mech.upatras.gr [Stochastic Mechanical Systems and Automation (SMSA) Laboratory Department of Mechanical and Aeronautical Engineering University of Patras, GR 265 00 Patras (Greece)

    2011-07-19

    A comparative assessment of several vibration based statistical time series methods for Structural Health Monitoring (SHM) is presented via their application to a scale aircraft skeleton laboratory structure. A brief overview of the methods, which are either scalar or vector type, non-parametric or parametric, and pertain to either the response-only or excitation-response cases, is provided. Damage diagnosis, including both the detection and identification subproblems, is tackled via scalar or vector vibration signals. The methods' effectiveness is assessed via repeated experiments under various damage scenarios, with each scenario corresponding to the loosening of one or more selected bolts. The results of the study confirm the 'global' damage detection capability and effectiveness of statistical time series methods for SHM.

  4. Statistical Neurodynamics.

    Science.gov (United States)

    Paine, Gregory Harold

    1982-03-01

    The primary objective of the thesis is to explore the dynamical properties of small nerve networks by means of the methods of statistical mechanics. To this end, a general formalism is developed and applied to elementary groupings of model neurons which are driven by either constant (steady state) or nonconstant (nonsteady state) forces. Neuronal models described by a system of coupled, nonlinear, first-order, ordinary differential equations are considered. A linearized form of the neuronal equations is studied in detail. A Lagrange function corresponding to the linear neural network is constructed which, through a Legendre transformation, provides a constant of motion. By invoking the Maximum-Entropy Principle with the single integral of motion as a constraint, a probability distribution function for the network in a steady state can be obtained. The formalism is implemented for some simple networks driven by a constant force; accordingly, the analysis focuses on a study of fluctuations about the steady state. In particular, a network composed of N noninteracting neurons, termed Free Thinkers, is considered in detail, with a view to interpretation and numerical estimation of the Lagrange multiplier corresponding to the constant of motion. As an archetypical example of a net of interacting neurons, the classical neural oscillator, consisting of two mutually inhibitory neurons, is investigated. It is further shown that in the case of a network driven by a nonconstant force, the Maximum-Entropy Principle can be applied to determine a probability distribution functional describing the network in a nonsteady state. The above examples are reconsidered with nonconstant driving forces which produce small deviations from the steady state. Numerical studies are performed on simplified models of two physical systems: the starfish central nervous system and the mammalian olfactory bulb. Discussions are given as to how statistical neurodynamics can be used to gain a better

  5. Imaging of Osteoarthritic Human Articular Cartilage using Fourier Transform Infrared Microspectroscopy Combined with Multivariate and Univariate Analysis.

    Science.gov (United States)

    Oinas, J; Rieppo, L; Finnilä, M A J; Valkealahti, M; Lehenkari, P; Saarakkala, S

    2016-07-21

    The changes in chemical composition of human articular cartilage (AC) caused by osteoarthritis (OA) were investigated using Fourier transform infrared microspectroscopy (FTIR-MS). We demonstrate the sensitivity of FTIR-MS for monitoring compositional changes that occur with OA progression. Twenty-eight AC samples from tibial plateaus were imaged with FTIR-MS. Hyperspectral images of all samples were combined for K-means clustering. Partial least squares regression (PLSR) analysis was used to compare the spectra with the OARSI grade (histopathological grading of OA). Furthermore, the amide I and the carbohydrate regions were used to estimate collagen and proteoglycan contents, respectively. The spectral peak at 1338 cm^-1 was used to estimate the integrity of the collagen network. The layered structure of AC was revealed using the carbohydrate region for clustering. Statistically significant correlation was observed between the OARSI grade and the collagen integrity in the superficial (r = -0.55) and the deep (r = -0.41) zones. Furthermore, PLSR models predicted the OARSI grade from the superficial (r = 0.94) and the deep (r = 0.77) regions of the AC with high accuracy. Obtained results suggest that quantitative and qualitative changes occur in the AC composition during OA progression, and these can be monitored by the use of FTIR-MS.

  6. Comparison of spectrum normalization techniques for univariate analysis of stainless steel by laser-induced breakdown spectroscopy

    Indian Academy of Sciences (India)

    KARKI VIJAY; SARKAR ARNAB; SINGH MANJEET; MAURYA GULAB SINGH; KUMAR ROHIT; RAI AWADHESH KUMAR; AGGARWAL SURESH KUMAR

    2016-06-01

    The analytical performance of six different spectrum normalization techniques, namely internal normalization, normalization with total light and normalization with background, along with their three-point smoothing variants, was studied using LIBS for quantification of Cr, Mn and Ni in stainless steel. Optimization of the number of laser shots per spectrum was carried out to obtain the best analytical results. The internal normalization technique was used for selecting the best emission lines having sufficient intensity and spectral purity for Cr, Mn and Ni for comparison of the different normalization techniques. For detailed evaluation of these normalization techniques, under optimized experimental conditions, three statistical parameters, i.e., standard error of prediction, relative standard deviation and average bias, were compared for these techniques using the selected emission lines. Results show that the internal normalization technique produces the best analytical results, followed by total light normalization. Smoothing of the raw spectra reduces the random error and produces better analytical results, provided the peak under study has a sufficient number of pixels (≥ 7).
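
    The snippet below is a small illustration of the normalization options compared above, applied to a synthetic LIBS-like spectrum; the line positions, windows and intensities are invented and the continuum estimate is a simple placeholder.

```python
import numpy as np

rng = np.random.default_rng(12)
wl = np.linspace(200, 800, 2048)                               # wavelength axis, nm
continuum = 0.05 + 0.0001 * (wl - 200)
spectrum = (continuum
            + 2.0 * np.exp(-0.5 * ((wl - 425.4) / 0.3) ** 2)   # analyte emission line (invented)
            + 0.8 * np.exp(-0.5 * ((wl - 403.1) / 0.3) ** 2)   # reference emission line (invented)
            + rng.normal(0, 0.01, wl.size))

# (1) normalization with total light: divide by the integrated spectrum
total_light_norm = spectrum / spectrum.sum()

# (2) normalization with background: divide by an estimate of the continuum level
background_norm = spectrum / np.median(spectrum)

# (3) internal normalization: ratio of analyte peak area to a reference peak area
def peak_area(spec, centre, half_width=1.0):
    window = (wl > centre - half_width) & (wl < centre + half_width)
    return spec[window].sum()

internal_ratio = peak_area(spectrum, 425.4) / peak_area(spectrum, 403.1)

# three-point smoothing of the raw spectrum, as mentioned in the abstract
smoothed = np.convolve(spectrum, np.ones(3) / 3, mode="same")
print(round(internal_ratio, 3))
```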

  7. A procedure for the change point problem in parametric models based on phi-divergence test-statistics

    CERN Document Server

    Batsidis, Apostolos; Pardo, Leandro; Zografos, Konstantinos

    2011-01-01

    This paper studies the change point problem for a general parametric, univariate or multivariate family of distributions. An information theoretic procedure is developed which is based on general divergence measures for testing the hypothesis of the existence of a change. For comparing the accuracy of the new test-statistic a simulation study is performed for the special case of a univariate discrete model. Finally, the procedure proposed in this paper is illustrated through a classical change-point example.

  8. A handbook of statistical graphics using SAS ODS

    CERN Document Server

    Der, Geoff

    2014-01-01

    An Introduction to Graphics: Good Graphics, Bad Graphics, Catastrophic Graphics and Statistical Graphics; The Challenger Disaster; Graphical Displays; A Little History and Some Early Graphical Displays; Graphical Deception; An Introduction to ODS Graphics; Generating ODS Graphs; ODS Destinations; Statistical Graphics Procedures; ODS Graphs from Statistical Procedures; Controlling ODS Graphics; Controlling Labelling in Graphs; ODS Graphics Editor; Graphs for Displaying the Characteristics of Univariate Data: Horse Racing, Mortality Rates, Forearm Lengths, Survival Times and Geyser Eruptions; Introduction; Pie Chart, Bar Cha

  9. Statistical analysis: the need, the concept, and the usage

    Directory of Open Access Journals (Sweden)

    Naduvilath Thomas

    1998-01-01

    Full Text Available In general, a better understanding of the need for and usage of statistics would benefit the medical community in India. This paper explains why statistical analysis is needed and what its conceptual basis is. Ophthalmic data are used as examples. The concept of sampling variation is explained to further corroborate the need for statistical analysis in medical research. Statistical estimation and hypothesis testing, which form the major components of statistical inference, are described. Commonly reported univariate and multivariate statistical tests are explained in order to equip the ophthalmologist with a basic knowledge of statistics for better understanding of research data. It is felt that this understanding would facilitate well-designed investigations, ultimately leading to a higher quality of ophthalmology practice in our country.

  10. New Graphical Methods and Test Statistics for Testing Composite Normality

    Directory of Open Access Journals (Sweden)

    Marc S. Paolella

    2015-07-01

    Full Text Available Several graphical methods for testing univariate composite normality from an i.i.d. sample are presented. They are endowed with correct simultaneous error bounds and yield size-correct tests. As all are based on the empirical CDF, they are also consistent for all alternatives. For one test, called the modified stabilized probability test, or MSP, a highly simplified computational method is derived, which delivers the test statistic and also a highly accurate p-value approximation, essentially instantaneously. The MSP test is demonstrated to have higher power against asymmetric alternatives than the well-known and powerful Jarque-Bera test. A further size-correct test, based on combining two test statistics, is shown to have yet higher power. The methodology employed is fully general and can be applied to any i.i.d. univariate continuous distribution setting.
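
    The sketch below is not the MSP test itself, only a quick look at two standard ingredients the abstract refers to: an ECDF-based statistic and the moment-based Jarque-Bera test, applied to a synthetic heavy-tailed sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.standard_t(df=5, size=500)          # heavier-tailed than normal

# ECDF-based test; plugging in the estimated mean/std makes the usual
# Kolmogorov-Smirnov p-value conservative for composite normality (the Lilliefors issue)
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

jb_stat, jb_p = stats.jarque_bera(x)        # based on sample skewness and kurtosis
print(f"KS D={ks_stat:.3f} (p={ks_p:.3f}),  Jarque-Bera={jb_stat:.1f} (p={jb_p:.4f})")
```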

  11. SOLVING PROBLEMS OF STATISTICS WITH THE METHODS OF INFORMATION THEORY

    Directory of Open Access Journals (Sweden)

    Lutsenko Y. V.

    2015-02-01

    Full Text Available The article presents a theoretical substantiation, methods of numerical calculation and a software implementation for solving problems of statistics, in particular the study of statistical distributions, by methods of information theory. On the basis of empirical data, the number of observations used for the analysis of statistical distributions is determined by calculation. The proposed method of calculating the amount of information is not based on assumptions about the independence of observations or the normality of the distribution, i.e., it is non-parametric; it ensures correct modeling of nonlinear systems and also allows heterogeneous data (measured in scales of different types, of numeric and non-numeric nature, and in different units) to be processed in a comparable way. Thus, ASC-analysis and the "Eidos" system constitute a modern, ready-for-implementation technology for solving problems of statistics with the methods of information theory. This article can be used as a description of laboratory work in disciplines such as: intelligent systems; knowledge engineering and intelligent systems; intelligent technologies and knowledge representation; knowledge representation in intelligent systems; foundations of intelligent systems; introduction to neuromathematics and neural network methods; fundamentals of artificial intelligence; intelligent technologies in science and education; knowledge management; and automated system-cognitive analysis with the "Eidos" intelligent system, which the author is currently developing, as well as in other disciplines associated with the transformation of data into information, its transformation into knowledge, and the application of this knowledge to solve problems of identification, forecasting, decision making and research of the modeled subject area (which is virtually all subjects in all fields of science).

  12. Confronting Passive and Active Sensors with Non-Gaussian Statistics

    Directory of Open Access Journals (Sweden)

    Pablo Rodríguez-Gonzálvez

    2014-07-01

    Full Text Available This paper has two motivations: firstly, to compare the Digital Surface Models (DSM) derived by passive (digital camera) and by active (terrestrial laser scanner) remote sensing systems when applied to specific architectural objects, and secondly, to test how well classic Gaussian statistics, with its Least Squares principle, adapts to data sets where asymmetrical gross errors may appear, and whether this approach should be exchanged for a non-parametric one. The field of geomatic technology automation is immersed in a highly demanding competition in which any innovation by one of the contenders immediately challenges the opponents to propose an improvement. Nowadays, we seem to be witnessing an improvement of terrestrial photogrammetry and its integration with computer vision to overcome the performance limitations of laser scanning methods. Through this contribution some of the issues of this "technological race" are examined from the point of view of photogrammetry. New software is introduced and an experimental test is designed, performed and assessed to try to cast some light on this thrilling match. For the case considered in this study, the results show good agreement between both sensors, despite considerable asymmetry. This asymmetry suggests that the standard Normal parameters are not adequate to assess this type of data, especially when accuracy is of importance. In this case, the standard deviation fails to provide a good estimation of the results, whereas the Median Absolute Deviation and the Biweight Midvariance are more appropriate measures.
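
    A small sketch of the two robust dispersion measures mentioned above, computed with plain NumPy on an artificially skewed error sample; the data and the tuning constant c = 9 are illustrative choices, not the study's values.

```python
import numpy as np

def mad(x, scale=1.4826):
    """Median Absolute Deviation, scaled to be consistent with sigma under normality."""
    med = np.median(x)
    return scale * np.median(np.abs(x - med))

def biweight_midvariance(x, c=9.0):
    """Tukey biweight midvariance, a resistant alternative to the sample variance."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    u = (x - med) / (c * mad(x, scale=1.0))       # raw (unscaled) MAD in the denominator
    w = np.abs(u) < 1                             # points beyond |u| >= 1 get zero weight
    num = np.sum(((x - med) ** 2 * (1 - u ** 2) ** 4)[w])
    den = np.sum(((1 - u ** 2) * (1 - 5 * u ** 2))[w])
    return len(x) * num / den ** 2

rng = np.random.default_rng(7)
errors = np.concatenate([rng.normal(0, 0.01, 950), rng.gamma(2, 0.05, 50)])  # asymmetric outliers
print("std dev:                   ", errors.std(ddof=1))
print("MAD:                       ", mad(errors))
print("sqrt biweight midvariance: ", np.sqrt(biweight_midvariance(errors)))
```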

  13. Non-parametric causal inference for bivariate time series

    CERN Document Server

    McCracken, James M

    2015-01-01

    We introduce new quantities for exploratory causal inference between bivariate time series. The quantities, called penchants and leanings, are computationally straightforward to apply, follow directly from assumptions of probabilistic causality, do not depend on any assumed models for the time series generating process, and do not rely on any embedding procedures; these features may provide a clearer interpretation of the results than those from existing time series causality tools. The penchant and leaning are computed based on a structured method for computing probabilities.

  14. Multi-Directional Non-Parametric Analysis of Agricultural Efficiency

    DEFF Research Database (Denmark)

    Balezentis, Tomas

    the Multi-Directional Efficiency Analysis approach, (iii) to account for uncertainties via the use of probabilistic and fuzzy measures. Therefore, the thesis encompasses six papers dedicated to (combinations of) these objectives. One of the main contributions of this thesis is a number of extensions... relative to labour, intermediate consumption and land (in some cases land was not treated as a discretionary input). These findings call for further research on relationships among financial structure, investment decisions, and efficiency in Lithuanian family farms. Application of different techniques...

  15. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb-Douglas or the Translog production function is used. However, the specification of a functional form for the production function involves the risk of specifying a functional form that is not similar to the “true” relationship between the inputs and the output. This misspecification might result in biased estimation results—including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...

  16. Non-Parametric Bayesian State Space Estimator for Negative Information

    Directory of Open Access Journals (Sweden)

    Guillaume de Chambrier

    2017-09-01

    Full Text Available Simultaneous Localization and Mapping (SLAM) is concerned with the development of filters to accurately and efficiently infer the state parameters (position, orientation, etc.) of an agent and aspects of its environment, commonly referred to as the map. A mapping system is necessary for the agent to achieve situatedness, which is a precondition for planning and reasoning. In this work, we consider an agent who is given the task of finding a set of objects. The agent has limited perception and can only sense the presence of objects if a direct contact is made; as a result, most of the sensing is negative information. In the absence of recurrent sightings or direct measurements of objects, there are no correlations from the measurement errors that can be exploited. This renders SLAM estimators for which this fact is their backbone, such as EKF-SLAM, ineffective. In addition, for our setting, no assumptions are made with respect to the marginals (beliefs) of both the agent and the objects (map). From the loose assumptions we stipulate regarding the marginals and measurements, we adopt a histogram parametrization. We introduce a Bayesian State Space Estimator (BSSE), which we name the Measurement Likelihood Memory Filter (MLMF), in which the values of the joint distribution are not parametrized; instead we directly apply changes from the measurement integration step to the marginals. This is achieved by keeping track of the history of the likelihood functions' parameters. We demonstrate that the MLMF gives the same filtered marginals as a histogram filter and show two implementations: MLMF and scalable-MLMF, which both have linear space complexity. The original MLMF retains an exponential time complexity (although an order of magnitude smaller than the histogram filter), while the scalable-MLMF introduces an independence assumption so as to have linear time complexity. We further quantitatively demonstrate the scalability of our algorithm with 25 beliefs having up to 10,000,000 states each. In an Active-SLAM setting, we evaluate the impact that the size of the memory's history has on the decision-theoretic process in a search scenario for a one-step look-ahead information gain planner. We report on both 1D and 2D experiments.
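
    The snippet below is a greatly simplified sketch of the underlying idea of updating a histogram belief with "negative information", not the MLMF itself: a 1-D grid belief over an object's location is renormalized after each contact-free probe. The grid size, detection probability and scenario are invented.

```python
import numpy as np

n_cells = 50
belief = np.full(n_cells, 1.0 / n_cells)     # uniform prior over the object's location
true_cell = 37
p_detect = 0.95                              # chance of sensing the object when touching its cell

rng = np.random.default_rng(8)
for step in range(200):
    probe = rng.integers(n_cells)            # cell the agent touches this step
    contact = (probe == true_cell) and (rng.random() < p_detect)
    if contact:
        belief[:] = 0.0
        belief[probe] = 1.0                  # a direct measurement pins the belief
        break
    # negative information: "no contact at `probe`" down-weights that cell only
    likelihood = np.ones(n_cells)
    likelihood[probe] = 1.0 - p_detect
    belief *= likelihood
    belief /= belief.sum()                   # renormalize after the Bayes update

print("most probable cell:", int(np.argmax(belief)), " true cell:", true_cell)
```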

  17. Homothetic Efficiency and Test Power: A Non-Parametric Approach

    NARCIS (Netherlands)

    J. Heufer (Jan); P. Hjertstrand (Per)

    2015-01-01

    We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of

  18. Homothetic Efficiency and Test Power: A Non-Parametric Approach

    NARCIS (Netherlands)

    J. Heufer (Jan); P. Hjertstrand (Per)

    2015-01-01

    We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of consump

  19. A non-parametric 2D deformable template classifier

    DEFF Research Database (Denmark)

    Schultz, Nette; Nielsen, Allan Aasbjerg; Conradsen, Knut;

    2005-01-01

    We introduce an interactive segmentation method for a sea floor survey. The method is based on a deformable template classifier and is developed to segment data from an echo sounder post-processor called RoxAnn. RoxAnn collects two different measures for each observation point, and in this 2D feature space the ship-master will be able to interactively define a segmentation map, which is refined and optimized by the deformable template algorithms. The deformable templates are defined as two-dimensional vector-cycles. Local random transformations are applied to the vector-cycles, and stochastic relaxation in a Bayesian scheme is used. In the Bayesian likelihood a class density function and its estimate hereof is introduced, which is designed to separate the feature space. The method is verified on data collected in Øresund, Scandinavia. The data come from four geographically different areas. Two...

  20. A Non-parametric Analysis of Morbidity/Mortality Data

    Science.gov (United States)

    1998-11-01


  1. Statistics for Patch Observations

    Science.gov (United States)

    Hingee, K. L.

    2016-06-01

    In the application of remote sensing it is common to investigate processes that generate patches of material. This is especially true when using categorical land cover or land use maps. Here we view some existing tools, landscape pattern indices (LPI), as non-parametric estimators of random closed sets (RACS). This RACS framework enables LPIs to be studied rigorously. A RACS is any random process that generates a closed set, which encompasses any processes that result in binary (two-class) land cover maps. RACS theory, and methods in the underlying field of stochastic geometry, are particularly well suited to high-resolution remote sensing where objects extend across tens of pixels, and the shapes and orientations of patches are symptomatic of underlying processes. For some LPI this field already contains variance information and border correction techniques. After introducing RACS theory we discuss the core area LPI in detail. It is closely related to the spherical contact distribution leading to conditional variants, a new version of contagion, variance information and multiple border-corrected estimators. We demonstrate some of these findings on high resolution tree canopy data.

  2. Is there much variation in variation? Revisiting statistics of small area variation in health services research

    Directory of Open Access Journals (Sweden)

    Ibáñez Berta

    2009-04-01

    Full Text Available Abstract Background: The importance of Small Area Variation Analysis for policy-making contrasts with the scarcity of work on the validity of the statistics used in these studies. Our study aims at (1) determining whether variation in utilization rates between health areas is higher than would be expected by chance, (2) estimating the statistical power of the variation statistics, and (3) evaluating the ability of different statistics to compare the variability among different procedures regardless of their rates. Methods: Parametric bootstrap techniques were used to derive the empirical distribution of each statistic under the hypothesis of homogeneity across areas. Non-parametric procedures were used to analyze the empirical distribution of the observed statistics and compare the results in six situations (low/medium/high utilization rates and low/high variability). A small-scale simulation study was conducted to assess the capacity of each statistic to discriminate between different scenarios with different degrees of variation. Results: Bootstrap techniques proved to be good at quantifying the difference between the null hypothesis and the variation observed in each situation, and at constructing reliable tests and confidence intervals for each of the variation statistics analyzed. Although the Systematic Component of Variation (SCV) performs well, the Empirical Bayes (EB) statistic shows better behaviour under the null hypothesis; it is able to detect variability if present, it is not influenced by the procedure rate, and it is best able to discriminate between different degrees of heterogeneity. Conclusion: The EB statistic seems to be a good alternative to more conventional statistics used in small-area variation analysis in health services research because of its robustness.
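
    The following is a rough sketch of the parametric-bootstrap logic described above: utilization counts are re-simulated under the homogeneity hypothesis to obtain the null distribution of a variation statistic. For brevity the simple extremal quotient is used rather than SCV or EB, and all counts and populations are invented.

```python
import numpy as np

rng = np.random.default_rng(11)
population = np.array([12000, 30000, 8000, 25000, 15000, 40000, 10000, 22000])
observed_counts = np.array([95, 260, 50, 240, 110, 420, 60, 200])

rate = observed_counts.sum() / population.sum()            # common rate under H0 (homogeneity)

def extremal_quotient(counts, pop):
    r = counts / pop
    return r.max() / r.min()

obs_eq = extremal_quotient(observed_counts, population)
null_eq = np.array([
    extremal_quotient(rng.poisson(rate * population), population)   # re-simulated counts under H0
    for _ in range(5000)
])
p_value = np.mean(null_eq >= obs_eq)
print(f"observed EQ={obs_eq:.2f}, bootstrap p-value={p_value:.3f}")
```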

  3. In situ calibration using univariate analyses based on the onboard ChemCam targets: first prediction of Martian rock and soil compositions

    Energy Technology Data Exchange (ETDEWEB)

    Fabre, C. [GeoRessources lab, Université de Lorraine, Nancy (France); Cousin, A.; Wiens, R.C. [Los Alamos National Laboratory, Los Alamos, NM (United States); Ollila, A. [University of NM, Albuquerque (United States); Gasnault, O.; Maurice, S. [IRAP, Toulouse (France); Sautter, V. [Museum National d' Histoire Naturelle, Paris (France); Forni, O.; Lasue, J. [IRAP, Toulouse (France); Tokar, R.; Vaniman, D. [Planetary Science Institute, Tucson, AZ (United States); Melikechi, N. [Delaware State University (United States)

    2014-09-01

    Curiosity rover landed on August 6th, 2012 in Gale Crater, Mars, and it possesses unique analytical capabilities to investigate the chemistry and mineralogy of the Martian soil. In particular, the LIBS technique is being used for the first time on another planet with the ChemCam instrument, and more than 75,000 spectra have been returned in the first year on Mars. Curiosity carries body-mounted calibration targets specially designed for the ChemCam instrument, some of which are homogeneous glasses and others of which are fine-grained glass-ceramics. We present direct calibrations, using these onboard standards to infer elements and element ratios by ratioing relative peak areas. As the laser spot size is around 300 μm, the LIBS technique provides measurements of the silicate glass compositions representing homogeneous material and measurements of the ceramic targets that are comparable to fine-grained rock or soil. The laser energy and the auto-focus are controlled for all sequences used for calibration. The univariate calibration curves present relatively good to very good correlation coefficients with low RSDs for major and ratio calibrations. Trace element calibration curves (Li, Sr, and Mn), down to several ppm, can be used as a rapid tool to draw attention to remarkable rocks and soils along the traverse. First comparisons to alpha-particle X-ray spectroscopy (APXS) data, on selected targets, show good agreement for most elements and for Mg# and Al/Si estimates. SiO{sub 2} estimates from univariate calibration cannot yet be used. Na{sub 2}O and K{sub 2}O estimates are relevant for high alkali contents, but are probably underestimated due to the CCCT initial compositions. Very good results for CaO and Al{sub 2}O{sub 3} estimates and satisfactory results for FeO are obtained. - Highlights: • In situ LIBS univariate calibrations are done using the Curiosity onboard standards. • Major and minor element contents can be rapidly obtained. • Trace element contents can be used as a

  4. Monitoring endemic livestock diseases using laboratory diagnostic data: A simulation study to evaluate the performance of univariate process monitoring control algorithms.

    Science.gov (United States)

    Lopes Antunes, Ana Carolina; Dórea, Fernanda; Halasa, Tariq; Toft, Nils

    2016-05-01

    Surveillance systems are critical for accurate, timely monitoring and effective disease control. In this study, we investigated the performance of univariate process monitoring control algorithms in detecting changes in seroprevalence for endemic diseases. We also assessed the effect of sample size (number of sentinel herds tested in the surveillance system) on the performance of the algorithms. Three univariate process monitoring control algorithms were compared: the Shewhart p chart (PSHEW), the cumulative sum (CUSUM) and the exponentially weighted moving average (EWMA). Increases in seroprevalence were simulated from 0.10 to 0.15 and 0.20 over 4, 8, 24, 52 and 104 weeks. Each epidemic scenario was run with 2000 iterations. The cumulative sensitivity (CumSe) and timeliness were used to evaluate the algorithms' performance with a 1% false alarm rate. Using these performance evaluation criteria, it was possible to assess the accuracy and timeliness of the surveillance system working in real time. The results showed that EWMA and PSHEW had higher CumSe (compared with CUSUM) from week 1 until the end of the period for all simulated scenarios. Changes in seroprevalence from 0.10 to 0.20 were more easily detected (higher CumSe) than changes from 0.10 to 0.15 for all three algorithms. Similar results were found with EWMA and PSHEW, based on the median time to detection. Changes in the seroprevalence were detected later with CUSUM than with EWMA and PSHEW for the different scenarios. Increasing the sample size 10-fold halved the time to detection (CumSe = 1), whereas increasing the sample size 100-fold reduced the time to detection by a factor of 6. This study investigated the performance of three univariate process monitoring control algorithms in monitoring endemic diseases. It was shown that automated systems based on these detection methods identified changes in seroprevalence at different times. Increasing the number of tested herds would lead to faster
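
    A hedged sketch of the three univariate monitoring algorithms compared above, applied to a simulated weekly seroprevalence that rises from 0.10 to 0.20; the control limits and tuning constants below are illustrative textbook choices, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(9)
n_herds, baseline, weeks = 100, 0.10, 104
prev = np.r_[np.full(weeks // 2, baseline), np.full(weeks // 2, 0.20)]
observed = rng.binomial(n_herds, prev) / n_herds            # weekly proportion of seropositive herds

sigma = np.sqrt(baseline * (1 - baseline) / n_herds)

# Shewhart p chart: alarm when a single point exceeds the 3-sigma upper limit
shewhart_alarm = int(np.argmax(observed > baseline + 3 * sigma))

# EWMA with smoothing constant lambda and width L (asymptotic control limit)
lam, L = 0.2, 2.7
ewma, ewma_alarm = baseline, None
for t, p in enumerate(observed):
    ewma = lam * p + (1 - lam) * ewma
    if ewma_alarm is None and ewma > baseline + L * sigma * np.sqrt(lam / (2 - lam)):
        ewma_alarm = t

# Upper CUSUM with reference value k and decision interval h (in sigma units)
k, h, c, cusum_alarm = 0.5 * sigma, 5 * sigma, 0.0, None
for t, p in enumerate(observed):
    c = max(0.0, c + (p - baseline) - k)
    if cusum_alarm is None and c > h:
        cusum_alarm = t

print("first alarm week  Shewhart:", shewhart_alarm, " EWMA:", ewma_alarm, " CUSUM:", cusum_alarm)
```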

  5. An Application of Multivariate Statistical Analysis for Query-Driven Visualization

    Energy Technology Data Exchange (ETDEWEB)

    Gosink, Luke J.; Garth, Christoph; Anderson, John C.; Bethel, E. Wes; Joy, Kenneth I.

    2010-03-01

    Driven by the ability to generate ever-larger, increasingly complex data, there is an urgent need in the scientific community for scalable analysis methods that can rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) strategies are among the small subset of techniques that can address both large and highly complex datasets. This paper extends the utility of QDV strategies with a statistics-based framework that integrates non-parametric distribution estimation techniques with a new segmentation strategy to visually identify statistically significant trends and features within the solution space of a query. In this framework, query distribution estimates help users to interactively explore their query's solution and visually identify the regions where the combined behavior of constrained variables is most important, statistically, to their inquiry. Our new segmentation strategy extends the distribution estimation analysis by visually conveying the individual importance of each variable to these regions of high statistical significance. We demonstrate the analysis benefits these two strategies provide and show how they may be used to facilitate the refinement of constraints over variables expressed in a user's query. We apply our method to datasets from two different scientific domains to demonstrate its broad applicability.
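
    A minimal sketch of the non-parametric distribution-estimation step that underlies this kind of framework: a kernel density estimate of one variable restricted to the records that satisfy a query, on synthetic data (the variable names and the query threshold are invented for illustration).

```python
# Non-parametric (kernel) density estimate of a variable within the subset
# of records selected by a query; purely synthetic data.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
temperature = rng.normal(300.0, 25.0, size=10_000)
pressure = rng.normal(1.0, 0.2, size=10_000)

# Query: keep only the high-pressure records.
selected = temperature[pressure > 1.2]

kde = gaussian_kde(selected)              # non-parametric distribution estimate
grid = np.linspace(temperature.min(), temperature.max(), 200)
density = kde(grid)
print("mode of temperature within the query region:", grid[np.argmax(density)])
```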

  6. Homogeneity and change-point detection tests for multivariate data using rank statistics

    CERN Document Server

    Lung-Yut-Fong, Alexandre; Cappé, Olivier

    2011-01-01

    Detecting and locating changes in highly multivariate data is a major concern in several current statistical applications. In this context, the first contribution of the paper is a novel non-parametric two-sample homogeneity test for multivariate data based on the well-known Wilcoxon rank statistic. The proposed two-sample homogeneity test statistic can be extended to deal with ordinal or censored data as well as to test for the homogeneity of more than two samples. The second contribution of the paper concerns the use of the proposed test statistic to perform retrospective change-point analysis. It is first shown that the approach is computationally feasible even when looking for a large number of change-points thanks to the use of dynamic programming. Computable asymptotic $p$-values for the test are then provided in the case where a single potential change-point is to be detected. Compared to available alternatives, the proposed approach appears to be very reliable and robust. This is particularly true in ...
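
    For the univariate special case, the Wilcoxon/Mann-Whitney rank statistic on which the proposed test builds is readily available in SciPy; the sketch below applies it to two simulated samples (the multivariate and change-point extensions of the paper are not reproduced here).

```python
# One-dimensional analogue of the rank-based homogeneity test: the
# Wilcoxon/Mann-Whitney rank-sum statistic via SciPy, on simulated samples.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)
sample_a = rng.normal(0.0, 1.0, size=200)
sample_b = rng.normal(0.3, 1.0, size=250)    # shifted distribution

stat, p_value = mannwhitneyu(sample_a, sample_b, alternative="two-sided")
print(f"U={stat:.1f}, p={p_value:.4g}")      # small p suggests non-homogeneity
```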

  7. Univariate and multivariate molecular spectral analyses of lipid related molecular structural components in relation to nutrient profile in feed and food mixtures

    Science.gov (United States)

    Abeysekara, Saman; Damiran, Daalkhaijav; Yu, Peiqiang

    2013-02-01

    The objectives of this study were (i) to determine lipid-related molecular structural components (functional groups) in feed combinations of cereal grain (barley, Hordeum vulgare) and wheat (Triticum aestivum)-based dried distillers grain solubles (wheat DDGSs) from bioethanol processing at five different combination ratios, using univariate and multivariate molecular spectral analyses with Fourier transform infrared molecular spectroscopy, and (ii) to correlate the lipid-related molecular-functional structure spectral profile with nutrient profiles. The spectral intensities of (i) the CH3 asymmetric, CH2 asymmetric, CH3 symmetric and CH2 symmetric groups, (ii) the unsaturation (C=C) group, and (iii) the carbonyl ester (C=O) group were determined. Spectral differences of functional groups were detected by hierarchical cluster analysis (HCA) and principal components analysis (PCA). The results showed that the combination treatments induced significant modifications (P < 0.05) in these functional groups, as detected by molecular spectroscopy. These changes were associated with nutrient profiles and functionality.

  8. Statistical Algorithms for Models in State Space Using SsfPack 2.2

    NARCIS (Netherlands)

    Koopman, S.J.M.; Shephard, N.; Doornik, J.A.

    1998-01-01

    This paper discusses and documents the algorithms of SsfPack 2.2. SsfPack is a suite of C routines for carrying out computations involving the statistical analysis of univariate and multivariate models in state space form. The emphasis is on documenting the link we have made to the Ox computing environment.

  10. Plastic Surgery Statistics

    Science.gov (United States)

    Plastic surgery procedural statistics from the American Society of Plastic Surgeons, published as annual reports (e.g., 2015 and 2016 Plastic Surgery Statistics).

  11. MQSA National Statistics

    Science.gov (United States)

    National statistics for the Mammography Quality Standards Act (MQSA) program, published as scorecard statistics by year (2015, 2016, 2017 and archived years).

  12. Progress of statistical methods for testing interactions in candidate gene association studies based on case-control design

    Institute of Scientific and Technical Information of China (English)

    金如锋

    2011-01-01

    Testing for gene-gene and gene-environment interactions in candidate gene association studies helps to reveal possible mechanisms underlying diseases. This article summarizes the progress of statistical methods for testing interactions in candidate gene association studies based on case-control designs. Both parametric and non-parametric methods can be used to detect interactions. Logistic regression is the most frequently used parametric method, while data mining techniques offer a variety of alternative non-parametric methods. Four classes of data mining techniques can be applied in candidate gene association studies: dimension reduction, tree-based approaches, pattern recognition and Bayesian methods. Among the non-parametric alternatives, we concentrate on four methods that have become popular and reliable for the detection of interactions: multifactor dimensionality reduction (MDR), classification and regression trees (CART), random forests, and Bayesian epistasis association mapping (BEAM); their principles, analysis procedures, advantages and disadvantages are compared. Parametric and non-parametric methods each have strengths and weaknesses for analyzing interactions: low-dimensional data can be analyzed with either approach, whereas high-dimensional data are mainly analyzed with non-parametric methods. As genotyping technology develops and the number of detectable SNPs grows, non-parametric methods are becoming ever more widely used.
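
    A hedged sketch of the parametric approach mentioned above: a logistic regression with a SNP-by-SNP interaction term fitted to simulated case-control data (genotype frequencies and effect sizes are invented for illustration, and this is not the code from any of the cited methods).

```python
# Logistic regression test for a gene-gene interaction on simulated
# case-control data; all parameters are illustrative placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
snp1 = rng.binomial(2, 0.3, n)            # genotypes coded 0/1/2
snp2 = rng.binomial(2, 0.2, n)
logit = -1.0 + 0.2 * snp1 + 0.1 * snp2 + 0.4 * snp1 * snp2
case = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([snp1, snp2, snp1 * snp2]))
fit = sm.Logit(case, X).fit(disp=0)
# The p-value of the last coefficient tests the SNP-by-SNP interaction.
print("interaction p-value:", fit.pvalues[-1])
```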

  13. Predicting Warm Season Nocturnal Cloud-To-Ground Lightning Near Cape Canaveral, Florida

    Science.gov (United States)

    2008-12-19

    rendered the standard t-test inappropriate. However, Mielke et al. (1976, 1981) developed a non-parametric test for investigating univariate...higher confidence level. A detailed description of the procedure can be found in Mielke et al. (1981). In assessing the statistical significance of...1984: Average diurnal variation of summer lightning over the Florida peninsula. Mon. Wea. Rev., 112, 1134-1140. Mielke, P. W., K. J. Berry, and E. S

  14. Predict! Teaching Statistics Using Informational Statistical Inference

    Science.gov (United States)

    Makar, Katie

    2013-01-01

    Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…

  15. Statistical Modeling of Large-Scale Scientific Simulation Data

    Energy Technology Data Exchange (ETDEWEB)

    Eliassi-Rad, T; Baldwin, C; Abdulla, G; Critchlow, T

    2003-11-15

    With the advent of massively parallel computer systems, scientists are now able to simulate complex phenomena (e.g., explosions of stars). Such scientific simulations typically generate large-scale data sets over the spatio-temporal space. Unfortunately, the sheer sizes of the generated data sets make efficient exploration of them impossible. Constructing queriable statistical models is an essential step in helping scientists glean new insight from their computer simulations. We define queriable statistical models to be descriptive statistics that (1) summarize and describe the data within a user-defined modeling error, and (2) are able to answer complex range-based queries over the spatiotemporal dimensions. In this chapter, we describe systems that build queriable statistical models for large-scale scientific simulation data sets. In particular, we present our Ad-hoc Queries for Simulation (AQSim) infrastructure, which reduces the data storage requirements and query access times by (1) creating and storing queriable statistical models of the data at multiple resolutions, and (2) evaluating queries on these models of the data instead of the entire data set. Within AQSim, we focus on three simple but effective statistical modeling techniques. AQSim's first modeling technique (called univariate mean modeler) computes the "true" (unbiased) mean of systematic partitions of the data. AQSim's second statistical modeling technique (called univariate goodness-of-fit modeler) uses the Anderson-Darling goodness-of-fit method on systematic partitions of the data. Finally, AQSim's third statistical modeling technique (called multivariate clusterer) utilizes the cosine similarity measure to cluster the data into similar groups. Our experimental evaluations on several scientific simulation data sets illustrate the value of using these statistical models on large-scale simulation data sets.
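
    A minimal sketch of the "univariate goodness-of-fit modeler" idea, under the assumption that each partition is summarized separately: an Anderson-Darling normality statistic is computed for systematic partitions of a simulated array (this is not the AQSim code).

```python
# Anderson-Darling goodness-of-fit statistic per systematic partition of a
# large simulated array; the partitioning scheme is illustrative only.
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(4)
data = rng.normal(0.0, 1.0, size=100_000)

n_partitions = 10
for i, part in enumerate(np.array_split(data, n_partitions)):
    result = anderson(part, dist="norm")
    print(f"partition {i}: mean={part.mean():.3f}, A2={result.statistic:.3f}")
```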

  16. Statistical group differences in anatomical shape analysis using Hotelling T2 metric

    Science.gov (United States)

    Styner, Martin; Oguz, Ipek; Xu, Shun; Pantazis, Dimitrios; Gerig, Guido

    2007-03-01

    Shape analysis has become of increasing interest to the neuroimaging community due to its potential to precisely locate morphological changes between healthy and pathological structures. This manuscript presents a comprehensive set of tools for the computation of 3D structural statistical shape analysis. It has been applied in several studies on brain morphometry, but can potentially be employed in other 3D shape problems. Its main limitation is the necessity of spherical topology. The input of the proposed shape analysis is a set of binary segmentations of a single brain structure, such as the hippocampus or caudate. These segmentations are converted into a corresponding spherical harmonic description (SPHARM), which is then sampled into triangulated surfaces (SPHARM-PDM). After alignment, differences between groups of surfaces are computed using the Hotelling T2 two-sample metric. Statistical p-values, both raw and corrected for multiple comparisons, result in significance maps. Additional visualizations of the group tests are provided via mean difference magnitude and vector maps, as well as maps of the group covariance information. The correction for multiple comparisons is performed via two separate methods that each have a distinct view of the problem. The first one aims to control the family-wise error rate (FWER) or false-positives via the extrema histogram of non-parametric permutations. The second method controls the false discovery rate and results in a less conservative estimate of the false-negatives. Prior versions of this shape analysis framework have already been applied to clinical studies on hippocampus and lateral ventricle shape in adult schizophrenics. The novelty of this submission is the use of the Hotelling T2 two-sample group difference metric for the computation of a template-free statistical shape analysis. Template-free group testing allowed this framework to become independent of any template choice, as well as it improved the
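
    The sketch below computes a two-sample Hotelling T2 statistic and an F-based p-value for small simulated feature vectors (e.g., 3-D surface displacements); the permutation-based FWER and FDR corrections used in the paper are not reproduced here.

```python
# Two-sample Hotelling T^2 statistic with an F-distribution p-value,
# applied to simulated 3-D feature vectors.
import numpy as np
from scipy.stats import f

def hotelling_t2(x, y):
    nx, ny, p = len(x), len(y), x.shape[1]
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((nx - 1) * np.cov(x, rowvar=False) +
              (ny - 1) * np.cov(y, rowvar=False)) / (nx + ny - 2)
    t2 = (nx * ny) / (nx + ny) * diff @ np.linalg.solve(pooled, diff)
    # Convert T^2 to an F statistic to obtain a p-value.
    f_stat = (nx + ny - p - 1) / (p * (nx + ny - 2)) * t2
    p_val = f.sf(f_stat, p, nx + ny - p - 1)
    return t2, p_val

rng = np.random.default_rng(5)
group_a = rng.normal(0.0, 1.0, size=(30, 3))
group_b = rng.normal(0.5, 1.0, size=(25, 3))
print(hotelling_t2(group_a, group_b))
```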

  17. Hyperspectral Raman imaging of human prostatic cells: An attempt to differentiate normal and malignant cell lines by univariate and multivariate data analysis

    Science.gov (United States)

    Musto, P.; Calarco, A.; Pannico, M.; La Manna, P.; Margarucci, S.; Tafuri, A.; Peluso, G.

    2017-02-01

    Hyperspectral Raman images of human prostatic cells have been collected and analysed with several approaches to reveal differences among normal and tumor cell lines. The objective of the study was to test the potential of different chemometric methods in providing diagnostic responses. We focused our analysis on the ν(C-H) region (2800-3100 cm-1) owing to its optimal Signal-to-Noise ratio and because the main differences between the spectra of the two cell lines were observed in this frequency range. Multivariate analysis identified two principal components, which were positively recognized as due to the protein and the lipid fractions, respectively. The tumor cells exhibited a modified distribution of the cytoplasmatic lipid fraction (mainly localized alongside the cell boundary), which may prove very useful for a preliminary screening. Principal Component analysis was found to provide high contrast and to be well suited for image-processing purposes. Self-Modelling Curve Resolution made available meaningful spectra and relative-concentration values; it revealed a 97% increase of the lipid fraction in the tumor cell with respect to the control. Finally, a univariate approach confirmed significant and reproducible differences between normal and tumor cells.
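
    A minimal sketch of the principal-component step on a hyperspectral cube, assuming scikit-learn is available: the cube is unfolded to a pixels-by-wavenumbers matrix, two components are fitted, and the scores are mapped back to images (purely synthetic data, not the published spectra).

```python
# PCA score maps for a synthetic hyperspectral cube (height x width x channels).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(10)
height, width, n_channels = 32, 32, 300        # e.g., a C-H stretching window
cube = rng.normal(size=(height, width, n_channels))

unfolded = cube.reshape(-1, n_channels)        # pixels x wavenumbers
scores = PCA(n_components=2).fit_transform(unfolded)
score_maps = scores.reshape(height, width, 2)  # per-pixel PC1/PC2 images
print("PC1 score range:", score_maps[..., 0].min(), score_maps[..., 0].max())
```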

  18. Univariate and multiple linear regression analyses for 23 single nucleotide polymorphisms in 14 genes predisposing to chronic glomerular diseases and IgA nephropathy in Han Chinese

    Directory of Open Access Journals (Sweden)

    Hui Wang

    2014-01-01

    Full Text Available Immunoglobulin A nephropathy (IgAN) is a complex trait regulated by the interaction among multiple physiologic regulatory systems and probably involving numerous genes, which leads to inconsistent findings in genetic studies. One possible reason for the failure to replicate some single-locus results is that the underlying genetics of IgAN is based on multiple genes with minor effects. To study the association between 23 single nucleotide polymorphisms (SNPs) in 14 genes predisposing to chronic glomerular diseases and IgAN in Han males, the genotypes of the 23 SNPs in 21 Han males were detected and analyzed with a BaiO gene chip, and their associations were analyzed with univariate analysis and multiple linear regression analysis. The analysis showed that CTLA4 rs231726 and CR2 rs1048971 revealed a significant association with IgAN. These findings support the multi-gene nature of the etiology of IgAN and propose a potential gene-gene interactive model for future studies.

  19. Univariate and simplex optimization for the flow-injection spectrophotometric determination of copper using nitroso-R salt as a complexing agent.

    Science.gov (United States)

    Purachat, B; Liawruangrath, S; Sooksamiti, P; Rattanaphani, S; Buddhasukh, D

    2001-03-01

    A simple colorimetric flow-injection system for the determination of Cu(II) based on a complexation reaction with nitroso-R salt is described. The chemical and FIA variables were established using the univariate and simplex methods. A small volume of Cu(II) was mixed with merged streams of nitroso-R salt and acetate buffer solutions. The absorbance of the complex was continuously monitored at 492 nm. A calibration curve was obtained over the concentration range 1.0-7.0 microg ml(-1). The relative standard deviation for determining 4.0 microg ml(-1) Cu(II) was 0.47% (n = 11). The detection limit (3sigma) was 0.68 microg ml(-1) and the sample throughput was 150 h(-1). The validity of the method was satisfactorily examined for the determination of Cu(II) in wastewater and copper ore samples. The accuracy was found to be high, as the calculated Student's t-values were less than the theoretical values when the results were compared with those obtained by FAAS.
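
    The simplex optimization mentioned above can be sketched with SciPy's Nelder-Mead implementation; the two variables and the response surface below are invented stand-ins for the chemical and FIA variables, not the published system.

```python
# Simplex (Nelder-Mead) optimization of two hypothetical FIA variables
# (reagent concentration, flow rate) on an invented response surface.
import numpy as np
from scipy.optimize import minimize

def negative_absorbance(vars_):
    conc, flow = vars_
    # Hypothetical smooth response peaking at conc = 0.05 M, flow = 1.2 mL/min.
    return -np.exp(-((conc - 0.05) / 0.02) ** 2 - ((flow - 1.2) / 0.5) ** 2)

result = minimize(negative_absorbance, x0=[0.02, 2.0], method="Nelder-Mead")
print("optimum (conc, flow):", result.x, "max absorbance:", -result.fun)
```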

  20. Multiple univariate data analysis reveals the inulin effects on the high-fat-diet induced metabolic alterations in rat myocardium and testicles in the preobesity state.

    Science.gov (United States)

    Duan, Yixuan; An, Yanpeng; Li, Ning; Liu, Bifeng; Wang, Yulan; Tang, Huiru

    2013-07-01

    Obesity is a worldwide epidemic and a well-known risk factor for many diseases affecting billions of people's health and well-being. However, little information is available for metabolic changes associated with the effects of obesity development and interventions on cardiovascular and reproduction systems. Here, we systematically analyzed the effects of high-fat diet (HFD) and inulin intake on the metabolite compositions of myocardium and testicle using NMR spectroscopy. We developed a useful high-throughput method based on multiple univariate data analysis (MUDA) to visualize and efficiently extract information on metabolites significantly affected by an intervention. We found that HFD caused widespread metabolic changes in both rat myocardium and testicles involving fatty acid β-oxidation together with the metabolisms of choline, amino acids, purines and pyrimidines even before HFD caused significant body-weight increases. Inulin intake ameliorated some of the HFD-induced metabolic changes in both myocardium (3-HB, lactate and guanosine) and testicle tissues (3-HB, inosine and betaine). A remarkable elevation of scyllo-inositol was also observable with inulin intake in both tissues. These findings offered essential information for the inulin effects on the HFD-induced metabolic changes and demonstrated this MUDA method as a powerful alternative to traditionally used multivariate data analysis for metabonomics.
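
    One way to read "multiple univariate data analysis" is a univariate test per signal followed by multiple-testing control; the sketch below runs a t-test per simulated metabolite signal with Benjamini-Hochberg correction. This is an assumption about the general workflow, not the authors' exact MUDA procedure or their NMR data.

```python
# Per-signal t-tests with Benjamini-Hochberg FDR correction on simulated
# metabolite intensities (control vs. treated groups).
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(6)
n_metabolites = 50
control = rng.normal(1.0, 0.1, size=(10, n_metabolites))
treated = control + rng.normal(0.0, 0.1, size=control.shape)
treated[:, :5] += 0.3                        # first five signals truly altered

p_values = ttest_ind(control, treated, axis=0).pvalue
rejected, q_values, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print("signals flagged as altered:", np.flatnonzero(rejected))
```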

  1. Clinical and Histological Prognostic Factors in Axillary Node-Negative Breast Cancer: Univariate and Multivariate Analysis with Relation to 5-Year Recurrence.

    Science.gov (United States)

    Khanna; Tokuda; Shibuya; Tanaka; Sekine; Tajima; Osamura; Mitomi

    1995-04-30

    In recent years, several studies have shown that about 30% of cases of axillary node-negative breast cancer suffer relapse of the disease. We attempted to evaluate the most significant prognostic factors for predicting this high-risk group, which may benefit from adjuvant treatment. For this purpose, we selected 9 patients out of 80 cases of node-negative breast cancer who had been followed up for at least 5 years and had recurrence of the disease. For comparison, 16 patients from the same group who did not have relapse were selected on a random basis. Histology, receptor status, AgNOR, DNA flow cytometry and various immunohistochemical parameters were compared between the group with recurrence and the group without recurrence. On univariate analysis, tumor size, immunohistochemical expression of PCNA, MIB-1 and c-erbB-2, and S-phase fraction were significantly different between the two groups. By multivariate analysis, immunohistochemical c-erbB-2 expression (in more than 50% of cancer cells) was an independent parameter. In summary, c-erbB-2 immunohistochemical staining on paraffin sections might be the best independent prognostic factor in axillary node-negative breast cancer.

  2. Toward improved statistical methods for analyzing Cotinine-Biomarker health association data

    Directory of Open Access Journals (Sweden)

    Clark John D

    2011-10-01

    Full Text Available Abstract Background Serum cotinine, a metabolite of nicotine, is frequently used in research as a biomarker of recent tobacco smoke exposure. Historically, secondhand smoke (SHS) research has used suboptimal statistical methods due to censored serum cotinine values, meaning measurements below the limit of detection (LOD). Methods We compared commonly used methods for analyzing censored serum cotinine data using parametric and non-parametric techniques, employing data from the 1999-2004 National Health and Nutrition Examination Surveys (NHANES). To illustrate the differences in associations obtained by various analytic methods, we compared parameter estimates for the association between cotinine and the inflammatory marker homocysteine using complete case analysis, single and multiple imputation, "reverse" Kaplan-Meier, and logistic regression models. Results Parameter estimates and statistical significance varied according to the statistical method used with censored serum cotinine values. Single imputation of censored values with either 0, LOD or LOD/√2 yielded similar estimates and significance; the multiple imputation method yielded smaller estimates than the other methods, without statistical significance. Multiple regression modelling using the "reverse" Kaplan-Meier method yielded statistically significant estimates that were larger than those from parametric methods. Conclusions Analyses of serum cotinine data with values below the LOD require special attention. "Reverse" Kaplan-Meier was the only method inherently able to deal with censored data with multiple LODs, and may be the most accurate since it avoids the data manipulation needed by other commonly used statistical methods. Additional research is needed to identify optimal statistical methods for the analysis of SHS biomarkers subject to a LOD.
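
    The single-imputation choices compared above (0, LOD, LOD/√2) are easy to illustrate on simulated data; the sketch below shows how the substituted value shifts the estimated mean, while the "reverse" Kaplan-Meier approach from the paper is not reproduced here.

```python
# Effect of simple substitutions for values below the limit of detection
# (LOD) on the estimated mean; cotinine-like values are simulated.
import numpy as np

rng = np.random.default_rng(7)
lod = 0.05
true_cotinine = rng.lognormal(mean=-3.0, sigma=1.5, size=1000)
censored = true_cotinine < lod

for label, fill in [("zero", 0.0), ("LOD", lod), ("LOD/sqrt(2)", lod / np.sqrt(2))]:
    imputed = np.where(censored, fill, true_cotinine)
    print(f"{label:12s} mean={imputed.mean():.4f}")

print(f"true mean    {true_cotinine.mean():.4f}")
print(f"{censored.mean():.0%} of values are below the LOD")
```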

  3. The foundations of statistics

    CERN Document Server

    Savage, Leonard J

    1972-01-01

    Classic analysis of the foundations of statistics and development of personal probability, one of the greatest controversies in modern statistical thought. Revised edition. Calculus, probability, statistics, and Boolean algebra are recommended.

  4. Adrenal Gland Tumors: Statistics

    Science.gov (United States)

    A primary adrenal gland tumor is very uncommon, and exact statistics are not available for this type of tumor. (Statistics approved by the Cancer.Net Editorial Board.)

  5. Blood Facts and Statistics

    Science.gov (United States)

    Facts and statistics about blood needs and blood components, including whole blood and red blood cells.

  6. Algebraic statistics computational commutative algebra in statistics

    CERN Document Server

    Pistone, Giovanni; Wynn, Henry P

    2000-01-01

    Written by pioneers in this exciting new field, Algebraic Statistics introduces the application of polynomial algebra to experimental design, discrete probability, and statistics. It begins with an introduction to Gröbner bases and a thorough description of their applications to experimental design. A special chapter covers the binary case with new application to coherent systems in reliability and two level factorial designs. The work paves the way, in the last two chapters, for the application of computer algebra to discrete probability and statistical modelling through the important concept of an algebraic statistical model.As the first book on the subject, Algebraic Statistics presents many opportunities for spin-off research and applications and should become a landmark work welcomed by both the statistical community and its relatives in mathematics and computer science.

  7. PROBABILITY AND STATISTICS.

    Science.gov (United States)

    Keywords: statistical analysis, probability, information theory, differential equations, statistical processes, stochastic processes, multivariate analysis, distribution theory, decision theory, measure theory, optimization.

  8. [Evaluation of using statistical methods in selected national medical journals].

    Science.gov (United States)

    Sych, Z

    1996-01-01

    most important methods of mathematical statistics, such as parametric tests of significance, analysis of variance (in single and dual classifications), non-parametric tests of significance, and correlation and regression. Works that used multiple correlation, multiple regression, or more complex methods for studying relationships among two or more variables were grouped with the works whose statistical methods comprised correlation and regression, as well as other methods, e.g. statistical methods used in epidemiology (incidence and morbidity coefficients, standardization of coefficients, survival tables), factor analysis by the Jacobi-Hotelling method, taxonomic methods, and others. On the basis of the performed studies, it was established that the frequency of statistical methods employed in the six selected national medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), generally similar to the frequencies reported in English-language medical journals. On the whole, no significant differences were found in the frequency of applied statistical methods (Tab. 4) or in the frequency of random tests (Tab. 3) in the analyzed works appearing in the medical journals in the respective years 1988-1992. The most frequently used statistical methods in the analyzed works for 1988-1992 were measures of position (44.2-55.6%), measures of dispersion (32.5-38.5%) and parametric tests of significance (26.3-33.1% of the works analyzed) (Tab. 4). To increase the frequency and reliability of the statistical methods used, the teaching of biostatistics should be expanded in medical studies and in postgraduate training for physicians and scientific-didactic workers.

  9. Non-parametric analysis of seed sanity and elimination and ranking indices of soybean genotypes

    Directory of Open Access Journals (Sweden)

    Edmar Soares de Vasconcelos

    2008-03-01

    Full Text Available The objective of this work was to assess soybean genotypes for seed sanity, using a method of analysis in which sanity indices (elimination and classification) are obtained based on non-parametric analysis. These indices consist in eliminating the genotypes whose pathogen incidence exceeds a value established by the researcher, and then ranking the non-eliminated genotypes by order of incidence of these pathogens. To verify its effectiveness, a simulation study was carried out comparing this method with others, and the method was applied to germination and seed-sanity data from soybean lines and cultivars in the final trials of the Soybean Breeding Program of the Departamento de Fitotecnia, Universidade Federal de Viçosa, conducted in the 2002/2003 growing season. The variable weights and cut-off limits used in the indices were established taking into account studies relating seed sanity to germination. The proposed indices make it possible to rank soybean genotypes for seed sanitary quality and to eliminate from the analyses the genotypes that did not reach the minimum required levels.

  10. Selection of (AA) diploid banana hybrids using three non-parametric indices

    Directory of Open Access Journals (Sweden)

    Lauro Saraiva Lessa

    2010-01-01

    Full Text Available The objective of the present study was to select diploid (AA) banana hybrids based on three non-parametric indices, so as to guide selection and increase the use of the variability present in the Banana Germplasm Bank of Embrapa Cassava and Tropical Fruits. Eleven hybrids were evaluated in a randomized block design with four replicates. The plots consisted of six plants spaced 2.5 m x 2.5 m, with border rows of the Pacovan cultivar. Data were collected on the following traits: plant height, pseudostem diameter, number of suckers at flowering, number of leaves at flowering, crop cycle from planting to bunch emergence, presence of pollen, number of hands, number of fruits, fruit length, and resistance to yellow Sigatoka. The means of these 10 traits were used to compute the multiplicative index, the rank-sum index, and the genotype-ideotype distance index. The two hybrids with the best overall performance, SH3263 and 1318-01, were ranked first and second, respectively, by the multiplicative and rank-sum indices, whereas the genotype-ideotype distance index ranked them first and fourth, respectively. Although the three indices showed good correspondence between the overall performance of the hybrids and their ranking, the multiplicative and rank-sum indices provided a more adequate ranking of these hybrids.
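
    A minimal sketch of a rank-sum selection index of the kind described above: each hybrid is ranked trait by trait and the ranks are summed, with the smallest total indicating the best overall performer. The trait values are synthetic, not the trial data, and traits are assumed to be oriented so that smaller values are better.

```python
# Rank-sum (sum of ranks) selection index on synthetic trait data.
import numpy as np

rng = np.random.default_rng(9)
hybrids = [f"H{i}" for i in range(1, 12)]
# Rows = hybrids, columns = traits (oriented so that smaller is better).
traits = rng.normal(size=(len(hybrids), 10))

ranks = traits.argsort(axis=0).argsort(axis=0) + 1   # per-trait ranks (1 = best)
rank_sum = ranks.sum(axis=1)
order = np.argsort(rank_sum)                         # best hybrid first
for name, score in zip(np.array(hybrids)[order], rank_sum[order]):
    print(name, int(score))
```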

  11. A consistent framework for Horton regression statistics that leads to a modified Hack's law

    Science.gov (United States)

    Furey, Peter R.; Troutman, Brent M.

    2008-12-01

    A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression.
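
    The classical Hack's-law relation that the modified expression generalizes can be fitted as a log-log univariate regression; the sketch below does this on synthetic basin data (the order-dependent modification proposed in the paper is not implemented, and the coefficient and exponent are invented for illustration).

```python
# Log-log regression for Hack's law, ln L = ln c + h * ln A, on synthetic data.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(8)
area = 10 ** rng.uniform(0, 4, size=200)                        # drainage areas
length = 1.4 * area ** 0.6 * np.exp(rng.normal(0, 0.1, 200))    # mainstream lengths

fit = linregress(np.log(area), np.log(length))
print(f"Hack exponent h={fit.slope:.3f}, coefficient c={np.exp(fit.intercept):.3f}")
```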

  12. A consistent framework for Horton regression statistics that leads to a modified Hack's law

    Science.gov (United States)

    Furey, P.R.; Troutman, B.M.

    2008-01-01

    A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.

  13. Appropriateness of reporting statistical results in orthodontics: the dominance of P values over confidence intervals.

    Science.gov (United States)

    Polychronopoulou, Argy; Pandis, Nikolaos; Eliades, Theodore

    2011-02-01

    The purpose of this study was to search the orthodontic literature and determine the frequency of reporting of confidence intervals (CIs) in orthodontic journals with an impact factor. The six latest issues of the American Journal of Orthodontics and Dentofacial Orthopedics, the European Journal of Orthodontics, and the Angle Orthodontist were hand searched and the reporting of CIs, P values, and implementation of univariate or multivariate statistical analyses were recorded. Additionally, studies were classified according to the type/design as cross-sectional, case-control, cohort, and clinical trials, and according to the subject of the study as growth/genetics, behaviour/psychology, diagnosis/treatment, and biomaterials/biomechanics. The data were analyzed using descriptive statistics followed by univariate examination of statistical associations, logistic regression, and multivariate modelling. CI reporting was very limited and was recorded in only 6 per cent of the included published studies. CI reporting was independent of journal, study area, and design. Studies that used multivariate statistical analyses had a higher probability of reporting CIs compared with those using univariate statistical analyses. Misunderstanding of the use of P values and CIs may have important implications in implementation of research findings in clinical practice.

  14. Explorations in statistics: statistical facets of reproducibility.

    Science.gov (United States)

    Curran-Everett, Douglas

    2016-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This eleventh installment of Explorations in Statistics explores statistical facets of reproducibility. If we obtain an experimental result that is scientifically meaningful and statistically unusual, we would like to know that our result reflects a general biological phenomenon that another researcher could reproduce if (s)he repeated our experiment. But more often than not, we may learn this researcher cannot replicate our result. The National Institutes of Health and the Federation of American Societies for Experimental Biology have created training modules and outlined strategies to help improve the reproducibility of research. These particular approaches are necessary, but they are not sufficient. The principles of hypothesis testing and estimation are inherent to the notion of reproducibility in science. If we want to improve the reproducibility of our research, then we need to rethink how we apply fundamental concepts of statistics to our science.

  15. Statistics using R

    CERN Document Server

    Purohit, Sudha G; Deshmukh, Shailaja R

    2015-01-01

    STATISTICS USING R will be useful at different levels, from an undergraduate course in statistics, through graduate courses in biological sciences, engineering, management and so on. The book introduces statistical terminology and defines it for the benefit of a novice. For a practicing statistician, it will serve as a guide to R language for statistical analysis. For a researcher, it is a dual guide, simultaneously explaining appropriate statistical methods for the problems at hand and indicating how these methods can be implemented using the R language. For a software developer, it is a guide in a variety of statistical methods for development of a suite of statistical procedures.

  16. Industrial statistics with Minitab

    CERN Document Server

    Cintas, Pere Grima; Llabres, Xavier Tort-Martorell

    2012-01-01

    Industrial Statistics with MINITAB demonstrates the use of MINITAB as a tool for performing statistical analysis in an industrial context. This book covers introductory industrial statistics, exploring the most commonly used techniques alongside those that serve to give an overview of more complex issues. A plethora of examples in MINITAB are featured along with case studies for each of the statistical techniques presented. Industrial Statistics with MINITAB: Provides comprehensive coverage of user-friendly practical guidance to the essential statistical methods applied in industry. Explores

  17. Statistics For Dummies

    CERN Document Server

    Rumsey, Deborah

    2011-01-01

    The fun and easy way to get down to business with statistics Stymied by statistics? No fear: this friendly guide offers clear, practical explanations of statistical ideas, techniques, formulas, and calculations, with lots of examples that show you how these concepts apply to your everyday life. Statistics For Dummies shows you how to interpret and critique graphs and charts, determine the odds with probability, guesstimate with confidence using confidence intervals, set up and carry out a hypothesis test, compute statistical formulas, and more. Tracks to a typical first semester statistics cou

  18. CMS Program Statistics

    Data.gov (United States)

    U.S. Department of Health & Human Services — The CMS Office of Enterprise Data and Analytics has developed CMS Program Statistics, which includes detailed summary statistics on national health care, Medicare...

  19. Alcohol Facts and Statistics

    Science.gov (United States)

    Facts and statistics on alcohol use in the United States, including definitions of a standard drink and of drinking levels.

  20. Bureau of Labor Statistics

    Science.gov (United States)

    Bureau of Labor Statistics subjects include employment and unemployment, inflation and prices (Consumer Price Index, Producer Price Indexes, Import/Export Price Indexes), consumer spending, industry price indexes, and pay.